explore_your_own_dataset

Differences

This shows you the differences between two versions of the page.

explore_your_own_dataset [2014/05/22 12:15]
venturini
explore_your_own_dataset [2015/06/18 13:33] (current)
venturini
Line 11: Line 11:
   * in the same way, the visualizations that you obtain are not loaded or stored on our server. These remain also on your local client,   * in the same way, the visualizations that you obtain are not loaded or stored on our server. These remain also on your local client,
   * more generally, you are free to use VizAssist at your own risk. If you find that the generated visualizations are useful, or if you have used them with success for your own work, then please let us know and mention the VizAssist project,   * more generally, you are free to use VizAssist at your own risk. If you find that the generated visualizations are useful, or if you have used them with success for your own work, then please let us know and mention the VizAssist project,
-  * for user evaluation purposes, we have planned (but it is not implemented yet) to store in a log your VizAssist session (i.e., how much time you spend on each step, how you use the GA, etc). +  * for user evaluation purposes, we store your VizAssist session in a log (i.e., your interactions with the VizAssist interface: how much time you spend on each step, which buttons you click, how you use the GA, etc.).
  
  
-===VizAssist file format===+===VizAssist file format ​(notice: this format was updated on May 12, 2015)===
  
 The data format in VizAssist uses the following definitions and principles: The data format in VizAssist uses the following definitions and principles:
  
-  * the file format is the international CSV format, ​with comma ","​ as separator and "​."​ in numbers +  * the file format is the international CSV format, 
-  * the actual version of VizAssist assumes that your file will be in the proper "​VizAssist"​ format (see below) +  * your file can be in the proper "​VizAssist"​ format (see below), or it can be in a tabular format that VizAssist can try to process and automatically convert to VizAssist format, 
-  * several demo files are provided with various characteristics, so please have a look at them to get a better idea of the file format: multidimensional data (Iris, Wine, Pima, French Taxes) with images or url (Box office mojo), time series (French demography, Population), graphs (VizAssist project). +  * several demo files are provided with various characteristics, so please have a look at them to get a better idea of the file format: multidimensional data (Iris, Wine, Pima, French Taxes) with images or url (Box office mojo), time series (French demography, Population), graphs (VizAssist project) or trees
  
  
-The VizAssist file format is as follows:+The new VizAssist file format is as follows:
   * Line 1: the name of each data attribute (i.e., the header of each column). These will appear for instance in the legend of the generated visualizations,​   * Line 1: the name of each data attribute (i.e., the header of each column). These will appear for instance in the legend of the generated visualizations,​
-  * Line 2: the type of each data attribute (actual types: numeric, ordinal, nominal, time, imageurl, url, source, target, country),+  * Line 2: the type of each data attribute (actual types: numeric, ordinal, nominal, time, imageurl, url, source, target, country, nodenumeric,​ nodenominal,​ edgenumeric,​ edgenominal),
   * Line 3: the importance of each attribute (in [0,100]). Those values are used to tell VizAssist about your preferences for some attributes over the others. If you have no preferences,​ then please set this line to 50,...,50   * Line 3: the importance of each attribute (in [0,100]). Those values are used to tell VizAssist about your preferences for some attributes over the others. If you have no preferences,​ then please set this line to 50,...,50
-  * Line 4: the number of different values for each attribute  +  * Previous line was deleted (the number of different values for each attribute). It is now automatically computed 
-  * Line 5 and others: the data items. One value is required for each column. Missing values are not handled by VizAssist. +  * Line 4 and others: the data items. One value is required for each column. Missing values are not handled by VizAssist.
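As an illustration of the new format (three header lines for names, types and importance, followed by the data items), here is a minimal Python sketch that writes such a file. The function name `to_vizassist_csv` and the column names in the usage note are hypothetical; the checks only mirror the rules listed above and are not VizAssist's own validation.

```python
import csv
import io

# Attribute types listed in the format description above.
VIZASSIST_TYPES = {
    "numeric", "ordinal", "nominal", "time", "imageurl", "url",
    "source", "target", "country",
    "nodenumeric", "nodenominal", "edgenumeric", "edgenominal",
}


def to_vizassist_csv(names, types, rows, importance=None):
    """Return CSV text: names, types, importance, then the data items.

    Hypothetical helper, not part of VizAssist itself.
    """
    if importance is None:
        importance = [50] * len(names)  # 50 everywhere means "no preference"
    if not (len(names) == len(types) == len(importance)):
        raise ValueError("names, types and importance must have the same length")
    unknown = set(types) - VIZASSIST_TYPES
    if unknown:
        raise ValueError(f"unknown attribute types: {sorted(unknown)}")
    if any(not 0 <= weight <= 100 for weight in importance):
        raise ValueError("importance values must be in [0,100]")
    for row in rows:
        # Missing values are not handled, so every column needs a value.
        if len(row) != len(names) or any(str(value) == "" for value in row):
            raise ValueError(f"incomplete data item: {row!r}")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)       # Line 1: attribute names
    writer.writerow(types)       # Line 2: attribute types
    writer.writerow(importance)  # Line 3: importance in [0,100]
    writer.writerows(rows)       # Line 4 and others: data items
    return out.getvalue()
```

For example, `to_vizassist_csv(["length", "species"], ["numeric", "nominal"], [[5.1, "Iris-setosa"]])` produces a four-line file whose third line is `50,50`. The line with the number of different values is no longer written, since it is now computed automatically.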
  
  
Line 36: Line 36:
 numeric,​numeric,​numeric,​numeric,​nominal\\ numeric,​numeric,​numeric,​numeric,​nominal\\
 50,​50,​50,​50,​50\\ 50,​50,​50,​50,​50\\
-150,​150,​150,​150,​3\\ 
 5.1,​3.5,​1.4,​0.2,​Iris-setosa\\ 5.1,​3.5,​1.4,​0.2,​Iris-setosa\\
 4.9,​3.0,​1.4,​0.2,​Iris-setosa\\ 4.9,​3.0,​1.4,​0.2,​Iris-setosa\\
Line 48: Line 47:
 time,​numeric,​numeric,​numeric,​numeric,​numeric,​numeric,​numeric,​numeric,​numeric\\ time,​numeric,​numeric,​numeric,​numeric,​numeric,​numeric,​numeric,​numeric,​numeric\\
 50,​50,​50,​50,​50,​50,​50,​50,​50,​50\\ 50,​50,​50,​50,​50,​50,​50,​50,​50,​50\\
-456,​456,​456,​456,​456,​456,​456,​456,​456,​456\\ 
 1975-01-01,​52600,​15188,​3.4,​62600,​14,​55805,​12.5,​998,​15.9\\ 1975-01-01,​52600,​15188,​3.4,​62600,​14,​55805,​12.5,​998,​15.9\\
 1975-02-01,​52608,​18847,​4.7,​58062,​14.4,​44628,​11.1,​831,​14.3\\ 1975-02-01,​52608,​18847,​4.7,​58062,​14.4,​44628,​11.1,​831,​14.3\\
Line 58: Line 56:
  
  
-More details about the data attributes types:+More details about the data attributes types (VizAssist file format):
  
   * numeric: a dot is used to separate the integer part from the fraction part     * numeric: a dot is used to separate the integer part from the fraction part  
Line 66: Line 64:
   * imageurl: a url to an external image that can be included in the visualizations, ​   * imageurl: a url to an external image that can be included in the visualizations, ​
   * url: same idea but for any kind of url,    * url: same idea but for any kind of url, 
-  * country: a code for the name of a country (geographical location that will be used in a world map visualization ​-> not implemented yet, sorry!)+  * country: a code for the name of a country (geographical location that will be used in a world map visualization). To encode the countries, we used this encoding : [[http://​en.wikipedia.org/​wiki/​ISO_3166]]
   * source, target: are used to specify a graph in your CSV file. "Source" and "Target" are node labels between which you want to create an edge.    * source, target: are used to specify a graph in your CSV file. "Source" and "Target" are node labels between which you want to create an edge. 
   * nodenumeric, nodenominal, edgenumeric, edgenominal: these are data attributes that correspond, respectively, to properties of nodes and edges. They are mapped only to visual properties of nodes or of edges, respectively.    * nodenumeric, nodenominal, edgenumeric, edgenominal: these are data attributes that correspond, respectively, to properties of nodes and edges. They are mapped only to visual properties of nodes or of edges, respectively. 
  
  
-Even more details about the representation of graphs (see the demo files):+Even more details about the representation of trees/graphs (see the demo files):
  
   * The use of a single CSV file to represent a graph (i.e., nodes and edges) has consequences on the VizAssist format for graphs.    * The use of a single CSV file to represent a graph (i.e., nodes and edges) has consequences on the VizAssist format for graphs. 
-  * To represent a graph, you need one "​source"​ and one "​target"​ attribute. ​+  * To represent ​a tree or a graph, you need one "​source"​ and one "​target"​ attribute. ​
   * When the value of "​source"​ is different from the value of "​target",​ then VizAssist interprets this as an edge (example: "​Node0"​ in the source column and "​Node1"​ in the target column means that an edge "​Node0->​Node1"​ will be created). ​   * When the value of "​source"​ is different from the value of "​target",​ then VizAssist interprets this as an edge (example: "​Node0"​ in the source column and "​Node1"​ in the target column means that an edge "​Node0->​Node1"​ will be created). ​
-  * When the value of "​source"​ is equal to the value of "​target",​ then VizAssist interprets this a node (example: "​Node0"​ in the source column and "​Node0"​ in the target column). All nodes should be represented.+  * When the value of "​source"​ is equal to the value of "​target",​ then VizAssist interprets this as a node (example: "​Node0"​ in the source column and "​Node0"​ in the target column). All nodes should be represented. 
 +  * For trees, the first encountered node is considered as the root
   * The CSV file for a graph contains two contiguous parts: one is about the edges, and the other is about the nodes.   * The CSV file for a graph contains two contiguous parts: one is about the edges, and the other is about the nodes.
-  * Columns of type "​nodenumeric"​ or "​nodenominal"​ denote the properties of the nodes. +  * Columns of type "​nodenumeric"​ or "​nodenominal"​ denote the properties of the nodes (their values are taken into account when "​source=target"​)
-  * Columns of type "​edgenumeric"​ or "​edgenominal"​ denote the properties of the edges.+  * Columns of type "​edgenumeric"​ or "​edgenominal"​ denote the properties of the edges (their values are taken into account when "​source<>​target"​).
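The graph rules above can be sketched in Python as follows. This is a minimal illustration, not VizAssist code: the function name `graph_rows` and the four-column layout (source, target, one nodenominal attribute, one edgenumeric attribute) are assumptions. The placeholder values `-` and `0` for attributes that do not apply to a row follow what the demo excerpt appears to use; treat them as an assumption as well.

```python
def graph_rows(nodes, edges):
    """Build the data items of a graph file (hypothetical helper).

    nodes: dict mapping a node label to its nominal property
    edges: list of (source, target, weight) tuples
    Columns: source, target, nodenominal property, edgenumeric weight.
    """
    rows = []
    # First part: the edges (source differs from target).
    for source, target, weight in edges:
        if source == target:
            raise ValueError("an edge needs two different node labels")
        if source not in nodes or target not in nodes:
            raise ValueError("all nodes should be represented")
        # "-" is a placeholder: node properties are ignored on edge rows.
        rows.append([source, target, "-", weight])
    # Second part: the nodes (source equals target).
    for label, prop in nodes.items():
        # 0 is a placeholder: edge properties are ignored on node rows.
        rows.append([label, label, prop, 0])
    return rows


# Header lines that would precede these rows in the file.
GRAPH_TYPES = ["source", "target", "nodenominal", "edgenumeric"]
```

Calling `graph_rows({"Node0": "student", "Node1": "teacher"}, [("Node0", "Node1", 2)])` yields one edge row followed by two node rows, so the edges and the nodes form two contiguous parts of the file.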
  
 Here is an example of a graph (the VizAssist project): Here is an example of a graph (the VizAssist project):
Line 86: Line 85:
 nodenumeric,​nodenumeric,​source,​target,​nodenominal,​edgenominal,​edgenominal,​edgenumeric,​edgenumeric,​nodenominal,​nodenominal,​nodenominal,​nodenominal\\ nodenumeric,​nodenumeric,​source,​target,​nodenominal,​edgenominal,​edgenominal,​edgenumeric,​edgenumeric,​nodenominal,​nodenominal,​nodenominal,​nodenominal\\
 45,​50,​60,​60,​50,​60,​55,​60,​60,​55,​50,​50,​50\\ 45,​50,​60,​60,​50,​60,​55,​60,​60,​55,​50,​50,​50\\
-5,​4,​5,​5,​5,​6,​3,​6,​4,​5,​5,​4,​2\\ 
 0,0,Remy Pradignac,​Aili Dong,​-,​Assistance,​email-meeting,​1,​2,​-,​-,​-,​-\\ 0,0,Remy Pradignac,​Aili Dong,​-,​Assistance,​email-meeting,​1,​2,​-,​-,​-,​-\\
 0,0,Remy Pradignac,​Zui Zhang,​-,​Assistance,​email-meeting,​1,​2,​-,​-,​-,​-\\ 0,0,Remy Pradignac,​Zui Zhang,​-,​Assistance,​email-meeting,​1,​2,​-,​-,​-,​-\\
Line 109: Line 107:
  
   * the visualizations in D3 and in a web interface can handle only a limited number of data items, ​   * the visualizations in D3 and in a web interface can handle only a limited number of data items, ​
-  * the loaded dataset should not exceed a few thousands lines (but several hundred of columns seem to be possible), +  * the loaded dataset should not exceed 2000 lines and several tens of columns.
-  * In the demo files, we limited the number of data items to a maximum of 2000.+
  
  
 +===Using other tabular file formats===
 +
 +The latest version of VizAssist includes a pre-processing step for files that are not in VizAssist format. This functionality allows you to work directly with other file formats: VizAssist tries to automatically convert them to VizAssist format. These files can be loaded from a URL, from a file on your computer, or with copy/paste in a text box (see the "Load your data" page of VizAssist). This conversion works as follows:
 +
 +  * The file should be in tabular format (cells organized by lines and columns)
 +  * The cell separators are automatically detected
 +  * The types of the columns (i.e., numeric, nominal, etc.) are automatically detected and the values are converted to VizAssist format
 +  * An interactive interface allows you to change the types if they were not correctly detected
 +  * In general, the pre-processing of unformatted tables is a difficult problem, so we handle only some cases!
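
To give an idea of what such a conversion involves, here is a small Python sketch of separator and type detection. It is not the algorithm used by VizAssist, which is not described on this page: the function names, the list of candidate separators and the numeric/nominal rule are all assumptions made for illustration.

```python
def detect_separator(lines, candidates=(",", ";", "\t", "|")):
    """Pick the candidate that occurs the same non-zero number of times on every line."""
    best, best_count = None, 0
    for sep in candidates:
        counts = {line.count(sep) for line in lines if line.strip()}
        if len(counts) == 1:
            (count,) = counts
            if count > best_count:
                best, best_count = sep, count
    if best is None:
        raise ValueError("no consistent separator found")
    return best


def guess_type(values):
    """Return "numeric" if every value parses as a float, else "nominal"."""
    try:
        for value in values:
            float(value)
    except ValueError:
        return "nominal"
    return "numeric"


def guess_types(lines):
    """Return (separator, list of guessed column types); the first line is the header."""
    sep = detect_separator(lines)
    data = [line.split(sep) for line in lines[1:] if line.strip()]
    columns = list(zip(*data))  # transpose the rows into columns
    return sep, [guess_type(column) for column in columns]
```

With the lines `a;b;c`, `1;2;x` and `3.5;4;y`, this sketch detects `;` as the separator and guesses the types numeric, numeric, nominal. A real converter also has to deal with quoted cells, dates, decimal commas and ragged rows, which is why an interactive correction step remains useful.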
  
 ===Saving the visualizations=== ===Saving the visualizations===
  
-Once you have found a convenient visualization, you can save them in SVG format with the Download button. The legends are generally in a separate SVG in VizAssist, so they will not be downloaded in general. +Once you have found a convenient visualization, you can save it in SVG format with the Download button. The legends are generally in a separate SVG in VizAssist, so they will not be downloaded in general. 
explore_your_own_dataset.1400753722.txt.gz · Last modified: 2014/05/22 12:15 by venturini