Explore your own dataset

Privacy concern

We remind you that our privacy policy is strict:

  • your data is never uploaded to or stored on our server. It remains on your local client (i.e., your web browser and computer),
  • likewise, the visualizations that you obtain are never uploaded to or stored on our server. They also remain on your local client,
  • more generally, you are free to use VizAssist at your own risk. If you find the generated visualizations useful, or if you have used them successfully in your own work, please let us know and mention the VizAssist project,
  • for user evaluation purposes, we plan to log your VizAssist session (i.e., how much time you spend on each step, how you use the GA, etc.). This logging is not implemented yet.

VizAssist file format

The data format in VizAssist uses the following definitions and principles:

  • the file format is the international CSV format, with a comma “,” as the field separator and a dot “.” as the decimal separator in numbers,
  • the current version of VizAssist requires your file to be in the proper “VizAssist” format (see below),
  • several demo files with various characteristics are provided, so please have a look at them to get a better idea of the file format: multidimensional data (Iris, Wine, Pima, French Taxes), data with images or URLs (Box office mojo), time series (French demography, Population), graphs (VizAssist project).

The VizAssist file format is as follows:

  • Line 1: the name of each data attribute (i.e., the header of each column). These names appear, for instance, in the legends of the generated visualizations,
  • Line 2: the type of each data attribute (supported types: numeric, ordinal, nominal, time, imageurl, url, source, target, country, nodenumeric, nodenominal, edgenumeric, edgenominal),
  • Line 3: the importance of each attribute (in [0,100]). These values tell VizAssist about your preferences for some attributes over others. If you have no preference, please set this line to 50,…,50,
  • Line 4: the number of distinct values of each attribute,
  • Line 5 onwards: the data items, one per line. A value is required in every column: VizAssist does not handle missing values.
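As an illustration, the rules above can be checked before loading a file into VizAssist. The following Python sketch is not part of VizAssist: `read_vizassist` is a hypothetical helper that uses only the standard `csv` module, and it enforces only the rules listed on this page (four header lines, importance in [0,100], no missing values, a dot as decimal separator in numeric columns).

```python
import csv
import io

# Attribute types listed on this page.
KNOWN_TYPES = {
    "numeric", "ordinal", "nominal", "time", "imageurl", "url",
    "source", "target", "country",
    "nodenumeric", "nodenominal", "edgenumeric", "edgenominal",
}


def read_vizassist(text):
    """Parse VizAssist CSV text and check the format rules.

    Returns (names, types, importances, counts, rows); raises ValueError
    on the first rule that is broken.
    """
    lines = list(csv.reader(io.StringIO(text)))
    while lines and not lines[-1]:  # ignore blank lines at the end of the file
        lines.pop()
    if len(lines) < 5:
        raise ValueError("expected 4 header lines and at least one data line")
    names, types, importance, counts = lines[:4]
    rows = lines[4:]
    width = len(names)
    # Lines 2-4 must give one value per attribute.
    for number, header in enumerate([types, importance, counts], start=2):
        if len(header) != width:
            raise ValueError(f"line {number}: expected {width} values, got {len(header)}")
    unknown = set(types) - KNOWN_TYPES
    if unknown:
        raise ValueError(f"unknown attribute types: {sorted(unknown)}")
    importances = [float(v) for v in importance]
    if any(not 0 <= v <= 100 for v in importances):
        raise ValueError("importance values must be in [0, 100]")
    counts = [int(v) for v in counts]
    # Data lines: one non-empty value per column, dot-decimal numbers.
    for number, row in enumerate(rows, start=5):
        if len(row) != width or "" in row:
            raise ValueError(f"line {number}: missing values are not supported")
        for value, kind in zip(row, types):
            if kind == "numeric":
                try:
                    float(value)
                except ValueError:
                    raise ValueError(f"line {number}: {value!r} is not a number") from None
    return names, types, importances, counts, rows
```

A file using a comma as decimal separator (e.g. “3,5”) shifts the columns and is rejected, which is exactly the situation the CSV conventions above are meant to avoid.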

Here is an example, with the Iris dataset:

Sepal length,Sepal width,Petal length,Petal width,Class
numeric,numeric,numeric,numeric,nominal
50,50,50,50,50
150,150,150,150,3
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa


Another example, with a time series:

Month,Population size,Nb of weddings,Weddings rate (nb of w. for 1000 people),Nb of birth (alive),Birth rate (nb of b. for 1000 people),Nb of death,Death rate (nb of death for 1000 people),Nb of children death (less than 1 year),Children death rate (nb of c. d. for 1000 birth)
time,numeric,numeric,numeric,numeric,numeric,numeric,numeric,numeric,numeric
50,50,50,50,50,50,50,50,50,50
456,456,456,456,456,456,456,456,456,456
1975-01-01,52600,15188,3.4,62600,14,55805,12.5,998,15.9
1975-02-01,52608,18847,4.7,58062,14.4,44628,11.1,831,14.3
1975-03-01,52623,30838,6.9,65881,14.7,49246,11,959,14.6
1975-04-01,52640,36151,8.4,68355,15.8,47957,11.1,886,13
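If your data is already in memory, the four header lines can be generated rather than typed by hand. The sketch below uses a hypothetical helper, `write_vizassist`, which is not a VizAssist function: it sets every importance to 50 by default (no preference) and computes line 4 as the number of distinct values per column, following the definition given above. Note that the Iris example lists 150 for its numeric columns, so the demo files may follow a different convention for line 4; adjust if needed.

```python
import csv
import io


def write_vizassist(names, types, rows, importance=None):
    """Return VizAssist CSV text: 4 header lines followed by the data rows.

    importance defaults to 50 for every attribute (no preference).
    Line 4 is computed as the number of distinct values in each column.
    """
    if importance is None:
        importance = [50] * len(names)
    if not (len(types) == len(importance) == len(names)):
        raise ValueError("names, types and importance must have the same length")
    counts = [len({row[i] for row in rows}) for i in range(len(names))]
    out = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)       # line 1: attribute names
    writer.writerow(types)       # line 2: attribute types
    writer.writerow(importance)  # line 3: importance in [0, 100]
    writer.writerow(counts)      # line 4: number of distinct values
    writer.writerows(rows)       # line 5 onwards: data items
    return out.getvalue()
```

Python's csv writer quotes any field that contains a comma; since quoted fields are not mentioned in the format description, it is safer to avoid commas inside attribute names and values.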

More details about the data attribute types:

  • numeric: a dot is used to separate the integer part from the fractional part,
  • ordinal: symbolic values that are ordered, like “small”, “medium”, “large”. VizAssist uses alphabetical order, so you should rename those values as “a-small”, “b-medium”, “c-large” to make sure that they are processed in the right order,
  • nominal: symbolic values that are unordered,
  • time: an ordered timestamp attribute. This attribute defines the time axis in time series,
  • imageurl: a URL to an external image that can be included in the visualizations,
  • url: same idea, but for any kind of URL,
  • country: a country code (a geographical location that will be used in a world map visualization). Countries are encoded with the ISO 3166 standard: http://en.wikipedia.org/wiki/ISO_3166
  • source, target: used to specify a graph in your CSV file. “Source” and “Target” are the labels of the nodes between which you want to create an edge,
  • nodenumeric, nodenominal, edgenumeric, edgenominal: data attributes that describe properties of nodes (nodenumeric, nodenominal) or of edges (edgenumeric, edgenominal). They are mapped only to visual properties of nodes or of edges, respectively.
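Renaming ordinal values as described above is easy to automate. In this minimal Python sketch, `prefix_ordinal` is a hypothetical helper: you give it the levels in their intended order, and it adds a letter prefix so that alphabetical order matches that order (it assumes at most 26 levels).

```python
import string


def prefix_ordinal(values, order):
    """Prefix ordinal values with letters so that alphabetical order matches `order`.

    `order` lists the levels from lowest to highest, for example
    ["small", "medium", "large"] gives "a-small", "b-medium", "c-large".
    """
    letters = string.ascii_lowercase
    if len(order) > len(letters):
        raise ValueError("this sketch supports at most 26 ordinal levels")
    mapping = {level: f"{letters[i]}-{level}" for i, level in enumerate(order)}
    # A KeyError here means a value was not listed in `order`.
    return [mapping[value] for value in values]


sizes = ["large", "small", "medium", "small"]
print(prefix_ordinal(sizes, ["small", "medium", "large"]))
# ['c-large', 'a-small', 'b-medium', 'a-small']
```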

Even more details about the representation of graphs (see the demo files):

  • Using a single CSV file to represent a graph (i.e., both nodes and edges) has consequences for the VizAssist graph format.
  • To represent a graph, you need one “source” and one “target” attribute.
  • When the value of “source” is different from the value of “target”, VizAssist interprets the line as an edge (example: “Node0” in the source column and “Node1” in the target column means that an edge “Node0→Node1” will be created).
  • When the value of “source” is equal to the value of “target”, VizAssist interprets the line as a node (example: “Node0” in the source column and “Node0” in the target column). Every node should be declared in this way.
  • The CSV file for a graph contains two contiguous parts: one is about the edges, and the other is about the nodes.
  • Columns of type “nodenumeric” or “nodenominal” denote the properties of the nodes (their values are taken into account when “source=target”).
  • Columns of type “edgenumeric” or “edgenominal” denote the properties of the edges (their values are taken into account when “source<>target”).
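The node/edge rules above can be mirrored in a few lines of Python, for instance to check a graph file before loading it. In this sketch, `split_graph` is a hypothetical name, not a VizAssist function. It takes the attribute names, the types from line 2 and the data rows as lists of strings; it treats rows with source = target as nodes and the others as edges, keeps the nodenumeric/nodenominal columns as node properties and the edgenumeric/edgenominal columns as edge properties, and reports edge endpoints that are never declared as nodes.

```python
def split_graph(names, types, rows):
    """Split VizAssist graph rows into nodes and edges.

    A row whose source equals its target declares a node; any other row
    is an edge. Returns (nodes, edges) where nodes maps a label to its
    node properties and edges is a list of (source, target, properties).
    """
    src, tgt = types.index("source"), types.index("target")
    node_cols = [i for i, t in enumerate(types) if t in ("nodenumeric", "nodenominal")]
    edge_cols = [i for i, t in enumerate(types) if t in ("edgenumeric", "edgenominal")]
    nodes, edges = {}, []
    for row in rows:
        if row[src] == row[tgt]:
            # Node line: only node properties are meaningful here.
            nodes[row[src]] = {names[i]: row[i] for i in node_cols}
        else:
            # Edge line: only edge properties are meaningful here.
            edges.append((row[src], row[tgt], {names[i]: row[i] for i in edge_cols}))
    # All nodes should be declared, including every edge endpoint.
    undeclared = {label for s, t, _ in edges for label in (s, t)} - set(nodes)
    if undeclared:
        raise ValueError(f"nodes used in edges but never declared: {sorted(undeclared)}")
    return nodes, edges
```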

Here is an example of a graph (the VizAssist project):

Age,Years of study,Edge-Start,Edge-End,Country,Relation,Relation media,Started x years ago,Frequency,Name,Role in project,Status during project,Gender
nodenumeric,nodenumeric,source,target,nodenominal,edgenominal,edgenominal,edgenumeric,edgenumeric,nodenominal,nodenominal,nodenominal,nodenominal
45,50,60,60,50,60,55,60,60,55,50,50,50
5,4,5,5,5,6,3,6,4,5,5,4,2
0,0,Remy Pradignac,Aili Dong,-,Assistance,email-meeting,1,2,-,-,-,-
0,0,Remy Pradignac,Zui Zhang,-,Assistance,email-meeting,1,2,-,-,-,-
0,0,Zui Zhang,Remy Pradignac,-,Asking questions,email-meeting,1,2,-,-,-,-
0,0,Zui Zhang,Jean de Barochez,-,Asking questions,email-meeting,1,2,-,-,-,-
… followed by the node definition part as shown below in which source = target …
25,4,Fabien Buda,Fabien Buda,Chti,-,-,-,0,Fabien Buda,Help on D3js,MSc Student,men
25,4,Guillaume LeBihan,Guillaume LeBihan,France,-,-,-,0,Guillaume LeBihan,Help on D3js,MSc Student,men
25,4,Remy Pradignac,Remy Pradignac,France,-,-,-,0,Remy Pradignac,On-line D3js version,MSc Student,men
25,4,Jean de Barochez,Jean de Barochez,France,-,-,-,0,Jean de Barochez,On-line D3js version,MSc Student,men
45,12,Gilles Venturini,Gilles Venturini,France,-,-,-,0,Gilles Venturini,Project leader,Prof.,men
45,8,Fatma Bouali,Fatma Bouali,France,-,-,-,0,Fatma Bouali,Project leader,Associate Prof.,women
35,5,Noureddine Saifi,Noureddine Saifi,Algeria,-,-,-,0,Noureddine Saifi,Initial dev.,MSc Student,men
32,8,Abdelheq Guettala,Abdelheq Guettala,Algeria,-,-,-,0,Abdelheq Guettala,Off-line version,PhD Student,men
25,4,Aili Dong,Aili Dong,China,-,-,-,0,Aili Dong,Help on D3js,MSc Student,women
25,4,Zui Zhang,Zui Zhang,China,-,-,-,0,Zui Zhang,Help on D3js,MSc Student,men

About the dataset size:

  • the visualizations, drawn with D3 in a web interface, can handle only a limited number of data items,
  • the loaded dataset should not exceed a few thousand lines (although several hundred columns seem to be possible),
  • in the demo files, we limited the number of data items to a maximum of 2000.
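If your dataset is larger than this, one option is to keep a random subset of the data items before loading it. The sketch below is only a suggestion, not a VizAssist feature: `downsample_vizassist` (hypothetical) copies header lines 1 to 3, keeps at most `max_items` data lines in their original order, and recomputes line 4 on the kept rows when sampling occurs. It is not suitable as-is for graphs, where sampling could drop node declarations, nor for time series, where it would leave gaps in the time axis.

```python
import csv
import io
import random


def downsample_vizassist(text, max_items=2000, seed=0):
    """Keep at most `max_items` data items of a VizAssist CSV text.

    Lines 1-3 are copied unchanged. When rows are dropped, line 4 is
    recomputed as the number of distinct values among the kept rows.
    """
    lines = list(csv.reader(io.StringIO(text)))
    while lines and not lines[-1]:  # ignore trailing blank lines
        lines.pop()
    header, counts, rows = lines[:3], lines[3], lines[4:]
    if len(rows) > max_items:
        rng = random.Random(seed)  # fixed seed: same subset on every run
        keep = sorted(rng.sample(range(len(rows)), max_items))  # keep file order
        rows = [rows[i] for i in keep]
        counts = [len({row[i] for row in rows}) for i in range(len(header[0]))]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([*header, counts, *rows])
    return out.getvalue()
```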

Saving the visualizations

Once you have found a convenient visualization, you can save it in SVG format with the Download button. In VizAssist, the legends are generally drawn in a separate SVG, so they are usually not included in the download.

Last modified: 2014/10/23 10:17 by venturini