DataLab is a compact statistics package aimed at exploratory data analysis. Please visit the DataLab website for more information.


Format of ASC files

DataLab uses a simple ASCII format to import and export data. The data file (a text file) has the following structure:

Line 1 Arbitrary header line, containing a maximum of 255 characters.
Line 2 Parameter NFEAT (integer): number of columns (variables, features) of the data matrix (not including the optional object names and class information). Any comment may follow this number as long as this comment is separated by at least one blank from the numeric value and the whole line is no longer than 255 characters.
Line 3 Parameter NOBJ: number of objects of the data matrix. Any comment may follow this number as long as this comment is separated by at least one blank from the numeric value and the whole line is no longer than 255 characters.
Line 4 Parameters FLAG_ROWATTRIB, FLAG_FEATNAMES, FLAG_OBJNAMES, FLAG_NOMORDVARS (possible values: 'TRUE' or 'FALSE'). These parameters control the presence or absence of some additional information, such as the class information (FLAG_ROWATTRIB), the names of features (FLAG_FEATNAMES), the names of objects (FLAG_OBJNAMES), or the specification of nominal and ordinal variables (FLAG_NOMORDVARS). If any of these parameters is 'TRUE' the specific information is included in the following data table. The format of the data table is adjusted accordingly. The values of the parameters must be separated by at least one blank. Any comment may follow these parameters.
Lines 5..k Names of features: the following line(s), holding the names of the features, are present only if the parameter FLAG_FEATNAMES is set to 'TRUE'. The identifiers of the features must be separated by at least one blank or any ASCII character below 32, and they have to be stored in the same sequence as the variables. If a feature identifier contains blanks, the identifier has to be enclosed in double quotes ("). A single double quote can be included by using two double quotes (""). Empty feature names are indicated by the special string '(.;#;.)'. The number of names has to be equal to the number of features. The feature names may be spread over any number of lines, and the lines may be of any length. Note that the maximum length of a column identifier is 50 characters.
Lines k+1..n Class information, object names, and data: the data table is stored row by row, starting with the first variable as the first entry. Each row of variables is preceded by optional class information and an optional row identifier (= object name). This additional information is stored only if the parameters FLAG_ROWATTRIB and/or FLAG_OBJNAMES are set to 'TRUE'. If an object name contains blanks, the identifier has to be enclosed in double quotes ("). A single double quote can be included by using two double quotes (""). Empty row names are indicated by the special string '(.;#;.)'.
Lines beyond line n Specification of nominal and ordinal variables: variables that are not interval-scaled (i.e. nominal, ordinal, and ratio-scaled variables) are defined in a section enclosed by the tags <VARTYPES> and </VARTYPES>. Each variable is described by a single line consisting of three parts separated by at least one blank: first the column number of the variable, then the type of the variable ("ordinal", "nominal", or "ratio"), and finally the definitions of the nominal/ordinal identifiers. Each identifier definition is enclosed in angle brackets and follows the syntax <"ordinal number"="identifier">. The identifiers must not contain angle brackets.

User-defined data: the user may append specific information enclosed by the tags <CUSTDATA>...</CUSTDATA>. The format within the <CUSTDATA> field is completely left to the user.
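
The tagged sections described above can be picked apart with a few lines of Python. The following is only a minimal sketch, not part of DataLab: the function names extract_section and parse_vartype_line are hypothetical, and the pattern assumes that the quotes around the ordinal number and the identifier are optional, as in the sample file shown in this document.

```python
import re

# One <number=identifier> definition; identifiers must not contain angle brackets.
# Assumption: the double quotes around number and identifier are optional.
PAIR = re.compile(r'<\s*"?(\d+)"?\s*=\s*"?([^<>"]*)"?\s*>')

def extract_section(text, tag):
    """Return the text between <tag> and </tag>, or None if the section is absent."""
    match = re.search(r'<{0}>(.*?)</{0}>'.format(re.escape(tag)), text, re.DOTALL)
    return match.group(1) if match else None

def parse_vartype_line(line):
    """Split one VARTYPES line into (column number, type, {ordinal number: identifier})."""
    parts = line.split(None, 2)
    rest = parts[2] if len(parts) > 2 else ''
    levels = {int(num): ident.strip() for num, ident in PAIR.findall(rest)}
    # Assumption: the type keyword is treated case-insensitively.
    return int(parts[0]), parts[1].lower(), levels
```

The content of a <CUSTDATA> section can be obtained with the same extract_section helper; since its format is left to the user, the sketch does not try to interpret it.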

Any number of carriage returns or blanks is allowed between the values of a row. Nevertheless, it is strongly recommended that the data table be stored in such a way that it can be read and edited easily.
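
When writing ASC files from a script, an easily readable layout can be obtained by padding every cell to a common width. The sketch below is an illustration under stated assumptions, not DataLab code: quote_name and format_row are hypothetical helpers, the markers '(.;#;.)' and '###' are the special strings for empty names and empty numeric cells defined in this format description, and names containing a double quote are quoted as well (an assumption, since the description only requires quoting for names with blanks).

```python
EMPTY_NAME = '(.;#;.)'   # special string for an empty feature or object name
MISSING = '###'          # special string for an empty numeric data cell

def quote_name(name):
    """Encode a feature or object name for an ASC file."""
    if name == '':
        return EMPTY_NAME
    # Quote names with blanks (or control characters); double any embedded quote.
    if any(ch <= ' ' for ch in name) or '"' in name:
        return '"' + name.replace('"', '""') + '"'
    return name

def format_row(values, name=None, cls=None, width=12):
    """Format one data row: optional class, optional object name, then the values."""
    cells = []
    if cls is not None:
        cells.append(str(int(cls)))
    if name is not None:
        cells.append(quote_name(name))
    cells.extend(MISSING if v is None else repr(float(v)) for v in values)
    # Joining with a blank guarantees at least one separator even for long cells.
    return ' '.join(cell.ljust(width) for cell in cells).rstrip()
```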

The values may be stored in any format (integer, floating point, exponential notation) and must be separated by at least one blank. Ordinal and nominal values are represented by their ordinal numbers; the identifiers of the nominal/ordinal variables are stored at the end of the file. The class information must be of integer type; the row identifiers are interpreted as strings. The lines can have any length and must not contain any comments. Empty numeric data cells are indicated by the special string '###'.
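
The separation and quoting rules given above translate fairly directly into a small tokenizer. The following Python sketch is one possible reading of those rules, not DataLab's own implementation; the names tokenize, to_name and to_number are hypothetical. It treats every character up to ASCII 32 as a separator, so blanks and line breaks are handled alike.

```python
import re

EMPTY_NAME = '(.;#;.)'   # special string for an empty feature or object name
MISSING = '###'          # special string for an empty numeric data cell

# A token is either a double-quoted string ("" stands for one double quote)
# or a run of characters above ASCII 32.
TOKEN = re.compile(r'"((?:[^"]|"")*)"|([^\x00-\x20]+)')

def tokenize(text):
    """Yield the tokens of text in order, with quoted identifiers unquoted."""
    for match in TOKEN.finditer(text):
        quoted, bare = match.groups()
        yield bare if quoted is None else quoted.replace('""', '"')

def to_name(token):
    """Map the empty-name marker to an empty string."""
    return '' if token == EMPTY_NAME else token

def to_number(token):
    """Convert a numeric cell to float; empty cells become None."""
    return None if token == MISSING else float(token)
```

Python's float() accepts integer, floating point and exponential notation alike, which is why a single conversion function suffices here.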

The following example shows an ASCII data file that contains 10 rows of 4 variables each. Class information, feature names, and object names are included. Three data cells in the second column are empty. The third column represents an ordinal variable. In addition, three items of custom data are stored.

This is a sample file
4                 ;number of features
10                ;number of objects
TRUE TRUE TRUE TRUE    ;class info, feat.names, obj.names, nominal variables
                   F1      F2    quality   "oil speed"
1   S23X4         3.380    2.20     1        -4
1   S24X4        15.900   -2.20     2        -4.033E-01
1   C24X3         3.607    1.20     1        2.2
2   "S12 early"  -3.305    2.20     1        -4
2   S12          35.340   -2.20     2        2.888E-01
1   SWINTER      13.670    ###      3        22
2   "SPG MER 9"  -3.376    ###      3        4.0
1   B1           25.375    ###      3        -1.113E+01
2   B2           -1.650    1.20     1        -0.1
2   B3            2.509    1.20     2        -10.0
<VARTYPES>
3 Ordinal <1=poor><2=usable><3=excellent>
</VARTYPES>
<CUSTDATA>
<CoordsX>22.3661</CoordsX>
<CoordsY>-302.5262</CoordsY>
<Source>local measurement</Source>
</CUSTDATA>
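
A file such as the sample above could be read along the following lines. This is a minimal Python sketch under stated assumptions rather than a reference reader: read_asc and its helpers are hypothetical names, the data table is assumed to end at the first <VARTYPES> or <CUSTDATA> tag, and the tagged sections themselves are not interpreted.

```python
import re

# A token is a double-quoted string ("" stands for one double quote)
# or a run of characters above ASCII 32.
TOKEN = re.compile(r'"((?:[^"]|"")*)"|([^\x00-\x20]+)')

def tokens(text):
    """Yield the tokens of text, with quoted identifiers unquoted."""
    for match in TOKEN.finditer(text):
        quoted, bare = match.groups()
        yield bare if quoted is None else quoted.replace('""', '"')

def name(token):
    """Map the empty-name marker '(.;#;.)' to an empty string."""
    return '' if token == '(.;#;.)' else token

def number(token):
    """Convert a numeric cell to float; the marker '###' becomes None."""
    return None if token == '###' else float(token)

def read_asc(text):
    """Parse the header, the feature names and the data rows of an ASC file."""
    lines = text.splitlines()
    nfeat = int(lines[1].split()[0])          # line 2: NFEAT, comment ignored
    nobj = int(lines[2].split()[0])           # line 3: NOBJ, comment ignored
    has_class, has_featnames, has_objnames, _ = (
        flag.upper() == 'TRUE' for flag in lines[3].split()[:4])
    # Assumption: the data table ends at the first <VARTYPES> or <CUSTDATA> tag.
    body = re.split(r'<VARTYPES>|<CUSTDATA>', '\n'.join(lines[4:]), maxsplit=1)[0]
    stream = tokens(body)
    feature_names = [name(next(stream)) for _ in range(nfeat)] if has_featnames else []
    rows = []
    for _ in range(nobj):
        cls = int(next(stream)) if has_class else None
        obj = name(next(stream)) if has_objnames else None
        rows.append((cls, obj, [number(next(stream)) for _ in range(nfeat)]))
    return {'header': lines[0], 'feature_names': feature_names, 'rows': rows}
```

Because the table is read as one stream of tokens, the sketch does not depend on how the values are distributed over lines. Each row is returned as a tuple of class, object name and list of values, with None standing for an empty cell.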