DataLab is a compact statistics package aimed at exploratory data analysis. Please visit the DataLab Web site for more information.


Dichotomization of Data

Command: Tools -> Dichotomization...

Sometimes it is necessary (e.g. for classification purposes) to create dichotomous (binary) variables. The command Tools/Dichotomization... provides an easy and effective interface for this task. Before performing the dichotomization, the data have to be selected, the threshold has to be set, and the range of the resulting binary values has to be specified.

The range of values to be dichotomized can be selected from the following options: a particular column or row of the data matrix, all marked cells of the matrix, the entire data matrix, or all data columnwise or rowwise. The threshold may be set numerically to a particular value, or it may be derived from the selected data as their mean or as one of their percentiles (quantiles). The probability of the quantile may be adjusted in steps of 0.01 (i.e. 1 percent).
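The three threshold options can be illustrated outside DataLab with a short Python sketch. It is not DataLab's implementation. The function names `quantile` and `dichotomize`, the parameters `low` and `high` (standing in for the range of the resulting binary values), the rule that values strictly above the threshold receive the upper binary value, and the linear-interpolation quantile are all assumptions made for illustration.

```python
from statistics import mean

def quantile(values, p):
    """Quantile for probability p in [0, 1], using linear interpolation
    between neighbouring sorted values (assumed definition)."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def dichotomize(values, threshold, low=0, high=1):
    """Replace each value by `high` if it exceeds the threshold,
    otherwise by `low` (assumed comparison rule)."""
    return [high if v > threshold else low for v in values]

data = [2.0, 7.5, 4.1, 9.3, 5.0, 1.2]

# Threshold set numerically to a particular value
by_value = dichotomize(data, threshold=5.0)

# Threshold set to the mean of the selected data
by_mean = dichotomize(data, threshold=mean(data))

# Threshold set to the 0.75 quantile, with binary values -1 and +1
by_quantile = dichotomize(data, threshold=quantile(data, 0.75), low=-1, high=1)

print(by_value)     # [0, 1, 0, 1, 0, 0]
print(by_mean)      # [0, 1, 0, 1, 1, 0]
print(by_quantile)  # [-1, 1, -1, 1, -1, -1]
```

Applying the same function to each column or each row separately would correspond to the columnwise and rowwise options, where every column or row gets its own mean or quantile as threshold.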

The selected data are dichotomized by clicking the button "Execute", which replaces them with the corresponding binary values. If you make a mistake, the data can be reverted by clicking the undo button. This resets the data to the state they were in when the dichotomization command was called.

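Hint: The 0.5 quantile is equal to the median of the selected data. Thus using the 0.5 quantile as threshold results in dichotomized data exhibiting roughly equal numbers of 0 and 1. The counts can differ if the number of values is odd or if several values are tied with the median.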
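The hint can be checked with a small Python sketch. As before, this is an illustration only and not DataLab's code: the helper `dichotomize` and the rule that values strictly above the threshold become 1 are assumptions.

```python
from statistics import median

def dichotomize(values, threshold, low=0, high=1):
    """Replace each value by `high` if it exceeds the threshold,
    otherwise by `low` (assumed comparison rule)."""
    return [high if v > threshold else low for v in values]

even = [3, 8, 1, 6, 9, 4]   # even count, no ties: median is 5.0
tied = [1, 5, 5, 5, 9]      # odd count, three values tied at the median 5

even_result = dichotomize(even, median(even))
tied_result = dichotomize(tied, median(tied))

print(even_result.count(0), even_result.count(1))  # 3 3
print(tied_result.count(0), tied_result.count(1))  # 4 1
```

With an even number of distinct values the split is exactly balanced. In the second data set all values tied with the median fall on the same side, so the counts are unequal.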