DataLab is a compact statistics package aimed at exploratory data analysis. Please visit the DataLab Web site for more information....

Hierarchical Cluster Analysis

Command: Math -> Cluster Analysis...

The command Math/Cluster Analysis provides several methods for constructing dendrograms. The user may choose among five different clustering procedures in combination with four different distance measures. The resulting dendrograms can be used to assign new class numbers to the data objects. After activating the cluster analysis, the user first has to select the variables to be used for the clustering. The dendrogram can then be calculated by clicking the "calculate" button. The dendrogram can be zoomed in and out, and panned, by setting the mouse function with the corresponding buttons of the command bar.

Variable Selection: Click the list of variables to choose the variables used for calculating the dendrogram. Any combination of variables can be selected in the variable selection dialog, which is displayed when the Change Variables button is pressed.
Assign Classes: A dendrogram can be used to assign new class numbers to the objects. Two options are available:
(1)  Define the minimum distance between the clusters, which is used as the criterion for the assignment of new class numbers. The distance can be set interactively by moving the dotted red line in the dendrogram.
(2)  Assign class numbers to individual cluster branches by clicking the root of a particular branch. The next unoccupied class number is then assigned to this branch.
Reset Classes: Clicking this button resets all classes of the dendrogram to zero.
Transfer Classes: The class numbers assigned to the branches of the dendrogram are transferred to the data container of DataLab.
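
Option (1) of Assign Classes amounts to cutting the dendrogram at a chosen distance. The following Python sketch illustrates the idea only; it is not DataLab's implementation. The merge list encoding (objects numbered 0 to N-1, the k-th merge creating node N+k), the function name assign_classes, and the numbering of classes from 1 are assumptions made for this example.

```python
def assign_classes(n_objects, merges, cut_distance):
    """Assign class numbers by cutting a dendrogram at cut_distance.

    merges is a list of (a, b, height) tuples. Objects are numbered
    0..n_objects-1 and the k-th merge creates node n_objects + k.
    This encoding is an assumption of the sketch, not DataLab's format.
    """
    # Union-find forest over all objects and dendrogram nodes.
    parent = list(range(n_objects + len(merges)))

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for k, (a, b, height) in enumerate(merges):
        node = n_objects + k
        # Only merges below the cut line join their branches.
        if height < cut_distance:
            parent[find(a)] = node
            parent[find(b)] = node

    # Number the resulting groups 1, 2, 3, ... in order of first appearance.
    labels = {}
    classes = []
    for obj in range(n_objects):
        root = find(obj)
        if root not in labels:
            labels[root] = len(labels) + 1
        classes.append(labels[root])
    return classes


# Four objects: 0 and 1 merge at 1.0, 2 and 3 at 2.0, both groups at 5.0.
example_merges = [(0, 1, 1.0), (2, 3, 2.0), (4, 5, 5.0)]
print(assign_classes(4, example_merges, 3.0))
```

Moving the cut line up or down corresponds to changing cut_distance: a larger value joins more branches into one class, a smaller value splits them.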
Linkage Type: The dendrogram is recalculated whenever any of the parameters is changed. The user may select one of the following clustering methods:
  • Single Linkage
  • Complete Linkage
  • Average Linkage
  • Ward's Method
  • Flexible Strategy (this method requires an extra parameter alpha, which can be set by using the scrollbar below the Linkage Type box)
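
All five linkage types can be written as special cases of the Lance-Williams update formula, which gives the distance from a cluster k to the newly merged cluster (i, j). The sketch below is a minimal pure-Python illustration and not DataLab's code. The function names, the merge encoding, and the default alpha of 0.625 are assumptions; the sketch also assumes that the flexible strategy uses the common parametrization beta = 1 - 2*alpha, which may differ from the way DataLab interprets its alpha parameter.

```python
def lance_williams_coefficients(method, ni, nj, nk, alpha=0.625):
    """Return (alpha_i, alpha_j, beta, gamma) for clusters of size ni, nj, nk."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":
        return ni / (ni + nj), nj / (ni + nj), 0.0, 0.0
    if method == "ward":
        # This form of Ward's method expects squared Euclidean distances.
        total = ni + nj + nk
        return (ni + nk) / total, (nj + nk) / total, -nk / total, 0.0
    if method == "flexible":
        # Assumed parametrization: beta = 1 - 2 * alpha.
        return alpha, alpha, 1.0 - 2.0 * alpha, 0.0
    raise ValueError("unknown linkage method: " + method)


def agglomerate(dist, method="average", alpha=0.625):
    """Cluster a symmetric distance matrix (list of lists).

    Returns a list of merges (a, b, height). Objects are 0..N-1 and
    the k-th merge creates cluster N + k (an assumed encoding).
    """
    n = len(dist)
    # Pairwise distances between active clusters, keyed by (smaller, larger).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    size = {i: 1 for i in range(n)}
    merges = []
    next_id = n

    def get(a, b):
        return d[(a, b) if a < b else (b, a)]

    while len(size) > 1:
        # Pick the closest pair of active clusters.
        (i, j), h = min(d.items(), key=lambda item: item[1])
        new = next_id
        next_id += 1
        # Lance-Williams update for every other active cluster k.
        for k in size:
            if k in (i, j):
                continue
            ai, aj, beta, gamma = lance_williams_coefficients(
                method, size[i], size[j], size[k], alpha)
            dki, dkj = get(k, i), get(k, j)
            d[(k, new)] = ai * dki + aj * dkj + beta * h + gamma * abs(dki - dkj)
        # Drop all distances that involve the two merged clusters.
        for k in list(size):
            for old in (i, j):
                if k != old:
                    d.pop((min(k, old), max(k, old)), None)
        size[new] = size.pop(i) + size.pop(j)
        merges.append((i, j, h))
    return merges


# Three points on a line at positions 0, 1 and 5.
line = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
for name in ("single", "complete", "average", "flexible"):
    print(name, agglomerate(line, name))
```

In this small example only the height of the second merge depends on the linkage type, which shows how the choice of method stretches or compresses the upper part of a dendrogram.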
Distance Measure: The dendrograms can be calculated using four different distance measures:
  • Euclidean
  • Squared Euclidean
  • Manhattan
  • Jaccard coefficient

Please note that the Jaccard coefficient is not a distance measure but a measure of similarity. A dendrogram based on it must therefore be interpreted differently from dendrograms obtained with "normal" distance measures.
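
For reference, the four measures can be sketched in a few lines of Python. This illustrates the standard definitions and is not DataLab's code. In particular, the Jaccard coefficient is shown here for binary (0/1) vectors, and returning 1.0 for two all-zero vectors is a convention chosen for this sketch; how DataLab treats continuous data or empty vectors is not stated in this document.

```python
import math


def euclidean(x, y):
    # Straight-line distance between two points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x, y):
    # Same ordering as the Euclidean distance, but large differences weigh more.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def manhattan(x, y):
    # Sum of absolute coordinate differences (city-block distance).
    return sum(abs(a - b) for a, b in zip(x, y))


def jaccard_coefficient(x, y):
    """Jaccard similarity of two binary vectors: shared ones / ones in either."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Assumed convention: two all-zero vectors count as identical.
    return both / either if either else 1.0


print(euclidean([0, 0], [3, 4]))
print(squared_euclidean([0, 0], [3, 4]))
print(manhattan([0, 0], [3, 4]))
print(jaccard_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```

The three distances are zero for identical objects and grow with dissimilarity, whereas the Jaccard coefficient is 1 for identical binary vectors and 0 for vectors with nothing in common.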

Store in Newick Format: The current dendrogram can be stored in the Newick format.
Show Report: The report contains the numeric description of the dendrogram in two formats. The first part of the report describes the dendrogram as a table; the second part contains the Newick string.

The cluster table contains four columns; the first and second columns show the object number and the object identifier, separated by a pipe symbol. Dendrogram nodes are indicated by the node number and a '+' character. Each object or dendrogram node has a parent node, which is specified in the third column. The distance of the object/node to the base line of the dendrogram is listed in column 4. Please note that the table always has N-1 rows (N = number of objects) and that the nodes are specified by numbers from N+1 upwards.
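
To show how a dendrogram maps to a Newick string, the following Python sketch builds one from a list of merges. It is only an illustration: the merge encoding (objects 0 to N-1, the k-th merge creating node N+k), the function name to_newick, and the use of branch lengths computed as the height difference between a node and its child are assumptions. The exact layout of the string written by DataLab may differ.

```python
def to_newick(names, merges):
    """Build a Newick string from object names and a list of merges.

    merges is a list of (a, b, height) tuples; the k-th merge creates
    node len(names) + k. Names are assumed to contain no characters
    that are special in Newick (parentheses, commas, colons, semicolons).
    """
    n = len(names)
    text = {i: names[i] for i in range(n)}   # Newick fragment per node
    height = {i: 0.0 for i in range(n)}      # objects sit on the base line
    for k, (a, b, h) in enumerate(merges):
        node = n + k
        # Branch length = height of the new node minus height of the child.
        text[node] = "({}:{:g},{}:{:g})".format(
            text[a], h - height[a], text[b], h - height[b])
        height[node] = h
    # The last node created is the root of the dendrogram.
    return text[n + len(merges) - 1] + ";"


print(to_newick(["A", "B", "C"], [(0, 1, 1.0), (2, 3, 4.0)]))
```

Nested parentheses in the string mirror the parent relation in the cluster table: every pair of parentheses corresponds to one dendrogram node, and the numbers after the colons encode the vertical distances between a node and its children.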