PLS Discriminant Analysis

DataLab is a compact statistics package aiming at exploratory data analysis. Please visit the DataLab Web site for more information....

Command: Math -> PLS Discriminant Analysis -> Create Classifier ...

To create a classifier using PLS Discriminant Analysis (PLS-DA), specify both the independent and the dependent variables by clicking the corresponding fields at the top left and the top right, respectively. The dependent variables must be dichotomous. In addition, one of two scaling options can be selected: "mean centering" or "standardization", the former being the standard PLS approach.
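The two scaling options can be illustrated with a minimal pure-Python sketch. It assumes that each variable (column) is scaled independently and that standardization divides by the sample standard deviation (n - 1 denominator). DataLab may use a different convention. The 0/1 coding of the dichotomous target and all function names (`mean_center`, `standardize`, `scale_columns`, `dichotomous_target`) are hypothetical and chosen for illustration only.

```python
from statistics import mean, stdev


def mean_center(values):
    """Mean centering: subtract the column mean from every value."""
    m = mean(values)
    return [v - m for v in values]


def standardize(values):
    """Standardization: mean-center, then divide by the standard deviation."""
    s = stdev(values)  # assumption: sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [v / s for v in mean_center(values)]


def scale_columns(columns, method="mean centering"):
    """Apply the selected scaling option to each variable (column) separately."""
    scalers = {"mean centering": mean_center, "standardization": standardize}
    return [scalers[method](col) for col in columns]


def dichotomous_target(labels, positive):
    """Hypothetical coding of a two-class target: 1.0 for the positive class, else 0.0."""
    if len(set(labels)) != 2:
        raise ValueError("the target variable must be dichotomous")
    return [1.0 if label == positive else 0.0 for label in labels]
```

For example, `standardize([2.0, 4.0, 6.0])` returns `[-1.0, 0.0, 1.0]`, while `mean_center` would only shift the values to `[-2.0, 0.0, 2.0]`.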

After selecting the variables, the classifier can be calculated by clicking the "Calculate" button. The PLS algorithm computes a predefined number of factors (currently 20 by default). However, the number of factors that can actually be extracted cannot exceed the rank of the data matrix, which is bounded by its dimensions, so fewer factors may be calculated in some cases.
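The following pure-Python sketch shows why fewer factors than requested may be obtained. It implements the textbook NIPALS algorithm for PLS1 (a single target variable) with mean centering; this is an assumption, since DataLab's actual implementation is not documented here. Extraction stops early once the centered data matrix is exhausted, i.e. once the number of factors has reached its rank. The function name `pls1_nipals` and the tolerance `tol` are hypothetical. The per-factor explained variances it returns are analogous to those listed on the Summary tab.

```python
from math import sqrt


def pls1_nipals(X, y, max_factors=20, tol=1e-12):
    """Hypothetical PLS1 fit via NIPALS; X is a list of rows, y a list of targets.

    Returns two lists holding, per extracted factor, the fraction of the
    X variance and of the y variance explained by that factor.
    """
    n, p = len(X), len(X[0])
    # Mean centering of X and y (the standard PLS scaling option).
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]
    ss_x_total = sum(v * v for row in E for v in row)
    ss_y_total = sum(v * v for v in f)
    explained_x, explained_y = [], []
    for _ in range(max_factors):
        # Stop when the residual X is (numerically) zero: its rank is exhausted.
        if sum(v * v for row in E for v in row) <= tol * ss_x_total:
            break
        # Weight vector w = E^T f, normalized to unit length.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        w_norm = sqrt(sum(v * v for v in w))
        if w_norm <= tol:
            break  # residual target is orthogonal to X (or already fully fitted)
        w = [v / w_norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loadings = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt  # y loading
        # Deflate X and y by the part explained by this factor.
        E = [[E[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        explained_x.append(tt * sum(v * v for v in loadings) / ss_x_total)
        explained_y.append(q * q * tt / ss_y_total)
    return explained_x, explained_y
```

With two identical (hence collinear) input columns, only one factor can be extracted even though 20 are requested: `pls1_nipals([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], [0.0, 0.0, 1.0])` returns a single factor that explains all of the X variance and 75% of the target variance.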

Once the PLS-DA classifier has been calculated, it can be saved to disk (button "Save Model") for later application to new data. In addition, the following information is available on separate tabs:

Summary: A list of the variance explained by each factor, both for the independent and for the target variables. Factors that could not be computed (e.g., because of collinear variables) are marked by a series of asterisks. The explained variances are also plotted in the diagram to the right of the factor list, which shows at a glance how many factors are needed to build the PLS model.

Classification Results: Shows the confusion matrices for the dependent (target) variables and the ROC (receiver operating characteristic) curve for a selected target variable. See "Classification results" below for details.

Cross Validation: Used to cross-validate the PLS model as a function of the number of factors. Both the size of the test set and the number of repetitions can be set by the user. If the test set size is set to 1 (full cross-validation, i.e. leave-one-out), the number of repetitions is ignored: every object is left out exactly once, so repeating the procedure would give identical results.

Loadings X: Displays the loadings of the independent variables as a spectral plot.

Reg.Coeffs.: Displays the regression coefficients of the model as a spectral plot.

Details: Provides a detailed listing of the PLS results.
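The test-set logic of the Cross Validation tab can be sketched as follows. The sketch assumes that, for test set sizes larger than 1, a new random test set is drawn independently in each repetition; DataLab's actual sampling scheme is not documented here, and the function `cv_splits` is hypothetical. In each split, the model would then be fitted to the training objects for 1, 2, ... factors and its prediction error on the test objects recorded (not shown).

```python
import random


def cv_splits(n_objects, test_size, repetitions, seed=None):
    """Hypothetical generator of (training, test) index lists for cross-validation.

    test_size == 1 gives full (leave-one-out) cross-validation: every object
    is left out exactly once, so the number of repetitions is ignored.
    """
    indices = list(range(n_objects))
    if test_size == 1:
        return [([j for j in indices if j != i], [i]) for i in indices]
    if not 1 < test_size < n_objects:
        raise ValueError("test_size must be between 1 and n_objects - 1")
    rng = random.Random(seed)
    splits = []
    for _ in range(repetitions):
        # Assumption: an independent random test set in every repetition.
        test = sorted(rng.sample(indices, test_size))
        test_set = set(test)
        train = [j for j in indices if j not in test_set]
        splits.append((train, test))
    return splits
```

For instance, `cv_splits(4, 1, repetitions=10)` yields exactly four splits, one per object, regardless of the requested ten repetitions.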

Hint: The calculation time of PLS is approximately proportional to the number of factors times the number of target variables times the square of the number of input variables. It is therefore advisable to keep the number of input variables below 1000 where possible. Example: using 10000 input variables instead of 1000 increases the calculation time by a factor of 100.
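The rule of thumb in the hint can be checked with a few lines of arithmetic. The function `relative_pls_cost` is hypothetical and returns only a relative cost without time units, since the constant of proportionality is not known.

```python
def relative_pls_cost(n_factors, n_targets, n_inputs):
    """Relative cost per the hint: factors x targets x (input variables)^2."""
    return n_factors * n_targets * n_inputs ** 2


# Same model size, 10x more input variables -> 100x the calculation time.
ratio = relative_pls_cost(20, 1, 10_000) / relative_pls_cost(20, 1, 1_000)
print(ratio)  # 100.0
```

Doubling the number of factors or target variables, by contrast, only doubles the cost, which is why the number of input variables is the main lever.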

Classification results

The classification results are presented in the form of confusion matrices, which show the false positives and false negatives in orange, the true positives in green, and the true negatives in gray. Each cell of a confusion matrix contains the number of objects falling into the respective category. The optimum decision threshold is determined from the ROC curve at the lower right. To switch between the ROC curves of the target variables, either select one of the target variables in the list of variables at the top right, or double-click the corresponding confusion matrix.
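The relationship between decision threshold, confusion matrix and ROC curve can be sketched in pure Python. The criterion DataLab uses to select the optimum threshold is not stated; this sketch assumes Youden's index (the threshold maximizing TPR minus FPR), which is one common choice. It further assumes that each object's predicted value for the target variable serves as its score, and that the target is coded 1 (positive) and 0 (negative), with both classes present. All function names are hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when objects with score >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


def roc_curve(scores, labels):
    """One (threshold, FPR, TPR) point per distinct score, highest threshold first."""
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        c = confusion_counts(scores, labels, thr)
        points.append((thr, c["FP"] / n_neg, c["TP"] / n_pos))
    return points


def youden_threshold(scores, labels):
    """Assumed criterion: the ROC point maximizing TPR - FPR (Youden's index)."""
    return max(roc_curve(scores, labels), key=lambda pt: pt[2] - pt[1])[0]
```

For scores `[0.1, 0.3, 0.6, 0.9]` with labels `[0, 0, 1, 1]`, `youden_threshold` returns `0.6`, the threshold that separates the two classes perfectly.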