DataLab is a compact statistics package aimed at exploratory data analysis. Please visit the DataLab Web site for more information.


Principal Component Regression

Command: Math -> Multiple Regression -> Principal Component Regression...

The command Math/Multiple Regression/Principal Component Regression... regresses a variable against the principal component scores of selected independent variables. After clicking Math/Multiple Regression/Principal Component Regression... the user has to select the independent variables, the target variable, and the type of scaling to be applied to the variables before the principal components are calculated.

Next, click the "Calculate" button to perform the principal component analysis of the independent variables and to regress the dependent variable on the resulting principal component scores. By default, all components with an eigenvalue greater than one are used for the regression. You can change this either by selecting the number of components from the drop-down box "No. of PCs" or by clicking the scree plot on the "Summary" tab of the PCR results (the selected number of components is indicated by a vertical red line; all components to the left of this line are used in the subsequent regression step).
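The two steps described above (PCA of the independent variables, then regression on the scores) can be sketched in Python with NumPy. This is a minimal illustration, not DataLab's actual implementation: the function name `pcr_fit`, its return keys, the use of standardized variables, and the fallback to one component when no eigenvalue exceeds one are all assumptions made for this example.

```python
import numpy as np

def pcr_fit(X, y, n_components=None):
    """Illustrative principal component regression on standardized variables."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Standardize the independent variables (PCA on the correlation matrix).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Eigen-decomposition of the correlation matrix, largest eigenvalue first.
    corr = Z.T @ Z / (len(Z) - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Default rule from the text: keep components with eigenvalue > 1.
    # Assumption: fall back to a single component if none qualifies.
    if n_components is None:
        n_components = max(1, int(np.sum(eigvals > 1.0)))

    # Scores of the retained components.
    scores = Z @ eigvecs[:, :n_components]

    # Least-squares regression of y on the scores, with an intercept.
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    return {
        "eigenvalues": eigvals,
        "n_components": n_components,
        "coefficients": coef,
        "estimated": design @ coef,
    }
```

Calling `pcr_fit(X, y)` applies the eigenvalue-greater-than-one default, while `pcr_fit(X, y, n_components=k)` corresponds to choosing the number of components by hand. If all components are used and the variables are linearly independent, the estimated values coincide with those of ordinary multiple linear regression.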

The principal component analysis of the independent variables can be based on differently scaled variables:
no scaling: calculates a new PCA using the data as is (the PCA is based on the scatter matrix)
mean centering: calculates a new PCA using mean-centered variables (the PCA is based on the covariance matrix)
standardize: calculates a new PCA using standardized variables (the PCA is based on the correlation matrix)
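To make the difference between the three scaling options concrete, the following sketch builds the matrix whose eigenvectors define the principal components in each case. The function name `pca_matrix` and the option strings are hypothetical, and the scatter matrix is assumed here to be the unnormalized product of the raw data matrix with itself; DataLab may apply a different normalization, which would change the eigenvalues by a constant factor but not the component directions.

```python
import numpy as np

def pca_matrix(X, scaling="standardize"):
    """Return the matrix that is eigen-decomposed for a given scaling option."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    if scaling == "no scaling":
        # Data as is: scatter matrix (assumed here to be X'X, uncentered).
        return X.T @ X

    centered = X - X.mean(axis=0)
    if scaling == "mean centering":
        # Mean-centered variables: covariance matrix.
        return centered.T @ centered / (n - 1)

    if scaling == "standardize":
        # Standardized variables: correlation matrix.
        Z = centered / X.std(axis=0, ddof=1)
        return Z.T @ Z / (n - 1)

    raise ValueError(f"unknown scaling option: {scaling!r}")
```

With variables measured on very different scales, the covariance-based PCA is dominated by the variables with the largest variance, whereas the correlation-based PCA gives every variable equal weight.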

 

The results of the principal component regression are provided on five different tabs:
Summary: The "Summary" tab shows the sorted list of principal components with their percentage of variance, and the scree plot. The scree plot shows the logarithm of the eigenvalues plotted against the principal component number. Eigenvalues greater than 1.0 (for standardized variables), or eigenvalues with a cumulative sum below 98% (for non-standardized variables), are indicated by red crosses. The scree plot can be used to specify the number of principal components used for the final regression step. Clicking the scree plot changes the number of components used (indicated by a red line and a yellow "curtain" to the left of this line).
Actual vs. Estimated: This tab plots the estimated values of the target variable against its actual values. A 45-degree line is drawn to indicate the optimal fit (the closer the data points are to this line, the better the regression fit). When the mouse cursor is moved over the data points, the nearest point is enclosed by a square; the coordinates of this point and the corresponding row name are displayed to the right of the chart.
Distribution of Residuals: This tab shows the distribution of the residuals as a histogram. The ideal normal distribution is indicated by a dashed line across the histogram.
Residuals: This tab shows the residuals plotted against the object number. Again, when the mouse cursor is moved over the data points, the nearest point is enclosed by a square; the coordinates of this point and the corresponding row name are displayed to the right of the chart.
Details: The "Details" tab shows the mathematical details of the principal component regression.
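The quantities behind the "Summary" and residual tabs can be reproduced roughly as follows. This is a sketch under stated assumptions, not DataLab's code: the function names `scree_markers` and `residual_summary` are hypothetical, the logarithm is assumed to be base 10, the 98% rule is read as "cumulative share of the eigenvalue sum below 0.98", and residuals are taken as actual minus estimated values.

```python
import numpy as np

def scree_markers(eigenvalues, standardized):
    """Return log eigenvalues and a mask of the components that would be marked."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    log_ev = np.log10(ev)  # assumption: base-10 logarithm, eigenvalues > 0

    if standardized:
        # Standardized variables: mark eigenvalues greater than 1.0.
        marked = ev > 1.0
    else:
        # Otherwise: mark components whose cumulative share stays below 98%.
        marked = np.cumsum(ev) / ev.sum() < 0.98
    return log_ev, marked

def residual_summary(actual, estimated, bins=10):
    """Return residuals, histogram bin centers, densities, and a normal curve."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(estimated, dtype=float)

    # Histogram of the residuals, scaled to a probability density.
    density, edges = np.histogram(residuals, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Normal density with the same mean and standard deviation, for comparison.
    mu, sigma = residuals.mean(), residuals.std(ddof=1)
    normal = np.exp(-0.5 * ((centers - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return residuals, centers, density, normal
```

Plotting `density` as bars and `normal` as a dashed line over `centers` gives a picture comparable to the "Distribution of Residuals" tab, and plotting `residuals` against the row index corresponds to the "Residuals" tab.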