DataLab is a compact statistics package aimed at exploratory data analysis. Please visit the DataLab Web site for more information.


Variable Selection

Command: Math -> Multiple Regression -> Multiple Linear Regression -> Variable Selection

The command Math/Multiple Regression/Multiple Linear Regression/Variable Selection automatically selects a set of "best" variables. The following five selection methods are available:

  • Forward selection
  • Backward elimination
  • Stepwise regression
  • Testing all possible combinations of variables
  • Simulated annealing
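The two greedy methods in the list above can be sketched as follows. This is a minimal illustration, not DataLab's implementation: it assumes a hypothetical `score(subset)` function that returns a criterion to be minimised (such as AIC or BIC; for r2 or F you would negate the value). Forward selection starts from the empty model and adds one variable at a time; backward elimination starts from the full model and drops one variable at a time. Stepwise regression combines the two, re-checking after each addition whether a removal now improves the model.

```python
from typing import Callable, FrozenSet, Iterable

# Hypothetical criterion: lower is better (e.g. AIC or BIC of the fitted sub-model).
Score = Callable[[FrozenSet[str]], float]


def forward_selection(candidates: Iterable[str], score: Score) -> FrozenSet[str]:
    """Greedily add the variable that improves the score most; stop when none does."""
    selected: FrozenSet[str] = frozenset()
    remaining = set(candidates)
    best = score(selected)
    while remaining:
        # Evaluate every one-variable extension of the current model.
        trial = {v: score(selected | {v}) for v in remaining}
        var, value = min(trial.items(), key=lambda kv: kv[1])
        if value >= best:
            break  # no addition improves the criterion
        selected, best = selected | {var}, value
        remaining.remove(var)
    return selected


def backward_elimination(candidates: Iterable[str], score: Score) -> FrozenSet[str]:
    """Start with all variables and greedily drop the one whose removal helps most."""
    selected = frozenset(candidates)
    best = score(selected)
    while selected:
        trial = {v: score(selected - {v}) for v in selected}
        var, value = min(trial.items(), key=lambda kv: kv[1])
        if value >= best:
            break  # every removal makes the criterion worse
        selected, best = selected - {var}, value
    return selected


# Toy criterion for illustration only: x1 and x3 are the "useful" variables.
useful = {"x1", "x3"}


def toy_score(s: FrozenSet[str]) -> float:
    return 10 * len(useful - s) + len(s - useful)


print(sorted(forward_selection(["x1", "x2", "x3", "x4"], toy_score)))     # ['x1', 'x3']
print(sorted(backward_elimination(["x1", "x2", "x3", "x4"], toy_score)))  # ['x1', 'x3']
```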
You can control which variables are considered during the selection process by ticking the appropriate check boxes in the variable list. Variables that must be included in the model have to be marked in the column "Incl." (include); variables that must not be part of the model have to be marked in the column "Excl." (exclude). The target variable has to be ticked in the column "Target" and is automatically excluded from the list of independent variables.
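The effect of the three check-box columns can be sketched as a split of the variable list into a target, a set of forced variables, and a pool of free variables that the search may add or drop. The `VariableRow` class and `split_variables` function are hypothetical names for illustration; the rejection of a row ticked as both "Incl." and "Excl." is an assumption, not documented DataLab behaviour.

```python
from dataclasses import dataclass


@dataclass
class VariableRow:
    """Hypothetical model of one row of the variable list and its check boxes."""
    name: str
    incl: bool = False    # "Incl." column: always part of the model
    excl: bool = False    # "Excl." column: never part of the model
    target: bool = False  # "Target" column: the dependent variable


def split_variables(rows):
    """Return (target, forced, free). Forced variables enter every candidate model;
    free variables are the only ones the selection method may add or drop."""
    targets = [r.name for r in rows if r.target]
    if len(targets) != 1:
        raise ValueError("exactly one target variable must be ticked")
    predictors = [r for r in rows if not r.target]  # target never acts as a predictor
    if any(r.incl and r.excl for r in predictors):
        raise ValueError("a variable cannot be both included and excluded")  # assumption
    forced = {r.name for r in predictors if r.incl}
    free = [r.name for r in predictors if not (r.incl or r.excl)]
    return targets[0], forced, free


rows = [
    VariableRow("y", target=True),
    VariableRow("x1", incl=True),
    VariableRow("x2"),
    VariableRow("x3", excl=True),
    VariableRow("x4"),
]
print(split_variables(rows))  # ('y', {'x1'}, ['x2', 'x4'])
```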

Hint: Please keep in mind that both excluding and including variables can considerably reduce the time required for the selection, since the number of possible combinations shrinks drastically. Thus, if you know that particular variables should always be part of the model, or should on no account be part of the model, it is recommended to tick the corresponding check boxes.
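The arithmetic behind this hint: each unmarked variable is either in or out of a model, so testing all combinations of m free variables means fitting 2^m models. With 20 unmarked variables that is 1,048,576 models; forcing 5 of them in and excluding 5 others leaves 10 free variables and only 1,024 models. The generator below is a minimal sketch of such an exhaustive enumeration; `all_subsets` is a hypothetical name.

```python
from itertools import combinations


def all_subsets(forced, free):
    """Yield every candidate model: all forced variables plus any subset of the free ones."""
    free = sorted(free)
    for k in range(len(free) + 1):
        for combo in combinations(free, k):
            yield frozenset(forced) | frozenset(combo)


# Each free variable is either in or out, so m free variables give 2**m models.
print(2 ** 20)  # 1048576: 20 free variables
print(2 ** 10)  # 1024: 5 forced and 5 excluded leave 10 free variables
print(len(list(all_subsets({"x1"}, ["x2", "x3", "x4"]))))  # 8
```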

The table on the right ("Selected Models") lists the best sub-models as they emerge during the selection process. Each model is described by the following parameters:

  • RMS: "Residual Mean Square" (standard deviation of the residuals)
  • min|t|: smallest absolute t-statistic of the coefficients of the model
  • AIC: "Akaike Information Criterion"
  • BIC: "Bayesian Information Criterion"
  • F: F statistic obtained from the ANOVA of the model
  • r2: goodness of fit of the model (coefficient of determination)
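The parameters listed above can be computed from an ordinary least-squares fit. The following pure-Python sketch is an illustration under stated assumptions, not DataLab's exact formulas: the model includes an intercept; RMS is taken as the square root of SSE/(n-p), matching the "standard deviation of the residuals" description; min|t| is taken over the slope coefficients only (intercept excluded); AIC and BIC use the common Gaussian forms n·ln(SSE/n) + 2p and n·ln(SSE/n) + p·ln(n), which omit an additive constant. The function names are hypothetical.

```python
import math


def _invert(a):
    """Gauss-Jordan inversion with partial pivoting (fine for small X'X matrices)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular")
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def model_stats(x_cols, y):
    """Fit y = b0 + b1*x1 + ... by OLS; x_cols is a list of at least one predictor column."""
    n = len(y)
    X = [[1.0] + [col[i] for col in x_cols] for i in range(n)]
    p = len(X[0])  # number of coefficients, intercept included
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = _invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(p)) for i in range(n)]
    sse = sum(r * r for r in resid)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    mse = sse / (n - p)  # residual mean square
    t = [beta[a] / math.sqrt(mse * inv[a][a]) for a in range(1, p)]  # slopes only
    return {
        "RMS": math.sqrt(mse),
        "min|t|": min(abs(v) for v in t),
        "AIC": n * math.log(sse / n) + 2 * p,
        "BIC": n * math.log(sse / n) + p * math.log(n),
        "F": ((sst - sse) / (p - 1)) / mse,
        "r2": 1 - sse / sst,
    }


stats = model_stats([[1, 2, 3, 4, 5, 6]], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
print(stats)
```

With a single predictor, min|t| squared equals F, which is a handy sanity check.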

If you choose simulated annealing as the selection method, you have to specify both the selection criterion and the cooling rate. Please note that simulated annealing is a random process that depends heavily on the starting conditions. We therefore recommend repeating the process several times and using the best variable set found.
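A minimal sketch of simulated annealing over variable subsets, assuming a criterion to be minimised and a geometric cooling schedule (T is multiplied by the cooling rate after each round); DataLab's actual schedule, move set, and parameter names are not documented here, so `anneal`, `t0`, `cooling`, `steps_per_temp`, and `t_min` are hypothetical. Each move flips one free variable in or out; worse moves are accepted with probability exp(-delta/T), which lets the search escape local optima while T is high. The random start is why repeated runs can give different results.

```python
import math
import random


def anneal(forced, free, score, t0=1.0, cooling=0.95, steps_per_temp=20,
           t_min=1e-3, seed=None):
    """Simulated annealing over subsets of `free`; `forced` is always kept; score is minimised."""
    rng = random.Random(seed)
    free = sorted(free)
    forced = frozenset(forced)
    current = forced | {v for v in free if rng.random() < 0.5}  # random starting model
    cur_val = score(current)
    best, best_val = current, cur_val
    t = t0
    while t > t_min and free:
        for _ in range(steps_per_temp):
            v = rng.choice(free)  # flip one free variable in or out
            cand = current - {v} if v in current else current | {v}
            cand_val = score(cand)
            delta = cand_val - cur_val
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, cur_val = cand, cand_val
                if cur_val < best_val:
                    best, best_val = current, cur_val
        t *= cooling  # geometric cooling: a rate closer to 1 cools more slowly
    return best, best_val


# Toy criterion for illustration only: x1 and x3 are the "useful" variables.
useful = {"x1", "x3"}


def toy_score(s):
    return 10 * len(useful - s) + len(s - useful)


# Repeat with different seeds and keep the best result, as the text recommends.
runs = [anneal(set(), ["x1", "x2", "x3", "x4"], toy_score, seed=s) for s in range(5)]
best_subset, best_value = min(runs, key=lambda r: r[1])
print(sorted(best_subset), best_value)
```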

After the variable selection has been completed, DataLab highlights the best model in inverted colors. You can select a different model at any time by clicking it in the list of models. The selected model may be copied into the MLR window by clicking the button .
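A list of best sub-models that grows as the search proceeds, with the best one on top, can be maintained with a bounded heap. This is a hypothetical sketch (the `BestModels` class and the capacity `k` are assumptions), ranking models by a criterion where lower is better.

```python
import heapq


class BestModels:
    """Keep the k best distinct sub-models seen so far (lower criterion is better)."""

    def __init__(self, k=10):
        self.k = k
        self._heap = []  # entries (-value, names): the worst kept model sits at the top

    def offer(self, subset, value):
        names = tuple(sorted(subset))
        if any(kept == names for _, kept in self._heap):
            return  # already listed
        item = (-value, names)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, item)
        elif item > self._heap[0]:  # better than the worst kept model
            heapq.heapreplace(self._heap, item)

    def ranked(self):
        """Best model first; this is the entry a viewer would highlight."""
        return [(list(names), -neg) for neg, names in sorted(self._heap, reverse=True)]


bm = BestModels(k=2)
bm.offer({"a"}, 3.0)
bm.offer({"b"}, 1.0)
bm.offer({"a", "b"}, 2.0)
bm.offer({"b"}, 1.0)  # duplicate, ignored
print(bm.ranked())  # [(['b'], 1.0), (['a', 'b'], 2.0)]
```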

Details on the results of the variable selection can be obtained by clicking the button .