DataLab is a compact statistics package aiming at exploratory data analysis. Please visit the DataLab Web site for more information....


Random Forest Regression

Command: Math -> Random Forest -> Create Model

Random forests can be used for both classification and regression. The DataLab random forest tool automatically recognizes dichotomous (binary) target variables that take only the values 0 and 1, and switches to classification mode for such a target.
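The mode selection described above can be illustrated with a short Python sketch. It is not DataLab's code: the function names are hypothetical, and it assumes that a target counts as binary only if both values 0 and 1 actually occur.

```python
def is_binary_target(values):
    """Return True if the target holds only the values 0 and 1.

    Assumption: both classes must be present; a constant column
    is not treated as dichotomous.
    """
    return set(values) == {0, 1}


def choose_mode(values):
    """Pick the model type from the target values (hypothetical helper)."""
    return "classification" if is_binary_target(values) else "regression"


print(choose_mode([0, 1, 1, 0]))      # binary target
print(choose_mode([0.2, 1.5, 3.7]))   # continuous target
```

Because Python treats 0.0 and 0 as equal, a target stored as floating point numbers (0.0 and 1.0) is recognized as binary as well.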

The random forest regression tool provides several tabs showing different kinds of information:
  • Actual vs. Estimated: This tab shows the estimated target data plotted against the actual ones.
  • Residuals: The corresponding residuals.
  • Variable Importance: The variable importance plot indicates which of the descriptors contribute most to the model. You can move a horizontal threshold in the bar graph to select the most important ones. Clicking the "copy variables" button replaces the current set of descriptors with the most important ones.
  • R Scan: The R Scan tab lets you find the best resampling parameter R. Click "Start R Scan" to scan the entire allowed range of R values. Please note that the R Scan results depend on the number of trees, so you have to repeat the scan whenever you change the number of trees.
  • Tree Scan: The Tree Scan tab lets you find the optimum number of trees. Click "Start Tree Scan" to scan the number of trees. Please note that the Tree Scan results depend on the R parameter, so you have to repeat the scan whenever you change the R parameter.
  • Details: Here you find the results listed in a text document which can be easily copied for reporting purposes.
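
What the Residuals and Variable Importance tabs display can be sketched in a few lines of Python. This is only an illustration under assumptions: the sign convention (actual minus estimated), the rule that a bar reaching the threshold is kept, and the descriptor names and importance values are all made up for the example and are not taken from DataLab.

```python
def residuals(actual, estimated):
    """Residual per sample, assuming the convention actual - estimated."""
    return [a - e for a, e in zip(actual, estimated)]


def select_important(importances, threshold):
    """Keep the descriptors whose importance reaches the threshold."""
    return [name for name, score in importances.items() if score >= threshold]


# Hypothetical importance values for three descriptors.
importances = {"pH": 0.42, "temp": 0.05, "conc": 0.31}

print(residuals([3.0, 5.0], [2.5, 5.5]))
print(select_important(importances, threshold=0.3))
```

Raising the threshold shrinks the selected set, which corresponds to moving the horizontal line upward in the bar graph.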

How To: Training a random forest regression model is straightforward. Please follow these steps:
  1. Select the descriptors and the target variable. The target variable has to be continuous.
  2. Set the parameters R and Number of Trees. In most cases, the default values will work well.
  3. Click "Calculate" to start the calculation of the random forest model.
  4. If you are satisfied with the model, save it to disk so that you can apply it to unknown data later.
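
For readers who want to see the same four steps outside the program, the following Python sketch mirrors them. It is not DataLab's implementation: it uses a toy ensemble of one-split regression trees, assumes that R is the fraction of samples drawn (with replacement) for each tree, and all names (`fit_stump`, `train_forest`, `predict`, `forest_model.pkl`) as well as the data are hypothetical.

```python
import os
import pickle
import random
import statistics
import tempfile


def fit_stump(rows, targets, rng):
    """Fit a one-split regression tree on one randomly chosen descriptor."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        if not left or not right:
            continue
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # The descriptor is constant in this sample: always predict the mean.
        mean = statistics.fmean(targets)
        return (feature, float("inf"), mean, mean)
    return (feature, best[1], best[2], best[3])


def train_forest(rows, targets, r=0.5, n_trees=50, seed=0):
    """Train n_trees stumps, each on a resample of size r * len(rows)."""
    rng = random.Random(seed)
    sample_size = max(2, round(r * len(rows)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(sample_size)]
        forest.append(
            fit_stump([rows[i] for i in idx], [targets[i] for i in idx], rng))
    return forest


def predict(forest, row):
    """Average the estimates of all trees for one sample."""
    return statistics.fmean(
        [left if row[feature] <= threshold else right
         for feature, threshold, left, right in forest])


# Step 1: two descriptors and a continuous target (made-up data).
rows = [[float(x), 10.0 - x] for x in range(10)]
targets = [2.0 * x for x in range(10)]

# Steps 2 and 3: set R and the number of trees, then calculate the model.
forest = train_forest(rows, targets, r=0.6, n_trees=100)

# Step 4: save the model to disk and load it again for later use.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "forest_model.pkl")
    with open(path, "wb") as handle:
        pickle.dump(forest, handle)
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

estimate = predict(restored, [8.0, 2.0])
print(estimate)
```

The fixed seed makes repeated runs reproducible, which is convenient when comparing different settings of R or the number of trees.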