
Description 
Objects 
Variables 
Statistical Methods 
Download 
Bananas 
Some characteristic variables of bananas. The bananas were obtained from various supermarkets; they were weighed (both the whole bananas and their skins) and their geometric dimensions were determined (two measures of length, and the diameter at the broadest point). 
40 
6 
Linear Regression: Find a model that estimates the weight of a banana from its length. 
DataLab format:
bananas.idt
Text format:
bananas.zip
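As a starting point for the regression exercise, the following minimal Python sketch fits a straight line by ordinary least squares. The values are made up for illustration and are not taken from bananas.zip; the names fit_line, length_cm and weight_g are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return a, b


# Made-up values, exactly linear by construction (NOT the real data set).
length_cm = [15.0, 17.0, 18.5, 20.0, 22.0]
weight_g = [110.0, 130.0, 145.0, 160.0, 180.0]

a, b = fit_line(length_cm, weight_g)
print(f"weight = {a:.1f} + {b:.1f} * length")
```

With the real data the points will scatter around the line, so the residuals should be inspected before the model is used for prediction.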

Banknotes 
Geometric distances of 100 genuine and 100 forged banknotes. The data are used courtesy of H. Riedwyl and are taken from the book B. Flury, H. Riedwyl, Angewandte multivariate Statistik, G. Fischer Verlag, Stuttgart (1983). 
200 
7 
Discriminant Analysis: Develop a classifier that discriminates between genuine and forged banknotes. 
DataLab format:
fluriedw.idt
Text format:
fluriedw.zip

Countries of the World 
Some demographic and economic data of the countries of the world around 1989. The data have been obtained from the CIA Factbook (1989). 
122 
10 
Multiple Regression: Which factors does life expectancy depend on? Which have a positive influence, and which a negative one? Cluster Analysis: Which countries are most similar to Austria? 
DataLab format:
worldpop.idt
Text format:
worldpop.zip

Linguistic Analysis 
Frequencies of two-character combinations obtained from two nearly identical statistics textbooks, one written in German, the other in English: see Grundlagen der Statistik and Fundamentals of Statistics. The set of variables has been reduced to the 180 most frequent character combinations. 
1054 
180 
Principal Component Analysis: Check whether PCA indicates any differences between the two books.
PLS Discriminant Analysis: Develop a binary classifier that discriminates between English and German texts; find out which of the two-character combinations are most important for distinguishing the two languages. 
DataLab format:
fundstat_lang_180.idt
Text format:
fundstat_lang_180.zip

Lynx Pelts 
Number of traded lynx pelts in Canada between 1821 and 1910. The data have been obtained from Elton, C. and M. Nicholson: "The ten-year cycle in numbers of the lynx in Canada", Journal of Animal Ecology 11 (1942): 215-244. 
90 
2 
Autocorrelation and Fourier Transform: What is the approximate population cycle length? 
DataLab format:
lynx_pelts.idt
Text format:
lynx_pelts.zip
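The idea behind the autocorrelation exercise can be sketched as follows: for a periodic series, the autocorrelation peaks at lags equal to the cycle length. The sketch below uses a synthetic sine series with an assumed period of 10 samples in place of the real pelt counts; autocorr, dominant_lag and all numeric values are hypothetical.

```python
import math


def autocorr(series, lag):
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(series)
    m = sum(series) / n
    var = sum((v - m) ** 2 for v in series)
    cov = sum((series[i] - m) * (series[i + lag] - m) for i in range(n - lag))
    return cov / var


def dominant_lag(series, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the highest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda k: autocorr(series, k))


# Synthetic stand-in for the 90 yearly values (NOT the real data set):
# a sine wave with an assumed period of 10 samples.
series = [1000 + 800 * math.sin(2 * math.pi * t / 10) for t in range(90)]

cycle = dominant_lag(series, 2, 30)
print("estimated cycle length:", cycle)
```

The biased estimator divides by the full-series variance, so peaks at multiples of the period shrink with increasing lag and the first peak wins. Real data are noisier, so the peak will be broader and should be cross-checked against the Fourier spectrum.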

Mineral Waters 
The data set contains the results of chemical analyses of 32 mineral waters and the geographical coordinates of their sources. The analysis data have been taken from the labels of the water bottles.

32 
10 
Multiple Linear Regression: Which constituents of the mineral waters play a role in forming the solid residues? Cluster Analysis: Which mineral waters are most similar? 
DataLab format:
minwater.idt
Text format:
minwater.zip

Residuals 
Artificial dataset for simple regression exhibiting three different structures of the residuals.

100 
4 
Linear Regression: What is the effect of non-symmetric residuals on the results of a linear regression? See the DataLab blog for more details (in German). 
DataLab format:
reg_residuals.idt
Text format:
reg_residuals.zip

Boiling Points 
This data set contains the boiling points and some physicochemical properties of 185 chemical substances. 
185 
13 
Stepwise Regression: Try to find a model which predicts the boiling points using MLR. ANOVA: Does the boiling point depend on the number of branches in the molecule? PLS: Create an optimal PLS model and compare it to the MLR model obtained by stepwise regression. 
DataLab format:
boilpts.idt
Text format:
boilpts.zip

Temperature Sensors 
Resistance thermometers use the electric resistance of a thin platinum wire to determine the temperature. The data set consists of 15 calibration measurements, two of which are slightly incorrect. Because the incorrect values deviate only slightly, the errors are visible only in the residual plot. 
15 
2 
Parabolic Regression: Compare the calibration curves obtained by parabolic regression with and without the erroneous measurements. 
DataLab format:
pt100sensor.idt
Text format:
pt100sensor.zip
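The comparison asked for in this exercise might look like the following Python sketch. It does not read pt100sensor.zip; instead it generates 15 synthetic calibration points from the commonly quoted Pt100 coefficients for temperatures above 0 °C (an assumption), perturbs two of them by arbitrarily chosen small offsets, and fits a parabola with and without those points. The temperature range, the offsets and the helper names fit_parabola and residuals are hypothetical.

```python
def fit_parabola(x, y):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations."""
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented 3x4 matrix of the normal equations.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - tail) / m[i][i]
    return coef


def residuals(coef, x, y):
    """Measured minus fitted value for each point."""
    return [yi - (coef[0] + coef[1] * xi + coef[2] * xi * xi)
            for xi, yi in zip(x, y)]


# Assumed Pt100 model R(T) = R0 * (1 + A*T + B*T^2); synthetic data only.
R0, A, B = 100.0, 3.9083e-3, -5.775e-7
temps = [10.0 * i for i in range(15)]                 # 0 ... 140 degrees C
ideal = [R0 * (1 + A * T + B * T * T) for T in temps]
bad = {5: 0.15, 11: -0.12}                            # hypothetical errors in ohm
measured = [r + bad.get(i, 0.0) for i, r in enumerate(ideal)]

full = fit_parabola(temps, measured)                  # all 15 points
keep = [i for i in range(15) if i not in bad]
clean = fit_parabola([temps[i] for i in keep], [measured[i] for i in keep])

res_full = residuals(full, temps, measured)
worst = max(range(15), key=lambda i: abs(res_full[i]))
print("largest residual of the full fit at index", worst)
```

In the full fit the two perturbed points show up as the largest residuals, while the calibration curves themselves differ only slightly, which is why the residual plot is the place to look.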
