## Residual standard deviation, z-score and percentile

The scatter that remains after allowing for the independent variables is the residual scatter: for each observation, the residual (deviation) is the difference between the measured and the predicted value. Some observations are larger than predicted, some are smaller. The average of all deviations (observed – predicted value) is zero; where a deviation is zero, the predicted and observed values are identical. The standard deviation of the residuals is the residual standard deviation (RSD).
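As a minimal sketch of these definitions, the following fits a straight line by least squares and computes the residuals and their standard deviation. The data, the variable names (`heights`, `values`) and the choice of a single predictor are illustrative assumptions, not taken from the text; dividing by n minus the number of fitted parameters is the usual convention for a regression, also assumed here.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residual_sd(x, y, n_params=2):
    """Return the residuals (observed - predicted) and their standard deviation.

    n_params is the number of fitted coefficients (2 for a straight line);
    the sum of squares is divided by n - n_params.
    """
    slope, intercept = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sum_sq = sum(r * r for r in residuals)
    return residuals, math.sqrt(sum_sq / (len(x) - n_params))

# Made-up example data: one predictor and one measured quantity.
heights = [150, 160, 165, 170, 175, 180, 190]
values = [2.9, 3.4, 3.5, 4.1, 4.0, 4.6, 5.0]

residuals, rsd = residual_sd(heights, values)
mean_residual = sum(residuals) / len(residuals)  # zero up to rounding error
print(f"mean residual = {mean_residual:.2e}, RSD = {rsd:.3f}")
```

Because the fitted line includes an intercept, the residuals average to zero apart from floating-point rounding, which matches the statement above.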

The yellow bell-shaped curve depicts the frequency with which the observations exceeded or fell below the value predicted from the regression equation; it is called a frequency distribution or probability density function. The x-axis does not show the deviations in their original units; they have instead been divided by the RSD and are thus dimensionless numbers. If deviations are normally distributed, as in the figure, 50% of all observations are below predicted and 50% above. The point that divides the population into two equal halves is called the median. In a normal distribution the mean equals the median, so the 50th percentile lies at 0 RSD.

It is convenient to express the difference between the observed and predicted value as the standard deviation score, *i.e.* the number of RSDs by which the observed value differs from the predicted value, as in the illustration. It is also called the z-score. In 95% of individuals (observed – predicted) < 1.64·RSD; this therefore marks the 95th percentile. In only 5% of cases (observed – predicted) < –1.64·RSD; this then marks the 5th percentile. The area between –1.64·RSD and +1.64·RSD in a normal distribution therefore comprises 90% of the population, and hence delineates the 90% reference interval.
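The conversion from a measurement to a z-score and a percentile can be sketched as follows, assuming normally distributed residuals. The function names and the numbers (observed 3.2, predicted 4.0, RSD 0.5) are hypothetical; `NormalDist` is from the Python standard library.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_score(observed, predicted, rsd):
    """Number of RSDs by which the observed value differs from predicted."""
    return (observed - predicted) / rsd

def percentile(z):
    """Percentage of a normal population with a z-score below z."""
    return 100 * STANDARD_NORMAL.cdf(z)

# Hypothetical measurement: observed 3.2, predicted 4.0, RSD 0.5.
z = z_score(observed=3.2, predicted=4.0, rsd=0.5)
p = percentile(z)
print(f"z-score = {z:.2f}, percentile = {p:.1f}")

# z-scores that mark the 5th and 95th percentiles (about -1.64 and +1.64).
z05 = STANDARD_NORMAL.inv_cdf(0.05)
z95 = STANDARD_NORMAL.inv_cdf(0.95)
print(f"5th percentile at z = {z05:.3f}, 95th percentile at z = {z95:.3f}")
```

A z-score of 0 corresponds to the 50th percentile, and the two cut-offs computed at the end enclose 90% of a normal population.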

The 95% reference interval in a normal distribution lies between –1.96·RSD and +1.96·RSD, *i.e.* 2.5% of all observations are smaller than (predicted – 1.96·RSD), and 2.5% are larger than (predicted + 1.96·RSD); –1.96·RSD marks the 2.5th percentile.
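The limits at ±1.96·RSD (and, for 90% coverage, ±1.64·RSD) can be derived from the normal distribution rather than memorised. The sketch below assumes normally distributed residuals; the function name `reference_limits` and the example values (predicted 4.0, RSD 0.5) are hypothetical.

```python
from statistics import NormalDist

def reference_limits(predicted, rsd, coverage=0.95):
    """Lower and upper limits enclosing `coverage` of a normal population.

    The excluded fraction is split equally over both tails, e.g. 2.5% in
    each tail for 95% coverage.
    """
    tail = (1 - coverage) / 2
    z = NormalDist().inv_cdf(1 - tail)  # about 1.96 for 95%, 1.64 for 90%
    return predicted - z * rsd, predicted + z * rsd

# Hypothetical predicted value 4.0 with an RSD of 0.5.
lower, upper = reference_limits(4.0, 0.5)
lower90, upper90 = reference_limits(4.0, 0.5, coverage=0.90)
print(f"95% limits: {lower:.2f} to {upper:.2f}")
print(f"90% limits: {lower90:.2f} to {upper90:.2f}")
```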