# Modelling lung function

Until very recently, regression equations for lung function were based on simple additive linear regression techniques. By far the most popular models had the following form:

Y = a + b•height + c•age + error (adults)
log(Y) = a + b•log(height) + error (children)

Fig. 1 - Relationship between age and FEV1 in 28,690 healthy white females. About half of the scatter is due to differences in standing height.

Y is the lung function index being modelled, for example FEV1; the predicted value is the right-hand side of the equation without the error term. The “error”, also called the residual, is the difference between the measured and the predicted value. For children and adolescents the indices are usually log-transformed, and age is rarely taken into account. When using the above linear models it is commonly assumed that the scatter of the residuals is the same at any combination of age and height.
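As a minimal sketch of how the two model forms above are fitted, the following Python code uses ordinary least squares from NumPy. The function names `fit_adult_model` and `fit_child_model`, the synthetic data, and the coefficients used to generate it are assumptions made purely for illustration; they are not published reference equations.

```python
import numpy as np

def fit_adult_model(height, age, y):
    """Least-squares fit of Y = a + b*height + c*age; returns (a, b, c)."""
    height = np.asarray(height, dtype=float)
    age = np.asarray(age, dtype=float)
    X = np.column_stack([np.ones_like(height), height, age])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

def fit_child_model(height, y):
    """Least-squares fit of log(Y) = a + b*log(height); returns (a, b)."""
    height = np.asarray(height, dtype=float)
    X = np.column_stack([np.ones_like(height), np.log(height)])
    coef, *_ = np.linalg.lstsq(X, np.log(np.asarray(y, dtype=float)), rcond=None)
    return coef

# Synthetic data for illustration only (NOT real measurements).
rng = np.random.default_rng(0)
height = rng.uniform(1.50, 1.90, 500)        # standing height in metres
age = rng.uniform(25, 70, 500)               # age in years
a0, b0, c0 = -2.0, 4.0, -0.025               # hypothetical "true" coefficients
fev1 = a0 + b0 * height + c0 * age + rng.normal(0.0, 0.4, 500)

a, b, c = fit_adult_model(height, age, fev1)
predicted = a + b * height + c * age
residuals = fev1 - predicted                 # measured minus predicted
print(f"a={a:.2f}, b={b:.2f}, c={c:.3f}, residual SD={residuals.std(ddof=3):.2f} L")
```

Note that the single residual SD printed at the end summarises the scatter with one number, which is exactly the constant-scatter assumption described above.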

Figure 1 displays FEV1 as a function of age in a large number of healthy females aged 3-95 years. It illustrates a few points:

1. The relationship cannot be characterised by straight lines.
2. The scatter (“error”) is not constant.
3. The scatter is not proportional to the predicted value.

Fig. 2 - Difference between measured and predicted FEV1 in healthy white females when using the ECSC/ERS [1] prediction equations.
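One simple way to check whether the scatter is constant is to compute the standard deviation of the residuals within successive age bands. The sketch below assumes a hypothetical helper `scatter_by_bin` and uses synthetic residuals whose spread is made to vary with age by an arbitrary rule; neither comes from the data shown in the figures.

```python
import numpy as np

def scatter_by_bin(x, residuals, edges):
    """SD of the residuals within each half-open bin [edges[i], edges[i+1])."""
    x = np.asarray(x, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    idx = np.digitize(x, edges) - 1          # bin index for every observation
    sds = []
    for i in range(len(edges) - 1):
        r = residuals[idx == i]
        sds.append(r.std(ddof=1) if r.size > 1 else np.nan)
    return np.array(sds)

# Synthetic residuals for illustration only (NOT real measurements):
# the spread is made to depend on age by an arbitrary, hypothetical rule.
rng = np.random.default_rng(1)
age = rng.uniform(3, 95, 5000)
residuals = rng.normal(0.0, 0.2 + 0.004 * age)

edges = np.array([3, 20, 40, 60, 80, 95])
for lo, hi, sd in zip(edges[:-1], edges[1:], scatter_by_bin(age, residuals, edges)):
    print(f"age {lo:2d}-{hi:2d} years: residual SD = {sd:.2f} L")
```

If the scatter were constant, the SDs would be roughly equal across bands. Dividing each SD by the mean predicted value in the same band gives a coefficient of variation, which would be roughly constant if the scatter were proportional to the predicted value.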

With the widely used ECSC/ERS [1] prediction equations we can calculate the predicted FEV1 for the females in Figure 1. If the equations fitted the data perfectly, the mean difference between measured and predicted FEV1 would be 0. Figure 2 shows that there is a systematic difference instead: the measured FEV1 is on average 180 mL larger than predicted. The values predicted by ECSC/ERS are therefore systematically too low.
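The mean difference described here is straightforward to compute once measured and predicted values are available. The following sketch assumes a hypothetical helper `mean_bias` and a handful of made-up values in litres; it does not reproduce the ECSC/ERS equations or the data behind the figures.

```python
import numpy as np

def mean_bias(measured, predicted):
    """Return the mean of (measured - predicted) and its standard error."""
    diff = np.asarray(measured, dtype=float) - np.asarray(predicted, dtype=float)
    mean = diff.mean()
    sem = diff.std(ddof=1) / np.sqrt(diff.size)
    return mean, sem

# Made-up values in litres, for illustration only (NOT real measurements).
measured = np.array([3.10, 2.85, 3.40, 2.60, 3.05])
predicted = np.array([2.95, 2.70, 3.15, 2.45, 2.85])

bias, sem = mean_bias(measured, predicted)
print(f"mean difference: {bias * 1000:.0f} mL (standard error {sem * 1000:.0f} mL)")
```

A mean difference that is large relative to its standard error indicates that the prediction equation is systematically off for the population studied; a positive value means the equation predicts too low.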

This brief introduction leads to the following conclusions: