Abstract
The present study examined two contrasting multilevel model structures to describe the developmental (longitudinal) changes in strength and aerobic power in children: 1) an additive polynomial structure and 2) a multiplicative structure with allometric body size components. On the basis of the maximum log-likelihood criterion, the multiplicative “allometric” model was shown to be superior to the additive polynomial model when fitted to the data from two published longitudinal studies and to provide more plausible solutions within and beyond the range of observations. The multilevel regression analysis of study 1 confirmed that aerobic power develops approximately in proportion to body mass, m^{1/3}. The analyses from study 2 identified a significant increase in quadriceps and biceps strength, in proportion to body size, plus an additional contribution from age, centered at about peak height velocity (PHV). The positive “age” term for boys suggested that at PHV the boys were becoming stronger in the quadriceps and biceps in relation to their body size. In contrast, the girls’ age term was either negligible (quadriceps) or negative (biceps), indicating that at PHV the girls’ strength was developing in proportion to or, in the case of the biceps, was becoming weaker in relation to their body size.
 multilevel regression
 longitudinal growth
 multiplicative models
 allometric body size components
In previous cross-sectional studies (14, 17), the effect of lifestyle factors, e.g., physical activity, training, and diet, on developmental changes in performance variables, e.g., aerobic power and strength, has been confounded by the simultaneous changes due to growth and development. To identify and separate the relative contribution of these lifestyle factors from the changes due to growth and maturation, there is a need to collect such “growth” data longitudinally. An appropriate method to analyze such longitudinal data is some form of multilevel modeling process (9). When modeling growth data, Goldstein (7, 8) incorporates “age” as additive polynomial terms, where any systematic change in the residual error, e.g., heteroscedastic (multiplicative) errors, can also be modeled simultaneously within the multilevel analysis. However, recently, Nevill and Holder (14, 15) criticized the use of additive models to explain differences in variables such as aerobic power. They argue that because variables such as strength and aerobic power are known to be proportional to but nonlinear with body mass (i.e., m^{1/3}), an additive polynomial model is unlikely to satisfactorily explain developmental changes over time. The authors propose an alternative multiplicative (proportional) model with allometric body size components to describe these developmental changes that should successfully accommodate the nonlinear but proportional changes with body mass and naturally help overcome the heteroscedastic (multiplicative) errors observed with such variables.
Hence, in the present study we shall examine the two multilevel model structures (i.e., the additive polynomial structure and the alternative multiplicative “allometric” structure) using two longitudinal studies: study 1, that of Baxter-Jones et al. (2), where the additive polynomial model structure was originally applied, and study 2, that of Round et al. (19), where the alternative multiplicative model structure was chosen. With the assumption that the additive polynomial model adopted by Baxter-Jones et al. was optimal, a comparison will be made with the equivalent multiplicative model to examine the hypothesis that a multiplicative allometric model structure is preferable to an additive polynomial structure when fitted to both data sets from studies 1 and 2.
METHODS
Study 1
Subjects.
A detailed description of the study design and selection criteria of the athletes has been published previously (3). Briefly, a random sample of 453 coach-nominated athletes (231 boys and 222 girls) from four sports (gymnastics, soccer, swimming, and tennis) was studied for 3 yr consecutively. The study used a linked longitudinal design (11), following five age cohorts, to include prepubertal, pubertal, and postpubertal children (Table 1).
The age at which the youngest child entered the study differed according to the requirements of each sport. On entry into the study, the youngest athletes were 8 yr of age and the oldest were 16 yr of age. During the course of the study the composition of these clusters remained the same. Inasmuch as there were overlaps in ages between the clusters, it was possible to estimate a consecutive 11-yr development pattern over the much shorter period of 3 yr.
One of the disadvantages of longitudinal studies is the number of dropouts during the course of the studies. In total 187 subjects left the study. Of these, 63 were excluded because they retired from their sport, another 34 were excluded because they were not training intensively enough [as defined by thresholds of intensity related to hours trained per week (20)], and 90 (19.9% of the total sample) withdrew themselves. Those subjects who retired from their sport or who were not considered to be intensively training were not invited to return for reassessment because of financial constraints, a common problem with longitudinal studies (21).
Measurements.
Body height, weight, pubertal development, and maximal O_{2} uptake (V̇O_{2 max}) were measured annually for 3 yr consecutively. Subjects were grouped into three pubertal stages to ensure adequate cell sizes within each sport: prepubertal (PP), midpubertal (MP), and late pubertal (LP). Puberty was determined by visually assessing sex characteristics: stage of breast development in girls and genitalia development in boys. Five discrete stages have been clearly described (23); in this study they were grouped as stage 1 (PP), stages 2 and 3 (MP), and stages 4 and 5 (LP). V̇O_{2 max} was measured with the subject running on a motor-driven treadmill. Subjects ran at an individually predetermined rate, on a 3.4% grade, for 1 min, after which the speed was increased by 0.5 km/h every minute until exhaustion. Measurements of gas exchange were obtained by standard open-circuit techniques. Subjects breathed through a facemask, and ventilation was measured through a turbine volume transducer attached to a control unit with digital display. Expired air was analyzed. The system was calibrated before each session with standard gases of known O_{2} and CO_{2} concentration. Heart rate was continuously recorded during exercise. The highest O_{2} uptake was accepted as V̇O_{2 max} if a plateau occurred (an increase of <2 ml/kg with an increase in workload) or if one of the following criteria was met: heart rate >95% of the predicted maximum corrected for age (predicted maximum heart rate = 220 − age) or respiratory exchange ratio >1.1.
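The acceptance rule for the highest O_{2} uptake can be written as a small predicate. The following is a minimal sketch only: the function and argument names are illustrative assumptions, and the plateau is taken here to mean a rise of <2 ml/kg between successive workloads.

```python
def predicted_max_heart_rate(age_yr: float) -> float:
    """Age-predicted maximum heart rate (beats/min), taken as 220 - age."""
    return 220.0 - age_yr


def vo2max_accepted(vo2_rise_ml_per_kg: float, heart_rate: float,
                    age_yr: float, rer: float) -> bool:
    """Return True if the highest O2 uptake would be accepted as VO2max.

    Accepted when a plateau occurred, or when either secondary criterion
    (heart rate or respiratory exchange ratio, RER) was met.
    """
    plateau = vo2_rise_ml_per_kg < 2.0
    heart_rate_criterion = heart_rate > 0.95 * predicted_max_heart_rate(age_yr)
    rer_criterion = rer > 1.1
    return plateau or heart_rate_criterion or rer_criterion
```

For a 12-yr-old, the heart rate threshold in this sketch is 95% of 208 beats/min.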
Study 2
Subjects.
Details of the study design have been published previously (19). Briefly, 100 North London schoolchildren were studied: 50 boys and 50 girls. Three sets of measurements were made each year at ∼4-mo intervals. The duration of follow-up was 5 yr, with children being recruited in groups commencing at 8–12 yr of age and finishing at 13–17 yr of age at the end of the study.
Permission to approach the children was obtained from the Local Education Authority, the head teachers of the schools, and the Committee on the Ethics of Human Investigation at the Middlesex and University College Hospitals (now the University College London Hospitals). Children volunteered for the study after being fully informed about the tests involved and how long they would continue in “the team.” Parental consent in writing was then obtained for all the children who had volunteered. The parents were asked to supply the name of their general practitioner, who was informed in writing about the study and given the names of the children on their list who were participating.
As in all long-term studies, there was some loss of subjects during the 5-yr period. Sixty percent of the children completed the full 5 yr, with similar proportions of boys and girls. The main loss occurred when children left school at 15 and 16 yr of age, but most of these children had been in the study for at least 4 yr by this time. The other major loss occurred when the 11-yr-old children changed from junior to senior school, with some of the children continuing their education at schools outside the area. The duration of the follow-up was 5 yr, with 84% of the girls and 82% of the boys being studied for at least 4 yr.
Measurements.
Maximum voluntary contraction strengths of the knee extensors and forearm flexors (hereafter referred to as quadriceps and biceps muscles, respectively) were measured using a chair described by Parker et al. (17). The investigators were experienced in the use of the various procedures, and the main members of the team responsible for overseeing the measurements remained the same throughout the study.
Each child was allowed to become familiar with the apparatus and procedures and then performed three maximum voluntary contractions, the highest of which was recorded. As in a previous study (17), there was no consistent trend among the three strength measurements. On the rare occasions when there was appreciable variation among the three efforts, the contractions were repeated until three values were obtained that differed by ≤4% from the highest force.
Standing height was measured using a portable stadiometer (Holtain). Children removed their shoes and stood with their heels and back against the upright with the neck gently extended by upward manual pressure. Height was recorded to the nearest millimeter. Weight was measured with portable scales to the nearest 0.1 kg with the children in light indoor clothing without shoes.
The multilevel analysis reported was conducted on a subset of the children (33 boys and 25 girls) who had their growth spurt during the time of the study. For these children, the number of visits was 9.8 ± 1.9 and 9.6 ± 2.3 (SD) for the boys and girls, respectively. In these cases, the time of peak height velocity (PHV) was identified and used in the multilevel analysis to assess the contribution of height, weight, age, and sex to the differences in strength observed between boys and girls.
Statistical Methods
An appropriate method of analyzing longitudinal (repeated-measures) data is to adopt a multilevel modeling approach (9), using the program Multilevel Models Project MLn (18). Multilevel modeling is an extension of ordinary multiple regression, where the data have a hierarchical or clustered structure. A hierarchy consists of units or measurements grouped at different levels. One example is repeated-measures data, where individuals are measured on more than one occasion. Here, the subjects or individuals, assumed to be a random sample, represent the level 2 units, with the subjects’ repeated measurements recorded at each visit being the level 1 units. In contrast to traditional repeated-measures analyses, the number of visits is also assumed to be a random variable over time. The two levels of random variation take into account the fact that growth characteristics of individual children, such as their average growth rate, vary around a population mean and also that each child’s observed measurements vary around his or her own growth trajectory.
In the present work the multilevel regression analyses were performed using the MLn software to identify those factors (different sports and levels of maturity in study 1 and sex differences in study 2) associated with the development of aerobic power and strength, respectively, with adjustment for differences in body size (height and weight) and age. The two levels of hierarchical or nested observational units used in both studies were the number of visits at level 1 (within individuals) and the sample of children (between individuals) at level 2.
Additive Model Structure
Additive polynomial models were proposed by Goldstein (7, 8) to describe longitudinal changes in growth-related data over time. On the basis of these models, Baxter-Jones et al. (2) adopted the following additive polynomial model to describe the developmental changes in aerobic power (y = V̇O_{2 max}, in l/min)
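For orientation, an additive polynomial model containing the terms named in this paper (a subject-specific constant a_{i}, age, age^{2}, weight, height, and height^{2}) can be written as below. This is a sketch under assumed notation: the subscripts (subject i, visit j), the coefficient labels b_{i}, c, and k_{1} to k_{3}, and the error term ε_{ij} are labels chosen here, and the exact published form of Eq. 1 may differ.

```latex
y_{ij} = a_i + b_i\,\mathrm{age}_{ij} + c\,\mathrm{age}_{ij}^{2}
       + k_1\,\mathrm{weight}_{ij} + k_2\,\mathrm{height}_{ij}
       + k_3\,\mathrm{height}_{ij}^{2} + \varepsilon_{ij}
```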
Other explanatory variables were incorporated into the analysis by introducing them as indicator variables. For example, in study 2, sex was introduced as an indicator variable (boys = 0, girls = 1), since, in this way, the boys’ constant term would be incorporated within a baseline parameter a_{i}, from which the girls’ constant term would deviate. Similarly, in study 1 the performance of PP swimmers was used as the baseline parameter (the constant a_{i}) with which the performance of other sports (tennis, soccer, and gymnastics) and levels of maturity (MP and LP subjects) were compared, i.e., allowed to deviate from the constant baseline a_{i}. To allow different growth rates to be associated with various groups, the product of age and a group indicator variable was introduced as an additional predictor variable into the multilevel analysis, i.e., by introducing a sport ∗ age interaction term.
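The indicator and interaction coding described above can be illustrated for the sex indicator of study 2. This is a minimal sketch with hypothetical names (`design_row`, `age_origin_yr`); it only shows how the columns are built, not how the multilevel model is fitted.

```python
def design_row(age_yr: float, age_origin_yr: float, is_girl: bool) -> dict:
    """Build the fixed-effect columns for one visit of one child.

    sex is coded boys = 0, girls = 1, so the boys form the baseline and
    the girls' constant and age slope are expressed as deviations from it.
    """
    sex = 1.0 if is_girl else 0.0
    age_centered = age_yr - age_origin_yr  # e.g., centered at age of PHV
    return {
        "constant": 1.0,
        "sex": sex,
        "age": age_centered,
        "age2": age_centered ** 2,
        "age_x_sex": age_centered * sex,  # lets the age slope differ by sex
    }
```

For a boy the `sex` and `age_x_sex` columns are zero, so only the baseline constant and age terms contribute to his prediction.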
Multiplicative Model Structure
However, in contrast to the additive polynomial model adopted by Baxter-Jones et al. (2), an alternative multiplicative allometric model was proposed for strength (y = strength, in N) by Round et al. (19) on the basis of the work of Nevill (12) and Nevill and Holder (15) as follows
The model can be linearized with a logarithmic transformation, and a multilevel regression analysis on log_{e}(y) can be used to estimate the unknown parameters. The transformed log-linear multilevel regression model becomes
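A multiplicative allometric model of the kind described, with body mass and height as allometric components and age entering through an exponential term, can be sketched as follows, together with its log-linear form. The notation is assumed for illustration: k_{1} and k_{2} denote the mass and height exponents, a_{i} and b_{i} the subject-specific constant and age slope, c the age^{2} coefficient, and ε_{ij} a multiplicative error for subject i at visit j. The exact published forms of Eqs. 2 and 3 may differ.

```latex
y_{ij} = \mathrm{mass}_{ij}^{k_1}\,\mathrm{height}_{ij}^{k_2}\,
         \exp\!\left(a_i + b_i\,\mathrm{age}_{ij} + c\,\mathrm{age}_{ij}^{2}\right)\varepsilon_{ij}

\log_e(y_{ij}) = k_1 \log_e(\mathrm{mass}_{ij}) + k_2 \log_e(\mathrm{height}_{ij})
               + a_i + b_i\,\mathrm{age}_{ij} + c\,\mathrm{age}_{ij}^{2}
               + \log_e(\varepsilon_{ij})
```

Taking logarithms turns the product into a sum, so the exponents become ordinary regression coefficients and the multiplicative error becomes additive.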
Model Comparison
Clearly, because the models are not nested or hierarchical, a direct comparison between two competing model forms, given by Eqs. 1 and 3, is not possible using traditional criteria such as the residual sum of squares, the standard error, and the coefficient of determination (R^{2}). However, when proposing tests to compare separate model forms, Cox (6) chose the same maximum-likelihood criterion that was subsequently developed by Box and Cox (5) to help select the most suitable transformation to provide normally distributed errors with constant variance.
Because the multilevel regression analysis software MLn also adopts the maximum log likelihood as its standard criterion of model assessment (quality of fit), we use this criterion to compare the two competing models (Eqs. 1 and 3). This will be done in two stages. First, by use of the transformation methods of Box and Cox (5), a comparison will be made by allowing the dependent variable (y) to be untransformed or logarithmically transformed, e.g., y = V̇O_{2 max} or y = log_{e}(V̇O_{2 max}), and the independent predictor variables to be incorporated as the additive polynomial model (Eq. 1). Briefly, the method of Box and Cox examines a wide range of transformations and, on the basis of a maximum log-likelihood criterion, selects the optimum transformation for the chosen multiple linear regression model.
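The first-stage comparison can be illustrated with ordinary single-level least squares rather than the multilevel MLn fit. The sketch below evaluates the Box and Cox profile log likelihood for an untransformed (λ = 1) and a logarithmically transformed (λ = 0) response; the Jacobian term (λ − 1) Σ log_{e}(y) keeps the two values comparable on the original scale of y. The simulated data, the single predictor, and all function names are assumptions made for illustration.

```python
import math
import random


def simple_ols_rss(x, z):
    """Residual sum of squares from regressing z on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    szz = sum((zi - mean_z) ** 2 for zi in z)
    return szz - sxz ** 2 / sxx


def boxcox_profile_loglik(x, y, lam):
    """Box-Cox profile log likelihood (up to an additive constant)."""
    n = len(y)
    if lam == 0:
        z = [math.log(v) for v in y]
    else:
        z = [(v ** lam - 1.0) / lam for v in y]
    rss = simple_ols_rss(x, z)
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(rss / n) + jacobian


# Simulated data (hypothetical): a power function of mass with
# multiplicative, i.e., heteroscedastic, error.
random.seed(0)
mass = [random.uniform(10.0, 100.0) for _ in range(500)]
y = [0.1 * m ** (2.0 / 3.0) * math.exp(random.gauss(0.0, 0.4)) for m in mass]

loglik_untransformed = boxcox_profile_loglik(mass, y, 1.0)
loglik_log = boxcox_profile_loglik(mass, y, 0.0)
```

With multiplicative errors of this kind, the logarithmic transformation gives the larger profile log likelihood, which is the pattern the first stage is designed to detect.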
The second stage will examine the use of the three polynomial terms, weight, height, and height^{2}, and consider replacing them with the two logarithmically transformed terms log_{e}(weight) and log_{e}(height). Once again, the competing models will be assessed using the maximum log-likelihood criterion. A final check would be to reassess the first stage, if the second stage indicates that the independent variables of weight and height should be included as logarithmically transformed terms.
A simple modification of the maximum log-likelihood criterion would produce the Akaike information criterion (1) (AIC = −2 × maximum log likelihood + 2 × number of parameters fitted) that would take into account the different number of fitted parameters in the two model structures to be compared. When the performance of competing models is assessed, the model that fits the data best is the one with the largest maximum log likelihood or with the minimum AIC value. However, because the maximum log-likelihood values obtained in the present analyses are all negative, this is equivalent to identifying the model that minimizes both criteria in absolute terms.
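The AIC calculation is simple enough to sketch directly. The log-likelihood values used below are those reported in RESULTS for the boys of study 1; the parameter counts (12 and 11) are hypothetical placeholders chosen only to make the example run, not the published counts.

```python
def aic(max_log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: -2 * log likelihood + 2 * parameters."""
    return -2.0 * max_log_likelihood + 2.0 * n_params


# Maximum log likelihoods for the boys in study 1 (from the text);
# n_params values are HYPOTHETICAL placeholders.
aic_additive = aic(-95.07, n_params=12)
aic_multiplicative = aic(-86.49, n_params=11)

# The preferred model is the one with the smaller AIC.
preferred = "multiplicative" if aic_multiplicative < aic_additive else "additive"
```

Because the penalty grows with the number of fitted parameters, a model with both a larger log likelihood and fewer parameters is favored on both counts.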
RESULTS
Study 1
Although Baxter-Jones et al. (2) did not report the maximum log-likelihood criteria when fitting the model (Eq. 1) to the 231 young male and 222 young female athletes separately (Tables 4 and 5 in Ref. 2), the criteria were found to be −95.07 and −9.47 for the boys and girls, respectively. Simple replacement of V̇O_{2 max} with log_{e}(V̇O_{2 max}) increased the maximum log-likelihood criteria for boys and girls to −91.58 and −8.25, respectively. However, replacement of the three terms weight, height, and height^{2} with just two terms, log_{e}(weight) and log_{e}(height), further increased the maximum log-likelihood criteria to −86.49 and −5.07 for the boys and girls, respectively, confirming the better fit of the logarithmically transformed multiplicative allometric model (Eq. 3) than the additive polynomial model (Eq. 1) originally adopted by Baxter-Jones et al. These results, together with the number of fitted parameters for each model, are summarized in Table 2.
Indeed, on the basis of the logarithmically transformed multiplicative allometric model (Eq. 3), the resulting parsimonious solution for the aerobic power of young male athletes was simpler (Table 3), no longer requiring the maturity variable MPPP (thus allowing the same constant for MP and PP male athletes) or the tennis ∗ age and gymnastics ∗ age interaction terms. The predicted V̇O_{2 max} values of the young male athletes, based on the results in Table 3, are plotted in Fig. 1.
Similarly, the parsimonious solution for the aerobic power of young female athletes (Table 3) did not require maturity indicator variables MPPP or LPPP, which were reported originally by Baxter-Jones et al. (2). The predicted V̇O_{2 max} values of the young female athletes, based on the results in Table 3, are plotted in Fig. 2.
Because the multiplicative model requires fewer parameters than the additive model, the AIC (1) would provide stronger support for the multiplicative model than the maximum log-likelihood criterion reported above.
Baxter-Jones et al. (2) found that by allowing (fitting) age^{2} to be a random effect term in their additive model (Eq. 1) the increase in maximum log likelihood did not justify the three additional fitted parameters (i.e., the covariances constant ∗ age^{2} and age ∗ age^{2} and the variance age^{2}) at level 1 or 2. The same conclusion was reached when age^{2} was declared as a random effect in the multilevel analysis on the basis of the logarithmically transformed multiplicative model (Eq. 3).
Study 2
Separate multilevel regression analyses were performed on the strength of quadriceps and biceps, centered about age at PHV, of 58 normal British schoolchildren (19). PHV was at 12.2 ± 0.94 yr among the girls (n = 25) and 13.4 ± 0.85 yr among the boys (n = 33). When the additive polynomial model (Eq. 1) was fitted to the untransformed quadriceps and biceps strength measurements, the maximum log-likelihood criteria were −2,649.4 and −2,350.5, respectively. Simple replacement of y = strength with y = log_{e}(strength) increased the maximum log-likelihood criteria for quadriceps and biceps strength to −2,612.5 and −2,302.0, respectively. Indeed, for the quadriceps strength measurements, replacement of the three terms weight, height, and height^{2} with just the two terms log_{e}(weight) and log_{e}(height) further increased the maximum log-likelihood criterion to −2,612.4, explaining more variation in quadriceps strength with one less predictor. However, for the strength in the biceps, replacement of the three terms weight, height, and height^{2} with the two terms log_{e}(weight) and log_{e}(height) decreased the maximum log-likelihood criterion to −2,304.8. This reduction is not unreasonable considering that the model (Eq. 2) requires one less predictor variable than the additive polynomial model (Eq. 1). Once again, these results support the use of the multiplicative allometric model (Eq. 2) when explaining the strength measurements of Round et al. (19) compared with the additive polynomial model (Eq. 1). These results, together with the number of fitted parameters for each model, are summarized in Table 4.
The multilevel regression analyses of the quadriceps and biceps strength measurements, based on the multiplicative allometric model (Eq. 2), are given in Table 5. The predicted strength of the quadriceps and biceps is plotted in Figs. 3 and 4, respectively.
The parsimonious solutions identified a significant increase in quadriceps and biceps strength explained by the developmental component in body size (weight and height), a sex difference, and an additional contribution identified by age and age^{2} components that were different for boys and girls (identified by the significant age ∗ sex interaction terms).
When the multiplicative model is compared with the equivalent additive model to predict the strength of quadriceps and biceps, the AIC (1) provides evidence similar to the maximum log-likelihood criterion in favor of the multiplicative model.
DISCUSSION
When choosing multilevel modeling to explain the developmental changes in aerobic power in young athletes, Baxter-Jones et al. (2) argued that the use of ratio standards, such as V̇O_{2 max} per kilogram body mass, to “normalize” or control for growth may lead to spurious correlations, incorrect conclusions, and misinterpretation of the data [arguments originally proposed by Tanner (22)]. To avoid these problems, the authors argued that by choosing to analyze their data using the additive polynomial model (Eq. 1) the contribution of developmental growth and maturation would be separated appropriately from other factors (the effects of different training methods in this instance).
A limitation of the additive polynomial approach is that the fitted model is valid only within the range of observations collected and may give absurd predictions outside that range. However, when considering the experimental design effects and other problems associated with scaling for growth and maturation, Nevill et al. (16) and Nevill and Holder (15) demonstrated the value of an important class of multiplicative models, often referred to as allometric or power function models, that provide more plausible solutions within and beyond the range of observations. For these models, the concept of a ratio is an integral part of the model form, and the variables and errors are assumed to be proportional and multiplicative, respectively. In addition, there is also evidence to suggest that, as children grow into adults, their leg volume increases in a greater proportion to their body mass, m^{1.1} (13). To accommodate the effect of this possible disproportionate increase in leg muscle on performance variables, such as aerobic power and quadriceps strength, Nevill (12) suggested introducing “height” as well as “mass” as a continuous covariate to explain the developmental changes in such variables. For these reasons, the multiplicative allometric model (Eq. 2) was chosen, within a multilevel structure, to explain the developmental changes in aerobic power and strength from studies 1 and 2, respectively.
One possible method of analyzing growth data, such as V̇O_{2 max} and strength, from longitudinal studies might have been to fit ontogenetic allometric models to each subject’s data separately, allowing the mass exponent to vary from subject to subject, i.e., between subjects (4, 10). However, this approach requires a two-stage analysis (within and between groups), in which the overall effect of the covariates (e.g., height, age, and maturation) and the correlation of intermediate statistics (the slope and intercept parameters from the individual allometric models) can only be partially accommodated at the second-stage (between-group) analysis. In contrast, by adopting the multiplicative allometric model (Eq. 2) within a multilevel structure, we can adopt a one-stage allometric analysis that is able to incorporate all covariates simultaneously and, in addition, is flexible enough to allow each subject to have his or her own individual body mass exponent, i.e., simply by fitting the variable, log-transformed body mass, as a random component at level 2. In fact, in the present study, when age and body mass were allowed to vary between subjects at level 2, the age term explained the most variation in the multilevel analyses of V̇O_{2 max} and strength, as described in methods and results.
On the basis of the maximum log-likelihood criterion, not only did the multiplicative allometric model provide a superior fit to the data from both studies compared with the additive polynomial model, but the model also provided a simpler and more plausible interpretation of the data. In study 1 the mass exponents for boys and girls are very close to the anticipated theoretical values, m^{1/3}. The significant height exponents for boys and girls were 0.73 and 0.48, respectively, which might indicate a greater relative increase in muscle mass in relation to body mass (12).
The age parameters in Table 3, 0.0366 and 0.061 for boys and girls, respectively, were highly significant (for statistical accuracy, age was measured about an origin of 12 yr). This suggests that the aerobic power of the male and female young athletes was increasing at a greater proportion of their body size at 12 yr of age, a finding explained by their training status. At 12 yr of age, late pubertal boys had greater levels of aerobic power than prepubertal boys, indicating that biological age as well as chronological age was an important factor in the development of aerobic power. Furthermore, the aerobic power was increasing at a greater rate in male soccer players than in other male athletes at 12 yr of age (identified by the 0.0134 soccer ∗ age interaction term). Also, the aerobic power was increasing at a significantly greater rate at 12 yr of age in female swimmers than in female gymnasts and tennis players (indicated by the −0.0251 gymnast ∗ age and −0.0177 tennis ∗ age interaction terms).
The mass exponents in study 2 are a little more difficult to explain (i.e., 0.38 and 0.36 for quadriceps and biceps, respectively). However, because the two measures of strength are specific to a localized muscle group, using the entire body mass to control for developmental increases in body size is unlikely to be the most appropriate body size covariate. The height exponents, 1.23 and 1.06 for the strength in the quadriceps and biceps, respectively, may simply reflect a mechanical advantage of being taller, as reflected in the approximate linear contribution of the term “height” when strength is predicted.
The multilevel regression analysis in Table 5 identified a significant increase in quadriceps strength explained by the developmental component in body size (weight and height) plus a sex difference and an additional contribution identified by the age and age^{2} components. Interpreting the sex and age terms in Table 5 suggests that at PHV the sex differences in the strength of the quadriceps and biceps would appear to be equivalent to ∼3 yr of developmental growth for boys. The age contribution in the model for the girls’ quadriceps strength was negligible at the age of PHV, identified by the value of the age ∗ sex interaction term (−0.0441), which is added to the boys’ age parameter (0.0533) to give the girls’ age parameter, i.e., 0.0533 − 0.0441 = 0.0092. This suggests that at the age of PHV the girls’ quadriceps strength is developing at the same rate in proportion to their body size, whereas boys’ quadriceps strength is developing at a greater rate or disproportionately to their body size. Although not relevant to the present study, this finding was explained by the introduction of the hormone testosterone into the multilevel regression analysis (19).
Similarly, in the solution for biceps strength, given in Table 5, the gap between the boys’ and girls’ age parameter (−0.0717) is even greater, with the girls’ age term found to be negative (0.0483 − 0.0717 = −0.0234), i.e., the girls are becoming weaker with increasing years in the region of PHV, having already adjusted for the developmental changes in body size.
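The arithmetic behind the girls’ age terms can be checked with a short sketch. The coefficients are those quoted in the text from Table 5; the function names are hypothetical. Because the model is log-linear and age is centered at PHV, an age coefficient b corresponds approximately to a proportional change of exp(b) − 1 per year at PHV for a fixed body size, ignoring the age^{2} term; that interpretation is an assumption of this sketch.

```python
import math


def girls_age_slope(boys_slope: float, age_by_sex_interaction: float) -> float:
    """Girls' age coefficient = boys' coefficient + age * sex interaction."""
    return boys_slope + age_by_sex_interaction


def percent_change_per_year(slope: float) -> float:
    """Approximate yearly percentage change implied by a log-scale slope."""
    return 100.0 * (math.exp(slope) - 1.0)


quadriceps_girls = girls_age_slope(0.0533, -0.0441)  # 0.0533 - 0.0441
biceps_girls = girls_age_slope(0.0483, -0.0717)      # 0.0483 - 0.0717

quadriceps_pct = percent_change_per_year(quadriceps_girls)
biceps_pct = percent_change_per_year(biceps_girls)
```

A positive slope gives a positive percentage change and a negative slope a negative one, matching the negligible quadriceps term and the negative biceps term for girls.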
Hence, by adopting the multiplicative (proportional) allometric model (Eq. 2) in the multilevel regression analyses, further valuable insight into the developmental differences between boys’ and girls’ strength and aerobic power was obtained. In study 1 the highly significant age parameters from the boys’ and girls’ analyses of aerobic power indicate that, at 12 yr of age, aerobic power was increasing at a greater proportion to body size, a finding explained by the young athletes’ elite training status. However, the analysis of boys’ aerobic power identified an additional significant maturation indicator variable, late puberty (LPPP), but with no such indicator variable required for the girls’ analysis. A similar finding was observed in study 2, with the significant age parameter indicating that boys’ strength at PHV (both quadriceps and biceps) was increasing at a greater rate relative to their body size. In contrast, the girls’ age parameter was negligible (quadriceps) or negative (biceps), indicating that the girls’ strength at PHV was increasing only “in proportion” to their body size and, in the case of the girls’ biceps strength, becoming marginally weaker relative to their body size. We can only speculate that these observed developmental differences in the boys’ and girls’ strength and aerobic power are due to hormonal factors such as testosterone.
Finally, although not essential, if a model provides plausible estimates of the dependent variable beyond the range of observation, the model is more acceptable on theoretical grounds. For example, in study 1 the additive polynomial model would predict a female gymnast’s V̇O_{2 max} to be >1.0 l/min at birth (Table 5 and Fig. 3 in Ref. 2). Of course, this prediction is physiologically unsound. However, by using the multiplicative (proportional) allometric model (Eq. 2), such a prediction would be impossible, since, as weight and height would tend toward zero, so too would the dependent variable V̇O_{2 max}.
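The contrast in extrapolation behavior can be illustrated numerically. In the sketch below the exponents 1/3 (mass) and 0.73 (height) are taken from the discussion of study 1, whereas the intercept, slope, and scale constant are hypothetical values chosen only for illustration; neither function reproduces the fitted models.

```python
def additive_prediction(intercept: float, k_weight: float,
                        weight_kg: float) -> float:
    """Additive form: stays at the intercept as body size tends to zero."""
    return intercept + k_weight * weight_kg


def allometric_prediction(scale: float, k_mass: float, k_height: float,
                          mass_kg: float, height_m: float) -> float:
    """Multiplicative form: tends to zero as mass and height tend to zero."""
    return scale * mass_kg ** k_mass * height_m ** k_height


# HYPOTHETICAL constants; exponents follow the text for boys in study 1.
sizes = [(40.0, 1.5), (10.0, 0.8), (1.0, 0.3), (0.01, 0.05)]
allometric_values = [
    allometric_prediction(0.5, 1.0 / 3.0, 0.73, mass, height)
    for mass, height in sizes
]
additive_at_zero = additive_prediction(1.0, 0.03, 0.0)
```

As the body size pairs shrink, the allometric predictions fall steadily toward zero, while the additive form returns its intercept regardless of how implausible that value is.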
Acknowledgments
The work in study 1 was supported by The (UK) Sports Council, Research Unit and the work in study 2 by Action Research (UK).
Footnotes

Address for reprint requests: A. M. Nevill, Centre for Sport and Exercise Sciences, School of Human Sciences, Liverpool John Moores University, Byrom St., Liverpool L3 3AK, UK.
 Copyright © 1998 the American Physiological Society