Abstract
Batterham, Alan M., Keith Tolfrey, and Keith P. George. Nevill’s explanation of Kleiber’s 0.75 mass exponent: an artifact of collinearity problems in least squares models? J. Appl. Physiol. 82(2): 693–697, 1997. Intraspecific allometric modeling (Y = a ⋅ mass^{b}, where Y is the physiological dependent variable and a is the proportionality coefficient) of peak oxygen uptake (V̇O_{2 peak}) has frequently revealed a mass exponent (b) greater than that predicted from dimensionality theory, approximating Kleiber’s 3/4 exponent for basal metabolic rate. Nevill (J. Appl. Physiol. 77: 2870–2873, 1994) proposed an explanation and a method that restores the inflated exponent to the anticipated 2/3. In human subjects, the method involves the addition of “stature” as a continuous predictor variable in a multiple log-linear regression model: ln Y = ln a + c ⋅ ln stature + b ⋅ ln mass + ln ε, where c is the general body size exponent and ε is the error term. It is likely that serious collinearity confounds may adversely affect the reliability and validity of the model. The aim of this study was to critically examine Nevill’s method in modeling V̇O_{2 peak} in prepubertal, teenage, and adult men. A mean exponent of 0.81 (95% confidence interval, 0.65–0.97) was found when scaling by mass alone. Nevill’s method reduced the mean mass exponent to 0.67 (95% confidence interval, 0.44–0.9). However, variance inflation factors and tolerance for the log-transformed stature and mass variables exceeded published criteria for severe collinearity. Principal components analysis also diagnosed severe collinearity in two principal components, with condition indexes >30 and variance decomposition proportions exceeding 50% for two regression coefficients. The derived exponents may thus be numerically inaccurate and unstable. In conclusion, the restoration of the mean mass exponent to the anticipated 2/3 may be a fortuitous statistical artifact.
 allometry
 multiple regression
 loglinear models
Recently, in the human sciences, there has been a renewed interest in the influence of body size on selected physiological measurements. Several authors (20, 21, 29) have demonstrated the statistical and physiological validity of allometric equations in modeling such relationships. Huxley’s general allometric equation (11)
Nevill (18) proposed an explanation for these findings derived from Alexander et al. (1), who found that larger mammals have a greater proportion of proximal leg muscle mass in relation to their total body mass (leg muscle mass proportional to m^{1.1}). Nevill (18) suggested that, in humans, this would inflate the derived mass exponent because of the disproportionate increase in metabolically active musculature within the sample, resulting in a higher V̇O_{2 max} than anticipated from body mass. To accommodate this confound when modeling physiological variables in human subjects, Nevill argued that “stature” be entered together with body mass in a multiple allometric regression model.
As the theoretical basis for Nevill’s method (18) is derived from the work of Alexander et al. (1), it depends on the validity of stature as a proxy for body mass to accurately reflect “general body size” (6). Alexander et al. (1) did not provide data to evaluate the strength of the relationship between linear dimensions and mass in their sample. However, it can safely be assumed that, in humans, there is a strong relationship between stature and mass (3). Unfortunately, whereas Nevill’s method (18) depends on high collinearity between the two body size variables, paradoxically, this is its primary flaw. Two predictor variables are collinear with each other if the data vectors representing them lie on, or close to, the same line (22). If collinearity is severe, multiple allometric regression equations may be unstable and unreliable, and the exponents derived may be numerically inaccurate (4). It has been demonstrated that severe collinearity may even result in exponent sign changes, indicating a directional relationship contrary to the investigator’s knowledgeable expectations (17). Clearly, such concerns are especially important when attempting to distinguish within 95% confidence limits between 0.67 and 0.75 mass exponents, and they reduce confidence in the interpretation of least squares multiple-regression models.
Berlin and Antman (5) identified three early warning signs for collinearity: large pairwise correlations between predictor variables, large changes in coefficients caused by the addition or deletion of other variables, and inflated SE values for coefficients. Mandel (16) stated that collinearity is the greatest problem encountered when using least squares regression models. It is curious, therefore, that despite the widespread use of multiple-regression techniques, physiologists in the exercise sciences have paid little attention to the diagnosis and treatment of collinearity. McGiffen et al. (17) have urged that collinearity diagnostics be calculated and reported for all applications of multiple least squares regression models. Although Nevill’s method (18) has been employed with apparent success in three previous studies, it is plausible that the restoration of the mass exponent to the “anticipated” 2/3 is a fortuitous statistical artifact resulting from collinearity confounds. The aim of this study was to critically examine Nevill’s explanation of Kleiber’s 3/4 mass exponent in modeling peak oxygen uptake (V̇O_{2 peak}) in prepubertal, teenage, and adult males.
METHODS
Subjects.
Twenty-six prepubertal boys, mean age 11.0 ± 0.4 (SD) yr; 26 teenage boys, age 14.1 ± 0.3 yr; and 23 adults, age 22.4 ± 2.7 yr volunteered for the study. The prepubertal subjects were classified at pubic hair and genitalia stage 1, according to the criteria of Tanner (25). Maturational indexes were not secured for the teenage group. Institutional ethics approval for the project and written informed consent from all subjects (including parental consent for the children) were obtained. Subject characteristics are displayed in Table 1. All subjects were habituated to test procedures before actual testing. Stature was measured to the nearest 0.005 m by using a Harpenden stadiometer. Body mass was assessed before exercise testing to the nearest 0.1 kg by using Avery beam balance scales. An approximately twofold mean body mass ratio was evident across the groups, with a size range of 29 kg (smallest prepubertal subject) to 103 kg (largest adult). A wide size range is vital to derive meaningful scaling expressions, and Calder (7) has urged that publication of size range should be mandatory in all allometry studies.
Measurement of V̇O_{2 peak}.
V̇O_{2 peak} was determined via a discontinuous, incremental treadmill protocol. After a 5-min warm-up, the subjects began the test at the following speeds at zero incline: prepubertal subjects 1.94 m/s; teenage subjects 2.22 m/s; and adult subjects 2.78 m/s. After a belt speed of 2.78 m/s was attained, speed was held constant and grade increased by 2.5% increments for each 3-min stage (stages separated by 1 min), to volitional exhaustion. Expired air was monitored throughout the test via online indirect calorimetry (Oxyconsigma, Mijnhardt). The system was calibrated before each testing session according to the manufacturers’ instructions. End-of-stage oxygen uptake was determined from the last 30 s of each stage. Heart rate was monitored throughout by using a Rigel (Morden, UK) electrocardiogram. Criteria for V̇O_{2 peak} were 1) a heart rate plateau before the final exercise intensity, or attainment of 95% of age-predicted maximum; and/or 2) a respiratory exchange ratio of 1.0 or above (24).
Allometric analyses. All analyses were carried out by using the statistical package SPSS 6.0 for Windows (SPSS, Chicago, IL). The allometric relationships between V̇O_{2 peak} and body size variables (stature and body mass) were derived via log transformations of the absolute data. The general curvilinear allometric equation Y = a ⋅ X^{b} can be linearized by taking natural logarithms of both sides: ln Y = ln a + b ⋅ ln X. The exponent b is simply the slope of the log-linear plot, and a is derived from the antilog of the Y intercept. After first establishing commonality of slopes between groups (30), single exponents (separately for body mass and stature) common to all groups were fitted by including “group” as a class variable. This was achieved by creating two discrete dummy variables, “prepubertal” and “adult” (coded “1” for belonging to that group and “0” for not). The reference class (“teenage”) was thus represented by a coding of 0, 0.
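The linearization described above can be illustrated with a short sketch. It is not the SPSS procedure used in the study: the masses, the coefficient 0.12, and the exponent 0.75 are hypothetical values chosen for illustration, the data are noise free, and the group dummy variables are omitted for brevity.

```python
import math


def fit_allometric(x, y):
    """Fit Y = a * X**b by least squares on ln Y = ln a + b * ln X."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_ly = sum(ly) / n
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
    b = sxy / sxx                          # slope of the log-linear plot
    a = math.exp(mean_ly - b * mean_lx)    # antilog of the intercept
    return a, b


# Hypothetical body masses (kg) and noise-free responses, for illustration only.
masses = [29.0, 40.0, 55.0, 70.0, 85.0, 103.0]
uptake = [0.12 * m ** 0.75 for m in masses]

a_hat, b_hat = fit_allometric(masses, uptake)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

Because the synthetic data follow the power law exactly, the fit recovers the generating values; with measured data the slope carries an SE, which determines the width of the confidence interval for the exponent.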
Collinearity diagnostics. All diagnostic procedures were carried out via SPSS 6.0 for Windows. To test for early warning signs for collinearity, a pairwise correlation (Pearson’s r) was performed between the log-transformed stature and mass variables. Preliminary diagnostics involved testing for tolerance and its reciprocal, the variance inflation factor (VIF). Tolerance represents the degree of overlap between the predictor variables and alerts the investigator to instability problems. It is defined as 1 − R^{2}, where R^{2} is the squared multiple correlation of one predictor variable with the others in the model. Hence, low tolerance signifies variable redundancy. The VIF is a standardized and dimensionless measure of a regression coefficient’s contribution to the total variance of the coefficients. If predictor variables are orthogonal (uncorrelated), the VIF = 1. A VIF value in excess of 10 indicates severe collinearity (10).
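For the special case of only two predictor variables, R^{2} reduces to the squared Pearson correlation between them, so tolerance and VIF follow directly from r. The sketch below assumes that simplified case. The function names are illustrative, and because the model in this study also contains the group dummy variables, its tolerances need not match the two-variable result exactly.

```python
import math


def pearson_r(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def tolerance_and_vif(r):
    """Two-predictor case only: tolerance = 1 - r**2, VIF = 1 / tolerance."""
    tolerance = 1.0 - r * r
    return tolerance, 1.0 / tolerance


# Orthogonal (uncorrelated) predictors give VIF = 1.
u = [-1.0, 1.0, -1.0, 1.0]
v = [-1.0, -1.0, 1.0, 1.0]
tol_orth, vif_orth = tolerance_and_vif(pearson_r(u, v))

# A pairwise correlation of 0.96 already pushes the VIF past 10.
tol_high, vif_high = tolerance_and_vif(0.96)
print(vif_orth, round(tol_high, 4), round(vif_high, 2))
```

Under this simplification, a correlation of 0.96 between two predictors corresponds to a tolerance of about 0.078 and a VIF of about 12.8.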
A more sophisticated diagnosis was achieved via a principal components analysis of the standardized predictor variables. The first principal component represents the linear combination of predictor variables that explains the most variance within the set (17). Subsequent principal components are orthogonal to (uncorrelated with) those determined previously. Condition indexes (CI) and variance decomposition proportions (VDP) were calculated for each principal component. The VDP is defined as the percentage of the variability in a parameter estimate due to a specific principal component (17). Belsley et al. (4) suggested that a principal component with a CI >30 and a VDP of >50% for two or more regression coefficients indicates severe collinearity that should be corrected.
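One common way to obtain condition indexes and variance decomposition proportions is from the singular value decomposition of the model matrix after scaling each column to unit length. The sketch below follows that approach with NumPy; it is an assumption that this matches the SPSS computation, the function name is illustrative, and the data are hypothetical values constructed so that ln stature is almost an exact linear function of ln mass.

```python
import numpy as np


def collinearity_diagnostics(X):
    """Return condition indexes and variance decomposition proportions.

    X is the model matrix, one column per coefficient (intercept included).
    Columns are scaled to unit length but not centered.
    """
    Xs = X / np.linalg.norm(X, axis=0)
    _, s, vt = np.linalg.svd(Xs, full_matrices=False)
    ci = s[0] / s                       # singular values come sorted, largest first
    phi = (vt.T ** 2) / s ** 2          # phi[j, k] = v_jk**2 / s_k**2
    vdp = phi / phi.sum(axis=1, keepdims=True)  # each coefficient's row sums to 1
    return ci, vdp


# Hypothetical data: ln stature is nearly determined by ln mass.
ln_mass = np.log(np.array([29.0, 35.0, 42.0, 50.0, 60.0, 72.0, 86.0, 103.0]))
wiggle = np.array([0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01, -0.01])
ln_stature = 0.3 * ln_mass - 0.75 + wiggle
X = np.column_stack([np.ones_like(ln_mass), ln_mass, ln_stature])

ci, vdp = collinearity_diagnostics(X)
print(np.round(ci, 1))
print(np.round(vdp, 3))   # rows: coefficients; columns: principal components
```

In this constructed example the last component has a condition index well above 30 and carries more than 50% of the variance of both the ln mass and ln stature coefficients, which is the pattern the criterion above is meant to flag.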
RESULTS
Testing for the commonality of slopes revealed no differences between groups (P > 0.05) for either body mass or stature. Multivariate log-linear regression revealed common slopes of 0.81 (SE = 0.08, 95% confidence interval 0.65–0.97; R^{2} = 0.93, P < 0.05) for body mass and 2.33 (SE = 0.33, 95% confidence interval 1.68–2.98; R^{2} = 0.91, P < 0.05) for stature for the three groups. Log-linear modeling including stature alongside mass as predictor variables (Eq. 5) reduced the mass exponent to 0.67 (SE = 0.12, 95% confidence interval 0.44–0.9) and the stature exponent to 0.66 (SE = 0.4, 95% confidence interval −0.13 to 1.46). Analysis of the standardized regression coefficients (beta weights) indicated that only the contribution of the mass exponent was significant (P < 0.05).
Pairwise comparisons between log-transformed stature and mass revealed a strong positive correlation (r = 0.96, P < 0.05). For Nevill’s method (18) (Eq. 5), tolerance and VIF were 0.08 and 11.5 for stature and 0.07 and 13.7 for body mass, respectively. The results of the principal components diagnostics are displayed in Table 2.
DISCUSSION
The values attained for V̇O_{2 peak} (l/min; Table 1) are consistent with previous treadmill-derived V̇O_{2 peak} in similar samples (2, 28). The mean mass exponent common to all three groups of 0.81 approximates the 3/4 mass exponent identified by Kleiber (14) for basal metabolic rate in a range of mammals. Similar common mass exponents for V̇O_{2 peak} have been reported previously for children and adults (2, 23, 28). These mass exponents appear higher than the 2/3 exponent anticipated from dimensionality theory and have led some investigators to postulate mechanisms to explain why theoretical principles have failed to provide an adequate account (18, 28). In the present study, however, it is noteworthy that the 95% confidence interval for the mass exponent includes both 2/3 and 3/4. It is, therefore, impossible to confidently reject dimensionality theory predictions. Many investigators fail to report confidence intervals for the derived mass exponents. This information is essential if meaningful interpretations are to be made. The confidence intervals for mass exponents for V̇O_{2 peak} in human studies appear wider than those cited in investigations with the use of animals. Taylor et al. (26) reported a mean mass exponent of 0.79 (95% confidence interval, 0.75–0.83) in a range of wild mammals for treadmill-derived V̇O_{2 peak}. This finding precludes the anticipated 2/3 exponent and indicates that V̇O_{2 peak} scales similarly to basal metabolism.
If the true mass exponent for V̇O_{2 peak} in humans is >2/3, it is possible that the body size ranges commonly studied are insufficient to detect it within 95% confidence limits. Kleiber (15) required a ninefold size ratio to distinguish between the 3/4 and 2/3 mass exponents. Interestingly, the mean stature exponent in the present study of 2.33 (95% confidence interval, 1.68–2.98) also conformed to dimensionality theory predictions (V̇O_{2 peak} ∝ mass^{2/3} ∝ stature^{2}).
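A rough calculation suggests why a wide size range matters. If two candidate models give the same prediction at the smallest body mass, their predictions at the largest mass differ by the factor (size ratio)^{3/4 − 2/3} = (size ratio)^{1/12}. The sketch below evaluates that factor; it is an illustration under this simplifying assumption, not Kleiber’s own argument, and the function name is hypothetical.

```python
def prediction_gap(size_ratio, b_high=3 / 4, b_low=2 / 3):
    """Ratio of the two models' predictions at the top of the size range,
    given identical predictions at the bottom of the range."""
    return size_ratio ** (b_high - b_low)


gap_ninefold = prediction_gap(9.0)            # the ninefold ratio cited above
gap_this_study = prediction_gap(103.0 / 29.0)  # 29 to 103 kg range in this sample
print(round(gap_ninefold, 3), round(gap_this_study, 3))
```

On this reckoning the two exponents separate the predictions by about 20% over a ninefold size ratio, but by only about 11% over the 29 to 103 kg range of the present sample, a difference that is easily obscured by between-subject variability.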
Despite the relatively wide confidence intervals, it is clear that the mean mass exponent more closely approximates 3/4 than the anticipated 2/3. Nevill’s method (18) of including stature as a continuous covariate to restore the mean mass exponent to 2/3 appears to have a sound physiological basis (1). The results of the collinearity diagnostics in the present study, however, indicate that the method may be numerically inaccurate, unreliable, and, therefore, invalid. Low tolerances and high VIFs were evident for both mass and stature predictor variables, with values exceeding published criteria for severe collinearity (10). Nevill’s approach (18) assumes that stature is an effective proxy for mass to reflect the “general size” of the body. This assumption is supported by the strong pairwise correlation between the log-transformed mass and stature variables (r = 0.96, P < 0.05). Unfortunately, however, this also alerts the investigator to potential collinearity problems (5) and indicates variable redundancy. With stature and mass included in the same multiple log-linear regression (Eq. 5), the SE values for the exponents were inflated considerably, in comparison with those generated when modeling stature and mass separately (Eq. 4). Nevill’s method (18) also reduced the mean stature exponent dramatically from 2.33 to 0.66. Hence, the three early warning signs for collinearity problems identified by Berlin and Antman (5) are all present. Moreover, it is noteworthy that the analysis of the beta weights revealed that the stature exponent no longer contributed to the prediction of V̇O_{2 peak} (P > 0.05), even though stature has a strong positive bivariate correlation with the dependent variable. Inclusion of stature alone as a body size variable explained 91% of the variance in V̇O_{2 peak}.
The lack of significance for the stature exponent in Eq. 5 is due to its inflated variance and may lead to the erroneous conclusion that stature is not important in determining V̇O_{2 peak}. In addition, Nevill’s method (18) provides a stature exponent that is physiologically and theoretically implausible.
The mass exponent was reduced from the original mean of 0.81 to 0.67, exactly the value predicted from theoretical considerations. Without due consideration of collinearity confounds, this would, again, appear to support Nevill’s method, adding a further study to those included in his meta-analysis (18). Due to variance inflation, however, the confidence interval for the “restored” 0.67 exponent was widened to 0.44–0.9, a range that includes 2/3, 3/4, and the original 0.81 exponent. Similarly, in the most recent of the three studies where the method has been applied (28), the confidence interval for the restored mean mass exponent of 0.71 (calculated from the SE provided) was 0.59–0.83. Notwithstanding the collinearity problems, the method is clearly unable to distinguish between theoretically predicted and empirically derived mass exponents within reasonable confidence limits.
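The interval for the restored exponent can be approximated from the exponent and its SE. The sketch below uses a normal approximation (critical value about 1.96) rather than the t distribution a regression package would use, and it starts from the rounded values reported in RESULTS, so its limits may differ from the published ones in the second decimal place. The function name is illustrative.

```python
from statistics import NormalDist


def approx_ci(estimate, se, level=0.95):
    """Normal-approximation confidence interval: estimate +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


# Mass exponent and SE from the model that includes stature (Eq. 5).
lo, hi = approx_ci(0.67, 0.12)

# Which candidate exponents fall inside the interval?
candidates = {"2/3": 2 / 3, "3/4": 3 / 4, "0.81": 0.81}
covered = {name: lo < value < hi for name, value in candidates.items()}
print(round(lo, 3), round(hi, 3), covered)
```

All three candidate exponents fall inside the approximate interval, consistent with the point that the restored estimate cannot discriminate among them.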
The results of the collinearity diagnostics derived from the principal components analysis raise further doubts about the reliability and validity of Nevill’s method (18). Severe collinearity was detected, with values exceeding the criterion defined by Belsley et al. (4). CI values of 32.3 and 98.8, and VDPs >50% for two coefficients, were obtained in principal components 4 and 5, respectively (Table 2).
The warning signs for collinearity, the preliminary diagnostics via tolerance and VIFs, and the specific principal components diagnostics, all indicate that little confidence can be placed in Nevill’s method (18). The overlap between stature and mass is such that including both in the model (Eq. 5) is a somewhat redundant method of indicating body size. Among their suggestions for coping with collinearity, Berlin and Antman (5) include removing redundant variables from the model and reducing reliance on interpretation of coefficients for confounding variables. Both these suggestions have serious implications for Nevill’s method (18). A further suggestion for coping with collinearity is to form a “summary” variable (5). With body size, this is a difficult task, as variables such as stature, mass, and surface area are expressed in different units and are highly interrelated. Indeed, the selection of an appropriate size variable has been described as the fundamental problem confronting all studies of allometry (12). As stated by Nevill (18), a fundamental assumption is that the physiological variable is influenced by active muscle mass, with body mass usually used as a proxy in the absence of muscle mass estimates. Although body mass and muscle mass are related (8), differences in body composition within samples can severely distort derived mass exponents (9). Where possible, therefore, measurements or estimates of involved musculature should be the scaling variable of choice (31).
Notwithstanding these considerations, with a large sample size and a body size range sufficient to overwhelm body composition variance (7), meaningful mass exponents may be derived, offering an advantage in practicality over sophisticated muscle mass measurement methods (27). The fact that the mean mass exponents reported in the literature frequently exceed theoretical predictions should not encourage a confirmatory bias in research. Recent attempts to account for apparent inconsistencies with theoretical predictions are a modern parallel to the process identified by Kleiber 50 years ago (14). The belief in the “surface law” for basal metabolism was so well established that empirical deviations were explained by particular conditions or measurement error. For example, when rabbits failed to comply with predicted daily heat production, their ear surface was removed from the model to restore the theoretical relationship (14).
The present study has demonstrated that Nevill’s (18) explanation of Kleiber’s 3/4 mass exponent is confounded by the severe collinearity problems detected in the multiple least squares regression model (Eq. 5). Empirical evidence in humans appears unable to distinguish between the 2/3 and 3/4 exponent for V̇O_{2 peak}, as both are supported by the data. Further research, using large sample sizes selected for body size heterogeneity, is required to more fully elucidate the relationship between body size and V̇O_{2 peak} in humans.
Footnotes

Address for reprint requests: A. M. Batterham, Dept. of Exercise and Sport Science, Manchester Metropolitan Univ., Crewe and Alsager Faculty, Hassall Rd., Alsager ST7 2HL, UK (E-mail: A.Batterham@mmu.ac.uk).
 Copyright © 1997 the American Physiological Society