Using molecular classification to predict gains in maximal aerobic capacity following endurance exercise training in humans

James A. Timmons, Steen Knudsen, Tuomo Rankinen, Lauren G. Koch, Mark Sarzynski, Thomas Jensen, Pernille Keller, Camilla Scheele, Niels B. J. Vollaard, Søren Nielsen, Thorbjörn Åkerström, Ormond A. MacDougald, Eva Jansson, Paul L. Greenhaff, Mark A. Tarnopolsky, Luc J. C. van Loon, Bente K. Pedersen, Carl Johan Sundberg, Claes Wahlestedt, Steven L. Britton, Claude Bouchard


A low maximal oxygen consumption (V̇o2max) is a strong risk factor for premature mortality. Supervised endurance exercise training increases V̇o2max with a very wide range of effectiveness in humans. Discovering the DNA variants that contribute to this heterogeneity typically requires substantial sample sizes. In the present study, we first use RNA expression profiling to produce a molecular classifier that predicts V̇o2max training response. We then hypothesized that the classifier genes would harbor DNA variants that contributed to the heterogeneous V̇o2max response. Two independent preintervention RNA expression data sets were generated (n = 41 gene chips) from subjects that underwent supervised endurance training: one identified and the second blindly validated an RNA expression signature that predicted change in V̇o2max (“predictor” genes). The HERITAGE Family Study (n = 473) was used for genotyping. We discovered a 29-RNA signature that predicted V̇o2max training response on a continuous scale; these genes contained ∼6 new single-nucleotide polymorphisms associated with gains in V̇o2max in the HERITAGE Family Study. Three of four novel candidate genes from the HERITAGE Family Study were confirmed as RNA predictor genes (i.e., “reciprocal” RNA validation of a quantitative trait locus genotype), enhancing the performance of the 29-RNA-based predictor. Notably, RNA abundance for the predictor genes was unchanged by exercise training, supporting the idea that expression was preset by genetic variation. Regression analysis yielded a model where 11 single-nucleotide polymorphisms explained 23% of the variance in gains in V̇o2max, corresponding to ∼50% of the estimated genetic variance for V̇o2max. In conclusion, combining RNA profiling with single-gene DNA marker association analysis yields a strongly validated molecular predictor with meaningful explanatory power. V̇o2max responses to endurance training can be predicted by measuring a ∼30-gene RNA expression signature in muscle prior to training. The general approach taken could accelerate the discovery of genetic biomarkers, sufficiently discrete for diagnostic purposes, for a range of physiological and pharmacological phenotypes in humans.

  • endurance training
  • genotype
  • personalized medicine

low exercise aerobic capacity is associated with an increased risk of metabolic and cardiovascular disease, as well as premature death. Exercise capacity, in prospective follow-up analyses, is a stronger predictor of morbidity and mortality than other established risk factors, such as hypertension or diabetes (5, 6, 19, 29, 37). A notable observation in the search for relevant mechanisms that connect aerobic capacity with disease is that humans can increase peak oxidative power through regular exercise with a degree of heterogeneity ranging from 0% to >100%, and thus some subjects are unable to improve their aerobic capacity (7, 34, 54). Maximal aerobic capacity is commonly thought to be limited by maximal delivery of oxygen to the periphery and, hence, by cardiac function (44). This being the case, it is somewhat counterintuitive that baseline aerobic capacity neither positively nor negatively associates with the gains in exercise training-induced maximal aerobic power. The estimated heritability for gains in aerobic capacity in response to endurance training is ∼50% (10, 11). Discovery of the genetic contribution to the heterogeneous response to training (20, 50) may provide the greatest opportunity to contribute to a theoretical molecular framework to link determinants of oxidative capacity to disease pathways and to discover the molecular determinants of this critical physiological system.

We presume that at least part of the heterogeneity in adaptation to exercise therapy originates from variation in gene sequence that influences the complex biological networks mediating the response to an aerobic exercise-training stimulus. Identification of genomic markers for complex traits in humans has required enormous sample sizes, and each single-nucleotide polymorphism (SNP) seems to contribute only weakly, at least for chronic complex human diseases (16). For example, after genome-wide association analysis in type 2 diabetes patients, 18 robust SNPs explain <7% of the total disease variance (46). Enrichment of disease-associated gene networks, generated from RNA expression data, with genotyping data has successfully enhanced the explanatory power of the analysis (13). However, the opposite strategy, where an RNA expression-based molecular classifier (28) is used to locate or rank a discrete number of genes for subsequent targeted genotyping and, hence, discovery of key genetic variants, has not been attempted to our knowledge. We believe that generation of an independently validated RNA-based classifier and then identification of genetic variants in those classifier genes in an independent study not only further validates the molecular classifier, but also exemplifies a strategy to enhance the statistical power of genetic biomarker studies. Thus we hypothesized that baseline RNA expression profiling would identify genes with relevant genetic variants that relate to the large intersubject trait variation observed in humans. Here we apply this rationale to study maximal aerobic power and identify and blindly validate a novel gene signature that predicts the magnitude of adaptation to supervised exercise-training programs.


Three independent clinical studies contributed to this project. The first was used to generate an RNA-based molecular classifier, the second to independently validate the RNA classifier, and the third to provide the link between the classifier and genetic variation (Fig. 1). Each clinical study represents a variation on a typical supervised endurance-training program, with subjects of different levels of physical fitness helping establish that our results can be applied broadly to various types of aerobic exercise therapy and subjects.

Fig. 1.

An overview of the basic research strategy, along with the approximate sample sizes, used in this study. SNP, single-nucleotide polymorphism.

Group 1.

Twenty-four young sedentary healthy Caucasian men (see supplemental Table S1 in the online version of this article) volunteered to take part in the study. Body mass did not change during the study period (78.6 ± 2.7 vs. 78.8 ± 2.6 kg). Mean resting blood pressure (systolic/diastolic) and heart rate were 126/72 mmHg and 66 ± 3 beats/min, respectively. Subjects did not undertake any regular sporting activities in the 6 mo prior to the study and abstained from any strenuous exercise during the 3 wk before pretraining muscle biopsies were obtained. Each of the volunteers provided written informed consent. The study was approved by the Ethics Committee at the Karolinska Institutet. Subjects performed a fully supervised 6-wk training program consisting of four 45-min cycling sessions per week at an intensity customized to 70% of pretraining maximal oxygen consumption (V̇o2max; 100% compliance). Physiological measurements and vastus lateralis muscle biopsies were taken prior to the training program and 24 h after the last training session, as previously described (49, 50). V̇o2max was determined during an exhaustive incremental cycling test (Rodby), with continuous analysis of respiratory gases using a SensorMedics ventilator. At V̇o2max, the respiratory exchange ratio exceeded 1.10 on all occasions. The clinical study was approved by the Ethics Committee at the Karolinska Institutet.

Group 2.

Seventeen young active Caucasian subjects (see supplemental Table S2) volunteered to train on a cycle ergometer (model 839E, Monark, Varberg, Sweden) five times a week for 12 wk. The training load was incrementally customized during the study, such that these active/trained subjects trained at a higher intensity and volume than group 1 subjects. As part of the training, subjects performed a peak power (Pmax) test every Monday to determine the intensity of the training for the following days. The Pmax test was performed the same way as the V̇o2max test without measurement of oxygen consumption (V̇o2). On Tuesdays, the training consisted of ten 3-min intervals at 85% Pmax separated by 3-min intervals at 40% Pmax. On the next day, the training consisted of 60 min at 60% Pmax. On Thursdays, subjects performed five 8-min intervals at 75% Pmax separated by 4-min intervals at 40% Pmax. On Fridays, subjects cycled for 120 min at 55% Pmax continuously. During the first 6 wk, the duration of each training session was increased by 5% each week; during the last 6 wk, the duration remained stable, and the relative intensity was increased 1% per week. This protocol represents the upper limit of intervention one could expect to apply to a general population of healthy and fit subjects. Since our aim was to find genes that contribute universally to aerobic adaptation, we believed that genes deemed regulatory during constant-load (group 1) and interval-based (group 2) aerobic training would be the best candidates for genotyping. The compliance to training was ∼100%. The study was approved by the Ethical Committee of Copenhagen and Frederiksberg Communities, Denmark, and performed according to the Declaration of Helsinki.

HERITAGE Family Study aerobic-training program.

The study cohort in this analysis consists of 473 Caucasian subjects (230 male and 243 female) from 99 nuclear families who completed ≥58 of the prescribed 60 exercise-training sessions. The study design and inclusion criteria have been described previously (9) and were established before group 1 or 2 training formats. To be eligible, the individuals were required to be in good health, i.e., free of diabetes, cardiovascular diseases, or other chronic diseases that would prevent their participation in an exercise-training program. Subjects were also required to be sedentary, defined as not having engaged in regular physical activity over the previous 6 mo. Individuals with resting systolic blood pressure >159 mmHg and/or diastolic blood pressure >99 mmHg or taking medication for hypertension, dyslipoproteinemia, or hyperglycemia were excluded. Other exclusion criteria are fully described in a previous publication (9). The prevalence of overweight and obesity was 30.8% and 19.3%, respectively. The study protocol had been approved by each of the Institutional Review Boards of the HERITAGE Family Study research consortium. Written informed consent was obtained from each participant.

The exercise intensity of the 20-wk program was customized for each participant based on the heart rate (HR)-V̇o2 relationship measured at baseline (11). During the first 2 wk, the subjects exercised at a HR corresponding to 55% of the baseline V̇o2max for 30 min per session. Duration and intensity of the sessions were gradually increased to 50 min and 75% of the HR associated with baseline V̇o2max, which were then sustained for the last 6 wk. Frequency of sessions was three times per week, and all exercise was performed on cycle ergometers in the laboratory. HR was monitored during all training sessions by a computerized cycle ergometer system (Universal FitNet System) that adjusted ergometer resistance to maintain the target HR. Trained exercise specialists supervised all exercise sessions. Before and after the 20-wk training program, each subject completed three cycle ergometer (Ergo-Metrics 800S, SensorMedics, Yorba Linda, CA) exercise tests conducted on separate days: a maximal exercise test (Max), a submaximal exercise test (Submax), and a submaximal/maximal exercise test (Submax/Max). The Max test started at 50 W for 3 min, and the power output was increased by 25 W every 2 min thereafter to the point of exhaustion. For older, smaller, or less fit subjects, the test was started at 40 W and increased by 10- to 20-W increments. On the basis of the results of the Max test, the Submax test was performed at 50 W and 60% of the initial V̇o2max. Finally, the Submax/Max test was started with the Submax protocol and progressed to a maximal level of exertion. For all tests, V̇o2, CO2 production, expiratory minute ventilation, and tidal volume were determined every 20 s and reported as a rolling average of the three most recent 20-s values. All the respiratory phenotypes were measured using a SensorMedics 2900 metabolic measurement cart. V̇o2max was defined as the mean of the highest V̇o2 values determined on each of the maximal tests or the higher of the two values if they differed by >5%.

Affymetrix microarray process.

Total RNA was extracted from frozen muscle using TRIzol reagent. Frozen pieces were homogenized for 60 s in 1 ml of TRIzol using a 7-mm Polytron aggregate (model PT-DA 2107, Kinematica) adapted to a Polytron homogenizer (model PT-2100) running at maximum speed. RNA concentration and quality were controlled using a NanoDrop spectrophotometer and a bioanalyzer. In vitro transcription (IVT) was made using the Bioarray high-yield RNA transcript labeling kit (P/N 900182, Affymetrix). Unincorporated nucleotides from the IVT reaction were removed using the RNeasy column (Qiagen). Group 2 IVT was performed using the MessageAmp II biotin-enhanced aRNA kit (Ambion). The effect of the IVT kit was assessed by processing two samples with the Affymetrix kit used for group 1. Hybridization, washing, staining, and scanning of the arrays were performed according to the manufacturer's instructions (Affymetrix, As a means to control the quality of the individual arrays, all were examined using hierarchical clustering and quality control calculations from Bioconductor (e.g., NUSE) to identify outliers prior to statistical analysis, in addition to the standard quality assessments, including scaling factors and housekeeper 5′-to-3′ ratios. The raw data for the predictor data set have been submitted to the Gene Expression Omnibus database (GSE18583). The remaining files will be deposited at GEO following a second article, where we examine the training-responsive muscle mRNA transcriptome (TRT) in detail and contrast the responses across high- and low-responder subjects.

General array analysis methods.

The microarray data (GSE18583) were subjected to global normalization using MAS5.0, and present-absent calls were used to improve the sensitivity of the differential gene expression analysis by improving the power while potentially removing some genuinely expressed genes (14). We chose to retain probe sets with ≥25% of the chips declaring a “present” detection, on the basis that there will be subject-to-subject variability and that some genes may be only expressed before or following training. The normalized log2 file was analyzed with the Significance Analysis of Microarray (SAM) in R (∼tibs/SAM/) (20). SAM provides an estimate of the false discovery rate (FDR), which represents the percentage of genes that could be identified by chance, and is comparable to a P value corrected for the number of initial comparisons, a process called multiple testing correction. For the data presented here, genes were considered significantly changed following training, when a delta value corresponding to the number of false significant genes of 5% (q value) and an average fold change of 1.5 were achieved. We previously demonstrated that it can be difficult to predict the impact of applying arbitrary filtering criteria prior to statistical analysis (30). We therefore also relied on several statistical models to analyze and interpret our data. For example, we used the web-based bioinformatics tool INGENUITY pathway analysis (, which is based on >1.7 million published PubMed articles and the web-based tool DAVID (Database for Annotation, Visualization, and Integrated Discovery) (15, 25).

Using baseline array data to produce a quantitative predictor of response to training.

A quantitative predictor of response to training was developed by correlating measured training-induced change in V̇o2max to RNA expression level of genes in a muscle biopsy obtained prior to training. CEL files were logit-normalized, and an RNA expression index was calculated using model-based analysis of oligonucleotide arrays (32). The probe weights, determined for the training set files, were reused for the test set to increase comparability. To calculate the correlation between V̇o2max response and gene RNA expression for a given gene or probe set, the Pearson correlation for each perfect-match probe in the probe set was calculated, and only the median correlation was retained. If the median correlation of the selected “probe” exceeded 0.3, the entire probe set was retained as originally correlated. We have found that this two-step process helps compensate for the multiple comparisons being made. Correlated probe sets were identified 24 times on the 24-sample training set, each time with 1 sample left out of the calculation. Probe sets were ranked according to the number of times out of 24 (the number of subjects) they were selected as having a median correlation >0.3 (the median value reflecting the probe-level data). The top 29 genes, selected ≥22 of 24 times, gave the best correlation to V̇o2max on the training set (group 1). The sum of the RNA expression indexes was calculated and was evaluated using the independent test gene chip data sets derived from group 2 samples. The logit-normalized model-based RNA expression index (33) values for each of the 29 genes were centered (around 1), and the index was scaled, using all genes, over the 24 subjects in group 1. The RNA expression value of the 29 genes was then determined in group 2, and the sum of the expression of the 29 genes was correlated to the measured change in V̇o2max by an observer blinded to sample identity. Genes were clustered and plotted using the Heatplus package of Bioconductor ( To allow comparison between cohorts with a different baseline V̇o2max, we used the percent change in V̇o2max. Finally, for genes and SNPs identified in the HERITAGE Family Study (see below), we validated the genetic association data using RNA expression-based correlation analysis in the (group 2) blind validation data set.

Genotype validation of the RNA expression-based predictor.

Linkage disequilibrium (LD) cluster-tagging SNPs (tagSNPs) were selected from the Caucasian data set of the International HapMap consortium (data release 23, March 2008). Target areas for the SNP selection were defined as the coding region of the gene plus 20 kb upstream of the 5′ end and 10 kb downstream of the 3′ end of the gene. TagSNPs were selected using the pairwise algorithm of the Tagger program (45). Minor allele frequency was required to be >10%, and the pairwise LD threshold for the LD clusters was set to r2 ≥ 0.80.

Genomic DNA was prepared from permanent lymphoblastoid cells by a commercial DNA extraction kit (Gentra Systems, Minneapolis, MN). The tagSNPs were genotyped using Illumina (San Diego, CA) GoldenGate chemistry and Sentrix Array Matrix technology on the BeadStation 500GX. Genotype calling was done with Illumina BeadStudio software, and each call was confirmed manually. For quality control purposes, each 96-sample array matrix included 1 sample in duplicate, and 47 samples were genotyped in duplicate on different arrays. In addition, six Centre d'Etude du Polymorphisme Humain control DNA samples (NA10851, NA10854, NA10857, NA10859, NA10860, and NA10861; all samples included in the HapMap Caucasian panel) were genotyped. Concordance between the replicates, as well as with the SNP genotypes from the HapMap database, was 100%.

A χ2 test was used to verify whether the observed genotype frequencies were in Hardy-Weinberg equilibrium. Associations between the individual tagSNPs and cardiorespiratory fitness phenotypes were analyzed using a variance components and likelihood ratio test-based procedure in the quantitative trait disequilibrium test (QTDT) software package (1). The total association model of QTDT software utilizes a variance-components framework to combine the phenotypic means model and the estimates of additive genetic, residual genetic, and residual environmental variances from a variance-covariance matrix into a single likelihood model. The evidence of association is evaluated by maximizing the likelihoods under two conditions: the null hypothesis (L0) restricts the additive genetic effect of the marker locus to zero (βa = 0), whereas the alternative hypothesis does not impose restrictions on βa. The quantity of twice the difference of the log likelihoods between the alternative and the null hypotheses {2[ln(L1) − ln(L0)]} is distributed as χ2 with 1 degree of freedom (difference in number of parameters estimated). V̇o2max training responses were adjusted for age, sex, baseline body weight, and baseline value of V̇o2max. Differences in allele and genotype frequencies between top and bottom quartiles of V̇o2max training response distribution (defined using sex- and generation-specific quartile cutoffs) were tested using the case-control procedure (Proc Casecontrol) of the SAS version 9.1 Statistical Software package. Finally, the total contribution of the SNPs on V̇o2max training response was tested using multivariate regression analysis. A backward-elimination process was used to filter out redundant SNPs due to strong pairwise LD. Then the SNPs retained by the backward-elimination model were analyzed using a stepwise regression model.


Generating a robust molecular profile of aerobic-training adaptation.

We established a global muscle RNA expression profile in vastus lateralis in group 1 (young sedentary) subjects before and after 6 wk of intense aerobic exercise training. Preintervention V̇o2max was 3.7 ± 0.6 l/min. After training, V̇o2max increased on average by ∼14% (range −2.8% to 27.5%), while exercise respiratory exchange ratio decreased by ∼10% (at 70% of preexercise V̇o2max), indicating that we had a greater reliance on lipid substrates, as previously reported (54) (Fig. 2, A and B). With use of routinely measured physiological and biochemical biomarkers, it was not possible to predict whether an individual has a high or low potential for improved maximal aerobic capacity (54). Using significance analysis of microarrays (SAM) (52), 1,000 probe sets representing ∼800 genes (FDR <5%, >1.5-fold change) were differentially regulated during exercise training-induced aerobic adaptation [referred to hereafter as the training-responsive transcriptome (TRT); see page 1 of supplemental data]. Gene ontology analysis demonstrated that the TRT generated using the U133 Plus 2.0 platform produced a data set with biological features very similar to those of an earlier study using the U95A-E platform (50), providing confirmation of the reliability of the technology platform and the present data set. Downregulated genes had a higher baseline mean expression than upregulated genes, an interesting observation that appears logical for a dynamically remodeling tissue undergoing a transitional phase. Gene set enrichment analysis, a statistical technique that relies on the principle that genes act in a coordinated (group) manner, and not in isolation, found an upregulation in the oxidative phosphorylation (OXPHOS) gene set (FDR = 1.1%), consistent with the increased lipid oxidation profile. However, few individual OXPHOS genes were found to be changed when examined individually, consistent with the idea that, unlike mice, OXPHOS genes are not highly regulated in vivo in human skeletal muscle (51). The data set demonstrates that skeletal muscle adaptation reflects a distributed and a coordinated enhancement in gene networks. The TRT represents the molecular signature of ongoing and long-term aerobic adaptation, because <5% of these genes are acutely influenced by a single bout of endurance training (27, 36).

Fig. 2.

A and B: aerobic capacity and substrate oxidation characteristics of young sedentary human subjects before and after 6 wk of aerobic exercise training. Maximum oxygen consumption (V̇o2max) increased 14% (P < 0.0001) and submaximal exercise respiratory exchange ratio (RER) decreased, indicating greater reliance on lipid oxidation in vivo (P < 0.0001). Insets: geometric means ± SE pre- and posttraining. C and D: molecular predictor for aerobic adaptation to exercise training in humans involving 29 Affymetrix U133 Plus 2.0 probe sets. Logit-transformed normalized probe set intensities are correlated with ml/min increase in aerobic capacity (V̇o2max). C: green indicates higher gene expression, while red indicates lower gene expression. D: linear relationship between the sum of the expression of the molecular predictor genes and changes in aerobic capacity with endurance training derived from young sedentary subjects following 6 wk of constant-load aerobic exercise training (n = 24, r2 = 0.58, P < 0.00001).

Development and validation of a quantitative predictor of response to training.

With gain in aerobic capacity (ΔV̇o2max) as our response parameter, a quantitative molecular classifier (predictor) was developed by relating ΔV̇o2max to baseline muscle gene RNA expression in group 1 subjects. This analysis was repeated using a leave-one-out strategy and identified 29 significant Affymetrix probe sets (predictor genes; Fig. 2C). The sum of the RNA expression values of these 29 genes (Σ29predict-RNA), each of which ranked highly ≥22 times during the correlation analysis, performed best in predicting ΔV̇o2max in group 1 subjects (Fig. 2D). No further optimization was attempted. The predictor genes formed a high-scoring INGENUITY network (score = 46), characterized as “development-related” (P < 0.001).

To provide validation that this molecular classifier would predict the plasticity of aerobic capacity under diverse circumstances, it was tested in a blinded manner in an independent study (group 2). Affymetrix profiles were generated from pretraining muscle biopsy samples taken from group 2 (preintervention V̇o2max = 4.1 ± 0.5 l/min). These young physically active subjects underwent 10 wk of intense interval-based aerobic-training, increasing their V̇o2max, on average, by the same amount as group 1 (∼14%) and with the same heterogeneity. The Σ29predict-RNA significantly correlated to the change in V̇o2max in the blind validation group (see supplemental Fig. S1a). Thus we demonstrated that the molecular classifier can be applied to human subjects with a wide range in aerobic fitness capacities and confirmed the validity of the gene selection process. The Σ29predict-RNA genes were then used to assist in the identification of the key genetic variants predicting the ability to alter aerobic capacity in response to exercise training.

Validation of molecular classifier using genetic association analysis.

A new analysis of the HERITAGE Family Study (n = 473) was carried out using ∼300 tagSNPs for the 29 predictor probe sets. Sedentary subjects from 99 nuclear families were trained for 20 wk with a fully standardized and monitored exercise program. The mean gain in V̇o2max was similar to that in the studies described above (∼400 ml O2), with a standard deviation of ∼200 ml O2. With use of a model-fitting procedure, the heritability of the changes in V̇o2max was calculated to be 47% (7), and thus our genotyping analysis could, at most, expect to capture ∼47% of the total variance in the gains in maximal aerobic capacity. We were able to identify six genes from the predictor signature set that harbored genetic variants associated with gains in aerobic capacity (P < 0.01 for each). When the upper quartile of the V̇o2max response distribution was compared with the lower quartile, SNPs in SMTNL2, DEPDC6, SLC22A3, METTL3, and BTNL9 differed in genotype or allele frequencies (see supplemental data). In addition, in the comparison of the V̇o2max response by genotype for the entire HERITAGE Family Study population, a variant in ID3 was evidenced (rs11574; P = 0.0058). With use of all relevant genetic variants identified (see supplemental Table S4) from the molecular predictor (n = 25) plus those from ongoing quantitative trait locus (QTL) and candidate gene studies within the HERITAGE project (n = 10), a stepwise regression model was applied using the residual V̇o2max responses, adjusted for major confounding variables. The results were striking: 11 SNPs captured 23% of the total variance in aerobic capacity responses (Table 1). Reciprocal analysis, where HERITAGE-derived genes retained in the model were tested in the RNA expression model, independently validated three of four of the HERITAGE genes. Furthermore, addition of SVIL and NRP2 yielded an improved correlation coefficient (r2 = 0.60) and/or stronger P value (P = 0.009) for the RNA-based blind validation data set (group 2; see supplemental Fig. S2b). Interestingly, MIPEP RNA expression was negatively correlated (r2 = −0.64, P = 0.0051) but did not worsen or improve the performance of the tissue-based classifiers.

View this table:
Table 1.

Stepwise regression model for standardized residuals of maximal oxygen consumption training response in the HERITAGE Family Study


It is reasonable to state that molecular classifiers (predictors) will be essential for implementing personalized medicine, yet there are limited examples of validated predictors that are able to tailor interventions relevant to the most pressing factors impacting on public health. Rather, those that have been discovered typically relate to rarer single-gene polymorphisms and selective drug toxicity. In the present study, we produce and validate a molecular classifier that tracks a much more complex phenotype, i.e., aerobic capacity changes in humans subjected to endurance exercise training. While the RNA predictor represents a significant development in its own right, generating the RNA predictor also led us (by identifying candidate genes for genotyping) to discover ∼50% of the estimated genetic variance for the V̇o2max gains following aerobic training in humans. The outcome of our study suggests that using RNA-based molecular classification to direct targeted genotyping can be a powerful and new strategy for studying the heterogeneous responses of additional physiological phenotypes.

Regulators of cardiovascular adaptation to endurance training.

Maximal aerobic power is arguably one of the most important physiological properties that can be manipulated to prevent cardiovascular disease (55, 57). An association between maximal oxidative metabolism and health may be linked, to some extent, by the notion that evolution of multicellular organisms was afforded by the steep thermodynamic gradient of an oxygen-enriched atmosphere (4, 43). Under this scenario, evolving biochemical strategies, aimed at coping with the obligatory reactive nature of oxygen chemistry (47), would have firmly shaped our genome in a manner that is still obscure. In the present study, an important observation is that the molecular classifier genes have been associated with tissue development, oxidative gene activation, and imprinting (see below), implying that aerobic exercise adaptability may be reliant on or predetermined by molecular mechanisms responsible for regulating organogenesis and angiogenesis, in particular, mechanisms clearly linked to the sensing and delivery of molecular oxygen. Indeed, developmental pathways have been associated with cardiovascular disease (40) and metabolic syndrome (17), providing support for the likelihood that this association could be informative.

Much consideration has been given to the biochemical signals that drive adaptation in response to exercise training, with the aim to better understand the health benefits of physical activity. Mutation and knockout of candidate genes, such as AMP-activated protein kinase (AMPK) and peroxisome proliferator-activated receptor-γ coactivator-1α (PGC-1α), have been postulated as causal regulators of muscle adaptation (2, 21, 26, 31). However, these genes do not appear critical for exercise-induced endurance-training adaptation in terms of muscle biochemistry or performance (2, 21, 26, 31), also suggesting that single-gene knockouts do not necessarily facilitate physiological understanding. In humans, when a global analysis was carried out on the TRT in this study or previous studies (23, 24, 27, 42, 50), there was little evidence generated suggesting that AMPK or PGC-1α links to the gene networks that characterize muscle phenotype changes occurring with endurance training (e.g., gene set enrichment analysis did not identify any PGC-1α-related transcription factors in the TRT). Furthermore, we could not demonstrate a link between metabolic homeostasis, directly measured in vivo in human muscle for the first time at the training intensity, and subsequent changes in aerobic capacity (54).

There have been many acute exercise canonical pathway signaling studies in humans (3, 22, 41, 56) also aimed at discovering the molecules responsible for “translating” acute exercise effects into long-term exercise adaptation. However, a pathway analysis of the acute response to endurance exercise (36) appears to suggest that the immediate consequences reflect molecular responses to short-term energy and ionic homeostasis challenges. Ontological analysis of the TRT (see supplemental data) supports the idea that extracellular matrix and major histocompatability complex modulation and calcium signaling may be the potential transducers of cardiovascular adaptation in humans, connecting mechanical work to muscle tissue adaptation during endurance training. These observations suggest that acutely regulated molecules, such as AMPK and IL-6, are not related to the adaptive process. Rather, modulation of these molecules may be more related to recovery of short-term energy homeostasis within the muscle or reflect a deregulation of muscle molecular phenotype when the cell is exposed to an acute energy “crisis.” A further point for consideration is that, given the very large range of training-induced adaptation noted in humans (34), one would need to establish, in each acute exercise study, whether the responses measured following acute exercise actually lead to, or associate with, a measure of chronic adaptation of the physiological system being studied. Indeed, it is very clear that many of the ∼800 TRT genes are equally regulated in high and low responders, an analysis that highlights the idea that only a subset of genes may “set” or limit the potential for physiological adaptation. However, rarely, for any physiological system, will this “minimum” set be fewer than two genes, and thus many “key” genes will be required to coordinate adaptation.

Do the predictors of cardiovascular adaptation to endurance training help identify causal regulators?

The present study highlights an alternative approach to transgenic models for producing candidate mechanisms for subsequent hypothesis testing, an approach that is reliant on in vivo data generated under conditions relevant for human disease and physiology. For example, the biological profile of the TRT is robust (reproduced across different chip generations and biological samples) and provides new candidates that regulate training adaptation. However, it is challenging to establish a direct cause-effect relationship between these ∼800 RNA changes and physiological adaptation. In contrast, the evidence for a causal role for the 29 predictor genes and, in particular, the 11 SNP genes that captured ∼50% of the estimated variance attributable to genetic differences can be considered much stronger and, thus, worthy of some detailed discussion.

First, the genotype-RNA associations appear to be driven by factors (including genetic variation) independent of chronic exercise training, as the RNA expression levels of the predictor genes (which capture ∼50% of the total variance for changes in maximal aerobic capacity) were stable to exercise training. Furthermore, the RNA expression was remarkably stable to exercise in low and high responders (Fig. 3). Thus the largely higher “preset” levels of the 29 predictor genes appear to offer high responders a greater potential to activate some components of the TRT. While the vast majority of the genes within the predictor gene set are not well studied, especially in the context of cardiovascular biology, some do provide obvious links to cardiovascular adaptation and proposed signaling molecules involved in cardiovascular adaptation and cardiovascular disease (Fig. 4A). Network analysis provides additional linking molecules, some of which are regulated by exercise (Fig. 4B). For example, ID3 is a TGFβ1- and superoxide-regulated gene that interacts (39) with another member of the baseline predictor (KLF4) and appears essential for angiogenesis (35). Transforming growth factor-β (TGFβ) signaling is important for any tissue remodeling, and, clearly, genes that alter the potential for a safe and controlled proangiogenesis are highly relevant for expansion of maximal oxygen transport capacity.

Fig. 3.

Skeletal muscle expression of all the predictor genes (based on RNA), which together explain ∼58% of the variance in exercise training-induced changes in V̇o2max in young sedentary human subjects (n = 24) before and after 6 wk of aerobic exercise training. Top: high responders following 6 wk of aerobic training; bottom: low responders following 6 wk of aerobic training. Expression values are generated using Affymetrix U133 Plus 2.0 gene chips (>47,000 transcripts), normalized using MAS5.0, and corrected for multiple comparisons using the significance of microarrays analysis methodology. This analysis strategy avoids issues with unstable housekeeping genes. Only H19, KLF4, OCT3, SMTNL2, and BTNL9 demonstrated a modest change in expression below a false discovery rate of 5% (∗), and not consistently in high and low responders. All other genes (>90% of predictor genes) were unchanged.

Fig. 4.

Genes identified as being able to predict which subjects demonstrate high response to exercise training formed a gene network found to be related to development. A: predictor genes (gray symbols with blue labeling) are not regulated by exercise (Table 1). Additional genes within the network represent genes known to interact with the predictor genes. B: genes shown in red are those that are increased by exercise training in group 1 (while those shown in blue are decreased). This demonstrates that most of the predictor genes, and many of those with which they interact in this analysis, are not responsive to regular exercise. Several members of the network are, however, known to be regulated by exercise at the protein level.

In addition, the imprinted transcript SLC22A3 (OCT3), which harbored genetic variation associated with training response (P = 0.0047), is part of the Air noncoding RNA imprinted locus mechanism, which interacts (38) with another of our predictor genes, H19. This suggests that our predictor genes may participate in the regulation of imprinting and that the mechanisms that link aerobic capacity and cardiovascular-metabolic disease may share common features with developmental processes (18, 53). Interestingly, OCT3 and H19 were the predictor genes most differentially regulated between high and low responders (Fig. 3). OCT3 RNA appears downregulated in only high responders, while H19 RNA is unchanged. In contrast, OCT3 RNA is unchanged in the low responders, while H19 RNA is significantly upregulated (FDR <5%). Whether this relates to the longer-term mechanisms operational during imprinting and, hence, regulation of insulin-like growth factor II (IGF2) signaling is unclear but merits detailed analysis (Fig. 4B). However, we would still caution against the overinterpretation that this discrete list of genes alone determines aerobic adaptation but, rather, prefer to assume that they appear to be important and may directly or indirectly influence many of the genes we show to be regulated by exercise (e.g., TRT). Briefly, while we can generate a molecular classifier for the purpose of diagnosis or for randomization in future hypothesis-testing studies, it is unclear whether any gene can be ruled out from playing at least a minor role in the complex in vivo process of aerobic adaptation, even the much emphasized AMPK, PGC-1α, and IL-6 systems.

Conclusions and Future Perspectives

Despite the use of baseline physiological data to personalize the clinical intervention in all three studies, ∼20% of subjects demonstrated <5% improvement in maximal aerobic capacity. It should be noted that our molecular screen for exercise responsiveness worked across diverse situations (short- and long-term moderate-intensity aerobic-training and interval-based maximal exercise-training regimens), such that failure to adapt to exercise is a generalized observation and is not model-specific. The fact that we can forecast gains in aerobic capacity using a peripheral gene signature (validated using 2 independent clinical cohorts) demonstrates that RNA profiling captures critical information and is able to inform about an integrated physiological processes. If we consider the strong relationships between aerobic capacity and morbidity and mortality, the approach defined in this study has the potential to provide insight into the role of regular exercise and its molecular determinants in metabolic control of glucose and lipid metabolism, blood pressure regulation, and inflammation in humans.

Given that ∼20% of subjects fail to improve aerobic capacity with intense endurance training, while ∼30% do not enhance their insulin sensitivity (12), it is clear that we must develop genotype-tailored lifestyle interventions. The present data enable us to test the efficacy of a molecular screen for exercise responses (8), so that, for example, low responders for V̇o2max can be randomized to alternative interventions to establish if they are low responders for additional parameters, such as changes in postexercise insulin sensitivity. Such an application of our data is consistent with the idea of using personalized medicine (48) to tailor lifestyle modification, and, hence, we believe that we provide a significant new milestone for the field. For those individuals with an impaired ability to improve or even maintain their aerobic capacity with exercise therapy, we may need to promote alternative exercise intervention paradigms or offer more intensive pharmacological and dietary protocols to help compensate for their genomic profile. Identification of individuals with a particular risk profile is the first step toward the implementation of personalized medicine.


J. A. Timmons is partly supported by a Wellcome Value in People Award. This study was supported by an Affymetrix Translational Medicine Grant (J. A. Timmons), the Swedish Diabetes Society (J. A. Timmons), Chief Scientists Office, Scotland (J. A. Timmons), Lundbeck Foundation (P. Keller), Pfizer Global Research and Development, United Kingdom (J. A. Timmons and C. J. Sundberg), and Swedish National Centre for Research in Sports (J. A. Timmons, C. J. Sundberg, and E. Jansson). The Centre of Inflammation and Metabolism is supported by Danish National Research Foundation Grant DG02-512-555. The HERITAGE Family Study has been funded by National Heart, Lung, and Blood Institute Grants HL-45670, HL-47323, HL-47317, HL-47327, and HL-47321 (to C. Bouchard, T. Rankinen, D. C. Rao, Arthur Leon, James Skinner, and Jack Wilmore). L. G. Koch and S. L. Britton were supported by National Center for Research Resources Grant RR-17718. C. Bouchard is partially funded by the George A. Bray Chair in Nutrition.


A patent has been filed for the low-responder predictor.


We thank Prof. Michael Rennie for helpful comments on the draft manuscript.


  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.
  27. 27.
  28. 28.
  29. 29.
  30. 30.
  31. 31.
  32. 32.
  33. 33.
  34. 34.
  35. 35.
  36. 36.
  37. 37.
  38. 38.
  39. 39.
  40. 40.
  41. 41.
  42. 42.
  43. 43.
  44. 44.
  45. 45.
  46. 46.
  47. 47.
  48. 48.
  49. 49.
  50. 50.
  51. 51.
  52. 52.
  53. 53.
  54. 54.
  55. 55.
  56. 56.
  57. 57.
View Abstract