Journal of Applied Physiology

Necessary advances in exercise genomics and likely pitfalls

Yannis Pitsiladis, Guan Wang

A number of methodological approaches within the field of genetic epidemiology have been utilized to unravel the genetic basis of physical performance. Popular gene discovery methods for polygenic traits are summarized in Fig. 1. The basic family/twin study approach has provided useful and reliable genetic data. A recent twin study, which comprised 37,051 twin pairs from seven European countries, suggested that additive genetic variants contribute significantly to exercise participation among the twin pairs (10). No similar studies of such magnitude have been conducted in the area of human performance. Because of the development of more advanced gene discovery techniques, genetic studies are no longer restricted to family/twin studies but have expanded to include the assessment of genetic variants within populations. Population-based studies are being used extensively, particularly those comparing two groups: cases and controls. Population case-control studies can be further differentiated into hypothesis-free and the more commonly used hypothesis-driven approaches. The most extensively used approach, the candidate gene association study, requires a prior hypothesis that the genetic polymorphisms of interest are causal variants or in strong linkage disequilibrium with a causal variant. This population-based genetic approach aims to define alleles or markers that segregate with a particular phenotype or disease at a significantly higher rate than predicted by chance alone, by genotyping the variants in both affected and unaffected individuals. This approach is effective in detecting genetic variants with small or modest influence on common disease or complex traits. Functional single-nucleotide polymorphisms (SNPs), together with tag SNPs that cover the entire candidate gene, have been used in many candidate gene association studies. Further advances in molecular technologies have enabled researchers to apply genomewide approaches to the field.
The genomewide association study (GWAS) is a hypothesis-free approach used to detect the common variants underlying complex diseases and traits, to help predict disease risk and develop targeted therapies. GWAS has been successful in identifying novel genetic variants for Type 2 diabetes mellitus (1) and in implicating the interleukin 23 pathway in Crohn's disease (8). The GWAS approach is not without important limitations. For example, human height is a highly heritable quantitative trait that is also stable and easy to measure. In theory, GWAS should be well suited to finding height-related genes. However, despite significant investment in large sample numbers, GWAS results have been disappointing: only 10% of the phenotypic variation in height could be explained by the 180 loci associated with adult height in the largest (n = 183,272) study published to date (6), and typically 5% or less of the phenotypic variation in other, smaller studies [e.g., n ∼ 30,000 with 47 identified loci (7)]. A large sample size is indeed needed in the example of human height, and the occurrence of rare variants that are not well captured by GWAS may partly explain this limited success in determining the genomics of adult height. The application of the genomewide linkage analysis approach has been successful in identifying disease genes related to monogenic disorders (4) but only partially successful in detecting complex genetic traits related to multiple genes. The lack of greater success is probably due to the low heritability of the examined complex traits (common variants with modest effect).
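The per-variant comparison of cases and controls that underlies both candidate gene and genomewide association studies can be illustrated with a minimal sketch. It assumes a simple allelic chi-square test on a 2 × 2 table of allele counts; the function name and the counts in the usage note are hypothetical and are not taken from any study cited here.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    case_a / case_b: counts of allele A / allele B in cases
    ctrl_a / ctrl_b: counts of allele A / allele B in controls
    Returns (chi-square statistic, p-value).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b

    cells = [
        (case_a, row_case, col_a),
        (case_b, row_case, col_b),
        (ctrl_a, row_ctrl, col_a),
        (ctrl_b, row_ctrl, col_b),
    ]
    chi2 = 0.0
    for observed, row_total, col_total in cells:
        # Expected count if allele frequency were independent of case status
        expected = row_total * col_total / n
        chi2 += (observed - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, `allelic_chi_square(60, 40, 40, 60)` compares a hypothetical allele carried on 60 of 100 case chromosomes with 40 of 100 control chromosomes. In a genomewide setting the same test is repeated for every SNP, so the significance threshold has to be made far more stringent than for a single candidate variant.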

Fig. 1. Popular gene discovery methods for polygenic traits.

Despite numerous reports of genetic associations with health-related fitness phenotypes, there has been limited progress in discovering and characterizing the genetic contribution to these phenotypes, owing to the scarcity of coordinated research efforts backed by major funding initiatives and the predominant use of the candidate gene approach. It is timely that exercise genetic research has moved into the genomics era with a paper by Bouchard and colleagues (2) in this issue of the Journal of Applied Physiology. In this first study to apply the GWAS approach to exercise genetics, these authors report the results of an investigation aimed at identifying the genetic variants associated with gains in maximal oxygen uptake (V̇o2max) using the subsample of whites from HERITAGE (3). A total of 324,611 SNPs were genotyped, and the most significant SNPs were tested for replication in the subsample of blacks from HERITAGE, the women of DREW (9), and the men and women of STRRIDE (5), who were all exposed to different but standardized and supervised exercise training programs. Based on single-SNP analysis, 39 SNPs were significantly associated with the gains in V̇o2max. Stepwise multiple regression analysis of the 39 SNPs identified a panel of 21 SNPs that accounted for 49% of the variance in V̇o2max trainability. Intriguingly, subjects who carried 9 or fewer favorable alleles at these 21 SNPs improved their V̇o2max by 221 ml/min, while those who carried 19 or more of these alleles gained on average 604 ml/min. Notably, the strongest association was with rs6552828, located in the ACSL1 gene, which accounted for ∼6% of the training response of V̇o2max. Generally, this is a well-designed study in terms of cohort selection, replication, use of a predictor SNP score, and reliable determination of V̇o2max. However, like all studies, especially those pioneering in any field, this study has an important limitation: the small sample size (n = 483).
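The predictor SNP score described above can be pictured as a count of favorable alleles across the 21-SNP panel. The sketch below is illustrative only: it assumes an unweighted sum in which each SNP contributes 0, 1, or 2 favorable alleles, and it reuses the cut-offs of 9 and 19 quoted above. The function names and the genotype vector in the usage note are hypothetical, and the scoring used by Bouchard and colleagues (2) may differ in detail.

```python
def favorable_allele_score(genotypes):
    """Sum favorable-allele counts (0, 1 or 2 per SNP) across a SNP panel."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("each genotype must be 0, 1 or 2 favorable alleles")
    return sum(genotypes)


def score_group(score, low_cutoff=9, high_cutoff=19):
    """Classify a score using the cut-offs quoted in the text (<=9, >=19)."""
    if score <= low_cutoff:
        return "low"
    if score >= high_cutoff:
        return "high"
    return "intermediate"
```

A hypothetical subject carrying one favorable allele at each of the 21 SNPs would have `favorable_allele_score([1] * 21)` equal to 21 and would fall in the "high" group under these assumed cut-offs.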
The three replication cohorts (HERITAGE blacks, DREW, and STRRIDE), given their even smaller sample sizes (n = 247, n = 112, and n = 183, respectively), are not ideal and hence yield limited replication (i.e., 5 SNPs in total). Most of the knowledge in exercise genetics has been generated primarily using classical genetic methods, such as the analysis of individual SNPs, applied to cohorts with small sample sizes. The data generated from most published studies in exercise genetics therefore need to be examined in light of the view held by some “hard core” geneticists that a study of any complex phenotype in humans is futile unless a cohort size of between 20,000 and 100,000 is used. Bouchard and colleagues (2) attempt to overcome this major limitation of observational and cross-sectional designs by conducting a carefully controlled intervention study that significantly reduces the number of confounders and therefore the sample size required to detect SNPs with a significant effect size. With the present sample size including replication, Bouchard and colleagues (2) appear to detect SNPs with an effect size of ≥2% with some confidence and ≥4% with greater confidence. Contrast this with the perceived requirement for ∼4,000 subjects to detect a SNP with an effect size of ∼1% in a classic GWAS with a continuous trait. As a result, the newly identified genomic predictors of the response of V̇o2max to regular exercise provide new targets for the study of the biology of aerobic fitness and adaptation to regular exercise. More importantly, however, this study demonstrates the unique capabilities of applying whole genome technologies to the classic intervention study for gene discovery, in particular, the ability of this approach to circumvent the need for very large (and typically impractical) cohort sizes and the currently prohibitive cost of subjecting large cohorts to whole genome analysis.
This study has also reaffirmed the importance of well-phenotyped cohorts in reducing the required sample size.
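The figure of ∼4,000 subjects for a SNP with an effect size of ∼1% can be roughly reproduced with a standard normal-approximation power calculation. The sketch below is not the calculation used by any of the cited studies; it assumes that "effect size" means the fraction of trait variance explained by the SNP, a two-sided genomewide significance threshold of 5 × 10⁻⁸, and 80% power. All three are assumptions introduced here for illustration, as is the function name.

```python
import math
from statistics import NormalDist


def required_sample_size(variance_explained, alpha=5e-8, power=0.80):
    """Approximate n needed to detect a SNP explaining a given variance fraction.

    Normal approximation for a single-SNP test on a continuous trait:
        n ~= (z_{1 - alpha/2} + z_{power})^2 * (1 - q) / q
    where q is the fraction of phenotypic variance explained.
    alpha and power defaults are illustrative assumptions.
    """
    q = variance_explained
    if not 0.0 < q < 1.0:
        raise ValueError("variance_explained must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided threshold
    z_power = std_normal.inv_cdf(power)
    return math.ceil((z_alpha + z_power) ** 2 * (1.0 - q) / q)
```

Under these assumptions `required_sample_size(0.01)` comes out at a little under 4,000, and the required sample falls roughly in inverse proportion to the variance explained. The calculation describes an observational GWAS; it does not capture the reduction in unexplained variance that a tightly controlled intervention design is argued to provide.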

It is accepted that there will be many interacting genes involved in exercise-related traits, including sporting performance, and hence it is timely that genetic research has moved to the genomics era, i.e., the simultaneous testing of multiple genes. The approaches and technologies used by Bouchard and colleagues (2) will no doubt be increasingly applied to searching the whole human genome, instead of studying single genes or indeed SNPs, as whole genome methods become more affordable. In particular, large-scale sequencing will become cheaper over the coming years. Seemingly reputable claims have been made that it is only a matter of time before the entire human genome can be sequenced for $1,000. At present, a single run on the newest Illumina sequencing machine, the HiSeq 2000, costs less than $10,000 (2 human genomes at 30× coverage), down dramatically from $60,000 in 2008. Whatever the success or failure of the GWAS approach, it is certainly providing insight into the genetic architecture and molecular basis of human diseases and complex traits. In the near future, large cohorts will be routinely studied by GWAS and will provide good resources for all scientific fields, including exercise genomics. This development will require a move away from the traditional method of researching in exercise science/medicine (i.e., predominantly single laboratory studies) to large, well-funded laboratory collaborations and therefore substantial statistical/technological power and know-how. Only with such resources can the most strongly acting genes be identified with confidence, gene × environment interactions be studied accurately, and clinically meaningful gene × gene interactions be revealed.


No conflicts of interest, financial or otherwise, are declared by the author(s).

