Previous genetic association studies of physical activity, in both animal and human models, have been limited in number of subjects and genetically homozygous strains used as well as number of genomic markers available for analysis. Expansion of the available mouse physical activity strain screens and the recently published dense single-nucleotide polymorphism (SNP) map of the mouse genome (≈8.3 million SNPs) and associated statistical methods allowed us to construct a more generalizable map of the quantitative trait loci (QTL) associated with physical activity. Specifically, we measured wheel running activity in male and female mice (average age 9 wk) in 41 inbred strains and used activity data from 38 of these strains in a haplotype association mapping analysis to determine QTL associated with activity. As seen previously, there was a large range of activity patterns among the strains, with the highest and lowest strains differing significantly in daily distance run (27.4-fold), duration of activity (23.6-fold), and speed (2.9-fold). On a daily basis, female mice ran further (24%), longer (13%), and faster (11%). Twelve QTL were identified, with three (on Chr. 12, 18, and 19) in both male and female mice, five specific to males, and four specific to females. Eight of the 12 QTL, including the 3 general QTL found for both sexes, fell into intergenic areas. The results of this study further support the findings of a moderate to high heritability of physical activity and add general genomic areas applicable to a large number of mouse strains that can be further mined for candidate genes associated with regulation of physical activity. Additionally, results suggest that potential genetic mechanisms arising from traditional noncoding regions of the genome may be involved in regulation of physical activity.
- single-nucleotide polymorphism
- murine strains
- running wheel
It is well accepted that moderate physical activity leads to many health benefits, not the least of which are decreases in risk for a large number of chronic diseases (e.g., cardiovascular disease, diabetes, stroke, certain forms of cancer). However, when activity is directly measured by accelerometer, only a low percentage of adults participate in moderate activity levels on a daily basis (46). This paradox has led to a suggestion of a significant genetic basis for activity, and in fact a multitude of recent studies have lent support to this hypothesis (e.g., Refs. 13, 23, 33, 34, 36, 44, 45).
Both human (e.g., Refs. 23, 36, 44, 45) and animal (e.g., Refs. 13, 24, 33, 34) models have been employed in an attempt to understand the genetic basis of physical activity regulation. The heterogeneity of the human genome, the well-known limits on human experimentation, and the difficulty in directly measuring activity in the requisite thousands of subjects have made linkage and other genetic studies with humans difficult. On the other hand, animal models specifically using inbred and/or selected bred mouse lines have facilitated linkage studies that have been relatively free of environmental effects on activity. Currently, it is estimated that ∼75–79% of the genes in the human and mouse genome are simple orthologs that represent conserved mammalian functional genes (37). Thus this large core of conserved genes directly facilitates mouse to human translational efforts. Furthermore, the use of wheel running behavior in rodents as a surrogate for voluntary physical activity behavior in humans is justified by multiple correlates between humans and rodents in responses to voluntary exercise and wheel running, respectively, including similar changes in 1) cardiovascular functioning parameters (e.g., Ref. 8), 2) muscle and mitochondrial enzyme activities (e.g., Refs. 8, 28), and 3) brain neurotransmitters [e.g., brain-derived neurotrophic factor (BDNF); Refs. 11, 22]. Furthermore, both humans and mice, when given access to means of voluntary activity, self-select similar levels of exercise intensity during autonomous exercise periods (10, 38).
While there have been several studies that have associated genetic influence with physical activity, the animal studies often have been conducted with only one sex or a limited number of strains, thus reducing the genomic coverage and generality of the results (e.g., Refs. 13, 34). For example, Festing (13) collected wheel running data from mice in 26 different strains, but the mice varied in their ages at testing and there was no indication of the sexes involved. While we used both male and female mice in an initial study from our lab (33), our initial strain screen only tested a limited number of strains. Several subsequent studies have identified genomic quantitative trait loci (QTL) associated with wheel running behavior (29, 34, 35) but have necessarily been conducted with strains identified in previous limited screens and/or selective breeding protocols, and thus it is possible that not all of the genomic loci associated with wheel running in mice have been captured.
Since the time of our original strain screen study (33), the use of computational methods to identify QTL from strain distribution patterns (in silico analysis; see Ref. 17) has been refined to deal with many of the early criticisms of this method (18, 41, 43, 49). The use of these computational methods, now generally referred to as haplotype association mapping, has also been spurred by the exponential increase in the availability of denser single-nucleotide polymorphism (SNP) maps. In 2001, Grupe et al. (17) only had access to a genomic-spanning database of 500 SNPs. This poor coverage led to concerns about the power of the analysis and the potential effect of minor alleles upon the results (7). With the recent availability of a very dense SNP map containing 8.27 million SNPs per strain in 55 inbred strains (14) and more powerful computational approaches (43), it is now feasible to apply these methods to larger strain screens to identify genomic QTL associated with wheel running traits as has been done with other complex traits (e.g., Ref. 19). Therefore, the purpose of this project was to apply haplotype association mapping strategies to several wheel running activity traits measured in male and female mice from a large number of inbred strains of mice to expand our knowledge of the genomic locations of QTL associated with physical activity.
Forty-one inbred mouse strains were purchased from Jackson Laboratories to expand our original strain screen database (33), and the mice received were ∼6 wk of age, although older in some strains because of the lack of general availability (Table 1). The expanded cohort was composed of 448 mice, with 212 female mice and 236 male mice. Upon receipt, the mice were group housed and quarantined until 8 wk of age, at which time they were singly housed and given access to a running wheel. When the mice reached 9 wk of age (63 days), their wheel running data (see below) were collected for either 7 or 21 consecutive days. Four strains (CE/J, LP/J, PL/J, and SM/J) were monitored over a 7-day period, four strains (C57BL/6J, CAST/EiJ, BALB/cJ, and NZB/BlNJ) had mice that underwent either 7- or 21-day exposures, while the remaining 30 strains all were monitored for 21 days. Comparison of the four strains that had both 7- and 21-day activity data showed no difference in the average daily activity measures (data not shown), and thus all data were pooled for subsequent analysis. A similar trend toward comparable activity values within strain over the same time period (weeks 10–12 in their figures) was also noted by Turner et al. (47).
All mice were housed in the same room in the University Vivarium, which was maintained at 18–21°C and 20–40% humidity with 12:12-h light-dark cycles that initiated at 6:30 AM. Food (Harlan Teklad 8604 Rodent Diet, Madison, WI) and water were provided ad libitum. Body masses (to the nearest 0.1 g) were collected once per week throughout the study. All procedures used in this study were approved by the University of North Carolina Charlotte Institutional Animal Care and Use Committee.
Running wheel measurements.
At 8 wk of age, a solid-surface running wheel (450 mm in circumference and 35 mm wide) interfaced with a computer (Sigma Sport BC500 and BC600, Olney, IL) was placed in each cage with each individually housed mouse. Starting at ∼63 days of age (9 wk), distance run (km) and duration (min) were noted every 24 h and average speed of activity (m/min) was subsequently calculated. The onset of data collection varied depending on the availability of the mice. Phenotypic data were reported as average distance/day (km), duration/day (min), and speed (m/min) across the monitoring period to wash out any acute fluctuations in activity due to sex hormones or temperature variations in the Vivarium. Using similar methods, we have previously shown (26) wheel running behavior to be highly repeatable in a large cohort of mice. The running wheels required an average mass of 4.55 g to achieve a quarter of a turn, this relationship being equivalent to that found for larger wheels (5). The use of a solid-surface wheel eliminated the possibility of the mice “coasting” on the wheel, as sometimes happens with mesh wheels and which can result in an overestimation of daily activity levels.
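The derivation of the three reported activity indexes from the daily wheel readings can be sketched in Python as follows. This is only an illustration: the function names and the example values are hypothetical, and it assumes that speed is distance divided by duration for each day and that the reported speed is the mean of the daily speeds, which the text does not state explicitly.

```python
from statistics import mean

def average_speed_m_per_min(distance_km, duration_min):
    """Average speed (m/min) from distance (km) and duration (min)."""
    if duration_min <= 0:
        # Assumption: a day with no recorded running time gets speed 0.
        return 0.0
    return distance_km * 1000.0 / duration_min

def monitoring_period_summary(daily_records):
    """daily_records: list of (distance_km, duration_min), one per 24 h."""
    distances = [d for d, _ in daily_records]
    durations = [t for _, t in daily_records]
    speeds = [average_speed_m_per_min(d, t) for d, t in daily_records]
    return {
        "distance_km_per_day": mean(distances),
        "duration_min_per_day": mean(durations),
        # Assumption: mean of daily speeds, not total distance / total time.
        "speed_m_per_min": mean(speeds),
    }

# Hypothetical three-day record for one mouse.
example = monitoring_period_summary([(6.0, 300.0), (8.0, 400.0), (4.0, 200.0)])
```

Averaging over the full 7- or 21-day period, rather than reporting single days, is what smooths out the acute day-to-day fluctuations mentioned above.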
Genomic data for each strain were derived from the very dense SNP database developed by Perlegen Sciences (14). This SNP database initially contained 8,234,636 SNPs sequenced across the mouse genome for each of 15 classical inbred strains referenced against the C57BL/6J strain with an average 329.6 base pair (bp) distance between SNPs. This database was subsequently expanded with an additional 40 inbred strains and the use of a hidden Markov model to impute genotypic values. From this dense SNP database, we synthesized genotype analysis files, using custom-written JAVA code that matched SNP reference numbers across all strains used in this study, and removed SNPs from the database that were not measured/imputed in the original Perlegen database. Because the Perlegen database only contained genotypic data for 38 of our 41 strains, phenotypic data collected for three strains (BALB/cJ, C3Heb/FeJ, and C57BL/10J) were not included in the haplotype association mapping analysis.
Descriptive data (e.g., body mass, age) are represented as means ± SD and were compared between strains and sexes with a two-way ANOVA. We partitioned the total variance into within- and between-strain components, and this allowed the calculation both of broad-sense heritability estimates (h2) and coefficients of genetic determination (g2) (12). Associations between the activity indexes (distance, duration, and speed) and between the wheel running measures and beam-break activity measurements reported in 16 inbred strains (129S1/SvImJ, A/J, AKR/J, BALB/cJ, BTBR_T+_tf/J, C3H/HeJ, C57BL/6J, CAST/EiJ, CBA/J, FVB/NJ, LP/J, NOD/LtJ, NZB/BlNJ, PL/J, SJL/J, SWR/J; Ref. 40) were estimated with Pearsonian correlations. All statistical tests were considered significant when P < 0.05.
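The variance partitioning can be illustrated with a short Python sketch. It assumes the forms commonly used for inbred strain screens, namely an intraclass correlation, (MSB − MSW)/(MSB + (n − 1)MSW), for the broad-sense estimate and (MSB − MSW)/(MSB + (2n − 1)MSW) for the coefficient of genetic determination, where MSB and MSW are the between- and within-strain mean squares and n is the (adjusted) number of animals per strain. The text cites Ref. 12 for the method but does not reproduce the equations, so these forms, the function name, and the toy data are assumptions.

```python
from statistics import mean

def strain_variance_ratios(groups):
    """groups: dict mapping strain name -> list of phenotype values.

    Returns (h2, g2) from a one-way ANOVA partition of the variance.
    """
    k = len(groups)
    sizes = [len(v) for v in groups.values()]
    total_n = sum(sizes)
    grand = mean(x for v in groups.values() for x in v)
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    msb = ss_between / (k - 1)
    msw = ss_within / (total_n - k)
    # Adjusted group size for unequal numbers of animals per strain.
    n0 = (total_n - sum(n * n for n in sizes) / total_n) / (k - 1)
    h2 = (msb - msw) / (msb + (n0 - 1) * msw)      # assumed intraclass form
    g2 = (msb - msw) / (msb + (2 * n0 - 1) * msw)  # assumed g2 form
    return h2, g2

# Toy data for two hypothetical strains.
h2_toy, g2_toy = strain_variance_ratios({"A": [1.0, 2.0, 3.0], "B": [7.0, 8.0, 9.0]})
```

Under these assumed forms the two statistics are linked by g2 = h2/(2 − h2), so g2 is always the smaller of the two because it counts the within-strain variance twice in the denominator.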
QTL determinations for all activity indexes were made with haplotype association mapping methods and conducted with the freely available Bayesian Imputation-Based Association Mapping (BIMBAM) software (18, 41). Reflecting the comparison across the inbred strains (i.e., no heterozygous loci) and standardized approaches in the literature (e.g., Refs. 3, 19), we used each strain's phenotypic mean on each activity index in the haplotype association model, using an additive model with no dominance estimates in the analysis. To determine the effect of the wild-derived strains (CAST/EiJ, SPRET/EiJ, WSB/EiJ, MOLF/EiJ, and PWD/PhJ) on the QTL determinations, we repeated the haplotype mapping on a set of genotyping data that did not include the wild strains' genotypes.
There are theoretical and practical advantages of using Bayes factors rather than traditional regression-based methods for QTL determination in strain screens and genomewide association studies (GWAS) (43, 48, 49). One advantage of this approach is that it provides a heightened discrimination of true QTL because of an elimination of the reliance on P values (43, 49). It is known that P value-based determination of QTL using large genome databases suffers from a statistical tendency to markedly increase the number of false-positive P values because of the effect of large sample sizes and the effect of minor allele frequencies (MAF). To offset this tendency, a conservative Bonferroni adjustment usually is applied in which the α-level for significance is reduced to ensure that the experimentwise type I error rate does not exceed 5%. The basic effect, however, is that the threshold for significance decreases as the sample size increases. Conversely, Bayes factors used to determine significance thresholds do not vary with sample size, but rather with the predicted prior probability of gene association with each SNP. Previous use of Bayes factors has been limited by the requirement of an accurate prior probability of gene association, but this has been eliminated by recent advances using asymptotic Bayes factors that do not require the prior distributions to be specified (49).
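How quickly a Bonferroni-adjusted threshold shrinks can be shown with a minimal Python sketch. It assumes the standard adjustment, in which the experimentwise α is divided by the number of tests (here taken to be the number of SNPs tested), and uses the two SNP-map sizes mentioned in this article purely as illustrative inputs; the function name is hypothetical.

```python
def bonferroni_alpha(experimentwise_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni adjustment."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return experimentwise_alpha / n_tests

# Illustrative inputs: the 500-SNP map of Grupe et al. and an 8.27 million SNP map.
alpha_sparse_map = bonferroni_alpha(0.05, 500)
alpha_dense_map = bonferroni_alpha(0.05, 8_270_000)
```

With the sparse map the per-test threshold is 0.05/500 = 0.0001, whereas with the dense map it falls below 0.00000001, which illustrates why a threshold that does not depend on the size of the data set is attractive.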
A second and perhaps more relevant advantage of the use of the Bayes factors approach is that it reduces the possible effect of alleles that exist in a few individuals or strains (i.e., MAF) on the ultimate determination of QTL. As both Wakefield (49) and Guan and Stephens (18) point out, using P values to rank SNP associations with phenotypes implicitly assumes that “effect sizes tend to be larger for SNPs with a small MAF” or rather that rare alleles will have larger biological effects. While currently there are no data available in this area that support the general notion of rare alleles having larger effects, the use of P values tends to bias QTL discovery to those alleles existing in lower quantity. Thus the use of traditional, regression-based methods to determine an association of SNPs with a phenotype when the data contain wild-derived strains can bias QTL discovery to the MAF found in the wild-derived strains (51). The use of Bayes factors can control the effect of MAF on QTL discovery. While this theory is seemingly solid (18), it is dependent on the MAF threshold used in the analysis, which is normally set at 0.01. In this study, we show that the use of Bayes factors actually only partially controlled the effect of MAF on QTL discovery.
We interpreted the calculated Bayes factors (BFlog10) with methods similar to those of Varona et al. (48). A priori we determined that Bayes factors ranging from 2 to 3 would be considered slight indicators of an association (i.e., suggestive), values from 3 to 10 would be considered moderate indicators of an association (i.e., significant), and values above 10 would provide strong evidence of linkage (i.e., very significant). Because it has been observed that under general assumptions a BF > 2.5 is equivalent to a P value of 0.05 (43), which is generally the a priori standard for null hypothesis rejection, our setting of BF = 3.0 for significance is appropriate. In addition, following the Bayes/non-Bayes compromise (43), we used 1,000 permutations to calculate the P value associated with each SNP in the data set, defined as the proportion of the permutations whose single-SNP BF for that SNP exceeded that of the observed data. Confidence intervals (95%) were defined with a modification of the 1-LOD (logarithm of odds) rule; the 95% confidence interval was defined as the range where the Bayes factors decreased one BFlog10 on each side of the peak BF value.
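The interpretation rules above can be summarized in a small Python sketch. The function names are hypothetical, the treatment of values falling exactly on a boundary (2, 3, or 10) is an assumption because the text does not specify it, and the permutation P value is computed simply as the proportion of permuted Bayes factors that strictly exceed the observed value.

```python
def classify_bf(log10_bf):
    """Map a BFlog10 value onto the a priori evidence categories."""
    if log10_bf > 10:
        return "very significant"
    if log10_bf >= 3:      # boundary handling is an assumption
        return "significant"
    if log10_bf >= 2:      # boundary handling is an assumption
        return "suggestive"
    return "not significant"

def permutation_p_value(observed_bf, permuted_bfs):
    """Proportion of permutations whose BF exceeds the observed BF."""
    if not permuted_bfs:
        raise ValueError("at least one permutation is required")
    exceed = sum(1 for bf in permuted_bfs if bf > observed_bf)
    return exceed / len(permuted_bfs)

# Hypothetical example: no permuted BF out of 1,000 exceeds the observed value.
p_example = permutation_p_value(5.0, [1.0] * 1000)
```

With 1,000 permutations, the smallest nonzero P value that can be observed is 0.001, so a SNP for which no permutation exceeds the observed Bayes factor would be reported as P < 0.001.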
Table 1 shows the characteristics of the mouse cohort used in this study. All three activity traits showed significant sex (P < 0.0001) and strain (P < 0.0001) differences but also significant strain by sex interactions (P < 0.0001). In general, the female mice had less body mass (21%), ran further (24%), ran longer (13%), and ran faster (11%) than male mice. However, after appropriate adjustment for multiple comparisons, the number of strains showing significant differences between sexes for any of the three activity traits was somewhat limited. Specifically, females exhibited significantly higher means than males for distance in seven strains (AKR/J, C3Heb/FeJ, C57L/J, CE/J, LG/J, NON/ShiLtJ, WSB/EiJ), duration in five strains (129x1, CE/J, LG/J, NON/ShiLtJ, WSB/EiJ), and speed in five strains (A/J, C57L/J, CBA/J, SJL/J, WSB/EiJ). Conversely, males showed significantly higher means compared with females for distance in the BTBR_T+_tf/J and LP/J strains, duration in the BTBR_T+_tf/J, CBA/J, DBA/2J, and LP/J strains, and speed in the PL/J and BTBR_T+_tf/J strains. It should be noted that the C58/J mice were uniquely consistent in the speeds they ran (Table 1). While the duration of activity varied by animal within the C58/J strain, their speed of activity was almost exactly the same for each animal, whether they were male or female.
Across all strains, the physical activity indexes were all significantly correlated with each other (distance and duration, r = 0.88, P < 0.0001; distance and speed, r = 0.70, P < 0.0001; duration and speed, r = 0.39, P < 0.0001). While age of the mice at start of the data collection varied slightly because of availability (Table 1), this variation in age did not influence weight (r = 0.01, P = 0.79), distance run (r = 0.04, P = 0.45), or speed (r = 0.01, P = 0.89). Age and duration of activity showed a correlation that reached conventional significance (r = 0.10, P = 0.04), although this significance was lost after sequential Bonferroni adjustment. Weight was not associated with either distance (r = 0.05, P = 0.21) or duration (r = 0.02, P = 0.63) but was significantly associated with speed of activity (r = 0.14, P = 0.008).
Figures 1–3 show the strain distribution patterns of distance, duration, and speed across all of the strains, and Table 2 indicates the post hoc statistical differences between the strains in each of the activity indexes. There was a 27.4-fold difference in daily distance between the strain that showed the highest activity (C57BR/CDJ, 10.95 ± 4.3 km/day, average ± SD) and the lowest strain (129S1/SvImJ, 0.40 ± 0.51 km/day). Similarly, there was a 23.6-fold higher duration of activity on a daily basis in C58/J mice (580.63 ± 70.65 min/day) versus 129S1/SvImJ mice (24.64 ± 28.93 min/day). While differences from the highest to the lowest activity in distance and duration were multiple folds, the speed of the fastest strain (PWD/PhJ, 38.64 ± 6.21 m/min) was only 2.9-fold higher than the slowest strain (C3H/HeJ, 13.31 ± 2.86 m/min). Broad-sense heritability estimates for distance were h2 = 0.55 and g2 = 0.38, for duration were h2 = 0.57 and g2 = 0.40, and for speed were h2 = 0.60 and g2 = 0.43. Comparison with beam-break measurements of cage activity in 16 inbred strains reported in the Mouse Phenome database (40) showed no correlation with distance (r = 0.003, P = 0.99), duration (r = 0.12, P = 0.65), or speed (r = 0.009, P = 0.97).
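The fold differences quoted above follow directly from the reported strain means, as the following Python check shows. The dictionary keys and variable names are illustrative only; the numbers are the highest and lowest strain means given in the text.

```python
# (highest strain mean, lowest strain mean) as reported in the text.
strain_extremes = {
    "distance_km_per_day": (10.95, 0.40),     # C57BR/cdJ vs. 129S1/SvImJ
    "duration_min_per_day": (580.63, 24.64),  # C58/J vs. 129S1/SvImJ
    "speed_m_per_min": (38.64, 13.31),        # PWD/PhJ vs. C3H/HeJ
}

fold_differences = {
    trait: round(high / low, 1)
    for trait, (high, low) in strain_extremes.items()
}

for trait, fold in fold_differences.items():
    print(f"{trait}: {fold}-fold")
```

The ratios round to 27.4, 23.6, and 2.9, matching the values reported for distance, duration, and speed, respectively.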
The haplotype association mapping results for distance, duration, and speed are shown in the Manhattan plots in Figs. 4–6, with the characteristics of each significant QTL identified in Table 3. Bayes factor values <1.00 were omitted from the figures to simplify plotting. Each graph consists of four subgraphs: the genotyping of the total cohort (A), the results from the cohort with the exclusion of the wild-type strains (B), and then separate male (C) and female (D) cohorts, both including the wild-type strains. Within the total cohort and in the separate male and female cohorts, 8,443 SNPs were excluded from the analysis because of a MAF <0.01. Excluding the wild strains resulted in the exclusion of 4,326,400 SNPs from the genotyping data because of a MAF <0.01.
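The MAF-based exclusion can be sketched in Python as follows. This is a simplified illustration and not the BIMBAM implementation: it assumes biallelic SNPs, one allele call per inbred strain (no heterozygous loci), and missing calls coded as None; the function names are hypothetical.

```python
from collections import Counter

def minor_allele_frequency(strain_alleles):
    """MAF for one SNP given one allele call per strain (None = missing)."""
    counts = Counter(a for a in strain_alleles if a is not None)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # monomorphic or entirely missing
    return min(counts.values()) / total

def passes_maf_filter(strain_alleles, threshold=0.01):
    """Keep a SNP only if its MAF is at least the threshold."""
    return minor_allele_frequency(strain_alleles) >= threshold

# Hypothetical SNP: 37 strains carry A, one strain carries G.
one_strain_minor = ["A"] * 37 + ["G"]
monomorphic = ["A"] * 38
```

Under these assumptions, a single strain carrying the minor allele among 38 strains already gives a MAF of 1/38 ≈ 0.026, so a 0.01 threshold would remove essentially only SNPs that do not vary among the strains analyzed. That would be consistent with the much larger number of SNPs excluded once the wild strains were dropped, although the exact counting used by the software is not described here.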
The analysis identified three significant QTL linked to distance in the total cohort (GDIST12.1, GDIST18.1, and GDIST19.1). Two of these QTL (GDIST12.1, GDIST19.1) were still evident in both male and female mice after the exclusion of the wild strains (Fig. 4, Table 3), while GDIST18.1 was significant in the analysis of the total cohort, the cohort excluding the wild-strains, and in the female-only cohort. However, in the male cohort, GDIST18.1 was highly suggestive (QTL BFlog10 = 2.742). Given the density of the SNP map used and the apparent robustness of these three QTL, the confidence intervals reported for these QTL were narrow (Table 3), especially compared with other QTL identified (e.g., GDIST8.1m). Four distance QTL were significant in male mice, but not in females (GDIST5.1m, GDIST6.1m, GDIST8.1m, and GDIST13.1m), while two distance QTL were significant in female mice but not in males (GDIST8.2f, GDIST11.1f). The two sex-specific distance QTL on chromosome 8 (GDIST8.1m and GDIST8.2f) are in different locations on the chromosome and thus present as distinct peaks on the total cohort shown in Fig. 4A. Perusal of Fig. 4B shows that when the wild strains were removed from the cohort none of these sex-specific QTL was present, even though the effect of the QTL on the total cohort (Fig. 4A) was sufficient to raise several above the significance threshold. All significant distance QTL identified (GDIST12.1, GDIST18.1, GDIST19.1, GDIST5.1m, GDIST6.1m, GDIST8.1m, GDIST13.1m, GDIST8.2f, and GDIST11.1f) exhibited a Bayes/non-Bayes compromise P value < 0.001.
Surprisingly, there were virtually no QTL linked with duration in any of the cohorts (Fig. 5). The exception to this was the identification of GDURX.1f, which was associated with duration only in the female mice (Table 3). Excluding the wild strains from the analysis, speed also showed no significant QTL that associated with both sexes (Fig. 6); however, two sex-specific QTL were identified associated with speed: GSPD11.1f and GSPD6.1m. The GSPD11.1f appears to colocalize with the GDIST11.1f QTL, suggesting pleiotropic activity of the responsible genetic factors in these QTL. The identified significant duration and speed QTL all exhibited a Bayes/non-Bayes compromise P value < 0.001.
The search for genes underlying physical activity should serve to increase our growing knowledge of the biological mechanisms that exert a significant influence on daily activity. The present study, which makes use of the largest data set of inbred mouse wheel running activity available, adds to this knowledge base by contributing a multistrain genome map of potential chromosomal sites linked to physical activity. Specifically, we identified 12 significant QTL linked with different activity traits that exhibit minimal overlap with QTL identified previously in both mouse (e.g., Refs. 20, 24, 29, 34, 35) and human (e.g., Refs. 4, 9, 42) studies. Thus these QTL constitute additional areas to consider for potential candidate genes involved in the regulation of physical activity.
A plethora of data have suggested that the heritability of physical activity is significant (e.g., Refs. 13, 23, 33, 34, 36, 44, 45), with estimates ranging from as low as 0.2 (36) to as high as 0.92 (23). This variability probably reflects both the activity index used and the type of heritability statistic used, but the heritability estimates are generally similar between both human and animal models. The heritability estimates for the activity traits derived from the present data set (0.38–0.60) align with these ranges and thus continue to add support to the conclusion that physical activity levels are heritable.
While the evidence is strong that activity levels are heritable in both human and animal models, the identity and function of the genetic factors that serve to regulate activity levels are not known. Most attempts to locate such genes have relied on the association of various genetic markers with activity levels. However, typical human studies require very large genome data sets to provide sufficient power to identify linked genomic loci; this can be observed in the recently published GWAS in 2,622 Dutch and American subjects using a fairly dense human SNP map (1,607,535 common SNPs; Ref. 9). While De Moor and colleagues (9) identified three significant QTL associated with exercise participation, the authors suggest that one of the limitations of their study was the lower power with which to detect regions having smaller effects on physical activity. Similar issues arise when considering earlier, smaller GWAS studies in children (4) and in a family cohort of Canadians (42) that provided relatively few significant QTL linked with activity in humans. The animal linkage studies that are available to date are also not immune from lower power to detect QTL. Two of the five animal association studies available at this time (29, 34) were based on a single, relatively small sample (n = 310) of F2 animals derived from only two mouse strains, which decreases the generalizability of those results. The present study, that by De Moor and colleagues (9), and others using a large number of backcross and advanced intercross mice (20, 24, 35) have considerably more statistical power than previous studies and should add substantially to the previously identified activity-related QTL.
The QTL that we observed share some genomic localization with previously published QTL from both human and mouse models. The human ortholog of the robust GDIST18.1 QTL from this study (11.576 mbp, 18q11.2) colocalizes with the QTL associated with percentage of time spent in sedentary activities (18q12–q21) identified by Cai and coworkers (4) in their cohort of Hispanic children. Two of the suggestive QTL identified by Simonen and colleagues (42) associated with the amount of moderate and strenuous activity completed (4q28.2 and 9q31.1) localize closely to the human orthologs of the present study's GDIST19.1 QTL (human location 9q21.2) and the male-specific GDIST8.1m (human location 4q34.1). There is also common genomic localization between the present study QTL and the available animal QTL studies; GDIST6.1m from this study is similar in genomic location to an epistatic QTL associated with distance (Chr. 6, 80 cM × Chr. 15, 4 cM; Ref. 29) and the MMU6 QTL (Chr. 6, 66.69 cM; Ref. 35). While Kelly and coworkers' (24) MMU5 distance QTL for days 2 and 3 of wheel exposure was on chromosome 5, its location was ∼50 mbp closer to the centromere than the GDIST5.1m QTL we discovered (50 and 52.9 mbp vs. 118 mbp). Interestingly, GDIST18.1 and GDIST19.1 also colocalize with two QTL associated with the relationship between physical activity and weight (30). Furthermore, an epistatic QTL from the same study (Act12epi.1) is in a genomic location similar to the QTL GDIST12.1 from the present study. Thus, while our results have added to the potential QTL to be mined for candidate genes that regulate activity, there is also evidence, supported also by the colocalization of GDIST11.1f and GSPD11.1f and as suggested by others (24, 30), that there may be pleiotropic relationships between the various activity-related QTL.
It is interesting that while the estimates of the heritability of physical activity levels fall in the moderate to high ranges, the total number of activity QTL that have been identified are still somewhat limited. At least three possibilities could account for this phenomenon: 1) each QTL identified explains a large portion of variance in the genetic influence on physical activity, i.e., there are few genes that control physical activity and not all have been discovered; 2) there are a large number of smaller-effect QTL not yet discovered; and/or 3) a portion of the total genetic variance is nonadditive in origin, being generated by dominance or epistatic interactions.
It is possible that the three strong QTL we observed explain a large portion of the variance and thus there are relatively few QTL controlling activity traits. In fact, on the basis of previous intercross data (34), only four QTL were predicted to be responsible for the regulation of physical activity (12). Refined methods to estimate the percentage of variance explained by each QTL using Bayes factors are not yet available; however, tentative estimates using the additive values reported from the Bayes factor calculation suggest that the identified generalizable QTL in the present study account for a large portion of variation (GDIST12 = 23.9%; GDIST18 = 19.9%; GDIST19 = 19.0%). These estimates may well be inflated (12) and in any event are much greater than the highest value of 6% contributed by any QTL discovered in our previous intercross study (34). Thus it seems more probable that all of the models used thus far have been underpowered to detect small-effect QTL, a point that we (29, 34) and De Moor and colleagues (9) have made previously. In addition, earlier work from our lab (29) has suggested strongly that epistatic interactions may explain up to 50% of the genetic effect on phenotypic variation in physical activity. While a strength of the present study was the large number of genetic markers available, unfortunately this strength also imposed a computational limitation that prevented the determination of the epistatic interactions in this data set. For example, with the present data set, a two-gene epistatic analysis requires an array that uses ∼1 petabyte (1,000,000 gigabytes) of memory, a requirement that currently outstrips commonly available computational devices.
Thus further exploration of why there are relatively few QTL identified at this time for this complex trait must await refined variance equations for Bayes factor QTL analysis, more efficient calculation algorithms for the nonadditive genetic factors, and/or larger computing capabilities.
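The scale of the pairwise problem can be illustrated with a back-of-envelope Python sketch. The SNP count is taken from the map size quoted earlier, but the number of bytes stored per SNP pair is purely an assumption (one 8-byte value per unordered pair), as are the function names; the actual layout of the array referred to above is not described, so this indicates the order of magnitude only.

```python
def n_unordered_pairs(n_snps):
    """Number of distinct two-SNP combinations."""
    return n_snps * (n_snps - 1) // 2

def pairwise_memory_bytes(n_snps, bytes_per_pair):
    """Memory needed to hold one record per unordered SNP pair."""
    return n_unordered_pairs(n_snps) * bytes_per_pair

N_SNPS = 8_270_000       # approximate size of the dense SNP map
BYTES_PER_PAIR = 8       # assumption: one double-precision value per pair

pairs = n_unordered_pairs(N_SNPS)
petabytes = pairwise_memory_bytes(N_SNPS, BYTES_PER_PAIR) / 1e15
print(f"{pairs:.3e} pairs, about {petabytes:.2f} PB at {BYTES_PER_PAIR} bytes per pair")
```

Even under this minimal storage assumption the requirement is hundreds of terabytes, and storing several values per pair brings it to the petabyte scale cited in the text.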
A large number of genes have been postulated to be involved in regulation of physical activity based primarily on their functional relevance. Similar to that found by De Moor et al. (9), several of these genes, including calcium sensing receptor (Casr), aromatase (Cyp19a1), dopamine receptor two (Drd2), leptin receptor (Lepr), and melanocortin-4-receptor (Mc4r), did not localize within any of the QTL regions we observed in this study (2). Interestingly, PAPSS2 (3′-phosphoadenosine 5′-phosphosulfate synthase 2), a gene involved in both blood coagulation and bone development that was strongly associated with human activity in the De Moor et al. study (9), localizes on Chr. 19 between 32.694 and 32.742 mbp in the mouse genome (2) which is ∼16 mbp above the robust GDIST19.1 QTL we identified.
It is noteworthy that of the 12 QTL identified in the present study 8 of the 12 SNP markers (67%) with the highest Bayes factors in these QTL all fell within intergenic areas. It is probable that some of these eight QTL actually include surrounding genes, especially given the larger confidence intervals shown for the sex-specific QTL (Table 3). However, none of the annotated genes that fall within the sex-specific QTL confidence intervals has other lines of evidence supporting its role in activity regulation other than speculated functional relevance. Therefore, on the basis of our mapping results alone, we are hesitant to categorize any genes within any of the confidence intervals surrounding our sex-specific QTL as potential candidate genes.
Interestingly, our three strongest QTL (GDIST12.1, GDIST18.1, GDIST19.1) all exhibited relatively narrow confidence intervals (100–400 kbp), within which there are eight predicted genes (i.e., genomic areas thought to be part of specific, unannotated genes) and only one annotated gene (Pomc-ps1, proopiomelanocortin, pseudogene 1, Chr. 19, 15,436,825–15,437,360 bp; Ref. 2). As a pseudogene, Pomc-ps1 is not thought to code for any protein; however, it has been suggested that an unknown gene in the immediate vicinity of Pomc-ps1 may be involved in the translational regulation of the dopamine transporter gene (Dat) since it colocalizes with a strong QTL affecting Dat expression (21). A potential candidate for this dopamine translational function is guanine nucleotide binding protein, αq polypeptide (Gnaq, Chr. 19, 16,207,321–16,461,943 bp; Ref. 2), which is <100 kb from the GDIST19.1 QTL and is involved in inhibiting dopamine receptor activation of the phospholipase C pathway. This potential dopaminergic pathway of action for the GDIST19.1 QTL, through both Pomc-ps1 and Gnaq, is attractive, given other literature that has suggested the involvement of various dopaminergic genes and pathways in the regulation of physical activity (25, 39). Thus, given the localization within the GDIST19.1 QTL and other literature addressing the importance of the dopamine pathways in regulating activity, we would consider Gnaq a potential candidate gene for the regulation of physical activity.
Since it has been estimated that at least 95–98% of the mammalian genome is composed of intergenic, conserved noncoding sequences (1), it is not surprising that most of our QTL fell into these areas. This result was also noted by De Moor et al. (9), with two of their three significant QTL in their large human GWAS falling into intergenic areas. If future activity QTL localize within intergenic areas, it will further complicate future candidate gene searches by suggesting that at least some of the genetic control of physical activity arises from regulatory sequences that lie in intergenic areas. To this end, it is possible that RNA interference (RNAi) mechanisms may play an important role in the regulation of physical activity. In particular, RNAi mechanisms are thought to arise from intergenic conserved noncoding sequences; interestingly, the genomic sequences surrounding and including the three generalized QTL found in this study all partially match (81–89%) known stem-loop RNA segments in plants (16). Thus whether our identified QTL actually represent direct gene influence on physical activity or mechanisms that modify the genetic process of translation, which in turn affects physical activity, remains an interesting area of inquiry.
Comparison of the activity levels of the inbred strains in this study with other published wheel running data from the same strains (13, 31) shows that some strains exhibit fairly consistent activity patterns regardless of lab (e.g., DBA/2J, FVB/NJ, C57BL/6J, BALB/cJ, CE/J) while others appear to show differential activity patterns (e.g., DBA/1, BALB/cByJ, CBA/J). However, these comparisons are difficult given that Lerman et al. (31), while using the same general wheel running methods as we did, tested much older mice (20–24 wk of age) for a shorter time period (14 days). Furthermore, Festing (13) tested inbred strains without regard for sex of the animal, with a large range of ages (3.7–19.5 wk), used a much larger running wheel (1.1 m), and monitored the mice for only 24–48 h. Additionally, in an elegant and well-controlled study, Crabbe and colleagues (6) showed that the lab environment, despite rigorous standardization, can influence mouse behavioral responses. Thus, given the methodology differences and potential unknown environmental influences, it was not surprising that our strain distribution patterns of wheel running activity did not fully match those in previously published research. However, it should be noted that the strains we have tested most extensively (e.g., C3H/HeJ, C57L/J, C57BL/6J) all show remarkably small variance between animals in their activity patterns (Figs. 1–3).
Generally, it has been observed that female mice exhibit more wheel running activity overall than male mice (27, 33, 34). We previously hypothesized (34) that sex hormone effects on activity occur primarily downstream of genetic mechanisms. However, the fact that the large majority of QTL discovered in the present study were sex specific (9 of 12; 75%), together with the recent finding of a male-linked QTL associated with duration of activity by Nehrenberg and colleagues (35), suggests that there are specific sex-related genetic mechanisms acting to influence physical activity. If so, our previous hypothesis based on a two-strain intercross cohort is not generalizable to all mice. We anticipate that the identification of potential sex-specific genetic mechanisms associated with physical activity regulation will be a fruitful line of research given the generally lower activity levels of human females compared with males in both westernized and hunter-gatherer populations (32, 46).
In an earlier report (34), although we observed similarity between QTL derived from an F2-intercross cohort and a limited 27-strain haplotype association analysis, the majority of those QTL were not detected in the present study. In addition to the already mentioned lack of generalizability in using only two strains, it is probable that the fewer strains used (n = 27), the smaller SNP genome map (n = 1,272 SNPs), and the haplotype analysis of only two chromosomes led to a reduction of statistical power that allowed the C57L/J data (highly active mice) in that cohort to inappropriately influence the outcome. Additionally, an F2 model allows dominance effects to be calculated, and we have shown such effects to be present in the F2 model (34); it is therefore possible that the present haplotype association mapping, which cannot calculate dominance effects given the lack of heterozygous loci among the strains, did not reveal certain QTL. Given these potential issues, perhaps it should not be surprising that few of the QTL identified in the present study replicated those identified previously.
Darvasi (7) suggested that even with the use of optimal conditions 40–150 inbred strains would be needed to provide appropriate discriminatory power for haplotype association mapping. In fact, we had initially designed the study to reach the 40-strain minimum, but with the release of the second phase of the Perlegen SNP database, which did not include the BALB/cJ, C3Heb/FeJ, and C57BL/10J strains, we did not have the full 40-strain cohort and other inbred strains were not readily available. However, since Darvasi's estimations, papers using dense mapping of the mouse genome have shown limited genomic diversity among the classic inbred strains (15, 50, 51). Frazer et al. (15) suggested that using only 12 inbred strains would capture 95% of the variability in the mouse genome, while Yalcin et al. (50) showed that increasing the number of strains that were tested did not necessarily increase the resolution of the genomic scan. Furthermore, haplotype mapping approaches with fewer than 40 strains have been used successfully in investigating other complex traits (e.g., Refs. 3, 19), with at least one study estimating that the use of 33 strains in a regression-based approach to haplotype mapping (i.e., without control for MAF and sample size) results in a QTL false discovery rate of ∼22%. However, it is recognized that because many genomic areas are identical by descent, this genomic similarity can result in “blind spots” in genomic data (51). Thus the inclusion of some wild strains in the data set can help to offset some of these blind spots, especially if MAF is controlled (18). This may explain why the sex-specific QTL that were present when the wild strains were included in the cohort disappeared upon removal of the wild strains' genotype data. Despite these limitations, we identified three robust QTL for distance that arose whether we used the full 38-strain cohort or subsets of that cohort.
Thus, given the number of strains we used, the ability of the Bayes factor approach to control for MAF effects, inclusion of wild strains (3, 19), and the relative lack of genetic diversity in the inbred mouse genome (15, 50, 51), we have confidence that our cohort contained sufficient discrimination capability to detect major single-effect QTL associated with physical activity.
In summary, we have used activity data in a large number of inbred strains in conjunction with a dense SNP map to identify three robust and significant QTL associated with distance run per day and nine significant sex-specific QTL associated with distance, duration, and speed of activity. These QTL represent new genomic locations associated with physical activity and, because most of these QTL fall within intergenic areas, may represent new challenges to understanding the genetic regulation of physical activity.
This project was supported by National Institutes of Health Grants DK-61635 (J. T. Lightfoot), AR-050085 (J. T. Lightfoot, M. J. Turner, L. Leamy, A. Knab, R. S. Bowen, D. Ferguson, T. Moore-Harrison, A. Hamilton, and A. A. Fodor), AG-022417 (M. J. Turner), and DK-076050 (D. Pomp).
No conflicts of interest, financial or otherwise, are declared by the author(s).
The authors thank Jessica Moser, Sarah Carter, Matt Yost, Anna Vordermark, Felicia Dangerfield-Persky, Sean Courtney, Mark Lindley, Amber Monroe, and Matt Belles for their technical expertise and the Vivarium staff for their animal care and husbandry skills. Additionally, we thank Dr. Yongtao Guan for his assistance in troubleshooting and customizing BIMBAM to run with this large data set and Dr. Steven R. Kleeberger for his continued support and advice.
- Copyright © 2010 the American Physiological Society