## Abstract

This investigation developed models to estimate aspects of physical activity and sedentary behavior from three-axis high-frequency wrist-worn accelerometer data. The models were developed and tested on 20 participants (*n* = 10 males, *n* = 10 females, mean age = 24.1, mean body mass index = 23.9), who wore an ActiGraph GT3X+ accelerometer on their dominant wrist and an ActiGraph GT3X on the hip while performing a variety of scripted activities. Energy expenditure was concurrently measured by a portable indirect calorimetry system. Those calibration data were then used to develop and assess both machine-learning and simpler models with fewer unknown parameters (linear regression and decision trees) to estimate metabolic equivalent scores (METs) and to classify activity intensity, sedentary time, and locomotion time. The wrist models, applied to 15-s windows, estimated METs [random forest: root mean squared error (rMSE) = 1.21 METs, hip: rMSE = 1.67 METs] and activity intensity (random forest: 75% correct, hip: 60% correct) better than a previously developed model that used counts per minute measured at the hip. In a separate set of comparisons, the simpler decision trees classified activity intensity (random forest: 75% correct, tree: 74% correct), sedentary time (random forest: 96% correct, decision tree: 97% correct), and locomotion time (random forest: 99% correct, decision tree: 96% correct) nearly as well or better than the machine-learning approaches. Preliminary investigation of the models' performance on two free-living people suggests that they may work well outside of controlled conditions.

- ActiGraph
- GT3X+
- high frequency
- triaxial

Processing accelerometer data to estimate aspects of physical activity and sedentary behavior remains a challenge. The original approach used data collected in the laboratory to obtain hip activity monitor counts and energy expenditure from a few or several activities (3, 9, 12, 21). These laboratory calibration data were used to build simple linear regression models for estimating energy expenditure or activity intensity from counts averaged over some user-specified time frame (e.g., 15 s, 1 min). The most widely used simple linear regression model for the ActiGraph single-axis accelerometer worn on the hip was that developed by Freedson et al. (9) and uses counts per minute as the single independent variable. Solving the regression for activity counts using the lower and upper boundaries of absolute moderate intensity activity as the dependent variable [3 and 6 metabolic equivalent scores (METs)] became known as the Freedson cut-point method. Despite its limitations, this model has endured multiple challenges and remains a popular method of choice for processing field-based ActiGraph accelerometer data. A modified version of the original Freedson cut-points defining the count range for light, moderate, and vigorous activity was used in the National Health and Nutrition Examination Survey (NHANES) ActiGraph data analysis, which is the largest nationally representative database for objectively monitored physical activity (23).
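The cut-point idea described above amounts to inverting a fitted line. The following is a minimal sketch of that inversion, assuming a model of the form METs = intercept + slope × counts per minute. The coefficient values are hypothetical placeholders chosen for easy arithmetic and are not the published Freedson coefficients; the function names are likewise illustrative.

```python
def counts_at_met(met_level, intercept, slope):
    """Invert METs = intercept + slope * counts to get the count threshold."""
    if slope <= 0:
        raise ValueError("slope must be positive for the inversion to make sense")
    return (met_level - intercept) / slope


def cut_points(intercept, slope, moderate_met=3.0, vigorous_met=6.0):
    """Return the (lower, upper) count boundaries of moderate-intensity activity."""
    return (counts_at_met(moderate_met, intercept, slope),
            counts_at_met(vigorous_met, intercept, slope))


# Hypothetical coefficients for illustration only (NOT the published values).
EXAMPLE_INTERCEPT = 1.5   # METs at zero counts
EXAMPLE_SLOPE = 0.001     # METs per count per minute

lower, upper = cut_points(EXAMPLE_INTERCEPT, EXAMPLE_SLOPE)
```

With these placeholder coefficients, counts between `lower` and `upper` would be labeled moderate, counts below `lower` light, and counts at or above `upper` vigorous.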

All of the “cut-point” algorithms for adults that use derived activity counts as inputs into simple linear regression models are for accelerometers worn on the hip. Crouter et al. (5) have developed wrist cut-points for youths. One modification to the hip accelerometer activity count linear regression modeling was developed by Crouter et al. (4, 6) where activity counts were directed to one of two regression models based on the variability in the activity counts. That approach used the size of the standard deviation of activity counts to direct the activity counts to one of three equations: *1*) resting level of energy expenditure; *2*) walking and running; and *3*) lifestyle activities. Another approach to processing accelerometer data is to use more sophisticated statistical methods, also known as statistical or machine-learning techniques. These methods extend cut-point algorithms in two ways. First, they summarize accelerometer signals using more than just the total count in an interval. Instead, they use statistical summaries of counts that describe both the distribution of acceleration in an interval and the temporal dynamics of the signal. Second, they use those statistical summaries as inputs to (or covariates for) models that either estimate energy expenditure or classify an aspect of the activity that is being performed. Examples of this general approach include Refs. 17, 19, 20, and 25. In recent work, Lyden et al. (15) developed and validated count-based machine-learning methods designed to be used for accelerometers worn on the hip on individuals outside of controlled laboratory settings.

The 2011–2014 NHANES data collection used a newer version of an accelerometer (ActiGraph GT3X+) and collected high-frequency (80 Hz) acceleration from a wrist-worn monitor. Advantages of this new protocol include the possibility of objective examination of sleep measures (18) and significant improvement in 7-day wear compliance over a hip-worn accelerometer (24). However, it is not clear how these data will be processed to examine activity and sedentary behavior since there is limited research to develop validated algorithms for the triaxial, high-frequency accelerometer data from this wrist-worn sensor. The limited research includes work by Hildebrand et al. (13) that developed regression equations to estimate energy expenditure from a wrist-worn accelerometer and work by Zhang et al. (25) that developed methods to classify four activity types from wrist-worn data. The current investigation develops algorithms to estimate four measures of physical activity: *1*) MET-hours; *2*) time (min) in light (METs < 3), moderate (3 ≤ METs < 6), and vigorous (METs ≥ 6) activity; *3*) time in sedentary activities vs. not; and *4*) time in locomotion vs. not. We estimate locomotion time because locomotion is the primary mode by which most people accumulate moderate to vigorous activity. For each metric, we develop and compare both a machine-learning method and simpler linear regression or decision tree methods. The latter methods could be implemented by a nonstatistical expert using the ActiLife software and a spreadsheet. We also compare the MET-hour and activity intensity estimates for the wrist-worn algorithm to the Freedson cut-point method (9) and to the refined Crouter two-regression method (6) for the hip-worn accelerometer. The discussion contains a preliminary evaluation of the new wrist methods on two additional individuals engaged in several hours of free-living activity. These subjects are different from the *n* = 20 on whom the new wrist methods were developed.
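To make the first two outcome measures concrete, the sketch below shows one way that per-window MET estimates might be aggregated into MET-hours and minutes per intensity category, using the category boundaries stated above. The function names and the example values are illustrative assumptions, not part of the study's software.

```python
WINDOW_SECONDS = 15  # window length used for the wrist models


def intensity_category(mets):
    """Light (< 3 METs), moderate (3 to < 6 METs), or vigorous (>= 6 METs)."""
    if mets < 3:
        return "light"
    if mets < 6:
        return "moderate"
    return "vigorous"


def summarize(window_mets, window_seconds=WINDOW_SECONDS):
    """Aggregate per-window MET estimates into MET-hours and minutes per category."""
    met_hours = sum(window_mets) * window_seconds / 3600.0
    minutes = {"light": 0.0, "moderate": 0.0, "vigorous": 0.0}
    for mets in window_mets:
        minutes[intensity_category(mets)] += window_seconds / 60.0
    return met_hours, minutes


# Made-up MET estimates for eight consecutive 15-s windows.
example = [1.2, 1.5, 3.0, 4.5, 6.0, 8.0, 2.9, 5.9]
met_hours, minutes = summarize(example)
```

Each window contributes its MET value weighted by its duration in hours, so eight 15-s windows cover 2 min in total.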

## METHODS

#### Participants and data collection.

Twenty healthy individuals between the ages of 20 and 39 yr participated in this study. Participant characteristics are shown in Table 1. Before commencing the study protocol, participants completed a health history and a physical activity status questionnaire. Participants read and signed an informed consent document approved by the Institutional Review Board at the University of Massachusetts Amherst.

Each participant completed a treadmill routine and one of three activity routines composed of sedentary, lifestyle, and sport activities (see Table 3). The two routines were completed in two separate visits. During each visit, participants wore an ActiGraph GT3X+ (ActiGraph, Pensacola, FL) secured on the dominant wrist with a velcro strap and a hip-worn ActiGraph GT3X accelerometer secured on the right hip with an adjustable belt. These accelerometers record gravity as well as movement. Oxygen consumption was measured breath-by-breath using the Oxycon mobile indirect calorimetry system (Carefusion, Yorba Linda, CA). All participants completed the treadmill routine during *visit 1*. Each treadmill speed was performed for 6 min followed by a 3- to 4-min recovery break. The other activity routine was conducted during *visit 2*. These activities were also performed for 6 min each. The activities were performed under conditions as similar to free living as possible (e.g., gardening and raking were performed outside and basketball was performed in a gym on a court). Subjects were seated exclusively throughout the driving and office work activities.

#### Instrumentation.

The ActiGraph GT3X+ monitor was secured on the dominant wrist with a velcro strap. This device is a small (4.6 × 3.3 × 1.5 cm), lightweight (19 g), triaxial accelerometer. The sampling frequency of the GT3X+ ranges from 30 to 100 Hz; an 80-Hz sampling frequency was used in this study, which is the same sampling frequency used in the 2011–2014 NHANES accelerometer study.

The Oxycon mobile indirect calorimetry system (Carefusion) was used to collect breath-by-breath respiratory gas exchange data. To calculate actual METs for each activity, the average oxygen consumption in (ml·kg^{−1}·min^{−1}) for min 3–5 was divided by 3.5 ml·kg^{−1}·min^{−1}.
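The MET calculation above can be sketched as follows. This is a simplified illustration that assumes the breath-by-breath record is available as (elapsed minutes, oxygen consumption) pairs and that "min 3–5" means elapsed time from 3 up to 5 min; it uses an unweighted mean of the breaths, which may differ from how the calorimetry software averages.

```python
RESTING_VO2 = 3.5  # ml/kg/min, the divisor stated in the text


def measured_mets(samples, start_min=3.0, end_min=5.0):
    """Average VO2 over the chosen interval and convert it to METs.

    samples: iterable of (elapsed_minutes, vo2_ml_per_kg_per_min) pairs.
    The interval convention (start inclusive, end exclusive) is an assumption.
    """
    in_window = [vo2 for t, vo2 in samples if start_min <= t < end_min]
    if not in_window:
        raise ValueError("no samples fall inside the averaging interval")
    return (sum(in_window) / len(in_window)) / RESTING_VO2


# Made-up record for one 6-min activity bout.
bout = [(0.5, 5.0), (2.5, 12.0), (3.2, 14.0), (4.1, 14.0), (4.8, 14.0), (5.5, 13.0)]
bout_mets = measured_mets(bout)
```

Restricting the average to the middle of the bout excludes the early breaths, when oxygen consumption is still rising toward a steady level.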

#### Statistical methods.

We developed the methods in two steps. First, we computed statistical summaries of the accelerometer signals, and then we used those summaries as covariates in models to estimate aspects of physical activity and sedentary behavior. We describe each step in more detail below.

The summary statistics we use are listed in Table 2. The majority of these statistics have been used in previous work (7, 19, 25) where the motivations for these statistics are discussed. In addition, since we hypothesize that arm position may be related to activity, we use an angle of acceleration of the wrist-worn accelerometer. To compute the angle, we determined the axis that recorded −1 g when the arm was hanging straight down and 1 g when the arm was raised vertically. The other axes were zero. We computed the angle as arcsin(axis used/vector magnitude)/(pi/2). We describe this procedure in detail because the definitions of *x*, *y*, and *z*-axes are manufacturer, and even model, specific.

We note that the GT3X+ does not have a gyroscope in it, so the angle is relative to the accelerometer's axes, not relative to a line that is perpendicular to the ground. All statistics for the wrist models were computed from nonoverlapping 15-s intervals of acceleration.
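As an illustration of the two summaries used by the simpler models, the sketch below computes sdvm and mangle over nonoverlapping 15-s windows of 80-Hz data. Several details are assumptions: the angle is scaled by 90 so that it is expressed in degrees (the decision-tree thresholds later in the paper are given in degrees), mangle is taken as the mean of per-sample angles, the sample standard deviation is used, and incomplete trailing windows are dropped. Table 2 gives the authors' definitions.

```python
import math
from statistics import mean, stdev

SAMPLE_RATE_HZ = 80   # sampling frequency used in the study
WINDOW_SECONDS = 15   # nonoverlapping window length for the wrist models


def vector_magnitude(sample):
    """Euclidean norm of one (x, y, z) acceleration sample, in g."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def angle_degrees(axis_value, vm):
    """Angle of acceleration along the arm axis, scaled to degrees (assumed)."""
    if vm == 0:
        return 0.0  # assumption: treat a zero-magnitude sample as 0 degrees
    ratio = max(-1.0, min(1.0, axis_value / vm))  # guard against rounding error
    return 90.0 * math.asin(ratio) / (math.pi / 2)


def window_summaries(samples, arm_axis,
                     rate_hz=SAMPLE_RATE_HZ, window_s=WINDOW_SECONDS):
    """Return one {'sdvm', 'mangle'} dict per complete nonoverlapping window."""
    size = rate_hz * window_s
    summaries = []
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        vms = [vector_magnitude(s) for s in window]
        angles = [angle_degrees(s[arm_axis], vm) for s, vm in zip(window, vms)]
        summaries.append({"sdvm": stdev(vms), "mangle": mean(angles)})
    return summaries


# Arm hanging straight down and motionless: the arm axis (index 1 here) reads -1 g.
hanging = [(0.0, -1.0, 0.0)] * 2500
summaries = window_summaries(hanging, arm_axis=1)
```

Which index corresponds to the arm axis depends on the device and how it is worn, so `arm_axis` is left as a parameter.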

These summary statistics are then used as covariates in two groups of four models that each estimate aspects of physical activity. The models in the first group (linear regression and decision trees) are relatively simple as they will be easier for others to use and have fewer parameters, and the parameters are easier to interpret. The ones in the second group are more complex and potentially powerful machine-learning models.

For the relatively simpler modeling approaches, we use multiple linear regression to estimate METs, and we use decision tree models (1) to classify activity intensity and the other two outcomes (sedentary or not and locomotion or not). The potential inputs to these models are the statistical summaries of acceleration that were discussed previously. We built the models by considering all possible models that used two inputs each, and we selected the models that achieved the lowest leave-one-subject-out cross-validated estimates of mean squared error (for MET estimation) and misclassification error (for the other outcomes). We note that a separate left-out subject was used to estimate the reported model performance as is recommended in Ref. 11 (Section 7.10.2).
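The selection procedure for the linear regression can be sketched as an exhaustive search over pairs of inputs, scored by leave-one-subject-out mean squared error. The code below is a minimal illustration with a small hand-written least-squares solver. The data layout, function names, and toy data are assumptions, and the outer loop that holds out a further subject to report performance is omitted.

```python
from itertools import combinations


def fit_ols(rows, targets):
    """Least-squares fit of targets on an intercept plus rows (normal equations)."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(d[i] * d[j] for d in design) for j in range(p)]
         + [sum(d[i] * t for d, t in zip(design, targets))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loso_mse(data, features):
    """Leave-one-subject-out mean squared error for one set of input features."""
    sq_errors = []
    for held_out in sorted({d["subject"] for d in data}):
        train = [d for d in data if d["subject"] != held_out]
        test = [d for d in data if d["subject"] == held_out]
        coef = fit_ols([[d[f] for f in features] for d in train],
                       [d["mets"] for d in train])
        sq_errors += [(predict(coef, [d[f] for f in features]) - d["mets"]) ** 2
                      for d in test]
    return sum(sq_errors) / len(sq_errors)


def best_pair(data, candidate_features):
    """Pick the pair of inputs with the lowest cross-validated error."""
    return min(combinations(candidate_features, 2),
               key=lambda pair: loso_mse(data, pair))


# Toy data: METs depend exactly on inputs "a" and "b"; "c" is irrelevant.
toy = []
for s in (1, 2, 3):
    for k in range(4):
        a_val, b_val = 0.1 * k + 0.05 * s, (k * k) % 3 + 0.3 * s
        toy.append({"subject": s, "a": a_val, "b": b_val,
                    "c": float((7 * k + 3 * s) % 5),
                    "mets": 1.0 + 2.0 * a_val + 0.5 * b_val})
selected = best_pair(toy, ["a", "b", "c"])
```

Splitting by subject rather than by window keeps all windows from one person on the same side of the split, so the error estimate reflects performance on a new person. For the classification outcomes, the same loop would use misclassification error in place of squared error.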

For the machine-learning approach, we implemented three machine-learning models (neural networks, support vector machines, and random forests). These approaches have been used to estimate physical activity from accelerometers previously (7, 19, 25). These models are nonlinear regression models that can flexibly represent a wide variety of relationships between covariates and outcomes. A technical description of these methods is outside the scope of this article, and more detail can be found in Ref. 14.

For the comparison to existing hip linear regression models, we used the ActiLife software to calculate the counts per second for the vertical axis. Subsequently, we used the published methods in Refs. 6 and 9 to estimate METs.

All statistical analyses, including the preprocessing to compute the statistical summary inputs and specific implementation of the statistical learning methods, were performed using R software (www.r-project.org) (22). This software is available from the first author.

## RESULTS

Table 3 lists the activities, the associated mean (SD) METs, and the mean (SD) of each of the accelerometer summary statistics. The table also lists which activities are considered as sedentary or not and which are considered as locomotion or not.

Cross-validated estimates of performance are in Fig. 1, *A* and *B*. The models use accelerometer statistics computed from nonoverlapping 15-s windows. Windows with length between 5 and 60 s were considered, and results were similar for windows that were at least 15-s long.

Figure 1*A*, *left*, shows that the random forest, neural network, and wrist linear regression estimates of METs are not significantly biased. The support vector machine method results in statistically significantly (*P* < 0.05) biased estimates, but the biases are not large. The hip method estimates are significantly biased. We note that lack of significant bias only indicates that the mean of the estimates is not significantly different from the mean of the measured values, and many estimation methods (including a single constant mean) are likely to be unbiased. Figure 1*A*, *right*, contains estimates of root mean squared error (rMSE) for each method. The random forest results in the smallest rMSE, followed by the support vector machine, the neural network, and the linear regression for the wrist and the hip methods. The wrist linear regression rMSE is ∼30% larger than that of the random forest. Since rMSE is the square root of the sum of the variance of the estimates and the squared bias, and both methods are approximately unbiased, the larger rMSE for the linear regression method is due to greater variability in estimates.
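The relationship between rMSE, bias, and variability used in the last sentence can be checked numerically. The sketch below uses made-up estimates and measured values, and it takes "variability" to mean the population variance of the estimation errors, which is an assumption about the intended definition.

```python
import math


def errors(estimates, truths):
    return [e - t for e, t in zip(estimates, truths)]


def bias(estimates, truths):
    """Mean error: positive values indicate overestimation on average."""
    errs = errors(estimates, truths)
    return sum(errs) / len(errs)


def error_variance(estimates, truths):
    """Population variance of the errors around their mean."""
    errs = errors(estimates, truths)
    b = sum(errs) / len(errs)
    return sum((err - b) ** 2 for err in errs) / len(errs)


def rmse(estimates, truths):
    errs = errors(estimates, truths)
    return math.sqrt(sum(err ** 2 for err in errs) / len(errs))


# Made-up MET estimates and criterion values.
est = [2.0, 3.5, 5.0, 7.5]
obs = [1.5, 4.0, 5.0, 8.0]
lhs = rmse(est, obs) ** 2
rhs = error_variance(est, obs) + bias(est, obs) ** 2
```

Here `lhs` and `rhs` agree up to rounding, so when two methods both have bias near zero, any difference in rMSE comes from the variance term.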

Figure 2, *A* and *B*, shows the estimated residuals for the linear regression and random forest estimates of METs. Figure 2, *A* and *B*, indicates that all methods tend to overestimate METs when the actual METs are low and underestimate when actual METs are high, even for methods that are unbiased overall. These over- and underestimations suggest that there are other factors influencing the relationship between activity and METs that are not accounted for by wrist acceleration. This may also be an instance of regression to the mean.

Figure 3 shows the average MET estimate for random forests and linear regression (wrist) by activity and indicates a good agreement between estimates and the criterion measure on average. The smaller variability of the random forest can be seen as well.

Table 4 and Fig. 1*B*, *left*, contain more detail about the performance of the classification estimates of MET level. The decision tree method estimates MET level and classifies sedentary behavior as well as the more sophisticated machine-learning methods. The machine-learning methods identify locomotion slightly better than the decision tree, but the difference is small. All methods for wrist data can identify MET level, sedentary time, and locomotion relatively well.

The machine-learning methods are cumbersome to illustrate; instead they are available as R code and objects from the first author. The linear regression model to estimate METs from wrist acceleration has the form METs = β₀ + β₁ × sdvm + β₂ × mangle, where sdvm is the standard deviation of the vector magnitude for the interval and mangle is the mean angle of acceleration relative to vertical on the device for the interval (see Table 2 for the equation to compute mangle). We recommend that if the model estimates a MET level less than 1.0, a value of 1.0 should be used. In our dataset, this situation occurred only 0.7% of the time (27 out of 3,660 intervals).
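The recommended floor at 1.0 MET can be sketched as below for a two-input linear model in sdvm and mangle. The coefficient values are hypothetical placeholders chosen only to exercise the code; they are not the fitted coefficients from this study, and the function and parameter names are illustrative.

```python
MET_FLOOR = 1.0  # recommended minimum estimate stated in the text


def estimate_mets(sdvm, mangle, intercept, b_sdvm, b_mangle):
    """Linear estimate of METs from sdvm and mangle, floored at 1.0 MET."""
    raw = intercept + b_sdvm * sdvm + b_mangle * mangle
    return max(MET_FLOOR, raw)


# Hypothetical coefficients for illustration only (NOT the study's fitted values).
EXAMPLE_COEFS = {"intercept": 2.0, "b_sdvm": 5.0, "b_mangle": -0.02}

floored = estimate_mets(0.0, 80.0, **EXAMPLE_COEFS)     # raw value is 0.4
unfloored = estimate_mets(0.4, -50.0, **EXAMPLE_COEFS)  # raw value is 5.0

# Share of intervals that needed the floor in the study's data, in percent.
floor_rate_percent = round(100 * 27 / 3660, 1)
```

The floor only changes estimates that would otherwise fall below resting level, which the text reports was rare in the calibration data.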

We present the decision tree classification models next, and the variable definitions are in Table 2. We use a tree model to estimate activity intensity (MET level) in one of three categories: light (METs <3), moderate (≥3, <6), and vigorous (≥6). The estimated model is below.

#### Activity intensity estimation algorithm.

If sdvm ≤ 0.26 and mangle > −52, then Light.

If sdvm ≤ 0.26 and mangle ≤ −52, then Moderate.

If 0.26 < sdvm ≤ 0.79 and mangle > −53, then Moderate.

If 0.26 < sdvm ≤ 0.79 and mangle ≤ −53, then Vigorous.

If sdvm > 0.79, then Vigorous (for any mangle).
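The five rules above translate directly into a short function. The sketch below uses the thresholds exactly as listed; the function name is illustrative, and sdvm is assumed to be in g and mangle in degrees.

```python
def classify_intensity(sdvm, mangle):
    """Classify one interval as Light, Moderate, or Vigorous from the listed rules."""
    if sdvm <= 0.26:
        return "Light" if mangle > -52 else "Moderate"
    if sdvm <= 0.79:
        return "Moderate" if mangle > -53 else "Vigorous"
    return "Vigorous"  # sdvm > 0.79, for any mangle


# Example intervals: (sdvm in g, mangle in degrees).
labels = [classify_intensity(s, m)
          for s, m in [(0.1, -30), (0.1, -70), (0.5, -30), (0.5, -60), (1.0, 0)]]
```

Because the tree has only two inputs and three thresholds on sdvm and mangle, the same logic could be written as nested IF formulas in a spreadsheet.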

Below are the two decision-tree models to classify whether an interval is sedentary or not and whether it is locomotion or not.

#### Sedentary or not estimation algorithm.

Figure 4, *A–C*, describes these models graphically and can be interpreted as follows. One figure is displayed for each of the three classification models, and each figure has one of the statistics that is used to classify on the *x*-axis and another on the *y*-axis. The accelerometer statistics from each time interval define a point in the panel, and each time interval is classified according to where it falls in the panel. The classification tree defines the boundary for each class. For instance, the MET level classification model uses the standard deviation of the vector magnitude (sdvm, *x*-axis) and the mean angle (mangle, *y*-axis) to classify each time interval. If a 15-s interval had a sdvm = 0.1 g and an angle of −30°, it would be classified as Light. If the standard deviation remained at 0.1 and the angle decreased to −70°, the boundary at −53° would be crossed, and the interval would be classified as Moderate.

## DISCUSSION

We have developed and evaluated statistical models to estimate aspects of physical activity from an ActiGraph GT3X+ that is worn on the wrist and collects triaxial data at 80 Hz. The models estimate MET-hours, time in different activity intensity categories (light, moderate, and vigorous), the amount of time the wearer is sedentary or not, and the amount of time the wearer is undertaking locomotion or not. We consider two types of statistical models: sophisticated machine-learning models (neural networks, support vector machines, and random forests) and simpler methods (multiple linear regression and decision trees) that could be implemented in a spreadsheet. All models are available from the authors. As inputs, both sets of models use summaries of the accelerometer signals that can be obtained from ActiGraph ActiLife software.

This investigation provides further evidence that acceleration measurements from the wrist can be used to estimate energy expenditure accurately and relatively precisely. Starting with high-frequency, triaxial wrist GT3X+ data, the sophisticated machine-learning methods estimated energy expenditure more precisely than the multiple linear regression approach, but the sophisticated methods performed quite similarly to the decision tree methods for estimating intensity level. Cross-validated estimates of both sets of approaches to the wrist data were more accurate and precise than methods from Freedson et al. (9) and Crouter et al. (6), which use low-frequency one-axis hip acceleration measurements. Those methods were developed on different datasets, and our methods were evaluated by cross validation. That may explain some of the differences in performance. Additionally, we note that the primary purpose of the current study is to develop methods to estimate aspects of physical activity and inactivity from high-frequency triaxial wrist acceleration measurements. It is beyond the scope of this article to consider all of the pros and cons of measuring acceleration at various locations, sampling frequencies, and numbers of axes.

The new methods tend to overestimate METs at lower MET levels and underestimate METs at the higher MET levels (Fig. 2, *A* and *B*). This is not surprising as wrist motion and acceleration is probably a relatively larger fraction of total activity at lower intensity levels and a relatively smaller one at higher intensity levels. As others have shown, accelerometer signals from sensors worn at multiple locations reduce these systematic errors (2). Additionally, both sets of analytics applied to the wrist data were able to detect both sedentary time and locomotion time. Algorithms to accurately detect locomotion and sedentary time from a wrist-worn accelerometer may be useful and meaningful metrics for the NHANES accelerometer study to examine the relationship between these objectively measured behaviors relative to health outcomes. Our results suggest that when the right summaries of acceleration are used as inputs, useful estimates of physical activity and sedentary behavior can be made without reliance on “black-box” machine-learning algorithms. One limitation of the current study is that acceleration was measured on the dominant wrist, rather than the nondominant wrist that is used in NHANES. The effect of this difference on the validity of our estimates is unclear. Additionally, differences in accelerometer orientation may cause some acceleration summary statistics to be different depending on whether the accelerometer is worn on the right or left wrist. We believe that these issues warrant investigation. We note that Zhang et al. (25) found little difference in classification accuracy using data from the left or the right wrist.

A second limitation is that our linear regression and decision tree models both use only two statistical summaries as inputs to calculate their estimates. While we chose to use two inputs to make the models visually interpretable (e.g., Fig. 4, *A–C*), it is certainly possible that models with more (or even fewer) inputs would perform better. There is limited room for improvement in some of the classification tasks, but there is more room for improvement in the estimate of METs and the classification of activity intensity.

An anonymous referee raised the question of whether our nonsedentary tasks included intervals of standing still and pointed out that classification of sedentary behavior (vs. not) probably would be much easier if standing still was not included. While we do not have a video record to see exactly what each subject was doing, we did examine detailed plots of acceleration vector magnitude from the wrist over time for each subject and activity. These plots revealed that, similar to driving and office work, the golf, basketball, laundry, gardening, raking, vacuuming, and dusting activities all include stretches of time when the vector magnitude remains at 1 g with only small variation. This suggests standing still, but in the future it is recommended that video recordings of free-living behavior be obtained to be able to identify standing time within each activity segment.

A final limitation is that the current investigation uses calibration data that were collected from a relatively small group of young subjects who completed a scripted set of activities. While these data cover a range of activities (including driving) and energy expenditure levels, several recent investigators have found that models estimated from laboratory-based data can perform poorly when applied to data from free-living people (8, 10). We preliminarily investigated this by applying the new methods to wrist data from two additional individuals who were directly observed by trained observers for 2 h each. These new participants were free-living in the sense that they were not told what to do, and the observers recorded what these participants did using a protocol similar to one described in Ref. 16. Figure 5 summarizes the results. These methods and results are promising, but the question of how methods described in this investigation will generalize to free-living people requires further investigation.

## GRANTS

This project was supported by National Cancer Institute Grant R01-CA-121005 at the University of Massachusetts Amherst. The work of S. He was supported by a gift by Joan Barksdale to support research experience for undergraduates in the Department of Mathematics and Statistics at the University of Massachusetts Amherst.

## DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

## AUTHOR CONTRIBUTIONS

Author contributions: J. Staudenmayer conception and design of research; J. Staudenmayer and S.H. analyzed data; J. Staudenmayer, A.H., J. Sasaki, and P.S.F. interpreted results of experiments; J. Staudenmayer and S.H. prepared figures; J. Staudenmayer drafted manuscript; J. Staudenmayer, A.H., J. Sasaki, and P.S.F. edited and revised manuscript; J. Staudenmayer, A.H., J. Sasaki, and P.S.F. approved final version of manuscript; A.H. and J. Sasaki performed experiments.

## ACKNOWLEDGMENTS

We thank three anonymous referees for comments that substantially improved this paper.

- Copyright © 2015 the American Physiological Society