## Abstract

Phase 2 pulmonary oxygen uptake kinetics (ϕ2 τV̇o_{2P}) reflect muscle oxygen consumption dynamics and are sensitive to changes in state of training or health. This study identified an unbiased method for data collection, handling, and fitting to optimize V̇o_{2P} kinetics estimation. A validated computational model of V̇o_{2P} kinetics and a Monte Carlo approach simulated 2 × 10^{5} moderate-intensity transitions using a distribution of metabolic and circulatory parameters spanning normal health. Effects of averaging (interpolation, binning, stacking, or separate fitting of up to 10 transitions) and fitting procedures (biexponential fitting, or ϕ2 isolation by time removal, statistical, or derivative methods followed by monoexponential fitting) on accuracy and precision of V̇o_{2P} kinetics estimation were assessed. The optimal strategy to maximize accuracy and precision of τV̇o_{2P} estimation was 1-s interpolation of 4 bouts, ensemble averaged, with the first 20 s of exercise data removed. Contradictory to previous advice, we found optimal fitting procedures removed no more than 20 s of ϕ1 data. Averaging method was less critical: interpolation, binning, and stacking gave similar results, each with greater accuracy compared with analyzing repeated bouts separately. The optimal procedure resulted in ϕ2 τV̇o_{2P} estimates for transitions from an unloaded or loaded baseline that averaged 1.97 ± 2.08 and 1.04 ± 2.30 s from true, but were within 2 s of true in only 47–62% of simulations. Optimized 95% confidence intervals for τV̇o_{2P} ranged from 4.08 to 4.51 s, suggesting a minimally important difference of ~5 s to determine significant changes in τV̇o_{2P} during interventional and comparative studies.

**NEW & NOTEWORTHY** We identified an unbiased method to maximize accuracy and precision of oxygen uptake kinetics (τV̇o_{2P}) estimation. The optimum number of bouts to average was four; interpolation, bin, and stacking averaging methods gave similar results. Contradictory to previous advice, we found that optimal fitting procedures removed no more than 20 s of phase 1 data. Our data suggest a minimally important difference of ~5 s to determine significant changes in τV̇o_{2P} during interventional and comparative studies.

- oxygen uptake kinetics
- accuracy and precision
- data handling
- computational modeling

at the onset of constant power exercise below the lactate threshold (LT) in humans, mitochondrial oxidative phosphorylation and, subsequently, muscle oxygen uptake (V̇o_{2m}) in activated muscle increase in a manner that is an approximate first-order exponential in vivo (2, 22, 48; cf. 30). The kinetics of phase (ϕ) 2 of the pulmonary V̇o_{2} (V̇o_{2P}), characterized by the response time constant (τ) from repeated breath-by-breath gas exchange measurements, are commonly used to infer V̇o_{2m} kinetics and provide a noninvasive tool to investigate the control of exercise energetics (27, 41, 46). Fast ϕ2 V̇o_{2P} kinetics reflect effective cardiopulmonary and neuromuscular integration and are associated with high-endurance exercise performance (29, 38, 41), whereas ϕ2 V̇o_{2P} kinetics are slowed in the elderly (1) and with chronic disease (12, 23, 40, 46, 51). In addition, ϕ2 V̇o_{2P} kinetics are sensitive to interventions that influence blood flow distribution and muscle O_{2} delivery, muscle metabolism, or muscle recruitment (41, 46), making them a useful prognosticator (49) and method for evaluation of therapeutic benefit (44). Furthermore, the kinetics of ϕ1 of the V̇o_{2P} response (ϕ1 duration and amplitude) are clinically discriminatory (50) and sensitive to age (37). Thus the strong link between V̇o_{2P} kinetics and state of health provides the basis for an inherently attractive, noninvasive, and effort-independent method to characterize the efficacy of the integrated physiological systems response to exercise.

Although there are general guidelines for characterizing V̇o_{2P} kinetics in terms of data collection, processing, and fitting procedures (56), a range of proposals exist for each of these steps (e.g., 10, 14, 19, 20, 26, 33, 39, 58). However, a systematic quantification of the effects of these different procedures on the precision and accuracy of the final ϕ1 duration and amplitude and ϕ2 τV̇o_{2P} characterization, as well as a standardization of these procedures, is lacking.

This study therefore aimed to identify an unbiased (i.e., free from human error) method for V̇o_{2P} data collection, handling, and fitting that allows the most accurate and precise estimation of V̇o_{2P} kinetics. We identified this optimal criterion by systematically determining the influences of a range of common and uncommon collection, averaging, and fitting strategies on both the precision and accuracy of ϕ1 duration and amplitude and ϕ2 τV̇o_{2P} estimation, using a validated cardiopulmonary simulation of exercise gas exchange (8) and a Monte Carlo approach.

## THEORETICAL CONSIDERATIONS

The process linking V̇o_{2P} data collection in the laboratory or clinic, to kinetics characterization, is typically undertaken in three distinct steps: *1*) data collection, *2*) data processing, and *3*) data fitting.

#### Step 1: data collection.

Strategies employed in this step include identification of the optimal algorithms for calculating breath-by-breath gas exchange to improve signal-to-noise for kinetic fitting (6, 13, 14, 55). Strategies to improve primary V̇o_{2P} data also include the repetition of identical bouts of exercise with the intention of combining and averaging those data in the data processing step (Fig. 1*B*) (10, 26, 33, 57). The breath-by-breath fluctuations (also referred to as “noise”) inherent in any V̇o_{2P} measurement are uncorrelated (33) and have a Gaussian distribution in adults (although not in children; 42) with the standard deviation (SD) of this distribution ranging from ~30 to 110 ml/min (33, 47), independent of metabolic rate (33). What is less clear, however, is how different signal-to-noise ratios (or, analogously, the number of combined exercise bouts) affect V̇o_{2P} kinetics estimation and, therefore, whether there is an optimal number of exercise bouts required to estimate V̇o_{2P} kinetics to a given level of confidence.

#### Step 2: data processing.

After the removal of outlying breaths generated by swallows or coughs or other “mistriggers” of the breath identification algorithms, and unrelated to tidal breathing [typically those breaths more than 3 or 4 SDs from the local mean (33, 57)], the second step involves averaging of the data collected from multiple exercise bouts to obtain a single (processed) V̇o_{2P} signal with a high signal-to-noise ratio, before kinetic characterization. Several averaging techniques are employed (Fig. 1, *C–E*), the most widely used involving some form of interpolation and/or averaging. Linear interpolation of data before averaging (commonly to 1-s intervals) is necessary to normalize gas exchange sampling frequency, from the nonuniform breath-by-breath sampling, and therefore ensure equal weighting of data among repeated trials (Fig. 1*C*) (57). Averaging may be in the form of postinterpolation ensemble averaging (56), or by arranging uninterpolated data from all bouts in time (10) before averaging the combined breaths into bins whose size depends on the number of averaged bouts (38) or time (9, 26) (Fig. 1*D*). This “binning” approach to averaging, although improving the signal-to-noise ratio, may help to maintain the density of the data close to that at which it was collected (i.e., breathing frequency), and improve the validity of the estimated confidence intervals (21, 38). Despite the general popularity and acceptance of these approaches, several other data processing methods warrant investigation. Recent simulation studies have suggested that simple superimposition of all data from all bouts before fitting can give accurate ϕ2 τV̇o_{2P} estimates, with the added simplicity of reducing the requirement for complex data treatments (Fig. 1*E*) (19). Another alternative averaging approach, and maybe one that is statistically more robust (16) yet is not typically used for estimating V̇o_{2P} kinetics, involves fitting the individual exercise bouts, then averaging the resulting fit parameters (32). Kier et al. (26) showed that various stacking, interpolation, and bin or ensemble averaging procedures had essentially no effect on the precision of subsequent τV̇o_{2P} estimation. It remains unclear, however, how averaging strategies affect both the precision and accuracy of V̇o_{2P} kinetics estimation in the context of different numbers of averaged bouts and different approaches to fitting the data.

#### Step 3: data fitting.

The third step involves the fitting of the processed V̇o_{2P} data to obtain an estimate of the kinetics of V̇o_{2P}. The V̇o_{2P} response to a step change in work rate in the moderate-intensity domain consists of an initial “cardiodynamic” phase (largely a result of increased blood flow through the pulmonary circulation; 56) followed by a “fundamental” phase, the kinetics of which closely represent those of V̇o_{2m} in young healthy adults (Fig. 1*A*) (22, 48). This entire response has been described mathematically using a piecewise biexponential equation of the form
(1)where *t* is time, V̇o_{2P,base} is baseline V̇o_{2P}, *A*_{1} and *A*_{2} are the amplitudes of the first and second phases of the response, τ_{1} and τ_{2} are time constants associated with each phase of the response, *TD* is a time delay, and *H*(*t*) is the Heaviside step function (cf. 36). Generally, the parameter of most interest is τ_{2}, i.e., ϕ2 τV̇o_{2P}. However, ϕ1 is a complex physiological construct, influenced by several processes including changes in mixed-venous gas tensions, pulmonary perfusion, and end-expiratory lung volume, which sum to generate a response that often deviates from a monoexponential (15, 55). In addition, there are several practical difficulties when using *Eq. 1* to fit V̇o_{2P} data: Phase 1 typically contains only a few breaths (typically 5 or 6 in our simulations; see Fig. 1*B*), and fitting so few data points with the first exponential term in *Eq. 1* drastically reduces the confidence of the parameter estimations in that first exponential term. The influence of this potentially unconfident ϕ1 fit continues into ϕ2, affecting τ_{2} (ϕ2 τV̇o_{2P}) estimation, particularly if the fit to the ϕ1 data does not reach a steady state before ϕ2 begins (i.e., at *t* = *TD*). Furthermore, most nonlinear least-squares algorithms used by data-fitting software (the Levenberg-Marquardt algorithm being the standard; 43) require the calculation of derivatives and cannot handle the Heaviside step function in *Eq. 1*; the parameters *A*_{1} and τ_{1} are shared over, and influenced by the data in, the two different subdomains (*t* < *TD* or ϕ1, and *t* ≥ *TD* or ϕ2), and the extents of the subdomains themselves are determined by the parameter *TD*. As such, fitting *Eq. 1* is difficult without custom implementation of alternative, potentially less robust, nonlinear fitting algorithms such as direct search methods (35). As the parameter of most interest is the time constant of ϕ2, an alternative (and the most commonly used) approach is to isolate the ϕ2 data, then fit these data with a monoexponential equation of the form
(2)
Such a monoexponential equation accurately describes the ϕ2 V̇o_{2P} response to moderate-intensity step exercise (4, 5) and can be handled by most nonlinear least-squares algorithms. If *Eq. 2* is used to fit the V̇o_{2P} data and obtain an estimate of ϕ2 τV̇o_{2P}, it is necessary to omit the ϕ1 data from the fit. The most widely used methods for removing ϕ1 data are empirically derived time-removal methods, where “at least” the first 20 s of data from the exercise transient are removed before fitting (7, 39, 54, 57). The rationale behind this strategy is that, because ϕ1 is expected to last less than 20 s and the ϕ2 V̇o_{2P} response is expected to be exponential, starting the fit from any given point past the ϕ1–2 transition will yield an identical time constant that represents the underlying ϕ2 kinetics; whereas starting the fit from any point before the ϕ1–2 transition will result in a larger (incorrect) time constant for ϕ2 (39, 54, 57). However, the ϕ2 V̇o_{2P} response is not truly exponential, but rather is a nonlinear distortion of a monoexponential V̇o_{2m} response (3, 5, 8, 25; cf. 18). Thus contrary to τV̇o_{2m}, the τV̇o_{2P} is not a “true” constant throughout the transient, and fitting an exponential equation from different points in such a nonexponential ϕ2 will yield varying values for τV̇o_{2P}; progressively larger values as the fit is started from later in ϕ2 (cf. 8). Such behavior is suggested in the empirical results of Murias et al. (39) where τV̇o_{2P} becomes larger as the imposed exponential fit is started from later in the exercise transient, at least in older adults. Although τV̇o_{2P} is influenced by a complex interaction of circulatory and gas exchange responses to exercise, and ϕ2 V̇o_{2P} is not quite exponential, a monoexponential fit of moderate-intensity V̇o_{2P} kinetics remains a useful, concise, and effort-independent method to characterize the integrated dynamic responsiveness of cardiopulmonary and neuromuscular health. Nevertheless, it seems crucial that all data contained in the ϕ2 response, but none of the ϕ1 data, are fitted to obtain the most accurate characterization of V̇o_{2P} kinetics (57). As such, accurate identification of the ϕ1–2 transition is paramount.

When using the monoexponential Eq. 2 to fit V̇o_{2P} data, human error in selecting the ϕ1–2 transition can lead to an unintended bias in τV̇o_{2P} estimation, and so an ideal, unbiased method for isolating ϕ2 data for such a fit would be based on either *1*) identification of some consistent time period (rather than leaving the choice to the individual researcher) at the start of exercise during which data should be removed, or *2*) some other information in the data itself that could algorithmically identify the ϕ1–2 transition.

Rather than employing empirical time-removal methods, the abrupt change in V̇o_{2P} at the ϕ1–2 transition may be identifiable from the V̇o_{2P} data using either the peak time derivative of the V̇o_{2P} data (34) or statistical measures reflecting the best confidence in the fit parameters [e.g., the smallest confidence interval of the obtained time constant (48)]. Although theoretically sound, in that both methods can identify abrupt changes in a continuous signal, their application to experimental V̇o_{2P} data may be hindered by the low sampling rate (relative to the duration of ϕ1) and noise inherent in those data. Whether the use of derivatives or statistical methods to identify the ϕ1–2 transition results in improved τV̇o_{2P} estimates over the empirical time-removal methods currently favored remains to be investigated.

Several studies have examined the effects of the different strategies employed in the three steps described above on the confidence of V̇o_{2P} kinetic parameter estimates using experimental data [e.g., ϕ1–2 transition and ϕ2 τV̇o_{2P} (10, 26, 39, 54)]. However, a limitation of such studies is that the true underlying V̇o_{2P} kinetic parameters are unknown: such experimental methods can therefore give an indication of the precision of V̇o_{2P} kinetics estimation but not of its accuracy. Computational approaches using Monte Carlo methods (17) can overcome some of these limitations. For this, a simulation is first used to produce a clean, continuous V̇o_{2P} trace with known kinetic parameters. This trace is then sampled using simulations of breathing frequency, and Gaussian noise is added (using known characteristics) to produce a data set with similar sampling, noise, and kinetic characteristics as experimentally obtained V̇o_{2P} data, but where the underlying V̇o_{2P} kinetic parameters are known (33). In addition, the same clean trace can be randomly resampled and new noise added to produce further noisy data sets (but all with the same underlying kinetic parameters), analogous to obtaining experimental V̇o_{2P} data during repeated bouts of exercise from a single subject. Thus these Monte Carlo methods allow both the precision and accuracy of V̇o_{2P} fitting methods to be systematically assessed.

Computational approaches have been previously applied using a simple delayed monoexponential (19, 20) or a biexponential (10, 33) V̇o_{2P} response generated in silico. However, as the underlying V̇o_{2P} kinetics do not follow a simple mono- or biexponential time course (3, 5, 8), it is necessary to use a validated simulation of V̇o_{2P} kinetics that takes into account how circulatory dynamics modulate the monoexponential V̇o_{2m} response to produce the ϕ1 and ϕ2 V̇o_{2P} responses (8). Such computationally produced data sets can therefore contain the influence of normal variation in the steady states and kinetics of, for example, cardiac output, muscle blood flow, and V̇o_{2m}, to derive a distribution of V̇o_{2P} characteristics (including ϕ1 duration and amplitude, and ϕ2 τV̇o_{2P}), analogous to collecting experimental V̇o_{2P} data from a large number of healthy human subjects.

## METHODS

We used a validated simulation of V̇o_{2} and circulatory dynamic interactions during moderate-intensity cycling exercise in humans (8) that accounts for the vascular capacitances and circulatory dynamics that cause a monoexponential V̇o_{2m} response to manifest at the lungs as a three-phase V̇o_{2P} response, with a cardiodynamic ϕ1, a near-exponential fundamental ϕ2, and a steady-state ϕ3. The simulation V̇o_{2P} outputs initially have no noise, so the baseline V̇o_{2P} steady state, ϕ1 duration and amplitude, ϕ2 τV̇o_{2P}, and ϕ3 V̇o_{2P} steady state for each output are precisely known. This allows quantification of both the accuracy and the precision of subsequent fits to the data.

#### Data production.

The minimum required number of Monte Carlo iterations, *n*, was estimated from the central limit theorem (17) using *n* = (*z*_{α/2}σ/ε)^{2}, where *z*_{α/2} is the *z* score associated with significance level α, σ is the estimated SD of the simulation output, and ε is the acceptable margin of error for the simulation output (equal to half the required confidence interval). We set α at 0.05 to give *z*_{α/2} = 1.96; it was assumed that the SD of ϕ2 τV̇o_{2P} (our parameter of interest) produced by stochastic simulations would be 4.3 s [based on the experimental data used to parameterize the simulations (8, 22)]; and the acceptable margin of error was set at 0.1 s (the same as the simulation time resolution). This predicted a minimum iteration number of = 7,104; we therefore performed 10^{4} iterations during the Monte Carlo simulations.

We examined two protocols for a step increase in work rate (WR), both constrained to be within the moderate-intensity exercise domain: the first from unloaded pedaling (UP-WR) and the second from a raised baseline (WR-WR). For each of these two protocols, 10^{4} clean (time resolution = 0.1 s) V̇o_{2P} simulations, each with different kinetics, were produced (see Fig. 1*A* for an example). The start of the step increase in WR was set to *t* = 0 s. Simulation input parameters were varied stochastically (43) using distributions taken from the data of Grassi et al. (22) and Benson et al. (8) (Table 1). This provided simulations with normal physiological variation in, for example, baseline V̇o_{2P}, V̇o_{2P} gain (ΔV̇o_{2P}/ΔW), the relative increase in cardiac output (ΔQ̇_{m}/ΔV̇o_{2m}), and the kinetics of cardiac output and V̇o_{2m} (τQ̇_{m}/τV̇o_{2m}). Parameter sets that resulted in venous O_{2} concentration dropping to zero at any point during the simulated exercise transient were discarded, and a new parameter set was generated.

Each of these 2 × 10^{4} clean traces (one set of UP-WR, and one set of WR-WR simulations) was then sampled at a variable breathing frequency. The sampling interval was based on the relationship between breathing frequency (bf) and V̇o_{2P} in data collected during moderate-intensity exercise in our laboratory, and was given by bf(*t*) = 8 × V̇o_{2P}(*t*) + 8. Gaussian noise with an SD of 0.25 × bf(*t*) was subsequently added to this interval (11, 28), with the noise constrained to be no greater than 2 SDs to avoid unphysiologically large intervals between sampled “breaths.”

We then added Gaussian V̇o_{2P} noise to each “breath”: the SD of this noise distribution was randomly sampled for each clean trace from a Gaussian distribution with a mean of 67.96 ml/min and an SD of 25.54 ml/min [calculated from the individual values reported in Lamarra et al. (33) and Rossiter et al. (47); *n* = 22], with the obtained value constrained to be within 2 SD of the mean, to avoid data sets that were unphysiologically noisy.

These procedures produced, from the clean simulation output, a trace with the sampling, noise, and kinetic characteristics observed in experimentally collected data (see Fig. 1*B* for examples). For all 2 × 10^{4} clean simulations, this sampling and noise procedure was performed 10 times to simulate 10 bouts of exercise repeated by a single subject (see Fig. 1, *A* and *B*, for examples). At the end of this Monte Carlo procedure, we therefore had 10^{4} noisy UP-WR data sets, i.e., 10^{4} “subjects,” each with different physiological characteristics, who performed moderate-intensity step exercise from unloaded pedaling: each data set contained 10 noisy traces from separate “exercise bouts,” i.e., each subject performed the same WR protocol 10 times. A further 10^{4} noisy WR-WR data sets, with each data set again containing 10 traces from separate exercise bouts, were produced. Thus a total of 2 × 10^{5} simulated moderate-intensity “exercise bouts” in 2 × 10^{4} “subjects” were produced, which sampled the normal variation of key parameters observed in healthy young humans. Note that, despite the sampling and noise procedure used to produce the data, the true underlying kinetic characteristics of any given noisy trace were known from the kinetics of the original clean simulation from which it was produced.

#### Data processing.

Outlying breaths were first removed by fitting *Eq. 2* to the noisy traces and removing breaths that lay further than 3 SDs away from the local mean (i.e., outside the 99.7% prediction bands of the fit) (33). For each data set, we used the following data processing techniques, covering a range of commonly used or potentially useful methods, to process up to 10 bouts of noisy data (see Fig. 1 for examples): *1*) interpolation of each bout to 1-s intervals before ensemble averaging across bouts (“interpolated”); *2*) time alignment of data from the bouts to be averaged, before bin averaging into bins whose size depends on the number of bouts being averaged (“binned”); *3*) superimposition, or stacking, of the data from different bouts, with no further interpolation or averaging (“stacked”); *4*) fitting of individual bouts (see below) followed by averaging of fit parameters across bouts (“separate”).

#### Data fitting.

For each processed V̇o_{2P} trace, we fit the biexponential *Eq. 1* to the entire ϕ1 and ϕ2 data, and used the following strategies for identification of the ϕ1–2 transition and subsequently fit the monoexponential *Eq. 2* to the isolated ϕ2 data: *1*) empirical time-removal methods, where 10, 15, 20, 25, or 30 s of data was removed from the beginning of each processed V̇o_{2P} trace; *2*) use of V̇o_{2P} time derivatives on both unsmoothed and smoothed (with a moving 5-breath average) processed data, where the highest derivative of V̇o_{2P} with respect to time during the first 60 s of exercise was taken as the ϕ1–2 transition; *3*) statistical methods to identify the ϕ1–2 transition, where a datum was incrementally removed from the beginning of each data set (until 60 s into exercise) and the remaining data were fit using the monoexponential *Eq. 2*; the reduced chi-squared (), adjusted coefficient of determination (), confidence interval for the time constant (CI_{τ}) and the corrected Akaike information criterion (AICc) were then calculated for each fit (42, 46). The first datum in the fit that returned the minimum statistical value (or maximum for ) was taken as the identified ϕ1–2 transition for that statistical method [see Rossiter et al. (48) for an example using CI_{τ} to identify the ϕ1–2 transition]. For each processed trace we therefore obtained 12 fits to the data: one using the biexponential fit to the entire ϕ1 and ϕ2 data, and 11 using a monoexponential fit to isolated ϕ2 data (five using empirical time removal methods, two using V̇o_{2P} time derivatives, and four using statistical measures). As a control condition, for each processed trace we also fit the true isolated noisy ϕ2 data with *Eq. 2*, i.e., the data were fit beginning at the true first “breath” in ϕ2, known from the clean simulation. Each of these 13 methods provided an estimate of the ϕ1–2 transition (i.e., *TD* from *Eq. 1* when using the biexponential fit, or the identified first breath in ϕ2 when using the monoexponential fits) and an estimate of ϕ2 τV̇o_{2P} (i.e., τ_{2} from fits using *Eq. 1*, or τ from fits using *Eq. 2*). The ϕ1 amplitude (as a percentage of the steady-state response) was estimated from the value of the fit at the identified ϕ1–2 transition. Each of the ϕ1–2 transition, ϕ1 amplitude, and ϕ2 τV̇o_{2P} estimates were then compared with the known true underlying values obtained from the clean simulated V̇o_{2P} trace. These true values represent the most accurate estimates possible of ϕ1 and ϕ2 V̇o_{2P} kinetics.

#### Numerical methods and statistical analyses.

Details of the model used to produce the clean V̇o_{2P} data, along with numerical methods, are given in Benson et al. (8). Because of its unique piecewise nature, *Eq. 1* was fit using a custom direct search method (35), although this precluded calculation of parameter confidence intervals. *Equation 2* was fit using the Levenberg-Marquardt algorithm (43). Values are presented as means ± SD unless otherwise stated. Significant differences between data were tested for using two-sample *t*-tests, or one-way repeated-measures analysis of variance (ANOVA) with Tukey’s post hoc tests, as appropriate. Significance level was set at *P* < 0.05.

## RESULTS

#### Simulation outputs.

Simulation input WR and output V̇o_{2P} characteristics are summarized in Table 2. Time of the ϕ1–2 transition was significantly different between UP-WR and WR-WR simulations (19.8 ± 3.4 vs. 15.9 ± 3.4 s, respectively; *P* < 0.05, *t*-test), as was ϕ1 amplitude (reported as percentage of the steady-state response: 28.2 ± 8.3% vs. 28.9 ± 7.7%, respectively; *P* < 0.05, *t*-test) and ϕ2 τV̇o_{2P} (22.4 ± 7.2 vs. 25.0 ± 7.2 s, respectively; *P* < 0.05, *t*-test). These different V̇o_{2P} characteristics from UP-WR and WR-WR protocols can be explained by the increased baseline cardiac output associated with starting an exercise transition from a raised WR: muscle-to-lung transit time is shortened, reducing ϕ1 duration (3), and the altered blood flow during the exercise transient modifies the association between muscle and pulmonary V̇o_{2} kinetics (8). The Monte Carlo simulation output data (10^{4} clean UP-WR traces and 10^{4} clean WR-WR traces, along with the corresponding 2 × 10^{5} noisy traces, and details of the input and output characteristics for each simulation) are available from the corresponding author upon request.

The results below present in detail the findings for UP-WR simulations. The key differences between UP-WR and WR-WR simulations are then presented. For the sake of brevity, we present only data pertinent to our significant findings.

#### Number of averaged exercise bouts.

Figure 2 shows the effects of averaging exercise bouts on the precision and accuracy of ϕ2 τV̇o_{2P} estimation (generally the parameter of most interest) during UP-WR simulations. For this example, data from different bouts were interpolated to 1-s intervals then ensemble averaged (see *Averaging methods* below), and fitting was made beginning at the known first breath in ϕ2 (i.e., control fits). Qualitatively similar results were found for the other averaging and fitting methods. The mean and SD of the estimated ϕ2 τV̇o_{2P} are shown in Fig. 2*A*, and example distributions of the estimated ϕ2 τV̇o_{2P} for 1, 4, and 10 exercise bouts are shown in Fig. 2*B*. The ϕ2 τV̇o_{2P} estimates obtained by averaging 1, 2, or 3 bouts were significantly greater than using 10 bouts (*P* < 0.05, ANOVA; there was no difference when averaging 4–9 bouts; Fig. 2*A*). This indicates that precision and accuracy of ϕ2 τV̇o_{2P} estimation is not statistically improved by averaging data from more than four bouts of exercise.

Figure 2, *A* and *B*, demonstrates that τV̇o_{2P} tends to be overestimated on average by ~2 s, irrespective of the number of bouts averaged: mean difference between estimated and true τV̇o_{2P} was 1.92 ± 4.24 s with 1 bout, 1.68 ± 2.06 s with 4 bouts, and 1.62 ± 1.37 s with 10 bouts. Figure 2*C* shows the percentage of estimated ϕ2 τV̇o_{2P} values that lay within ±2 s of true. Using data from a single exercise bout, the estimated ϕ2 τV̇o_{2P} was within 2 s of the true value in only 41.3% of cases. When 4 bouts were averaged, the percentage of estimated values within 2 s of the true value increased to 53.0%, even when the first breath in ϕ2 is known precisely (see also *Data fitting and kinetic characterization*). The asymptote of this relationship is 62.0% (Fig. 2*C*), indicating that the maximum probability of returning a ϕ2 τV̇o_{2P} estimate within 2 s of true is 62%, even when the first breath in ϕ2 is known and no matter how many bouts are averaged.

#### Averaging methods.

Figure 3*A* shows the effects on ϕ2 τV̇o_{2P} estimation of the different averaging methods during UP-WR simulations. For the example shown, data from four exercise bouts were averaged and fitting was from the known first breath in ϕ2 (i.e., control fits). Qualitatively similar results were found for other numbers of averaged bouts and for the other fitting methods. Each averaging method returned significantly different ϕ2 τV̇o_{2P} estimates (*P* < 0.05, ANOVA), although the mean ϕ2 τV̇o_{2P} values obtained using the interpolated, binned, and stacked averaging methods were quantitatively very similar, being within 0.1 s of each other (i.e., within the acceptable margin of error set for our Monte Carlo simulations). Mean ϕ2 τV̇o_{2P} estimation with the interpolation method was 1.68 ± 2.06 s from true (53.0% of values within ± 2 s of true), compared with 1.76 ± 2.17 s (50.7%) for binned, 1.72 ± 2.13 s (51.4%) for stacked, and 2.04 ± 2.34 s (46.9%) for separate. The distribution of the confidence intervals of the estimated ϕ2 τV̇o_{2P} are shown in Fig. 3*B*. Each averaging method returned a significantly different confidence interval distribution (*P* < 0.05, ANOVA), although the confidence interval distributions for the binned and stacked averaging methods were quantitatively similar (the difference between the means of these two distributions was 0.14 s).

#### Data fitting and kinetic characterization.

Figures 4–6 compare the different methods for estimating the ϕ1–2 transition (Fig. 4), and the subsequent estimation of ϕ1 amplitude (Fig. 5) and ϕ2 τV̇o_{2P} (Fig. 6), during UP-WR simulations. In Figs. 5 and 6, the distributions of ϕ1 amplitude and ϕ2 τV̇o_{2P} estimates obtained from control fits (i.e., fits from the known first ϕ2 breath) are shown as dashed curves. The examples shown use data from four bouts averaged using the interpolation method, although qualitatively similar results were found for other numbers of averaged bouts and for the other averaging methods. Only removal of the first 20 s of data (Figs. 4*B*, 5*B*, and 6*B*) resulted in the accurate identification of the first breath in ϕ2, and ϕ1 amplitude and ϕ2 τV̇o_{2P} values that were not significantly different from the control fits; all other methods were significantly different from true (*P* < 0.05, ANOVA). Using this empirical 20 s removal method, the identified ϕ1–2 transition was within ±2 breaths of true in 99.3% of cases, estimated ϕ1 amplitude was within ±5% of true in 32.6% of cases (vs. 34.2% with control fits), and estimated ϕ2 τV̇o_{2P} was within ±2 s of true in 46.5% of cases (vs. 53.0% with control fits). Although the biexponential fitting method (Figs. 4*A*, 5*A*, and 6*A*) returned the second best estimates of the ϕ1–2 transition (93.8% of estimates within ±2 breaths of true), the overparameterization of the model resulted in less accurate and precise ϕ2 τV̇o_{2P} estimates (only 32.0% of estimates within ±2 s of true) than both the empirical 15 s and 25 s removal methods (37.9% and 37.6%, respectively) (Figs. 4*B*, 5*B*, and 6*B*). Interestingly, removal of 15 s of data (i.e., including some ϕ1 data in the fit) gave more accurate and precise ϕ1 amplitude and ϕ2 τV̇o_{2P} estimates than removal of 25 s of data (i.e., excluding the initial portion of ϕ2 data). Basing ϕ1–2 identification on time-derivative or statistical methods resulted in skewed distributions (Fig. 4, *C* and *D*), and ϕ1–2 transition, ϕ1 amplitude, and ϕ2 τV̇o_{2P} values that were furthest from true (Fig. 5, *C* and *D*, and *Fig. 6, C* and *D*).

#### Optimal protocol.

Having identified that removal of the first 20 s of data, followed by a monoexponential fit to the isolated ϕ2 data, was the optimal fitting method for UP-WR transitions, we repeated the previous analyses that were performed on the control, i.e., known ϕ2, data (as shown in Figs. 2 and 3) using this empirical 20 s removal fitting method (Fig. 7). Qualitatively, the results were identical, in that four averaged bouts provided no more accuracy and precision than 10 averaged bouts, and the interpolated averaging method gave the most accurate and precise ϕ1–2 transition, ϕ1 amplitude, and ϕ2 τV̇o_{2P} estimates, that were not significantly different from the control fits. Quantitatively, the mean estimate of the ϕ1–2 transition was 0.06 ± 0.85 breaths from true, with 99.3% of values within ±2 breaths of true; the mean ϕ1 amplitude estimate was 6.63 ± 10.61% from true (vs. 6.65 ± 4.46% from true with control data), with 32.6% of values within ±5% of true (vs. 34.2% with control fits); and the mean ϕ2 τV̇o_{2P} estimate was 1.97 ± 2.08 s from true (vs. 1.68 ± 2.06 s from true with control data), with 46.5% of estimates within ±2 s of true (vs. 53.0% with control fits). Again, the binned and stacked averaging methods gave very similar (but slightly less precise and accurate) ϕ2 τV̇o_{2P} estimates to the interpolated method: 2.00 ± 2.19 and 1.98 ± 2.16 s from true, respectively. Using the optimal methods, the asymptote of the exponential fit to the proportion of ϕ2 τV̇o_{2P} estimates within ± 2 s across all numbers of averaged bouts (Fig. 7*C*) was 51.3%.

#### WR-WR simulations.

The analyses performed for the UP-WR simulations (Figs. 2–7) were repeated for the WR-WR simulations, where “exercise” was initiated from a raised baseline WR between 0 and 100 W. These analyses are summarized in Fig. 8. As with UP-WR simulations, averaging of four bouts (Fig. 8, *A–C*), using interpolated, binned, or stacked data, optimized ϕ2 τV̇o_{2P} estimation while minimizing the number of required bouts (Fig. 8, *D* and *E*). However, for WR-WR data, removal of the first 15 or 20 s of data gave statistically similar results to control fits (where the first breath in ϕ2 is known), although quantitatively the removal of 15 s of data gave more precise and accurate estimates of V̇o_{2P} kinetics than removing 20 s of data: 97.2% (with 15 s removal) vs. 93.1% (with 20 s removal) of the ϕ1–2 transition estimates within ±2 breaths of true; 41.5% vs. 16.9% of ϕ1 amplitude values within ±5% of true; and 61.9% vs. 57.6% of ϕ2 τV̇o_{2P} values within ±2 s of true (Fig. 8*F*). Phase 2 τV̇o_{2P} estimation was more accurate for WR-WR data than for UP-WR data: using four interpolated and ensemble averaged bouts with ϕ2 isolated by removal of the first 15 s of data, the mean difference between estimated and known τV̇o_{2P} was 1.04 ± 2.30 s (vs. 1.97 ± 2.08 s with the optimal UP-WR analysis; *P* < 0.05, *t*-test) and the percentage of values lying within ±2 s of the true value was 61.9% (vs. 46.5% with UP-WR data). The asymptote of the exponential fit to these data (Fig. 8*C*) suggested that a maximum of 75.9% of ϕ2 τV̇o_{2P} values would lie within ±2 s of the true value (vs. 51.3% for UP-WR data).

#### Minimally important difference.

The optimal collection, handling, and fitting procedures for UP-WR and WR-WR simulations were used to determine the minimally important difference for significant changes in τV̇o_{2P} during moderate-intensity exercise. Table 3 shows that the 95% confidence limits of τV̇o_{2P} estimation narrows from 8.25 to 4.08 s for UP-WR, and from 9.43 to 4.51 s for WR-WR, as the number of bouts averaged is increased from 1 to 4. These data propose a minimal important difference of ~5 s to detect differences in τV̇o_{2P} among groups or within individuals for comparative or interventional studies.

#### Robustness of Monte Carlo simulations.

To confirm the robustness of the Monte Carlo simulations, the entire data production procedure was repeated (i.e., a second set of 10^{4} UP-WR and 10^{4} WR-WR clean simulations was produced, and noise was added to each trace 10 times, to give 2 × 10^{5} noisy traces) and these data were analyzed as described above. There were no differences in the key findings with this second set of simulations (data not shown). As with the original Monte Carlo data, the output data from this second set of Monte Carlo simulations (2 × 10^{4} clean and 2 × 10^{5} noisy traces, along with simulation input and output characteristics) are available from the corresponding author upon request.

## DISCUSSION

We used a validated computational model together with a Monte Carlo approach to produce 2 × 10^{5} simulated V̇o_{2P} data sets with similar sampling, noise, and kinetic characteristics as experimentally obtained V̇o_{2P} data. As the true underlying V̇o_{2P} kinetic parameters of these data sets were known from the clean simulation traces from which they were produced, we could assess both the accuracy and the precision of various averaging and fitting procedures on the estimation of τV̇o_{2P}; something that is not feasible using experimentally obtained data where the true underlying τV̇o_{2P} is not known. We showed that the optimal data handling steps to give the most accurate and precise estimation of τV̇o_{2P} were linear interpolation with ensemble averaging data from four bouts of exercise, followed by removal of the first 20 s (if exercise was from unloaded pedaling) or 15 s (if exercise was from a raised work rate) of data before monoexponential fitting of the isolated ϕ2 data. Variations on the averaging method led to substantially similar results, with the exception that the confidence interval for kinetic estimation was significantly wider for the technique of independently fitting repeats of the same exercise transition (the separate method). This suggests that different data processing techniques currently used among different laboratories is unlikely to substantially influence the derived parameters. However, it is of note that even the optimal procedures that we identified yielded τV̇o_{2P} estimates that were within 2 s of true in just 47% of simulations from unloaded pedaling, rising to only 62% for protocols where exercise started from a raised work rate.

#### Data collection.

The simulated data of exercise transitions either from unloaded pedaling or from a raised work rate spanned a wide range of variable and parameter estimates expected for subLT exercise (Table 2). Simulated ϕ1 duration ranged from 7 to 30 s and was 9% to 72% of the steady-state response in amplitude, and simulated ϕ2 τV̇o_{2P} spanned ~7 to 40 s, across transitions ranging from 50 to 150 W in amplitude, making our findings widely generalizable to the study of moderate-intensity V̇o_{2P} kinetics in healthy adults. We showed that averaging data from four exercise bouts optimized accuracy and precision of τV̇o_{2P} estimation, while minimizing experimental burden, regardless of the averaging or fitting methods subsequently used. Averaging more bouts did not give a significantly more precise or accurate estimation of τV̇o_{2P}. Some investigators may be willing to accept lower accuracy and precision in τV̇o_{2P} estimation to reduce the testing burden of four exercise bouts. For example, interpolating and averaging three bouts of UP-WR exercise, and removing 20 s of data to isolate ϕ2, resulted in τV̇o_{2P} estimations that were 2.00 ± 2.39 s from true, with 45.0% of these estimations within 2 s of true, a relatively small reduction in accuracy and precision compared with the same data handling method with four exercise bouts (1.97 ± 2.08 s and 46.5%). These differences are associated with an increase in the minimal detectable difference for τV̇o_{2P}, e.g., for use in comparative and interventional studies, from ~5 to ~6 s. The data shown in Table 3 can be used to inform such decisions.

Our four-bout data collection recommendation is only applicable to data that have similar breath-by-breath fluctuation characteristics as the data produced in our simulation studies (68 ± 26 ml/min). Nevertheless, our simulated transitions mimicked very well typical observations using many standard gas exchange measurement approaches. Our findings indicate that to provide more precise estimations of τV̇o_{2P} from experimental data, strategies should focus not on averaging additional exercise bouts, but on increasing the signal-to-noise ratio in the collected data. These findings echo those of Lamarra et al. (33), who also used a Monte Carlo approach to show that increasing V̇o_{2P} noise, expressed as a percentage of the steady-state change in the V̇o_{2P} response, increased the confidence intervals for the estimated fit parameters (ϕ1 duration and ϕ2 τV̇o_{2P}). We showed that approaches that increase the signal-to-noise ratio have a substantial effect on precision, but little effect on accuracy, of kinetic estimates. These fluctuations are expected to arise from the interaction of a number of variables, not least the breath-by-breath variations in tidal volume and pulmonary blood flow, within which fluctuation and timing of stroke volume and thoracic pressure changes may variably sum or counteract one another to give rise to fluctuations in gas exchange. Therefore, algorithms for breath-by-breath gas exchange measurement that reduce the inherent fluctuation of the data, e.g., by accounting for changes in alveolar gas storage, or by recharacterizing a breath to be equal to a tidal breathing cycle that returns to an identical end-expiratory lung volume (6, 13), would be expected to further reduce the testing burden while maintaining optimal precision and accuracy of kinetic estimates.

#### Data processing.

Although there are many possible methods for data averaging, the four techniques examined in this study (interpolation, binning, stacking, and separate fitting) provide a cross-section of the most commonly used methods. Although we have identified linear interpolation followed by ensemble averaging as the optimal method for averaging data [similar to the findings of Keir et al. (26)], both the breath binning and stacking methods produced quantitatively similar estimates of τV̇o_{2P}. As such, researchers who have previously used, or currently use, any of these methods should be confident that their choice of averaging procedure does not unduly influence their estimates of τV̇o_{2P}. Although averaging of the exponential fit parameters from separate bouts of exercise offers the simplicity of avoiding potentially complicated and assumption-laden averaging procedures on large data sets, τV̇o_{2P} estimation using this averaging method reduced accuracy and markedly lessened the confidence in the derived parameter estimates and should therefore be avoided. This likely arose because the influence on τV̇o_{2P} of breath-by-breath fluctuations is nonlinear: large “noise” in the early transient has more influence on τV̇o_{2P} than the same “noise” in the later transient (57). Therefore, data handling approaches that first reduce breath-by-breath fluctuations and then characterize the fit (rather than the other way around) appear to result in more robust parameterization of the kinetics.

Another cautionary note is evident in our data for the interpolation method of averaging. This method appears to return a substantially narrowed confidence interval for τV̇o_{2P} estimation (Figs. 3*B*, 7*E*, and 8*E*). However, because the confidence interval is dependent on the number of samples (i.e., breaths), interpolation artificially increases the sampling frequency of the original data. The interpolation method therefore returns an artificial confidence interval that is more dependent on the characteristics of the interpolation than on the original measurements (21). The true confidence interval of parameter estimation for the interpolation method is likely better reflected in the binned and stacked methods (Fig. 3*B*), which were substantially similar across all simulations.

Each data processing method investigated resulted in a similar degree of accuracy around the true value, and therefore approaches to data processing should focus on attempts to optimize the confidence of parameter estimation. As with data collection, valid and appropriate processing methods that reduce breath-by-breath fluctuations in the data will result in increased confidence.

#### Data fitting.

We found that empirical time removal methods to isolate the ϕ2 data for fitting resulted in significantly more accurate and precise estimations of τV̇o_{2P} than either a biexponential fit, or statistical and time-derivative methods to identify the ϕ1–2 transition followed by a monoexponential fit to the isolated ϕ2 data. The majority of published experimental studies that have quantified the kinetics of V̇o_{2P} have used such empirical time removal methods (usually removing the first 20 s of data), and so researchers have historically used the ϕ2 isolation method that we have now shown provides the most accurate and precise estimations of τV̇o_{2P}. Furthermore, this empirical time removal approach is far simpler to implement than the biexponential, statistical, or time-derivative methods. Previous recommendations have been to remove at least 20 s of data from the beginning of the data set to completely remove ϕ1 data, even though some data from the start of ϕ2 may also be removed (7, 57). However, our results suggest that, somewhat counterintuitively, it is better to include a small amount of data from the end of ϕ1 in the fitting procedure than exclude data from the start of ϕ2. This is seen in Figs. 5*B* and 6*B*, where ϕ1 amplitude and ϕ2 τV̇o_{2P} estimation for exercise from unloaded pedaling was more precise and accurate when the initial 15 s of data was removed than when the initial 25 s of data was removed (the true ϕ1–2 transition for these data occurred at 19.5 ± 3.3 s). We suggest that this is because the inherent fluctuations in the V̇o_{2P} data means that including a small amount of ϕ1 data in the fit has minimal effect on the resultant ϕ1 amplitude and ϕ2 τV̇o_{2P} estimation. The rapidly changing initial portion of ϕ2 data (which changes rapidly with respect to the breath-by-breath fluctuations at the end of ϕ1) is key to obtaining accurate and precise estimations. Qualitatively similar results were found for exercise that started from a raised work rate, but here the best τV̇o_{2P} estimation was with the removal of the first 15 s of data (Fig. 8*F*). This is likely due to the increased baseline work rate elevating cardiac output, which reduces muscle-to-lung blood transit times and, therefore, the cardiodynamic ϕ1 duration. Nevertheless, the accuracy and precision of τV̇o_{2P} estimation was statistically similar for WR-WR transitions when either 15 or 20 s of data were removed. We therefore recommend that researchers err on the side of caution when isolating ϕ2 V̇o_{2P} data and remove no more than 20 s of data to optimize τV̇o_{2P} estimation.

#### Implications for interpretation of ϕ2 V̇o_{2P} kinetics.

There are two significant findings from our simulations that have implications for interpretation of ϕ2 V̇o_{2P} kinetics. First, we found that, on average, ϕ2 τV̇o_{2P} was overestimated in all the data collection and handling strategies investigated. This overestimation can be explained, at least in part, by the two-phase V̇o_{2P} response and the nonexponentiality of ϕ2 (3, 5, 8, 25; cf. 18). Figure 9 shows the effects on ϕ2 τV̇o_{2P} estimation when the monoexponential *Eq. 2* is fit to clean simulation output data from different points throughout the V̇o_{2P} response. If the monoexponential fit is started during ϕ1 (i.e., from any point before 19.4 s in this example), then the estimated ϕ2 τV̇o_{2P} is larger than true, due to the inclusion of some ϕ1 data in the fit. If the fit is started after the ϕ1–2 transition, then the ϕ2 τV̇o_{2P} estimation is also larger than true, becoming larger as the fit is started further from the ϕ1–2 transition, because the underlying ϕ2 response is not a pure monoexponential; it initially increases more rapidly than a monoexponential before slowing down as it reaches the steady state (8). Only a fit that starts exactly at the ϕ1–2 transition returns the true ϕ2 τV̇o_{2P}. For these clean simulated data, inaccurate identification of the ϕ1–2 transition by just 2 s can result in a ϕ2 τV̇o_{2P} estimation that is 1.6 s larger than the true value; the influence of noise in experimentally obtained data may exacerbate this error. Because of these effects on ϕ2 τV̇o_{2P} estimation, when using the identified optimal data processing and fitting procedures, we were only able to estimate ϕ2 τV̇o_{2P} to within 2 s of true in 47% of the 10^{4} UP-WR simulations, and in 62% of the 10^{4} WR-WR simulations [2 s represents an effect size of ~10% for a healthy young human, where τV̇o_{2P} is typically ~20 s (45)]. Extrapolating this analysis further, we calculated the 95% confidence limits of our τV̇o_{2P} estimate distributions (as shown in Fig. 7, *B* and *D*, and Fig. 8. *B*, *D*, and *F*); τV̇o_{2P} estimates from outside this confidence interval are statistically likely to come from a different distribution/population. These 95% confidence limits, for τV̇o_{2P} estimates using our predetermined optimal data processing and fitting procedures, are ±4.08 and ± 4.51 s from the mean, for transitions from unloaded pedaling or a raised work rate, respectively (Table 3). We therefore propose that the minimally important difference for a significant change in τV̇o_{2P}, e.g., during interventional and comparative studies, should be 5.0 s. If the number of averaged bouts is reduced from the optimum of four, this minimally important difference should be increased in accordance with the confidence limits shown in Table 3.

The second implication for interpretation of τV̇o_{2P} from our data is to question whether an exponential fit should be used at all. We have previously shown that the dynamics and mixing of circulatory compartments between muscle and lung distort the approximately exponential muscle V̇o_{2} kinetics into a nonexponential ϕ2 V̇o_{2P} response at the lung (8). A recent meta-analysis of available data measuring both muscle and lung V̇o_{2} kinetics during cycling and knee extension exercise demonstrates a wide variability of τV̇o_{2} between muscle and lung (27). Some have proposed alternative methods to assess kinetic responses, such as the time to steady state (45). However, such approaches have been demonstrated to be both inherently more variable than relying on a method that maximizes the utility of available non-steady-state data (24, 47) and is conceptually flawed on the basis that the time to steady state of a nonexponential process is continually changing (8). Alternative approaches to kinetics estimation using, for example, pseudorandom binary sequence exercise testing and time-series analysis may allow for muscle τV̇o_{2} to be resolved by alternative methods (24, 31). It remains to be determined whether such methods provide increased accuracy for noninvasive estimation of muscle V̇o_{2} kinetic responses compared with ϕ2 τV̇o_{2} estimation by repeated step transitions. Our simulations here demonstrate that a monoexponential fit to ϕ2 V̇o_{2P} is a useful and concise method for accurately describing the overall kinetics of the exponential-like pulmonary ϕ2 V̇o_{2} kinetic response.

#### Limitations.

The means and SDs of the parameters used in our Monte Carlo simulations were representative of healthy young adults (8, 22). Quantitatively different results may be found for other populations with different ϕ2 V̇o_{2P} kinetic parameters, such as the elderly or heart failure patients who have slowed V̇o_{2P} kinetics (9, 39). Nevertheless, our main qualitative findings will still be pertinent when collecting, processing, and fitting V̇o_{2P} data from these other populations. In particular, our main point regarding optimal data collection and processing methods—that methods should be employed to minimize breath-by-breath fluctuations and that it is essential to include all ϕ2 V̇o_{2P} data in the fit—will more than likely stand for these populations, as it is still expected that the (potentially slowed) initial portion of ϕ2 V̇o_{2P} will change rapidly with respect to the noise in the data at the end of ϕ1.

For populations where individuals are expected to have a reduced cardiac output and slowed cardiac output kinetics, and a concomitant prolongation of ϕ1 duration compared with young healthy adults [such as heart failure patients (52)], the use of a biexponential fit, or statistical or derivative methods, to automatically identify the ϕ1–2 transition is inherently attractive. However, our results highlight that the noise in the V̇o_{2P} data limit the ability of these methods to correctly identify the ϕ1–2 transition, reducing the accuracy and precision of subsequent τV̇o_{2P} estimation. In this study, the empirical time-removal methods (removal of the first 20 s of data for exercise from unloaded pedaling, or 15 s if exercise was started from a raised baseline) were the only methods that gave statistically similar τV̇o_{2P} estimates to control fits, despite ϕ1 duration ranging from 7 to 39 s across all simulations. It remains to be determined whether removal of the first 20 s of data results in the most accurate and precise τV̇o_{2P} estimations for populations where ϕ1 is prolonged, but it may be necessary to compensate for the prolonged ϕ1 duration when removing ϕ1 data from the fitting window.

Only on-transient exercise in the moderate-intensity domain was simulated in this study. It is still to be determined whether the identified optimal fitting procedures will produce the most accurate and precise τV̇o_{2P} estimations for on-transient data in higher exercise intensity domains where fitting can be complicated by the emergence of a V̇o_{2P} slow component (40, 45). Similarly, the applicability of our identified optimal procedures for off-transient data, where cardiac output is expected to be initially elevated and so produce a much shorter ϕ1, potentially influencing the amount of data that should be removed before fitting, is still to be determined.

#### Conclusions.

We used a validated computational model together with a Monte Carlo approach to assess the accuracy and the precision of various averaging and fitting procedures on the estimation of V̇o_{2P} kinetics. Our analyses showed that four bouts of exercise was the optimal number to average to increase accuracy and precision of τV̇o_{2P} estimation. Choice of averaging strategy was not so critical, with interpolation, bin averaging, and stacking all giving quantitatively similar τV̇o_{2P} estimates. The interpolation, binning, and stacking methods did, however, allow more confident parameter estimates when compared with analyzing repeated bouts separately. Data collection and processing strategies should therefore focus on increasing the signal-to-noise ratio in the collected data. Contradictory to previous advice that suggests removal of at least 20 s of data to isolate ϕ2 V̇o_{2P} before fitting, our analyses show that data fitting procedures should remove no more than 20 s of data, as this provided the most precise and accurate estimates of τV̇o_{2P}. Our analyses showed the widely used standard approaches for data collection, processing, and fitting, although often different between laboratories, did not have a substantial effect on the quantitation of ϕ2 V̇o_{2P} kinetics per se. However, we found that even this optimal procedure yielded τV̇o_{2P} estimates that were within ±2 s of true in only 47–62% of simulations. Thus we identified the minimally important difference for τV̇o_{2P} for use in interventional and comparative studies to be 5 s.

## GRANTS

This work was supported by a Biotechnology and Biological Science Research Council UK research grant (BR/I00162X/1) and a University of Leeds International Research Collaboration Award.

## DISCLOSURES

Conflict of interest statement: No conflicts of interest, financial or otherwise, are declared by the authors.

## AUTHOR CONTRIBUTIONS

A.P.B., T.S.B., C.F., S.R.M., and H.B.R. conceived and designed research; A.P.B. performed experiments; A.P.B. analyzed data; A.P.B., T.S.B., C.F., S.R.M., and H.B.R. interpreted results of experiments; A.P.B. prepared figures; A.P.B. drafted manuscript; A.P.B., T.S.B., C.F., S.R.M., and H.B.R. edited and revised manuscript; A.P.B., T.S.B., C.F., S.R.M., and H.B.R. approved final version of manuscript.

- Copyright © 2017 the American Physiological Society

Licensed under Creative Commons Attribution CC-BY 3.0: © the American Physiological Society.