Laryngeal and Speech Section, National Institute of Neurological Disorders and Stroke, Bethesda, Maryland 20892
Submitted 28 January 2004 ; accepted in final form 30 April 2004
| ABSTRACT |
|---|
|
|
|---|
0.0005) and task (P = 0.034) were found on the r (transformed to Fisher's Z') values. All of the posterior cricoarytenoid recordings correlated significantly with vocal fold opening, whereas CT activity was significantly correlated with opening only during sniff. The TA and lateral cricoarytenoid activities were significantly correlated with vocal fold closing during cough. During speech, the CT and TA activity correlated with both opening and closing. Laryngeal muscle patterning to produce vocal fold movement differed across tasks; reciprocal muscle activity only occurred on cough, whereas speech and sniff often involved simultaneous contraction of muscle antagonists. In conclusion, different combinations of muscle activation are used for biomechanical control of vocal fold opening and closing movements during respiratory, airway protection, and speech tasks.
correlation; electromyography; larynx; nasolaryngoscopy; muscle activation; intrinsic laryngeal muscles
One approach to determining the role of each of the laryngeal muscles in vocal fold movement is to use strings or force application to emulate the action of muscle contraction in excised larynges (3, 27). Considerable strides have been made in understanding the biomechanics of vocal fold vibration by using excised larynges (53). Such studies, however, may not accurately predict the biomechanical effects of intact, normal human intrinsic laryngeal muscle contraction because simultaneous forces generated by contractions of multiple muscles could alter the resultant movement. Furthermore, no information on the central nervous system control is available from these studies.
In vivo studies, therefore, relating muscle activation to movement or biomechanical force may be more representative of the use of the laryngeal muscles in humans (10–12). Several studies have focused on the roles of the intrinsic musculature in respiration (6, 28–30, 34, 36–40, 42, 56–58). Little attention, however, has focused on understanding the dynamic control of laryngeal opening and closing for different tasks, such as phonation onset and offset, swallowing, cough, etc.
Different central nervous system patterning of laryngeal muscle activation may be used for respiration, deglutition, and speech, because the motor control demands on movement speed, extent, and precision may differ between tasks. Speech production requires the most rapid and precise control of vocal fold movement in the larynx; the changes in muscle activation during speech, in contrast with cough, for example, are small and rapid (46). As speakers produce syllables with voiceless consonants and vowels, voice onset and offset are controlled by several different gestures (43). These include the following: hyperadduction to offset voice, a gesture known as a glottal stop /ʔ/, which is often used to mark word boundaries; vocal fold partial opening for voice offset to produce voiceless consonants such as /h/ and /s/; full adduction or closing to onset phonation following inspiration; and partial adduction to produce voice onset following a voiceless consonant, as in the syllable /si/.
Observations of the relationship between EMG patterns and speech sound production have assumed that similar movements occurred during each utterance (17). Speakers may approach the production of some gestures differently from other speakers, and movements may vary from one production to another within a speaker. EMG recordings also show a great deal of variation during repetition of the same speech sounds in normal speakers (48).
One interpretation of the variation in EMG patterns during repetition of speech gestures has been that individuals learn to use alterations in subglottal air pressure to contribute to vocal fold opening and closing, making voice changes less dependent on laryngeal muscle activation (48). The first step in examining this hypothesis is to determine whether vocal fold movement is closely related to intrinsic laryngeal muscle activation. If this is the case, then subglottal pressure may not be sufficient for vocal fold opening and closing gestures for some speech sounds.
Data on the dynamic interrelationships between intrinsic muscle activations and different laryngeal configurations would provide valuable insight into laryngeal biomechanics and would aid greatly in our understanding of laryngeal control. Intramuscular EMG recordings can provide practical estimates of muscle activation. The relationship between EMG and muscle force, however, is only well understood for static contractions; no generalized model has been developed to relate EMG to muscle fiber force production during movement. Measured EMG also varies with electrode placement in a muscle, and the EMG patterns may vary between individuals, even when producing the same movement or gestures. Additionally, all of the intrinsic muscles may interact to control vocal fold adduction and abduction during speech (48).
There is an approximately linear relationship between EMG and active isometric muscle force production (2, 5, 7, 26, 31), at least when the correct electromechanical delay is considered (8). Good correlations have been shown between EMG, intramuscular force, and isometric force for arm (33) and shoulder (31) muscles, but the EMG-force relationship is only linear for isometric contractions and depends on velocity, position, and whether the contraction is eccentric or concentric (26). Previous studies have related laryngeal EMG and either photoglottography, which reflects the overall degree of opening of the glottis during speech (44, 47), or the anterior glottal angle measured through nasolaryngoscopy (42). These techniques, however, do not allow matching the movement with muscle activation information for the same vocal fold. This study examined the relationship between the EMG and ipsilateral vocal fold movements.
It is well recognized that there is a delay between muscle activation and the resultant movement (8). The duration of the delay will depend on several factors, including 1) the contraction time of the muscle; 2) the relationship of force generated by the muscle to the ongoing level of activation; 3) the number and size of the motor units recruited; 4) the timing of cocontraction of muscle antagonists; and 5) tissue compliance. The first three factors depend on the characteristics of the particular muscle; therefore, we measured the delay between muscle contraction onset and the movement onset, for each muscle, in each subject, for each task. Then, using the median delays computed for each muscle across all subjects and tasks, we measured the correspondence of muscle EMG and vocal fold movement, by computing a point-by-point Pearson cross-correlation coefficient for each trial.
Another way to determine the delay between muscle contraction and the resulting movement is to calculate the correlation between muscle activation and movement signals at different delays between the onset of the two signals to identify at what delay the correlation is greatest. We designated this as the optimum delay and examined the correspondence between these two approaches.
Given the previous findings regarding the role of the intrinsic laryngeal muscles in speech and other gestures, we proposed the following hypotheses regarding the role of the intrinsic laryngeal muscles in sniffing (maximum vocal fold opening), cough (forceful vocal fold closing), and rapid alternation between partial opening and partial closing for repetition of syllables with voiceless consonants and vowels (e.g., /hihihihi/ or /sisisisi/).
1) The PCA activation is positively correlated with vocal fold abduction, with maximum correlations occurring for rapid and unopposed opening gestures such as sniff.
2) The TA and LCA muscle activations are negatively correlated with vocal fold abduction, with maximum negative correlations occurring for unopposed, ballistic closing gestures such as cough.
3) Activations of both abductors and adductors are less well correlated with vocal fold abduction during speech due to antagonist cocontraction and the possible role of air pressure.
4) The direction of the correlation for CT depends on the gesture being produced, with maximal positive correlation for gestures that require vocal fold lengthening, such as sniff.
| METHODS |
|---|
|
|
|---|
Task training and tasks. Before the study, each subject was trained to perform the muscle verification and laryngeal movement tasks. The muscle verification tasks included Valsalva produced with the mouth open with a glottal release to demonstrate that closure was only at the glottis; ascending and descending pitch glides on the vowel /i/; prolongation of the vowel /i/ for 3–5 s; deep respiration; and swallow.
The experimental tasks were selected to provide rapid vocal fold movements that could be easily visualized by using nasolaryngoscopy. Visualization during phonatory tasks was optimized by using the vowel /i/ to move the tongue and epiglottis forward. A sniff produces rapid maximum vocal fold abduction (opening). Rapid repetitions of /hi/ required alternating between a glottal fricative with the vocal folds open for /h/, then vocal fold closure for the vowel /i/, repeated five to seven times on one exhalation. Rapid repetition of /si/ required alternating between partial opening of the vocal folds for the /s/, accompanied by a fricative in the oral cavity, followed by rapid closing for the vowel /i/, repeated five to seven times on one exhalation. Cough required a rapid vocal fold closure and buildup of subglottal pressure, followed by a rapid vocal fold opening with expulsion of air. Finally, "sniff-i" included a rapid vocal fold opening for the sniff followed by a rapid closure for the vowel /i/. The initial design of the study included repetition of the vowel /i/ with phonation offset by a glottal stop. However, it was quickly observed that this voice offset was not accompanied by a change in vocal fold position. The vocal folds remained constantly adducted during rapid repetition of /ʔi/, while the ventricular folds produced further medialization, squeezing the vocal folds further together to offset voice. Because no change in vocal fold position occurred on this task, the correlation between movement and EMG could not be measured, and it was eliminated from the study.
Subjects were trained to produce the speech syllable repetitions at a relatively fast rate, between five and seven repetitions in 1–2 s, whereas the nonspeech gestures, such as cough and sniff, were produced at 1/s. Subjects were also trained not to shift their voice pitch during performance of any of the tasks to avoid active changes in fundamental frequency or vocal fold tension.
Electrode placements and recording procedures. Hooked wire electrodes were placed in four intrinsic laryngeal muscles: two vocal fold adductors (TA and the LCA), one abductor (the PCA), and the CT. Small subcutaneous injections of 1% Xylocaine with epinephrine (1:100,000) were used to reduce discomfort during electrode insertion. Electrodes were placed with the subject supine in a dental exam chair. We first used a 27-gauge bipolar needle electrode with a 0.015-mm² recording surface (DISA 13K80, Medtronic) to locate a muscle before inserting bipolar, hooked-wire electrodes. Our aim was to record from each of the muscles on the two sides to relate to movement of the fold on the same side. Placement of electrodes into the PCA muscle requires twisting the larynx to allow access to the posterior larynx (4). This laryngeal movement usually dislocates wires already placed in other muscles; therefore, placement in only one PCA muscle was attempted in each subject before insertion of the other electrodes. The same direction of insertion was used for the TA and the LCA, through the CT membrane and directed superiorly and laterally into the muscle. The LCA placement was more posterior and lateral to position the wires in the portion of the muscle that inserts on the edge of the cricoid cartilage (9). We confirmed proper electrode positioning using the following verification gestures (20). For TA placement, activation increased during a prolonged vowel, throat clear, effort closure (Valsalva), and swallow. Verification gestures for the LCA included increased activation during swallow, throat clear, and effort closure with onset and offset bursts for phonation onset and offset but no sustained activation during prolonged phonation. CT verifications were increased activation during the high end of a pitch glide but no activation during swallow. Absence of activity with head raise or turning indicated no contamination of the CT signals with strap muscle activity. PCA verification consisted of increased activity during deep inhalation but no activation during swallowing or vowel prolongation.
Only recordings meeting the verification criteria were used in the study; consequently, only a subset of muscles was available on each subject. Each of the four muscle types (PCA, CT, TA, and LCA) was available on at least one side in each subject, except the PCA, which met the verification criteria in only three of the four subjects.
Nasolaryngoscopy and recording. Four percent lidocaine was sprayed into one nostril to relieve discomfort during insertion of the Pentax PNL-10RP3 fiber-optic laryngoscope (Pentax Precision Instruments, Orangeburg, NY). The laryngoscope was interfaced with a Pentax camera and a Kay Elemetrics Digital Stroboscope system (Kay Elemetrics, Lincoln Park, NJ) with a halogen light source. Marks on the scope where it exited the nares helped the physician maintain the same scope position throughout the recording. Speech was recorded with a Sony EM-2 head-held microphone placed 2 in. from the subject's mouth. The video and audio signals were recorded at 29.97 frames/s on a Panasonic VHS-s recorder. This yielded two fields of video image per frame, or 59.94 fields/s. A Horita TRG-50 PC produced SMPTE time stamps that were visible in each field of the video image.
Raw, amplified, bipolar, hooked-wire EMG signal recordings were band-pass filtered between 30 and 3,000 Hz on a Nicolet Viking V and output to a multiple-channel FM instrumentation tape recorder (TEAC XR-70), along with the voice signal and the Horita signal. The binary time stamp signal of the Horita was also streamed simultaneously to both the VHS-s and the FM instrumentation recorder, along with the EMG signals, to allow time alignment of the video and EMG signals. Interruptions of the time stamp signal before and after the session facilitated the time alignment. The EMG signals were digitized off-line at 6,000 samples/s, after eighth-order Butterworth anti-alias filtering at 2 kHz, along with the voice signal.
With the fiber-optic nasolaryngoscope in place, recording was started, the time stamp signal interrupted, and then the subject was instructed to imitate the examiner on six tasks: 1) volitional cough/throat clear; 2) sniff; 3) repeating /hi/; 4) repeating /iʔ/; 5) repeating /si/; and 6) repeating sniff followed by /i/. Not all subjects performed all tasks; tasks 5 and 6 were only conducted in one subject.
We used a Peak Performance (Motus) system to digitize each video field. An experimenter identified four points on each digitized video field (Fig. 1): the anterior commissure (AC), the tip of the vocal process of the left arytenoid, the tip of the vocal process of the right arytenoid, and the midpoint of the posterior glottal wall. Cases where the labeled structures were not clearly visible were not used, with the exception of the midpoint of the posterior wall, which had to be inferred during tight adduction when the vocal folds were closed. The midline of the glottis was defined as the line connecting the AC to the midpoint of the posterior glottal wall. Similarly, lines were defined from the vocal process of the arytenoid on each side to the AC. The angle between the line from the vocal process to the AC and the midline was the angle reflecting the position of that vocal fold. The angle was zero when the vocal process was exactly at the midline. For each video frame analyzed, the Peak Performance software exported the angle values to a text file. A Matlab program was written to read the angle data files and normalize all of the angles to the maximum angle recorded for that subject over the entire session. The angle data were time-aligned with the processed EMG data based on the time stamps. Interruptions in the time stamps at the beginning and end of the sessions were evident in both the data recorded on the FM tape and the video frames, allowing an investigator to time-align the angle data with the EMG data to within one video field sampling interval (±16.68 ms).
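The angle definition above can be illustrated with a minimal Python sketch. It is not the Peak Performance or Matlab code used in the study; the function names and the example coordinates are hypothetical, and the points are assumed to be 2-D pixel coordinates from one video field.

```python
import math

def fold_angle_deg(ac, vocal_process, posterior_mid):
    """Angle (degrees) between the AC-to-vocal-process line and the glottal
    midline (AC to midpoint of the posterior glottal wall).
    All three arguments are hypothetical (x, y) pixel coordinates."""
    mx, my = posterior_mid[0] - ac[0], posterior_mid[1] - ac[1]
    vx, vy = vocal_process[0] - ac[0], vocal_process[1] - ac[1]
    cross = mx * vy - my * vx   # proportional to the sine of the angle
    dot = mx * vx + my * vy     # proportional to the cosine of the angle
    # The unsigned angle is zero when the vocal process lies on the midline.
    return abs(math.degrees(math.atan2(cross, dot)))

def normalize_to_session_max(angles):
    """Scale angles by the largest angle recorded over the whole session."""
    peak = max(angles)
    if peak == 0:
        return [0.0 for _ in angles]
    return [a / peak for a in angles]

# Made-up example: AC at the origin, posterior wall midpoint straight behind it.
ac = (0.0, 0.0)
posterior = (0.0, 10.0)
on_midline = fold_angle_deg(ac, (0.0, 5.0), posterior)
abducted = fold_angle_deg(ac, (5.0, 5.0), posterior)
normalized = normalize_to_session_max([10.0, 20.0, 5.0])
```

Using `atan2` on the cross and dot products avoids the loss of precision that `acos` shows near zero angle, which matters here because tight adduction produces angles close to zero.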
A delay was expected between the onset of muscle activation and the change in vocal fold position because of contraction time, biomechanical damping, and antagonistic muscle cocontraction. Because this information was not known for these gestures, where many muscles were active simultaneously, we calculated the delays between muscle activation onset and the onset of change in vocal fold angle on the same side during tasks where the muscle was expected to be the prime mover (Fig. 2). The EMG burst onset times were chosen manually with visual inspection, using both the raw and the smoothed EMG for guidance, and without the angle data visible. The primary criterion used was the earliest sign of a sustained increase in the slope of the smoothed EMG (i.e., both the velocity and acceleration of the smoothed EMG were positive). The gestures used to calculate the delays were sniff for PCA and cough and phonation onset for the TA and the LCA. Calculating delays for CT was complicated by the fact that CT rarely acts alone to produce either abduction or adduction. The sniff task showed the greatest unopposed CT activation and was, therefore, used to compute delays for the CT. The median delay across subjects was then computed and used as the delay for all subjects and gestures for that muscle. These median delays were then rounded to the time of the closest video field, making them accurate to ±16.68 ms.
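As a rough illustration of the last two steps, the following Python sketch takes a set of manually chosen onset delays, computes their median, and rounds it to the nearest video field. Only the field rate of 59.94 fields/s comes from the text; the delay values and the function name are hypothetical.

```python
import statistics

FIELD_RATE_HZ = 59.94                      # fields/s, as stated in the text
FIELD_INTERVAL_MS = 1000.0 / FIELD_RATE_HZ # about 16.68 ms per field

def median_delay_in_fields(onset_delays_ms):
    """Return (number of fields, delay in ms) for the median onset delay,
    rounded to the nearest whole video field."""
    median_ms = statistics.median(onset_delays_ms)
    n_fields = round(median_ms / FIELD_INTERVAL_MS)
    return n_fields, n_fields * FIELD_INTERVAL_MS

# Hypothetical EMG-to-movement onset delays (ms) for one muscle across subjects.
example_delays_ms = [30.0, 45.0, 60.0]
n_fields, delay_ms = median_delay_in_fields(example_delays_ms)
```

Expressing the delay as a whole number of fields means the EMG and angle series can be shifted against each other by an integer number of samples.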
We confirmed that the median onset delays described above were appropriate by also calculating the optimum delay that maximized the strength of each correlation between muscle activity and vocal fold angle for each trial. The optimal delay was found by comparing the r values at different delays. These delays were for comparison purposes only; we could not use the optimal delays to compute the correlations that we reported, because they would have biased the correlation coefficient (by violating the assumption of random sampling). The "optimal" delays were the averages of those that maximized the correlations, whereas the "median" onset delays were the medians of those found by comparing the differences in onset times between the EMG and movement signals.
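The delay scan can be sketched in Python as follows. This is a sketch under stated assumptions, not the analysis code from the study: it assumes the smoothed EMG has already been resampled to one value per video field, that delays are whole numbers of fields, and that "strength" means the absolute value of r (so that negative adductor correlations also count). The function names and the toy signals are hypothetical.

```python
import math

def pearson_r(x, y):
    """Point-by-point Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_at_delay(emg, angle, delay):
    """Correlate emg[t] with angle[t + delay]; delay is in samples (>= 0)."""
    if delay == 0:
        return pearson_r(emg, angle)
    return pearson_r(emg[:-delay], angle[delay:])

def optimum_delay(emg, angle, max_delay):
    """Scan delays 0..max_delay and return (delay, r) with the largest |r|."""
    scores = {d: r_at_delay(emg, angle, d) for d in range(max_delay + 1)}
    best = max(scores, key=lambda d: abs(scores[d]))
    return best, scores[best]

# Toy signals: the "angle" is the "EMG" shifted later by two samples.
emg = [0, 1, 4, 2, 0, 3, 5, 1, 0, 2, 6, 3]
angle = [0, 0] + emg[:-2]
best_delay, best_r = optimum_delay(emg, angle, max_delay=4)
```

Because the delay returned by such a scan is selected to maximize r, the resulting coefficient is biased upward, which is the reason given above for reporting correlations at the fixed median onset delays instead.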
| RESULTS |
|---|
|
|
|---|
|
|
|
0.0005] and task (F = 2.608, df = 2, P = 0.034) but not for muscle side (F = 0.295, df = 1, P = 0.589). Muscle-by-task interaction effects approached significance (F = 1.841, df = 15, P = 0.053), reflecting the variable roles of the TA, LCA, and CT, according to task. Because results did not differ by side, recordings from the same muscle on the two sides were combined in further examinations of the results. The r values are displayed in Fig. 4 for each muscle, according to task type. Because the number of points used to calculate the correlations varied from 62 to 213 (mean = 142), the exact value of r required for statistical significance at P = 0.05 varied from trial to trial; however, the approximate correlation values required for significance are indicated by the dashed lines in Fig. 4.
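To show how the significance threshold depends on the number of points, the Python sketch below approximates the two-tailed critical r from the Fisher Z transform, which the abstract notes was applied to the r values. The approximation and the function names are assumptions for illustration, not the procedure used to draw the dashed lines in Fig. 4, and the calculation treats the points as independent samples, which successive video fields may not be.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher's Z transform of a correlation coefficient."""
    return math.atanh(r)

def critical_r(n, alpha=0.05):
    """Approximate two-tailed critical |r| for n independent points.
    Under the null hypothesis, Fisher's Z is roughly normal with
    standard error 1 / sqrt(n - 3)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z_crit / math.sqrt(n - 3))

# Thresholds for the smallest, mean, and largest point counts in the text.
thresholds = {n: critical_r(n) for n in (62, 142, 213)}
```

With these assumptions the threshold falls from roughly 0.25 at 62 points to roughly 0.13 at 213 points, so a single pair of dashed lines can only be approximate.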
| DISCUSSION |
|---|
|
|
|---|
Our third hypothesis, that correlations would be lowest for the speech tasks, was neither fully confirmed nor discounted. Some high correlations occurred for both abductors and adductors during the speech tasks. The TA and LCA correlations were not consistent in magnitude or direction, and the correlations that were all negative varied in magnitude. During voiceless consonant syllables, the LCA and PCA served more as reciprocal pairs, with the PCA correlated with opening (abduction) and the LCA significantly correlated only with vocal fold closing (adduction), as has previously been reported (23, 43). This pattern can be seen in Fig. 5, where the subject repeatedly produced the syllable /hi/. During the vowels, the CT and TA tended to cocontract, perhaps to control vocal fold stiffness, as has previously been reported during pitch change (21, 54). The variation in direction and degree of correlation during speech suggests that all of the muscles are active and that the resultant movement is the sum of these various forces acting on the cricoarytenoid joint. Thus many combinations of laryngeal activation levels may be associated with the same movement, and vice versa, as has previously been observed (48).
A low correlation does not necessarily imply that the muscle was not active during a given task, only that it was not reliably associated with vocal fold adduction or abduction. For example, a series of three coughs appears in Fig. 6. The airflow timing is indicated by the subject microphone acoustic signal. Although the CT was active during cough, its correlation was nearly zero (rCT = 0.007), possibly because its activity spanned both the closing and opening phases, whereas the correlations of the other muscles were higher (rLCA = 0.511, rTA = 0.200, rPCA = 0.375). Others have observed CT activation that was not clearly in phase with either vocal fold abductors or adductors (52), consistent with our results. The PCA correlation, although significant, was also lower during cough than during sniff or speech. This may be because both air pressure and PCA activation contributed to vocal fold opening in these gestures.
One of the limitations of using the nasoendoscope for image acquisition was demonstrated by the results of our repeated-/i/ gesture. During this gesture, the vocal processes remained adducted with very little change in position, while the EMG signals fluctuated. Three-dimensional imaging of the tract would likely have shown that the extent of closure was changing in the superior-inferior dimension within the glottis, a change that could not be seen from the nasoendoscope. Spiral computerized tomography imaging of the vocal tract could provide increased information on changes in vocal tract shaping and the superior-inferior extent of movement associated with adductor muscle activity. The use of computerized tomography, however, would require extensive event-related sampling to visualize changes in shape associated with rapid changes in laryngeal muscle activity (45), causing significant radiation exposure in subjects.
Another limitation was the low sampling rate of the video recording, with an intervideo field interval of 16.68 ms. High-speed video recording techniques are available and could be used in future studies to provide better time resolution (22, 24). Because EMG activity is roughly proportional to active muscle force, the EMG should correlate better with the angular acceleration of the vocal processes than with the vocal process angle. Unfortunately, the coarse temporal resolution of the vocal process angle signal that we acquired was not sufficient to allow accurate, stable computations of its derivatives. High-speed imaging at 2,000 images/s would be adequate, and recent technologies have improved image resolution that would now allow identification of the vocal fold processes for tracking. Because of the need for bright illumination, however, most of these systems can only be used with a rigid oral scope, restricting the range of tasks that can be sampled.
Overall, given these limitations, the results from this study are promising. The expected significant correlations between muscle activation and vocal fold position were found. The pattern of correlations differed between tasks, however, demonstrating that vocal fold movement may be achieved somewhat differently for respiratory, airway protection, and speech tasks. This will limit the ability to use the same biomechanical models of muscle activity to predict movement across tasks. Different biomechanical models may need to be developed for different tasks. The central movement patterning for respiration, cough, and speech may be interrelated but independent in the medulla (19, 32, 59). It is not surprising, therefore, that muscle activation patterning for vocal fold movement may differ between these functions.
| ACKNOWLEDGMENTS |
|---|
|
|
|---|
| FOOTNOTES |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
| REFERENCES |
|---|
|
|
|---|