Abstract
Through perceptual and acoustic experiments, we investigated the production and perception of adverb-oriented ambiguous sentences by native Chinese speakers and Chinese learners from non-tonal language backgrounds. Prosodic features play a significant role in the perception and production of Chinese adverb-oriented ambiguous sentences, specifically manifested as a close correlation between the prosodic features of adverb-oriented ambiguous sentences and sentence information. During perception, native Chinese speakers can identify changes in prosodic patterns under different sentence meanings and interpret sentence information through prosodic features; during production, they can utilize prosodic features to express different meanings of ambiguous sentences. Chinese learners from non-tonal language backgrounds have developed the ability to identify different prosodic patterns of adverb-oriented ambiguous sentences approaching native Chinese speaker levels, and can rely on prosodic features to interpret sentence information to a certain extent; however, they generally struggle to reasonably disambiguate adverb-oriented ambiguous sentences through prosodic features.
Title: The Role of Prosodic Features in the Perception and Production of Adverbially Ambiguous Sentences in Mandarin
Authors: He Jiarui¹²³, Zhang Ting¹
(¹ College of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China)
(² Department of Chinese Language and Literature, Fudan University, Shanghai 200433, China)
(³ International Cultural Exchange School, Fudan University, Shanghai 200433, China)
Abstract: This study investigates the production and perception of adverbial ambiguity in Mandarin Chinese among native speakers and learners whose first language lacks lexical tone. Through both perception and acoustic experiments, we examine how prosodic features contribute to the disambiguation of adverbially ambiguous sentences. The findings demonstrate that prosody plays a crucial role in both the perception and production of such sentences, showing a strong association between prosodic cues and sentence-level meaning. In perception, native speakers can identify prosodic variations corresponding to different interpretations and accurately infer sentence meaning based on prosody. In production, they effectively use prosodic cues to convey intended interpretations. While L2 learners approximate native-like sensitivity to prosodic differences and can partially rely on prosodic information for interpretation, they still face challenges in using prosody effectively for disambiguation.
Keywords: prosodic features, ambiguous sentences, perception, production
Introduction
Prosodic features, also known as suprasegmental features, constitute a phonological structure of language that refers to variations in acoustic parameters such as pitch and duration beyond segmental quality \cite{luo2002, ye2011}. Ambiguity occurs when a linguistic segment can be interpreted with two or more meanings, and common disambiguation methods include phonetic differences, lexical substitution, and syntactic transformation \cite{shao2019}. Mandarin Chinese exhibits various types of ambiguity, among which ambiguity arising from different semantic orientations of intra-sentential constituents is particularly special. Such cases share identical structural relationships and hierarchical levels, relying solely on phonetic differences—that is, prosodic features—for disambiguation. The "adverbial orientation" ambiguity discussed in this paper refers to situations where the semantic connection between an adverb and other constituents in a sentence is multidirectional, resulting in multiple semantic interpretations. Shao (2019) illustrates this phenomenon with the example "我们学校就去了二十几个人" (Only twenty-odd people from our school went). When the adverb "就" (only) points forward to "我们学校" (our school), the sentence implies that many people went; when it points backward to "二十几个人" (twenty-odd people), it suggests that few people went. Similar adverbs include "也" (also), "最" (most), "都" (all), and "才" (only) \cite{ma1988, zhu1982, wang1999, zhou2004, jin2015}.
Previous research has long recognized the interactive relationship between prosodic features and sentence information in adverbially ambiguous sentences. Lü (1999) noted that "都" can express generalization, but its scope may include multiple elements, and in certain contexts, it can specifically emphasize one element, with stress distinguishing these interpretations. Lan (1988) emphasized the important role of "logical stress" in disambiguating "都" sentences. Chen (1994) described the prosodic features of "才" sentences, observing that when "才" points forward, sentence stress falls before the adverb, and when it points backward, stress falls after the adverb. Some scholars have employed experimental phonetics to describe acoustic parameters in detail, finding that oriented constituents in "都" ambiguous sentences exhibit increased duration and fundamental frequency (F0), while constituents in "也" ambiguous sentences show lengthened duration and expanded pitch range under different semantic interpretations \cite{yang2000, wang2005}. Xu and Yang (2010) found that oriented constituents in "才" ambiguous sentences display elevated F0 and duration characteristics. Wang (2005) and Huang (2013) demonstrated through perception experiments that different prosodic patterns of "都" and "也" ambiguous sentences can be largely understood by information receivers.
Existing research has confirmed that native Mandarin speakers can effectively use prosodic means to disambiguate adverbially ambiguous sentences in spoken production, yet studies of how L2 learners use prosody to comprehend and produce Chinese ambiguous sentences remain scarce. Chinese learners tend to rely only on duration changes, pitch variations, or boundary pauses for disambiguation \cite{zhou2019}, but often exhibit problems such as insufficient duration, missing pitch changes, or incorrect pausing \cite{liu2022}. Their overall pitch range is also narrower \cite{liu2018, luo2022}. Some learners can interpret syntactically ambiguous "都" sentences using prosodic information, but this ability correlates with proficiency level and is unstable: some advanced learners still cannot disambiguate "都" sentences through prosody at the perceptual level. This is closely related to L1 phonetic transfer. Mandarin is a tonal language, and its four tones collectively constitute the vertical dimension of the intonation range while also influencing its horizontal contour \cite{wen2022}. Mandarin encodes focus through continuous prosodic changes, marking it via coordinated variations in duration, pitch, and pitch range. Non-tonal L1 speakers tend to focus more on pitch changes, resulting in incomplete mastery of Mandarin prosody, including narrow pitch range, incorrect contour patterns, and inappropriate syllable duration patterns \cite{li2020}.
Despite in-depth research on the prosodic features of ambiguous structures in Chinese linguistics, discussions within international Chinese education face several limitations: (1) Few studies examine L2 learners' perception and production of adverbially ambiguous sentences. Only Yao (2011) investigated how learners from different L1 backgrounds interpret "都" sentences using prosody, with no research on "也," "才," or "最" sentences. (2) There is a lack of integrated research perspectives combining perception and production. Existing studies focus primarily on native speakers' production of adverbially ambiguous sentences, with less attention to how both native speakers and learners interpret such sentences through prosody at the perceptual level. Perception and production represent two crucial aspects of language acquisition with close interconnections. A comprehensive examination of learners at different proficiency levels would provide deeper insights into their actual developmental trajectories and offer valuable guidance for teaching strategies, curriculum design, and resource development.
Based on these considerations, this study selects four typical types of adverbially ambiguous sentences—"也," "最," "都," and "才" sentences—and employs perception and acoustic experiments to investigate the role of prosodic features in how native speakers and Chinese learners perceive and produce the meanings of these sentences. The study addresses the following research questions:
2. How do Chinese learners perform when perceiving and producing different meanings of adverbially ambiguous sentences through prosodic features? Are their perceptual and productive abilities consistent? What factors underlie their performance?
1. Perception Experiment
"也," "最," and "都" are Level 1 vocabulary items in the International Chinese Proficiency Standards, while "才" is a Level 2 vocabulary item. For international students, these represent relatively low-difficulty linguistic elements that beginners should master. Based on the different orientations of the four adverbs "都," "也," "才," and "最," we created four target sentences with adverbial orientation ambiguity and designed two distinct contexts for each to disambiguate them, yielding eight sentences total. To facilitate comprehension for Chinese learners, all lexical items in the target sentences and disambiguating contexts were selected from Level 3 or below in the Standards. We also controlled for sentence structure, using only subject-predicate constructions with monosyllabic constituents to minimize variable diversity. To ensure material quality, five undergraduate and graduate students majoring in linguistics were invited to evaluate the ambiguity and naturalness of the materials before the experiment. Based on their feedback, we revised the materials and finalized the four target sentence sets shown in Table 1 [TABLE:1].
Table 1: Experimental Materials for Adverbially Ambiguous Sentence Production
| Target Sentence | Adverbially Ambiguous Sentence |
|---|---|
| "都" sentence | 他都去了 (He all went) |
| "也" sentence | 张老师也教汉语 (Teacher Zhang also teaches Chinese) |
| "才" sentence | 三个人才吃了一块蛋糕 (Only three people ate one cake) |
| "最" sentence | 安娜最喜欢唱歌 (Anna most likes singing) |

To ensure naturalness in actual speech flow, the disambiguating contexts for more complex ambiguous sentences were designed as two-turn dialogues, each containing a question and answer. The first turn introduces the topic, while the answer in the second turn contains the target sentence. Using "他都去了" as an example, Table 2 [TABLE:2] presents the dialogue scripts for this ambiguous sentence in different contexts, with scene and role prompts in parentheses.
Table 2: Example Sentence "他都去了" in Different Contexts
| Front-Orientation Context | Back-Orientation Context |
|---|---|
| (Friends chatting) | (Friends chatting) |
| A: Did Xiao Zhang go to the meeting? | A: Where's Xiao Zhang? |
| B: Yes. | B: He went to eat. Why are you looking for him? |
| A: If Xiao Zhang went, I won't go. | A: It's urgent. |
| B: Even he went, can't you go? | B: He's already gone. Wait a bit. |

When the adverb "都" orients to "他" (he), "他" becomes the discourse focus requiring emphasis, meaning "Even Xiao Zhang went, can't you go?" When "都" orients to "去了" (went), "去了" becomes the focus, indicating "He has already gone."
We added four filler sets with eight non-ambiguous sentences similar in length and structure to the target sentences, each with corresponding contexts to prevent participants from discovering the experimental purpose. Two Mandarin speakers with PSC Level 2A certification recorded the dialogues in a professional studio at a natural conversational pace. We used Audition to extract the target sentences for the perception experiment, with target and filler sentences randomly ordered.
1.2. Experimental Procedure
The perception experiment aimed to examine whether L2 learners could identify prosodic features of adverbially ambiguous sentences in different contexts. The experiment comprised two tasks: discrimination and comprehension. The discrimination task tested whether participants could perceive prosodic variations across different contexts, while the comprehension task required context matching to investigate whether participants could interpret the specific meanings carried by different prosodic patterns.
The experimental design and procedure were as follows. We used PsychoPy 21.2.3 to program the experiment. In the discrimination task, a fixation cue "+" appeared at the screen center for 500 ms, followed by a 200 ms blank screen. The question then appeared: "Please judge whether the pronunciations of the sentences in the audio are the same." Participants could click the audio button to play the recordings repeatedly. If the two recordings sounded identical, they pressed "1"; if different, "0". Pressing Space advanced to the next trial. The eight target recordings were presented as four sets, each containing two recordings of the same target sentence in different contexts (i.e., with different prosodic patterns), plus four filler sets in which the same audio was played twice. The audio speakers used standard Mandarin with clear articulation and no obvious accent. The order of target sentence sets was randomized.
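The trial structure of the discrimination task can be sketched in plain Python. This is an illustration only, not the authors' PsychoPy script: the audio file names, the `build_discrimination_trials` helper, and the optional random seed are all hypothetical.

```python
import random

# Hypothetical file names; the paper does not list its audio files.
TARGETS = {
    "dou": ("dou_front.wav", "dou_back.wav"),
    "ye": ("ye_front.wav", "ye_back.wav"),
    "zui": ("zui_front.wav", "zui_back.wav"),
    "cai": ("cai_front.wav", "cai_back.wav"),
}
FILLERS = ["filler1.wav", "filler2.wav", "filler3.wav", "filler4.wav"]


def build_discrimination_trials(seed=None):
    """Return a shuffled list of same/different trials.

    Each target set pairs the two prosodic versions of one sentence
    (expected key "0" = different); each filler set plays one recording
    twice (expected key "1" = same).
    """
    trials = []
    for name, (front, back) in TARGETS.items():
        trials.append({"set": name, "audio": (front, back), "expected_key": "0"})
    for i, wav in enumerate(FILLERS, start=1):
        trials.append({"set": f"filler{i}", "audio": (wav, wav), "expected_key": "1"})
    random.Random(seed).shuffle(trials)  # randomize presentation order
    return trials
```

Scoring a response then reduces to comparing the pressed key with `expected_key` for each trial.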
In the comprehension task, participants listened to audio and judged which of the provided dialogue scenarios the sentence would occur in. They completed 16 multiple-choice questions: eight for target sentences and eight for fillers. Target sentence questions had one correct answer, while filler questions had all options correct. Participants were familiarized with the dialogue scenarios beforehand. The entire experiment lasted approximately 15 minutes.
1.3. Participants
Participants were divided into two groups: native Mandarin speakers and Chinese learners. The native speaker group consisted of eight Mandarin-speaking Chinese majors from Nanjing Normal University, aged 18–20, with no obvious dialect accent and normal language, hearing, and vision abilities. To increase reliability, we additionally recruited 12 native speakers through anonymous advertisement. The learner group comprised 13 Chinese learners from Nanjing Normal University whose native languages were non-tonal, aged 17–24, all having passed HSK Level 3. To determine participants' proficiency levels, we administered an additional language test consisting of a cloze passage using Feng's (2020) rapid L2 proficiency test. Cloze tests are considered effective measures of learner proficiency \cite{zhao2009} and are commonly used in L2 acquisition research \cite{liu2020, hong2024, he2024}. Testing revealed no significant differences among participants, indicating comparable Chinese proficiency levels.
1.4. Experimental Results
1.4.1. Discrimination Task
Figure 1 [FIGURE:1] shows the accuracy rates of native speakers and learners in discriminating prosodic patterns of four adverbially ambiguous sentence types under different semantic interpretations.
Figure 1: Discrimination Task Accuracy Rates
The figure clearly shows that both native speakers and learners achieved high accuracy in identifying prosodic patterns for "都," "也," and "最" sentences, with "才" sentences showing relatively lower accuracy. Notably, both groups reached 100% accuracy for "也" and "最" sentences, and maintained over 90% accuracy for "都" sentences. This indicates that the 20 native speakers and 13 learners could generally perceive prosodic pattern variations in "也," "都," and "最" sentences. However, accuracy dropped for "才" sentences, particularly for learners.
1.4.2. Comprehension Task
1.4.2.1. Reaction Time Analysis
Table 3 [TABLE:3] presents the average time required for native and learner groups to comprehend meanings of "都," "也," "最," and "才" sentences under different prosodic patterns, along with between-group comparisons.
Table 3: Reaction Time Statistics ("1" = front-orientation meaning, "2" = back-orientation meaning)
| Sentence Type | Native Group Mean | Learner Group Mean | Significance |
|---|---|---|---|
| 都1 | 6.05 s | 15.51 s | p < .01** |
| 都2 | 8.22 s | 12.58 s | p < .01** |
| 也1 | 6.14 s | 17.03 s | p < .05* |
| 也2 | 7.29 s | 15.28 s | p < .01** |
| 最1 | 4.95 s | 12.99 s | p < .01** |
| 最2 | 5.93 s | 12.17 s | p < .01** |
| 才1 | 14.04 s | 24.90 s | p = .061 |
| 才2 | 13.28 s | 18.75 s | p < .01** |

The table shows significant between-group reaction-time differences in every condition except 才1. Learners took roughly 1.4 to 2.8 times as long as native speakers for semantic processing. For "都" sentences, native speakers averaged 6.05 s (都1) and 8.22 s (都2), while learners averaged 15.51 s and 12.58 s, respectively, with significant differences (p < .01). For "也" sentences, native speakers averaged 6.14 s (也1) and 7.29 s (也2), versus 17.03 s and 15.28 s for learners (p < .05 and p < .01). For "最" sentences, native speakers averaged 4.95 s (最1) and 5.93 s (最2), compared to 12.99 s and 12.17 s for learners (both p < .01). For "才" sentences, native speakers averaged 14.04 s (才1) and 13.28 s (才2), while learners averaged 24.90 s and 18.75 s; the difference was significant for 才2 (p < .01) and approached significance for 才1 (p = .061). Overall, native speakers had shorter reaction times in every condition, indicating more efficient processing of grammatical and prosodic patterns, while learners showed clear processing difficulties.
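The size of the learner slowdown can be checked directly from the group means in Table 3. The short sketch below only divides the reported means; the `slowdown_ratios` helper is a hypothetical name, and the ratios say nothing about per-participant variability, which the paper does not report.

```python
# Mean reaction times in seconds, copied from Table 3 as (native, learner).
RT_MEANS = {
    "都1": (6.05, 15.51),
    "都2": (8.22, 12.58),
    "也1": (6.14, 17.03),
    "也2": (7.29, 15.28),
    "最1": (4.95, 12.99),
    "最2": (5.93, 12.17),
    "才1": (14.04, 24.90),
    "才2": (13.28, 18.75),
}


def slowdown_ratios(rt_means):
    """Return the learner/native ratio of mean reaction times per condition."""
    return {cond: learner / native for cond, (native, learner) in rt_means.items()}


ratios = slowdown_ratios(RT_MEANS)
for cond, ratio in ratios.items():
    print(f"{cond}: learners took {ratio:.2f}x as long as native speakers")
```

The smallest ratio is for 才2 (about 1.41) and the largest for 也1 (about 2.77).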
1.4.2.2. Accuracy Analysis
Table 4 [TABLE:4] shows accuracy rates for native and learner groups in identifying meanings of four adverbially ambiguous sentence types under different prosodic patterns.
Table 4: Accuracy Statistics ("1" = front-orientation, "2" = back-orientation)
| Sentence Type | Native Group | Learner Group | Significance |
|---|---|---|---|
| 都1 | 0.90 | 0.69 | p = .18 |
| 都2 | 0.95 | 0.46 | p < .01** |
| 也1 | 0.95 | 0.85 | p > .3 |
| 也2 | 0.90 | 0.77 | p > .3 |
| 最1 | 0.95 | 0.84 | p > .3 |
| 最2 | 0.85 | 0.77 | p > .3 |
| 才1 | 0.90 | 0.54 | p < .05* |
| 才2 | 0.60 | 0.46 | p = .493 |

Native speakers outperformed learners across all sentence types, though significance varied. For "都" sentences, native speakers achieved 90% (都1) and 95% (都2) accuracy, while learners scored 69% and 46%. The 都2 difference was significant (p < .01), revealing learner difficulties. For "也" sentences, native speakers scored 95% (也1) and 90% (也2), versus 85% and 77% for learners, with no significant differences (p > .3). For "最" sentences, native speakers scored 95% (最1) and 85% (最2), compared to 84% and 77% for learners, again with no significant differences. For "才" sentences, native speakers achieved 90% (才1) and 60% (才2), while learners scored 54% and 46%, with a significant difference for 才1 (p < .05). In summary, native speakers showed higher accuracy, particularly for 都2 and 才1, indicating learner challenges with complex sentence types.
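The paper does not state which test produced the p-values in Table 4. As a rough illustration of how two accuracy rates from small groups can be compared, the sketch below assumes Fisher's exact test and rebuilds correct/incorrect counts from the rounded proportions and the reported group sizes (20 native speakers, 13 learners). Both the choice of test and the reconstructed counts are assumptions, so the resulting values need not match the table.

```python
from math import comb

N_NATIVE, N_LEARNER = 20, 13  # group sizes reported for the perception experiment


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "correct" responses in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def compare_accuracy(native_acc, learner_acc):
    """Rebuild counts from rounded proportions (an approximation) and test them."""
    a = round(native_acc * N_NATIVE)
    c = round(learner_acc * N_LEARNER)
    return fisher_exact_two_sided(a, N_NATIVE - a, c, N_LEARNER - c)


p_dou2 = compare_accuracy(0.95, 0.46)  # 都2 row of Table 4
print(f"都2: p = {p_dou2:.4f}")
```

Under these assumptions the 都2 comparison falls below .01, in line with the significance level the table reports for that row.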
2. Production Experiment
The production experiment used the same target sentences and contexts as the perception experiment. Four additional Chinese learners from Nanjing Normal University served as speakers, bringing the total L2 speaker count to 17. All L2 speakers were non-tonal L1 speakers with HSK Level 3 or above and language test scores between 25 and 35.
Before recording, materials were explained to participants to ensure familiarity and full comprehension of both meanings of each target sentence. After participants confirmed their understanding of pronunciation and meaning, recordings were made in the phonetics laboratory at Nanjing Normal University's Suiyuan Campus. Recording software was Adobe Audition 1.5 (mono, 44.1kHz sampling rate, .wav format). Equipment included an HP desktop computer, Neumann U87Ai microphone, and RME Fireface 800 audio interface. Participants unable to access the lab recorded at the Xianlin Campus Arts Center using a Lenovo ThinkBook computer and Logitech Blue Yeti microphone.
Speakers followed scripted dialogues at normal speech rate and natural intonation, with the experimenter always reading Part A and participants reading Part B. The experimenter monitored recordings throughout and requested re-recordings for any disfluencies or errors. While minor variations in non-target sentences were permitted, target sentences had to match the script exactly.
2.3. Data Collection and Processing
After initial processing, recordings were edited in Adobe Audition 1.5 to remove context, retaining only target sentences for acoustic analysis.
We used Praat for three-tier annotation: (1) Character tier (HZ) marking Chinese characters in ambiguous structures; (2) Pinyin tier (PY) marking pinyin and tones (1 = level, 2 = rising, 3 = dipping, 4 = falling, 5 = neutral); (3) Syllable tier (SY) marking syllable boundaries.
We completed segmentation and annotation for 25 speakers (8 native, 17 L2), yielding 25 × 2 × 4 target sentences. All subsequent prosodic analyses were based on this dataset. We analyzed two prosodic features: F0 and duration. Annotated target structure files (.wav and .TextGrid) were placed in a single folder. A Praat script generated F0 tracks for all extracted target sentences, with extreme outliers manually removed. Since F0 contours were not perfectly time-aligned, we extracted 10 equidistant F0 values within each character for direct comparison. These values were copied to Excel for processing.
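The time-normalization step (10 equidistant F0 values per character) can be sketched as follows. This is not the authors' Praat script: it assumes the F0 track is available as parallel lists of times and values, that the sample points include both character boundaries, and that values between measured frames are linearly interpolated. The function names are hypothetical.

```python
from bisect import bisect_left


def interpolate(times, values, t):
    """Linearly interpolate the track (times, values) at time t.

    `times` must be strictly increasing; t outside the track is clamped
    to the nearest endpoint value.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    j = bisect_left(times, t)  # first index with times[j] >= t
    t0, t1 = times[j - 1], times[j]
    v0, v1 = values[j - 1], values[j]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def sample_equidistant(times, values, start, end, n_points=10):
    """Return n_points F0 values at equally spaced times in [start, end]."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = (end - start) / (n_points - 1)
    return [interpolate(times, values, start + i * step) for i in range(n_points)]


# Toy track: F0 rising linearly from 100 Hz to 190 Hz over one second.
points = sample_equidistant([0.0, 1.0], [100.0, 190.0], start=0.0, end=1.0)
```

Because every character is reduced to the same number of points, contours from speakers with different speech rates can be averaged point by point.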
F0 values reflect individual speakers' actual pitch, which is influenced by personal factors. To remove individual variation, we converted Hz values to semitone units using the formula:
$$f_{st} = 12 \log_2 \left( \frac{f_0}{f_{ref}} \right)$$
where $f_{st}$ represents F0 in semitones, $f_0$ represents F0 in Hz, and $f_{ref} = 50$ Hz.
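The conversion is straightforward to apply in code. The sketch below implements the formula with the 50 Hz reference given in the text; the function name `hz_to_semitones` is a hypothetical choice, and the input check is an added safeguard rather than part of the paper's procedure.

```python
import math

F_REF_HZ = 50.0  # reference frequency stated in the text


def hz_to_semitones(f0_hz, f_ref_hz=F_REF_HZ):
    """Convert an F0 value in Hz to semitones relative to f_ref_hz."""
    if f0_hz <= 0 or f_ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f0_hz / f_ref_hz)


# Each doubling of frequency adds one octave, i.e. 12 semitones.
print(hz_to_semitones(100.0))  # 12.0
print(hz_to_semitones(200.0))  # 24.0
```

On this scale a given semitone difference corresponds to the same frequency ratio regardless of a speaker's baseline pitch, which is what makes contours comparable across speakers.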
Duration was extracted from the pinyin tier using a Praat script. Average duration per character was calculated for each participant group in seconds (s). Converted F0 values, duration data, and target sentences were compiled in Excel for statistical analysis.
2.4. Data Analysis
2.4.1. F0 Features
Figures 2–5 [FIGURE:2-5] and Tables 5–8 [TABLE:5-8] present average F0 contours and statistical results for native and L2 speakers reading the four sentence types. "a" indicates front-orientation features, "b" indicates back-orientation.
Figure 2: F0 Contour of "他都去了"
Figure 3 [FIGURE:3]: F0 Contour of "张老师也教汉语"
Figure 4 [FIGURE:4]: F0 Contour of "安娜最喜欢唱歌"
Figure 5 [FIGURE:5]: F0 Contour of "三个人才吃了一块蛋糕"
Table 5: Paired-Sample t-Test Results for F0 in "他都去了"
| Comparison | Mean Difference | SD | t | p |
|---|---|---|---|---|
| Front "他" - Back "他" | -4.16 | - | - | p < .01** |
| Front "都" - Back "都" | -4.14 | - | - | p < .01** |
| Front "去" - Back "去" | -0.04 | - | - | - |

Table 6 [TABLE:6]: Paired-Sample t-Test Results for F0 in "张老师也教汉语"

| Comparison | Mean Difference | SD | t | p |
|---|---|---|---|---|
| Front "张" - Back "张" | -2.18 | - | - | p < .001*** |
| Front "老" - Back "老" | -1.54 | - | - | p < .01** |
| Front "师" - Back "师" | -6.70 | - | - | p < .05* |
| Front "也" - Back "也" | -0.63 | - | - | p < .01** |
| Front "汉" - Back "汉" | -0.19 | - | - | p < .05* |
| Front "语" - Back "语" | -2.24 | - | - | p < .05* |
| Front "张" - Back "张" | -2.32 | - | - | - |

Table 7 [TABLE:7]: Paired-Sample t-Test Results for F0 in "安娜最喜欢唱歌"

| Comparison | Mean Difference | SD | t | p |
|---|---|---|---|---|
| Front "安娜" - Back "安娜" | -4.32 | - | - | p < .001*** |
| Front "最" - Back "最" | -5.85 | - | - | - |
| Front "唱歌" - Back "唱歌" | -2.87 | - | - | p < .05* |

Table 8 [TABLE:8]: Paired-Sample t-Test Results for F0 in "三个人才吃了一块蛋糕"

| Comparison | Mean Difference | SD | t | p |
|---|---|---|---|---|
| Front "三" - Back "三" | -0.82 | - | - | p < .05* |
| Front "才" - Back "才" | -1.19 | - | - | p < .05* |
| Front "一" - Back "一" | -0.26 | - | - | - |
| Front "三" - Back "三" | -0.19 | - | - | - |

The figures and tables clearly show that native and L2 speakers produced different F0 patterns for ambiguous structures and adverbs themselves. L2 speakers' average F0 contours were consistently lower than natives', indicating generally lower pitch. Pitch has been a primary focus of intonation research \cite{xie2019}. Previous studies have shown that F0 elevation is an important acoustic correlate of syntactic emphasis \cite{lin2013}.
Paired-sample correlations were significant for both groups (p < .05). t-tests revealed that native speakers significantly increased F0 for oriented constituents when producing adverbially ambiguous sentences. In "都" sentences, F0 elevation for oriented constituents was highly significant (p < .01). When "都" oriented forward, "他" showed significantly higher F0 than in back-orientation; when oriented backward, "去" showed significantly higher F0. L2 speakers exhibited consistent patterns with natives (F0 elevation for oriented constituents), but the magnitude was substantially smaller.
In "也" sentences, when "也" oriented forward to "张," natives showed highly significant F0 elevation for "张" (p < .001), significant elevation for "老" (p < .01), and significant elevation for "师" (p < .05). When oriented backward to "汉," "汉" showed significant F0 elevation (p < .05). L2 speakers performed better in back-orientation, showing significant F0 elevation for "汉" (p < .05). In "最" sentences, when oriented forward, natives showed marginally significant F0 elevation for "安娜" (p = .087). When oriented backward, oriented constituents showed non-significant F0 decrease, while the adverb "最" itself showed highly significant F0 elevation (p < .001). L2 speakers showed almost no F0 elevation for "安娜" in forward orientation but significant elevation for "唱歌" in backward orientation (p < .05). In "才" sentences, natives showed significant F0 elevation for both forward-oriented "三" and backward-oriented "一" (p < .05). L2 speakers' F0 contours for both meanings were nearly identical, failing to show native-like acoustic patterns at semantic orientation points.
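For readers unfamiliar with the paired-sample t-test used in these comparisons, the following sketch shows how the statistic is computed for one constituent: each speaker contributes one front-orientation value and one back-orientation value, and the test is run on the per-speaker differences. The semitone values are made up for illustration and are not the study's data; the `paired_t` helper is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(front, back):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(front) != len(back) or len(front) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [f - b for f, b in zip(front, back)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t_stat = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return mean(diffs), t_stat, len(diffs) - 1


# Made-up semitone values for one syllable from five speakers (not the study's data).
front = [20.1, 18.4, 22.0, 19.5, 21.3]
back = [17.9, 17.0, 19.2, 18.8, 18.1]
mean_diff, t_stat, df = paired_t(front, back)
print(f"mean difference = {mean_diff:.2f} st, t({df}) = {t_stat:.2f}")
```

The p-value follows from the t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel(front, back)` returns the same statistic together with the p-value.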
2.4.2. Duration Features
Figures 6–9 [FIGURE:6-9] and Tables 9–12 [TABLE:9-12] present duration characteristics and statistical results. "a" indicates front-orientation, "b" indicates back-orientation.
Figure 6: Duration Statistics for "他都去了"
Figure 7 [FIGURE:7]: Duration Statistics for "张老师也教汉语"
Figure 8 [FIGURE:8]: Duration Statistics for "安娜最喜欢唱歌"
Figure 9 [FIGURE:9]: Duration Statistics for "三个人才吃了一块蛋糕"
Table 9: Paired-Sample t-Test Results for Duration in "他都去了"
| Comparison | Mean Difference (ms) | SD | t | p |
|---|---|---|---|---|
| Front "他" - Back "他" | 74.14 | 10.42 | - | p < .001*** |
| Front "都" - Back "都" | 14.41 | -6.17 | - | - |
| Front "去" - Back "去" | -63.18 | -6.82 | - | p < .001*** |
| Front "他" - Back "他" | 15.66 | - | - | - |
| Front "都" - Back "都" | - | - | - | - |
| Front "去" - Back "去" | - | - | - | - |

Table 10 [TABLE:10]: Paired-Sample t-Test Results for Duration in "张老师也教汉语"

| Comparison | Mean Difference (ms) | SD | t | p |
|---|---|---|---|---|
| Front "张" - Back "张" | 4.41 | 9.97 | - | - |
| Front "老" - Back "老" | 20.14 | 0.29 | - | - |
| Front "师" - Back "师" | 27.70 | 1.28 | - | - |
| Front "也" - Back "也" | 24.66 | -24.61 | - | - |
| Front "汉" - Back "汉" | -16.07 | -6.35 | - | - |
| Front "语" - Back "语" | 17.91 | -1.62 | - | - |

Table 11 [TABLE:11]: Paired-Sample t-Test Results for Duration in "安娜最喜欢唱歌"

| Comparison | Mean Difference (ms) | SD | t | p |
|---|---|---|---|---|
| Front "安娜" - Back "安娜" | 28.22 | 36.40 | - | - |
| Front "最" - Back "最" | -3.37 | - | - | - |
| Front "唱歌" - Back "唱歌" | - | - | - | - |

Table 12 [TABLE:12]: Paired-Sample t-Test Results for Duration in "三个人才吃了一块蛋糕"

| Comparison | Mean Difference (ms) | SD | t | p |
|---|---|---|---|---|
| Front "三" - Back "三" | 29.76 | -7.89 | - | p < .05* |
| Front "才" - Back "才" | 11.16 | 25.09 | - | p < .05* |
| Front "一" - Back "一" | - | 36.17 | - | p < .05* |

Duration, as a crucial prosodic feature, plays an important role in semantic highlighting. Chao (1968) noted that Mandarin stress primarily involves pitch range expansion and duration lengthening. Li (2002) experimentally demonstrated that stressed syllables in Mandarin have longer duration than unstressed syllables. The figures show that in native speaker productions, forward-oriented constituents ("他," "张," "三," "安娜") had longer average durations in condition "a" than "b," while backward-oriented constituents ("去," "汉," "唱歌") had longer durations in condition "b" than "a." Thus, native speakers can disambiguate sentences by lengthening the duration of adverb-oriented constituents. For learners, durations across all constituents were generally longer than natives', likely reflecting difficulties in lexical and semantic processing requiring more processing time.
Across the four sentence types, only "他都去了" showed consistent significant duration changes for oriented constituents in native speech. When "都" oriented forward, subject "他" duration nearly doubled compared to back-orientation; when oriented backward, "去" duration significantly increased. Paired t-tests confirmed significant duration increases for oriented constituents ("他" and "去") when "都" oriented differently (p < .001). L2 speakers showed opposite patterns.
For "也" and "最" sentences, native speakers showed consistent duration changes aligned with F0 patterns. In "也" sentences, forward orientation to "张老师" showed a marginally significant duration increase for "老" (p = .065), with "老" showing greater lengthening than the logical stress-bearing "张." L2 speakers showed similar durations across both meanings. In "最" sentences, forward orientation showed a noticeable duration increase for "安娜," while backward orientation showed a minimal increase for "唱歌." L2 speakers showed patterns consistent with natives for "最" sentences. In "才" sentences, natives showed longer durations for both "三" and "一" in forward orientation, suggesting that they do not use duration lengthening to highlight the oriented constituent in "才" sentences. L2 speakers again showed opposite patterns.
For adverbs themselves, duration changes aligned with F0 changes in "最" and "都" sentences, with "最" showing notable duration increase in back-orientation. However, "也" and "才" sentences showed duration changes opposite to F0 changes.
3. Discussion
Native speakers can disambiguate adverbially ambiguous sentences through prosodic features. At the perceptual level, they can identify prosodic patterns corresponding to different meanings: all or nearly all native participants perceived the variations in "都," "也," and "最" sentences, and most perceived the variations in "才" sentences. Performance on the comprehension task was consistent with performance on the discrimination task. Most native speakers accurately understood the different meanings conveyed by prosodic patterns in "都," "也," and "最" sentences, correctly matching them to contexts. For informationally dense "才" sentences, accuracy decreased for back-orientation meanings, though many still successfully interpreted them through prosody.
At the production level, native speakers generally used elevated pitch and extended duration of oriented constituents to clarify meanings in "都," "也," and "才" sentences, consistent with previous findings \cite{xu2010, huang2008, yao2011, huang2013}. Notably, previous studies used shorter sentences (mostly 5 syllables), while our materials contained more syllables and greater processing difficulty. Comparison reveals that despite increased difficulty, native speakers' ability to express semantic focus through prosody remained unaffected, demonstrating that prosodic features are not merely phonetic phenomena but essential tools for effective communication that are relatively robust against processing demands. When emphasizing oriented constituents, native speakers expanded their pitch range, followed by substantial pitch range compression and narrowing, consistent with Mandarin focus prosody \cite{shen1985, xu1999, wang2002, cao2010, wang2015}.
Previous research did not examine acoustic changes in constituents when "最" oriented differently. Our results show that "最" sentences share similarities with and differences from other sentence types: when "最" orients backward, the adverb itself shows extremely significant F0 elevation, indicating that native speakers rely more on raising "最"'s F0 to express backward orientation—emphasizing that among Anna's hobbies, singing is the favorite. Notably, in "也" sentences, besides highly significant F0 elevation for forward-oriented "张," both "老" and "师" also showed significant increases, and "老" showed even greater duration lengthening than "张." This confirms that the intonation center in "也" sentences differs from logical stress \cite{yang2000}. When orienting backward to "汉语," "汉" showed significant F0 elevation, matching the logical stress pattern that emphasizes Teacher Zhang teaches Chinese, not other languages.
Overall, native speakers' duration increases for oriented constituents were smaller than F0 increases. Only "都" sentences showed significant duration lengthening. While "也" and "最" sentences showed duration patterns consistent with F0 changes, none reached significance. "才" sentences showed no duration changes across meanings, suggesting native speakers rely more on F0 than duration for disambiguation.
3.2. Inconsistent Perceptual and Productive Abilities in L2 Learners
L2 learners could identify prosodic pattern differences in adverbially ambiguous sentences through acoustic cues, even achieving native-like accuracy for "也" and "最" sentences. However, this ability was unbalanced: accuracy was high for "都," "也," and "最" sentences but low for "才" sentences. In actual contexts, learners' ability to interpret sentence information through prosodic cues did not match their discrimination ability and lagged behind that of native speakers. This suggests that non-native speakers rely more on physical cues than on linguistic information when perceiving Mandarin prosodic boundaries \cite{chen2016}. Across the four sentence types, learners understood the prosody-semantics interaction relatively accurately in "也" and "最" sentences. Although they could discriminate prosodic patterns in "都" sentences, they struggled to interpret the meanings conveyed, consistent with Yao (2011). Their low accuracy in discriminating and comprehending the prosody of "才" sentences may relate to the complexity of these sentences, which involve subjective quantity comparisons \cite{chen1994}.
Notably, learners showed a strong ability to perceive the ambiguity caused by the degree adverb "最," achieving extremely high discrimination accuracy and high comprehension accuracy for forward orientation. This may be due to our materials: the forward constituent "安娜" is a transliteration of a name common in learners' L1s, allowing "two-category assimilation," in which L2 sounds map to two distinct L1 phonemes. According to the Perceptual Assimilation Model, L2 sounds undergoing two-category assimilation are better perceived \cite{chen2023}. For native speakers, "安娜" is less familiar, potentially diverting attention and reducing accuracy. This suggests that incorporating L1 elements in teaching may enhance efficiency.
Production results show that native speakers can disambiguate adverbially ambiguous sentences by producing distinct prosodic features on oriented constituents, with consistent acoustic patterns of noticeable F0 elevation and duration lengthening. L2 speakers rarely produced native-like acoustic patterns; their intonation was generally flat and lacked clear stress, a common error in L2 stress production \cite{deng2022}. Their weak ability to use prosody for disambiguation reflects an inability to accurately grasp discourse centers and logical stress. Even when F0 or duration increases occurred, they were limited to individual constituents and were far smaller than native speakers' increases.
3.3. Explaining L2 Difficulties in Producing Prosodic Features
Correct production of prosodic features for adverbially ambiguous sentences requires learners to comprehend the sentence and implement constituent-level prosody at the same time, which poses considerable difficulty. Although learners can partially interpret different meanings through prosody, they often neglect prosodic means when processing more complex tasks, adopting avoidance strategies rather than actively using prosody to express the different meanings accurately.
The relatively high F0 for "了" in learners' "都 b" productions likely reflects negative L1 transfer. Our L2 participants' native languages (English, Russian, etc.) are morphologically rich, and the meaning expressed by "都 b" ("he has already gone") is conveyed through tense/aspect morphology in their L1s. "了" is often considered a rare "aspect marker" in Chinese \cite{liang2023} and thus shares similarities with L1 verb morphology, which may lead learners to give it undue prominence. The English equivalents of "也" sentences are likewise ambiguous, and the two English speakers in our study used prosody to disambiguate "也" sentences better than the non-English speakers did. Goad & White (2004) proposed the Prosodic Transfer Hypothesis, which holds that L1 prosodic features affect L2 prosodic acquisition. Stress realization in non-tonal languages differs markedly from that in Mandarin. Mandarin is a tone-stress language in which stress must be built upon tones \cite{xu2016}, whereas our participants' L1s rely more on lexical stress than on tone. Consequently, L2 learners show a weak ability to coordinate lexical tone and intonation when producing Mandarin stress and cannot adjust pitch range while maintaining tone patterns. Even when they identify discourse centers and consciously raise F0, the magnitude is significantly smaller than native speakers'.
4. Conclusion
Through perception and production experiments, this study examined the acquisition of prosodic features in adverbially ambiguous sentences by learners of Chinese with non-tonal L1 backgrounds. Results show that at the perceptual level, learners can largely identify changes in prosodic patterns when adverbs orient differently, but their ability to interpret sentence information through prosody remains weak. At the production level, learners are largely unable to use stress effectively to disambiguate adverbially ambiguous sentences. Overall, perception ability surpasses production ability, supporting Altmann's (2006) claim that good L2 prosodic perception skills do not guarantee good production.
Although research confirms the importance of prosody in L2 acquisition, prosodic instruction is often neglected in actual curricula, with some arguing that pronunciation and intonation cannot be taught. Moreover, prosodic acquisition is particularly difficult for learners beyond the critical period \cite{lengeris2012}. Since learners of Chinese are typically adults past the critical period, prosodic instruction deserves greater emphasis. Current international Chinese teaching treats prosody unsystematically \cite{chen2021}, with teachers often adopting ad hoc approaches, resulting in low acquisition efficiency. Instructors should therefore emphasize systematic prosodic acquisition, consciously supplementing prosodic knowledge and training while teaching correct pronunciation. Training should integrate perception and production practice to help learners improve discrimination and overcome "foreign accent" issues.
References
[1] International Chinese Language Education Chinese Proficiency Standards (GF0025-2021). Beijing: Beijing Language and Culture University Press, 2021.
[2] Luo Changpei, Wang Jun. Outline of General Phonetics. Beijing: The Commercial Press, 2002.
[3] Lin Tao, Wang Lijia. Phonetics Course: Revised Edition. Beijing: Peking University Press, 2013.
[4] Shao Jingmin (Ed.). General Introduction to Modern Chinese (3rd ed.). Shanghai: Shanghai Education Press, 2016.
[5] Ye Feisheng, Xu Tongqiang, Wang Hongjun, et al. Outline of Linguistics. Beijing: Peking University Press, 2010.
[6] Ma Zhen, Lu Jianming. Essays on Modern Chinese Function Words. Beijing: Peking University Press, 2017.
[7] Zhu Dexi. Lecture Notes on Grammar. Beijing: The Commercial Press, 1982.
[8] Lü Shuxiang. 800 Words in Modern Chinese. Beijing: The Commercial Press, 1980.
[9] Wang Huan. New Chinese-English Dictionary of Function Words. Beijing: Sinolingua, 1999.
[10] Chao Yuen Ren. A Grammar of Spoken Chinese (Lü Shuxiang, Trans.). Beijing: The Commercial Press, 1979.
[11] Cao Wen. Prosodic Realization of Focus Stress in Mandarin. Beijing: Beijing Language and Culture University Press, 2010.
[12] Li Aijun. Acoustic manifestations of prosodic features in Mandarin dialogue. Chinese Language, 2002(6).
[13] Shen Kaimu. On "semantic orientation." Journal of South China Normal University (Social Science Edition), 1996(1).
[14] Liu Ningsheng, Qian Yulian. Semantic orientation of "最" and implicatures of "最" sentences. Chinese Language Learning, 1987(5).
[15] Chen Xiaohe. Preliminary exploration of subjective quantity—On adverbs "就," "才," "都." Chinese Teaching in the World, 1994(4).
[16] Zhou Shoujin. Semantic information features of "subjective quantity" and semantics of "就" and "才." Peking University Journal (Philosophy and Social Sciences), 2004(3).
[17] Jin Lixin. Explanations of some "就" and "才" phenomena. Language Teaching and Linguistic Studies, 2015(6).
[18] Lan Binhan. Semantic features of adverb "都" and its restrictions on following verbs. Language Teaching and Linguistic Studies, 1988(2).
[19] Huang Caiyu. Experimental phonetic analysis of semantically ambiguous "都" sentences. Language Teaching and Linguistic Studies, 2013(5).
[20] Yang Yiming. On ambiguity in "也" sentences. Chinese Language, 2000(2).
[21] Xu Yizhong, Yang Yiming. Study on ambiguity of "就" and "才" and related phonetic issues. Language Research, 2010, 30(1).
[22] Zhou Fengling, Wang Jianqin. Korean learners' Mandarin oral prosody processing in ambiguous sentence disambiguation. Second Language Learning Research, 2019(1).
[23] Deng Dan, Zhu Lin. L2 learners' perception and production of Mandarin neutral tone. Language Teaching and Linguistic Studies, 2019(5).
[24] Wen Baoying, Pan Chaochao, Xu Lizheng. Experimental exploration of Portuguese learners' Mandarin declarative intonation acquisition. International Chinese Language Education, 2022(3).
[25] Li Baogui, Zhou Tiantian. Experimental study on Italian students' Mandarin declarative intonation acquisition. Nankai Linguistics, 2020(2).
[26] Yao Qian. Experimental study on using prosodic information to interpret "都" sentence ambiguity—Comparison between native and L2 speakers. Chinese Language Teaching and Research, 2011(4).
[27] Feng Liping, Feng Hao, Bai Sida, et al. Development and analysis of rapid Chinese L2 proficiency test—Based on equidistant cloze test. Applied Linguistics, 2020(3).
[28] Zhao Yang. Acquisition of Chinese unaccusative and psychological verbs—On superset-subset relations and learnability. Chinese Teaching in the World, 2009, 23(1).
[29] Liu Ying. Thai speakers' acquisition of exclusive semantics in Mandarin. Chinese Teaching in the World, 2020, 34(4).
[30] Hong Wei, Liu Xiaodi. Effects of metaphorical gestures on L2 abstract vocabulary learning. Chinese Language Learning, 2024(3).
[31] He Muxuan, Zheng Lina, Chang Hui. Interface perspective on Korean learners' acquisition of Chinese negation scope. Chinese Teaching in the World, 2024, 38(1).
[32] Xie Hong, Shi Feng. On prosodic encoding of speech information structure. Journal of Tianjin University (Social Sciences), 2019, 21(4).
[33] Shen Jiong. Preliminary discussion of Mandarin intonation models. Linguistic Research, 1992(4).
[34] Wang Bei, Yang Yufang, Lü Shinan. Study on F0 patterns of stressed syllables in Mandarin sentences. Acta Acustica, 2002(3).
[35] Wang Yunjia, Ding Duoyong, Dong Tuoxiao. Tone implementation under different intonation conditions. Acta Acustica, 2015, 40(6).
[36] Chen Mo. Perception of naturalness in L2 Mandarin reading pauses. Language Teaching and Linguistic Studies, 2016(3).
[37] Chen Shuwen. Theoretical foundations and new developments in L2 speech production research. Contemporary Linguistics, 2023, 25(4).
[38] Deng Dan. Analysis of Mandarin intonation teaching for L2 learners. International Chinese Language Education, 2022, 7(3).
[39] Liang Yinfeng. Semantic evolution direction of Chinese aspect markers "了" and "着." Journal of Nanjing Normal University (School of Chinese Language and Literature), 2023(2).
[40] Xu Ximing, Shen Jiaxuan. Phonological differences in stress between English and Chinese. Foreign Language Teaching and Research, 2016, 48(5).
[41] Chen Mo. Prosody: A new focus in L2 acquisition research. International Chinese Language Teaching Research, 2021(3).
[42] Wang Ying. Preliminary experimental phonetic study of "也" ambiguous sentences. Nankai University, 2005.
[43] Liu Shanshan. Analysis of pause acquisition by Russian Chinese learners. Southwest Jiaotong University, 2018.
[44] Luo Li. Study on prosody-related interface acquisition of ambiguous structures by Uyghur Chinese learners. Sichuan International Studies University, 2022.
[45] Goad H, White L. Ultimate attainment of L2 inflection: Effects of L1 prosodic structure. EUROSLA Yearbook, 2004, 4(1).
[46] Lengeris A. Prosody and Second Language Teaching: Lessons from L2 Speech Perception and Production Research. In Pragmatics and Prosody in English Language Teaching. Dordrecht: Springer Netherlands, 2012.
[47] So C K, Best C T. Phonetic influences on English and French listeners' assimilation of Mandarin tones to native prosodic categories. Studies in Second Language Acquisition, 2014, 36(2).
[48] Xu Y. Effects of tone and focus on the formation and alignment of F0 contours. Journal of Phonetics, 1999, 27(1).
[49] Altmann H. The perception and production of second language stress: A cross-linguistic experimental study. University of Delaware, 2006.