INTERSPEECH 2014

Synchronic variation in the articulation and the acoustics of the Polish three-way place distinction in sibilants and its implications for diachronic change

Véronique Bukmaier, Jonathan Harrington, Ulrich Reubold, Felicitas Kleber
Institute of Phonetics and Speech Processing, University of Munich, Germany
[bukmaier jmh reubold

Abstract

The aim of the present study was to relate articulatory properties of the Polish sibilants /s ʂ ɕ/ to a potential neutralization of /ʂ/ as either /s/ or /ɕ/, the former having occurred in a number of Polish dialects. For this purpose, tongue tip (TT) movement data were obtained together with acoustic data using electromagnetic articulography. The sibilants, which were always followed by one of /a e o/, were produced by four L1-Polish speakers at fast and slow speech rates. While /s ʂ/ had almost identical formant transitions, they differed greatly in their spectral characteristics, with /ʂ/ being closer to /ɕ/. In order to capture differences in tongue position as well as tongue shape, both TT position and TT orientation data were analyzed. The vertical TT orientation showed similarities in /ʂ/ and /s/ production, but the two sibilants were clearly separated in TT position, with /ʂ/ being produced much further back than /s/ and /ɕ/, and the latter two being very similar. The tendency towards a greater effect of speech rate on /ʂ/, together with the varying acoustic and articulatory similarities between the sibilants, is taken as an indicator of greater instability of /ʂ/. This synchronic instability is discussed in terms of potential diachronic mergers.

Index Terms: electromagnetic articulography, three-way place distinction in Polish sibilants, synchronic variation, diachronic change, instability.

1. Introduction

The aim of the present study was to use an articulatory analysis of the Polish sibilants /s ʂ ɕ/ to explain a neutralization of anterior and non-anterior fricatives that has been observed in various languages.
Standard Polish is one of the very few languages that distinguish lexically between one anterior and two non-anterior sibilants: dental /s/ (e.g. sali /sali/, Engl. 'room' (gen.)), retroflex /ʂ/ (e.g. szali /ʂali/, Engl. 'scale' (gen.)), and alveolopalatal /ɕ/ (e.g. siali /ɕali/, Engl. 'sown'). As the descriptive terms suggest, the three sibilants differ articulatorily not only in place of articulation, but also in tongue shape, as has been shown by MRI data in []. /ʂ/ and /ɕ/ can both be described as sharing a postalveolar place of articulation, and the resulting fricative noise has been reported to be rather similar, with overlapping centers of gravity [2, 3, 4]. However, the two non-anterior sibilants differ in tongue shape [], which leads to very different coarticulatory influences on neighboring segments, i.e. to very pronounced acoustic differences in formant transitions.

Perception experiments have shown that the three Polish sibilants are distinguished both by spectral properties and by formant transitions into the following vowel [3, 5]. Transitions are more important for the distinction between the non-anteriors /ʂ/ and /ɕ/, and the steady-state frication part is more important for distinguishing anterior /s/ from the non-anteriors /ʂ ɕ/. The /s ɕ/ contrast is encoded by both cues and is therefore perceptually robust. The distinction between the retroflex and each of the other two fricatives depends on only one of the two cues. In particular, as the three fricatives frequently occur in non-prevocalic position in Polish complex onset clusters, the transition cue may be perceptually masked, which considerably diminishes its perceptual role [3, 4, 5, 6, 7, 8, 9]. Nevertheless, these perceptual results, together with those from articulatory and acoustic studies, suggest a rather stable three-way contrast in Polish sibilants.
The retroflex sibilants in Polish have been claimed to be the result of a historical sound change during the 16th century, in which palatalized palatoalveolars depalatalized and became retroflex [, ]. [7] reasoned that such a sound change could have come about because of the greater perceptual stability of the alveolopalatal vs. retroflex contrast compared to the earlier contrast of alveolopalatal vs. palatalized palatoalveolar sibilants (an argument which may also support the distribution of non-retroflex vs. retroflex sibilants in the world's languages [2]). Yet, although there is some evidence for a good deal of stability in the three-way contrast in Polish sibilants, comparably crowded sibilant systems are not only rare in the world's languages [3], but may also be unstable: most non-standard varieties of Polish have already merged dental and retroflex sibilants [3], and the same sound change has been reported for the very similar three-way distinction of sibilants in Mandarin [4].

Given that coarticulation allows for a reasonably robust perception of differences between the three sibilants in Polish, robustness of perception may diminish in conditions that affect the amount of coarticulation, as in prosodically weak constituents [5, 6] or at higher speaking rates [7]. Conditions such as these are known to be possible triggers of historical sound changes [8]. One of the main motivations for the present study is to draw a connection between the production of the three sibilants /s, ʂ, ɕ/ and a potential diachronic collapse of the three-way contrast to a binary contrast (most probably by a dental-retroflex merger). A comparison of the acoustic and articulatory Polish sibilant data is a good test case for quantifying a link between place differences and coarticulatory influences in a possible collapse.
This is the first experimental study that investigates the production of the three sibilants by means of electromagnetic articulographic (EMA) measurements of tongue movement and that focuses on the relation between their acoustic and articulatory properties at different speaking rates. The following three hypotheses were tested:

H1: The alveolopalatal fricative has the greatest influence on F2 transitions in neighboring vowels.
H2: The alveolopalatal fricative differs from dental and retroflex fricatives mainly in tongue shape and less so in position.
H3: The relative distance of the retroflex fricative between dental and alveolopalatal diminishes in fast speech towards the dental fricative.

Copyright 2014 ISCA, September 2014, Singapore

2. Method

2.1. Data collection and participants

Acoustic and articulatory movement data were collected using electromagnetic articulometry at the IPS in Munich (AG5, Carstens Medizinelektronik; [8]) from four L1 speakers of Polish (two male, two female) aged between 19 and 28. The speakers were born in Poland but lived in Munich, Germany, though for no longer than two years at the time of recording. Two sensors were placed on the tongue: one on the midline 1 cm behind the tongue tip (TT) and the other level with the molar teeth at the tongue back (TB). Two sensors were placed on the upper and lower lip. Four additional sensors were fixed to the maxilla, the nose bridge, and the left and right mastoid bones: these served as reference sensors to correct for head movement.

2.2. Speech material

The participants were asked to produce symmetrical CVCV non-words (e.g. /sasa/, with C = /s ʂ ɕ/ and V = /a e o/), which were embedded in the carrier phrase Ania woɫa CVCV aktualnie (literally 'Ania shouted CVCV currently'). In this study, only the initial CV sequence was analyzed. The speech material was produced at a slow and a fast speech rate.
Each carrier phrase was produced with a nuclear pitch accent on the target word, and participants repeated the sentence if they had produced it with an incorrect prosody.

2.3. Experimental set-up

The recording session consisted of ten blocks, alternating between slow and fast speech rates. In order to determine the individual speech rate for each speaker and to adjust the corresponding recording time, each participant was asked to read examples of the speech material at self-selected fast and slow speech rates prior to the actual recording. To ensure a consistent within-speaker speech rate per condition, the display included a progress bar that indicated the time frame for each token. The bar was linked to the desired speech rate, which was defined for each speaker and condition from the mean durations in the pre-recording. Within each block, the carrier sentences containing the target words appeared in random order. In total, 360 sentences were produced (3 places of articulation × 3 vowels × 10 repetitions × 4 speakers).

2.4. Analysis of articulatory data

After the physiological raw data had been post-processed semi-automatically in Matlab, labeling and subsequent analyses of the physiological data were conducted using EMU/R [2]. The physiological annotation of the three sibilants was based on the vertical movement of the TT in millimeters and the TT tangential velocity in millimeters per second. The tangential velocity is important for detecting TT landmarks because coronal constrictions can include TT raising as well as TT fronting. Physiological labels included seven different landmarks, as can be seen in Fig. 1 [2]. For example, the beginning and the end of the constriction plateau were interpolated values located at a 2% threshold of two adjacent maxima in the velocity signal.
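The landmark procedure above can be illustrated with a minimal sketch. It is not the EMU/R implementation: the function names, the central-difference velocity estimate, and the threshold fraction `frac` (with its default) are assumptions made for illustration, and the sketch covers only one possible reading of the threshold criterion, namely the point at which the velocity has fallen from a peak towards the following minimum.

```python
import math

def tangential_velocity(x, y, fs):
    """Speed (mm/s) of a sensor from horizontal/vertical positions (mm) sampled at fs Hz.

    Combining both dimensions means that raising and fronting both contribute.
    Central differences are used inside the signal, one-sided ones at the edges.
    """
    n = len(x)
    speeds = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dt = (hi - lo) / fs
        speeds.append(math.hypot((x[hi] - x[lo]) / dt, (y[hi] - y[lo]) / dt))
    return speeds

def falling_crossing(v, i_peak, i_min, frac=0.2):
    """Interpolated sample index at which v drops to a fraction of the peak-to-minimum range.

    frac is a hypothetical parameter; its default is an assumption, not a value from the paper.
    """
    level = v[i_min] + frac * (v[i_peak] - v[i_min])
    for i in range(i_peak + 1, i_min + 1):
        if v[i] <= level:
            span = v[i - 1] - v[i]
            # Linear interpolation between the two samples that bracket the threshold.
            return float(i) if span == 0 else (i - 1) + (v[i - 1] - level) / span
    return float(i_min)
```

A plateau offset could be located in the same way by running the search on the time-reversed velocity segment between the velocity minimum and the following peak.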
As the plateau (defined by its onset and offset) of an articulatory gesture is known to be the most stable part in the measurements, the onsets and offsets of the TT gesture plateaus, which are equivalent to the coronal constriction phases, define the time frames in which all articulatory analyses were conducted.

Figure 1: Schematic representation of landmark positions: gestural onset (g_on), maximum velocity in gestural onset (v_on), onset of constriction plateau (p_on), maximum in constriction (m_on), offset of constriction plateau (p_off), maximum velocity in gestural offset (v_off), and gestural offset (g_off).

Besides providing position data, the TT sensor was also used to determine the differences between the orientations of the TT in retroflex vs. alveolopalatal fricatives. The curled anterior tongue shape is predicted to cause the TT sensor to point upwards for the retroflex, while the lowered anterior part of the tongue in the alveolopalatal fricative should cause the TT sensor to be oriented downwards [, 3]. In order to reduce speaker differences as far as possible for further analyses, the articulatory data were Lobanov normalized [22]. To do so, the mean value was calculated for each utterance, separately for TT orientation and TT position, across all values between the onset and the offset of the constriction plateau of the ith utterance produced by a given speaker. To quantify the articulatory distance between the three sibilants, the Euclidean distances $E_s$ and $E_ɕ$ were calculated in the VERTICAL TT ORIENTATION × HORIZONTAL TT POSITION space separately for each sibilant token. The centroids of the dental and the alveolopalatal sibilants in the slow speech rate served as anchors.
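The normalization and anchor-distance steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: Lobanov normalization is taken to be a per-speaker z-score (here with the sample standard deviation), tokens are (orientation, position) pairs, and all function names are hypothetical. The last function returns the log ratio of the two distances, which is the relative measure $d_{sib}$ that the text goes on to define.

```python
import math
from statistics import mean, stdev

def lobanov(values):
    """Z-score one speaker's values: subtract the speaker mean, divide by the speaker SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def centroid(points):
    """Mean point of a set of (orientation, position) tokens, e.g. all slow-rate /s/ tokens."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))

def log_distance_ratio(token, c_dental, c_alveolopalatal):
    """log(E_s / E_c): positive if the token lies closer to the alveolopalatal anchor,
    negative if it lies closer to the dental anchor, zero if it is equidistant.
    Undefined when the token coincides exactly with one of the anchors."""
    e_s = math.dist(token, c_dental)
    e_c = math.dist(token, c_alveolopalatal)
    return math.log(e_s / e_c)
```

Using a ratio of distances rather than a single distance gives one relative value per token, so a shift of the retroflex towards either anchor shows up as a change in sign or magnitude.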
The log-Euclidean distance ratio $d_{sib}$ was then calculated for each sibilant from (1):

$$d_{sib} = \log\left(\frac{E_s}{E_ɕ}\right) = \log(E_s) - \log(E_ɕ) \qquad (1)$$

The log-Euclidean distance ratio $d_{sib}$ was calculated in order to obtain one value per sibilant that is a relative measure: greater positive values denote a closer distance to the alveolopalatal centroid, greater negative values denote a closer distance to the dental centroid, and a value of zero denotes that a given sibilant is equidistant in this articulatory space between the dental and the alveolopalatal centroids (see e.g. [23, 24] for a similar methodology).

2.5. Analysis of acoustic data

The synchronized acoustic data were digitized at 6 kHz and automatically segmented and labeled using forced alignment (Munich Automatic Segmentation tool, [25]). Calculations of spectra (256-point discrete Fourier transform with a 4 Hz frequency resolution, 5 ms Blackman window, and a frame shift of 5 ms), of formant frequencies (F1-F4; pre-emphasis of -.8, 2 ms Blackman window with a frame shift of 5 ms), and all further analyses were conducted in EMU/R [2]. For the acoustic analyses, spectra were extracted at the temporal midpoint between the acoustic onset and offset of each sibilant. These spectral data were reduced to a set of coefficients using the discrete cosine transformation (DCT): for an N-point mel-scaled spectrum x(n), extending in frequency from n = 0 to N - 1 over the frequency range of 5 35 Hz, the mth DCT coefficient $C_m$ (m = 0, 1, 2) was calculated with the formula in (2). These three coefficients $C_m$ (m = 0, 1, 2) encode the mean, the slope, and the curvature, respectively, of the signal to which the DCT was applied [2]. Since sibilants can be distinguished well using only $C_2$ (i.e. the curvature of the spectral slice), all further quantifications of the sibilants were based on this coefficient.
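To make the role of the three coefficients concrete, the following sketch computes $C_0$, $C_1$ and $C_2$ for a short signal. It is a plain-Python illustration and not the EMU/R routine. The scaling factor $k_m$ (taken here as $1/\sqrt{2}$ for m = 0 and 1 otherwise) is an assumption based on a common DCT convention, and the function name is hypothetical.

```python
import math

def dct_coefficients(x, n_coef=3):
    """First n_coef DCT coefficients of the sequence x.

    C_m = (2 * k_m / N) * sum_{n=0}^{N-1} x(n) * cos((2n + 1) * m * pi / (2N)),
    with k_m = 1/sqrt(2) for m = 0 and k_m = 1 otherwise (assumed scaling).
    """
    N = len(x)
    coefs = []
    for m in range(n_coef):
        k_m = 1 / math.sqrt(2) if m == 0 else 1.0
        total = sum(x[n] * math.cos((2 * n + 1) * m * math.pi / (2 * N)) for n in range(N))
        coefs.append(2 * k_m / N * total)
    return coefs

flat = dct_coefficients([3.0] * 8)                           # only C_0 is non-zero
ramp = dct_coefficients([float(n) for n in range(8)])        # C_1 non-zero, C_2 vanishes
bowl = dct_coefficients([(n - 3.5) ** 2 for n in range(8)])  # C_2 non-zero, C_1 vanishes
```

Under this convention a flat signal contributes only to $C_0$, a linear ramp leaves $C_2$ at zero, and a symmetric U-shaped signal leaves $C_1$ at zero. This is why $C_1$ can serve as a slope measure and $C_2$ as a curvature measure.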
To quantify the acoustic distance between the three sibilants, the Euclidean distances were calculated for each sibilant token in the $C_2$ dimension following formula (1), but with $E_ʂ$ instead of $E_ɕ$. Slow /s/ and /ʂ/ were chosen as centroids because [3] and [5] reported an alveolopalatal center of gravity that lay between those of /s/ and /ʂ/. To quantify coarticulatory effects, the F2 transitions and their linear slopes (specified by the second DCT coefficient) were calculated from the onset of the vowel to its temporal midpoint by applying the discrete cosine transformation (2) to the F2 trajectory. The acoustic data were again Lobanov normalized [22].

$$C_m = \frac{2 k_m}{N} \sum_{n=0}^{N-1} x(n) \cos\left(\frac{(2n+1) m \pi}{2N}\right) \qquad (2)$$

3. Results

3.1. Spectral data and F2 transitions

Figure 2: Log-Euclidean distance ratio $d_{sib}$ of the alveolopalatal sibilant to the mean positions of the dental and the retroflex sibilants in the $C_2$ dimension (= curvature of the spectral slice), shown for /s/, /ʂ/ and /ɕ/ in two panels. Each box contains one token per vowel and speaker.

With respect to the $C_2$ derived from the spectra, there was greater similarity between retroflex and alveolopalatal sibilants (cf. Fig. 2). This observation was confirmed by a repeated measures ANOVA with $d_{sib}$ as the dependent variable and CONSONANT, VOWEL and SPEECH RATE as independent factors: the results showed a significant influence of CONSONANT (F[2,6] = 85.7, p .) but no significant influence of VOWEL or SPEECH RATE. In order to test the three levels of CONSONANT separately, post-hoc Bonferroni-adjusted t-tests were carried out. These showed significant differences between /s/ and /ʂ/ (p .) as well as between /s/ and /ɕ/ (p .5), but no significant difference between /ʂ/ and /ɕ/.
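The Bonferroni adjustment used for the post-hoc comparisons above multiplies each raw p-value by the number of comparisons and caps the result at 1. The sketch below shows only this arithmetic. The function name is hypothetical and the p-values are invented placeholders, not results from the study.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each raw p times the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for three pairwise comparisons (illustration only).
adjusted = bonferroni([0.01, 0.02, 0.5])
```

With three pairwise comparisons among /s/, /ʂ/ and /ɕ/, a raw p-value therefore has to fall below roughly 0.0167 for the adjusted value to stay under 0.05.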
Figure 3: Mean F2 transitions (time normalized) averaged across vowels from vowel onset to the temporal midpoint, shown separately for the dental (dashed), retroflex (dotted) and alveolopalatal (solid) sibilant and for fast and slow speech rate (x-axis: proportional time; y-axis: F2 in Hz).

At both speech rates, dental and retroflex sibilants showed quite similar F2 transitions into the vowels, whereas the F2 transitions of alveolopalatals differed from those of the other sibilants (cf. Fig. 3). In addition, there was more undershoot in fast than in slow speech. To test the observations from Fig. 3, a repeated measures ANOVA was calculated with F2 (averaged over the transition from onset to temporal midpoint) as the dependent variable and with VOWEL (three levels: /a, e, o/), SIBILANT (three levels: /s, ʂ, ɕ/) and SPEECH RATE (two levels: slow, fast) as within-speaker factors. Apart from the significant influence of SIBILANT (F[2,6] = 46.2, p .) on the acoustic parameter, there was a predictable significant influence of VOWEL (F[2,6] = 29., p .) and of SPEECH RATE (F[,3] =., p .5), and a significant VOWEL × SPEECH RATE interaction (F[2,6] = 6., p .5). In order to test whether there was a difference in the slope of the F2 transitions, a repeated measures ANOVA was calculated with slope (encoded by the second DCT coefficient) as the dependent variable and VOWEL, SIBILANT and SPEECH RATE as independent variables. In this case, there was a predictable significant influence of VOWEL (F[2,6] = 5.2, p .5) but no influence of SIBILANT or SPEECH RATE.

3.2. Articulatory analyses: tongue tip (TT) orientation data

Figure 4: Lobanov-normalized and averaged vertical TT orientation and horizontal TT position at 3 % of the constriction plateau duration.
While dental and retroflex sibilants resemble each other in vertical TT orientation (both show a slight upward TT orientation, indicated by negative TT orientation values), the alveolopalatal sibilant differs from the other two sibilants in showing a downward TT orientation (indicated by positive TT orientation values). The TT position data show that dental and alveolopalatal sibilants are fronted (indicated by negative TT position values) compared to the retroflex, which is located further back (indicated by positive TT position values; cf. Fig. 4). The log-Euclidean distance ratios $d_{sib}$ in Fig. 5 show no difference between the sibilants. This observation was confirmed by a repeated measures ANOVA with $d_{sib}$ as the dependent variable and VOWEL and SPEECH RATE as independent factors, which showed no significant effects.

Figure 5: Log-Euclidean distance ratios $d_{sib}$ of the retroflex sibilant to the dental and the alveolopalatal sibilant in the VERTICAL TT ORIENTATION × HORIZONTAL TT POSITION space, shown for /s/, /ʂ/ and /ɕ/ in two panels.

Finally, in order to quantify the influence of speech rate, and therefore also of speaking style, on the articulatory distribution and the articulatory stability of the sibilants, the difference in TT vertical orientation between the slow and the fast speech rate was calculated.

Figure 6: Difference in TT vertical orientation between slow and fast speech rate.

Fig. 6 (a) and (b) show a very stable TT orientation for dental and alveolopalatal fricatives across speech rate conditions, while the retroflex's TT orientation differs from that of /s/ only in slow speech. Nonetheless, a repeated measures ANOVA with TT orientation as the dependent variable, CONSONANT and RATE as within-speaker variables and SPEAKER as a random factor showed a significant effect only for CONSONANT (F[2,6] = 8.3, p .5) and not for RATE. Commensurate with Fig. 6 (c), there was also more variation in retroflex than in dental and alveolopalatal sibilants, again indicating a greater difference between slow and fast speech rate in retroflex sibilants. An RM-ANOVA with TT orientation difference as the dependent variable revealed no significant difference. Given that the mean TT orientations of /s/ and /ʂ/ are almost identical, presumably only some speakers show an effect of speech rate on TT orientation.

4. Discussion & Conclusion

Three major findings arise from the present study, which aimed at explaining a potential neutralization of anterior and non-anterior fricatives using an articulatory analysis of the Polish sibilants /s ʂ ɕ/. The first is that the three-way place contrast in Polish sibilants is maintained articulatorily in terms of different tongue shapes and positions, which indicates synchronic stability. Our second finding, however, is that the retroflex shows, commensurate with previous acoustic findings, considerable acoustic similarities with both dental and alveolopalatal fricatives. On the one hand, /ʂ/ overlaps greatly with the alveolopalatal fricative in spectral properties (cf. also [3]), which is partly related to their being closer together in TT position than are /s/ and /ʂ/, though /ɕ/ is nevertheless closer to /s/. The greater acoustic similarity is likely to stem from similar constriction positions between /ʂ/ and /ɕ/, presumably resulting i