The relationship between perception and production

Perceptual distortions in prelexical and lexical levels

The mismatch between the properties of the native language and the foreign one occurs at two levels: segmental/suprasegmental and phonotactic. Thus, either certain segments or suprasegments of the L2 do not exist in the L1, or certain sequences of L2 sounds are not allowed in the L1. In order to match the phonological constraints of the L1, such mismatches are “repaired” by the perceptual system. Three types of “repair” strategies have been attested: changing the illegal sound or sequence, deleting the disallowed sound or part of the sequence, or inserting a sound to correct the disallowed sequence (the latter strategy has been attested only in phonotactic repair) (Sebastián-Gallés, 2005; Davidson & Shaw, 2012).
This results in perceptual distortions or “illusions”. We will first give a short overview of such “repair” processes at the prelexical level of processing.
At the segmental level, phonological change is probably the most widely attested type of illusion; it results in L2 sound assimilation (Flege et al., 1999; Hayes-Harb & Masuda, 2008; Escudero et al., 2012, etc.). It occurs when an L2 learner does not perceive the difference between two L2 sounds because both of those sounds are mapped to a single L1 category. One of the most studied examples is the case of Japanese learners who fail to perceive the difference between the English liquids /ɹ/ and /l/, as Japanese has only one liquid, /r/ (Goto, 1971; Hattori & Iverson, 2009). Concerning the phonotactic mismatches between L1 and L2, phonological change arises when one phoneme is perceptually changed into another to correct a phonotactically illegal sequence of sounds (Hancin-Bhatt, 1994; Brannen, 2002; Cutler et al., 2004, etc.). For instance, French native speakers have been shown to perceptually transform the Hebrew onset clusters /tl/ and /dl/ into the onset clusters /kl/ and /gl/ (Hallé & Best, 2007). The second type of phonological illusion, epenthesis, occurs when an L2 learner perceptually inserts an illusory (epenthetic) phoneme in order to correct a sequence of segments that violates the phonotactic constraints of the L1 (Davidson & Shaw, 2012; Durvasula & Kahng, 2016; Guevara-Rukoz et al., 2017, etc.). For example, Japanese speakers perceive an illusory vowel /ɯ/ in disallowed consonant clusters (Dupoux et al., 1999). Finally, the last type of repair strategy, deletion, has scarcely been studied. Deletion could arise as an alternative to change in order to “repair” an illegal segment. Similarly, it could be an alternative to epenthesis in cases where the L2 syllabic structure has to be modified to match the L1 phonotactic constraints. These two uses of deletion are indeed attested in loanword adaptation, where deletion typically concerns less salient sounds or sounds in less salient positions (see Kang, 2011).
As there seems to be a close parallel between the perceptual “repair” processes happening in loanword adaptation and those found in speech perception (Peperkamp et al., 2008), it seems likely that deletion in loanword adaptation also has its counterpart in perception. One such example has been attested by Cho et al. (2008), who tested Korean learners of English on the perception of the English diphthong [ow], using a phonetic transcription task. The results demonstrated that participants misperceived [ow] as [o], suggesting that they perceptually deleted the second element of the diphthong. Similarly, Mah et al. (2016) carried out an EEG experiment on the perceptual deletion of /h/ by French learners of English and native English speakers. When participants listened to series of “um” and “hum” syllables in an oddball paradigm, an MMN was detected only for native English speakers, but not for French learners of English. Thus, due to perceptual deletion, the learners did not perceive the difference between the presence and absence of /h/. Concerning phonotactics, an example of deletion is attested in Thai, where vowel-adjacent liquids delete, as such sound sequences are disallowed in this language (Yun, 2014). Importantly, these distortions occurring in non-native sound perception can also influence the perception of L2 words containing these sounds. Previous studies used a variety of tasks (lexical decision (Darcy et al., 2013); lexical decision with long-term repetition priming (Pallier et al., 2001); cross-modal priming (Broersma & Cutler, 2011); word identification (Díaz et al., 2012); eye-tracking (Cutler et al., 2006); semantic relatedness judgment (Ota et al., 2009)) to show that perceptual problems at the prelexical level can severely impair the processing of the non-native language at the lexical level. Hence, the three perceptual repair strategies observed at the prelexical level of processing have also been found to surface at the lexical level.
Most of the studies that looked at how perceptual distortions affect lexical processing focused on perceptual assimilation. For example, Ota et al. (2009) investigated how the perceptual assimilation of English /ɹ/-/l/ that has been reported in Japanese learners at the prelexical level affects their lexical processing of English minimal pairs that differ only in /ɹ/-/l/. They used /ɹ/-/l/ minimal pairs (e.g., ROCK – LOCK) and /p/-/b/ control minimal pairs (e.g., PEACH – BEACH) in a visual semantic relatedness judgment task. In each trial of this task, participants saw two written words (e.g., KEY – LOCK) and had to judge whether they were semantically related or not. Japanese participants made significantly more errors when the trial contained a word involving the sound /ɹ/ or /l/; i.e., when they saw a pair such as KEY – ROCK, they wrongly answered that these words were related (this suggests that the word LOCK was activated when seeing ROCK). This is an indication that Japanese learners have inaccurate phonological representations of English words containing /ɹ/ and /l/, which get activated despite the information provided by orthography.
A few studies tested the negative impact that the two other types of perceptual repair strategies can have on lexical access and word recognition. The effect of perceptual epenthesis on lexical processing was tested by Dupoux & Pallier (2001). Note that, instead of using stimuli from a foreign language, they tested Japanese native speakers on Japanese words and nonwords. The participants performed, among other things, a lexical decision task, which showed that they tended to accept nonwords such as *sokdo but not *mikdo as real words (cf. sokudo ‘speed’ and mikado ‘emperor’). This suggests that Japanese speakers perceptually inserted an illusory vowel /ɯ/ in the nonwords, which for *sokdo but not *mikdo resulted in a real word. Thus, because of perceptual illusions, Japanese speakers do not perceive the difference between words and nonwords that differ only in the presence or absence of /ɯ/ in a consonant cluster. A similar pattern of results was obtained by White et al. (2017), who studied the difficulty in lexical processing caused by the perceptual deletion of /h/ in French listeners. In an EEG study, French learners of English performed a semantic classification task on words and nonwords, where the nonwords were created from /h/- and vowel-initial words by removing or adding /h/, respectively. Crucially, the participants were not informed that the items contained nonwords as well as real words. Results revealed that low-proficiency learners did not show an N400 nonword effect; thus, they processed the nonwords as if they were real words, suggesting that the misperception of /h/ resulted in impaired lexical processing of h-initial words and nonwords.
Taken together, the studies reviewed in this section show that the perceptual repair strategies, which occur in order to “repair” L2 sounds that do not match the constraints of the L1, can have a great negative impact on perception both at the prelexical and lexical levels of processing.

Increasing perceptual difficulty with increasing levels of processing

The strength of these perceptual problems might be modulated by experience and proficiency in L2 (Flege et al., 1997). However, within the same level of proficiency, performance on L2 perceptual tasks can also vary depending on the level of processing being tested. That is, in order to decode the incoming acoustic signal into meaningful words, the listener has to perform accurately throughout several stages, from auditory processing through phonetic and phonological analysis to word recognition and lexical access (Pisoni & Luce, 1987). Although under normal listening conditions the accuracy of native speakers of a given language is generally at ceiling across tasks that tap into different levels of processing, there is evidence that early bilingual speakers who succeed in one task might not succeed in the others. This was first tested by Sebastián-Gallés & Baus (2005), who focused on highly proficient early Spanish-Catalan bilinguals. The participants performed three perceptual tasks involving the Catalan contrast /e/-/ɛ/, which does not exist in Spanish. The tasks used in this experiment were chosen so as to test the robustness of a range of phono-lexical representations: a categorization task with isolated and synthesized stimuli, a gating task, and a lexical decision task. The results showed that for Spanish-dominant participants the tasks were increasingly difficult. While many bilinguals (68%) reached native-like accuracy in phonological categorization, only a few of them (18%) reached this level in the lexical decision task. In a further study, Díaz et al. (2012) tested whether the same effects can be observed in late learners of an L2. Dutch L2 learners of English were tested on their processing of the English /æ/-/ɛ/ contrast with tasks that tap into different levels of processing: categorization, lexical decision and word identification tasks.
As in Sebastián-Gallés & Baus (2005), they found that many more participants succeeded in performing at a nativelike level in the phonetic task (categorization) than in tasks that involve lexical processing (lexical decision and word identification). This is likely due to the fact that tasks involving different levels of processing require different skills.
Another set of studies by Werker and her colleagues (Werker & Tees, 1984b; Werker & Logan, 1985) demonstrated similar results by using one task but variable interstimulus intervals (ISIs) to tap into different levels of processing. Werker & Logan (1985) implemented an AX discrimination task to test native English speakers on a difficult retroflex/dental contrast, which does not exist in English. They used three ISI conditions (250 ms, 500 ms and 1500 ms) and hypothesized that variable memory demands and cognitive load in the task would trigger different processing levels. Thus, an ISI of 250 ms would tap an auditory-acoustic level, one of 500 ms a phonetic level, and one of 1500 ms a phonological level. The results confirmed their predictions, as participants performed differently depending on the ISI condition. Importantly, in the shortest, 250 ms ISI condition, native English speakers could discriminate the difficult contrast, as well as within-category phonetic differences. With longer ISIs, however, their performance decreased, suggesting that a task becomes more difficult when it taps into a higher level of processing.
In accordance with these results, several other studies also showed that even the hardest non-native contrasts can be perceived at the low acoustic level. For instance, Dupoux et al. (1997) demonstrated that naïve French listeners can discriminate between Spanish stimuli that vary only in the position of stress in a low-level task, i.e., AX discrimination. However, as soon as the tasks tap into higher levels of processing, such as ABX discrimination (Dupoux et al., 1997), sequence recall, or lexical decision (Dupoux et al., 2008), French participants exhibit “stress deafness”, i.e., an inability to perceive stress contrasts.
For certain contrasts, the difficulty in performing accurately in more complex tasks involving lexical access can persist even after many years of practice. Pelzl et al. (2018) tested highly proficient learners of Mandarin Chinese on several tasks involving Mandarin tones. While L2 speakers were very accurate at tone identification in isolated syllables (experiment 1), they performed extremely poorly on lexical decision (experiment 2) compared to native speakers of Mandarin. Moreover, Pelzl et al. (2018) carried out a third experiment using EEG, where participants performed a sentence judgment task on sentences containing disyllabic real words or tonal nonwords, depending on the condition. Results showed that L2 learners did not show an N400 pseudoword effect when hearing sentences with nonwords, suggesting that they did not perceive the difference between words and nonwords.
However, there is evidence that for some contrasts accuracy across levels of processing can improve with proficiency, although the speed and size of the improvement will not necessarily be the same at different levels of processing. Darcy et al. (2013) carried out a set of experiments testing American English learners of Japanese on the Japanese contrast between singleton and geminate consonants, and American English learners of German on the German contrast between front and back rounded vowels. They tested intermediate and advanced learners as well as native speakers of each language in an ABX discrimination task and a lexical decision task. In both language settings, all participants performed with high accuracy in the ABX task, so that the performance of learners and native speakers did not differ. According to the authors, this is an indication that the phonological categorization of a hard contrast can be learned to a nativelike level. The pattern of results was, however, very different in the lexical decision task. Groups differed significantly, and only native speakers performed at a level consistent with their performance in the ABX task. All groups of learners made many more mistakes in this task than in the ABX task. Nevertheless, advanced learners had significantly better results than intermediate learners, pointing to the possibility of improving one’s perception even at the lexical level of processing.

Distortions in L2 speech production

Differences in L1 and L2 phonological inventories also impact L2 speech production and result in perceived foreign accent. Foreign accent refers to accent at the segmental level and to global accent at the sentence or utterance level (Riney & Flege, 1988). The difficulty in L2 production can arise in two, not mutually exclusive, ways. First, some sounds can be perceived inaccurately, which can lead to inaccurate production. Second, the pronunciation of certain sounds requires the use of articulators that are not used in the production of L1 sounds, thus resulting in motor difficulty. Concerning the first of these reasons, the perceptual problems described in section 1.1.2 are mirrored in inaccurate production. For instance, the “repair” strategy consisting of perceptually changing one sound into another results in confusions of English /ɹ/ and /l/ in the productions of Japanese speakers. Flege et al. (1995) tested two groups of Japanese learners of English on English words containing the sounds /ɹ/ and /l/ in a reading and a spontaneous speech task. The authors found that /ɹ/ and /l/ tokens produced by inexperienced Japanese learners were often misidentified by native English judges, whereas the productions of advanced learners were much more accurate and did not differ significantly from those of native speakers. This suggests that the perceptual problems encountered with this difficult contrast were transferred to production. With extensive experience, however, these problems can be overcome. Similarly, the illusion of epenthetic /ɯ/ in consonant clusters, observed in the perception of Japanese learners of English, has its counterpart in production. Masuda & Arai (2008) tested monolingual Japanese speakers and proficient Japanese speakers of English on the production of nonwords containing consonant clusters.
They found that both the less and the more proficient group of speakers inserted a vowel /ɯ/ in English consonant clusters, but the rate of insertions was much higher in monolingual Japanese speakers (in 80% of items) than in proficient Japanese speakers of English (in 12% of items), indicating the possibility of improvement with increasing proficiency. Finally, deletions in production have been reported by Janda & Auger (1992), who showed that in English conversation French speakers of English delete /h/ between 5% and 55% of the time, depending on the speaker. This mirrors patterns in /h/ perception, where French learners do not perceive the difference between the presence and absence of /h/ in English stimuli. Moreover, there is evidence that French learners sometimes use hypercorrection strategies and insert an /h/ in the wrong place (Janda & Auger, 1992; John & Cardoso, 2008). This suggests that the difficulty with this sound in production stems from imprecise perception, not from the articulatory complexity of /h/, as French learners of English are capable of producing this sound accurately.
Turning to the articulatory difficulty with L2 sounds, pronunciation is considered to be the only “physical” aspect of language that involves complex neuromuscular demands (Scovel, 1988). Therefore, speech production is more affected by physiological limitations than speech perception, morphology or syntax (Simmonds et al., 2011b). Moreover, learning to pronounce an L2 sound that does not exist in the native language requires retuning the neural circuits involved in the motor control of articulation, which is necessary to perform rapid unfamiliar sequences of movements (Simmonds et al., 2011a). One example of such difficulty in pronunciation is provided by click sounds, used in Bantu languages such as Xhosa. Lewis et al. (1994) tested adult English learners of Xhosa on their production of clicks, and found that learners encountered major difficulties in articulating click sounds and differed significantly in intelligibility judgments from the native speakers of Xhosa. Note that these problems in production are likely to be due to articulatory difficulty and not to perceptual problems. Although there are no studies on how English learners perceive clicks in Xhosa, Best et al. (1988) conducted a well-known study on the perception of clicks by English natives in another Bantu language, i.e., Zulu. They showed that English speakers can accurately discriminate between pairs of Zulu clicks, although these sounds do not exist in English or other Indo-European languages. This suggests that despite accurate perception, some L2 sounds cannot be produced accurately because of physiological constraints in articulation.
The level of accentedness can also depend on the amount of exposure to and experience with the foreign language. One experience-related factor is the length of residence (LOR). For instance, Flege et al. (1997) tested speakers with different native languages (German, Mandarin, Spanish, and Korean) on the production of L2 English vowels. L2 learners were divided into experienced and inexperienced groups, depending on their length of residence in the USA. Results showed that experienced participants were more accurate in their productions of L2 vowels than the less experienced ones. The results suggest that L2 pronunciation can improve through practice. Another important factor, often considered in L2 studies, is the age of learning (AOL). It refers to the age at which the learner was first exposed to the L2. The general assumption is that the earlier the AOL, the better the outcomes of learning are. For example, Flege (1993) tested Chinese participants on the production of vowel length before the word-final consonants /t/ and /d/ in English words. In such contexts, English natives produce longer vowels before /d/ than before /t/, and thus vowel length becomes the cue to differentiate between /t/ and /d/, which sound the same due to final devoicing. Results revealed that the Chinese participants who arrived in the USA in adulthood differed significantly in their productions from native English speakers and from Chinese participants who arrived in the USA before the age of 10. Thus, starting to learn a foreign language early can indeed lead to better accuracy in the production of the sounds of this language.
Finally, the degree of perceived foreign accent depends on a variety of other factors, such as gender, formal instruction, motivation, language learning aptitude, amount of native language (L1) use, and communicative pressure (Piske et al., 2001).
To sum up, differences between the phonologies of the L1 and L2 might lead to distortions when perceiving L2 sounds. These distortions can affect the perception of non-native sounds across levels of processing, and they are more difficult to overcome at higher levels of processing than at lower ones. Similarly, mismatches between L1 and L2 phonological inventories result in foreign accent when speaking the L2. Depending on the case, these difficulties in L2 production might stem from inaccurate perception and/or from articulatory constraints. Although much research has been conducted on L2 phonological processing, both in perception and production, in order to understand the mechanisms underlying the acquisition of L2 phonology, many questions remain unanswered. One of them is the relationship between perception and production within and across levels of processing. We will address this question in Chapter 2 of this thesis.

Table of contents:

1 Chapter 1: Introduction
1.1 L2 phonological processing and its relationship to L1
1.1.1 Models of L2 perception and production
1.1.2 Perceptual distortions in prelexical and lexical levels
1.1.3 Distortions in L2 speech production
1.2 Perceptual asymmetries
1.2.1 Directional asymmetries in vowels
1.2.2 Directional asymmetries in consonants
1.2.3 The link between prelexical and lexical asymmetries
1.3 Training
1.3.1 Classical HVPT procedure
1.3.2 Testing the robustness for “real-life processing”
1.3.3 New methods – ecologically realistic environments
1.4 Outline of the following chapters
2 Chapter 2: The relationship between perception and production: the processing of French /u/-/y/ by English natives
2.1 Introduction
2.2 On the relationship between perception and production of L2 sounds: Evidence from Anglophones’ processing of the French /u/-/y/ contrast
2.2.1 Introduction
2.2.2 Methods
2.2.3 Results and discussion
2.2.4 General discussion
2.2.5 Notes
2.2.6 Appendix
2.2.8 References
2.4 Conclusion
3 Chapter 3: Non-native sound perception across levels of processing: the perception of English /h/ by French natives
3.1 Introduction
3.2 Perceptual deletion and asymmetric lexical access in second language learners – Melnik & Peperkamp (2019)
3.3 The effect of phonetic training on L2 word recognition
3.3.1 Introduction
3.3.2 Methods
3.3.3 Results and discussion
3.3.4 General discussion
3.3.5 References
3.4 The relationship between prelexical and lexical asymmetries in L2 perception
3.4.1 Introduction
3.4.2 The phonetic properties of /h/ and predictions of existing models
3.4.3 Results from the training study
3.4.4 Discussion
3.4.5 Comparing results from lexical decision in the Melnik & Peperkamp (2019) study and the training study
3.5 Conclusion
4 Chapter 4: General discussion
4.1 The relationship between perception and production
4.1.1 Summary of empirical work
4.1.2 Further questions
4.2 Non-native sound perception across levels of processing
4.2.1 Summary of empirical work
4.2.2 Further questions
4.3 Insights from perception on the development of production
Appendix A: The role of domain-general cognitive capacities in L2 phonological learning
A.1 Introduction
A.2 Methods
A.3 Results
A.4 Discussion
Appendix B: Online Phonetic Training Improves L2 Word Recognition – Melnik & Peperkamp (to appear, 2019)
References