Publications

On this page, you can browse the MACAWS team's publications. To look up the abstract of a specific publication, click on the links provided below.

Asset-Oriented Approaches to Learner Corpus Data

The production of Portuguese mid-vowels by English-Spanish bilinguals

Construction Learning of Chinese: From Morphemes to Clauses and Beyond

Teaching with learner corpus data

Multilingual learner corpus for less commonly taught languages

Learner corpus as a medium for tasks

Syntactic and morphological complexity measures as markers of L2 development in Russian

The acquisition of preposition + article contractions in L3 Portuguese among different L1-speaking learners: A variationist approach

L3 Portuguese by Spanish-English bilinguals: Copula construction use and acquisition in corpus data

Asset-Oriented Approaches to Learner Corpus Data

Staples, S., Gorlova, A., Sommer-Farias, B., Vinokurova, V., Centanin-Bertho, M., & Novikov, A. (2025). Asset-oriented approaches to learner (corpus) data. L2 Journal, 17(1). https://escholarship.org/uc/item/5mt3w059

In this article, we discuss how learner corpus data can be used to promote asset-oriented approaches to language learning. We discuss how four tenets of asset-oriented approaches—challenges to the native speaker norm, accessibility/authenticity, advocacy, and agency—can be encouraged through using learner tasks. We introduce specific activities for Portuguese and Russian classrooms to promote this approach, which are freely available through our blog, and provide preliminary results from our teacher and student feedback on these activities.

The production of Portuguese mid-vowels by English-Spanish bilinguals

Centanin-Bertho, M. (2025). The production of Portuguese mid-vowels by English-Spanish bilinguals. [Doctoral dissertation, University of Arizona].

Portuguese is among the 15 languages other than English with the highest enrollment rates in higher education institutions across the US (MLA, 2023). The majority of students taking Portuguese courses also speak Spanish; thus, they are English-Spanish bilinguals learning Portuguese as a third language. However, their experience with Spanish is not homogeneous: some have learned Spanish as an L1 or heritage language, in a naturalistic environment, and others have learned Spanish as an L2, later in life, in an instructional setting. In this context, this study investigates the acquisition of L3 Portuguese, specifically its phonology, by English-Spanish bilinguals, comparing L1 Spanish and L2 Spanish speakers across three course levels (first, second, and third consecutive semesters of a Portuguese language program). More specifically, this study provides an acoustic analysis of learners’ production of Portuguese mid-close and mid-open vowels, two phonemic contrasts that do not exist in Spanish phonology but pertain to the acoustic space of English vowels.

Theoretical models in L3 acquisition have mainly investigated how learners’ linguistic repertoire affects their acquisition of a third (or additional) language, i.e., which language(s) are more likely to be the source of transfer and whether transfer will happen exclusively from one language or on a property-by-property basis (Schwartz and Sprouse, 2021). Studies of L3 Portuguese have mainly focused on morphosyntactic features (e.g., Giancaspro et al., 2015; Cabrelli Amaro et al., 2015) and show evidence of transfer from the most typologically similar language in learners’ linguistic repertoire, favoring the Typological Primacy Model (Rothman, 2011; Rothman, 2014). Therefore, in the context of English-Spanish bilinguals learning L3 Portuguese, it is expected that Spanish will be the main source of crosslinguistic transfer. However, this debate is not yet settled, especially when it concerns L3 phonological acquisition (Cabrelli & Pichan, 2019). The student population of this dissertation – English-Spanish speakers, with different experiences in Spanish acquisition, learning L3 Portuguese – is an ideal context to test whether the experience of learning Spanish affects the source of linguistic transfer to Portuguese.

The studies in this field are situated within formal linguistics and have followed an experimental approach, using grammaticality judgement tasks, perception tasks, discrimination tasks, and elicited production tasks. Meanwhile, spoken corpus studies, based on compilations of naturally occurring language, have the potential to contribute to the field because they reveal features of learners’ spontaneous or semi-spontaneous performance. For that reason, this study proposes an acoustic analysis of the oral production of English-Spanish-speaking learners of L3 Portuguese compiled in the Multilingual Academic Corpus of Assignments – Writing and Speech (Staples et al. 2019-). The corpus subset analyzed in this dissertation includes oral course assignments produced by 38 students enrolled in three course levels (first, second, and third semester): 21 who acquired Spanish as an L1/heritage language and 17 who learned Spanish as an L2 later in life in instructional settings. The goal is to determine whether there is any significant difference in learners’ production between L1 Spanish and L2 Spanish speakers and across course levels.

As the L3 theoretical models strongly suggest that English-Spanish bilinguals will draw from Spanish in the development of their Portuguese, I selected mid-open and mid-close vowels as the target segments for the analysis of learners’ production. A frequency and functional load analysis was also carried out in a corpus representative of spontaneous spoken language in Brazilian Portuguese, the C-ORAL-Brasil-I (Raso & Mello, 2012). Based on the analysis of frequency of occurrence of English segments in Gilner and Morales (2010), the results indicated that mid-close and mid-open vowels (/e-ɛ/ and /o-ɔ/) have the potential to cause perception and production issues in Spanish speakers’ Portuguese production because open vowels are much less frequent than their closed counterparts. This difference in frequency may make the less frequent items (the open vowels /ɛ, ɔ/) harder for learners to notice and to distinguish from their closed counterparts (/e, o/), especially because such phonemic contrasts do not exist in Spanish. The Pillai score (Hall-Lew, 2010), a value from 0 to 1 that indicates how much two datapoint clusters overlap, was calculated for each participant’s production of /e-ɛ/ and /o-ɔ/.
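As a rough illustration of the Pillai score mentioned above, the sketch below computes Pillai's trace for two clusters of (F1, F2) formant measurements using only the Python standard library. It is not the dissertation's implementation: the function name `pillai_score` and the formant values are invented for illustration, and it assumes exactly two vowel categories measured on two formants, with no covariates or normalization.

```python
from statistics import mean


def pillai_score(group_a, group_b):
    """Pillai's trace for two clusters of (F1, F2) points.

    0 means the cluster means coincide (complete overlap);
    values near 1 mean the clusters are clearly separated.
    """
    all_points = group_a + group_b
    grand = [mean(p[i] for p in all_points) for i in range(2)]

    # H: between-group sums of squares and cross-products (SSCP).
    # E: within-group SSCP.
    H = [[0.0, 0.0], [0.0, 0.0]]
    E = [[0.0, 0.0], [0.0, 0.0]]
    for group in (group_a, group_b):
        centre = [mean(p[i] for p in group) for i in range(2)]
        shift = [centre[i] - grand[i] for i in range(2)]
        for i in range(2):
            for j in range(2):
                H[i][j] += len(group) * shift[i] * shift[j]
                E[i][j] += sum(
                    (p[i] - centre[i]) * (p[j] - centre[j]) for p in group
                )

    # Total SSCP matrix T = H + E, inverted by hand (2 x 2 case).
    T = [[H[i][j] + E[i][j] for j in range(2)] for i in range(2)]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    T_inv = [
        [T[1][1] / det, -T[0][1] / det],
        [-T[1][0] / det, T[0][0] / det],
    ]

    # Pillai's trace = trace(H @ T_inv).
    return sum(H[i][k] * T_inv[k][i] for i in range(2) for k in range(2))


# Hypothetical (F1, F2) values in Hz, invented for illustration only.
close_vowel = [(400, 2000), (420, 2100), (380, 2050)]
open_vowel = [(600, 1200), (620, 1300), (580, 1250)]

separated = pillai_score(close_vowel, open_vowel)    # distinct categories
overlapping = pillai_score(close_vowel, close_vowel)  # identical clusters
```

With these invented values, the two distinct clusters score close to 1 and the identical clusters score 0, matching the interpretation given in the abstract that scores nearer 1 indicate more distinct vowels.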

Results show no significant distinction between mid-front vowels and mid-back vowels in learners’ production. The Mann-Whitney test was used to compare the Pillai scores of the L1 and L2 Spanish groups, and no significant difference between groups was observed. The Kruskal-Wallis test was used to determine whether there was any significant difference across course levels. The only comparison that showed a significant difference was in the production of /o-ɔ/ between the second and third semesters. Interestingly, the second semester had a higher mean Pillai score (.37) than the third semester (.14). Since the closer the score is to 1, the more distinct the two vowels are, this finding suggests that learning experience does not necessarily affect the production of this phonemic contrast. The results also indicate that experience learning Spanish was not a determining factor in learners’ production. Euclidean distance was also calculated as another indicator of the degree of merger between /e-ɛ/ and /o-ɔ/. The mean Euclidean distances by course level and linguistic profile showed results similar to those of the Pillai score analysis. There was no clear pattern in the open/closed mid-vowel distinction across course levels. Linguistic profile was likewise not a determining factor in participants’ production. These findings may suggest that the order of acquisition of Spanish does not affect participants’ production, supporting the Typological Primacy Model (Rothman, 2011). Nonetheless, this finding should be taken with caution, as unbalanced sample sizes and individual differences may have affected the results.
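For readers unfamiliar with the Euclidean distance measure mentioned in this abstract, here is a minimal sketch of one common way to compute it: the straight-line distance between the mean (F1, F2) positions of two vowel categories. The helper names `centroid` and `vowel_distance` and the formant values are hypothetical, and the sketch assumes raw formant values in Hz; the abstract does not say whether the dissertation normalized its measurements or used a different formulation.

```python
from math import dist
from statistics import mean


def centroid(tokens):
    """Mean (F1, F2) position of a list of vowel tokens."""
    return (mean(f1 for f1, _ in tokens), mean(f2 for _, f2 in tokens))


def vowel_distance(tokens_a, tokens_b):
    """Euclidean distance between two vowel categories' centroids.

    A smaller distance suggests a more merged pair of vowels.
    """
    return dist(centroid(tokens_a), centroid(tokens_b))


# Hypothetical (F1, F2) values in Hz, invented for illustration only.
close_o = [(420, 850), (440, 900), (400, 870)]
open_o = [(560, 1000), (600, 1050), (580, 980)]

distance_hz = vowel_distance(close_o, open_o)
```

A distance like this complements the Pillai score: it reports how far apart the category means are, but unlike Pillai's trace it does not take the spread of each cluster into account.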

Construction Learning of Chinese: From Morphemes to Clauses and Beyond

Chen, C. (2023). Construction learning of Chinese: From morphemes to clauses and beyond. [Doctoral dissertation, University of Arizona].

Construction Grammar has been a prominent framework in linguistic research for nearly three decades, attracting scholars from various disciplines, including language acquisition, corpus linguistics, and psycholinguistics. While considerable progress has been made in understanding the first language (L1) construction learning of various languages, research on second language (L2) acquisition of constructions has predominantly focused on English, particularly on verb argument structures. However, there exists a significant gap in the literature regarding the L2 acquisition of constructions in Mandarin Chinese.

This dissertation employs a constructionist approach to investigate L2 acquisition of Chinese constructions, including the aspect marker le, constructions containing gei, and the bei passive construction. The objective is to provide a comprehensive understanding of L2 acquisition of Chinese constructions, supported by empirical evidence.

The first part of the dissertation delves into the impact of input frequency and textbook exposure on the L2 acquisition of the aspect marker le. It uncovers how learners comprehend this functional morpheme and its meaning, shedding light on the influence of textbooks and the progression of learners’ proficiency. The second part examines the L2 acquisition of constructions containing gei. This investigation explores how learners acquire the meaning of these forms based on the positioning of gei, revealing the nuances and complexities of L2 construction learning. The third and final part of the dissertation delves into the bei passive construction, focusing on its adversative meaning and the role of pragmatics in construction learning. This study explores how learners associate the bei passive construction with adversity.

In summary, this dissertation significantly contributes to our understanding of L2 construction learning in Chinese. The findings are not only pertinent to linguistic research but also hold pedagogical significance, offering guidance to educators teaching Chinese as a second language.

Teaching with learner corpus data

Sommer-Farias, B., Vinokurova, V., Gorlova, A., & Centanin-Bertho, M. (2023). Teaching with learner corpus data. FLTMAG. https://fltmag.com/teaching-learner-corpus-data/

This article discusses the use of learner corpora in language teaching. It outlines the argument for using learner corpora in the classroom, discusses key terms in corpus-based teaching and strategies for using learner corpora, and presents a sample lesson from a Russian language classroom based on data from the MACAWS corpus.

Multilingual learner corpus for less commonly taught languages

Sommer-Farias, B., Novikov, A., Picoral, A., Bertho, M., & Staples, S. (2022). Multilingual learner corpus for less commonly taught languages. International Journal of Learner Corpus Research, 8(2), 261-282.

Available at https://www.jbe-platform.com/content/journals/10.1075/ijlcr.21001.som

This article provides a detailed account of the framework and research and pedagogical applications of the [our learner corpus]. [Our learner corpus] is a monitor learner corpus of written and oral assignments on various topics from Foreign Language (FL) learners. Currently the corpus contains 124,054 words in Russian and 536,168 in Portuguese but it is updated each semester as new texts are added to the corpus. The online interface allows teachers, students and researchers to search for words and phrases and access metadata on students, courses and assignments. Our novel interactive Data-driven Learning (iDDL) tool allows embedding of concordance lines into websites and Learning Management Systems (LMS), facilitating student interaction with concordance lines. We also have an offline version of the corpus that is available upon request.

Keywords: multilingual, Less Commonly Taught Languages (LCTL), interactive Data-driven Learning (iDDL)

Learner corpus as a medium for tasks

Novikov, A., & Vinokurova, V. (2022). Learner corpus as a medium for tasks. In W. Martelle & S. V. Nuss (Eds.), Teaching Russian through task: Task-based/supported instruction of Russian as a foreign language. Routledge.

This chapter argues for the use of a learner corpus in task-based teaching of Russian. First, the chapter provides the definitions of tasks and discusses texts in task-based teaching. Second, we elaborate on focused tasks in light of the principles of language awareness and introduce Data-Driven Learning (DDL). And finally, we describe the two types of focused tasks, namely structure-trapping tasks and DDL tasks, and explain how a learner corpus can be seen as a medium between the more traditional structure-trapping tasks and innovative DDL tasks.

Syntactic and morphological complexity measures as markers of L2 development in Russian

Novikov, A. (2021). Syntactic and morphological complexity measures as markers of L2 development in Russian [Doctoral dissertation, University of Arizona].

Within second language acquisition research, L2 development has been traditionally analyzed through the dimensions of Complexity, Accuracy and Fluency (CAF) (Larsen-Freeman, 2009; Ortega, 2003; Skehan, 2009). Complexity within the CAF framework has gained the most attention and has often been examined through measures associated with clausal length (Bulté & Housen, 2012). In contrast, the present study examines complexity through the register-functional framework (Biber, Gray & Poonpon, 2011; Biber, Gray & Staples, 2016). The fundamental principle of the register-functional framework is that complexity is situation-dependent, meaning that complexity depends on the situational characteristics of texts such as communicative purposes of texts and their production circumstances. Thus, instead of relying on particular measures of complexity such as complexity indices or T-unit measures, the investigation of complexity within the register-functional framework begins with a linguistic description of texts. The present study builds on the foundation of the register-functional framework and adds to the body of L2 development research by being the first of its kind in L2 Russian. Previous studies that investigated L2 complexity in Russian are rather few (e.g., Henry, 1996; Kisselev & Alsufieva, 2017) but are also limited in that they 1) only study writing; 2) use either omnibus complexity measures or a rather limited set of measures; 3) investigate high levels of proficiency. The overall goal of this study is to provide a comprehensive description of syntactic and morphological L2 development at lower levels (e.g., beginner to intermediate) across speech and writing in Russian. To address these research gaps, the present study examines L2 development through morphological and syntactic complexity measures in L2 Russian. The study uses a corpus of written and spoken texts produced by learners across four program levels (i.e. the first two years of Russian). 
First, the study examines individual measures of morphological and syntactic complexity and interprets the findings in light of the curriculum progression and assignment effects. Second, the study performs a Multidimensional Analysis (MD) in order to group the individual measures of complexity into dimensions of complexity that are interpreted functionally. The results of the study show that both syntactic and morphological complexity measures behave differently across program levels. For example, while adverbial if- and when-clauses increase with program level, adverbial because-clauses decline. Similarly, while post-modifying nouns increase, attributive adjectives decrease. In terms of morphological complexity, the measures that have clear increasing trends across program levels are genitive nouns and adjectives, instrumental nouns and adjectives, dative nouns, and past perfective and imperfective verbs. The Multidimensional Analysis (MD) yielded two dimensions of complexity: 1) Narrative vs. Non-narrative/Descriptive, and 2) Informational vs. Personal. The narrative side of Dimension 1 includes perfective and imperfective past verbs, while the non-narrative side includes 3rd person plural verbs and attributive adjectives. The informational side of Dimension 2 is represented by complexity measures such as prepositional adjectives, genitive singular nouns, genitive adjectives and attributive adjectives. In contrast, the personal side is characterized by such measures as 1st person present tense verbs, accusative nouns and non-finite complement clauses. These dimensions of complexity showed significant differences between program levels. Significant interactions between program level and mode were also found, pointing to differences between speech and writing with regard to these dimensions across program levels.
Although the complexity measures included in these dimensions are very specific to Russian, these two dimensions have been consistently identified in other MD studies.

To view this dissertation, click here.

The acquisition of preposition + article contractions in L3 Portuguese among different L1-speaking learners: A variationist approach

Picoral, A., & Carvalho, A. (2020). The acquisition of preposition+article contractions in L3 Portuguese among different L1-speaking learners: A variationist approach. Languages, 5(4), 45-62.

This paper sheds light on the paths of third language (L3) acquisition of Portuguese by Spanish–English speakers whose first language is Spanish (L1 Spanish), English (L1 English), or both in the case of heritage speakers of Spanish (HL). Specifically, it looks at the gradual acquisition of a categorical rule in Portuguese, where some prepositions are invariably contracted with the determiner that follows them. Based on a subcorpus of MACAWS, comprising 1910 written assignments by Portuguese L3 learners, we extracted 21,879 tokens in obligatory contraction contexts and submitted them to a multivariate analysis. This analysis allowed for the investigation of the impact of linguistic (type of preposition and definite article number and gender) and extra-linguistic factors (course level and learner’s language background), with logistic regression modeling with sum contrasts and individual as a random effect. While results point to some clear similarities across the three language groups—all learners acquired the contractions in a u-shaped progression and used more contractions with the a preposition and fewer with the por preposition—participants acquire contractions at a higher rate when the article is singular than when it is plural, and in the case of HL speakers, more so when the article is masculine than when it is feminine. These results confirm the facilitatory role of a previously acquired language (i.e., Spanish) that is typologically similar to the target language (i.e., Portuguese) in transfer patterns during L3 acquisition.

You can access the full article at the Languages webpage. We welcome your comments and questions!

L3 Portuguese by Spanish-English bilinguals: Copula construction use and acquisition in corpus data

Picoral, A. (2020). L3 Portuguese by Spanish-English bilinguals: Copula construction use and acquisition in corpus data. [Doctoral dissertation, University of Arizona].

Previous research on third language (L3) acquisition has shown that the source language for transfer to the L3 can be either an L1, an L2, or both (Bardel & Falk, 2007; Flynn et al., 2004; Rothman, 2014). It has been hypothesized that either typological similarities between languages previously acquired and the target language (Rothman, 2010), or language status (L1 vs. L2) of previously acquired languages (Bardel & Falk, 2007) determine cross-linguistic influence. This dissertation investigates the acquisition of copula structures in L3 Portuguese by three groups of adult Spanish-English bilinguals: L1 English L2 Spanish, L1 Spanish L2 English, and L1 Spanish/English (i.e., heritage speakers of Spanish for the purposes of this dissertation). Language use by both native speakers (L1 Spanish, L1 English, and L1 Portuguese) and learners (L3 Portuguese) is analyzed using word embeddings and logistic regression modeling. The goal of these methods is to reveal patterns of copula use and acquisition. Copula constructions were chosen because they allow for the combined investigation of form, syntactic frame, and concept/meaning, as proposed by third language acquisition scholars. The main goal of this dissertation is both to shed light on transfer patterns from previously acquired languages (i.e., Spanish and English) on L3 Portuguese and to establish L3 Portuguese developmental patterns across bilingual groups. Results show evidence of L3 Portuguese development for all three groups of Spanish-English bilinguals. However, transfer patterns from Spanish and English onto L3 Portuguese are not the same across all groups, varying in degree depending on the copula construction. These results conflict with the Typological Primacy Model, which predicts that L3 acquisition in adulthood starts off from a wholesale transfer of the pre-acquired language system that is most typologically similar to the target language (Rothman, 2014).
This dissertation offers support instead to L3 acquisition models that take into consideration structural characteristics of individual constructions, and how similar or different these are between source and target languages, including models such as the Parasitic Model (Hall et al., 2009).
