Targeted Pronunciation Instruction in Multilingual Classrooms: A Mixed-Methods Study 2025

Pronunciation lives where clarity meets confidence: when learners are understood, they speak more, risk more, and grow faster. Yet in many programs it still plays second fiddle to grammar and vocabulary. That gap widens in multilingual classrooms, where different first-language (L1) sound systems shape how learners hear and produce English vowels, stress, and intonation, often in ways that a one-size-fits-all syllabus cannot reach. A learner might confuse /ɪ/ and /iː/ in minimal pairs like “ship” and “sheep,” or carry over rhythm from a syllable‑timed L1 or pitch patterns from a tonal L1, which muddles English stress and intonation contours in connected speech. Without systematic, targeted practice, those patterns can fossilize: intelligibility drops, anxiety spikes, and participation shrinks, a loop that reinforces itself unless instruction intervenes.

Despite a decade of renewed attention to intelligibility-focused teaching and technology‑supported feedback, much of the evidence still comes from relatively homogeneous cohorts, leaving open questions about what works best when error profiles vary widely across L1s and proficiency levels. Teachers also confront practical choices: which features pay off fastest for communication, how to tailor activities under time pressure, and how to blend perception, production, and prosody work without losing classroom momentum. This study tackles those questions by testing a targeted pronunciation program that foregrounds high‑impact features, uses diagnostics to personalize practice, and closes the loop with interactive tasks that demand real‑world clarity.

Gaps in the literature

Multilingual cohorts remain underrepresented in empirical reports, limiting guidance on how to orchestrate instruction when learners bring very different phonological backgrounds to the same room.
Long‑term retention and transfer to spontaneous speech are less documented than short‑term gains on controlled tasks, especially beyond the end of a single term.
Affective outcomes—confidence, anxiety, willingness to communicate—are acknowledged in principle but too rarely measured alongside acoustic or perceptual accuracy.

Research objectives

Identify prevalent segmental and suprasegmental challenges in multilingual English classrooms, with attention to vowel contrasts and stress–intonation patterns.
Evaluate a targeted instruction module for gains in vowel contrast accuracy and stress control relative to standard practice.
Examine effects on confidence and willingness to communicate via interviews and classroom observation.
Propose a classroom-ready framework for diagnostics, sequencing, and feedback that scales in diverse cohorts.

Methodology

Research design

A mixed‑methods approach combined quasi‑experimental pre/post testing with qualitative observations and semi‑structured interviews, capturing both measurable progress and lived classroom experience.

Participants and sampling

Two hundred learners from secondary schools and language institutes participated, purposively sampled to represent varied L1s and ages at intermediate proficiency; participants were split into a targeted‑instruction condition (n=100) and a standard‑curriculum control (n=100).

Instruments

Diagnostics and outcomes: reading‑aloud tasks, minimal‑pair perception and production tasks for targeted vowels, and controlled and semi‑controlled measures of word stress, sentence stress, and basic intonation contours.
Instructional module: tense–lax vowels (e.g., /iː/–/ɪ/), mid‑vowel contrasts as relevant, lexical stress rules (including stress shifts), and core intonation patterns (statements, yes/no and wh‑questions, lists, contrastive focus), practiced through perception–production cycles, shadowing, rhythmic tapping, and guided dialogues.
Interviews and observations: prompts on perceived gains, strategy use, confidence, and transfer; rubrics tracking engagement, feedback density, and breakdown/repair episodes.
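
As an illustration of how a minimal‑pair perception diagnostic could be scored, the sketch below computes percent correct per vowel contrast. The trial format, the contrast labels, and the sample responses are assumptions made for this example; the study does not describe its scoring procedure at this level of detail.

```python
from collections import defaultdict


def accuracy_by_contrast(trials):
    """Return percent correct per contrast.

    Each trial is a (contrast, target_word, learner_response) tuple.
    This layout is a hypothetical format, not the study's instrument.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for contrast, target, response in trials:
        total[contrast] += 1
        if response == target:
            correct[contrast] += 1
    return {c: 100 * correct[c] / total[c] for c in total}


# Hypothetical responses from one learner (not data from the study).
trials = [
    ("i-vs-I", "ship", "ship"),
    ("i-vs-I", "sheep", "ship"),   # heard "ship" for "sheep"
    ("i-vs-I", "sheep", "sheep"),
    ("i-vs-I", "ship", "ship"),
    ("u-vs-U", "full", "fool"),    # heard "fool" for "full"
    ("u-vs-U", "fool", "fool"),
]

scores = accuracy_by_contrast(trials)
for contrast, pct in scores.items():
    print(f"{contrast}: {pct:.0f}% correct")
```

Per‑contrast scores of this kind are one plausible input for deciding which contrasts a subgroup practices first.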

Procedures

Across 12 weeks, the targeted group met for two hours weekly, moving from perception to controlled production to communicative tasks; subgroup diagnostics ensured time went to the highest‑payoff contrasts and prosodic patterns. The control group followed standard instruction with minimal pronunciation focus; both groups completed pre‑tests in week 1 and post‑tests in week 12, with rolling observations and endline interviews supplementing scores.

Data analysis

Paired t‑tests assessed within‑group change; independent t‑tests compared post‑test outcomes; Cohen’s d gauged practical magnitude; thematic analysis of interviews and field notes captured error awareness, strategy uptake, confidence, participation, and observable transfer.
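
For readers who want to see what these statistics involve, here is a minimal sketch using only Python's standard library. It assumes the pooled‑standard‑deviation form of Cohen's d and an equal‑variance independent t statistic; the study does not say which variants it used, and the scores below are invented toy values, not study data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """t statistic for a within-group (paired) pre/post comparison."""
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def pooled_variance(group_a, group_b):
    """Pooled sample variance of two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    return ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)


def independent_t(group_a, group_b):
    """Student's t statistic for two independent groups (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sqrt(pooled_variance(group_a, group_b) * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance(group_a, group_b))


# Hypothetical toy scores for five learners per group (NOT the study's data).
targeted_pre = [60, 62, 65, 58, 63]
targeted_post = [76, 79, 80, 74, 78]
control_post = [62, 64, 66, 61, 65]

print("paired t (targeted, pre vs post):", round(paired_t(targeted_pre, targeted_post), 2))
print("independent t (post-test):", round(independent_t(targeted_post, control_post), 2))
print("Cohen's d (post-test):", round(cohens_d(targeted_post, control_post), 2))
```

Turning a t statistic into a p‑value also needs the t distribution with the appropriate degrees of freedom, which a statistics package would normally supply.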

Results

Quantitative outcomes

Learners in the targeted condition showed substantial improvement: vowel contrast accuracy rose from 62.5 (SD=6.8) to 78.3 (SD=7.2) and stress accuracy from 60.8 (SD=6.5) to 76.4 (SD=7.0); the control group’s gains were modest by comparison (vowels 61.9→64.2; stress 60.5→63.1). Within‑group gains were statistically significant (p<.05), between‑group differences favored targeted instruction (p<.05), and effects were large by conventional benchmarks for both measures, though larger for vowels (d=1.48) than for stress (d=0.92).
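
The arithmetic behind these comparisons can be checked directly from the reported means. The sketch below computes each group's raw gain and the difference between them; the dictionary layout and names are illustrative, and the scores are treated simply as points on the study's accuracy scale, since the unit is not stated.

```python
# Reported pre/post means from the results above, as (pre, post) pairs.
reported_means = {
    "vowels": {"targeted": (62.5, 78.3), "control": (61.9, 64.2)},
    "stress": {"targeted": (60.8, 76.4), "control": (60.5, 63.1)},
}


def gain(pre, post):
    """Raw pre-to-post gain, rounded to one decimal place."""
    return round(post - pre, 1)


summary = {}
for measure, groups in reported_means.items():
    targeted_gain = gain(*groups["targeted"])
    control_gain = gain(*groups["control"])
    summary[measure] = {
        "targeted_gain": targeted_gain,
        "control_gain": control_gain,
        # How much more the targeted group gained than the control group.
        "extra_gain": round(targeted_gain - control_gain, 1),
    }

for measure, row in summary.items():
    print(measure, row)
```

On these figures the targeted group gained 15.8 points on vowels and 15.6 on stress, against 2.3 and 2.6 for the control group, a difference of 13.5 and 13.0 points respectively.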

Qualitative insights

Four themes recurred across interviews and observations: learners noticed their own errors earlier and more precisely; they adopted durable strategies (minimal pairs, shadowing with rhythm and pitch cues, stress marking); they reported lower anxiety and more spontaneous talk; and they described fewer misunderstandings and smoother turn‑taking beyond class.

Case vignettes

A university learner finally separated /ɪ/ and /iː/ (“ship” versus “sheep”) after short daily perception drills and slow‑to‑fast production cycles, and classmates asked the learner to repeat far less often.
A working professional used stress rehearsal and script annotation to sharpen emphasis and flow in presentations, with colleagues noting clearer key points and a steadier pace.
A lifelong learner, previously quiet in group tasks, started volunteering turns after guided dialogues normalized feedback and made progress visible week by week.

Discussion

Interpreting the gains

Concentrating instruction on a small set of high‑impact segmental and suprasegmental targets—and cycling from perception to production to communicative use—appears to recalibrate categories and patterns efficiently. The larger vowel effects fit decades of work on the value of contrast‑rich, feedback‑dense practice; prosodic changes, while robust, often trail as learners coordinate stress and intonation with syntax, discourse intent, and new lexical items. Crucially, the affective boost matters: as errors drop and self‑monitoring improves, learners speak more, and that extra speaking accelerates improvement in a reinforcing loop.

Why ad hoc coverage falls short

Incidental pronunciation work rarely supplies enough contrast, repetition, or timely feedback to reshape entrenched habits, and it seldom aligns with the feature priorities that yield the greatest intelligibility gains for a given cohort. Diagnostics make the difference: knowing which contrasts and prosodic cues will pay off fastest lets teachers spend scarce minutes where they count.
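
One way to picture diagnostic‑driven prioritization is as a simple ranking: weight each feature's error rate by how much it is assumed to affect intelligibility, then teach the top items first. The sketch below is only an illustration of that idea. The feature names, error rates, and weights are hypothetical, and the study does not report using this formula.

```python
def prioritize(error_rates, impact_weights, top_n=3):
    """Rank features by error rate multiplied by an assumed impact weight.

    Features without a weight default to 1.0. Both inputs are
    hypothetical structures invented for this sketch.
    """
    scored = {
        feature: rate * impact_weights.get(feature, 1.0)
        for feature, rate in error_rates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_n]


# Hypothetical diagnostic results for one L1 subgroup
# (share of items produced or perceived incorrectly).
subgroup_errors = {
    "tense-lax vowels": 0.40,
    "lexical stress": 0.35,
    "yes/no question intonation": 0.20,
    "list intonation": 0.10,
}

# Hypothetical weights for how strongly each feature affects intelligibility.
weights = {
    "tense-lax vowels": 1.0,
    "lexical stress": 1.5,
    "yes/no question intonation": 1.0,
    "list intonation": 0.5,
}

focus = prioritize(subgroup_errors, weights, top_n=2)
print("Teach first:", focus)
```

In this toy example lexical stress outranks the vowel contrast despite a lower error rate, because its assumed weight is higher. In practice the weights would be a teacher's judgment call informed by the cohort.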

Multilingual classrooms as an asset

Diversity complicates planning but enriches modeling: different L1s surface different patterns, and hearing peers’ solutions can spark noticing and self‑correction that a homogeneous group may miss. Subgroup targeting within shared cycles kept instruction coherent while still personalizing the heavy hitters for each cluster.

Implications for practice and policy

Classroom integration

Build short, recurring cycles: diagnose, prioritize, practice (perception → controlled production → communicative use), reassess, repeat.
Focus on high‑yield targets first: tense–lax and mid‑vowel contrasts; lexical stress; core intonation contours for common discourse moves.
Make feedback visible and actionable: quick modeling, contrastive examples, stress marks, and simple arrows for pitch movement.

Teacher development

Train for L1‑informed diagnostics and cross‑language error patterns that most affect intelligibility.
Build prosody feedback skills: rhythm training, focus placement, and guided shadowing that moves from phrases to connected speech.
Share ready‑to‑use activity banks indexed by feature, level, and communicative task type.

Curriculum and assessment

Tie pronunciation targets to real tasks—presentations, interviews, debates—so practice meets performance.
Include intelligibility and prosodic control in rubrics, not just segmental accuracy, and space practice across the term.

Technology at scale

Use speech recognition and visualization for timely feedback on contrasts, stress, and pitch; assign low‑stress CAPT work beyond class to save classroom time for interaction.
Personalize practice sets adaptively and track longitudinal progress to sustain motivation and guide next steps.

Equity and access

Equip underserved programs with mobile‑ready tools and succinct instructor guides; where helpful, provide L1‑specific contrast sheets while keeping production practice in English.

Limitations

A 12‑week arc cannot answer questions about long‑term retention or transfer to spontaneous speech months later; the focus on intermediate learners also means beginners and advanced learners deserve targeted testing with adjusted goals and dosage. Classroom dynamics—feedback styles, peer norms—introduce variance even with training, and measures emphasized high‑impact features rather than the full phonological landscape or discourse‑level prosody.

Future research

Follow learners 6–12 months to track maintenance and transfer into presentations, interviews, and everyday talk.
Calibrate modules and dose by proficiency band, from beginner category formation to advanced prosodic finesse.
Standardize feature‑priority frameworks and progressions by L1 profile and communicative domain.
Test adaptive, AI‑supported practice that blends perception training, production feedback, and prosody visualization at scale.
Ensure culturally responsive delivery across diverse classrooms, centering learner identities and goals.

Conclusion

Targeted pronunciation instruction—diagnostic, focused, and practiced from perception to real use—delivers measurable gains in vowel contrasts and stress accuracy while lifting confidence and willingness to speak in multilingual classrooms. The core idea is simple: do the small set of things that matter most, do them visibly and often, and connect them to speaking that counts, so progress can be heard, felt, and sustained. With short, frequent cycles, teacher training that demystifies prosody, and technology that supplies timely feedback, pronunciation can move from an afterthought to a pillar of equitable, communicative instruction.

𝕃𝕀𝕆ℕ𝕁𝔼𝕂
