Computers vs. Humans: Which is More Effective for Learning Pronunciation?

Yamada Laboratory, Kyushu University

April 11, 2024

In this session of the English literature seminar, I read the following paper.

Title: Computer-aided feedback on the pronunciation of Mandarin Chinese tones: using Praat to promote multimedia foreign language learning
Author: Mengtian Chen
Journal: Computer Assisted Language Learning, 2022, p. 1-26
https://www.tandfonline.com/doi/full/10.1080/09588221.2022.2037652

This paper examined whether visual and auditory feedback on pronunciation can improve the recognition and production of Mandarin Chinese tones as a foreign language.

Learners from non-tonal language backgrounds find it challenging to acquire the tones of Mandarin Chinese, and teachers often struggle to give precise feedback on tonal errors. Computers can help by presenting learners with audio and visual data (e.g., graphs). It is believed that such digital feedback makes it easier for learners to notice their own pronunciation errors.

This study explores the potential advantages of digital feedback over traditional human instruction in teaching Standard Mandarin tones. Specifically, the following two research questions (RQs) were set:

1. Can digital feedback, compared to human feedback, improve the recognition and production of tones?
2. What are learners’ impressions regarding the use of computers in learning the tones of Standard Mandarin?

For digital feedback, the free software Praat was used. Praat can display the waveform of a learner’s pronunciation and correct its tones. For tone correction, the tones are classified into five categories, and corrected versions of the learners’ own voices are provided as feedback.

To investigate RQ1, the study established two groups: an “experimental group,” which received multimedia feedback, and a “control group,” which received traditional human feedback. Listening and speaking tests were conducted before and after the experiment. To address RQ2, a perception survey was conducted after the experiment on the “difficulties learners feel towards tones” and their perceptions of “feedback by Praat.”

The participants were 44 university students from the United States, all of whom were enrolled in a beginner-level Chinese course. The Mann-Whitney U test confirmed that there was no significant difference between the two groups in pre-test scores.

As a measure of the effectiveness of tone learning, this study focused on word-level tones. The learning material was taken from a Pinyin textbook, and learners studied it over a period of four weeks. The test consisted of two sections: listening and speaking. After the post-test, in addition to these achievement measures, participants were surveyed about their difficulties with tones, their perceptions of the software feedback, and their prospects for future tone learning.

The experimental procedure followed the sequence of “pre-test → 4 weeks of instruction → post-test and post-survey.” In the control group, teachers listened to the students’ pronunciation and explained how to correct it using hand gestures (visual feedback). Students imitated the teacher’s pronunciation while observing the teacher’s hand gestures. In the experimental group, the teacher used Praat to show the waveform of the student’s pronunciation alongside that of the model voice and pointed out where corrections should be made. Students practiced their pronunciation while checking the waveform in Praat. The experiment was thus designed so that both groups received auditory and visual feedback on pronunciation.

For the analysis of the test results, the speaking tests were scored by the author and a Chinese language teacher, and an inter-rater reliability coefficient was calculated. In verifying RQ1, the Shapiro-Wilk test rejected the hypothesis that the test scores were normally distributed, so nonparametric tests were applied. The Wilcoxon signed-rank test was used to verify whether there was a significant improvement in the pre- and post-test scores within each group, and the Mann-Whitney U test was used to verify whether there was a significant difference between the groups in the change from pre- to post-test scores.
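To make the two nonparametric tests mentioned above more concrete, here is a minimal Python sketch of how they could be computed. It is only an illustration under my own assumptions, not the paper's actual analysis: the function names (midranks, wilcoxon_signed_rank, mann_whitney_u) and all scores are hypothetical, and the p-values use a plain normal approximation without tie or continuity corrections, which a statistics package would normally handle more carefully.

```python
from math import erfc, sqrt


def midranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(pre, post):
    """Within-group test: return (W+, two-sided p) for paired pre/post scores.

    Zero differences are dropped. Assumes at least one non-zero difference.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, erfc(abs(z) / sqrt(2))


def mann_whitney_u(x, y):
    """Between-group test: return (U for x, two-sided p) for two independent samples."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    return u1, erfc(abs(z) / sqrt(2))


if __name__ == "__main__":
    # Hypothetical scores for illustration only (not data from the paper).
    exp_pre, exp_post = [10, 12, 9, 11, 8], [16, 17, 15, 18, 13]
    ctl_pre, ctl_post = [11, 10, 9, 12, 8], [13, 12, 10, 15, 9]

    # Did each group improve from pre-test to post-test?
    print("experimental:", wilcoxon_signed_rank(exp_pre, exp_post))
    print("control:", wilcoxon_signed_rank(ctl_pre, ctl_post))

    # Did the groups differ in their pre-to-post gains?
    exp_gain = [b - a for a, b in zip(exp_pre, exp_post)]
    ctl_gain = [b - a for a, b in zip(ctl_pre, ctl_post)]
    print("gain comparison:", mann_whitney_u(exp_gain, ctl_gain))
```

With samples as small as the hypothetical ones here, the normal approximation is rough; exact p-values would be more appropriate in a real analysis.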

The experiment’s findings are as follows. Regarding RQ1, whether digital feedback enhances tone recognition and production more effectively than human feedback, the Wilcoxon test indicated significant improvements in both listening and speaking scores in both groups. Additionally, the Mann-Whitney U test showed no significant difference in the pre-test scores, but in the post-test, the experimental group significantly outperformed the control group. Therefore, the results indicate that pronunciation feedback through Praat is more effective for Mandarin Chinese tone training than feedback provided by humans.

Next, regarding RQ2, “What are learners’ impressions regarding the use of computers in learning the tones of Standard Mandarin?”, the analysis of the post-survey results yielded the following insights:
Firstly, regarding the difficulties of learning tones, learners gave responses such as “it is difficult to pronounce the tones correctly even after recognizing them” and “it is hard to discern tones within phrases and sentences.”
Secondly, concerning “impressions of feedback provided by Praat,” the positive aspects mentioned included “simultaneous presentation of auditory and visual feedback,” “ability to identify pronunciation errors,” “opportunity to compare one’s own pronunciation waveform with that of the model pronunciation,” and “ability to practice pronunciation without feeling pressured.” Conversely, the points raised for improvement included “complexity in operating the software” and “the necessity for human feedback as well.”

Based on the results, the paper offers the following discussion:
Firstly, the significant improvement in post-test scores for both the experimental and control groups suggests that presenting both auditory and visual feedback can enhance the learning of tones. The fact that the post-test scores of the experimental group improved significantly more than those of the control group indicates that computer-assisted instruction aids learning more than human instruction does. The post-survey results suggest a reason: Praat presented the waveform of the learner’s pronunciation alongside the model pronunciation, which made it easier for learners to recognize their pronunciation errors and practice improvements independently. Additionally, having one’s own voice as the model makes it clearer what corrections need to be made.
Furthermore, the experiment showed that pronunciation practice improves not only the production of tones but also their recognition. This highlights the need for further research on the relationship between production and recognition in language learning.

Here are my thoughts on the paper:
For my master’s thesis, I am developing a system to improve learners’ pronunciation of English words. I chose this paper because I wanted to understand the advantages of software-based pronunciation feedback compared to feedback provided by humans. After reading it, I felt that it serves as valuable prior research supporting the effectiveness of pronunciation feedback provided by software.
I think this paper shows that visualizing pronunciation as waveforms can lead to improved pronunciation skills. Since my system also presents feedback using waveform data, I want to cite this paper as prior research supporting the effectiveness of such feedback. However, in the experiment in this paper, the system-generated feedback was not provided alone; a teacher’s explanation was also included. The improvements in pronunciation may therefore derive partly from the teachers’ instructions, complemented by the waveform visualizations. For this reason, I had some doubts about the paper’s claim that the improvement in pronunciation skills is solely due to the effectiveness of digital feedback.

By: Saki Hirata (M2 student)
