Yamada Laboratory, Kyushu University

About the paper on the impact of speech recognition technology on “vocabulary” and “pronunciation skill”


I am Saki Hirata, a second-year master’s student.
I read the following paper.

Title: “Look, I can speak correctly”: learning vocabulary and pronunciation through websites equipped with automatic speech recognition technology
Authors: Muzakki Bashori, Roeland van Hout, Helmer Strik and Catia Cucchiarini
Year of publication: 2022
Journal: Computer Assisted Language Learning (DOI: 10.1080/09588221.2022.2080230)

This paper examines English learning websites that use automatic speech recognition (ASR) technology and evaluates how effective they are for learning word meanings and pronunciation.

Insufficient vocabulary and pronunciation skills have been reported to interfere with speaking. To address this problem, the study evaluated the effectiveness of two speech recognition web applications, ILI and NOVO.
ILI gives one of two feedback messages on pronunciation: “Excellent” (for correct answers) or “Try again” (for incorrect answers). NOVO, in contrast, points out “phoneme errors”, displays “the phoneme that is incorrectly pronounced”, and explains how to pronounce that phoneme, so its pronunciation feedback is more detailed than ILI’s.
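
The difference between the two feedback styles can be sketched in a few lines of Python. This is only an illustration of the description above, not how ILI or NOVO are actually implemented. The function names, the representation of a word as a list of phoneme symbols, and the assumption that the recognized phonemes are already aligned one-to-one with the target are all hypothetical.

```python
def binary_feedback(target, heard):
    """ILI-style feedback as summarized above: a single overall verdict."""
    return "Excellent" if target == heard else "Try again"


def phoneme_feedback(target, heard):
    """NOVO-style feedback as summarized above: report each mismatched phoneme.

    Assumes (for illustration) that both sequences have the same length and
    are already aligned position by position.
    """
    errors = []
    for position, (expected, actual) in enumerate(zip(target, heard)):
        if expected != actual:
            errors.append((position, expected, actual))
    return errors


# Hypothetical example: the word "think" with the first phoneme mispronounced.
target = ["θ", "ɪ", "ŋ", "k"]
heard = ["s", "ɪ", "ŋ", "k"]

print(binary_feedback(target, heard))
print(phoneme_feedback(target, heard))
```

With the binary style the learner only knows that something was wrong, whereas the phoneme-level style identifies which sound to work on.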

A three-condition experiment was conducted with 146 participants, divided into an ILI group, a NOVO group, and a control group, and vocabulary and pronunciation skills were measured before and after the intervention. The study distinguished two aspects of vocabulary, recognition and production, and collected data on recognition vocabulary, production vocabulary, and pronunciation ability. Recognition vocabulary is measured by having learners answer the meaning of a given English word, whereas production vocabulary is measured by having learners answer the correct English word for a given meaning. Production vocabulary is generally considered more challenging than recognition vocabulary.
The experiment lasted two weeks. The ILI group used ILI and the NOVO group used NOVO, while the control group was given the same amount of study time as the experimental groups to learn the words.

The results of the experiment showed that “recognition vocabulary” and “pronunciation ability” improved in the experimental groups that used the web applications.
Specifically, the ILI and NOVO groups outperformed the control group on a test measuring word recognition, while no significant group differences were found for word production. Regarding pronunciation ability, an analysis of variance of the pre-post differences in pronunciation scores showed that the experimental groups (ILI and NOVO) improved significantly more than the control group. On the other hand, there was no significant difference between the ILI and NOVO groups, and the control group did not significantly improve its pronunciation scores at the post-test.
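
As a rough illustration of this kind of analysis, the following Python sketch computes pre-post gain scores for three groups and the F statistic of a one-way analysis of variance on those gains. It is a minimal sketch using only the standard library, not the authors' actual analysis. The function names are hypothetical, and the scores are made up for illustration and are not data from the paper.

```python
from statistics import mean


def gain_scores(pre, post):
    """Pre-post difference (post minus pre) for each participant."""
    return [after - before for before, after in zip(pre, post)]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up pronunciation scores for illustration only; NOT data from the paper.
ili = gain_scores(pre=[50, 55, 60, 52], post=[58, 61, 66, 60])
novo = gain_scores(pre=[51, 54, 59, 53], post=[58, 62, 65, 61])
control = gain_scores(pre=[50, 56, 58, 54], post=[51, 56, 59, 54])

f_stat, df_b, df_w = one_way_anova_f([ili, novo, control])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value only indicates that at least one group differs from the others. Finding which groups differ (for example, ILI versus NOVO) requires follow-up pairwise comparisons, and a p-value requires the F distribution from a statistics library.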

These results indicate that ILI and NOVO are effective in improving “recognition vocabulary”. On the other hand, no between-group differences were found for “production vocabulary” (i.e., answering English words from their meanings), suggesting that word production is more difficult than recognition. In addition, regarding pronunciation ability, the ILI and NOVO groups improved significantly, while the control group showed no significant improvement. The reason for this may be the lack of pronunciation practice in the independent study conducted by the control group. The systems’ pronunciation evaluation environment, based on speech recognition, may have encouraged pronunciation practice.

Now I would like to share with you my impressions of this paper.
I chose this paper because it has some points in common with my own research. For example, it measures both vocabulary and pronunciation ability, and I thought I could refer to its evaluation and analysis methods for my next experiment.

There are two points that I thought were good.
First, this study verifies the effectiveness of the system by distinguishing two vocabulary aspects, “recognition” and “production,” through a vocabulary test design that relies on previous research. I thought this point had originality. Furthermore, regarding “production,” the vocabulary was measured separately for “meaning” and “pronunciation,” and I wanted to put this evaluation method into practice in my own research.
Second, the results of this paper can be considered reliable because the number of participants is large (146). Another novelty is that two experimental groups with different pronunciation feedback methods, the ILI group and the NOVO group, were set up to examine how the difference in feedback affects the learning outcome.

On the other hand, there were some points that were hard to understand.
First, I could not understand how ILI and NOVO support vocabulary and pronunciation skills, which were the targets of the study. For example, the results of the experiment showed that the systems significantly improved recognition vocabulary, but since the paper does not explain which functions support recognition vocabulary, it is not clear why vocabulary improved. Also, judging from the description of the systems, there seems to be no function that supports vocabulary production.
Furthermore, with regard to pronunciation skills, the NOVO group did not improve significantly more than the ILI group, even though NOVO provides more detailed pronunciation feedback. This point also remained unexplained.

I think this paper would be even more interesting if it discussed the experimental results in relation to the design and functions of the systems.
In my own research, I would like to discuss how the theory, functions, and design of a system affect learning outcomes.