Hello everyone.
In this article, I would like to introduce a paper I read for our most recent English-language literature seminar, along with my personal thoughts on it.
- Paper Title: Design and implementation of an AI-enabled visual report tool as formative assessment to promote learning achievement and self-regulated learning: An experimental study
- Journal: British Journal of Educational Technology
- Volume & Issue: 55(3)
- Pages: 1253–1276
- Year of Publication: 2023
- Authors: Xiaofang Liao, Xuedi Zhang, Zhifeng Wang, Heng Luo
Formative assessment, unlike summative assessment, which focuses only on the final outcome of learning, focuses on the learning process and provides feedback on performance along the way. However, traditional formative assessment methods such as monthly tests struggle to accurately reflect a learner's cognitive structure and learning process, and the information they present can be unintuitive. In recent years, advances in AI technology have made it possible to provide more accurate and personalized feedback, and particular attention is being paid to Natural Language Processing (NLP) and cognitive diagnosis technologies. NLP comprises computational techniques for understanding and generating human language; it can process and analyze the textual content of monthly tests to improve the accuracy and efficiency of evaluation. Cognitive diagnosis, on the other hand, is a technique for identifying latent cognitive states, such as the level of mastery of individual knowledge points, from learner behavioral data like test answers. By combining these technologies with visualization techniques, it becomes possible to surface information that learners might otherwise miss and to show them their current learning level clearly. In this study, therefore, the researchers developed a formative assessment tool that integrates NLP and cognitive diagnosis and investigated its effects on learning achievement and self-regulated learning in high school biology classes.
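To make the cognitive-diagnosis idea concrete, here is a deliberately simplified sketch: per-knowledge-point mastery estimated as the fraction of correct answers among items covering that point, via a Q-matrix linking items to knowledge points. The paper's actual model (NeuralCD) learns latent mastery with a neural network; the item IDs and knowledge points below are hypothetical examples, not the study's data.

```python
# Toy cognitive diagnosis: mastery per knowledge point as the fraction of
# correctly answered items tagged with that point. A conceptual sketch only;
# NeuralCD, used in the paper, estimates latent mastery far more robustly.

def estimate_mastery(responses, q_matrix):
    """responses: {item_id: 1 if correct, else 0}
    q_matrix:  {item_id: set of knowledge points the item covers}"""
    totals, correct = {}, {}
    for item, answer in responses.items():
        for kp in q_matrix[item]:
            totals[kp] = totals.get(kp, 0) + 1
            correct[kp] = correct.get(kp, 0) + answer
    return {kp: correct[kp] / totals[kp] for kp in totals}

# Hypothetical answers and item-to-knowledge-point mapping:
responses = {"q1": 1, "q2": 0, "q3": 1, "q4": 0}
q_matrix = {"q1": {"cells"}, "q2": {"cells", "genetics"},
            "q3": {"genetics"}, "q4": {"evolution"}}
print(estimate_mastery(responses, q_matrix))
# → {'cells': 0.5, 'genetics': 0.5, 'evolution': 0.0}
```

Even this toy version shows why such estimates are useful for formative (rather than summative) purposes: they point to *which* knowledge points are weak, not just how many points were scored.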
In the experiment, a 13-week study was conducted with 125 third-year high school students (71 male, 54 female) in China. Participants were divided into an experimental group (63 students) and a control group (62 students), and both studied chapters 1–4 of a biology textbook (molecules, cells, genetics, and evolution). Biology was chosen because it is rich in structured declarative knowledge and was judged suitable for high-frequency formative assessment. The experiment proceeded in three stages. In the first week, a pre-survey based on the MSLQ (Motivated Strategies for Learning Questionnaire) was administered to check for differences in self-regulated learning skills between the two groups before the experiment. From the 2nd to the 12th week, both groups received lessons on chapters 1–4 of the textbook and took three monthly tests, each created by teachers with 10–15 years of teaching experience based on the 2017 high school biology curriculum. As for feedback, the control group received scores and rankings along with general oral evaluations from the teachers, while the experimental group received the same scores and rankings plus three feedback reports generated by the AI tool. In the 13th week, a post-survey using the same self-regulated learning questionnaire was administered.
The developed AI evaluation tool consists of six modules organized around the four feedback levels of Hattie and Timperley (2007), designed to deepen the learner's understanding step by step. First, as task-level feedback, it displays performance rankings and individual proficiency levels. Next, at the process level, it provides error analysis and a "knowledge alert" function; in the knowledge alert module in particular, the level of mastery of each knowledge point is shown with color coding, visually highlighting areas that need improvement. At the self-regulation level, a module prompts self-evaluation and reflection, and finally, as self-level feedback, a summary-and-evaluation module generates personalized feedback.
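The color-coded knowledge alert described above can be sketched as a simple mapping from estimated mastery values to traffic-light colors. The thresholds (0.4 and 0.7) and the knowledge-point names below are illustrative assumptions, not values reported in the paper.

```python
# Minimal sketch of a "knowledge alert": map each knowledge point's estimated
# mastery (in [0, 1]) to a colour so weak areas stand out at a glance.
# Thresholds here are assumed for illustration, not taken from the paper.

def knowledge_alert(mastery, low=0.4, high=0.7):
    """mastery: {knowledge_point: value in [0, 1]} -> colour labels."""
    def colour(v):
        if v < low:
            return "red"     # needs immediate review
        if v < high:
            return "yellow"  # partially mastered
        return "green"       # mastered
    return {kp: colour(v) for kp, v in mastery.items()}

print(knowledge_alert({"mitosis": 0.25, "meiosis": 0.55, "DNA replication": 0.9}))
# → {'mitosis': 'red', 'meiosis': 'yellow', 'DNA replication': 'green'}
```

In the actual tool, such labels would feed a dashboard chart rather than printed text, but the design choice is the same: reduce a continuous diagnostic estimate to a small set of visually distinct categories.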
Regarding the technical implementation, the tool combines three elements: cognitive diagnosis using the NeuralCD model, natural language analysis using the BM25 algorithm, and data visualization using PyEcharts. Together, these make it possible to estimate the level of mastery of each knowledge point from test answers, automatically match incorrect answers to the corresponding sections of the textbook, and display the results in a visually intuitive way.
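The BM25-based matching step can be illustrated with a small self-contained sketch: given the text of a missed question, rank candidate textbook sections by BM25 relevance. The section texts below are hypothetical, and real Chinese-language text would need proper tokenization rather than whitespace splitting; this only shows the scoring idea.

```python
import math

# Hedged sketch of BM25 matching: score each (hypothetical) textbook section
# against a query built from a missed question, then rank sections by score.
# k1 and b are the standard BM25 defaults; whitespace tokenisation is a
# simplification for this English-language toy example.

def bm25_rank(query, sections, k1=1.5, b=0.75):
    docs = [s.lower().split() for s in sections]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    scores = []
    for d in docs:
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for doc in docs if term in doc)  # document frequency
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            tf = d.count(term)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    # Return section indices, best match first.
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

sections = [
    "mitosis divides one cell into two identical daughter cells",
    "DNA replication copies the genome before cell division",
    "natural selection drives evolution of populations",
]
order = bm25_rank("how does mitosis produce daughter cells", sections)
print(order[0])  # → 0, the mitosis section
```

The top-ranked index would then be used to point the learner at the relevant textbook passage for each incorrect answer.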
The analysis clarified the effects of the AI evaluation tool on learning. First, 93.65% of the experimental group viewed the visual reports, and 66.67% responded that they could clearly understand the content. The tool was also highly regarded, with 84.13% expressing a desire to use it for other subjects. Regarding learning achievement, a repeated measures ANOVA confirmed a significant interaction between time and group (F(2, 122) = 7.368, p = 0.001). Particularly noteworthy is that from the second test onward, the experimental group's rate of grade improvement exceeded the control group's, suggesting that the benefits of the AI tool take some time to emerge.

The analysis from the perspective of self-regulated learning also produced interesting results. In the experimental group, self-efficacy improved significantly after using the AI evaluation tool, perhaps because the visual evaluation supported information processing and fostered a positive belief that learners can successfully perform tasks. At the same time, several challenges became clear. Because the visual reports showed scores and knowledge gaps so plainly, a tendency toward increased test anxiety was observed. Evaluations also differed across modules: the modules showing performance rankings and individual proficiency were rated highest, while the self-evaluation and reflection module was rated relatively low, suggesting that having to proactively fill in content may burden learners, especially those with low learning motivation.
The following are my personal thoughts. What I found particularly interesting in this study was the design of the dashboard. The perspective of visualizing mastery of knowledge points within a knowledge framework, and the use of dashboards as formative assessment tools, offer important suggestions for educational technology research. At the same time, while reading, I noticed several points that could be explored further. For example, the visualization of knowledge mastery is a very interesting design, but I became curious about how the accuracy of that visualization itself could be evaluated. Regarding self-regulated learning, it would also be interesting to look more closely at the processes of learning planning and goal setting in particular. This study demonstrates the potential of AI in educational settings, and further extensions of this line of research can be expected. In particular, I look forward to seeing how AI-driven formative assessment and evaluation by teachers can be combined in future research.
By: Geng Xuewang




