Yamada Laboratory, Kyushu University

ChatGPT improves creative problem-solving performance in university students: An experimental study

February 26, 2026

I am Chu, a research student who joined the Yamada Laboratory this year. Nice to meet you. My research theme is grade estimation using learning logs and learning support based on the estimated grades.

In this article, I would like to introduce the paper I read in this session of the English Literature Seminar and my thoughts on it.

  • Paper Title: ChatGPT improves creative problem-solving performance in university students: An experimental study

  • Journal: Computers & Education

  • Volume: 215, Article 105031

  • Year: 2024

  • Authors: Marek Urban, Filip Děchtěrenko, Jiří Lukavský, Veronika Hrabalová, Filip Svacha, Cyril Brom, Kamila Urban

Before introducing this paper, it is necessary to first understand what HHAIR is. HHAIR (Hybrid Human-AI Regulation) is a concept proposed by Molenaar (2022), grounded in Self-Regulated Learning (SRL) theory. It is a hybrid system that combines artificial intelligence and human intelligence to support the learner's self-regulated learning: by critically evaluating human regulation and using artificial intelligence to strengthen it, HHAIR aims to overcome weaknesses during learning and improve the learner's SRL skills.

(The link to the paper regarding the definition of HHAIR is here; please read it if you are interested: https://www.sciencedirect.com/science/article/pii/S2666920X2200025X )

The following is an overview of the content of the paper.

University students frequently use generative artificial intelligence tools like ChatGPT to solve ill-defined problem-solving tasks. However, experimental evidence regarding the impact of ChatGPT on ill-defined problem-solving performance is still lacking. This study examines that impact and shows that the use of ChatGPT significantly improves self-efficacy as well as the quality, elaboration, and originality of solutions. The paper discusses these findings in light of HHAIR theory and points out that, for effective regulation to occur, students using artificial intelligence tools should rely on effective metacognitive cues rather than on the perceived difficulty of problem-solving with the help of ChatGPT.

Ill-defined problem-solving tasks refer to things such as thesis writing, case study analysis, decision-making dilemmas, engineering design problems, and complex scientific experiments. These tasks are widely applied in STEM education, and their fundamental goal is to cultivate creativity and innovation capabilities. Solving ill-defined problems is useful for promoting interdisciplinary knowledge transfer and innovation capabilities, and for cultivating critical thinking.

The purpose of this study is to explore how the use of ChatGPT affects actual problem-solving performance (i.e., solution quality, elaboration, and originality). Furthermore, drawing on HHAIR theory, it investigates six factors relevant to using generative artificial intelligence effectively: task motivation, accuracy of metacognitive monitoring, metacognitive monitoring cues, task interest, perception of task difficulty, and mental effort invested.

The authors point out that ChatGPT, despite its large number of parameters and seemingly reliable output, can generate answers that are not necessarily accurate. Additionally, self-efficacy and metacognitive experience are intertwined factors that influence task outcomes: while high self-efficacy can improve task performance, excessively high self-efficacy can lead to overconfidence in one's own abilities and potentially weaken motivation for task execution.

Experimental Methods

The study compared a control group of 68 university students who did not use ChatGPT with an experimental group of 77 who did. The experimental tasks consisted of a baseline test and a complex problem-solving task, which the experimental group performed with the help of ChatGPT. The study measured the quality, elaboration, and originality of the task solutions, as well as self-efficacy, task interest, perceived difficulty, and mental effort. The session lasted a total of 26 minutes for the control group and 36 minutes for the experimental group.

Results

The results showed that the use of ChatGPT significantly improved the quality, elaboration, and originality of solutions and strengthened task self-efficacy, but it did not have a significant impact on the absolute accuracy of self-evaluation. Participants in the experimental group generally overestimated the quality and originality of their answers and reported that solving the task felt easier and required less mental effort. There was a moderate correlation between the perceived usefulness of ChatGPT and overestimation of performance. Previous experience with ChatGPT was negatively associated with perceived task difficulty, suggesting that participants with more experience found the task easier to solve. The study revealed that ChatGPT strengthens divergent thinking and helps students combine new concepts by presenting many possible solutions. However, the authors also show that participants using ChatGPT struggled to use effective heuristic cues in their self-evaluation.

The following are my reasons for selecting this paper and my thoughts.

Currently, the number of students using generative AI like ChatGPT to support their learning at universities is increasing. However, evaluating what kind of influence generative AI has on student learning remains a challenge. My research focuses on Adaptive Learning, and artificial intelligence is an important means of supporting it, so this paper gave me several ideas for evaluating the effectiveness of artificial intelligence. However, because HHAIR is built on SRL (Self-Regulated Learning) theory, it assumes that learners monitor their own actions, for example via a dashboard; the experiment in this paper provided no such means of self-monitoring, so I felt the theoretical consistency was not fully visible on that point. I believe that self-monitoring is extremely important when using AI, so I want to proceed with my research while attending to the validity of the underlying theories.
