A study from Stanford University suggests that large language models (LLMs) such as GPT-3.5 and GPT-4 could aid the evaluation and optimisation of educational materials. Researchers Joy He-Yueya, Noah D. Goodman, and Emma Brunskill show that these AI models can assess the effectiveness of instructional materials and even generate improved content, potentially streamlining the traditionally time-consuming and expensive process of educational content development.
The study, posted to the arXiv preprint server, introduces a novel approach called "Simulated Expert Evaluation" (SEE), which uses LLMs to predict student learning outcomes for various instructional materials. The AI evaluations successfully replicated well-established educational phenomena, including the Expertise Reversal Effect and the Variability Effect. These results suggest that the models can act as reliable evaluators of educational content, showing an understanding of how different types of instruction affect different student groups.
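As a rough illustration of how such an evaluator might be wired up, the Python sketch below asks a model to role-play an educational expert and predict a post-test score for a given material and learner profile. This is a minimal sketch, not the authors' code: the `complete` helper, the 0-100 scoring scale, and the learner profiles are all assumptions made for illustration.

```python
# Illustrative sketch only: `complete` is a hypothetical stand-in for
# whatever LLM API (e.g. a GPT-3.5/GPT-4 chat completion call) is used.

def complete(prompt: str) -> str:
    """Placeholder for a call to an LLM provider."""
    raise NotImplementedError("wire this up to an LLM API")


def predict_learning_outcome(material: str, learner_profile: str) -> float:
    """Ask the LLM, acting as an educational expert, to predict a
    post-test score (0-100) for a learner studying the material."""
    prompt = (
        "You are an expert in educational psychology.\n"
        f"Learner background: {learner_profile}\n"
        f"Instructional material:\n{material}\n\n"
        "Predict this learner's score on a post-test of the topic, "
        "as a number from 0 to 100. Respond with the number only."
    )
    return float(complete(prompt).strip())


# Comparing predictions across learner profiles is how an effect like
# the Expertise Reversal Effect would surface, e.g. (with a real
# `complete` wired up):
#   predict_learning_outcome(worked_example, "novice, no prior algebra")
#   predict_learning_outcome(worked_example, "advanced algebra student")
```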
The researchers also developed an "Instruction Optimisation" algorithm that uses one LLM to generate educational content while another evaluates its effectiveness. This approach was applied to create math word problem worksheets, with human teachers later assessing the AI-generated materials. The results showed a significant correlation between the AI evaluations and the human teachers' preferences, suggesting the method's potential for producing effective educational materials.
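The generate-and-evaluate loop can be pictured as a simple best-of-N search, with a generator model proposing candidate worksheets and the evaluator sketched above scoring them. Again, this is a hedged sketch under the same assumptions (the placeholder `complete` helper and the hypothetical 0-100 scale), not the paper's actual algorithm, which may search and refine content differently.

```python
def generate_worksheet(topic: str) -> str:
    """Ask a generator LLM to draft a math word problem worksheet."""
    return complete(
        f"Write a worksheet of five math word problems about {topic}."
    )


def optimise_worksheet(topic: str, learner_profile: str,
                       n_candidates: int = 8) -> str:
    """Best-of-N search: sample candidate worksheets from the generator
    and keep the one the evaluator predicts will teach best."""
    best, best_score = "", float("-inf")
    for _ in range(n_candidates):
        candidate = generate_worksheet(topic)
        score = predict_learning_outcome(candidate, learner_profile)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

A real system might iterate instead, feeding the evaluator's critique back into the generator's prompt, but the best-of-N framing captures the division of labour the study describes: one model creates, another judges.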
However, the study also revealed some discrepancies between AI and human evaluations, highlighting areas for further research. The researchers stress that while their results are promising, these AI evaluators should augment, not replace, human expertise and empirical studies with actual students.
The ability of AI to quickly evaluate and optimise educational content could accelerate the development of personalised learning materials and reduce the cost of educational research. This technology could enable educators to create more effective, tailored instructional materials in less time, potentially improving learning outcomes for students across various subjects and skill levels.
Looking ahead, the researchers identify several areas for future investigation. These include addressing the discrepancies between AI and human evaluations and extending the approach to multi-modal instructional content, such as videos or interactive simulations. They also emphasise the importance of continuing to involve human experts and students in the evaluation process to ensure that AI-optimised materials truly enhance learning experiences.