Assessment Design in the AI Era: A Method for Identifying Items Functioning Differentially for Humans and Chatbots
arXiv:2603.23682v1 Announce Type: new

Abstract: The rapid adoption of large language models (LLMs) in education raises profound challenges for assessment design. To adapt assessments to the presence of LLM-based tools, it is crucial to characterize the strengths and weaknesses of LLMs in a generalizable, valid, and reliable manner. However, current LLM evaluations often rely on descriptive statistics derived from benchmarks, and little research applies theory-grounded measurement methods to characterize LLM capabilities relative to human learners in ways that […]