LAK 2026 Workshop
LLM Psychometrics
Applying Assessment Theories to Evaluate LLMs and LLM-Supported Learners
Large Language Models (LLMs) are increasingly used for tasks traditionally performed by human learners, such as reading, writing, problem solving, and programming. While current evaluations rely heavily on benchmarks, mature frameworks from educational measurement – such as Item Response Theory (IRT), cognitive diagnostic models (e.g., DINA), and learning taxonomies – offer principled approaches for understanding the capabilities and limitations of these models. This workshop will explore how these theories can inform the evaluation of LLMs and human-AI collaboration, highlight divergences and alignments with human learning processes, and address concerns around responsible AI use in education.
Our workshop goals are to advance theory-grounded evaluation of LLMs and LLM-supported learners; to connect researchers across learning analytics, educational measurement, and AI; and to surface actionable patterns in LLM performance that can be translated into practices for responsible assessment and human–AI collaboration.
Call for Papers
We accept two types of contributions: (1) short empirical work-in-progress papers, and (2) short discussion papers that provoke debate on key issues and challenges in assessing LLMs and LLM-supported learners.
Topics of interest include:
- Application of educational measurement theories and methods to understanding AI and human-AI learning and performance
- Systematic comparisons of human vs AI learner performance
- Systematic comparisons of human-human vs human-AI team performance
- Adaptation of conceptual frameworks for designing tasks involving AI and human-AI learners
- Methods for detecting AI participation in solving complex tasks, including detection of GenAI-enabled cheating, misuse, and implications for academic integrity
- Methods for studying AI knowledge representation and its impact on learning and performance
- Error analyses of AI performance on complex tasks that provide insights into AI knowledge, skills, and learning
- Ethical considerations and limitations of using human-based psychometric models to study LLMs
Important Dates
- Submission deadline: Dec 4, 2025
- Notification: Dec 19, 2025
- Camera-ready: Jan 12, 2026
- Workshop day: Apr 27, 2026
Submission Guidelines
- Length: 4–6 pages (not including tables, figures, references, acknowledgements, AI declarations, and ethics statements)
- Format: CEUR Workshop Proceedings template
- Review: Double-blind
- Proceedings: Accepted papers will be presented during the workshop and published in CEUR-WS open workshop proceedings (workshop papers are not included in the Companion Proceedings of LAK2026).
Program Schedule
- 09:00–09:15
  - Welcome and overview of workshop goals
  - Introduction of organizers and agenda
  - Brief participant introductions and ice-breaker
- 09:15–10:00
  - Keynote: Alina von Davier, Chief of Assessment, Duolingo (talk + Q&A)
- 10:00–11:00
  - Paper presentations (12 min + 2–3 min Q&A)
- 11:00–11:15
  - Coffee break
- 11:15–12:15
  - Introduction to the group activity
  - Group discussions on sub-themes (evaluating LLMs, assessment of hybrid human-AI work, theoretical frameworks, assessment design, etc.)
  - Groups report back (plenary discussion)
- 12:15–13:00
  - Flash talks and poster session
- 13:00–13:30
  - Summary, next steps (further collaboration, etc.), and closing
Workshop Organizers

- Giora Alexandron, Associate Professor, Weizmann Institute of Science
- Beata Beigman Klebanov, Principal Research Scientist, Educational Testing Service
- Jill Burstein, Principal Assessment Scientist, Duolingo
- Yang Jiang, Research Scientist, Educational Testing Service
- Alona Strugatski, Postdoctoral Researcher, Weizmann Institute of Science
- Licol Zeinfeld, MSc Student and Graduate Researcher, Weizmann Institute of Science