The Lab · research & engineering

Built on evidence.

EuraStudy, worked out in the open — methods with a literature, figures computed to specification, every claim cited. Where we are unsure, we say so.

Entries: 10
Areas: 4
Cited works: 8
Computed figures: 37

Latest entry · 20 June 2026

§ 01 · Methods

Methods we build on.

The instruments the platform actually runs on — the evidence behind each is cited under “From the field”.

Fig. mastery over practice

Knowledge tracing

A per-topic estimate of what a student has really mastered, updated with every answer.
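A minimal sketch of how such a per-topic estimate can be updated, assuming classic Bayesian Knowledge Tracing (BKT). Each topic has four parameters (prior mastery, learning rate, slip and guess). Every answer is folded in with Bayes' rule, and the model then allows for learning on that attempt. The parameter values and the names `BKTParams`, `update_mastery` and `trace` are illustrative assumptions, not the platform's calibrated model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    """Illustrative per-topic parameters (hypothetical values, not calibrated)."""
    p_init: float = 0.2      # P(L0): prior probability the topic is already mastered
    p_transit: float = 0.15  # P(T): chance of learning the topic on a practice attempt
    p_slip: float = 0.1      # P(S): chance a student who has mastered it still errs
    p_guess: float = 0.25    # P(G): chance a student who has not mastered it guesses right


def update_mastery(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """One BKT step: condition on the observed answer, then allow for learning."""
    s, g = params.p_slip, params.p_guess
    if correct:
        # P(mastered | correct) by Bayes' rule.
        evidence = p_mastery * (1 - s)
        posterior = evidence / (evidence + (1 - p_mastery) * g)
    else:
        # P(mastered | incorrect) by Bayes' rule.
        evidence = p_mastery * s
        posterior = evidence / (evidence + (1 - p_mastery) * (1 - g))
    # The attempt itself is a learning opportunity.
    return posterior + (1 - posterior) * params.p_transit


def trace(answers: list[bool], params: BKTParams = BKTParams()) -> list[float]:
    """Return the mastery estimate after each answer, in order."""
    p = params.p_init
    history = []
    for correct in answers:
        p = update_mastery(p, correct, params)
        history.append(p)
    return history


if __name__ == "__main__":
    for i, p in enumerate(trace([False, True, True, True]), start=1):
        print(f"after answer {i}: P(mastered) = {p:.3f}")
```

With these illustrative values, a wrong first answer nudges the estimate below the prior, and a run of correct answers drives it toward 1. The slip and guess terms keep any single answer from being decisive.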

Read · How a Machine Reads What You Know

Fig. the zone of practice

Adaptive testing

Every question carries a calibrated difficulty; the diagnostic picks the item that reveals the most.
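A minimal sketch of "picking the item that reveals the most", assuming a two-parameter logistic (2PL) item response model. Each item has a discrimination `a` and a calibrated difficulty `b`. Its Fisher information at ability θ is a²P(1−P), and the next question is the unused item with the most information at the current estimate. In this sketch, the estimate is an expected a posteriori (EAP) mean over a grid with a standard normal prior. The item bank, parameter values and function names are hypothetical. This is one textbook way to run the loop, not necessarily the diagnostic's.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    a: float  # discrimination: how sharply the item separates abilities
    b: float  # calibrated difficulty, on the same scale as ability theta


def p_correct(theta: float, item: Item) -> float:
    """2PL probability that a student of ability theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))


def information(theta: float, item: Item) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, item)
    return item.a ** 2 * p * (1 - p)


def next_item(theta: float, bank: list[Item], used: set[str]) -> Item:
    """Maximum-information selection among items not yet asked."""
    candidates = [it for it in bank if it.name not in used]
    return max(candidates, key=lambda it: information(theta, it))


def eap_estimate(responses: list[tuple[Item, bool]]) -> float:
    """Posterior mean of theta on a grid from -4 to 4, with an N(0, 1) prior."""
    grid = [x / 10 for x in range(-40, 41)]
    weights = []
    for theta in grid:
        w = math.exp(-theta * theta / 2)  # unnormalised standard normal prior
        for item, correct in responses:
            p = p_correct(theta, item)
            w *= p if correct else 1 - p  # likelihood of each observed answer
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


if __name__ == "__main__":
    # A hypothetical four-item bank and one simulated student's answers.
    bank = [Item("q1", 1.2, -1.0), Item("q2", 1.0, 0.0),
            Item("q3", 1.5, 0.5), Item("q4", 0.8, 1.5)]
    theta, used, responses = 0.0, set(), []
    for answer in [True, True, False]:
        item = next_item(theta, bank, used)
        used.add(item.name)
        responses.append((item, answer))
        theta = eap_estimate(responses)
        print(f"asked {item.name}, correct={answer}, theta={theta:+.2f}")
```

Information peaks where an item's difficulty sits near the current estimate, which is why each answer moves the test toward harder or easier questions.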

Read · Twenty Questions

Fig. review before you forget

Spaced repetition

A review timed just before forgetting lifts memory back and flattens the next decay.
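A minimal sketch of that idea, assuming the simplest half-life model of memory. Predicted recall decays as 2^(−t/h) for a half-life h. A review is scheduled for when recall is predicted to fall to a chosen threshold, and each successful review multiplies the half-life by a constant growth factor. The threshold, growth factor and function names are hypothetical. Schedulers used in practice typically adjust such quantities from each learner's answers rather than fixing them.

```python
from __future__ import annotations

import math


def recall_probability(days_since_review: float, half_life_days: float) -> float:
    """Predicted recall under exponential forgetting: it halves every half-life."""
    return 2.0 ** (-days_since_review / half_life_days)


def days_until(threshold: float, half_life_days: float) -> float:
    """Days until predicted recall falls to `threshold` (0 < threshold < 1)."""
    return -half_life_days * math.log2(threshold)


def schedule(n_reviews: int, initial_half_life: float = 1.0,
             growth: float = 2.0, threshold: float = 0.8) -> list[float]:
    """Day offsets of successive reviews, assuming each review succeeds and
    multiplies the half-life by `growth` (a hypothetical constant)."""
    day, half_life, days = 0.0, initial_half_life, []
    for _ in range(n_reviews):
        day += days_until(threshold, half_life)  # review as recall reaches the threshold
        days.append(day)
        half_life *= growth  # a successful review slows the next decay
    return days


if __name__ == "__main__":
    print([round(d, 2) for d in schedule(5)])
```

With a threshold of 0.5 the interval simply equals the current half-life, so `schedule(4, threshold=0.5)` returns review days `[1.0, 3.0, 7.0, 15.0]`. Each gap is longer than the last, which is the flattening of the decay described above.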

Read · The Half-Life of Knowing

§ 02 · Areas

Four areas of work.

A·01 · AI tutoring
How the tutor decides what to say — and, more often, what to withhold.
1 entry · latest: Withholding the Answer

A·02 · Assessment & feedback
Measuring answers, marking like an examiner, and judging the tutor itself.
3 entries · latest: Twenty Questions

A·03 · Learning science
What the evidence on practice, memory and difficulty actually supports.
2 entries · latest: The Half-Life of Knowing

A·04 · Design & craft
The instruments, typography and figures the whole platform is built from.
4 entries · latest: A Calculus of Diagrams

§ 03 · The work

The work.

10 of 10 entries

Engineering · Design & craft
A Calculus of Diagrams
Most diagrams in educational software are pictures someone drew once. Ours are values in a typed language, compiled to pixels by a function that cannot lie. A formal account of the figure engine — its grammar, its determinism, and the proof obligations that keep nearly two thousand diagrams honest.
Read

Fig. D·01

20 Jun · 9 min read

Research · Learning science
The Half-Life of Knowing
You can know something on Tuesday and not know it on Friday — memory has a half-life, and it is shorter than anyone would like. Spaced repetition is the engineering discipline built on that uncomfortable fact: schedule each review for the moment a memory is about to fade, and a little forgetting becomes the thing that makes learning stick. We trace the idea from Ebbinghaus to the algorithms now built into the tools millions revise with.
Read

Fig. D·02

19 Jun · 14 min read

Research · Assessment & feedback
Twenty Questions
A good adaptive test can pin down what you know in a dozen questions, not fifty — because it chooses each one to be the most revealing it can ask. We trace the quiet mathematics of item response theory and computerized adaptive testing, from the shape of a single question to the loop that learns you in real time, and the places where adaptivity has to be reined in.
Read

Fig. D·03

18 Jun · 13 min read

Research · Assessment & feedback
How a Machine Reads What You Know
Every adaptive tutor rests on a quiet act of inference: guessing the knowledge it cannot see from the answers it can. We trace that idea from Bayesian Knowledge Tracing to its deep-learning successors — and the honest places where the deeper model is not the better one.
Read

Fig. D·04

17 Jun · 9 min read

Engineering · Design & craft
On the Making of a Quiet Machine
A study of the obsessions behind a learning platform built for four national examinations — where nothing is accidental, and restraint is the most exacting discipline of all.
Read

Fig. D·05

14 Jun · 7 min read

Research · AI tutoring
Withholding the Answer
A system that hands over the answer is not teaching. We argue that the central design problem for a machine tutor is not how to explain, but when and how much to withhold.
Read

Fig. D·06

11 Jun · 9 min read

Engineering · Design & craft
Drawn, Not Decorated
Every chart, curve and diagram a student meets is drawn to exact specification by a single figure engine — and verified before it ships. Never faked, never screenshotted.
Read

Fig. D·07

8 Jun · 5 min read

Research · Assessment & feedback
How Should We Measure a Tutor?
A tutor that keeps students busy is not the same as a tutor that helps them learn. We argue for measuring AI tutors by learning gains and transfer — and against the engagement metrics that quietly reward the wrong thing.
Read

Fig. D·08

5 Jun · 8 min read

Engineering · Design & craft
One Platform, Four National Exams
The Austrian Matura and the German Abitur are live; the French Baccalauréat and Spanish Selectividad are on the waitlist. The hard part was never the content — it was deciding what four exams could share without flattening any of them.
Read

Fig. D·09

2 Jun · 6 min read

Research · Learning science
Adaptive Practice and Its Limits
Adaptivity is the most over-promised word in educational technology. Two effects in the learning-science record are real and worth building on; almost everything sold above them is decoration.
Read

Fig. D·10

28 May · 8 min read

Reading the field · Beyond EuraStudy

From the field.

A standing reading of the research on artificial intelligence and learning — the work of others, across decades, that the rest of this notebook is built on. These are published findings by researchers across the field, not EuraStudy’s own results; we summarise them and point to the original work.

01 · Tutoring efficacy
One-to-one tutoring moved the average student to the 98th percentile.
Students who worked with a personal tutor outperformed conventionally taught peers by about two standard deviations — Bloom’s “two sigma” result. It set the central ambition that has driven educational technology ever since: to reproduce, at scale, what a good tutor does for one learner.
Benjamin S. Bloom (1984). The 2 Sigma Problem. Educational Researcher.
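The step from “about two standard deviations” to “the 98th percentile” is a normal-curve conversion. If scores are roughly normal with equal spread in both groups, the average tutored student sits at Φ(2) ≈ 0.977 of the comparison distribution. A minimal sketch follows. The function names are hypothetical, and it uses only the standard library's error function.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_of_average_treated(effect_size_sd: float) -> float:
    """Percentile of the comparison group reached by the average treated student,
    assuming normally distributed scores with equal spread in both groups."""
    return 100.0 * normal_cdf(effect_size_sd)


if __name__ == "__main__":
    print(f"{percentile_of_average_treated(2.0):.1f}")  # → 97.7
```

Rounded, that is the 98th percentile in the headline. The same conversion applies to any standardized effect size, under the same normality assumption.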
02 · Tutoring efficacy
Intelligent tutoring systems came within a whisker of human tutors.
Reviewing decades of controlled studies, VanLehn measured human tutoring at roughly 0.79 standard deviations over no tutoring and step-based intelligent tutors at about 0.76 — far below Bloom’s famous 2.0, and close enough to each other to reframe the question from “can a machine tutor?” to “what does effective tutoring actually consist of?”
Kurt VanLehn (2011). The Relative Effectiveness of Human Tutoring, Intelligent Tutoring Systems, and Other Tutoring Systems. Educational Psychologist.
03 · Evidence & meta-analysis
Across fifty controlled evaluations, intelligent tutors raised scores by about two-thirds of a standard deviation.
That median hides a sharp split: effects were far larger on the locally designed tests that match what a system actually taught (around 0.73) than on standardised exams (around 0.13). The gains are real, but the size of an effect depends heavily on what you choose to measure.
James A. Kulik & J. D. Fletcher (2016). Effectiveness of Intelligent Tutoring Systems: A Meta-Analytic Review. Review of Educational Research.
04 · Memory & practice
Being tested on material beats re-reading it — and the gap widens with time.
Learners who practised retrieving what they had studied remembered substantially more a week later than those who simply restudied — even though the restudiers felt more confident at the time. The “testing effect” is among the most robust results in the science of learning, and the reason active retrieval practice, not mere exposure, sits at the centre of exam preparation.
Henry L. Roediger III & Jeffrey D. Karpicke (2006). Test-Enhanced Learning. Psychological Science.
05 · Cognitive load
Working memory is the bottleneck — and the help a novice needs becomes noise for an expert.
Cognitive load theory holds that instruction fails when it overwhelms a narrow working memory. Later work on the “expertise-reversal effect” sharpened the point: scaffolding that helps a beginner actively hinders a more advanced learner. Together they argue that good tutoring must adapt its support to the individual, not just to the topic.
John Sweller (1988). Cognitive Load During Problem Solving: Effects on Learning. Cognitive Science.
06 · Learning theory
Good help is temporary: a scaffold exists in order to be removed.
Wood, Bruner and Ross named “scaffolding” — the support an expert lends so a learner can do what they cannot yet do alone, an idea since linked to Vygotsky’s zone of proximal development. Its defining feature is that it fades: support that never withdraws breeds dependence, not competence. It is the principle behind any tutor that deliberately holds back the answer.
David Wood, Jerome S. Bruner & Gail Ross (1976). The Role of Tutoring in Problem Solving. Journal of Child Psychology and Psychiatry.
07 · Feedback
Feedback is one of the most powerful influences on learning — and one of the most variable.
Synthesising hundreds of studies, Hattie and Timperley placed feedback among the strongest levers on achievement, with effects ranging from large to outright negative. What separated them was whether the feedback told a learner where they were going, how they were doing, and what to do next. Feedback that grades without directing can achieve nothing at all.
John Hattie & Helen Timperley (2007). The Power of Feedback. Review of Educational Research.
08 · Critical perspectives
Perhaps the machine should stay simple, and the intelligence should stay human.
Baker argues the field over-invested in modelling the learner’s mind and under-invested in the simpler, robust systems that actually help — and in keeping teachers in the loop. A standing corrective for anyone building an AI tutor: sophistication is not the goal; better learning is.
Ryan S. Baker (2016). Stupid Tutoring Systems, Intelligent Humans. International Journal of Artificial Intelligence in Education.

Selected reading · 8 works · a starting point, not a survey

Every figure on the Lab is computed to specification and verified before it ships — never a screenshot. Every external claim is cited to a real, published work. Where we are unsure, we say so.
