The Half-Life of Knowing

You can know something on Tuesday and not know it on Friday — memory has a half-life, and it is shorter than anyone would like. Spaced repetition is the engineering discipline built on that uncomfortable fact: schedule each review for the moment a memory is about to fade, and a little forgetting becomes the thing that makes learning stick. We trace the idea from Ebbinghaus to the algorithms now built into the tools millions revise with.

The EuraStudy Team · 14 min read · D·01
Fig. 01 · The forgetting curve, drawn exactly. Retrievability — the probability you can recall something right now — decays as R(t) = e^(−t/S), here with a stability S of six days. Stability is the memory’s time-constant: the elapsed time at which recall has fallen to 1/e, about 37%. The half-life, the time to reach a coin-flip, is S·ln 2 ≈ 4.2 days. Left alone, even well-learned material leaks away — and that leak is the raw material spaced repetition learns to exploit.

Abstract

Knowing something is not a state but a process of decay: leave a fact alone and the probability you can recall it falls along a curve Hermann Ebbinghaus first measured on himself in 1885. Spaced repetition is the engineering discipline built on that curve. Its founding observation, the spacing effect, is among the most robust in all of psychology: practice distributed over time produces far more durable memory than the same practice massed together, and the optimal gap between reviews scales with how long you need to remember. Its mechanism is a productive paradox, the desirable difficulty: a memory retrieved just as it begins to fade is strengthened more than one recalled with ease, because the struggle itself is the signal. From these facts comes a family of algorithms: Leitner’s paper boxes, SuperMemo’s expanding intervals, and the modern statistical models that fit a learner’s memory from millions of review logs and place each review at the edge of forgetting. We follow the idea from the forgetting curve to the Difficulty–Stability–Retrievability model now built into the tools students revise with, dwell on the genuine debate over the curve’s shape, and are honest about what scheduling can and cannot do: it can decide when to bring a fact back, but it cannot do the remembering for you.

You can know something on Tuesday and not know it on Friday. Every student has felt the quiet treachery of it: a definition that was solid the night before dissolves under the exam’s gaze — not because it was never learned, but because learning, left alone, leaks. Knowing is not a state you reach and hold. It is a process of slow decay, and the decay has a shape.

Two earlier dispatches in this series followed the machine’s attempts to see a learner. Knowledge tracing draws the moving picture of what you know as you practise 20; item response theory takes the sharpest possible snapshot of your ability at a single moment. This piece is about the third question — the one a calendar asks. Not what do you know, nor how surely, but when should a thing be brought back before it slips away. The answer turns out to rest on a counter-intuitive idea: that forgetting, handled well, is not the enemy of durable learning but the raw material of it.

The curve that started a science

In 1885 Hermann Ebbinghaus did something slightly mad. Lacking any subject but himself, he memorised thousands of nonsense syllables — meaningless on purpose, so that prior knowledge could not help — and then measured, hour by hour and day by day, how much effort it took to relearn a list he had once known. The gap between the first learning and the relearning he called savings, and savings fell away with time along a smooth, decelerating curve 1. It dropped steeply at first, then ever more gently, as though the memories that survived the first hours were made of sterner stuff. A careful modern replication, run more than a century later with proper controls, recovered a curve strikingly close to the one Ebbinghaus drew on a sample of one 2.

It is worth giving the curve a vocabulary, because the rest of the field is built on it. Call retrievability the probability that you can recall an item right now; it starts near one just after study and decays toward zero. Call stability the durability of the memory — loosely, the time-constant of that decay. A common first approximation writes retrievability as a falling exponential,

$$R(t) = e^{-t/S},$$

where $S$ is the stability in days. On this model the half-life of a memory, the time until recall is a coin-flip, is $S\ln 2$, and the moment retrievability has fallen to about 37% (one over $e$) is exactly $t = S$. The hero plate above shows one such curve. Whether the true shape is really an exponential is a question we will return to, and the answer is genuinely interesting. But the qualitative fact is not in doubt: untended, knowledge leaks, and it leaks fast.
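To make the vocabulary concrete, here is a minimal Python sketch of the exponential model above. The function names are our own illustration rather than anything from a library, and the six-day stability simply mirrors the opening figure.

```python
import math

def retrievability(t_days: float, stability: float) -> float:
    """Exponential forgetting curve R(t) = exp(-t / S)."""
    return math.exp(-t_days / stability)

def half_life(stability: float) -> float:
    """Time at which retrievability falls to 0.5, i.e. S * ln 2."""
    return stability * math.log(2)

S = 6.0  # stability in days, matching the opening figure
print(f"R after S days:   {retrievability(S, S):.3f}")
print(f"half-life (days): {half_life(S):.2f}")
print(f"R at half-life:   {retrievability(half_life(S), S):.3f}")
```

With a stability of six days the three printed values come out at about 0.368 (one over e), 4.16 days, and 0.500, the same numbers quoted in the figure caption.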

The natural response — the one every crammer reaches for — is to relearn the moment before you need it. Spaced repetition makes the opposite bet. It says the leak is an opportunity, and that when you patch it matters far more than how hard.

The most reliable result in learning science

Here is the founding observation, and it is one of the most replicated findings psychology has. Take a fixed amount of study and split it into sessions. If you bunch the sessions together — massed practice, the all-nighter — you remember poorly in the long run. If you spread the same sessions out — distributed, or spaced, practice — you remember far better, often dramatically so. A 2006 synthesis pooled 839 separate assessments drawn from 317 experiments and found the advantage of spacing to be large, general, and remarkably consistent across materials and ages 3. Nothing about the total effort changed. Only its arrangement in time did.

Fig. 02 · The spacing effect, simulated. Two students study the same item with the same five reviews; one masses them into four days, the other distributes the same five across the term. Stability grows more under spacing, because a review caught as the memory fades earns a larger boost — so at a delayed test, weeks after the last review, the distributed schedule is still remembered while the massed one has gone. The retention traces are computed from a deliberately simple model; the direction of the result is the robust finding of a century of experiments.

The follow-up question is the practical one: how far apart? The answer is the elegant part. The best gap is not fixed; it scales with how long you need to remember. In a large study that varied both the gap between two sessions and the delay until the final test, the optimal inter-study gap came out as a sizeable fraction of the retention interval: on the order of 10 to 40% of it at week-scale delays, shrinking toward roughly 5% when the target was a year away 4. To remember something for a week, review it after a day; to remember it for a year, review it after about three weeks, which is roughly 5% of the year. The schedule should breathe in proportion to the horizon. This is why a good spacing system is not a single magic interval but a widening one, a point the algorithms will make concrete.

There is a second engine of durable memory that rides alongside spacing, and the two reinforce each other: the testing effect. Trying to retrieve a fact — as opposed to re-reading it — is itself one of the most powerful study acts known. Taking a test on material, even with no feedback, produces more durable retention than restudying it for the same time 7, an effect so strong that its authors called retrieval not merely a way of measuring learning but a primary cause of it 8. Spaced repetition is, at heart, a machine for scheduling retrievals. It does not ask you to re-read. It asks you to recall — and it picks the moment.

Why a little forgetting helps

So why should spacing the retrievals matter, beyond simply doing more of them? The deepest answer is a productive paradox. A memory retrieved easily — just after studying, while it is still fresh — is barely strengthened by the act. A memory retrieved with effort, just as it begins to slip, is strengthened a great deal. The struggle is not a tax on learning; it is the signal that drives it.

Robert and Elizabeth Bjork gave this a formal frame in their new theory of disuse 5, which splits a memory into two quantities that are easy to confuse and important to separate. Retrieval strength is how accessible an item is right now — high just after study, fading with time. Storage strength is how deeply learned it is — how well it will come back after a delay. The theory’s sharp, counter-intuitive claim is that the gain in storage strength from a successful retrieval is larger when retrieval strength has fallen lower. In other words: the more you have begun to forget — provided you can still, with effort, recall — the more a successful recall is worth. Robert Bjork named the broader principle a desirable difficulty 6: conditions that make practice feel harder and slower can make the resulting learning more durable.
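The claim that a fainter memory earns a larger gain can be turned into a toy simulation. The sketch below is not the Bjorks’ model: it assumes an exponential forgetting curve, an invented update rule in which each successful review multiplies stability by `1 + gain * (1 - R)`, and hypothetical values for `initial_stability` and `gain`. It also assumes every review succeeds, which is the simplification the next paragraph takes apart.

```python
import math

def simulate(review_days, initial_stability=2.0, gain=6.0):
    """Return stability after a series of successful reviews.

    Toy assumption: each successful review multiplies stability by
    1 + gain * (1 - R), where R is retrievability at review time,
    so a fainter memory earns a larger boost. Failed recalls are ignored.
    """
    stability = initial_stability
    last = 0.0  # the first study session happens on day 0
    for day in review_days:
        r = math.exp(-(day - last) / stability)
        stability *= 1.0 + gain * (1.0 - r)
        last = day
    return stability

def recall_after(stability, delay_days):
    """Retrievability a given number of days after the last review."""
    return math.exp(-delay_days / stability)

massed = simulate([1, 2, 3, 4])     # four reviews crammed into four days
spaced = simulate([2, 7, 20, 50])   # the same four reviews, widening gaps

for name, s in (("massed", massed), ("spaced", spaced)):
    print(f"{name}: stability {s:6.1f} days, "
          f"recall 30 days later {recall_after(s, 30):.2f}")
```

Under these made-up parameters the spaced schedule ends with a stability roughly ten times that of the massed one. The specific numbers mean nothing; only the direction of the result reflects the experimental literature.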

Fig. 03 · Why a little forgetting helps. The solid curve is the boost to stability a successful review earns as a function of how retrievable the item was at that moment: the fainter the memory, the larger the gain. But the dashed line is the catch — the chance you actually recall the item is itself its retrievability, so wait too long and the review fails and becomes a costly relearn. The two pressures leave a useful window; modern schedulers aim reviews at a retrievability of roughly 0.7 to 0.9.

The paradox has a hard limit, and the figure above is built around it. Push the difficulty too far — wait until the memory is genuinely gone — and the retrieval simply fails. Now you are not strengthening anything; you are relearning from scratch, which is slow and dispiriting. The probability that you succeed at the recall is, after all, just the retrievability itself. So two pressures pull against each other: the boost from a successful review grows as the memory fades, but the chance of a successful review shrinks at the same time. The sweet spot is an intermediate retrievability — fade the memory enough that recalling it is real work, but not so much that the work fails. Cognitive models of the spacing effect, such as Pavlik and Anderson’s activation-based account built on the ACT-R architecture, reproduce exactly this trade-off from first principles, and predict the spacing benefit as a function of the gap 9. The whole art of a scheduler is to keep landing reviews in that window.
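The two opposing pressures can be sketched in a few lines. This is a deliberately crude illustration, not a cognitive model such as the ACT-R account cited above: the linear boost `1 - r` and the `relearn_cost` penalty are assumptions chosen only to show how an intermediate target falls out of the trade-off.

```python
def expected_value(r: float, relearn_cost: float = 0.6) -> float:
    """Expected payoff of reviewing at retrievability r (toy model).

    Assumptions, not measured quantities:
      - a successful recall earns a boost of (1 - r);
      - success happens with probability r;
      - a failed recall costs `relearn_cost` in the same units.
    """
    boost = 1.0 - r
    return r * boost - (1.0 - r) * relearn_cost

def best_target(relearn_cost: float = 0.6, steps: int = 1000) -> float:
    """Grid-search the retrievability that maximises expected payoff."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: expected_value(r, relearn_cost))

for cost in (0.4, 0.6, 0.8):
    print(f"relearn cost {cost:.1f}: review at R of about {best_target(cost):.2f}")
```

In this toy the optimum sits at `(1 + relearn_cost) / 2`, so the costlier a failed recall is assumed to be, the earlier the review should land. Costs between 0.4 and 0.8 put the target between 0.7 and 0.9, the window quoted in the figure, though that agreement comes from the chosen parameters rather than from any evidence.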

Turning it into a schedule

Once you believe all this, an algorithm almost writes itself: track each item’s memory, predict when its retrievability will fall to the edge of the useful window, and schedule the next review for that day. The first systems to do it were beautifully analogue. Sebastian Leitner’s 1972 box method needs nothing but index cards and a row of boxes 12: a card you answer correctly moves to the next box along, which is reviewed less often; a card you miss falls back to the first. The boxes are the stability estimate, made of cardboard.
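The box rule is simple enough to sketch directly. The `Card` class, the five-box layout, and the review gaps in `BOX_INTERVALS` are illustrative assumptions; Leitner’s method itself only requires that later boxes are reviewed less often.

```python
from dataclasses import dataclass

# Hypothetical review gaps in days for boxes 1..5.
BOX_INTERVALS = [1, 2, 4, 8, 16]

@dataclass
class Card:
    prompt: str
    box: int = 0  # index into BOX_INTERVALS; 0 is the first box

def review(card: Card, correct: bool) -> int:
    """Move the card between boxes and return days until its next review."""
    if correct:
        card.box = min(card.box + 1, len(BOX_INTERVALS) - 1)
    else:
        card.box = 0  # a miss sends the card back to the first box
    return BOX_INTERVALS[card.box]

card = Card("capital of Peru")
for outcome in (True, True, False, True):
    days = review(card, outcome)
    print(f"correct={outcome!s:5}  box={card.box + 1}  next review in {days} day(s)")
```

The box index plays the role of a coarse stability estimate: it rises one step per success and collapses on a miss, with no notion of how hard any individual recall was.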

The computational leap came from Piotr Woźniak, whose SuperMemo and its SM-2 algorithm, devised in the late 1980s, still underpins much of the field 13. SM-2 keeps, for every item, an ease factor — a per-item multiplier that starts at 2.5. The first review interval is one day, the second is six, and every interval after that is the previous one times the ease factor:

$$I_n = I_{n-1}\times \mathrm{EF}, \qquad \mathrm{EF} = 2.5.$$

Each time you grade your own recall, the ease factor is nudged — down for a hard item, up for an easy one, never below a floor of 1.3 — and a lapse sends the item back to the start. The consequence is an expanding schedule, exactly the widening rhythm the spacing research called for: one day, six, fifteen, thirty-eight, ninety-five, and on.
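The rule can be sketched as follows. This is one reading of Woźniak’s published SM-2 description, not a reference implementation: the `Sm2Item` class and `sm2_review` function are our own names, fractional intervals are rounded up, the ease update is applied after the interval is computed, and a failing grade (below 3 on the 0 to 5 scale) restarts the sequence while leaving the ease factor untouched. Other implementations differ on those last two points.

```python
import math
from dataclasses import dataclass

@dataclass
class Sm2Item:
    repetitions: int = 0   # consecutive successful recalls
    interval: int = 0      # days until the next review
    ease: float = 2.5      # ease factor, starts at 2.5

def sm2_review(item: Sm2Item, quality: int) -> Sm2Item:
    """Apply one self-graded review (quality 0..5) to an item, in place."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Lapse: restart the interval sequence; ease is left unchanged here.
        item.repetitions = 0
        item.interval = 1
        return item
    if item.repetitions == 0:
        item.interval = 1
    elif item.repetitions == 1:
        item.interval = 6
    else:
        item.interval = math.ceil(item.interval * item.ease)
    item.repetitions += 1
    # Ease update: down for hard grades, up for easy ones, floor of 1.3.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    item.ease = max(1.3, item.ease + delta)
    return item

item = Sm2Item()
schedule = [sm2_review(item, 4).interval for _ in range(5)]
print(schedule)  # [1, 6, 15, 38, 95]
```

A grade of 4 leaves the ease factor at 2.5, which is why five such reviews reproduce the sequence in the text. A grade of 5 raises the ease by 0.1 and a grade of 3 lowers it by 0.14, so real schedules drift away from this idealised ladder.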

Fig. 04 · The expanding schedule of SuperMemo’s SM-2 algorithm, drawn to a linear day scale so each rung’s length is its true interval. The first interval is one day, the second six; thereafter each successful recall multiplies the wait by an ease factor of 2.5, giving 1, 6, 15, 38, 95 days. The gaps grow geometrically — you review a mastered item less and less often — and a single lapse drops it back to the first rung.

Notice what the expansion buys. An item you keep getting right demands your attention less and less often; the schedule thins itself out, freeing time for the things you are still struggling with. Woźniak and Gorzelańczyk later set the approach on firmer theoretical ground, framing the spacing of repetitions as an explicit optimisation of long-term retention against review effort 14. SM-2 was a hand-built rule of thumb that happened to capture the shape of memory. The next step was to stop guessing the shape and learn it.

A memory you can fit to data

A hand-tuned ease factor is a strong guess about how memory works. But a flashcard app logs millions of reviews — every card, every grade, every interval, every success and lapse — and that is precisely the raw material a statistical model wants. The modern turn in spaced repetition is to fit the forgetting curve itself to data.

The most widely used open implementation is FSRS, the Free Spaced Repetition Scheduler, now built into the flashcard program Anki as of its 23.10 release. It models every item with three numbers, and the diagram below is their whole story. Difficulty is how stubborn the item is to make stick. Stability is how long the memory lasts — defined, cleanly, as the interval at which retrievability falls to exactly 0.9. Retrievability is the live probability of recall, computed from the elapsed time and the current stability. After each review the model updates the trio: a recalled review increases stability — by more when retrievability had dropped low, the desirable-difficulty principle made quantitative — and a lapse cuts stability and raises difficulty 18. The parameters that govern those updates are not guessed; they are fit by gradient descent to a learner’s own review history.
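A small sketch shows how the definition of stability turns into a schedule. The `MemoryState` class is hypothetical, and the curve used here is the power form published for version 4 of FSRS as we understand it; later versions adjust the constants. The fitted update rules for stability and difficulty are deliberately left out, since they depend on parameters learned from review logs.

```python
from dataclasses import dataclass

@dataclass
class MemoryState:
    difficulty: float  # how stubborn the item is (updated by the fitted model)
    stability: float   # days until retrievability falls to 0.9

    def retrievability(self, elapsed_days: float) -> float:
        """Power-shaped curve R = (1 + t / (9 S))^-1, so R(S) = 0.9."""
        return 1.0 / (1.0 + elapsed_days / (9.0 * self.stability))

    def next_interval(self, desired_retention: float = 0.9) -> float:
        """Days until retrievability is predicted to hit the target."""
        if not 0.0 < desired_retention < 1.0:
            raise ValueError("desired_retention must be in (0, 1)")
        return 9.0 * self.stability * (1.0 / desired_retention - 1.0)

card = MemoryState(difficulty=5.0, stability=10.0)
print(round(card.retrievability(10.0), 3))  # 0.9 by definition of stability
print(round(card.next_interval(0.9), 2))    # equals the stability
print(round(card.next_interval(0.8), 2))    # lower target, longer wait
```

Scheduling then reduces to picking a desired retention: at a target of 0.9 the next interval is simply the stability, and lowering the target stretches the wait.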

Fig. 05 · The three-component model behind FSRS, the open-source scheduler now built into Anki. Each item carries a Difficulty, a Stability, and a Retrievability that falls with elapsed time on the stability the item currently has. A recalled review grows the stability — most of all when retrievability had already dropped low — while a lapse resets it and nudges difficulty up. Stability is defined precisely as the interval at which retrievability equals 0.9.

The same instinct — learn the memory model from logs — has been pursued at industrial scale. Duolingo’s half-life regression trains a model to predict the half-life of each word for each learner, fit on the order of thirteen million review logs, and uses it to schedule practice 15. And the payoff is not confined to apps: in a semester-long classroom study, eighth-graders learning Spanish vocabulary on a personalised review schedule — one that modelled each student’s memory and timed reviews accordingly — outscored a massed schedule by 16.5% and even a generic spaced schedule by 10% on a cumulative exam a month after the semester ended, a gap of roughly two letter grades on the early-term material 16. Scheduling, done well, is not a marginal optimisation. It moves grades.
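The core of half-life regression fits in two formulas: recall probability is modelled as 2 raised to the power of minus elapsed time over half-life, and the half-life itself is 2 raised to a weighted sum of features. The sketch below shows only that prediction step. The feature names and weights are invented for illustration, and the published model uses transformed practice counts and a fitted weight vector rather than these raw values.

```python
def predicted_half_life(weights: dict[str, float], features: dict[str, float]) -> float:
    """Half-life in days: h = 2 ** (weights . features)."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 2.0 ** score

def recall_probability(days_since_practice: float, half_life: float) -> float:
    """p = 2 ** (-delta / h); equals 0.5 when delta equals the half-life."""
    return 2.0 ** (-days_since_practice / half_life)

# Hypothetical weights and features, for illustration only.
weights = {"bias": 1.0, "times_correct": 0.5, "times_wrong": -0.3}
features = {"bias": 1.0, "times_correct": 4.0, "times_wrong": 1.0}

h = predicted_half_life(weights, features)
print(f"half-life: {h:.2f} days")
print(f"recall after 3 days: {recall_probability(3.0, h):.2f}")
```

Training amounts to adjusting the weights so that predicted recall matches what learners actually did in the logs; a scheduler can then surface the words whose predicted recall has dropped furthest.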

At the research frontier the problem has been posed in its most general form, as one of optimal control: given a model of how reviews move a memory, and a cost on the learner’s time, what review schedule maximises long-term recall? Casting it as stochastic optimal control yields scheduling policies with provable guarantees, and — pleasingly — they often recommend reviewing at a roughly constant target retrievability, the very window the desirable-difficulty argument pointed to 17. The hand-built rule and the optimal policy converge on the same advice.

The shape of forgetting

We have been writing the forgetting curve as an exponential, and it is time to admit that this is probably wrong, and interestingly so. Ebbinghaus’s curve is commonly drawn as an exponential, but a long line of work argues that long-term forgetting follows a power law instead, with a much heavier tail. Wickelgren proposed power-function forgetting in the 1970s, and Wixted and Carpenter later showed that a power law of the form $R = m(1 + ht)^{-f}$ fits the savings data, Ebbinghaus’s own included, better than a simple exponential 10, 11.

Fig. 06 · Two laws of forgetting, pinned to the same point. Ebbinghaus is usually drawn as a simple exponential, but Wickelgren and later Wixted argued long-term forgetting follows a power law, whose tail is far heavier. Both curves are anchored at the same stability — retrievability 0.9 at ten days — yet they part company badly later: the power law says old memories decay ever more slowly. This is why modern schedulers fit a power curve rather than the textbook exponential.

The difference is not academic hair-splitting; it changes the schedule. An exponential decays at a constant proportional rate forever, which means a memory you have not reviewed in a year is essentially gone. A power law decelerates: the older a memory gets, the more slowly it fades — an observation old enough to have a name, Jost’s law. Anchor the two curves at the same stability and they agree in the short run and diverge sharply in the long run, the power law holding a recallable trace long after the exponential has flatlined. This is why the serious modern schedulers — FSRS among them — fit a power forgetting curve rather than the textbook exponential: its shape is closer to how human memory actually behaves, and it forgives the long gaps that real revision schedules inevitably contain.
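The divergence is easy to see numerically. The sketch below anchors both curves at a retrievability of 0.9 at ten days, as in the figure. The power curve is one convenient special case of the general form (taking m = 1, f = 1 and h = 1/(9S)), chosen here as an assumption for illustration rather than as a fitted result.

```python
def exponential_curve(t_days: float, stability: float) -> float:
    """Exponential anchored so that R(stability) = 0.9."""
    return 0.9 ** (t_days / stability)

def power_curve(t_days: float, stability: float) -> float:
    """Power law R = (1 + t / (9 S))^-1, also anchored at R(S) = 0.9."""
    return 1.0 / (1.0 + t_days / (9.0 * stability))

S = 10.0  # both curves pass through 0.9 at ten days
for t in (10, 30, 100, 365):
    print(f"day {t:3d}: exponential {exponential_curve(t, S):.3f}, "
          f"power {power_curve(t, S):.3f}")
```

The two agree at day ten and stay close for a few weeks. After a year the exponential predicts recall of about 2%, while this power curve still predicts roughly 20%.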

Where the model thumbs the scale

It would be easy to read all this as a solved problem, and it is not. The honest reading is that scheduling memory is a genuinely useful tool with genuine limits, and the limits matter.

First, every model here predicts a population, not your particular brain on this particular Tuesday. Stability and difficulty are fitted averages; an individual recall is a coin weighted by them, not determined by them. A scheduler that has seen only a handful of your reviews is mostly guessing — the cold-start problem — and grows trustworthy only with history.

Second, and more fundamentally: a scheduler decides when to bring a fact back, but it cannot do the bringing-back for you. The entire benefit rests on the assumption that the review is an effortful retrieval 7, 8. Flip a card, sigh, and read the back without trying, and the desirable difficulty collapses into no difficulty at all; the interval expands on the strength of a recall that never happened. The algorithm is only ever as good as the honesty of the act it schedules.

Third, spacing optimises retention of specified items, which is not the same as understanding. It will faithfully keep a definition, a date, a formula at your fingertips. It will not, by itself, teach you to connect them, to transfer a method to an unfamiliar problem, or to see why a result is true. Other desirable difficulties — interleaving different problem types rather than blocking them, which improves the ability to choose the right method 19 — address parts of what pure spacing leaves out. A revision strategy that is all flashcards and no synthesis has optimised one real thing while quietly neglecting another.

And finally there is a human limit the mathematics is silent on. An aggressive scheduler can pile up a punishing daily backlog of due reviews, and a learner buried under it abandons the system entirely — at which point the theoretically optimal schedule retains nothing, because no one is following it. The best schedule a person actually keeps beats the perfect one they quit.

What we read from it

EuraStudy is built on a reviewed bank of exam-style questions, each tagged to a topic and a curriculum — and a question, retrieved at the right moment, is exactly the effortful recall this whole literature turns on. The spaced-repetition lens gives us a way to think about that bank not as a static archive but as a set of memories, each with its own quiet half-life, each due to be brought back before it fades.

The reading we take from a century of this work is threefold, and bracingly simple. Forgetting is not failure — it is the gradient the whole method climbs; a review that costs you a little effort is worth more than one that costs you none. When beats how much — the same practice, timed to the edge of forgetting, outlasts far more practice crammed together. And the schedule should widen — an item you keep recalling has earned a longer leash, and the time saved belongs to the things you have not yet learned.

There is a tidy symmetry in where this leaves the series. Item response theory gave us the snapshot — how able you are, at one instant. Knowledge tracing gave us the film — how that ability moves as you learn. Spaced repetition gives us the calendar — when the film needs replaying before a scene is lost. None of the three does the learning for you. Between them, they decide what to ask, judge how you answer, and choose when to ask again — and then leave the one irreducible act, the act of remembering, where it has always belonged.

References

  1. Ebbinghaus, H. (1885). Über das Gedächtnis: Untersuchungen zur experimentellen Psychologie. Leipzig: Duncker & Humblot. (English: Memory: A Contribution to Experimental Psychology, trans. H. A. Ruger & C. E. Bussenius, 1913, Teachers College, Columbia University.)
  2. Murre, J. M. J., & Dros, J. (2015). Replication and Analysis of Ebbinghaus’ Forgetting Curve. PLoS ONE, 10(7), e0120644.
  3. Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132(3), 354–380.
  4. Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008). Spacing effects in learning: A temporal ridgeline of optimal retention. Psychological Science, 19(11), 1095–1102.
  5. Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.), From Learning Processes to Cognitive Processes (Vol. 2, pp. 35–67). Hillsdale, NJ: Erlbaum.
  6. Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about Knowing (pp. 185–205). Cambridge, MA: MIT Press.
  7. Roediger, H. L., & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255.
  8. Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319(5865), 966–968.
  9. Pavlik, P. I., & Anderson, J. R. (2005). Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cognitive Science, 29(4), 559–586.
  10. Wickelgren, W. A. (1974). Single-trace fragility theory of memory dynamics. Memory & Cognition, 2(4), 775–780.
  11. Wixted, J. T., & Carpenter, S. K. (2007). The Wickelgren power law and the Ebbinghaus savings function. Psychological Science, 18(2), 133–134.
  12. Leitner, S. (1972). So lernt man lernen: Der Weg zum Erfolg. Freiburg: Herder.
  13. Woźniak, P. A. (1990). Optimization of learning: The SM-2 algorithm. SuperMemo. (super-memory.com/english/ol/sm2.htm)
  14. Woźniak, P. A., & Gorzelańczyk, E. J. (1994). Optimization of repetition spacing in the practice of learning. Acta Neurobiologiae Experimentalis, 54(1), 59–62.
  15. Settles, B., & Meeder, B. (2016). A trainable spaced repetition model for language learning. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), 1848–1858.
  16. Lindsey, R. V., Shroyer, J. D., Pashler, H., & Mozer, M. C. (2014). Improving students’ long-term knowledge retention through personalized review. Psychological Science, 25(3), 639–647.
  17. Tabibian, B., Upadhyay, U., De, A., Zarezade, A., Schölkopf, B., & Gomez-Rodriguez, M. (2019). Enhancing human learning via spaced repetition optimization. Proceedings of the National Academy of Sciences, 116(10), 3988–3993.
  18. Ye, J., Su, J., & Cao, Y. (2022). A stochastic shortest path algorithm for optimizing spaced repetition scheduling. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’22), 4381–4390.
  19. Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems improves learning. Instructional Science, 35(6), 481–498.
  20. Corbett, A. T., & Anderson, J. R. (1995). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4(4), 253–278.
