Paper, Dataset, code Tags: #linguistics
Half-life regression (HLR) is a novel model for spaced repetition practice with applications to second language acquisition. It combines psycholinguistic theory with modern machine learning techniques, estimating the 'half-life' of a word or concept in a student's long-term memory.
We use data from Duolingo to fit HLR models, reducing error by ~45% compared to several baselines at predicting student recall rates. The model also reveals which linguistic concepts are most challenging for second language learners.
Spacing effect: people remember things more effectively if they use spaced repetition practice, as opposed to massed practice (cramming).
Lag effect: people learn even better if the spacing between practices gradually increases.
Pimsleur (1967) was the first to make mainstream practical use of the spacing and lag effects, with his audio-based language learning program.
Leitner (1972) proposed another spaced repetition algorithm, intended for use with flashcards. Cards are sorted into boxes corresponding to increasing practice intervals (1 day, 2 days, 4 days, etc.). Every card starts in the 1-day box; if the student answers correctly, the card moves to the 2-day box and is asked again two days later. Another correct answer promotes it to the 4-day box, while an incorrect answer sends it back to the previous box. A minimal scheduler along these lines is sketched below.
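As a rough illustration (not Duolingo's code), a Leitner scheduler fits in a few lines of Python; the `Card` class, `review` function, and box intervals below are invented for the example:

```python
from dataclasses import dataclass

# Boxes double the review interval: 1, 2, 4, 8, 16 days.
INTERVALS = [1, 2, 4, 8, 16]

@dataclass
class Card:
    word: str
    box: int = 0  # index into INTERVALS; every card starts in the 1-day box

def review(card: Card, correct: bool) -> int:
    """Promote on a correct answer, demote on a mistake;
    return the number of days until the next review."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = max(card.box - 1, 0)
    return INTERVALS[card.box]

card = Card("gato")
print(review(card, True))   # -> 2: promoted to the 2-day box
print(review(card, False))  # -> 1: demoted back to the 1-day box
```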
Central to the theory of memory is the Ebbinghaus model (1885), or the forgetting curve. Memory decays exponentially over time.
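In the paper's notation, the forgetting curve ties the probability p of recalling an item to the lag time Δ since it was last practiced and the half-life h of the item in memory:

```latex
% Ebbinghaus-style forgetting curve: recall probability decays
% exponentially in the lag time \Delta, governed by the half-life h.
p = 2^{-\Delta / h}
```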
The goal is to estimate a word's half-life from a feature vector summarizing the student's practice history, using a set of learned weights. We then look for the weights that minimize a loss function depending on the observed recall rate, the lag time since the last practice, and those features.
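Concretely (reconstructing the paper's formulation), HLR models the half-life as an exponential of the dot product between the weights Θ and the feature vector x, then plugs that estimate back into the forgetting curve:

```latex
% Estimated half-life and the resulting recall prediction.
\hat{h}_{\Theta} = 2^{\Theta \cdot \mathbf{x}}, \qquad
\hat{p}_{\Theta} = 2^{-\Delta / \hat{h}_{\Theta}}
```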
We choose the L2-regularized squared loss function.
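Written out per training instance ⟨p, Δ, x⟩, with h = -Δ / log₂(p) standing in as the half-life implied by the observed recall rate and α, λ as hyperparameters, the loss is (as best I can reconstruct it from the paper):

```latex
% L2-regularized squared loss over recall and half-life residuals.
\ell(\langle p, \Delta, \mathbf{x} \rangle; \Theta) =
  (p - \hat{p}_{\Theta})^2
  + \alpha\,(h - \hat{h}_{\Theta})^2
  + \lambda \lVert \Theta \rVert_2^2
```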
We focus on features that are easily instrumented and available in the production Duolingo system.
- Interaction features: a set of counters summarizing each student's practice history with each word
- Lexeme tag features: a large, sparse set of indicator variables, one for each lexeme tag in the system, intended to capture the inherent difficulty of each word (a sketch of how these features feed the model follows this list)
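To make this concrete, here is a minimal Python sketch of how such features could be assembled and pushed through the model. The counter names, square-root transform, lexeme tag string, and toy weights are illustrative assumptions, not the production Duolingo pipeline:

```python
import math
from collections import defaultdict

def features(seen, correct, wrong, lexeme_tag):
    """Hypothetical feature vector: interaction counters (square-rooted to
    dampen large counts) plus a sparse lexeme-tag indicator and a bias."""
    x = defaultdict(float)
    x["bias"] = 1.0
    x["sqrt_seen"] = math.sqrt(1 + seen)
    x["sqrt_correct"] = math.sqrt(1 + correct)
    x["sqrt_wrong"] = math.sqrt(1 + wrong)
    x[f"lexeme:{lexeme_tag}"] = 1.0  # one indicator per lexeme tag
    return x

def predict_recall(weights, x, lag_days):
    """Half-life 2^(theta . x), then recall 2^(-lag / half_life)."""
    half_life = 2.0 ** sum(weights.get(k, 0.0) * v for k, v in x.items())
    return 2.0 ** (-lag_days / half_life)

# Example: a student saw 'caminar/VERB' 6 times (4 correct), 3 days ago.
x = features(seen=6, correct=4, wrong=2, lexeme_tag="caminar/VERB")
toy_weights = {"bias": 0.5, "sqrt_correct": 0.4, "sqrt_wrong": -0.3}
print(predict_recall(toy_weights, x, lag_days=3.0))
```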
HLR incorporates arbitrarily rich features and fits their weights to data. This approach is significantly more accurate at predicting student recall rates than either of the earlier heuristics (Pimsleur, Leitner), and also beats a conventional machine learning baseline such as logistic regression.
In addition to better predictions, HLR can capture the inherent difficulty of the concepts encoded in the feature set. Positive weights (easier words) are associated with cognates, common words, and words that are short or morphologically simple to inflect; negative weights (harder words) are associated with irregular forms, rare words, and complex grammatical constructs.
The evaluation suggests that HLR is a better approach than the Leitner algorithm originally used by Duolingo.