In this work, we release a large, novel dataset of learners engaging with educational videos in the wild. The dataset, named Personalised Educational Engagement with Knowledge Components (PEEKC), is one of the first publicly available datasets to address personalised educational engagement. Educational recommenders have received much less attention than e-commerce and entertainment recommenders, even though efficient personalised learning systems could improve learning gains significantly. One of the main challenges in advancing this research direction is the scarcity of large, publicly available datasets. In the PEEKC dataset, each educational video lecture is associated with Wikipedia concepts related to its material, providing a taxonomy that is intuitive to humans. We believe that granular learner engagement signals, combined with rich content representations, will pave the way to powerful personalisation algorithms that will revolutionise educational and informational recommendation systems. Towards this goal, we 1) construct a novel dataset from a popular video lecture repository, 2) identify a set of benchmark algorithms to model engagement, and 3) run extensive experiments on the PEEKC dataset to demonstrate its value. Our experiments with the dataset show promise for building powerful informational recommender systems.
- Number of Events in the Training Data: 203,590
- Number of Events in the Test Data: 86,945
- Total Number of Events in the Dataset: 290,535
- Number of Learners in the Training Data: 14,050
- Number of Learners in the Test Data: 5,969
- Total Number of Learners in the Dataset: 20,019
- Number of Unique Lecture Videos in the Training Data: 6,835
- Number of Unique Lecture Videos in the Test Data: 4,409
- Total Number of Unique Lecture Videos in the Dataset: 7,999
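The counts above can be cross-checked with simple arithmetic. The sketch below applies inclusion-exclusion (train + test - total) to each group to estimate how many items appear in both splits. The numbers are copied from the list; the dictionary layout and the function name `shared_between_splits` are illustrative assumptions, not part of any tooling shipped with the dataset.

```python
# Split statistics copied from the list above.
stats = {
    "events":   {"train": 203_590, "test": 86_945, "total": 290_535},
    "learners": {"train": 14_050,  "test": 5_969,  "total": 20_019},
    "videos":   {"train": 6_835,   "test": 4_409,  "total": 7_999},
}


def shared_between_splits(train: int, test: int, total: int) -> int:
    """Inclusion-exclusion: number of items counted in both splits."""
    return train + test - total


# Items that occur in both the training and the test data, per group.
overlap = {name: shared_between_splits(**s) for name, s in stats.items()}

# Fraction of all events that fall in the test data.
test_event_share = stats["events"]["test"] / stats["events"]["total"]

if __name__ == "__main__":
    for name, shared in overlap.items():
        print(f"{name}: {shared} shared between train and test")
    print(f"test share of events: {test_event_share:.3f}")
```

Under these figures, the event and learner counts add up exactly, which suggests (but does not by itself prove) that the two splits contain different learners, while a number of lecture videos appear in both splits.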
For more information, contact Sahan Bulathwela (m.bulathwela@ucl.ac.uk).