Code and data will be finalized closer to the conference.
You can download the Webis-CausalQA-22 corpus. To recreate the ELI5 part, follow the instructions below.
The 10 datasets used to construct the Webis-CausalQA-22 corpus:
ELI5 is also available on Hugging Face (https://huggingface.co/datasets/eli5), which includes a script for downloading the data. This blog post also explains how to download the data and is the guide that was used: https://yjernite.github.io/lfqa.html
Example to obtain the ELI5 data:
pip install nlp
import nlp
eli5 = nlp.load_dataset('eli5')
train_set = eli5['train_eli5']
val_set = eli5['validation_eli5']
Use the regex rules to identify causal questions.
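As a rough illustration of what regex-based filtering could look like, here is a minimal sketch. The patterns in `CAUSAL_PATTERNS` and the helper names `is_causal` and `filter_causal` are hypothetical placeholders chosen for this example; they are not the rules used to build Webis-CausalQA-22, so substitute the actual regex rules before recreating the corpus.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the
# corpus's actual rules and should be replaced with the real ones.
CAUSAL_PATTERNS = [
    r"^why\b",
    r"\bwhat (?:is|are|was|were) the (?:causes?|reasons?|effects?)\b",
    r"\bwhat (?:causes|caused)\b",
    r"\b(?:lead|leads|led) to\b",
]

# Combine the patterns into one case-insensitive alternation.
CAUSAL_RE = re.compile("|".join(CAUSAL_PATTERNS), flags=re.IGNORECASE)


def is_causal(question: str) -> bool:
    """Return True if the question matches any of the patterns."""
    return CAUSAL_RE.search(question.strip()) is not None


def filter_causal(questions):
    """Keep only the questions that match a causal pattern."""
    return [q for q in questions if is_causal(q)]
```

Assuming the question text of each ELI5 example is stored in its `title` field, the filter could be applied along the lines of `[ex for ex in train_set if is_causal(ex['title'])]`.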