This repository provides the code and materials underlying the paper "Living Machines: A Study of Atypical Animacy" (COLING 2020).
- Installation
- Directory structure
- Description of the codes
- Datasets and resources
- Evaluation results
- Citation
- Acknowledgements
- License
- Create a new environment:
conda create -n py37animacy python=3.7
- Activate the environment:
conda activate py37animacy
- Clone the AtypicalAnimacy repository:
git clone https://github.com/Living-with-machines/AtypicalAnimacy.git
- Install the requirements:
cd /path/to/my/AtypicalAnimacy
pip install -r requirements.txt
Note: [...] fasttext crashes. This issue has been reported (see here).
- Allow the newly created py37animacy environment to show up in the notebooks:
python -m ipykernel install --user --name py37animacy --display-name "Python (py37animacy)"
- Run the code/setup.ipynb notebook, one cell at a time.
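Since the notebooks are meant to run on the py37animacy kernel registered above, it can help to confirm this in the first cell of a notebook. The helper below is a hypothetical sketch (kernel_problems is not part of the repository); it assumes conda's default behaviour of placing each environment under a path that contains the environment name (for example .../envs/py37animacy/).

```python
import sys


def kernel_problems(version_info=None, executable=None,
                    env_name="py37animacy", expected=(3, 7)):
    """Return a list of reasons why the running kernel may be the wrong one.

    An empty list means the interpreter looks like the py37animacy
    environment created during installation. Both checks are heuristics.
    """
    # Default to the interpreter this code is running in.
    version_info = sys.version_info if version_info is None else version_info
    executable = sys.executable if executable is None else executable
    problems = []
    if tuple(version_info[:2]) != tuple(expected):
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found, "
            f"expected {expected[0]}.{expected[1]}"
        )
    # Assumption: conda stores environments under .../envs/<name>/, so the
    # environment name should appear somewhere in the interpreter path.
    if env_name not in executable:
        problems.append(f"interpreter {executable} is not in '{env_name}'")
    return problems
```

For example, print(kernel_problems() or "kernel looks right") at the top of a notebook reports any mismatch before long-running cells start.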
In our code, we assume the following directory structure:
AtypicalAnimacy/
├── code/
├── data/
├── experiments/
├── models/
│   ├── classifiers/
│   └── language_models/
│       ├── bert_models/
│       └── fastai/
└── resources/
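If some of these folders are missing (it is not stated whether code/setup.ipynb creates all of them), the layout can be created with a minimal Python sketch such as the one below. create_layout and EXPECTED_DIRS are illustrative names, not part of the repository; the directory list is transcribed from the tree above.

```python
from pathlib import Path

# Sub-directories the code expects under the repository root,
# transcribed from the directory tree in this README.
EXPECTED_DIRS = [
    "code",
    "data",
    "experiments",
    "models/classifiers",
    "models/language_models/bert_models",
    "models/language_models/fastai",
    "resources",
]


def create_layout(root):
    """Create any missing directories of the expected layout under `root`.

    Existing directories are left untouched. Returns the entries of
    EXPECTED_DIRS (as paths) that had to be created; intermediate parents
    such as models/ are created implicitly and not listed separately.
    """
    root = Path(root)
    created = []
    for rel in EXPECTED_DIRS:
        path = root / rel
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

For example, create_layout("/path/to/my/AtypicalAnimacy") creates only the folders that do not exist yet, so it is safe to call more than once.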
To convert the data into the required format, run these notebooks in the following order:
code/process_stories_dataset.ipynb
code/process_machines19thC_dataset.ipynb
To apply the masking approach, run the following notebook:
code/masking_approach.ipynb
To train the classifiers, run the following notebooks:
code/train_bert_classifier.ipynb
code/train_svm_classifiers.ipynb
To apply the classifiers to new data, run the following notebook:
code/classification_approach.ipynb
To train and evaluate the LSTM classifier, run the following notebook:
code/train_LSTM_seq_classifiers.ipynb
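The notebooks above can also be executed non-interactively with Jupyter's nbconvert. The following is a sketch, not a script shipped with the repository, and it rests on several assumptions: that the order in which the notebooks are listed here is a valid execution order (only the order of the two data-processing notebooks is stated explicitly), that it is run from the repository root, and that the py37animacy kernel has been registered as described in the installation steps. PIPELINE, nbconvert_command and run_pipeline are hypothetical names. code/setup.ipynb is left out because it is meant to be run one cell at a time.

```python
import subprocess

# Assumed execution order, following the order of the notebooks in this README.
PIPELINE = [
    "code/process_stories_dataset.ipynb",
    "code/process_machines19thC_dataset.ipynb",
    "code/masking_approach.ipynb",
    "code/train_bert_classifier.ipynb",
    "code/train_svm_classifiers.ipynb",
    "code/classification_approach.ipynb",
    "code/train_LSTM_seq_classifiers.ipynb",
]


def nbconvert_command(notebook, kernel="py37animacy"):
    """Build the command that executes `notebook` in place with nbconvert."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",  # overwrite the notebook with its executed version
        f"--ExecutePreprocessor.kernel_name={kernel}",
        # Training cells can run for a long time; -1 disables the per-cell timeout.
        "--ExecutePreprocessor.timeout=-1",
        notebook,
    ]


def run_pipeline(notebooks, dry_run=True):
    """Execute the notebooks one after another, stopping at the first failure.

    With dry_run=True (the default) the commands are only built and returned.
    """
    commands = [nbconvert_command(nb) for nb in notebooks]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a notebook fails,
            # so later steps never run on incomplete outputs.
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_pipeline(PIPELINE) only returns the commands for inspection; run_pipeline(PIPELINE, dry_run=False) runs them. Because of --inplace, each notebook's saved outputs are replaced by those of the new run.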
Run code/setup.ipynb to download and prepare the data and resources used in the experiments.
The dataset described in Tables 1 and 3 of the paper was generated from the animacy dataset annotated in:
Jahan, Labiba, Geeticka Chauhan, and Mark Finlayson. "A new approach to animacy detection." In Proceedings of the 27th International Conference on Computational Linguistics, pp. 1-12. 2018.
Run code/setup.ipynb to download it and convert it to the format used in our experiments.
The atypical animacy dataset, described in Tables 2 and 3 of the paper, was annotated by us. It consists of nineteenth-century English sentences extracted from an open dataset of nineteenth-century books digitized by the British Library. Run code/setup.ipynb to download it and convert it to the format used in our experiments.
Nineteenth-century BERT and Word2vec English models trained on the 19thC BL Books dataset can be downloaded from Zenodo. For more information, see the paper cited below and its GitHub repository.
If you use these models, please cite:
Hosseini, Kasra, Beelen, Kaspar, Colavizza, Giovanni, & Coll Ardanuy, Mariona (2021). Neural Language Models for Nineteenth-Century English. Journal of Open Humanities Data, 7: 22, pp. 1–6. DOI: https://doi.org/10.5334/johd.48
@article{hosseini2021neural,
title={Neural Language Models for Nineteenth-Century English},
author={Hosseini, Kasra and Beelen, Kaspar and Colavizza, Giovanni and Coll Ardanuy, Mariona},
journal={Journal of Open Humanities Data},
year={2021},
volume = {7:22},
pages = {1--6}
}
The evaluation results of our experiments (partially reported in Table 5 of the paper) can be found in this file.
Mariona Coll Ardanuy, Federico Nanni, Kaspar Beelen, Kasra Hosseini, Ruth Ahnert, Jon Lawrence, Katherine McDonough, Giorgia Tolfo, Daniel CS Wilson and Barbara McGillivray. "Living Machines: A study of atypical animacy." In Proceedings of the 28th International Conference on Computational Linguistics (COLING), pp. 4534--4545. 2020.
@inproceedings{coll-ardanuy-etal-2020-living,
title = "Living Machines: A study of atypical animacy",
author = "Coll Ardanuy, Mariona and
Nanni, Federico and
Beelen, Kaspar and
Hosseini, Kasra and
Ahnert, Ruth and
Lawrence, Jon and
McDonough, Katherine and
Tolfo, Giorgia and
Wilson, Daniel CS and
McGillivray, Barbara",
booktitle = "Proceedings of the 28th International Conference on Computational Linguistics (COLING)",
year = "2020",
address = "Barcelona (Online)",
publisher = "International Committee on Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.coling-main.400",
pages = "4534--4545",
}
Work for this paper was produced as part of Living with Machines. This project, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London. This work was also supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1.
- The source code is licensed under the MIT License.
- Copyright (c) 2020 The Alan Turing Institute, British Library Board, Queen Mary University of London, University of Exeter, University of East Anglia and University of Cambridge.
- The atypical animacy dataset hosted on the British Library Research Repository is licensed under CC0 1.0 Universal Public Domain.