Machine learning (ML) methods present new ways of approaching archaeological research questions, and interest in applying these methods continues to grow.
This repository collects resources relating to the application of ML methods to archaeological data, aiming to:
- provide an overview of the ways ML is being applied in archaeology
- spark new ideas whilst reducing duplication of work
- encourage the sharing of code, data, and other resources
- make resources more FAIR (Findable, Accessible, Interoperable, and Reusable)
By doing this, we hope to help practitioners learn about ML in archaeology, apply it critically, and contribute to conversations about it.
Check out our roadmap for an overview of what we're working on, or go straight to the contributor guidelines.
Please cite the project if you've found it useful. Releases are made at regular intervals and archived on Zenodo.
- ML case studies (split by application area)
- datasets
- glossary of technique names
task | authors | year | data type | technique | paper | code | data |
---|---|---|---|---|---|---|---|
segmentation for carved reliefs | Ji et al. | 2023 | RGB images [digital photos], depth map, soft-edge images | CNN [DenseNet121] | paper | nan | nan |
classification for ceramic elemental analysis | Ruschioni et al. | 2023 | x-ray fluorescence | LR, LDA, MLP, SVM, DT, RF, NB, KNN | paper | code | data |
classification for ceramic sherds | Helden et al. | 2022 | RGB images [smartphone photos], synthetic data | CNN [VGG19, MobileNetV2, ResNet50v2, InceptionV3] | paper | models | data |
classification for multiple artefact types | Resler et al. | 2021 | RGB images [digital camera photos] | CNN [EfficientNetB3], KNN | paper | nan | data |
classification for ceramic petrography | Lyons | 2021 | RGB images [microscope photos] | CNN [VGG19, ResNet50] | paper | nan | nan |
object detection for rock carvings | Tsigkas et al. | 2020 | RGB images [digital camera photos] | CNN [YOLOv2, TinyYOLOv2] | paper | nan | nan |
classification for lithics | Grove and Blinkhorn | 2020 | lithic types, period | NN | paper | code | data |
classification for ceramic elemental analysis | Charalambous et al. | 2016 | x-ray fluorescence | KNN, DT, LVQ | paper | nan | nan |
task | authors | year | data type | technique | paper | code | data |
---|---|---|---|---|---|---|---|
classification for multi-cell phytoliths | Berganzo-Besga et al. | 2022 | RGB images [microscope photos] | CNN [VGG19, ResNet50v2] | paper | code | nan |
classification for contexts | Vos et al. | 2021 | geochemistry, phytolith type and quantity | DT | paper | nan | data |
classification for starch granules | Arráiz et al. | 2016 | morphometric and optical measurements | RF | paper | nan | nan |
task | authors | year | data type | technique | paper | code | data |
---|---|---|---|---|---|---|---|
masked language modelling for archaeological text | Brandsen | 2023 | English language | BERT | paper | model | nan |
named entity recognition for archaeological text | Brandsen | 2023 | English language | BERT | paper | model | nan |
masked language modelling for archaeological text | Brandsen | 2023 | Dutch language | BERT | paper | model | nan |
named entity recognition for archaeological text | Brandsen | 2023 | Dutch language | BERT | paper | model | data |
masked language modelling for archaeological text | Brandsen | 2023 | German language | BERT | paper | model | nan |
named entity recognition for archaeological text | Brandsen | 2023 | German language | BERT | paper | model | nan |
restoration/attribution for ancient Greek inscriptions | Assael et al. | 2022 | transcribed inscriptions, place, time | transformer | paper | code | data |
transliteration and segmentation of cuneiform characters | Gordin et al. | 2020 | encoded Unicode cuneiform | bidirectional LSTM | paper | code | data |
task | authors | year | data type | technique | paper | code | data |
---|---|---|---|---|---|---|---|
transfer learning between geographic areas | Sech et al. | 2023 | lidar visualisations [e2MSTP] | CNN [U-Net, DeepLabv3+, ResNet, EfficientNet, SegFormer] | paper | nan | nan |
segmentation for mounds on maps | Berganzo-Besga et al. | 2023 | RGB images [historical maps], synthetic data | CNN [Mask R-CNN] | paper | nan | on request |
segmentation for field systems | Küçükdemirci et al. | 2022 | lidar DTMs | CNN [U-Net] | paper | nan | nan |
classification for hollow roads | Verschoof-van der Vaart and Landauer | 2021 | lidar visualisations [local relief model, openness], lidar DTM | CNN [ResNet34] | paper | nan | nan |
classification for land use | Mboga et al. | 2020 | panchromatic images [historical aerial photographs] | CNN [FCN-ATR-SKIP, U-Net] | paper | nan | nan |
classification for war landforms | de Matos-Machado et al. | 2019 | morphometric measurements | SOM, HAC | paper | nan | nan |
object detection for mining pits | Gallwey et al. | 2019 | lidar DSM | CNN [U-Net] | paper | model | nan |
object detection for multiple classes | Verschoof-van der Vaart and Lambers | 2019 | lidar visualisations [simple local relief model] | CNN [Faster R-CNN] | paper | nan | nan |
task | authors | year | data type | technique | paper | code | data |
---|---|---|---|---|---|---|---|
regression for Neolithic sites | Li et al. | 2023.3 | topography, hydrology | RF | paper | nan | nan |
regression for Neolithic sites | Li et al. | 2023 | topography, hydrology | RF | paper | nan | nan |
classification for site dating | Reese | 2021 | ceramic types, dendrochronology dates | NN | paper | code | data |
regression for Roman sites | Castiello and Tonini | 2021 | soil, topography | RF | paper | nan | nan |
regression for Formative period sites | Yaworsky et al. | 2020 | environmental, topography | MaxEnt, RF | paper | code | data |
regression for strontium isoscapes | Bataille et al. | 2020 | strontium, coordinates, geology, climate, environmental, anthropogenic | RF | paper | code | data |
regression for strontium isoscapes | Funck et al. | 2020 | strontium, coordinates, geology, climate, environmental | RF | paper | nan | data |
classification for habitat suitability | Jones et al. | 2019 | climate, topography | RF | paper | nan | nan |
regression for strontium isoscapes | Bataille et al. | 2018 | strontium, geology, climate, environmental, topographic | RF | paper | code | nan |
classification for soil geochemistry | Oonk and Spijker | 2015 | soil geochemistry | KNN, SVM, NN | paper | nan | nan |
task | authors | year | data type | technique | paper | code | data |
---|---|---|---|---|---|---|---|
proposed null dataset for lithics | Eren et al. | 2023 | to be confirmed; qualitative and quantitative information from naturally fractured rocks | nan | paper | nan | nan |
dataset for named entity recognition | Brandsen et al. | 2020 | Dutch language | named entity recognition | paper | nan | data |
dataset for Maya site detection | Kokalj et al. | 2023 | lidar visualisations [multiple], lidar canopy height, SAR, optical satellite | object recognition, object detection, semantic segmentation | paper | nan | data |
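
The case study tables share one pipe-delimited layout, with `nan` marking a missing link. As a rough illustration of how the listings could be queried, here is a minimal Python sketch that parses a few rows copied from the tables and keeps the studies sharing both code and data. The names `parse_rows` and `is_shared` and the `sample` string are assumptions made for this example; they are not part of the repository.

```python
COLUMNS = ["task", "authors", "year", "data type", "technique", "paper", "code", "data"]


def parse_rows(table_text):
    """Turn pipe-delimited table rows into a list of dicts keyed by column name."""
    rows = []
    for line in table_text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the header row and the ---|--- separator row.
        if cells[0] == "task" or set(cells[0]) <= {"-"}:
            continue
        # Skip anything that does not have the expected eight columns.
        if len(cells) != len(COLUMNS):
            continue
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def is_shared(value):
    """Treat 'nan' or an empty cell as 'not shared' (an assumption of this sketch)."""
    return value.lower() not in {"nan", ""}


# A few rows copied from the tables in this README.
sample = """
task | authors | year | data type | technique | paper | code | data |
---|---|---|---|---|---|---|---|
classification for ceramic elemental analysis | Ruschioni et al. | 2023 | x-ray fluorescence | LR, LDA, MLP, SVM, DT, RF, NB, KNN | paper | code | data |
classification for multiple artefact types | Resler et al. | 2021 | RGB images [digital camera photos] | CNN [EfficientNetB3], KNN | paper | nan | data |
classification for ceramic petrography | Lyons | 2021 | RGB images [microscope photos] | CNN [VGG19, ResNet50] | paper | nan | nan |
classification for lithics | Grove and Blinkhorn | 2020 | lithic types, period | NN | paper | code | data |
"""

rows = parse_rows(sample)
open_rows = [row for row in rows if is_shared(row["code"]) and is_shared(row["data"])]

for row in open_rows:
    print(row["authors"], row["year"], row["task"], sep=" | ")
```

Values such as `on request`, `model`, or `models` would count as shared under this simple check, so a stricter filter may be needed depending on the question being asked.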
acronym | technique |
---|---|
BERT | bidirectional encoder representations from transformers |
CNN | convolutional neural network |
DT | decision tree |
HAC | hierarchical agglomerative clustering |
KNN | k-nearest neighbours |
LDA | linear discriminant analysis |
LR | logistic regression |
LSTM | long short-term memory network |
LVQ | learning vector quantisation |
MaxEnt | maximum entropy |
MLP | multi-layer perceptron |
NB | naive Bayes |
NN | neural network |
RF | random forest |
SOM | self-organising map |
SVM | support vector machine |
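
The glossary can double as a lookup table when reading the technique column of the case study tables. The sketch below is one possible way to do that in Python, assuming a hand-copied subset of the glossary; the `GLOSSARY` dict and the `expand_techniques` function are hypothetical names introduced for illustration only.

```python
import re

# A subset of the glossary above, copied by hand for this example.
GLOSSARY = {
    "CNN": "convolutional neural network",
    "DT": "decision tree",
    "KNN": "k-nearest neighbours",
    "LVQ": "learning vector quantisation",
    "RF": "random forest",
    "SVM": "support vector machine",
}


def expand_techniques(cell):
    """Expand the acronyms in one technique cell, leaving unknown names unchanged."""
    # Drop bracketed architecture lists such as "[VGG19, ResNet50]"
    # so that their commas do not split the cell.
    without_models = re.sub(r"\[[^\]]*\]", "", cell)
    names = [name.strip() for name in without_models.split(",") if name.strip()]
    return [GLOSSARY.get(name, name) for name in names]


print(expand_techniques("KNN, DT, LVQ"))
print(expand_techniques("CNN [EfficientNetB3], KNN"))
print(expand_techniques("MaxEnt, RF"))
```

Names that are missing from the dict, such as `MaxEnt` in this subset, are returned as written rather than raising an error.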