Regional Identity and Cohesion Identified in Britain through Reddit Comments

cjber/reddit-footprint


Mapping Semantic Regional Variation in Great Britain through Reddit Comments

Python PyTorch Lightning

Cillian Berragan [@cjberragan]¹*, Alex Singleton [@alexsingleton]¹, Alessia Calafiore [@alel_domi]² & Jeremy Morley [@jeremy_morley]³

¹ Geographic Data Science Lab, University of Liverpool, Liverpool, United Kingdom
² Edinburgh College of Art, University of Edinburgh, United Kingdom
³ Ordnance Survey, Southampton, United Kingdom

*Correspondence: c.berragan@liverpool.ac.uk

Abstract

Observed regional variation in geotagged social media text is often attributed to dialects, where features of language are assumed to exhibit region-specific properties. While dialects are seen as a key component in defining the identity of regions, many other geographic properties may be captured within natural language text. In our work, we consider locational mentions that are directly embedded within comments on the social media website Reddit, which provide a range of associated semantic information and allow deeper representations of the relationships between locations to be captured. Using a large corpus of Reddit comments from UK-related local discussion subreddits, we identify place names using a transformer-based named entity recognition model. Semantic embeddings are then generated from these comments and aggregated into local authority districts, representing the semantic footprint of these regions. These footprints broadly exhibit spatial autocorrelation, with clusters that conform to the national borders of Wales and Scotland. London, Wales, and Scotland demonstrate notably different semantic footprints compared with the rest of the UK, which may be explained by the perception of national identity associated with these regions.

Hugging Face NER Model

The NER model used in this work is available on the Hugging Face model hub. Instructions for using the model are included on its model card.

https://huggingface.co/cjber/reddit-ner-place_names
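
As a rough illustration of how the model might be called, the sketch below loads it through the `transformers` token-classification pipeline and collects the detected entity strings. The model identifier is taken from the URL above; the helper names `load_ner_pipeline` and `extract_place_names` and the `min_score` threshold are assumptions made for this example, and the model card remains the authoritative source for usage.

```python
from typing import Iterable

# Model identifier taken from the Hugging Face URL above.
MODEL_NAME = "cjber/reddit-ner-place_names"


def extract_place_names(entities: Iterable[dict], min_score: float = 0.5) -> list[str]:
    """Collect unique entity strings from aggregated pipeline output.

    `entities` is expected to be a list of dicts with at least the keys
    "word" and "score", as returned by a token-classification pipeline
    run with aggregation_strategy="simple". The threshold is hypothetical.
    """
    names: list[str] = []
    for ent in entities:
        word = ent["word"].strip()
        if ent["score"] >= min_score and word and word not in names:
            names.append(word)
    return names


def load_ner_pipeline():
    """Build the NER pipeline (downloads the model on first use)."""
    # Imported lazily so extract_place_names works without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=MODEL_NAME,
        aggregation_strategy="simple",  # merge word pieces into whole entities
    )
```

With `transformers` installed, a call such as `extract_place_names(load_ner_pipeline()("I moved from Leeds to Cardiff last year"))` would return the place names the model detects in that sentence.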

Project layout

src
├── common
│   └── utils.py      # various utility functions and constants
├── preprocessing.py  # process comments with identified place names
├── embeddings.py     # generate sentence embeddings
└── zero_shot.py      # generate identities using zero-shot classification
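
The abstract describes aggregating comment embeddings into local authority districts to form a semantic footprint for each region. The sketch below shows one simple way this could work, assuming mean pooling per district and cosine similarity for comparing footprints. Both choices, the function names, and the district codes in the usage note are hypothetical and are not taken from the repository code.

```python
from collections import defaultdict
from math import sqrt


def district_footprints(records):
    """Average embeddings per district.

    `records` is an iterable of (district_code, embedding) pairs, where each
    embedding is a sequence of floats of equal length. Mean pooling is an
    assumption made for illustration only.
    """
    sums = {}
    counts = defaultdict(int)
    for district, vec in records:
        if district not in sums:
            sums[district] = [0.0] * len(vec)
        sums[district] = [s + v for s, v in zip(sums[district], vec)]
        counts[district] += 1
    return {d: [s / counts[d] for s in total] for d, total in sums.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

For example, `district_footprints([("district_a", [1.0, 0.0]), ("district_a", [0.0, 1.0])])` yields a single footprint for `district_a`, and `cosine_similarity` can then be applied to any pair of footprints to compare regions.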
