Myndighetsdata is an attempt to make data about Swedish government agencies (myndigheter) more accessible. The data covers each agency's name and basic information such as contact details and address. The code downloads the data from various sources, converts it to structured JSON files with a consistent format and attempts to merge all these data points into one big list.
There are many government agencies in Sweden. They go by various names, and several hundred agencies have disappeared over the past decades. This data will hopefully be of some help to those who study the public sector or build services on top of government data. It's not a finished product and it's not 100% clean or exact, but feel free to reuse it and contribute to make it even better! 😊
The data is in the data folder:
- agv.json is data from Arbetsgivarverket
- esv.json comes from Ekonomistyrningsverket's myndighetsregister
- handlingar.json comes from handlingar.se
- scb.json combines SCB's myndighetsregister with the information about added and removed agencies
- sfs.json extracts agency names from the government's rättsdatabas
- stkt.json comes from Statskontoret's list
- wd.json comes from Wikidata
And merged.json is an attempt at merging all these files together by matching agencies on organisation number and on name (using fuzzy matching and some wild rules). It is not 100% correct because the underlying data is too inconsistent. It can still be used to complete Wikidata and to improve the quality of government sources, so that future merge attempts are easier.
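To illustrate the idea of matching first on organisation number and then on name, here is a minimal sketch. It is not the code used in this repository: the field names `name` and `org_nr`, the 0.9 threshold and the use of `difflib` from the standard library are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Return a ratio between 0 and 1 describing how alike two names are."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def merge_records(primary, secondary, threshold=0.9):
    """Merge two lists of agency dicts into a new list.

    Hypothetical record shape: {"name": str, "org_nr": str or None, ...}.
    Records are matched on org_nr first, then on the most similar name.
    """
    merged = [dict(rec) for rec in primary]  # copy so the inputs stay untouched
    by_org_nr = {rec["org_nr"]: rec for rec in merged if rec.get("org_nr")}

    for rec in secondary:
        # 1. Exact match on organisation number, when the record has one.
        target = by_org_nr.get(rec["org_nr"]) if rec.get("org_nr") else None

        # 2. Otherwise fall back to the most similar name above the threshold.
        if target is None:
            best = max(
                merged,
                key=lambda m: similarity(m["name"], rec["name"]),
                default=None,
            )
            if best is not None and similarity(best["name"], rec["name"]) >= threshold:
                target = best

        if target is None:
            # No match at all: keep the record as a new agency.
            target = dict(rec)
            merged.append(target)
        else:
            # Fill in fields the existing record lacks; never overwrite.
            for key, value in rec.items():
                if target.get(key) in (None, ""):
                    target[key] = value

        if target.get("org_nr"):
            by_org_nr[target["org_nr"]] = target

    return merged
```

A fixed threshold like this is the simplest possible rule. Real agency names differ in ways that a plain similarity ratio does not capture well (abbreviations, renamed agencies), which is why extra rules and manual cleaning are still needed.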
You can use the code yourself to download the source files, extract the information from them and merge it.
For this, you need Python 3 and the dependencies installed:
pip install -r requirements.txt
Once that is done, you can run the following commands:
# Download the source files (if DOWNLOAD is set to True) and extract the information from them
python run.py
# Note: Arbetsgivarverket's data has to be downloaded manually
# Try to merge the lists into one
python smart_merge.py
# Rule-based cleaning to remove the biggest anomalies in the merged file
python manual_cleaning.py
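Once a merged file exists, a quick way to judge its quality is to count how many records carry a value for each field. The sketch below assumes that data/merged.json is either a JSON list of records or a JSON object whose values are records; that layout is an assumption, so adjust `load_records` if the file is structured differently. The function names are made up for this example.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(path):
    """Load a JSON file and return its records as a list.

    Assumption: the top level is a list of records, or a dict whose
    values are the records.
    """
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        return list(data.values())
    return list(data)


def field_coverage(records):
    """Count how many records have a non-empty value for each field."""
    counts = Counter()
    for record in records:
        if not isinstance(record, dict):
            continue  # skip anything that is not a key/value record
        for key, value in record.items():
            if value not in (None, "", [], {}):
                counts[key] += 1
    return counts


def summarise(path):
    """Return the number of records and the per-field coverage."""
    records = load_records(path)
    return len(records), field_coverage(records)


if __name__ == "__main__":
    merged_path = Path("data") / "merged.json"
    if merged_path.exists():
        total, coverage = summarise(merged_path)
        print(f"{total} records")
        for field, count in coverage.most_common():
            print(f"{field}: {count}")
```

Fields with low coverage are good candidates for checking against the individual source files.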
The code is licensed under AGPLv3, which means you can reuse it as long as you attribute, and you can modify it as long as you publish what you make.
The data comes from a number of sources, but all of it is licensed as CC0, either explicitly or through established practice (allmänna handlingar, i.e. public documents, can usually be considered CC0). So feel free to reuse it as you please!