This repository is part of the ESTC data unification project by the Helsinki Computational History Group (COMHIS).
The workflow provides code and outcomes for processing the physical dimension (gatherings / book formats) information found in the English Short Title Catalogue (ESTC).
- Automated summaries: an overview of the harmonization, with links to supporting data
- Output data tables
After setup, the full workflow runs in a few minutes on an ordinary laptop.
Update the required data folders from Git:
- estc-data-unified
- estc-data-verified
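The update step above can be scripted. The following is a minimal sketch, not part of this repository: it assumes both data folders are Git checkouts located next to this working repo (the `DATA_ROOT` value is a guess, so adjust it to your own layout), and the helper names `pull_command` and `update_data_folders` are hypothetical.

```python
import subprocess
from pathlib import Path

# Folder names are taken from the list above. Their location relative to
# this repository is an ASSUMPTION; change DATA_ROOT to match your setup.
DATA_ROOT = Path("..")
DATA_FOLDERS = ["estc-data-unified", "estc-data-verified"]


def pull_command(folder):
    """Build the git command that updates one checkout in place."""
    return ["git", "-C", str(folder), "pull"]


def update_data_folders(root=DATA_ROOT, dry_run=False):
    """Run `git pull` in each data folder and return the commands used.

    With dry_run=True the commands are only built, not executed.
    """
    root = Path(root)
    commands = []
    for name in DATA_FOLDERS:
        folder = root / name
        if not folder.is_dir():
            raise FileNotFoundError(f"Missing data folder: {folder}")
        cmd = pull_command(folder)
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if git reports a failure
            subprocess.run(cmd, check=True)
    return commands
```

Calling `update_data_folders(dry_run=True)` first shows which commands would run without touching the checkouts.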
The main input is the parsed and prefiltered MARC data.
Use the ESTC field picker to extract field 300a (physical dimension) separately. The branch antagomir-physicaldimension contains the Python scripts that were used to generate the parsed data file out/fields_picked_300a.csv. Run "python3 fieldpicker_main.py" and check that the file paths are correct.
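Before moving on, it may help to confirm that the field picker actually produced a usable file. The sketch below is illustrative only: it assumes out/fields_picked_300a.csv is comma-delimited with a header row (the real delimiter and column names are not documented here), and `summarize_picked_file` is a hypothetical helper, not part of the field picker.

```python
import csv
from pathlib import Path

# Output path as named in this README, relative to the field picker checkout.
PICKED_FILE = Path("out/fields_picked_300a.csv")


def summarize_picked_file(path=PICKED_FILE, preview_rows=3):
    """Return (header, preview, n_rows) for the picked-field CSV.

    ASSUMPTION: the file is comma-delimited UTF-8 with one header row.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Picked-field file not found: {path}")
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            raise ValueError(f"Picked-field file is empty: {path}")
        preview = []
        n_rows = 0
        for row in reader:
            n_rows += 1
            # Keep only the first few data rows for a quick visual check
            if len(preview) < preview_rows:
                preview.append(row)
    return header, preview, n_rows
```

A missing file or a row count of zero most likely points to a wrong path in the field picker configuration rather than to a problem in the later R step.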
Run the script main.R to convert the raw MARC data into the final table of harmonized document dimensions. The script reads in the data, harmonizes it, and finally writes the summary files.
Remember to push the changes in:
- this working repo
- data repo estc-data-unified/estc-physicaldimension/
R.Version()
## $platform
## [1] "x86_64-pc-linux-gnu"
##
## $arch
## [1] "x86_64"
##
## $os
## [1] "linux-gnu"
##
## $system
## [1] "x86_64, linux-gnu"
##
## $status
## [1] "Patched"
##
## $major
## [1] "3"
##
## $minor
## [1] "6.3"
##
## $year
## [1] "2020"
##
## $month
## [1] "03"
##
## $day
## [1] "11"
##
## $`svn rev`
## [1] "78037"
##
## $language
## [1] "R"
##
## $version.string
## [1] "R version 3.6.3 Patched (2020-03-11 r78037)"
##
## $nickname
## [1] "Holding the Windsock"