
COMHIS/estc-physicaldimension


ESTC Physical Dimension

This repository is part of the ESTC data unification project by the Helsinki Computational History Group (COMHIS).

Output

This workflow provides the code and the resulting outputs for processing the physical dimension (gatherings / book format) information found in the English Short Title Catalogue (ESTC).

Input

After setup, the full run takes only a few minutes on a standard laptop.

ESTC

Input data preparation steps

Update the required data folders from Git:

  • estc-data-unified
  • estc-data-verified

The main input is the parsed and prefiltered MARC data.

Use the ESTC field picker to extract field 300a (physical dimension) separately. The branch antagomir-physicaldimension contains the Python scripts that were used to generate the parsed data file out/fields_picked_300a.csv. Run "python3 fieldpicker_main.py" and check that the file paths are correct.
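To illustrate what picking a single field means, here is a minimal Python sketch. It is not the logic of fieldpicker_main.py: it assumes a hypothetical long-format input with one row per record id, field tag, subfield code and value, and writes a two-column CSV such as out/fields_picked_300a.csv. The column names and helper functions are illustrative assumptions.

```python
import csv
from pathlib import Path


def pick_subfield(rows, field="300", subfield="a"):
    """Yield (id, value) pairs for one MARC field/subfield combination.

    Assumption: each row is a dict with the hypothetical keys
    'id', 'field', 'subfield' and 'value'.
    """
    for row in rows:
        if row["field"] == field and row["subfield"] == subfield:
            # Strip surrounding whitespace left over from parsing.
            yield row["id"], row["value"].strip()


def pick_to_csv(in_path: Path, out_path: Path, field="300", subfield="a"):
    """Write the picked values to a two-column CSV (e.g. out/fields_picked_300a.csv)."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(in_path, newline="", encoding="utf-8") as fin, \
         open(out_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        writer.writerow(["id", f"{field}{subfield}"])
        writer.writerows(pick_subfield(csv.DictReader(fin), field, subfield))
```

Creating the output directory up front avoids one common "file path is not ok" failure; the input path still has to match your local checkout.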

Data processing

Run the script main.R to convert the raw MARC data into the final table of harmonized document dimensions. The script reads in the data, harmonizes it, and finally writes the summary files.
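The harmonization step maps the many spellings of a book format onto a small set of gatherings labels. The Python sketch below shows the general idea only; it does not reproduce main.R. The lookup tables, the canonical labels such as "2fo" and "8vo", and the helper names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical canonical labels keyed by the number of leaves per sheet.
CANONICAL = {
    "1": "1to", "2": "2fo", "4": "4to", "8": "8vo",
    "12": "12mo", "16": "16mo", "18": "18mo", "24": "24mo", "32": "32mo",
}
# Hypothetical spelled-out forms mapped to their leaf counts.
WORD_FORMS = {"fol": "2", "folio": "2", "quarto": "4", "octavo": "8", "duodecimo": "12"}


def harmonize_gatherings(raw):
    """Return a canonical gatherings label such as '8vo', or None if unrecognised."""
    if raw is None:
        return None
    # Normalise case, a trailing period and degree-style markers (4⁰, 4°, 4º).
    text = raw.strip().lower().rstrip(".")
    text = text.replace("⁰", "").replace("°", "").replace("º", "")
    text = WORD_FORMS.get(text, text)
    # Accept a leaf count optionally followed by a suffix: 4to, 8vo, 12mo, 2fo.
    match = re.fullmatch(r"(\d+)\s*(?:to|vo|mo|fo)?", text)
    if match is None:
        return None
    return CANONICAL.get(match.group(1))


def summarize(values):
    """Count canonical labels; unrecognised entries are grouped under 'NA'."""
    return Counter(harmonize_gatherings(v) or "NA" for v in values)
```

For example, "8vo", "8⁰" and "Fol." are counted as two 8vo and one 2fo, while a free-text entry such as "broadside" falls into NA for manual review.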

Remember to push the changes to:

  • this working repository
  • the data repository estc-data-unified/estc-physicaldimension/

Software version

R.Version()
## $platform
## [1] "x86_64-pc-linux-gnu"
## 
## $arch
## [1] "x86_64"
## 
## $os
## [1] "linux-gnu"
## 
## $system
## [1] "x86_64, linux-gnu"
## 
## $status
## [1] "Patched"
## 
## $major
## [1] "3"
## 
## $minor
## [1] "6.3"
## 
## $year
## [1] "2020"
## 
## $month
## [1] "03"
## 
## $day
## [1] "11"
## 
## $`svn rev`
## [1] "78037"
## 
## $language
## [1] "R"
## 
## $version.string
## [1] "R version 3.6.3 Patched (2020-03-11 r78037)"
## 
## $nickname
## [1] "Holding the Windsock"

About

Processing the ESTC physical_dimension field
