o2r meta

This is a collection of tools for extract-map-validate workflows.

schema & documentation of the o2r metadata
extract - collect meta information from files or session
broker - translate metadata from o2r to third party schemas
validate - check whether a metadata set is valid against the schema
harvest - collect metadata from external sources via OAI-PMH

For their role within o2r, please refer to o2r-architecture.

License

Installation

o2r meta is designed for Python 3.6 and supports Python 3.4+.

Installation steps

(1) Install Python 3.4 or later.

(2) Parts of o2r meta require the gdal module, which is known to cause trouble when installed via pip. It is therefore recommended to prepare the installation like this:

sudo add-apt-repository ppa:ubuntugis/ppa -y
sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable -y
sudo apt-get -qq update
sudo apt-get install -y python3-dev
sudo apt-get install -y libgdal1h
sudo apt-get install -y libgdal-dev
sudo apt-get build-dep -y python-gdal
sudo apt-get install -y python-gdal
export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal

and afterwards install gdal this way:

pip install GDAL==$(gdal-config --version | awk -F'[.]' '{print $1"."$2}')
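
The pipeline above trims the output of gdal-config --version to its major.minor part so that pip selects a matching GDAL binding. As an illustration only, the same trimming can be sketched in Python. The function name gdal_pip_requirement is hypothetical and not part of o2r meta.

```python
def gdal_pip_requirement(version_output):
    """Turn a version string such as '2.1.3' into a pip requirement 'GDAL==2.1'.

    version_output is assumed to be what `gdal-config --version` prints.
    """
    parts = version_output.strip().split(".")
    if len(parts) < 2:
        raise ValueError("unexpected GDAL version string: {!r}".format(version_output))
    # Keep only major and minor, mirroring the awk expression in the shell command.
    return "GDAL=={}.{}".format(parts[0], parts[1])


if __name__ == "__main__":
    print(gdal_pip_requirement("2.1.3\n"))
```

The resulting string could then be passed to pip install, for example from a setup script.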

Alternatively, you can use a precompiled Python wheel of the gdal module that fits your platform (note: these wheels are provided unofficially).

(3) Install the required modules:

pip install -r requirements.txt

Using Docker

Alternatively, o2r meta can be installed with the provided Dockerfile. Build the image like this:

docker build -t meta .

Then start a tool, for example the extractor, like this:

docker run meta o2rmeta.py -debug extract -i extract/tests -o extract/tests -xo

Documentation

Current documentation as part of the ERC-SPEC (GitHub)
Current structure dummy
MD of the erc configuration file
~~schema draft~~

Usage

When calling o2r meta, you can choose from the following commands, each representing one tool of the o2r meta suite: extract, validate, broker and harvest.

python o2rmeta.py [-debug] extract|validate|broker|harvest <ARGS>

Options:

-debug : option to enable verbose debug info where applicable

Each tool then has a number of required arguments:

(1) Extractor tool:

python o2rmeta.py extract -i <INPUT_DIR> -s|-o <OUTPUT_DIR> [-xo] [-m] [-xml] [-ercid <ERC_ID>] [-b <BASE_DIR>]

Example call:

python o2rmeta.py -debug extract -i extract/tests -o extract/tests -xo

Explanation of the switches:

-i <INPUT_DIR> : required starting path for recursive search for parsable files
-s : option to print results to the console. This switch is mutually exclusive with -o; one of the two must be given.
-o <OUTPUT_DIR> : output path where data should be saved. If the directory does not exist, it will be created at runtime. This switch is mutually exclusive with -s; one of the two must be given.
-xo : option to disable HTTP requests. The extractor will stay offline, which disables ORCID retrieval, ERC spec download, DOI retrieval, ...
-m : option to additionally enable individual output of all processed files.
-xml : option to change output format from json (default) to xml.
-ercid <ERC_ID>: option to provide an ERC identifier.
-b <BASE_DIR>: option to provide starting point directory for relative paths output
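
The switches above can be mirrored in a small argparse sketch. This is not the actual command line handling of o2rmeta.py; the function name build_extract_parser and all dest names are assumptions chosen for illustration. It mainly shows how the rule "either -s or -o" can be expressed as a required mutually exclusive group.

```python
import argparse


def build_extract_parser():
    # Hypothetical stand-in for the extractor's argument handling.
    parser = argparse.ArgumentParser(prog="o2rmeta.py extract")
    parser.add_argument("-i", dest="input_dir", required=True,
                        help="starting path for the recursive file search")
    # Exactly one of -s / -o has to be present.
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("-s", dest="to_stdout", action="store_true",
                        help="print results to the console")
    target.add_argument("-o", dest="output_dir",
                        help="directory where results are saved")
    parser.add_argument("-xo", dest="offline", action="store_true",
                        help="disable HTTP requests")
    parser.add_argument("-m", dest="individual_output", action="store_true",
                        help="also write output for each processed file")
    parser.add_argument("-xml", dest="xml_output", action="store_true",
                        help="write XML instead of JSON")
    parser.add_argument("-ercid", dest="erc_id", help="ERC identifier")
    parser.add_argument("-b", dest="base_dir",
                        help="base directory for relative paths in the output")
    return parser


if __name__ == "__main__":
    # Same switches as in the example call above.
    example = ["-i", "extract/tests", "-o", "extract/tests", "-xo"]
    print(build_extract_parser().parse_args(example))
```

Passing both -s and -o, or neither, makes argparse print a usage error and exit.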

Supported files and formats for the metadata extraction process:

Feel free to open an issue for suggestions!

Current version:

file type	description	extracted part	status
(r session)	live extraction	memory objects	under evaluation
.cdl/.nc	NetCDF	geometry	under evaluation
.csv/.tsv	separated values	column headers	planned
.geojson/.json	GeoJSON	geometry	WIP
.gpkg	OGC GeoPackage	geometry	planned
.jp2	JPEG2000	geometry	planned
.py	python script	all	planned
.r	R Script	all	implemented
.rmd	R-Markdown	all	implemented
.shp	Esri shapefile	geometry	implemented
.tex	LaTeX	header	planned
.tif(f)	geo TIFF	geometry	planned
.yml	YAML	metadata	planned
bagit.txt	BagIt	metadata	implemented
...	...	...	...
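
As a rough illustration of the recursive search that starts at <INPUT_DIR>, the following sketch collects the files whose type is listed as implemented in the table. It is not the extractor's actual code: the function name find_parsable_files, the two constants and the case-insensitive matching are assumptions.

```python
import os

# File types marked "implemented" in the table above.
IMPLEMENTED_EXTENSIONS = {".r", ".rmd", ".shp"}
IMPLEMENTED_FILENAMES = {"bagit.txt"}


def find_parsable_files(input_dir):
    """Return the sorted paths below input_dir that match an implemented type."""
    matches = []
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            lowered = name.lower()  # assumption: match extensions case-insensitively
            extension = os.path.splitext(lowered)[1]
            if extension in IMPLEMENTED_EXTENSIONS or lowered in IMPLEMENTED_FILENAMES:
                matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == "__main__":
    for path in find_parsable_files("extract/tests"):
        print(path)
```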

(2) Brokering/Mapping tool

The broker has two modes. In mapping mode, it creates metadata for a target schema by following the translation scheme contained in a given mapping file. In checking mode, it reports which metadata required by a target service or platform, e.g. Zenodo publication metadata, is missing from the input data, based on a given checklist.

The broker can be used to translate between different metadata standards, for example from extracted raw metadata to schema-compliant metadata. Other target outputs might be DataCite XML or Zenodo JSON. Translation instructions and checklists are stored in JSON-formatted map files.
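
The idea behind checking mode can be sketched in a few lines. Note that the format of the real checklist files is not documented here, so this example assumes a made-up checklist that is simply a list of required field names; the function name missing_fields and the sample data are likewise hypothetical.

```python
import json


def missing_fields(metadata, required_fields):
    """Return the required field names that are absent or empty in metadata."""
    missing = []
    for field in required_fields:
        value = metadata.get(field)
        # Treat None, empty strings and empty lists as "not provided".
        if value is None or value == "" or value == []:
            missing.append(field)
    return missing


if __name__ == "__main__":
    # Made-up sample data, not taken from zenodo-check.json.
    checklist = json.loads('["title", "creators", "description"]')
    metadata = {"title": "Example compendium", "creators": []}
    print(missing_fields(metadata, checklist))
```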

python o2rmeta.py broker -i <INPUT_FILE> -c <CHECKLIST_FILE>|-m <MAPPING_FILE> -s|-o <OUTPUT_DIR>

Example calls:

python o2rmeta.py -debug broker -c broker/checks/zenodo-check.json -i schema/json/example_zenodo.json -o broker/tests/all

python o2rmeta.py -debug broker -m broker/mappings/zenodo-map.json -i broker/tests/metadata_raw.json -o broker/tests/all

Explanation of the switches:

-c <CHECKLIST_FILE> : required path to a json checklist file that holds checking instructions for the metadata. This switch is mutually exclusive with -m. At least one of them must be given.
-m <MAPPING_FILE> : required path to a json mapping file that holds translation instructions for the metadata mappings. This switch is mutually exclusive with -c. At least one of them must be given.
-i <INPUT_FILE> : path to input json file.
-s: option to print out results to console. This switch is mutually exclusive with -o. At least one of them must be given.
-o <OUTPUT_DIR> : required output path, where data should be saved. If the directory does not exist, it will be created at runtime. This switch is mutually exclusive with -s. At least one of them must be given.

Supported checks/maps

service	checklist file	mapping file	status	comment
zenodo	zenodo-check.json	zenodo-map.json	WIP	zenodo will register MD @ datacite.org
eudat b2share	eudat-b2share-check.json	eudat-b2share-map.json	WIP	b2share supports custom MD schemas
...	...	...	...	...

Additionally the following features will be made available in the future:

Documentation of the formal map-file "minimal language" (create your own map-files).
Governing JSON-Schema for the map files (validate map-files against the map-file-schema).

(3) Validator tool:

python o2rmeta.py validate -s <SCHEMA> -c <CANDIDATE>

Example call:

python o2rmeta.py -debug validate -s schema/json/o2r-meta-schema.json -c schema/json/example1-valid.json

Explanation of the switches:

-s : required path or URL to the schema file, can be json or xml.
-c : required path to candidate that shall be validated.
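
To give an impression of what validating a candidate against a JSON schema involves, here is a deliberately tiny sketch using only the standard library. It covers just two JSON Schema keywords at the top level (required, and type inside properties) and is in no way a replacement for a full validator or for the validate tool itself. The function name validate_top_level is an assumption.

```python
# JSON Schema primitive type names mapped to Python types.
_JSON_TYPES = {
    "string": str,
    "object": dict,
    "array": list,
    "boolean": bool,
    "number": (int, float),
    "integer": int,
}


def validate_top_level(schema, candidate):
    """Return a list of problems; an empty list means this limited check passed."""
    problems = []
    for key in schema.get("required", []):
        if key not in candidate:
            problems.append("missing required property: " + key)
    for key, rules in schema.get("properties", {}).items():
        type_name = rules.get("type") if isinstance(rules, dict) else None
        # Only single, known type names are handled in this sketch.
        expected = _JSON_TYPES.get(type_name) if isinstance(type_name, str) else None
        if key not in candidate or expected is None:
            continue
        value = candidate[key]
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if bool_as_number or not isinstance(value, expected):
            problems.append("wrong type for property: " + key)
    return problems


if __name__ == "__main__":
    # Made-up schema and candidate, not the o2r metadata schema.
    schema = {"required": ["title"], "properties": {"title": {"type": "string"}}}
    print(validate_top_level(schema, {"title": 5}))
```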

(4) Harvester tool:

Collects OAI-PMH metadata from catalogues, data registries and repositories and parses it to assist the completion of a metadata set. Note that this tool is currently only a demo.

python o2rmeta.py harvest -e <ELEMENT> -q <QUERY>

Example call:

python o2rmeta.py -debug harvest -e"doi" -q"10.14457/CU.THE.1989.1"

Explanation of the switches:

-e : MD element type for search, e.g. doi or creator
-q : MD content to start the search
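
How the harvester translates -e and -q into requests is not described here. For orientation, the sketch below only shows the general form of an OAI-PMH request, which is a base URL followed by a verb and further arguments as query parameters. The function name build_oai_request and the endpoint https://example.org/oai are placeholders.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a base URL, a verb and its arguments."""
    # The verb comes first; the remaining arguments are sorted for stable output.
    query = [("verb", verb)] + sorted(arguments.items())
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    # Placeholder endpoint, not a real repository.
    print(build_oai_request("https://example.org/oai", "ListRecords",
                            metadataPrefix="oai_dc"))
```

The returned URL could be fetched with any HTTP client; the response is an XML document that still has to be parsed.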
