RepoEval

repo_eval.py: Script to check for availability of PIDs (are they online?) and to check all checksums + filesize.

How to use: $python -o OUTPUT_FILE -u USERNAME -p PASSWORD (username and password are optional)

create_html.py: Script to create Statistics HTML for TALAR and plot generation of mimetype size and count.

How to use: $python -i OAI.xml -o OUTPUT_DIRECTORY/ (-is is optional. If not defined, OAI.xml will get downloaded first)

acl_check.py: Script to compare current ACL settings to previous ones. Creates report.csv (unavailable ACLs + changes to previous ACL) + acl.json (JSON with all resources + associated ACL).

check_checksums.sh and check_checksums_local2.py: Python2 script to compare sha1 checksums directly on TALAR.

How to use: $ bash check_checksums.sh TARGET_DIR/ OUTPUT_DIR/

Creates affected_cmdis.cv (listing CMDIs which sha1 was not found) and not_in_cmdi.csv (files not found in any CMDI)

update_cmdi_with_IDs/ contains a strongly modified, simplified version of the BioDataNER tool.

cmdi_extractor.py extracts all authoritative IDs from any Person, Author or Organization component and adds it to a cache.

How to use: $ python cmdi_extractor.py PATH_TO_CACHE PATH_TO_CMDIs --new_cache (--new-cache flag is optional)

cmdi_updater.py takes the cache and updates every CMDI with missing authoritative IDs

How to use: $ python cmdi_updater.py PATH_TO_CACHE PATH_TO_CMDIs

The cache is not a CSV anymore, but a JSON file. IDs outside of VIAF are now supported as well.

Validate OAI.xml: There is no command-line tool/Python library that can handle the validation of XML files as complex as the OAI.xml (to my knowledge). Using Xerces (http://xerces.apache.org/xerces-c/), however works and is also used by http://oai.clarin-pl.eu/. xmlValid.sh contains a sample script in how this can be done:

$ bash xmlValid.sh -v -n -np -s -f OAI_File (requires Java to be installed)

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
checksums_local		checksums_local
update_cmdi_with_IDs		update_cmdi_with_IDs
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
acl_restricted.txt		acl_restricted.txt
check_acl.py		check_acl.py
create_html.py		create_html.py
repo_eval.py		repo_eval.py
requirements.txt		requirements.txt
xmlValid.sh		xmlValid.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RepoEval

About

Releases

Packages

Contributors 2

Languages

License

SfS-ASCL/RepoEval

Folders and files

Latest commit

History

Repository files navigation

RepoEval

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages