The code in the `src/` directory can be used to re-create or update unarXive.
- software
  - Tralics (Ubuntu: `# apt install tralics`)
  - latexpand (Ubuntu: `# apt install texlive-extra-utils`)
  - GROBID
- data
  - arXiv source files: see the arXiv.org help page "arXiv Bulk Data Access"
  - OpenAlex (works records only)
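
Before running the pipeline, it can help to confirm that the software prerequisites are in place. The following is a minimal sketch and not part of unarXive: it checks that the `tralics` and `latexpand` executables are on `PATH` and pings a GROBID server. The GROBID URL is an assumption (a locally running server on GROBID's default port 8070, using its `/api/isalive` endpoint); adjust it to your setup.

```python
import shutil
import urllib.error
import urllib.request

# Executables named in the software requirements above.
REQUIRED_TOOLS = ["tralics", "latexpand"]

# Assumption: GROBID runs locally on its default port (8070).
GROBID_ISALIVE_URL = "http://localhost:8070/api/isalive"


def missing_tools(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def grobid_alive(url=GROBID_ISALIVE_URL, timeout=5.0):
    """Return True if the GROBID liveness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or HTTP error status.
        return False


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables found.")
    print("GROBID reachable:", grobid_alive())
```

Running the file directly prints which executables are missing and whether the assumed GROBID endpoint responds.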

1. Prepare arXiv metadata with: `utility_scripts/generate_metadata_db.py`
2. Prepare the OpenAlex DB with: `utility_scripts/generate_openalex_db.py`
3. Parse arXiv sources with: `prepare.py` (or `normalize_arxiv_dump.py` + `parse_latex_tralics.py`)
4. Match reference items with: `match_references_openalex.py`
5. Extend matched data with: `extend_matched.py` (adds arXiv IDs to matched references, as well as discipline information)
6. Verify and analyze the result with: `utility_scripts/calc_stats.py`
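
The order of the steps above can be captured in a small driver. This is a hypothetical sketch, not a script shipped with unarXive: it only encodes the step order and the two alternatives for parsing, and it defaults to a dry run that prints the commands. The scripts' command-line arguments are not documented here, so `extra_args` is a placeholder that you would need to fill in from each script's own usage.

```python
import subprocess
import sys
from pathlib import Path

# Step order as listed above; paths are relative to the src/ directory.
PREPARE_STEPS = [
    "utility_scripts/generate_metadata_db.py",
    "utility_scripts/generate_openalex_db.py",
]
PARSE_ALL_IN_ONE = ["prepare.py"]
PARSE_TWO_STEP = ["normalize_arxiv_dump.py", "parse_latex_tralics.py"]
POST_STEPS = [
    "match_references_openalex.py",
    "extend_matched.py",
    "utility_scripts/calc_stats.py",
]


def pipeline_steps(two_step_parse=False):
    """Return the scripts in execution order.

    two_step_parse=True uses normalize_arxiv_dump.py + parse_latex_tralics.py
    instead of prepare.py.
    """
    parse = PARSE_TWO_STEP if two_step_parse else PARSE_ALL_IN_ONE
    return PREPARE_STEPS + parse + POST_STEPS


def run_pipeline(src_dir, extra_args=None, two_step_parse=False, dry_run=True):
    """Print (dry_run=True) or run each step in order; return the commands.

    extra_args is a hypothetical mapping from script path to its argument
    list. When not a dry run, subprocess.run(check=True) raises
    CalledProcessError at the first step that exits with a non-zero status.
    """
    extra_args = extra_args or {}
    commands = []
    for script in pipeline_steps(two_step_parse):
        cmd = [sys.executable, str(Path(src_dir) / script)]
        cmd += extra_args.get(script, [])
        commands.append(cmd)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_pipeline("src")` prints the six commands without executing anything; pass `dry_run=False` only once the real arguments for each script are known.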