EpiDoc Parser

Python parser for EpiDoc (epigraphic documents in TEI XML).

For example, idp.data-sheet uses the parser to generate a single CSV sheet from the Papyri.info Integrating Digital Papyrology (IDP) data.

Usage

Installation

Install the package

pip install git+https://github.com/Xennis/epidoc-parser

Load a document

Load a document from a file

import epidoc

with open("my-epidoc.xml") as f:
    doc = epidoc.load(f)

Load a document from a string

import epidoc

my_epidoc = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.stoa.org/epidoc/schema/8.13/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="hgv74005">
   [...]
</TEI>
"""

doc = epidoc.loads(my_epidoc)

Get data from a document

Read the parsed fields as attributes of the document, for example:

>>> doc.title
"Ordre de paiement"
>>> doc.material
"ostrakon"
>>> doc.languages
{"en": "Englisch", "la": "Latein", "el": "Griechisch"}
>>> [t.get("text") for t in doc.terms]
["Anweisung", "Zahlung", "Getreide"]
>>> doc.origin_place.get("text")
"Kysis (Oasis Magna)"
>>> doc.origin_dates[0]
{"notbefore": "0301", "notafter": "0425", "precision": "low", "text": "IV - Anfang V"}
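
The `notbefore` and `notafter` values in the output above look like zero-padded years. As a minimal sketch, assuming that reading holds and that each entry behaves like a plain dict shaped as in the example, a hypothetical helper (`year_range` is not part of the parser) could turn them into integers for sorting or filtering:

```python
def year_range(origin_date):
    """Return (earliest, latest) as ints, with None for a missing bound.

    Assumption: values such as "0301" are zero-padded years, optionally
    prefixed with "-" and optionally followed by "-MM-DD". This is an
    interpretation of the example output, not documented parser behaviour.
    """

    def to_year(value):
        if not value:
            return None
        sign = -1 if value.startswith("-") else 1
        # Drop a leading minus sign, then keep only the year part.
        year_digits = value.lstrip("-").split("-")[0]
        return sign * int(year_digits)

    return to_year(origin_date.get("notbefore")), to_year(origin_date.get("notafter"))


# Dict copied from the example output above.
example = {"notbefore": "0301", "notafter": "0425", "precision": "low", "text": "IV - Anfang V"}
print(year_range(example))  # prints (301, 425)
```

With a loaded document, the same helper would presumably be called as `year_range(doc.origin_dates[0])`.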

Documentation

| Field | EpiDoc source element (XPath) |
| --- | --- |
| commentary | //body/div[@type='commentary' and @subtype='general'] |
| edition_foreign_languages | //body/div[@type='edition']//foreign/@xml:lang |
| edition_language | //body/div[@type='edition']/@xml:lang |
| idno | //teiHeader/fileDesc/publicationStmt/idno |
| authority | //teiHeader/fileDesc/publicationStmt/authority |
| availability | //teiHeader/fileDesc/publicationStmt/availability |
| languages | //teiHeader/profileDesc/langUsage/language |
| material | //teiHeader/fileDesc/sourceDesc/msDesc/physDesc/objectDesc//support/material |
| origin_dates | //teiHeader/fileDesc/sourceDesc/msDesc/history/origin/origDate |
| origin_place | //teiHeader/fileDesc/sourceDesc/msDesc/history/origin/origPlace |
| provenances | //teiHeader/fileDesc/sourceDesc/msDesc/history/provenance |
| reprint_from | //body/ref[@type='reprint-from'] |
| reprint_in | //body/ref[@type='reprint-in'] |
| terms | //teiHeader/profileDesc/textClass//term |
| title | //teiHeader/fileDesc/titleStmt/title |
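
To make the mapping above more concrete, the following sketch resolves two of the listed XPath expressions with Python's standard `xml.etree.ElementTree`. It is only an illustration, not the parser's own implementation: the `SAMPLE` document is a hand-written fragment, and `TEI_NS` is a name chosen for this example. Because EpiDoc elements live in the TEI namespace, each path step needs a namespace prefix.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the TEI namespace declared on the <TEI> root element.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written fragment for illustration only, not real IDP data.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Ordre de paiement</title>
      </titleStmt>
    </fileDesc>
    <profileDesc>
      <textClass>
        <keywords>
          <term>Anweisung</term>
          <term>Zahlung</term>
        </keywords>
      </textClass>
    </profileDesc>
  </teiHeader>
</TEI>"""

root = ET.fromstring(SAMPLE)

# title: //teiHeader/fileDesc/titleStmt/title
title = root.findtext(
    ".//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
    namespaces=TEI_NS,
)

# terms: //teiHeader/profileDesc/textClass//term
terms = [
    term.text
    for term in root.findall(
        ".//tei:teiHeader/tei:profileDesc/tei:textClass//tei:term",
        TEI_NS,
    )
]

print(title)  # prints Ordre de paiement
print(terms)  # prints ['Anweisung', 'Zahlung']
```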

Development

Create a virtual environment, activate it, and install the dependencies:

python3 -m venv venv
. venv/bin/activate
pip install --requirement requirements.txt

Run the tests:

make unittest

LICENSE

Code

See the LICENSE file.

Test data

The test data in this project comes from the idp.data project by Papyri.info. This data is made available under a Creative Commons Attribution 3.0 License, with copyright and attribution to the respective projects.