- morphgnt_lexicon.yaml -- stem lexicon with all the verbs in MorphGNT
- morphgnt_generate.py -- script that goes through (using py-sblgnt) all the verb forms in the given NT books and validates that the code + dataset generates the correct form
- generate_morphgnt_lexicon.py -- similar to morphgnt_generate.py, but instead of just validating and showing unexplainable forms, it builds the starting point for a lexicon file to explain the forms
- make_morphgnt_test.py -- script for generating a YAML test file (like those in test_data/) based on verbs in John (or change line 10)
- make_morphgnt_nominal_test.py -- like make_morphgnt_test.py but for nouns and adjectives
- morphgnt_utils.py -- common code used by the above scripts
To test John's gospel:
$ ./morphgnt_generate.py 4
To test the Johannine Epistles:
$ ./morphgnt_generate.py 23 24 25
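The two examples above suggest that the arguments are book numbers in canonical New Testament order (4 for John, 23 to 25 for 1 to 3 John). The sketch below works from that assumption; `NT_BOOKS`, `book_numbers` and `generate_command` are hypothetical helper names, not part of the scripts described here.

```python
# Canonical order of the 27 NT books. The position in this list plus one is
# ASSUMED to be the book number morphgnt_generate.py expects; this matches
# the examples above (4 = John, 23 24 25 = 1-3 John).
NT_BOOKS = [
    "Matthew", "Mark", "Luke", "John", "Acts", "Romans",
    "1 Corinthians", "2 Corinthians", "Galatians", "Ephesians",
    "Philippians", "Colossians", "1 Thessalonians", "2 Thessalonians",
    "1 Timothy", "2 Timothy", "Titus", "Philemon", "Hebrews", "James",
    "1 Peter", "2 Peter", "1 John", "2 John", "3 John", "Jude",
    "Revelation",
]


def book_numbers(*names):
    """Return the 1-based book numbers for the given book names."""
    return [NT_BOOKS.index(name) + 1 for name in names]


def generate_command(*names):
    """Build the morphgnt_generate.py command line as a list of strings."""
    return ["./morphgnt_generate.py"] + [str(n) for n in book_numbers(*names)]


if __name__ == "__main__":
    # Print the command for the Johannine Epistles.
    print(" ".join(generate_command("1 John", "2 John", "3 John")))
```

An unknown book name raises `ValueError` from `list.index`, which is probably the right failure mode for a typo on the command line.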
morphgnt_generate.py also takes --lexicon and --stemming arguments to change the stem lexicon and stemming rule files respectively.
- run generate_morphgnt_lexicon.py on the book(s) you want to add coverage for:
$ ./generate_morphgnt_lexicon.py BOOK1 BOOK2 ... > tmp1
$ cat morphgnt_lexicon.yaml tmp1 > tmp2
$ ./sort_lexicon.py tmp2 > morphgnt_lexicon.yaml
- review all lines in morphgnt_lexicon.yaml that have # @ (you can review about 10 a minute once you get good at it)
@m means multiple possible stems, for example:
ἀγνοέω:
    stems:
        1-: {'ἀγνοου{athematic}', 'ἀγνοε', 'ἀγνοο'}  # @m
which should be manually corrected to:
ἀγνοέω:
    stems:
        1-: ἀγνοε
@1 means a single possible stem. Verify it and, if the lemma already exists, make sure the new stem is moved in with the others.
For example:
ἀποκόπτω:
    stems:
        3+: ἀπεκοψ
    stems:
        2-: ἀποκοψ  # @1
needs to be changed to:
ἀποκόπτω:
    stems:
        2-: ἀποκοψ
        3+: ἀπεκοψ
@0 means no stem could be guessed. This normally means a missing stemming.yaml rule.
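To see how much review work is left, it can help to list the flagged lines by marker. The following is a minimal sketch, assuming each marker appears as a trailing comment of the form `# @m`, `# @1` or `# @0`, as in the examples above; `find_markers` and `summarize` are hypothetical names, not functions from morphgnt_utils.py.

```python
import re
from collections import Counter

# Matches a trailing review marker such as "# @m", "# @1" or "# @0".
MARKER = re.compile(r"#\s*@(\w)\s*$")


def find_markers(text):
    """Return (line_number, marker, stripped_line) for every flagged line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            hits.append((number, match.group(1), line.strip()))
    return hits


def summarize(text):
    """Count flagged lines per marker kind."""
    return Counter(marker for _, marker, _ in find_markers(text))


if __name__ == "__main__":
    # Small inline sample; in practice you would read morphgnt_lexicon.yaml.
    sample = (
        "ἀγνοέω:\n"
        "    stems:\n"
        "        1-: {'ἀγνοου{athematic}', 'ἀγνοε', 'ἀγνοο'}  # @m\n"
        "ἀποκόπτω:\n"
        "    stems:\n"
        "        2-: ἀποκοψ  # @1\n"
    )
    for number, marker, line in find_markers(sample):
        print(f"{number}: @{marker}: {line}")
    print(dict(summarize(sample)))
```

The scan works on plain text, not on parsed YAML, because the markers live in comments, which a YAML parser would normally discard.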
- run ./morphgnt_generate.py with appropriate arguments to test your new lexicon. Failures could just be mistakes you made during the manual review, or could be missing stemming.yaml rules.
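One review mistake that is easy to overlook is a lemma left with more than one stems: block, as in the @1 example. Some YAML loaders keep only one of the duplicate keys without complaint, so part of the stems can silently disappear. The sketch below is a line-based check under stated assumptions: lemma keys start in column 0 and end with a colon, and `stems:` lines are indented beneath them. `duplicate_stems` is a hypothetical helper, not part of the scripts listed above.

```python
def duplicate_stems(text):
    """Return lemmas with more than one 'stems:' block.

    A lemma that appears twice at the top level, each time with its own
    'stems:' block, is also reported, since those entries need merging too.
    """
    counts = {}
    lemma = None
    for line in text.splitlines():
        # Drop trailing comments (such as review markers) before inspecting.
        content = line.split("#", 1)[0].rstrip()
        if not content.strip():
            continue
        if not content[0].isspace() and content.endswith(":"):
            # ASSUMPTION: an unindented "key:" line starts a new lemma entry.
            lemma = content[:-1]
            counts.setdefault(lemma, 0)
        elif lemma is not None and content.strip() == "stems:":
            counts[lemma] += 1
    return [name for name, n in counts.items() if n > 1]


if __name__ == "__main__":
    unmerged = (
        "ἀποκόπτω:\n"
        "    stems:\n"
        "        3+: ἀπεκοψ\n"
        "    stems:\n"
        "        2-: ἀποκοψ  # @1\n"
    )
    print(duplicate_stems(unmerged))
```

Running this over the lexicon before morphgnt_generate.py can separate merge mistakes from failures caused by missing stemming rules.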