extract lists for POS & MSD from data and encode them in TEI #91

Open
VeronikaEngler opened this issue Nov 5, 2024 · 1 comment

@VeronikaEngler
Collaborator

Extract the lists of POS and MSD values from the files in manannot (using oXygen and XPath) and encode them in TEI: https://github.com/acdh-oeaw/vicav-content/blob/master/vocabs/fLib.xml#L174C13-L190C20
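
For anyone who would rather script the extraction than run XPath in oXygen, here is a minimal Python sketch. It rests on an assumption that is not confirmed anywhere in this issue: that the values sit in TEI `<gram type="pos">` and `<gram type="msd">` elements. `SAMPLE`, `count_values` and `format_counts` are hypothetical names for illustration. If the manannot files store the values differently, only the path expression should need changing.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# TEI namespace; assumed to be the namespace used by the manannot files.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical stand-in for one annotated file.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <gramGrp><gram type="pos">noun</gram><gram type="msd">sg.</gram></gramGrp>
  <gramGrp><gram type="pos">noun</gram><gram type="msd"/></gramGrp>
  <gramGrp><gram type="pos">verb</gram><gram type="msd">pf.3.m.sg.</gram></gramGrp>
</TEI>"""


def count_values(xml_text, gram_type):
    """Count the text values of all <gram type="..."> elements."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for gram in root.iterfind(f".//tei:gram[@type='{gram_type}']", TEI_NS):
        # Values are deliberately not stripped or normalised, so empty
        # strings and variants with stray whitespace are counted separately.
        counts[gram.text or ""] += 1
    return counts


def format_counts(counts):
    """Render counts as sorted 'value: count' lines."""
    return "\n".join(f"{value}: {n}" for value, n in sorted(counts.items()))


if __name__ == "__main__":
    for gram_type in ("pos", "msd"):
        print(gram_type)
        print(format_counts(count_values(SAMPLE, gram_type)))
```

Leaving the values unnormalised is a deliberate choice in this sketch: misspellings and empty entries stay visible in the counts instead of being silently merged.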

rausch-supola self-assigned this Nov 18, 2024
@rausch-supola
Collaborator

rausch-supola commented Dec 9, 2024

These are the extracted results:

pos
-: 1
: 1475
Islamic eulogy: 1
activeParticiple: 258
activePartriciple: 6
adjective: 605
adverb: 204
article: 33
coll.: 1
collcetiveNoun: 7
collectiveNoun: 4
conjunction: 53
conujunction: 1
dem.pron.: 1
demonstrative: 8
demonstrativeParticle: 3
demonstrativePronoun: 35
discourseParticle: 2
existential: 4
genexp: 7
indefinite: 3
indefinitePronoun: 10
indirect object-suffix: 10
interjection: 26
interrogative: 41
interrogativePronoun: 1
multiWordUnit: 82
name: 16
negation: 11
negationParticle: 7
noun: 3929
numeral: 101
participle: 2
particle: 25
passiveParticipe: 3
passiveParticiple: 65
persSuffix: 1
preposition: 119
prepositionalPhrase: 1
present prefix: 4
present suffix: 1
pronoun: 37
pronounSuffix: 33
properNoun: 6
pseudoVerb: 4
purp: 1
reflexive: 1
relativeParticle: 23
toponym: 1
verb: 24454
verbalNoun: 37
vocative particle: 4
{pos}: 652

msd
1.pl.: 9
1.sg.: 11
2.f.pl.: 2
2.f.sg.: 3
2.m.pl.: 4
2.m.sg.: 8
3.f.pl.: 4
3.f.sg.: 13
3.m.pl.: 12
3.m.sg.: 13
: 2248
ap.m.sg.: 1
coll: 2
collective: 31
conjunction: 4
construct + suffix: 5
construct-suffix: 2
construct: 10
dual : 1
dual: 33
f.: 1
f.pl.: 12
f.sg.: 218
imp.2.f.pl.: 2
imp.2.f.sg.: 10
imp.2.m.pl.: 15
imp.2.m.sg.: 55
imp.f.pl.: 62
imp.f.sg.: 65
imp.m.pl.: 63
imp.m.sg.: 64
impf. 3.sg.: 1
impf.1.m.sg.: 1
impf.1.pl.: 1981
impf.1.sg.: 1571
impf.2.f.pl.: 1019
impf.2.f.sg.: 1023
impf.2.m.pl.: 1053
impf.2.m.sg.: 949
impf.3.f.pl.: 1237
impf.3.f.sg.: 939
impf.3.m.pl.: 1515
impf.3.m.sg.: 1309
impf.3.pl.: 1
impf.3.pl.f.: 1
impf.3.sg.: 1
impf.3.sg.f.: 5
m.pl.: 13
m.sg.: 226
name: 8
pf.1.pl.: 1083
pf.1.sg.: 1108
pf.2.f.pl.: 1073
pf.2.f.sg.: 1078
pf.2.m.pl.: 1073
pf.2.m.sg.: 1084
pf.3.f.pl.: 1053
pf.3.f.sg.: 1072
pf.3.m.pl.: 1067
pf.3.m.sg.: 1203
pl.: 676
preposition: 1
properNoun : 2
properNoun: 33
sg.: 2287
{msd}: 2691
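
Several values in the listing above differ by only a character or two (for example `activeParticiple` / `activePartriciple`). A rough way to surface such pairs for manual review is a string-similarity pass, sketched here with Python's standard `difflib`. The function name `close_pairs`, the 0.9 cutoff and the `POS_SUBSET` selection are assumptions chosen for illustration. The output is a list of candidates to check by hand, not a decision about which spelling is correct.

```python
from difflib import SequenceMatcher
from itertools import combinations

# A subset of the pos values from the listing, used as example input.
POS_SUBSET = [
    "activeParticiple", "activePartriciple",
    "collcetiveNoun", "collectiveNoun",
    "conjunction", "conujunction",
    "passiveParticipe", "passiveParticiple",
    "noun", "verb", "participle", "particle",
]


def close_pairs(values, cutoff=0.9):
    """Return pairs of distinct values whose similarity ratio is >= cutoff."""
    pairs = []
    # Sorting makes the order of the reported pairs deterministic.
    for a, b in combinations(sorted(set(values)), 2):
        if SequenceMatcher(None, a, b).ratio() >= cutoff:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    for a, b in close_pairs(POS_SUBSET):
        print(f"{a}  <->  {b}")
```

The cutoff needs tuning against the real data: genuinely distinct labels such as `participle` and `particle` score only slightly below 0.9, so a lower cutoff would start flagging them as well.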

@dasch124 @charlymo, my questions are:

  1. Should every value be included? Some of them appear to be typos.
  2. Where I can't find a VICAV counterpart, should I invent an xml:id abbreviation?
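
To make the second question concrete, here is a sketch of how a cleaned list of values could be turned into a TEI feature library with generated `xml:id`s. Everything specific in it is an assumption: the `<fLib>` / `<f>` / `<symbol>` layout is plain TEI and may not match the structure at the linked lines of fLib.xml, and the `make_id` scheme (feature name plus a slug of the value) is hypothetical, not an existing VICAV convention.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def make_id(feature, value):
    """Hypothetical ID scheme: feature name + '_' + slug of the value."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", value).strip("_")
    return f"{feature}_{slug}"


def build_flib(feature, values):
    """Build an <fLib> with one <f><symbol value="..."/></f> per value."""
    ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace
    flib = ET.Element(f"{{{TEI_NS}}}fLib")
    seen = set()
    for value in values:
        xml_id = make_id(feature, value)
        if xml_id in seen:
            # E.g. "dual" and "dual " would collapse to the same ID.
            raise ValueError(f"duplicate xml:id {xml_id!r} for value {value!r}")
        seen.add(xml_id)
        f = ET.SubElement(
            flib,
            f"{{{TEI_NS}}}f",
            {"name": feature, f"{{{XML_NS}}}id": xml_id},
        )
        ET.SubElement(f, f"{{{TEI_NS}}}symbol", {"value": value})
    return flib


if __name__ == "__main__":
    flib = build_flib("msd", ["sg.", "pl.", "pf.3.m.sg."])
    print(ET.tostring(flib, encoding="unicode"))
```

Which values get passed in (all of them, or only those left after the typos are resolved) is exactly the open point in the first question, so the sketch leaves that to the caller.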
