Need to settle on a VerbType for AUX be verbs #9

Open
AngledLuffa opened this issue Dec 7, 2024 · 7 comments

Comments

@AngledLuffa

Currently, AUX forms of the verb "be" are labeled AuxType=Be. This is not a standard feature, which is generally not a problem, but there are in fact two standard annotations that could apply instead:

VerbType=Aux
VerbType=Cop

Presumably VerbType=Cop is the better choice, but it's not clear whether that will hold for every case.

@nschneid

nschneid commented Dec 7, 2024

Would this information be redundant with the cop and aux relations?

@AngledLuffa
Author

Yes, I think so. @muteeurahman ?

Does that mean we should just not have the feature?

@muteeurahman

> Yes, I think so. @muteeurahman ?
>
> Does that mean we should just not have the feature?

I think so, yes. The feature would be redundant.

@AngledLuffa
Author

AngledLuffa commented Dec 8, 2024

Regardless, AuxType currently isn't applied uniformly. Here are the current usages in xpos_tagged_with_features.conllu:

[john@localhost xpos_standard]$ grep "  ٿي      " xpos_tagged_with_features.conllu
4       ٿي      ٿي      AUX     VAUX    Gender=Fem|Number=Sing|Tense=Pres       3       aux     _       SpaceAfter=No
4       ٿي      ٿي      AUX     VAUX    Gender=Fem|Number=Sing|Tense=Pres       3       aux     _       SpaceAfter=No
4       ٿي      ٿي      AUX     VAUX    Gender=Fem|Number=Sing|Tense=Pres       3       aux     _       SpaceAfter=No
11      ٿي      ٿي      AUX     VAUX    AuxType=Be      10      cop     _       _
18      ٿي      ٿي      AUX     VAUX    Gender=Fem|Number=Sing|Tense=Pres       17      aux     _       _

AuxType shows up a couple times in the data tagged by MLtwist:

[john@localhost mltwist_xpos]$ grep AuxType *
Sindhi_100sentences_labeled_2024-11-25.txt:14   ٿي      _       AUX     VAUX    Gender=Masc|Number=Sing|Tense=Pres|Person=3|AuxType=Be  12      aux     _      _
sindhi_50_features_v2_labeled_2024-11-04.txt:3  آهيون   _       AUX     VAUX    Number=Plur|Tense=Pres|AuxType=Be       2       cop     _       SpaceAfter=No
sindhi_50_features_v2_labeled_2024-11-04.txt:7  ٿا      _       AUX     VAUX    Number=Plur|Tense=Pres|AuxType=Be       6       aux     _       SpaceAfter=No

(could find the whole sentences if that's helpful)

There are other cases of the word originally tagged AuxType in the data processed by MLtwist:

[john@localhost mltwist_xpos]$ grep "   آهن     " *
sd_initial_100_labeled_2024-12-04.txt:6 آهن     آهي     AUX     VAUX    Number=Plur|Tense=Pres  3       cop     _       _
sd_initial_100_labeled_2024-12-04.txt:7 آهن     آهي     AUX     VAUX    Number=Plur|Tense=Pres  6       aux     _       _
sd_initial_100_labeled_2024-12-04.txt:6 آهن     آهي     AUX     VAUX    Number=Plur|Tense=Pres  5       aux     _       _
sd_initial_100_labeled_2024-12-04.txt:4 آهن     آهي     AUX     VAUX    Number=Plur|Tense=Pres  3       aux     _       _
sd_initial_100_labeled_2024-12-04.txt:6 آهن     آهي     AUX     VAUX    Number=Plur|Tense=Pres  5       aux     _       _
Sindhi_100sentences_labeled_2024-11-25.txt:11   آهن     _       AUX     VAUX    Number=Plur|Tense=Pres  10      aux     _       _
Sindhi_100sentences_labeled_2024-11-25.txt:8    آهن     _       AUX     VAUX    Number=Plur|Tense=Pres  7       cop     _       _
Sindhi_100sentences_labeled_2024-11-25.txt:8    آهن     _       AUX     VAUX    Number=Plur|Tense=Pres  7       cop     _       _
Sindhi_100sentences_labeled_2024-11-25.txt:10   آهن     _       AUX     VAUX    Number=Plur|Tense=Pres  9       aux     _       _
... and others

So there doesn't seem to be any consistent relationship between that word being labeled cop vs aux and it carrying AuxType.
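To put numbers on that inconsistency, here is a minimal sketch that counts AUX tokens by dependency relation and by whether they carry AuxType. It assumes tab-separated, 10-column CoNLL-U lines; the function name `aux_type_by_deprel` is made up for illustration, and the sample rows are copied from the grep output above.

```python
from collections import Counter

def aux_type_by_deprel(conllu_text):
    """Count AUX tokens by (deprel, has AuxType) in CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank separators and sentence comments
        cols = line.split("\t")
        if len(cols) != 10 or cols[3] != "AUX":
            continue  # only well-formed token lines with UPOS == AUX
        has_aux_type = any(f.startswith("AuxType=") for f in cols[5].split("|"))
        counts[(cols[7], has_aux_type)] += 1
    return counts

# Three token lines taken from the grep output above.
sample = "\n".join([
    "11\tٿي\tٿي\tAUX\tVAUX\tAuxType=Be\t10\tcop\t_\t_",
    "18\tٿي\tٿي\tAUX\tVAUX\tGender=Fem|Number=Sing|Tense=Pres\t17\taux\t_\t_",
    "14\tٿي\t_\tAUX\tVAUX\tGender=Masc|Number=Sing|Tense=Pres|Person=3|AuxType=Be\t12\taux\t_\t_",
])

for (deprel, has_aux_type), n in sorted(aux_type_by_deprel(sample).items()):
    print(deprel, has_aux_type, n)
```

If AuxType tracked the cop relation, the `("aux", True)` and `("cop", False)` cells would be empty over the whole dataset.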

Long story short, we could always just eliminate the few AuxType labels that show up, since the feature has been applied inconsistently so far. Or is there a different annotation scheme we should use here?
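If we do go the elimination route, a minimal sketch of stripping the feature from a FEATS column could look like the following. The helper name `drop_feature` is hypothetical; it only assumes the usual CoNLL-U convention of `|`-separated `Name=Value` pairs, with `_` for an empty feature set.

```python
def drop_feature(feats, name="AuxType"):
    """Remove one feature from a CoNLL-U FEATS string; "_" means no features."""
    if feats == "_":
        return feats
    kept = [f for f in feats.split("|") if f.split("=", 1)[0] != name]
    # An empty feature set must be written as "_" in CoNLL-U.
    return "|".join(kept) if kept else "_"

print(drop_feature("Number=Plur|Tense=Pres|AuxType=Be"))  # Number=Plur|Tense=Pres
print(drop_feature("AuxType=Be"))                         # _
```

The remaining features keep their original order, so the diff against the current files stays limited to the removed label.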

@muteeurahman

muteeurahman commented Dec 9, 2024

[Image: example sentences referred to below; not preserved]

Usually a copula is a standalone AUX in the sentence, while an auxiliary verb is an AUX that accompanies a VERB. A standalone AUX has a cop relation with the subject. In the sentences above, the same AUX آھي is a copula in sentence 8 but an ordinary auxiliary verb in sentence 17.
What I see in the examples above is that the feature labels are inconsistent (or incorrect in some cases, i.e. the token is not a copula but has AuxType=Be). Meanwhile, in sentence 8 above, AuxType should indicate that this is a copula, but it is missing.
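The rule of thumb above can be turned into a rough consistency check. This is only a sketch under one assumption: "AUX along with VERB" is approximated by the AUX token's head having UPOS VERB, so an `aux` whose head is not a VERB, or a `cop` whose head is a VERB, gets flagged for review. The function name `flag_aux_mismatches` and the English toy sentences are invented for illustration and are not from the dataset.

```python
import re

def flag_aux_mismatches(conllu_text):
    """Return (sentence_index, id, form, deprel, head_upos) for suspicious AUX tokens."""
    flagged = []
    blocks = re.split(r"\n\s*\n", conllu_text.strip())  # sentences are blank-line separated
    for sent_idx, block in enumerate(blocks):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        rows = [r for r in rows if len(r) == 10 and r[0].isdigit()]
        upos_by_id = {r[0]: r[3] for r in rows}
        for r in rows:
            if r[3] != "AUX":
                continue
            head_upos = upos_by_id.get(r[6])  # None if the head is 0 or missing
            head_is_verb = head_upos == "VERB"
            if (r[7] == "aux" and not head_is_verb) or (r[7] == "cop" and head_is_verb):
                flagged.append((sent_idx, r[0], r[1], r[7], head_upos))
    return flagged

# Toy data: in the first sentence "is" is wrongly labeled aux on an ADJ head.
toy = "\n".join([
    "1\tShe\tshe\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\tis\tbe\tAUX\t_\t_\t3\taux\t_\t_",
    "3\thappy\thappy\tADJ\t_\t_\t0\troot\t_\t_",
    "",
    "1\tShe\tshe\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\tis\tbe\tAUX\t_\t_\t3\taux\t_\t_",
    "3\trunning\trun\tVERB\t_\t_\t0\troot\t_\t_",
])
print(flag_aux_mismatches(toy))
```

Sentence indices are 0-based. Flagged tokens are candidates for an annotator to look at, not confirmed errors: an extra auxiliary in a copular clause, for example, would also have a non-VERB head.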

@AngledLuffa
Author

Is there some work to be done to correct the AUX verbs, then? If I gather them up from the whole dataset and you explain to the annotators what the issue is, perhaps they can go through looking just for the AUX that need fixing.
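For the gathering step, something along these lines might be enough. It is a sketch, assuming tab-separated 10-column token lines and blank lines between sentences; `collect_aux_rows` and `to_tsv` are hypothetical helper names, and the two sample token lines are copied from the grep output earlier in this thread and placed in separate toy sentences.

```python
import csv
import io

def collect_aux_rows(source_name, conllu_text):
    """List [source, sentence number, id, form, feats, deprel] for every AUX token."""
    rows = []
    sent_no = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue
        if not in_sentence:
            sent_no += 1  # first token line of a new sentence
            in_sentence = True
        cols = line.split("\t")
        if len(cols) == 10 and cols[3] == "AUX":
            rows.append([source_name, sent_no, cols[0], cols[1], cols[5], cols[7]])
    return rows

def to_tsv(rows):
    """Render the rows as a TSV worklist with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "sentence", "id", "form", "feats", "deprel"])
    writer.writerows(rows)
    return buf.getvalue()

sample = "\n".join([
    "3\tآهيون\t_\tAUX\tVAUX\tNumber=Plur|Tense=Pres|AuxType=Be\t2\tcop\t_\tSpaceAfter=No",
    "",
    "7\tٿا\t_\tAUX\tVAUX\tNumber=Plur|Tense=Pres|AuxType=Be\t6\taux\t_\tSpaceAfter=No",
])
print(to_tsv(collect_aux_rows("sample", sample)))
```

Calling `collect_aux_rows` once per file and concatenating the results would give the annotators a single sheet with only the AUX tokens to review.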

@muteeurahman

Yes, I think we need to revisit the AUX features. The annotators and I can do it.
