Gender=Unsp #780
Comments
I would strongly advise against feature values that say "None", "Unsp(ecified)", and the like, even if it is technically possible to define them at the language-specific level. The correct UD way is to omit the feature completely from the word's annotation. It is also stated in the guidelines: "Not mentioning a feature in the data implies the empty value, which means that the feature is either irrelevant for this part of speech, or its value cannot be determined for this word form due to language-specific reasons." Furthermore, this particular case (or, more precisely, its Spanish counterpart) is discussed here: "For example, in Spanish, nouns distinguish two genders, masculine and feminine, and every noun can be classified as either Masc or Fem. Adjectives are supposed to agree with nouns in gender (and number), which they typically achieve by alternating -o / -a. But then there are adjectives such as grande or feliz that have only one form for both genders. So we cannot tell whether they are masculine or feminine unless we see the context. Yet they are either masculine or feminine (feminine in una ciudad grande, masculine in un puerto grande). Therefore in Spanish we should not tag grande with Gender=Com. Instead, we should either drop the gender feature entirely (suggesting that this word does not inflect for gender) or tag individual instances of grande as either masculine or feminine, depending on context."
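To illustrate the "omitted feature means empty value" convention described above, here is a minimal Python sketch that reads a CoNLL-U FEATS string. The helper names `parse_feats` and `gender_of` are hypothetical, and the FEATS strings in the usage note are made-up examples rather than data from any treebank.

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into a dict of feature -> list of values."""
    if feats == "_":
        # "_" means the word carries no features at all.
        return {}
    result = {}
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        # A feature may carry several comma-separated values.
        result[name] = value.split(",")
    return result


def gender_of(feats):
    """Return the list of Gender values, or None when the feature is omitted."""
    return parse_feats(feats).get("Gender")
```

With this sketch, `gender_of("Number=Sing")` yields `None` (the feature is simply absent, as recommended for a word like grande), while `gender_of("Gender=Fem|Number=Sing")` yields `["Fem"]` for an instance disambiguated from context.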
Thank you @dan-zeman, I didn't pay attention to the end of the documentation.
By the way, from discussions here I came to understand that more than one value for a feature signals ambiguity or indecision: so I think that the best fit for such "desperate" cases is Portuguese (and other languages') adjectives always bear a
@Stormur You are right that multiple values of a feature are possible, but the guidelines also say that if the multi-value would list all values that are relevant in the given language, then the feature should be dropped instead.
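The rule in the comment above can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of any UD tool: the function name `normalize_feature` and the table `LANGUAGE_GENDERS` are hypothetical, and the per-language inventories are simplified assumptions based on this discussion.

```python
# Hypothetical per-language inventories of Gender values (illustrative only).
LANGUAGE_GENDERS = {
    "pt": {"Masc", "Fem"},
    "la": {"Masc", "Fem", "Neut"},
}


def normalize_feature(values, inventory):
    """Return the value string for a feature, or None to omit the feature.

    values: set of values the annotator considers possible for this word.
    inventory: set of all values of the feature relevant in the language.
    """
    values = set(values)
    if not values or values >= inventory:
        # Listing every relevant value carries no information: drop the feature.
        return None
    # Multiple values are written comma-separated in alphabetical order.
    return ",".join(sorted(values))
```

Under these assumptions, a Portuguese adjective that could be either masculine or feminine gets no Gender feature at all, while a Latin form ambiguous only between masculine and feminine would keep `Fem,Masc`, because the neuter value is excluded.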
Right, it is true that I was mainly thinking of Latin, where, contrary to most modern Romance languages, we also have the neuter gender, and the ambiguity, apart possibly from really weird cases which I am not aware of, is only for I forgot this exact point you mention, but is it not better to still annotate for
It would be useful to know the disambiguated value, if there is manpower to do it reliably. If it cannot be disambiguated, then I'm not sure about possible benefits of knowing the fine-grained reason of why it cannot be disambiguated.
I think that the rationale is, rather than knowing the reasons for such an ambiguity, to keep annotational coherence for those word classes which normally do express a On the one hand, I think that a feature like On the other hand, if I were to do a search for elements with e.g. In general, in my opinion the problem is that the undecidedness of a feature is different from the absence of it, so in such non-systematic cases it still makes sense to have a "neutral" annotation rather than not having it at all; systematic is the key word here. And in this context, listing all possible cases would surely be better than using a specific "negative feature".
In our article "Universal Dependencies for Portuguese" we argue that an extra value for Gender is necessary (https://www.aclweb.org/anthology/W17-6523/):
Revisiting this topic, I wonder what other treebanks are doing. I see two possible solutions if a word can have multiple genders:
1. Unsp only for cases where the context does not give enough information.
2. Com or Neut for such words, regardless of the context (https://universaldependencies.org/u/feat/Gender.html)
Comments?