Gender=Unsp #780

arademaker · 2021-04-29T13:06:10Z

In our article "Universal Dependencies for Portuguese" we argue that an extra value for Gender is necessary (https://www.aclweb.org/anthology/W17-6523/):

There are adjectives such as grande (‘big’) or feliz (‘happy’) that have only one form for both genders. So we cannot tell whether they are masculine or feminine unless we see the context they appear in. In many cases, even looking at the full sentence, one cannot tell if the word is masculine or feminine.

Revisiting this topic, I wonder what other treebanks are doing. I see two possible solutions when a word can have more than one gender:

1. If the context gives enough information, assign the right gender, and use Unsp only for cases where the context does not give enough information.
2. Use Com or Neut for such words unrestrictedly, regardless of the context (https://universaldependencies.org/u/feat/Gender.html).

Comments?

dan-zeman · 2021-04-29T13:45:03Z

I would strongly advise against feature values that say "None", "Unsp(ecified)", and the like, even if it is technically possible to define them at the language-specific level. The correct UD way is to omit the feature completely from the word's annotation. This is also stated in the guidelines: "Not mentioning a feature in the data implies the empty value, which means that the feature is either irrelevant for this part of speech, or its value cannot be determined for this word form due to language-specific reasons."
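As a rough illustration of what "omit the feature completely" means for a CoNLL-U FEATS column, here is a minimal Python sketch. The helper names (`parse_feats`, `format_feats`, `drop_placeholder`) are hypothetical and not part of any UD tool; the sketch only assumes the usual FEATS conventions (pairs separated by `|`, sorted by feature name, `_` for an empty set).

```python
def parse_feats(feats):
    """Parse a FEATS string such as 'Gender=Unsp|Number=Sing' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats):
    """Serialize a feature dict back to a FEATS string ('_' when empty)."""
    if not feats:
        return "_"
    # Features are listed alphabetically, ignoring case.
    names = sorted(feats, key=str.lower)
    return "|".join(f"{name}={feats[name]}" for name in names)


def drop_placeholder(feats, feature="Gender", placeholder="Unsp"):
    """Remove `feature` when it carries the placeholder value; keep everything else."""
    parsed = parse_feats(feats)
    if parsed.get(feature) == placeholder:
        del parsed[feature]
    return format_feats(parsed)
```

With this sketch, `drop_placeholder("Gender=Unsp|Number=Sing")` yields `Number=Sing`, while a word annotated `Gender=Fem|Number=Sing` is left unchanged.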

Furthermore, this particular case (or, more precisely, its Spanish counterpart) is discussed here: "For example, in Spanish, nouns distinguish two genders, masculine and feminine, and every noun can be classified as either Masc or Fem. Adjectives are supposed to agree with nouns in gender (and number), which they typically achieve by alternating -o / -a. But then there are adjectives such as grande or feliz that have only one form for both genders. So we cannot tell whether they are masculine or feminine unless we see the context. Yet they are either masculine or feminine (feminine in una ciudad grande, masculine in un puerto grande). Therefore in Spanish we should not tag grande with Gender=Com. Instead, we should either drop the gender feature entirely (suggesting that this word does not inflect for gender) or tag individual instances of grande as either masculine or feminine, depending on context."
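The second option in the quoted passage (tagging individual instances by context) could be approximated by copying the gender of the noun the adjective modifies. The following Python sketch is only a heuristic under stated assumptions: `gender_from_head` is a hypothetical helper, features are given as plain dicts, and the caller is assumed to have already identified the head noun of an attributive adjective.

```python
def gender_from_head(adj_feats, head_feats):
    """Return a copy of adj_feats with Gender copied from the head noun.

    The value is copied only when the adjective has no Gender yet and
    the head noun carries a single, unambiguous Masc or Fem value.
    """
    result = dict(adj_feats)
    head_gender = head_feats.get("Gender")
    if "Gender" not in result and head_gender in ("Masc", "Fem"):
        result["Gender"] = head_gender
    return result


# 'grande' in 'una ciudad grande' versus 'un puerto grande'
ciudad = {"Gender": "Fem", "Number": "Sing"}
puerto = {"Gender": "Masc", "Number": "Sing"}
grande = {"Number": "Sing"}

grande_in_ciudad = gender_from_head(grande, ciudad)
grande_in_puerto = gender_from_head(grande, puerto)
```

If the head has no usable gender, the adjective is returned without a Gender feature, which matches the first option (dropping the feature).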

arademaker · 2021-04-29T13:50:25Z

Thank you @dan-zeman, I didn't pay attention to the end of the documentation.

Stormur · 2021-04-30T10:54:02Z

By the way, from discussions here I came to understand that more than one value for a feature signals ambiguity or indecision: so I think that the best fit for such "desperate" cases is Gender=Fem,Masc, which means that it can be either, but we cannot decide.

Portuguese (and other languages') adjectives always bear a Gender category, so it should be present; and Gender=Com has a deceptive name, but it really is something else and language-specific.

dan-zeman · 2021-04-30T11:13:54Z

@Stormur You are right that multiple values of a feature are possible, but the guidelines also say that if the multi-value would list all values that are relevant in the given language, then the feature should be dropped instead.
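The rule can be sketched in Python as follows. `normalize_multivalue` is a hypothetical helper, and the set of values relevant for a language is passed in by the caller as an assumption; the only conventions relied on are that multiple values are comma-separated and sorted alphabetically.

```python
def normalize_multivalue(values, language_values):
    """Return the serialized multi-value, or None when the feature should be dropped.

    The feature is dropped when the candidate values cover every value
    that is relevant for the language, since listing them all carries
    no information.
    """
    chosen = set(values)
    if not chosen or chosen >= set(language_values):
        return None
    return ",".join(sorted(chosen))


# Two-gender system: Fem,Masc lists everything, so the feature is dropped.
two_gender = normalize_multivalue(["Masc", "Fem"], {"Masc", "Fem"})

# Three-gender system: Fem,Masc still excludes Neut, so it is kept.
three_gender = normalize_multivalue(["Masc", "Fem"], {"Masc", "Fem", "Neut"})
```

Here `two_gender` is `None`, while `three_gender` is the string `Fem,Masc`.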

Stormur · 2021-04-30T16:31:55Z

> @Stormur You are right that multiple values of a feature are possible, but the guidelines also say that if the multi-value would list all values that are relevant in the given language, then the feature should be dropped instead.

Right, it is true that I was mainly thinking of Latin, where, contrary to most modern Romance languages, we also have the neuter gender, and the ambiguity, apart possibly from really weird cases which I am not aware of, is only for Fem/Masc.

I forgot the exact point you mention, but is it not better to still annotate for Gender in such cases? I mean, this is still useful to distinguish those cases where there is an ambiguity for some reason from those where the word really neither inflects for nor expresses the category.

dan-zeman · 2021-04-30T19:26:31Z

It would be useful to know the disambiguated value, if there is manpower to do it reliably. If it cannot be disambiguated, then I'm not sure about possible benefits of knowing the fine-grained reason of why it cannot be disambiguated.

Stormur · 2021-05-04T11:12:48Z

I think that the rationale is, rather than knowing the reasons for such an ambiguity, to keep annotational coherence for those word classes which normally do express a Gender. I think that the main reason this issue apparently keeps resurfacing is this: one would like to still say that a word has a Gender, even if it is not possible to determine it. This should be a very rare occurrence, anyway.

On the one hand, I think that a feature like InflClass helps in keeping this kind of coherence, because we would still be able to distinguish between truly indeclinable (InflClass=Ind) elements and others which just happen not to have been annotated for a feature linked to inflection, like Gender.

On the other hand, if I were to do a search for elements with e.g. Gender=Fem, I think I would want to include those ambiguous cases, too.
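Such a search could look roughly like the following Python sketch, which treats a comma-separated multi-value as matching each of its members. `has_value` is a hypothetical helper operating on a raw FEATS string, not a function of any existing query tool.

```python
def has_value(feats, feature, value):
    """True if `feature` in a FEATS string has `value`, alone or within a multi-value."""
    if not feats or feats == "_":
        return False
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        if name == feature:
            # 'Fem,Masc' matches a query for either Fem or Masc.
            return value in values.split(",")
    return False


matches_ambiguous = has_value("Gender=Fem,Masc|Number=Sing", "Gender", "Fem")
matches_missing = has_value("Number=Sing", "Gender", "Fem")
```

A word annotated with Gender=Fem,Masc is found by a query for Fem, whereas a word with no Gender feature at all is not, which is the difference at stake here.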

In general, in my opinion the problem is that the undecidedness of a feature is different from the absence of it, so in such non-systematic cases it still makes sense to have a "neutral" annotation rather than not having it at all; systematic is the key word here. And in this context, listing all possible values would surely be better than using a specific "negative feature".

dan-zeman added features Romance labels Apr 29, 2021

dan-zeman added this to the v2.8 milestone Apr 29, 2021

dan-zeman added the question label Apr 29, 2021

arademaker closed this as completed Apr 29, 2021

arademaker mentioned this issue Apr 29, 2021

remover Gender=Unsp UniversalDependencies/UD_Portuguese-Bosque#297

Closed

Stormur mentioned this issue Jun 24, 2021

syntax- vs morphology-based feature assignment #791

Closed
