syntax- vs morphology-based feature assignment #791
Comments
Interesting problem. In all the examples you give, the feature is unmarked (zero signifier). One solution would be to add an unmarked value rather than Sing or Nom for these cases. (But to maintain Nom or Sing in other cases.) Do you have arguments to reject this solution? |
@sylvainkahane That could be one of the better solutions, since it is what I call syntax-based. I prefer syntax-based to the other. May you clarify "unmarked", though? If we take |
Incidentally, I was also thinking about this issue recently. Something similar was probably discussed in #780. In my opinion, the best thing would be to mark things as they are: i.e. the bare nominal in Turkish (and in other languages that function similarly in this regard, such as Mongolian) should be marked neither with
This is the crucial point: the fact of being "accusative" and of fulfilling the core relation that we call "object" in the clause are distinct facts. Morphological features register what can be observed with regard to the form of the word: if a case marker is absent and, as is the case here, we cannot talk of a zero-suffix in a paradigm, there simply is no case feature. Now, the question arises if an "absent" |
Another related issue, hoping not to stray too far, is whether the -(y)I suffix in Turkish is actually a real accusative suffix or rather a marker of definiteness. In this interpretation (towards which I am leaning), the Turkish case system does not have a systematic case marking like nominative/accusative for If I am not mistaken, something reminiscent of this happens in Finnish, where there is no "accusative", but partitive is used both with subjects and objects according to specific rules. |
The features in UD are generally described as a part of morphological annotation and I find it natural to favor morphological criteria over syntactic. However, this is not a strict requirement, and examples could be found where a feature is partially or completely driven by other criteria, such as syntax or semantics. What you call “syntactic accusative” is already recognizable by the However, it is of course possible that two particular positions in the paradigm have the same surface form. So I can imagine that one would say that pizza is either In any case, if the current approach in Turkish is modified, please make sure that
Effectively, in the end I agree with using Still, I am left wondering if there is room for a distinction between marked nominatives (as in Latin, Georgian,...) and unmarked ones (Turkish, Mongolian, English?,...). The latter would then be the unspecified case (it has been called casus indefinitus in literature sometimes). Another related question is: in languages with no paradigmatic variation of cases (such as Italian), do we still want to have For the marking of |
No. If there is no variation, there is no need to have the feature. But Italian might use Also, some languages that have case variation will have |
Ah, yes, of course, I'm always forgetting pronouns and was thinking only of nouns/adjectives. For If I am not mistaken, I understand the logic of not needing the feature, but am also thinking, from an operational point of view, of a multilingual search in which one would have to specify something like "no |
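A minimal Python sketch of the multilingual search scenario mentioned above, where a query has to find tokens that simply lack a feature. The helper names `parse_feats` and `lacks_feature` and the sample triples are illustrative assumptions, not part of any UD tool or treebank.

```python
def parse_feats(feats_field):
    """Turn a CoNLL-U FEATS string into a dict; "_" means no features at all."""
    if feats_field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats_field.split("|"))


def lacks_feature(feats_field, name):
    """True if the feature is absent, i.e. omitted rather than set to a value."""
    return name not in parse_feats(feats_field)


# Hypothetical (form, UPOS, FEATS) triples, made up for illustration.
tokens = [
    ("palla", "NOUN", "Gender=Fem|Number=Sing"),  # no Case feature
    ("kalan", "NOUN", "Case=Gen|Number=Sing"),    # has Case
    ("pizza", "NOUN", "_"),                       # empty FEATS column
]

# "No Case" has to be expressed as absence of the key, not as a value.
caseless_nouns = [
    form for form, upos, feats in tokens
    if upos == "NOUN" and lacks_feature(feats, "Case")
]
```

With omission as the convention, the query tests for a missing key and never for a dedicated value.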
It hasn't, as far as I know. I suppose this is one of the points where UD stays close to traditional terminology in the hope that it will be better understood by the general crowd. While obviously there are other points where it departs from the tradition quite substantially :-} |
I also agree to use Case=Nom for the zero case. But I just want to make a distinction between zero morphemes and unmarking. Case=Nom would mean that we consider that there is a case marker, that is, that the absence of any other case marker is meaningful. |
In Uralic language studies there are different schools of thought regarding the use of Case=Nom vs Case=Acc for annotating the direct object when no identifying morphology is present. Research in the former Soviet Union seems to intermingle morphology with syntax -- this applies to krl, olo, kpv, koi. Shouldn't we be trying to limit ourselves to: column 6 = morphological features, column 8 = dependency types? Or is there a reason to reiterate dep-information in the features and feature-information in the deps? |
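To illustrate the column separation, here is a rough Python sketch that reads FEATS (column 6) and DEPREL (column 8) of a CoNLL-U token line independently. The sample row is hypothetical, and `parse_token` is a helper name introduced here, not a function from any UD library.

```python
def parse_token(line):
    """Split one CoNLL-U token line into the fields used in this sketch."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("a CoNLL-U token line has 10 tab-separated columns")
    feats = {}
    if cols[5] != "_":  # column 6 (FEATS), "_" means empty
        for pair in cols[5].split("|"):
            key, value = pair.split("=", 1)
            feats[key] = value
    return {"form": cols[1], "upos": cols[3], "feats": feats, "deprel": cols[7]}


# Hypothetical row: a morphologically nominative noun used as a direct object.
row = "2\tkala\tkala\tNOUN\t_\tCase=Nom|Number=Sing\t1\tobj\t_\t_"
tok = parse_token(row)

# The syntactic role comes from DEPREL, the form-based case from FEATS.
is_nominative_object = tok["deprel"] == "obj" and tok["feats"].get("Case") == "Nom"
```

Since both pieces of information can be combined at query time, the object function does not have to be repeated inside the features.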
What does it mean? Would they distinguish |
Yes, they distinguish Case=Nom and Case=Acc for one word form depending on how it is used, sorry. In Karelian and Livvi the system works much the same as in Finnish and Estonian, i.e., in imperative predication the complete (not partitive) direct object noun (not personal pronoun) appears in the nominative form, elsewhere this same function in the singular is indicated by a genitive form, e.g. kala Case=Nom but kalan Case=Gen 'fish'. In the plural, kalat Case=Nom is used in both imperative and non-imperative. (The Finnish take on the situation) @nikopartanen |
I think the "Finnish approach" is the one that should be used in UD, and if our Karelian and Livvi treebanks use the "Karelian approach", it would be good to fix them.
Just to clarify: When you say "possessive suffix +accusative formatives", do you mean that 1. there are two morphemes, the first one is a possessive suffix not shown in the example, and the second one is the accusative formative -ӧс/-тӧ/-сӧ, OR do you mean that 2. there is just one morpheme -ӧс/-тӧ/-сӧ, which can be interpreted either as a possessive suffix or an accusative formative? |
Hi, neither 1. nor 2.
'I cooked the/that fish' OR 'I cooked my fish'
'I cooked that fish [we were talking about]' OR 'I cooked your fish'
'I cooked the/that fish [distinguishing it from other cookable items, perhaps]' OR 'I cooked his/her/its fish'
'I was cooking fish [generic]' OR 'I cooked fish [generic]' @nikopartanen, please say something if this analysis is wrong. In the kpv and koi UD projects, we have chosen the UD-like approach where only distinct morphology is marked. In other words, we have used the following readings
The zero with an object dependency is labeled Case=Nom |
I am assuming you mean The approach of adding both |
Yes, thanks @dan-zeman, that should be [psor] with both Number[psor] and Person[psor]. |
Yes, that would be my preference. In my opinion, omitting the |
I was considering again this issue after these last inputs, and indeed I find myself leaning towards a solution like the one proposed by @sylvainkahane , for all the reasons already discussed at length:
This point is critical and I think that it follows straight from one of UD principles to "annotate only what is there". Simply, some systems have unspecified forms with some patterns to make certain categories like Maybe another label for the value could be For example in English (and feel free to correct me if I am talking nonsense), we might argue that singular number is no longer expressed morphologically and is left to semantics: I cannot guess from forms like ball or rice alone that one accepts a plural inflection (balls) and the other does not if I do not know anything about the objects. But this is different from Italian where I know that palla 'ball', unerringly, is singular because the ending -a contrasting -e in palle 'balls' needs to be there, and for other word classes the same is expressed by other endings (e.g. lup-o 'wolf'). And in Italian, for this exact reason, it is probably much more viable than in English to inflect riso 'rice' in the plural risi (which would be usually interpreted as 'sorts of rice', or 'dishes of rice', and so on). |
No, no, no :-) Since the very beginning of UD, we have been clear about not explicitly using "Unspecified" as a value of a feature; instead, the feature is omitted. |
Thanks Dan for once again sobering me. So, under this light, my preference would remain to not tag for |
Deciding whether one of the values should be considered unspecified is a more delicate issue, and often requires looking at the language-specific context. (As you say, should a form be caseless or should it be |
A non-linguistic remark: for the maintenance of a treebank, it is very useful to know that a feature traditionally associated with a given POS is not missing inadvertently but has been deliberately omitted. It seems that the best way to do that is to add a feature. @dan-zeman Do we have a particular policy concerning such features? |
As I said above, the policy is not to add such features :-) If desired/necessary for treebank maintenance, a note could be added in MISC. But I personally do not see a difference in principle from the situation where a feature value is replaced with a wrong one (e.g., |
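A small Python sketch of what such a maintenance note in MISC could look like. The key `NoCase` and its value are purely hypothetical (no such attribute is defined in UD as far as this sketch assumes), and `add_misc_note` is a helper name made up for illustration.

```python
def add_misc_note(misc_field, key, value):
    """Return a CoNLL-U MISC string with key=value added, replacing an old value."""
    items = [] if misc_field == "_" else misc_field.split("|")
    # Drop any earlier entry for the same key so the note is not duplicated.
    items = [item for item in items if item.split("=", 1)[0] != key]
    items.append(f"{key}={value}")
    # Sorting is only for deterministic output in this sketch.
    return "|".join(sorted(items))


# "NoCase=Deliberate" is a made-up note meaning "Case was omitted on purpose".
note = add_misc_note("SpaceAfter=No", "NoCase", "Deliberate")
empty = add_misc_note("_", "NoCase", "Deliberate")
```

A validation script maintained by the treebank team could then skip tokens carrying the note when it reports nouns without Case.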
In Turkish, the accusative suffix -(y)I might be covert, in order to give a nonspecific and indefinite reading:
In the Turkish case paradigm, the suffixless noun is nominative. This means one would morphologically mark pizza 'pizza' in (2) nominative, not accusative. This is the case in the treebanks I looked up. Syntactically, however, it is accusative beyond any doubt.
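The contrast between the two policies can be sketched in a few lines of Python. The relation-to-case mapping below is a toy assumption for illustration only (real languages are far less regular), and `assign_case` is a hypothetical helper, not an existing annotation tool.

```python
# Toy mapping from a dependency relation to the case a syntax-based policy
# would assign. Assumption for illustration only.
SYNTACTIC_CASE = {"nsubj": "Nom", "obj": "Acc"}


def assign_case(surface_case, deprel, policy):
    """Pick a Case value under a morphology-based or a syntax-based policy."""
    if policy == "morphology":
        # Trust the form: a suffixless noun keeps its paradigm value.
        return surface_case
    if policy == "syntax":
        # Trust the function: override the form-based value when a mapping exists.
        return SYNTACTIC_CASE.get(deprel, surface_case)
    raise ValueError(f"unknown policy: {policy}")


# Suffixless pizza used as a direct object.
morph = assign_case("Nom", "obj", "morphology")
synt = assign_case("Nom", "obj", "syntax")
```

Here `morph` is "Nom" and `synt` is "Acc", which is exactly the disagreement described for pizza.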
Another similar decision can be made with plural subjects when their predicates morphologically seem singular. This is very common in Turkish, especially when the subject is non-human.
A plural predicate would be kokuyorlar, with the plural suffix, instead of kokuyor. In this case, so-called syntax-based feature assignment requires Number=Plur, but morphology-based feature assignment requires Number=Sing.
In ellipsis and suspended affixation situations, the problem is even more pronounced:
The dative suffix -(y)A obviously modifies the whole coordinated NP Ali ve Ayşe, rather than just Ayşe. In our current annotation, which falls under morphology-based feature assignment, Ali is assigned Case=Nom. This is incorrect in my opinion: it should either be caseless or carry the same case as Ayşe.
Some of these cases probably deserve an issue of their own, but I think they all hinge on one simple theoretical decision, so I have included a bunch (not all) of them here.