Head direction (again) #236
Just referring to the original issue, now closed: #189
I will try to show some benefits of the current approach to coordination. I was not part of the Uppsala discussion group on coordination and I am not a strong supporter of the current approach. But it is indeed good to think about why we actually want to use this approach.

Coordination is not a normal dependency relation. Ideally we would want a model where the whole coordination (i.e. all conjuncts, conjunctions and punctuation) is defined as a constituent, which participates in the surrounding dependency tree as just one node. It would have a parent, and optionally one or more children. In addition, each of the coordination members (conjuncts, conjunctions) could head its own dependency subtree. Let me call this the 3D model, because in addition to the usual dimensions of word order and dependency, we would have a third dimension for coordination relations.

Unfortunately it is not very practical to work with the third dimension. We therefore need a technical solution that will allow us to represent the same reality using ordinary “dependency” relations and dedicated labels for the relations. We want to be able to reconstruct the reality (or the 3D representation) without loss of information, i.e. there should be a 1:1 mapping between the 3D model and the technical solution (the current approach actually does not 100% satisfy this requirement, but that has nothing to do with the first-vs-last-conjunct-head problem). Now since it is a technical solution and the relations are not normal dependencies, we cannot say that one of the conjuncts heads the others: they should be at the same level, and the reason they are not is a technical limitation. Thus we need a technical rule for how to attach them. Technical means that you do not have to understand the sentence to be able to apply the rule. If they tell you that these two nodes are conjuncts, there is a prescribed way of expressing it.
The procedure should be deterministic so that people can convert back and forth between various representations of coordination, ideally without losing information. This is not just about the 3D vs. 2D models; in 2D alone, there are tons of ways coordination might be represented, and people may and will want to be able to switch between them. I believe this is a good argument at least for having just one consistent approach to coordination per language. That is, if there are two coordinations in two sentences, annotators are not allowed to pick the first conjunct as head in one and the last conjunct in the other. That would make any transformations impossible because you could not reliably restore the original annotation. And while I said that I am not a strong supporter of enforcing the first-conjunct rule in all languages, I would strongly oppose attempts to go this far. It is a matter of taste whether the argument is also strong enough to forbid a different direction in each language. One of the strengths of UD is that you can finally make assumptions about relations and their labels (while previously this was just a bag of black-box features which parsers tried to reproduce without understanding a single bit of them). The more universal these assumptions are, the easier it is for the tools to utilize them. Seen from this angle, having a single technical solution for coordinations in all languages is better than having two options, and a table that says which option is used in which language. But of course the latter is doable, if we decide it's worth it.
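To make the round-trip argument concrete, here is a minimal sketch of switching a coordination between first-conjunct and last-conjunct heads. It is an illustration only: the `rehead` helper, the dict-based tree encoding, and the five-token example are assumptions made for this sketch, and it handles only flat (non-nested) coordinations where all conjuncts attach to one head via `conj`.

```python
def rehead(heads, rels, pick):
    """Move the head of each flat coordination to the conjunct chosen by `pick`.

    heads: token id -> head id (0 = root); rels: token id -> relation label.
    pick=max selects the last conjunct, pick=min the first one.
    Assumption: coordinations are flat and not nested.
    """
    new_heads, new_rels = dict(heads), dict(rels)
    # Group conj dependents by the conjunct they currently hang from.
    groups = {}
    for tok, rel in rels.items():
        if rel == "conj":
            groups.setdefault(heads[tok], []).append(tok)
    for old_head, conjuncts in groups.items():
        new_head = pick(conjuncts)
        # The new head inherits the coordination's outside attachment.
        new_heads[new_head], new_rels[new_head] = heads[old_head], rels[old_head]
        # The old head and the remaining conjuncts hang from the new head.
        for tok in [old_head] + [c for c in conjuncts if c != new_head]:
            new_heads[tok], new_rels[tok] = new_head, "conj"
    return new_heads, new_rels


# Hypothetical sentence: three conjuncts (1, 2, 4), a conjunction (3), a verb (5).
heads = {1: 5, 2: 1, 3: 1, 4: 1, 5: 0}
rels = {1: "dobj", 2: "conj", 3: "cc", 4: "conj", 5: "root"}

right_heads, right_rels = rehead(heads, rels, max)            # first -> last conjunct head
back_heads, back_rels = rehead(right_heads, right_rels, min)  # and back again
```

The round trip restores the original tree only because the caller knows which direction the input uses and passes the matching `pick`. If direction varied from sentence to sentence within a treebank, that knowledge would be lost, which is the point made above.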
I am aware, and I agree, that most annotation choices in coordination are arbitrary. However, I also hope that my post above shows that there are some usability/linguistic reasons for having a language-specific head direction for the relations that are "technical solutions" rather than linguistically motivated ones. Besides the reason I mentioned above (immediately accessible features on the head), I also think that allowing language-specific head direction would make the job of the automated tools easier rather than more difficult. After all, languages exhibit different general tendencies for head direction, and, when there is no principled reason for a preference, it could be helpful to have the head direction follow the general tendency in the particular language being processed, rather than an arbitrary universal choice.
For the record, if the dependency direction in the technical phrases is ever allowed to be specified differently for each language, the |
I would add the A related note:
@coltekin, @dan-zeman, Let me add a comment to this old issue since it is still open.
should be a representation if we abide by the left-head rule, but the dependency of "が" really confuses the structure -- it breaks the Japanese (desirable) principle that functional words (e.g.
and it makes everything easier, including conversion from existing corpora, comparison of parsing accuracy with other approaches, and interpretation of the results. So we totally agree with @coltekin's idea to allow language-specific head direction. I created a new issue #356 for further Japanese-specific discussion.
Coordination "heads" have been heavily discussed during preparation of the upcoming version 2 UD guidelines. It has been proposed (http://universaldependencies.org/v2/coordination.html) that a language can opt for right-headed coordination. However, the subsequent discussion in https://github.com/UniversalDependencies/UD_v2/issues/3 inclines (although by no means unanimously) back to the all-left-headed rule.
@dan-zeman: I'd suggest closing this, as the issue has been extensively discussed for v2 and the v2 guidelines are now being published.
This is to stir up the earlier discussion about the universal left-to-right marking of `conj` (also a few others, including `name`). I have read the report at http://universaldependencies.github.io/docs/2015-08-23-uppsala/coordination.html, but did not find the arguments convincing. In general, the report argues why left-to-right marking is not so much of a problem, but I do not see any arguments in favor of the original specification, either in the report or elsewhere. I still argue that the default head direction in `conj` and similar constructions should either be a language-dependent choice, or be relaxed completely, allowing annotators to mark the head based on what makes sense for the data and defaulting to a language-specific guideline if there are no clear reasons.

My argument (as before) is based on suffixes that are attached to the last conjunct but affect the complete coordinated structure. In Turkish (and many other languages), the suffixes that apply to each conjunct are omitted in all except the last one. Alternatively (as suggested in the report mentioned above), we can consider the whole coordinated structure, not the last word, to be suffixed. In any case, since the suffix is on the last word, it makes the last word the natural choice for being the head. Here is a simple example:
Here, the accusative suffix affects both conjuncts. As a result, it makes sense to mark the last conjunct, `Ayşe'yi`, as the head of the conjunction. That way, the object of the verb will correctly have accusative marking on the head of the "phrase". This, after all, is what happens with other types of phrases. For example, if the phrase is a complex noun phrase, the case information would be on the head. Here is a simple example without coordination.

We do not look at the far left or far right of the phrase to discover that the object is accusative. We look at the head of the phrase. So, if the "phrase" is a coordinated structure, it is natural to expect to find such features on the head as well.
You may still ask why my expectations matter. Here are two scenarios:

- […] `dobj` and see if the dependent is in accusative case. If we keep the current standard:
  - […] `dobj` relation
  - […] `conj`, `name` (and few other?) relations, then look at the case of the dependent
  - […] `conj` links get their order, and pick the last one), and do not forget to recurse, because your last element could be another coordinated structure. When you are done, you have your case. Hopefully the query language and the application in use allow doing that in a manner one can formulate within the bounds of reasonable effort (and we should remember that not all users of treebanks are computational linguists).
- […] `conj` and `name` are special if there is enough data, but we are talking about a relatively infrequent structure and hand-annotated data here.

Although my examples are with the `Case` feature, the problem is not limited to `Case`; all other features (`Number`, `Tense`, `Mood`, ...) are affected by the same issue.

One last note: I agree with the report that the above is similar to "Tom and Jerry’s Diner" in English, which is correct. But similar structures in English UD are annotated like the following:
Since the possessive suffix is split in English, one has the luxury of connecting it to the arbitrarily determined head ('s is connected to the first word, not to the word that it is attached to on the surface). To achieve the same consistent behavior in Turkish, we may end up splitting almost all of the suffixes (and this has been what we were trying to prevent as much as possible).
I think this deserves another consideration. I would particularly be interested to know the benefits of the universal head direction for these relations in the current specification. I understand the need for being conservative, but the proposed change does not require modification to the existing treebanks either.
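To illustrate the first scenario above, here is a rough sketch of what a query for accusative objects has to do under the first-conjunct-head rule. Everything in it is an assumption made for illustration: the `Token` class, the helper names, and the toy sentence "Ali ve Ayşe'yi gördüm" (only `Ayşe'yi` comes from the example above; the other words and the attachment of `cc` to the first conjunct are invented for the sketch).

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    id: int
    form: str
    head: int            # 0 = root
    deprel: str
    feats: dict = field(default_factory=dict)


def last_conjunct(tokens, tok):
    """Follow conj links from tok to the last conjunct, recursing into nested coordination."""
    conjuncts = [t for t in tokens if t.head == tok.id and t.deprel == "conj"]
    if not conjuncts:
        return tok
    # Word order decides which conjunct is last.
    return last_conjunct(tokens, max(conjuncts, key=lambda t: t.id))


def accusative_objects(tokens):
    """Return dobj dependents whose coordination carries Case=Acc on its last conjunct."""
    return [t for t in tokens
            if t.deprel == "dobj"
            and last_conjunct(tokens, t).feats.get("Case") == "Acc"]


# Hypothetical left-headed annotation of "Ali ve Ayşe'yi gördüm".
tokens = [
    Token(1, "Ali", 4, "dobj", {"Case": "Nom"}),
    Token(2, "ve", 1, "cc"),
    Token(3, "Ayşe'yi", 1, "conj", {"Case": "Acc"}),
    Token(4, "gördüm", 0, "root"),
]

# The simple query that only inspects the dobj dependent itself misses the object.
naive = [t.form for t in tokens if t.deprel == "dobj" and t.feats.get("Case") == "Acc"]
found = [t.form for t in accusative_objects(tokens)]
```

With the last conjunct as head, the one-line `naive` query would already be enough; with the first conjunct as head, every user needs something like `last_conjunct`.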