when to annotate compound versus obj #1013
I am afraid that the borderline is not very well developed in the UD guidelines. My impression is (disclaimer: at the moment I don't have time to re-read the guidelines, so the impression may be wrong) that we essentially said that if people want to annotate LVCs as something special, they can go with Obviously, this would not be / is not a good guideline when to use it and when to stick with the normal I completely agree with the part that says that semantic compositionality vs. idiomaticity should not decide about UD syntactic relations. Which probably means that many such "compounds" are used wrongly and should be
I think the concept of light verb construction inherently relies on the verb being semantically light, and the weights of the verbs in these constructions are definitely graded. So, in general, it is really difficult to identify an intransitive Returning to the compound definition, I find it quite confusing. I think the guidelines want to restrict the choice based on morphosyntactic criteria, but allow some "language specificity", and it is not clear, at least to me, what exactly makes the positive examples in the documentation positive. From the examples, I get that noun-noun combinations without case marking should be compounds, at least for English. However, I also do not see 3 million dollar loan in the example having any level of non-compositional structure, which is given as the reason for the MWE relations including compounds. I think there is a need for a general rule/guideline here for determining what makes a compound more universally. Perhaps more important for the current thread, the documentation is mainly focused on noun compounds, not saying much about verbs (except referring to language-specific documentation). It is clear that there are some cross-linguistically similar cases. So, it would be nice to put these together as much as possible, and include them in the universal documentation.
I wonder then about why this isn't the case in English, e.g.:
In a world where those '?'s indicate that such expressions are disallowed, then by your suggestion, wouldn't give an order have order as a |
IMO the strongest case for Otherwise I'm not sure there is much to be gained by applying Taking a global perspective, I see your point that the definition of Savary et al. 2023 discuss UD and MWEs, and suggest that the |
There are quite a few types of N+V "compound" in Kyrgyz (and here I use the term to mean that the resulting meaning is not quite what one would expect if it were exactly semantically compositional, although this call is maybe hard to make in some instances, and it certainly isn't the deciding factor in UD). Here's a brief typology I came up with some years ago, with some examples:

N with possessive morphology
- N is 3rd person subject. Most of these are conjugated in 3rd person; the English subject often corresponds to the possessor of the noun, overt with genitive case.
- N is definite object. These seem to be either causatives of the above or just compositional?
- N is in other cases

N without possessive morphology
- "N" is mostly limited in use to compound
- "N" is ideophone
- N is indefinite object
- N is in other cases
Would we want Kyrgyz-specific guidelines for dealing with these, that involve |
Got it, makes sense.
It would be nice to know what the intention is / what the guidelines are, even if there's no exact definition currently.
Potentially there could be Kyrgyz-specific criteria, or perhaps they could just be annotated as objects if there is not anything that makes them strikingly different morphosyntactically from regular (nonidiomatic) objects.
Both are LVCs but with different senses of ORDER. I think that the contrast comes from the properties of these two different senses (ORDER2 is less countable). From the syntactic point of view, both constructions are I really agree with @dan-zeman and @nschneid that In the French treebanks, we have introduced the relation
It is a case where MWE annotation and syntactic annotation intersect, at least with UD annotation scheme which distinguish |
@sylvainkahane Interesting—I see the problem but must confess I find it counterintuitive to have obl:arg(besoin/NOUN, sous). From a basic UD perspective sous has to be either adnominal or adverbial, and this feels like it is trying to have it both ways (adnominal attachment but adverbial deprel). I think a noun is supposed to have |
In my current thinking, I'm very sympathetic to something like this approach, except:
So I'd go with I believe this is how I'd handle Yiddish ליב האָבן (lib hobn) "like", e.g. זײ האָבן ליב דעם הונט (zey hobn lib dem hunt), "they like the dog", with an accusative object in addition to ליב (lib). Or maybe this is okay to call
1 זײ _ PRON _ _ 2 nsubj _ _
2 האָבן _ VERB _ _ 0 root _ _
3 עס _ PRON _ _ 2 obj _ _
4 ניט _ ADV _ _ 2 advmod _ _
5 ליב _ PART _ _ 2 compound:lvc _ _
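The token lines in the Yiddish example can be checked mechanically. Below is a minimal sketch, not an official UD tool: it reads the five lines as shown in the thread and lists every (head, dependent) pair attached with compound:lvc. The names `ROWS`, `parse_rows`, and `lvc_pairs` are made up for this illustration, and the lines are split on whitespace only because the thread shows spaces; real CoNLL-U files are tab-separated.

```python
# Token lines as they appear in the thread (space-separated here;
# actual CoNLL-U uses tabs between the ten columns).
ROWS = [
    "1 זײ _ PRON _ _ 2 nsubj _ _",
    "2 האָבן _ VERB _ _ 0 root _ _",
    "3 עס _ PRON _ _ 2 obj _ _",
    "4 ניט _ ADV _ _ 2 advmod _ _",
    "5 ליב _ PART _ _ 2 compound:lvc _ _",
]

# The ten CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_rows(rows):
    """Turn token lines into dicts keyed by CoNLL-U column name."""
    tokens = []
    for row in rows:
        cols = row.split()
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {row!r}")
        token = dict(zip(FIELDS, cols))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        tokens.append(token)
    return tokens


def lvc_pairs(tokens):
    """Return (head form, dependent form) for every compound:lvc dependent."""
    by_id = {t["id"]: t for t in tokens}
    return [
        (by_id[t["head"]]["form"], t["form"])
        for t in tokens
        if t["deprel"] == "compound:lvc"
    ]


if __name__ == "__main__":
    for head, dep in lvc_pairs(parse_rows(ROWS)):
        print(head, "<-compound:lvc-", dep)
```

In this analysis the verb keeps its ordinary obj (token 3) while ליב hangs off it as compound:lvc, which is the "separate object in a productive slot" situation discussed here.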
I'm guessing the Yiddish idiom is analogous to English "have no idea" (see above), where the clause has a separate object in a productive slot? Some sort of |
I would probably use |
Why xcomp? Is lib a secondary predicate?
Yes, that would be the interpretation of the |
Are secondary predicates not more |
I am starting with a general consideration:
My maybe for many too radical solution would be to ditch
I do not really like this approach, because it mixes things. If besoin is annotated as an
I think that the solution is to acknowledge that have-verbs are auxiliaries exactly like be-verbs, and I suppose this is the insight behind such a mixed annotation. Then besoin would rightly be the head of the clause, but in any case this would not change the status of its
This still will not happen if a whole noun phrase appears in a copula. There might be
These are interesting cases. Though, is it not a little cherry-picking to base a "non-canonical" annotation for all occurrences of a given construction on outliers? Is it not possible to think of solutions for these cases specifically? For example:
I would favour the second solution, as it is the most straightforward and in line with other syntactic observations (also focusing, topicalisation, etc., so maybe even I also envision a third way, as discussed for the previous Yiddish example and maybe applicable to Turkish tercih et etc., i.e. secondary predication. It seems to me the correct way to represent what is happening there, especially when make-verbs are involved, for example "I make something (
TBC are you proposing a second copula (attaching as |
Yes, I am thinking of that. The difference would be that one copula is intransitive, the other transitive. Of course, this raw idea needs to be elaborated further, but I think it is promising. Anyway, it would not change the |
I could not follow the discussion for a while, apologies if some of these were discussed earlier. First, I think there is a difference between the Fortunately, there seem to be some tests. Some of them can probably be applied to most, if not all, languages. For Kyrgyz, I am sure we would at least find some guidelines that are valid for most (all?) Turkic languages. The tests I could collect are:
This is not an exhaustive list. Many (but not all) are also applicable to constructions with case marked and possessive nouns listed above. Probably we can find out/come up with more tests as well. Not all of these work on all cases, and there will definitely be leaks, but I don't think we are helpless in determining noun-verb compounds. If these constructions are rare in the language, and do not result in transitive verbs, the choice of treating them as verb-object constructions is maybe understandable. However, particularly for Turkish/Turkic I think this would make the analyses quite incoherent.
I am not so sure about the validity of these tests. To me they seem very often to depend on the semantics of these nouns. So for example it might be that English treats shower in a way, as an uncountable entity, while in Italian you can well say
fare 'to do/make' is also a rather weak verb in Italian, but I would not end up saying that "take a shower" in English is a compound, while in Italian it is not. They really look the same to me, but then English treats some nouns differently. The really important observation is about transitivity, as in your example
But then I do not understand what a compound
The treatment of etmek as an auxiliary seems to be shared by some lexicographic sources (e.g. Wiktionary, the first I could find). Such an annotation would be more in line with other constructions. It would avoid the overly wide range of
What does "auxiliary" mean though? In Turkic languages, it usually refers to a verb that occurs as part of a single predicate with a non-finite verb form, such as şarkı söyleyip durdu ‘they kept singing’.
|
Hm, I would have said that I would try to define an auxiliary/ From what I am seeing here, etmek seems to fit this description in that it is so "light" that it even "loses" its object in favour of the true lexical head. It is just a support to form a transitive predicate. If kabul means 'acceptance', kabul etmek is 'to accept (smth)'; if tercih means 'preference', tercih etmek is 'to prefer (smth)'. This looks really regular, and all semantics are carried by the This seems very much stronger than supposedly light verbs like it. fare 'to do/make', which always keeps its transitive structure with a noun, no matter what (but then, it can also act like a "causative auxiliary" with other verbs). etmek seems to have gone a step farther, becoming a functional element. Also in söyleyip durdu the verb durdu still contributes to the content. Maybe it is more in the background, but I would not call it a copula (yet). Then, interestingly, the trend that we observe is that grammatical functions are devolved to the more functional element, while the lexical one is a "less finite" form. But I would not say this is a necessary nor a sufficient condition, just a general trend (observed e.g. in articles retaining case distinctions more than nouns, and so on). * I know that |
[I am adding some more data to the main point. I'd be very happy to discuss some of the other matters above, but I am afraid it may cause too much diversion from the original issue.]

I do not think etmek in Turkish is I do not like the idea of analyzing telefon etmek (intransitive - dative argument) and tamir et 'repair' (transitive) differently. These are very similar (lexical) constructions. Distinguishing these two forms because one is transitive and the other is not produces inelegant analyses.

Also, the lexicalized/MWE use and the productive/syntactic use may both be available in some cases. For example:

(1) Bunlar birçok can ve mal kaybına neden olmaktadır. 'These cause many damages to life and property.' [BOUN ins_1502]
(2) Bunu (bir) neden olmadan yapamayız. 'We cannot do this without (having/being) a reason/justification.'

In (1) neden olmak is 'to cause' (intransitive - dative argument), and it is an MWE; in (2) neden 'reason' is the object of the verb olmak 'to be'. The structure is much more rigid in (1) than in (2), and even though the verbal compound in (1) would not take an accusative object (it would take a dative argument), neden in (1) is still not an object. In (2) neden is clearly the object. I do not think we should be annotating these the same way.

In short, I am pretty certain that these are MWEs, and should not be analyzed using usual syntactic dependencies (like
I would like to comment on these points, and I am convinced they are quite relevant in a discussion about
You are rather precisely giving the definition of auxiliary (
Maybe I am pedantic, and I do not know if this was an error, but the dependency is
But even "non-derivational" morphology brings about semantic changes: for example, the anchoring to a time ( An important aspect is regularity. A construction like etmek seems to be extremely predictable. At the same time, derived adjectives in en. -ous only transmit a vague relation to the noun base, so petalous has something to do with petals, but what exactly is left to context. Also, another thing is if there are real alternatives to etmek to form such predicates.
This really strengthens a functional reading of etmek. Is this not very similar to the evolution of -bil- and -yor-?
From the examples before, I understood that telefon etmek is transitive... can it be both or did I understand wrong?
From your description, I get that olmak is a copula, so the dependency here is
And here I understand that yapmak is a fully lexical verb (by the way, could it be substituted for etmek here?). I agree the two sentences are different constructions (a copular and a transitive one). But if we eventually find that etmek behaves more like olmak than yapmak, then they should be annotated the same (or an equivalent) way.
This is a crucial point. MWE annotation is a different level than syntax (see e.g. the work on PARSEME): it should not be allowed to percolate into it. In my opinion, doing so with relations like
I'll try.
I disagree, the UD documentation says
The issue with et- is that et- normally combines with nouns. Furthermore, syntactically et- does not have the effect of complementing the predicate with additional TAME.
Yes you were pedantic ;-) It was meant to be
I agree. And, I would be willing to "invent" a function (more than considering it a stand-alone predicate with one or more objects) in syntax to assign to et-, if it was very predictable. But it is not, and this structure/construction is not specific to et-, there are other non-verb constructions with similar behavior. et- (ele-, kıl- in other Turkic languages) turns out to be the most productive with respect to what they can combine with. However, for a syntactic/inflectional construction, we'd expect it to be less selective. You cannot just combine et- with any noun: *kitap et-, *masa et-, *bilgisayar et - at least at this point in time.
This is a good point, but bil- and -yor attach to predicates (verbs), and they add TAME features. So, they fit the bill for
For UD, telefon et- is intransitive. We do not telefon et- 'someone' but 'to someone' (it has a dative argument, which in UD is
I do not think ol- is a copula. In fact, I do not think it is ever a copula in modern Turkish (it is in some other Turkic languages). It has an auxiliary function, but in this function it always attaches to predicates. Otherwise, it is, as far as I can tell, the fully lexical verb 'to become'. The literal translation may have been correct at some point in time. It may also be the reason for the current usage. However, for a Turkish speaker, the copular construction corresponding to they are the reason for many damages to life and property would be Bunlar birçok can ve mal kaybına nedendirler. The natural copula is an affix, with a limited/marked usage of a form i-. If you use ol- instead of i- (or the suffix version), it would normally mean "they become the reason". Semantics make it difficult to construct it here, but there is also a reading for neden ol- as "to become the reason". For example, I could easily say that başarının neden-i oldu 'he/she became the reason for success' (unambiguously with the help of the accusative marker), and başarıya neden oldu is ambiguous between 'he/she became the reason for success' and 'he/she caused the success', but has no sense of 'he/she/it was the reason'.
Yes, yap- is a fully lexical verb (but it may also participate in similar constructions). And, if I understand the question correctly we cannot replace it with et-: *Bunu (bir) neden olmadan edemeyiz. at least not in standard Turkish (it may be acceptable in some dialects).
If neden ol-du was a proper copular construction ('was the/a reason'), we'd expect 'doktor ol-du' to also mean 'he/she was the/a doctor', but the second one is simply 'he became a doctor'. One more argument against ol- as copula: it can be passivized. neden ol-un-du is perfectly fine.
As I understand, |
Thanks for the comments and the discussion. They help a lot making the situation clearer! And I hope not to sound too grumpy in written form, it really is not the case 🙂
Here we are probably confusing the part of speech
It would be interesting to investigate the distribution of these other verbs. Anyway, selectivity becomes relevant only if etmek does appear in other contexts: but if nearly the totality of its appearances are in similar "copular constructions", then this would strengthen a treatment as I am not convinced about inventing new functions... how applicable could they be? Maybe a specific ones for "light verbs"?
Then ol- reminds me of the ambiguous status of fieri, also ca 'become', in Latin. OK, so ol- does not look like a copula, I was misled by the translation. I am curious about that transitive construction with an accusative marker, though. Maybe I am confused because here we are using an intransitive, copular construction with become to express it in English.
In a sense it is, and this is a problem when MWE are such more from a semantic than morphosyntactic perspective. I admit I also have problems with the extension to verbal particles, because this is
I do not think that dictionaries should be a decisive criterion, they follow different logics, in fact mixing morphosyntax with lexical levels.
The documentation on compound states that compound isn't needed just because the meaning is lexicalised or idiomatic, giving examples like make a decision.

However, the documentation on compound:lvc gives the example çile çektiler - literally "they endured suffering". This seems fairly compositional and non-idiomatic (unless you translate the verb more generally, e.g. "pulled suffering"). Why wouldn't an example like this be annotated with obj?

More generally, how can we tell when to use compound, and especially compound:lvc? The other two Turkish examples for compound:lvc seem reasonably like light verb constructions, since the verb et- really conveys very little lexical information in those examples. Is the criterion then about semantic content of the verb as compared to e.g. an object?

We're considering English examples like make money, make a decision, give permission, place an order (which feel like a continuum from more LVC-like to less LVC-like) as well as Kyrgyz examples like буюртма бер ("place an order", literally "give an order") as compared to уруксат бер ("give permission"), which differ in that the noun in the former one does not inflect and cannot have dependents, whereas the noun in the latter example can inflect and take dependents (*буюртмамды бердим, уруксатымды бердим). We are also contrasting these with idiomatic expressions consisting of a subject and a verb (e.g., башым айланды "I got dizzy", literally "my head spun").

The question is how to know whether/when to annotate such constructions literally (obj, nsubj) or as compounds (compound, compound:lvc).
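One practical way to see where the obj vs. compound:lvc borderline is currently unclear is to look for verb-noun lemma pairs that receive both labels. The following is only a rough sketch under stated assumptions: the triples in `PAIRS` are invented toy data (the lemmas come from this thread, the labels are hypothetical and not taken from any treebank), and `mixed_annotations` is a made-up helper name.

```python
from collections import defaultdict

# Hypothetical toy data: (verb lemma, noun lemma, deprel of the noun).
# Invented for illustration; not extracted from any real treebank.
PAIRS = [
    ("et", "kabul", "compound:lvc"),
    ("et", "tercih", "compound:lvc"),
    ("çek", "çile", "compound:lvc"),
    ("çek", "çile", "obj"),
    ("et", "telefon", "compound:lvc"),
]


def mixed_annotations(pairs):
    """Return {(verb, noun): sorted deprels} for pairs seen with more than one deprel."""
    seen = defaultdict(set)
    for verb, noun, deprel in pairs:
        seen[(verb, noun)].add(deprel)
    return {key: sorted(rels) for key, rels in seen.items() if len(rels) > 1}


if __name__ == "__main__":
    for (verb, noun), rels in mixed_annotations(PAIRS).items():
        print(noun, verb, "->", ", ".join(rels))
```

Pairs flagged this way are not necessarily errors (the thread notes that a lexicalized and a productive use can coexist), but they are good candidates for checking against whatever criteria are eventually agreed on.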