The topic of UD has been raised multiple times, e.g. in #2485, but mostly as "How soon will spaCy switch to Universal Dependencies (UD)?" My question is different, however:
Should spaCy transition to Universal Dependencies at all? 🤔
I've been comparing spaCy graphs with CoreNLP graphs for a while. Initially I found that it's trivial to get from some matched token to its governing verb (to check whether it's negated, its tense, etc.) in spaCy, and not so much in CoreNLP. Then I got a general bad feeling that the new approach would be harder and less performant to work with, at least for my tasks. And then I found this rabbit hole of UD vs DG (dependency grammar), a polarising topic amongst linguists.
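To make that concrete, here's roughly the kind of traversal I mean in today's spaCy. This is a minimal sketch, assuming the English en_core_web_sm model and its current (ClearNLP-style) label set; the example sentence and the helper logic are mine, not anything spaCy ships.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The server did not accept the connection.")

# Pretend "connection" is the token some matcher gave us.
token = next(t for t in doc if t.text == "connection")

# Walk up the head chain until we reach the governing verb (or the root).
verb = token
while verb.pos_ not in ("VERB", "AUX") and verb.head is not verb:
    verb = verb.head

# In the current English scheme, negation is a direct child with dep_ == "neg".
negated = any(child.dep_ == "neg" for child in verb.children)

# Tense sits either on the verb itself or on one of its auxiliaries.
tense = verb.morph.get("Tense") or [
    t for aux in verb.children if aux.dep_ in ("aux", "auxpass")
    for t in aux.morph.get("Tense")
]

print(verb.text, negated, tense)  # e.g. accept True ['Past']
```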
To simplify for non-specialists: UD puts semantics over grammar, and DG puts grammar over semantics. Imagine a Python parser favoring semantics over syntax... Sounds disturbing.
Here's an authoritative piece of research with solid counter-arguments against UD, "The status of function words in dependency grammar: A critique of Universal Dependencies":
The desire to subordinate function words to content words imposes a binary classification on all words; a given word is classified either as a function word or a content word. This is problematic, since the distinction between function and content word is not black and white. The distinction is, rather, more accurately captured in terms of a continuum, whereby prototypical function words and content words appear at opposite ends of the continuum, non-prototypical cases appearing somewhere on the continuum in-between.
Most of all, I'm concerned that this topic is discussed casually in other threads, as if it were no big deal, just a matter of some corpus refactoring 😨 I'm not a linguist, but my engineering experience is enough to see that UD is a huge breaking change. It also revisits foundations of linguistics that go back to, I dunno, 1980, for benefits mostly focused on language translation. Undoubtedly an important topic, but not all of linguistics and NLP boils down to that.
A migration to spaCy v4, should it be UD-based, might be very hard for larger systems. I imagine a lot of graph-traversal algorithms would have to be revisited and replaced. One potential solution would be to support both approaches in parallel, but I'm not sure the amount of work for that is tolerable.
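To illustrate what "revisited and replaced" might look like in practice, here's a hypothetical before/after sketch. The "current" side uses the labels of spaCy's English models (the ClearNLP/OntoNotes scheme); the "UD" side uses Universal Dependencies v2 relations. What labels a UD-based spaCy v4 would actually expose is purely my assumption.

```python
# "He sat on the mat": how does code get from the verb to "mat"?

def prep_objects_current(verb):
    # Current English scheme: the preposition heads its noun,
    # so the path is verb -> prep ("on") -> pobj ("mat").
    return [pobj
            for prep in verb.children if prep.dep_ == "prep"
            for pobj in prep.children if pobj.dep_ == "pobj"]

def prep_objects_ud(verb):
    # UD v2: the noun attaches directly to the verb as "obl" and the
    # preposition hangs off the noun as "case", so one level of the
    # traversal disappears and every caller written against
    # "prep"/"pobj" has to change.
    return [obl for obl in verb.children if obl.dep_ == "obl"]
```

Even small details shift: UD v2 dropped the dedicated neg relation, so the negation check from the earlier sketch would have to look for an advmod child carrying the Polarity=Neg feature instead.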
More resources:
Assessing Theoretical and Practical Issues of Universal Dependencies
UD are fundamentally flawed
"Universal Dependency (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing."
As an outsider, I can't speak to trends. Maybe UD is clearly winning mindshare and the matter is already decided in 2025. But from where I stand, it doesn't look like that at the moment. UD seems to be pushed primarily by Google and Stanford University.