The topic of UD has been raised multiple times, e.g. in #2485, but mostly as "How soon will spaCy switch to Universal Dependencies (UD)?" My question is different, however:
Should spaCy transition to Universal Dependencies at all? 🤔
I've been comparing spaCy graphs with CoreNLP graphs for a while. Initially I found that it's trivial to get from some matched token to its governing verb (to check whether it's negated, its tense, etc.) in spaCy, and not so much in CoreNLP. Then I got a general bad feeling that the new approach would be harder and less performant to work with, at least for my tasks. And then I found this rabbit hole of UD vs DG (dependency grammar), a polarising topic amongst linguists.
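To make that concrete, here's roughly the kind of traversal I mean in today's spaCy. This is a minimal sketch, assuming the English en_core_web_sm model and its current (ClearNLP-style) label set; the example sentence and the helper logic are mine, not anything spaCy ships.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The server did not accept the connection.")

# Pretend "connection" is the token some matcher gave us.
token = next(t for t in doc if t.text == "connection")

# Walk up the head chain until we reach the governing verb (or the root).
verb = token
while verb.pos_ not in ("VERB", "AUX") and verb.head is not verb:
    verb = verb.head

# In the current English scheme, negation is a direct child with dep_ == "neg".
negated = any(child.dep_ == "neg" for child in verb.children)

# Tense sits either on the verb itself or on one of its auxiliaries.
tense = verb.morph.get("Tense") or [
    t for aux in verb.children if aux.dep_ in ("aux", "auxpass")
    for t in aux.morph.get("Tense")
]

print(verb.text, negated, tense)  # e.g. accept True ['Past']
```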
To simplify for non-specialists: UD puts semantics over grammar, and DG puts grammar over semantics. Imagine a Python parser favoring semantics over syntax... Sounds disturbing.
Here's an authoritative piece of research with solid counter-arguments against UD, "The status of function words in dependency grammar: A critique of Universal Dependencies":
The desire to subordinate function words to content words imposes a binary classification on all words; a given word is classified either as a function word or a content word. This is problematic, since the distinction between function and content word is not black and white. The distinction is, rather, more accurately captured in terms of a continuum, whereby prototypical function words and content words appear at opposite ends of the continuum, non-prototypical cases appearing somewhere on the continuum in-between.
Most of all, I'm concerned that this topic is discussed casually in other threads, as if it were no big deal, just a matter of some corpus refactoring 😨 I'm not a linguist, but my engineering experience is enough to see that UD is a huge breaking change. It also revisits foundations of linguistics that go back to, I dunno, 1980, for benefits mostly focused on language translation. Undoubtedly an important topic, but not all of linguistics and NLP boils down to that.
A migration to spaCy v4, should it be UD-based, might be very hard for larger systems. I imagine a lot of graph-traversal algorithms would have to be revisited and replaced. One potential solution would be to support both approaches in parallel, but I'm not sure the amount of work for that is tolerable.
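To illustrate what "revisited and replaced" might look like in practice, here's a hypothetical before/after sketch. The "current" side uses the labels of spaCy's English models (the ClearNLP/OntoNotes scheme); the "UD" side uses Universal Dependencies v2 relations. What labels a UD-based spaCy v4 would actually expose is purely my assumption.

```python
# "He sat on the mat": how does code get from the verb to "mat"?

def prep_objects_current(verb):
    # Current English scheme: the preposition heads its noun,
    # so the path is verb -> prep ("on") -> pobj ("mat").
    return [pobj
            for prep in verb.children if prep.dep_ == "prep"
            for pobj in prep.children if pobj.dep_ == "pobj"]

def prep_objects_ud(verb):
    # UD v2: the noun attaches directly to the verb as "obl" and the
    # preposition hangs off the noun as "case", so one level of the
    # traversal disappears and every caller written against
    # "prep"/"pobj" has to change.
    return [obl for obl in verb.children if obl.dep_ == "obl"]
```

Even small details shift: UD v2 dropped the dedicated neg relation, so the negation check from the earlier sketch would have to look for an advmod child carrying the Polarity=Neg feature instead.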
More resources:
Assessing Theoretical and Practical Issues of Universal Dependencies
UD are fundamentally flawed
"Universal Dependency (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing."
As an outsider, I can't speak to trends. Maybe UD is clearly winning mindshare and the matter is already decided in 2025. But from where I stand, it doesn't look like that at the moment. UD seems to be pushed primarily by Google and Stanford University.