Add UP/dependencies white paper #51

Open
ahalterman opened this issue Oct 24, 2018 · 7 comments
Labels
critical must address before program functions

Comments

@ahalterman
Member

It would be really useful as we're debugging UP on Arabic (and for users more generally) to have a white paper or algorithm describing how UP uses the dependency parse, similar to the Petrarch2 white paper. Some questions that have come up on Monday and before include:

  • Does UP only look at the root verb of the sentence, or at all the verbs in the sentence? e.g.
  • Does UP determine the event and actor phrases first, and then check the dictionaries, or is there an iterative process of using the dictionaries to determine the spans?
  • How does UP decide which direct objects are part of the event and which are the target actor? (Consolidating the final work addressing some previous issues)
  • How does UP handle prepositional phrases as part of the event or target?
  • To what extent does UP rely on part-of-speech tags in addition to dependency parses? e.g.
  • Does UP match the longest/most specific found verb phrase or does it stop after it finds the first?
  • How does UP handle passive constructions?
@PTB-OEDA
Member

PTB-OEDA commented Oct 24, 2018 via email

@ahalterman
Member Author

ahalterman commented Dec 4, 2018

Just wanted to update this issue with the second round of documentation, along with my comments on it. Some questions to address:

  • Does UP have the ability to look at pre-verb words when using Petr2-style dictionaries?
  • Weird al-Shabaab coding: the documentation says it gets coded as IMGMUSALQUAF. That's actually not the right coding. @philip-schrodt has lots of examples of actor codes that are waaaay too long ("BRAELICHRCHRCTHGOVLABOPPPTY").
  • Incomplete matching: the documentation uses the example of "Gondor's main opposition group." Will UP stop after recognizing "opposition group"? That would be incorrect because it's leaving out important information. Or would it know to continue and also code "Gondor"?
  • "In the cases where the actor code overrides the agent code, duplicates are removed" (pg. 5). Can you explain what this means and give an example? How does it know when an actor code "overrides" an agent code?
  • Clarify the ordering of detecting noun phrases and detecting triplets (section 2.3.1)
  • "Ukraine ratified agreement" example (pg 6). Here’s an example of how Petrarch/event data’s conception of the triple differs from NLP and why making an event coder is such a difficult task. “{Ukraine, ratified, agreement}” is actually NOT a correct triple, as event data defines it. The triple should be {Ukraine, ratified agreement, European Union}. The “agreement” noun should be part of the event, and the target (actor) is the European Union. How does UP handle this sentence and similar sentences?
  • Explaining verb-verb interactions (pg. 8). How did you come up with these rules? What’s the effect of including them on the overall accuracy numbers?
  • Can you run an experiment over a small corpus with and without this transformation and extract some sentences where it changes the coding?
  • Clarify whether UP can actually code without PICO. If someone came up with a wholly new ontology, could they use UP to code it? Or can it only code CAMEO/PLOVER?
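
To make the "Ukraine ratified agreement" question concrete, here is a minimal sketch contrasting a generic NLP triple with an event-data triple. Everything in it is an assumption for illustration: the full sentence ("Ukraine ratified an agreement with the European Union"), the hand-written UD-style parse, the tiny `ACTORS` dictionary with placeholder codes, and the rule of folding non-actor objects into the event. It is not how UP or Petrarch2 actually work.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    text: str
    head: int     # idx of the governing token, 0 for the root
    deprel: str   # UD-style dependency relation


# Hand-written (assumed) parse of:
# "Ukraine ratified an agreement with the European Union"
SENTENCE = [
    Token(1, "Ukraine", 2, "nsubj"),
    Token(2, "ratified", 0, "root"),
    Token(3, "an", 4, "det"),
    Token(4, "agreement", 2, "obj"),
    Token(5, "with", 8, "case"),
    Token(6, "the", 8, "det"),
    Token(7, "European", 8, "amod"),
    Token(8, "Union", 4, "nmod"),
]

# Hypothetical actor dictionary; the codes are placeholders.
ACTORS = {"ukraine": "UKR", "european union": "EUR"}


def children(tokens, head, deprels):
    """Direct dependents of `head` with one of the given relations."""
    return [t for t in tokens if t.head == head and t.deprel in deprels]


def subtree_text(tokens, root_idx, skip=("det", "case")):
    """Surface text of a subtree, dropping determiners and case markers."""
    ids = {root_idx}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t.head in ids and t.idx not in ids:
                ids.add(t.idx)
                changed = True
    return " ".join(t.text for t in tokens if t.idx in ids and t.deprel not in skip)


def naive_triple(tokens):
    """Generic NLP triple: (subject, root verb, direct object)."""
    root = next(t for t in tokens if t.deprel == "root")
    subj = children(tokens, root.idx, {"nsubj"})[0]
    obj = children(tokens, root.idx, {"obj"})[0]
    return (subj.text, root.text, obj.text)


def event_triple(tokens, actors):
    """Event-data triple: non-actor objects join the event phrase and the
    search continues into their modifiers until a known actor is found."""
    root = next(t for t in tokens if t.deprel == "root")
    subj = children(tokens, root.idx, {"nsubj"})[0]
    event, target = [root.text], None
    pending = children(tokens, root.idx, {"obj", "obl"})
    while pending:
        node = pending.pop(0)
        phrase = subtree_text(tokens, node.idx)
        if phrase.lower() in actors:
            target = phrase
            break
        event.append(node.text)  # not an actor, so treat it as part of the event
        pending.extend(children(tokens, node.idx, {"nmod", "obl"}))
    return (subj.text, " ".join(event), target)


print(naive_triple(SENTENCE))
print(event_triple(SENTENCE, ACTORS))
```

On this toy parse the naive reading yields {Ukraine, ratified, agreement}, while the dictionary-aware reading yields {Ukraine, ratified agreement, European Union}. Real sentences place the target in many other positions, so a single rule like this one would not generalize.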

ud_petrarch_documentation-AH.pdf

@ahalterman
Member Author

Updated documentation. I've marked the issues above that were resolved, but most are still outstanding.

The documentation addressed two issues (the al-Shabaab and "Gondor opposition" examples), describing very different behavior from the previous version of the documentation. I don't see any changes to the code, though. Has the code been updated to reflect the new behavior described in the documentation?

ud_petrarch_documentation_v3.pdf

@PTB-OEDA
Member

PTB-OEDA commented Dec 6, 2018 via email

@ahalterman
Member Author

Or more importantly, more code!

@ahalterman
Member Author

I'll also emphasize that the “{Ukraine, ratified, agreement}” problem still requires comments (and major work on the coder). This is a fundamental problem to overcome.

@philip-schrodt
Contributor

Concur with Andy's comment: situations like that are why event coding is a problem distinct from the standard "event-triple" NLP issue, as numerous projects vastly larger and better funded than ours have learned to their dismay over the past three decades when they tried to adapt generic NLP software to do event coding. Depending on context, the target of an action can be in a variety of different places in the sentence/parse structure. (The source is usually the subject, though depending on the clause structure of the sentence, sometimes not even that is true.) That's why verb dictionaries have 10K or so distinct patterns, and why outfits like Raytheon/BBN developed separate software just for event coding even though they had long ago developed NLP software for just extracting triples.

@ahalterman ahalterman added the critical must address before program functions label Jan 25, 2019