Add UP/dependencies white paper #51

ahalterman · 2018-10-24T17:46:30Z

It would be really useful as we're debugging UP on Arabic (and for users more generally) to have a white paper or algorithm describing how UP uses the dependency parse, similar to the Petrarch2 white paper. Some questions that have come up on Monday and before include:

- Does UP only look at the root verb of the sentence, or at all the verbs in the sentence? e.g. <https://github.com/openeventdata/UniversalPetrarch/blob/1fe09dd35d36ce1f850925aa9f1dfbad8960e8b1/UniversalPetrarch/PETRgraph.py#L632>
- Does UP determine the event and actor phrases first, and then check the dictionaries, or is there an iterative process of using the dictionaries to determine the spans?
- How does UP decide which direct objects are part of the event and which are the target actor? (Consolidating the final work addressing some previous issues)
- How does UP handle prepositional phrases as part of the event or target?
- To what extent does UP rely on part-of-speech tags in addition to dependency parses? e.g. <https://github.com/openeventdata/UniversalPetrarch/blob/1fe09dd35d36ce1f850925aa9f1dfbad8960e8b1/UniversalPetrarch/PETRgraph.py#L632>
- Does UP match the longest/most specific found verb phrase or does it stop after it finds the first?
- How does UP handle passive constructions?
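
To make a few of these questions concrete, here is a minimal sketch of the distinctions being asked about. It is not UP's code and does not describe UP's behavior. The `Token` type, the helper names, the example sentence, and its hand-written Universal Dependencies parse are all assumptions made for illustration. It shows "root verb only" versus "all verbs", and one way a passive construction could be detected from dependency labels rather than from part-of-speech tags alone.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based token position
    form: str    # surface word
    upos: str    # universal part-of-speech tag
    head: int    # idx of the syntactic head (0 means sentence root)
    deprel: str  # Universal Dependencies relation to the head


# Hand-written parse (an assumption, not real parser output) of:
# "Rebels attacked the town and were repelled by troops."
SENTENCE = [
    Token(1, "Rebels", "NOUN", 2, "nsubj"),
    Token(2, "attacked", "VERB", 0, "root"),
    Token(3, "the", "DET", 4, "det"),
    Token(4, "town", "NOUN", 2, "obj"),
    Token(5, "and", "CCONJ", 7, "cc"),
    Token(6, "were", "AUX", 7, "aux:pass"),
    Token(7, "repelled", "VERB", 2, "conj"),
    Token(8, "by", "ADP", 9, "case"),
    Token(9, "troops", "NOUN", 7, "obl:agent"),
    Token(10, ".", "PUNCT", 2, "punct"),
]


def root_verbs(tokens: List[Token]) -> List[Token]:
    """Verbs selected by dependency label only: the sentence root."""
    return [t for t in tokens if t.deprel == "root" and t.upos == "VERB"]


def all_verbs(tokens: List[Token]) -> List[Token]:
    """Verbs selected by part-of-speech tag only: every VERB token."""
    return [t for t in tokens if t.upos == "VERB"]


def is_passive(tokens: List[Token], verb: Token) -> bool:
    """A verb is treated as passive if it has a passive auxiliary or subject."""
    return any(
        t.head == verb.idx and t.deprel in {"aux:pass", "nsubj:pass"}
        for t in tokens
    )


print([t.form for t in root_verbs(SENTENCE)])
print([t.form for t in all_verbs(SENTENCE)])
print([t.form for t in all_verbs(SENTENCE) if is_passive(SENTENCE, t)])
```

A coder that looks only at the root would see "attacked" and miss the second, passive event headed by "repelled", which is the kind of behavior the white paper should spell out.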

PTB-OEDA · 2018-10-24T20:26:07Z

Agreed. We need a document that does this ASAP.

-- Patrick T. Brandt Professor Political Science School of Economic, Political and Policy Sciences University of Texas at Dallas Personal site: http://www.utdallas.edu/~pbrandt MSBVAR site: http://yule.utdallas.edu

ahalterman · 2018-12-04T14:24:13Z

Just wanted to update this issue with the second round of documentation, along with my comments on it. Some questions to address:

- Does UP have the ability to look at pre-verb words when using Petr2-style dictionaries?
- Weird al-Shabaab coding: the documentation says it gets coded as IMGMUSALQUAF. That's actually not the right coding. @philip-schrodt has lots of examples of actor codes that are waaaay too long ("BRAELICHRCHRCTHGOVLABOPPPTY").
- Incomplete matching: the documentation uses the example of "Gondor's main opposition group." Will UP stop after recognizing "opposition group"? That would be incorrect because it's leaving out important information. Or would it know to continue and also code "Gondor"?
- "In the cases where the actor code overrides the agent code, duplicates are removed" (pg. 5). Can you explain what this means and give an example? How does it know when an actor code "overrides" an agent code?
- Clarify the ordering of detecting noun phrases and detecting triplets (section 2.3.1)
- "Ukraine ratified agreement" example (pg 6). Here’s an example of how Petrarch/event data’s conception of the triple differs from NLP and why making an event coder is such a difficult task. “{Ukraine, ratified, agreement}” is actually NOT a correct triple, as event data defines it. The triple should be {Ukraine, ratified agreement, European Union}. The “agreement” noun should be part of the event, and the target (actor) is the European Union. How does UP handle this sentence and similar sentences?
- Explaining verb-verb interactions (pg. 8). How did you come up with these rules? What’s the effect of including them on the overall accuracy numbers?
- Can you run an experiment over a small corpus with and without this transformation and extract some sentences where it changes the coding?
- Clarify whether UP can actually code without PICO. If someone came up with a wholly new ontology, could they use UP to code it? Or can it only code CAMEO/PLOVER?
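
For the "Ukraine ratified agreement" item, the difference between the two kinds of triple can be sketched as follows. This is a toy illustration under stated assumptions, not a description of how UP works. The full sentence ("Ukraine ratified an agreement with the European Union"), its hand-written dependency parse, the two-entry `ACTORS` set standing in for an actor dictionary, and all function names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Token(NamedTuple):
    idx: int     # 1-based token position
    form: str    # surface word
    upos: str    # universal part-of-speech tag
    head: int    # idx of the syntactic head (0 means sentence root)
    deprel: str  # Universal Dependencies relation to the head


# Hand-written parse (an assumption) of an assumed sentence:
# "Ukraine ratified an agreement with the European Union"
SENTENCE = [
    Token(1, "Ukraine", "PROPN", 2, "nsubj"),
    Token(2, "ratified", "VERB", 0, "root"),
    Token(3, "an", "DET", 4, "det"),
    Token(4, "agreement", "NOUN", 2, "obj"),
    Token(5, "with", "ADP", 8, "case"),
    Token(6, "the", "DET", 8, "det"),
    Token(7, "European", "ADJ", 8, "amod"),
    Token(8, "Union", "PROPN", 4, "nmod"),
]

# Toy stand-in for an actor dictionary (hypothetical).
ACTORS = {"Ukraine", "European Union"}


def children(tokens: List[Token], head: int, deprels: Set[str]) -> List[Token]:
    return [t for t in tokens if t.head == head and t.deprel in deprels]


def phrase(tokens: List[Token], head: Token) -> str:
    """Head word plus its adjectival/compound modifiers, in surface order."""
    mods = children(tokens, head.idx, {"amod", "compound", "flat"})
    return " ".join(t.form for t in sorted(mods + [head], key=lambda t: t.idx))


def naive_triple(tokens: List[Token]) -> Tuple[str, str, str]:
    """Generic NLP triple: (subject, root verb, direct object)."""
    root = next(t for t in tokens if t.deprel == "root")
    subj = children(tokens, root.idx, {"nsubj"})[0]
    obj = children(tokens, root.idx, {"obj"})[0]
    return (phrase(tokens, subj), root.form, phrase(tokens, obj))


def event_triple(tokens: List[Token], actors: Set[str]) -> Tuple[str, str, Optional[str]]:
    """Event-data triple: a non-actor object joins the event phrase."""
    root = next(t for t in tokens if t.deprel == "root")
    subj = children(tokens, root.idx, {"nsubj"})[0]
    obj = children(tokens, root.idx, {"obj"})[0]
    event, target = root.form, phrase(tokens, obj)
    if target not in actors:
        # The object is not an actor: fold it into the event and look for
        # the target among prepositional dependents of the object or verb.
        event, target = f"{root.form} {obj.form}", None
        candidates = children(tokens, obj.idx, {"nmod"}) + children(tokens, root.idx, {"obl"})
        for cand in candidates:
            if phrase(tokens, cand) in actors:
                target = phrase(tokens, cand)
                break
    return (phrase(tokens, subj), event, target)


print(naive_triple(SENTENCE))
print(event_triple(SENTENCE, ACTORS))
```

The first call yields the generic {Ukraine, ratified, agreement} triple; the second yields {Ukraine, ratified agreement, European Union}. Real sentences will not be this tidy, so the documentation needs to state which rules UP actually applies here.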

ud_petrarch_documentation-AH.pdf

ahalterman · 2018-12-06T16:28:57Z

Updated documentation. I've marked the issues above that were resolved, but most are still outstanding.

The documentation addressed two issues (the al-Shabaab and "Gondor opposition" examples), describing very different behavior from the previous version of the documentation. I don't see any changes to the code, though. Has the code been updated to reflect the new behavior described in the documentation?

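ud_petrarch_documentation_v3.pdf <https://github.com/openeventdata/UniversalPetrarch/files/2653914/ud_petrarch_documentation_v3.pdf>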

PTB-OEDA · 2018-12-06T17:09:37Z

Agreed. This needs more documentation.

ahalterman · 2018-12-06T17:24:41Z

Or more importantly, more code!

ahalterman · 2018-12-06T18:24:01Z

I'll also emphasize that the “{Ukraine, ratified, agreement}” problem still requires comments (and major work on the coder). This is a fundamental problem to overcome.

philip-schrodt · 2018-12-06T18:31:24Z

Concur with Andy's comment: situations like that are why event coding is a problem distinct from the standard "event-triple" NLP issue, as numerous projects vastly larger and better funded than ours have learned to their dismay over the past three decades when they tried to adapt generic NLP software to do event coding. Depending on context, the target of an action can be in a variety of different places in the sentence/parse structure. (The source is usually the subject, though depending on the clause structure of the sentence, sometimes not even that is true. But usually it is.) That's why verb dictionaries have 10K or so distinct patterns, and why outfits like Raytheon/BBN developed separate software just for event coding even though they had long ago developed NLP software for just extracting triples.

ahalterman mentioned this issue Dec 6, 2018: Revise and update documentation #25 (Open)

ahalterman added the critical must address before program functions label Jan 25, 2019
