Make UniversalPetrarch pipeline compatible #43

Open
ahalterman opened this issue Jul 29, 2018 · 3 comments
Labels
critical must address before program functions

Comments

@ahalterman
Member

Now that UP is a little more stable, we need to start thinking about making it usable in production pipelines. In order for people (specifically the Spanish and Arabic teams) to be able to produce event data, UP needs to fit into our existing pipelines. This requires a few things:

  1. making UDPipe consume the JSON/Mongo format that the OEDA pipelines use, rather than XML
  2. writing custom code to fit UDPipe into, e.g., the stanford_pipeline. It should output OEDA-formatted JSON to store back in Mongo.
  3. greatly simplifying the UDPipe installation process and ensuring that the correct versions are used.
  4. making sure UniversalPetrarch can take in JSON. I added code to do this (see test code here), but this should be tested with the actual output from 1 and 2.
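
To make items 1 and 2 concrete, here is a rough sketch of the shape such an adapter could take: read a story document, attach one parse per sentence, and return a document that can be written back to Mongo. The field names (`content`, `parsed_sents`), the naive sentence split, and the `fake_parse` stand-in are all assumptions for illustration; the actual OEDA schema and the real UDPipe call would replace them.

```python
import json
from typing import Callable, Dict

# Hypothetical field names: the real OEDA schema may differ.
TEXT_FIELD = "content"
PARSE_FIELD = "parsed_sents"


def annotate_story(story: Dict, parse: Callable[[str], str]) -> Dict:
    """Return a copy of `story` with one parse attached per sentence."""
    # Naive split on periods, for illustration only; a real pipeline
    # would rely on the parser's own sentence segmenter.
    sentences = [s.strip() for s in story[TEXT_FIELD].split(".") if s.strip()]
    annotated = dict(story)  # shallow copy so the input document is untouched
    annotated[PARSE_FIELD] = [
        {"sent_id": i, "text": s, "conllu": parse(s)}
        for i, s in enumerate(sentences)
    ]
    return annotated


def fake_parse(sentence: str) -> str:
    """Stand-in for the real parser: emits 'index<TAB>token' lines."""
    return "\n".join(
        f"{i}\t{tok}" for i, tok in enumerate(sentence.split(), start=1)
    )


if __name__ == "__main__":
    story = {"_id": "abc123", "content": "Rebels attacked the town. The army responded."}
    print(json.dumps(annotate_story(story, fake_parse), indent=2))
```

Passing the parser in as a function keeps the JSON handling testable without UDPipe installed, which may help when checking item 4 against the real output of 1 and 2.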
@PTB-OEDA
Member

@JingL1014, didn't we already do this with Sayeed? Did that code get pushed back here yet?

@ahalterman
Member Author

Following up on this. Are any of these (1, 2, 3, 4) complete? This is necessary for us to produce Arabic event data.

@ahalterman ahalterman added the critical must address before program functions label Jan 25, 2019
@Sayeedsalam

  1. We have updated UniversalPetrarch to consume JSON-formatted data. The UD-Petrarch coder for English is running side by side with the Petrarch2 event coder in the SPEC Pipeline.

  2. We are not using the stanford-pipeline project to generate event coding from raw text data. Instead, we are running our own distributed framework, so we haven't tested adding UD-Petrarch to that pipeline.

  3. We currently use the ufal.udpipe package for Python to do the parsing: https://pypi.org/project/ufal.udpipe. To use the parser, we need to install the package and download language-specific model files, both of which can be automated (i.e., using requirements.txt and some programming).

  4. Yes, we are already using UD-Petrarch to code the English sentences, and both input and output are in JSON format. Any incompatibility with the OEDA format can be addressed. We are using MongoDB to store that data.
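
For reference, a minimal sketch of the parsing step in item 3 using the `ufal.udpipe` Python bindings. It assumes the package is installed and a language-specific model file has already been downloaded; the model path is a placeholder, and `parse_to_conllu` and `conllu_to_sentences` are illustrative helper names, not code from the SPEC pipeline.

```python
from typing import Dict, List

# The ten standard CoNLL-U columns, in order.
CONLLU_COLUMNS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_to_conllu(model_path: str, text: str) -> str:
    """Tokenize, tag, and parse raw text with UDPipe; return CoNLL-U."""
    # Imported lazily so the helper below works without the package.
    from ufal.udpipe import Model, Pipeline, ProcessingError

    model = Model.load(model_path)  # returns None if loading fails
    if model is None:
        raise RuntimeError(f"could not load UDPipe model from {model_path}")
    pipeline = Pipeline(
        model, "tokenize", Pipeline.DEFAULT, Pipeline.DEFAULT, "conllu"
    )
    error = ProcessingError()
    output = pipeline.process(text, error)
    if error.occurred():
        raise RuntimeError(error.message)
    return output


def conllu_to_sentences(conllu: str) -> List[List[Dict[str, str]]]:
    """Turn CoNLL-U text into JSON-friendly lists of token dicts."""
    sentences, current = [], []
    for line in conllu.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            # Skip comment lines; map the tab-separated fields to names.
            current.append(dict(zip(CONLLU_COLUMNS, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences
```

With a model on disk, usage would look like `conllu_to_sentences(parse_to_conllu("path/to/model.udpipe", "Some text."))`, and the resulting lists of dicts can be serialized to JSON or stored in MongoDB directly.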
