Work-in-progress RDF/XML parser #9

Merged
merged 50 commits into from
Jun 13, 2019

Contributor

@althonos althonos commented May 22, 2019

Hi!

This is a work-in-progress branch, so it's not ready to merge right now, but this is how far I am currently with an RDF/XML parser. It correctly parses almost all of the RDF/XML 1.1 examples, and I still need to add the tests from rdf-test.

I'll also feature-gate the parser behind an XML feature since it requires an additional dependency (quick-xml).
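
For reference, a minimal sketch of what such a gate could look like. The feature name `xml`, the module layout, and the helper `xml_support_enabled` are assumptions for illustration, not necessarily what this branch ends up using; the Cargo.toml part is shown as a comment, with the version left as a placeholder.

```rust
// Cargo.toml side (sketch; feature name and version are placeholders):
//
//   [features]
//   xml = ["quick-xml"]
//
//   [dependencies]
//   quick-xml = { version = "x.y", optional = true }

/// Compiled only when the (hypothetical) `xml` feature is enabled,
/// so the optional dependency is never pulled in otherwise.
#[cfg(feature = "xml")]
pub mod xml {
    /// Placeholder standing in for the real parser entry point.
    pub fn is_available() -> bool {
        true
    }
}

/// Reports whether the crate was built with RDF/XML support.
/// `cfg!` is evaluated at compile time and expands to `true` or `false`.
pub fn xml_support_enabled() -> bool {
    cfg!(feature = "xml")
}
```

Users who want the parser would then opt in with `--features xml`; everyone else skips the extra dependency.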

Missing features

  • Proper error reporting (currently everything panics, but I'm not sure I want to work on this before "Move from error-chain to thiserror for error management" (#8) is resolved)
  • rdf:parseType="Literal" (this is likely to need a PR in quick-xml)
  • rdf:parseType="Collection"
  • Statement reification
  • Validation of element names
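
To make the `rdf:parseType="Collection"` item concrete: per the RDF/XML specification, a collection property expands into an `rdf:first`/`rdf:rest` linked list terminated by `rdf:nil`. The sketch below illustrates that expansion only; it is not the code in this branch. Terms are plain strings, and the function name `expand_collection` and the `_:l0`, `_:l1`, ... blank node labels are hypothetical (a real parser would have to generate fresh blank nodes for every collection).

```rust
const RDF_FIRST: &str = "http://www.w3.org/1999/02/22-rdf-syntax-ns#first";
const RDF_REST: &str = "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest";
const RDF_NIL: &str = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil";

/// Simplified triple: (subject, predicate, object) as plain strings.
type Triple = (String, String, String);

/// Expands `subject predicate (item1 item2 ...)` into list triples.
pub fn expand_collection(subject: &str, predicate: &str, items: &[&str]) -> Vec<Triple> {
    let mut triples = Vec::new();

    // An empty collection is just a link to rdf:nil.
    if items.is_empty() {
        triples.push((subject.to_string(), predicate.to_string(), RDF_NIL.to_string()));
        return triples;
    }

    // The property points at the first list cell.
    triples.push((subject.to_string(), predicate.to_string(), "_:l0".to_string()));

    for (i, item) in items.iter().enumerate() {
        let cell = format!("_:l{}", i);
        // Each cell holds one item...
        triples.push((cell.clone(), RDF_FIRST.to_string(), item.to_string()));
        // ...and links to the next cell, or to rdf:nil at the end.
        let rest = if i + 1 < items.len() {
            format!("_:l{}", i + 1)
        } else {
            RDF_NIL.to_string()
        };
        triples.push((cell, RDF_REST.to_string(), rest));
    }
    triples
}
```

A collection of n items therefore yields 2n + 1 triples, or a single triple when it is empty.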

@althonos
Contributor Author

cc @phillord if you want to see where this is going

@Tpt
Contributor

Tpt commented Jun 5, 2019

Just to make sure you are aware: rudf provides a simple RDF/XML parser: https://github.com/Tpt/rudf/blob/master/lib/src/rio/xml.rs

@phillord

phillord commented Jun 5, 2019

@althonos
Contributor Author

althonos commented Jun 6, 2019

@Tpt: mine, however, supports parseType="Collection" 😉

althonos added 24 commits June 6, 2019 18:59
@althonos
Contributor Author

althonos commented Jun 7, 2019

Finally, this is feature-complete! I still need to do a bit of refactoring, in particular to reduce code duplication and complexity, but this version behaves correctly against the RDF/XML test suite, including the negative tests (it fails where a failure is expected). The only exception is the parseType="Literal" feature, which is not supported by the underlying quick-xml library, but I opened an issue there to request it.

Streaming (i.e. iterating over the produced triples and unwrapping each result) the Gene Ontology (go.owl) takes about 5 to 10 seconds on my machine, but there are probably still some optimisations to be carried out.
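
For anyone wanting to reproduce this kind of measurement, here is a rough sketch of the loop, assuming the parser exposes an iterator of `Result`s. The `Triple` alias and the `drain_and_time` helper are hypothetical stand-ins, not the actual types in this branch.

```rust
use std::time::{Duration, Instant};

/// Simplified triple: (subject, predicate, object) as plain strings.
type Triple = (String, String, String);

/// Consumes a stream of parse results, unwrapping each one, and
/// returns how many triples were produced and how long it took.
/// Panics on the first error, which is what "unwrapping" implies.
pub fn drain_and_time<I, E>(triples: I) -> (usize, Duration)
where
    I: Iterator<Item = Result<Triple, E>>,
    E: std::fmt::Debug,
{
    let start = Instant::now();
    let mut count = 0;
    for result in triples {
        // The triple is dropped immediately: nothing is collected,
        // so memory use stays flat regardless of input size.
        let _triple = result.unwrap();
        count += 1;
    }
    (count, start.elapsed())
}
```

Since nothing is stored, the timing reflects parsing cost alone rather than the cost of building a graph.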

Owner

@pchampin pchampin left a comment

Thank you very much, it looks really good.
I suggested a few changes. Some of them (error chaining vs. linking, regex vs. pest) are more discussion prompts than requirements -- I'm ok with merging the PR even if they are left as is.

@pchampin
Owner

I know that last commit cost you 😉, much appreciated.
I'll take over to please Travis, and do the merge.
Thanks again

@pchampin pchampin merged commit abc5888 into pchampin:master Jun 13, 2019
@althonos
Contributor Author

@pchampin: my only regret is not being able to add better error reports. I should have used xmlparser instead of quick-xml to get input spans and better error reporting. I may experiment to see how easy it is to replace it! 😉
