Improve error reporting when parsing RDF #7

Open
GoogleCodeExporter opened this issue Jul 31, 2015 · 4 comments

Comments

@GoogleCodeExporter

Certain errors are swallowed silently when Foresite parses RDF/XML. It
appears that some syntax errors (e.g. illegal or not-well-formed URIs) in
ORE RDF/XML are not detected eagerly; they surface only when Jena
attempts to read that portion of the graph.

For example, calling JenaOREParser.parse(InputStream is) with the attached
ReM returns null, with no errors reported.  If
JenaOREParser.parse(InputStream is) is modified to write out the model
prior to creating the ReM, the error is reported.  It would be nice if
Foresite/Jena could eagerly parse a ReM and report errors without silently
swallowing them.

After modifying JenaOREParser.parse(InputStream is) to write out the model,
I was able to see the error:
ERROR [main]: datapub.mapping.ForesiteOreRemMapper@104 2009-12-14
11:08:11,587 ORE ReM parsing failed: Only well-formed absolute URIrefs can
be included in RDF/XML output: <info://figure:figure1> Code:
0/ILLEGAL_CHARACTER in PORT: The character violates the grammar rules for
URIs/IRIs. 
com.hp.hpl.jena.shared.BadURIException: Only well-formed absolute URIrefs
can be included in RDF/XML output: <info://figure:figure1> Code:
0/ILLEGAL_CHARACTER in PORT: The character violates the grammar rules for
URIs/IRIs.
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.checkURI(BaseXMLWriter.java:768)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.relativize(BaseXMLWriter.java:745)
    at com.hp.hpl.jena.xmloutput.impl.Basic.writeResourceReference(Basic.java:154)
    at com.hp.hpl.jena.xmloutput.impl.Basic.writePredicate(Basic.java:101)
    at com.hp.hpl.jena.xmloutput.impl.Basic.writeRDFStatements(Basic.java:77)
    at com.hp.hpl.jena.xmloutput.impl.Basic.writeRDFStatements(Basic.java:66)
    at com.hp.hpl.jena.xmloutput.impl.Basic.writeBody(Basic.java:40)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.writeXMLBody(BaseXMLWriter.java:452)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:424)
    at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:410)
    at com.hp.hpl.jena.rdf.model.impl.ModelCom.write(ModelCom.java:270)
    at org.dspace.foresite.jena.JenaOREParser.parse(JenaOREParser.java:70)
    at edu.jhu.library.datapub.mapping.ForesiteOreRemMapper.fromPublisherRem(ForesiteOreRemMapper.java:97)
    at edu.jhu.library.datapub.mapping.MapperTest.testSimplePublisherMapping(MapperTest.java:78)
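
To illustrate what the "ILLEGAL_CHARACTER in PORT" message is complaining about, here is a minimal sketch in plain Java, independent of Jena. It assumes the generic URI syntax of RFC 3986, where an authority may end in `:port` and the port may contain digits only. The names `PortCheckSketch` and `hasLegalPort` are hypothetical, and this is not how Jena's IRI checker is implemented; it only shows why `figure1` is rejected as a port in `info://figure:figure1`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PortCheckSketch {
    // Component-splitting regex from RFC 3986, Appendix B; group 4 is the authority.
    private static final Pattern URI_PARTS = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns true if the URI has no port, an empty port, or a port made only of digits. */
    static boolean hasLegalPort(String uri) {
        Matcher m = URI_PARTS.matcher(uri);
        if (!m.matches() || m.group(4) == null) {
            return true; // no authority component, so no port to check
        }
        String authority = m.group(4);
        // Drop optional userinfo ("user@host").
        String hostPort = authority.substring(authority.lastIndexOf('@') + 1);
        // Skip a bracketed IPv6 literal so its colons are not mistaken for the port separator.
        int searchFrom = hostPort.startsWith("[") ? hostPort.indexOf(']') + 1 : 0;
        int colon = hostPort.indexOf(':', searchFrom);
        if (colon < 0) {
            return true; // no port present
        }
        String port = hostPort.substring(colon + 1);
        return port.chars().allMatch(c -> c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        System.out.println(hasLegalPort("info://figure:figure1"));     // false
        System.out.println(hasLegalPort("http://example.org:8080/x")); // true
    }
}
```

In the failing URI, everything after `//` is read as an authority, so `figure` becomes the host and `figure1` the port.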


Original issue reported on code.google.com by emets...@gmail.com on 14 Dec 2009 at 4:20

Attachments:

@GoogleCodeExporter

Attaching correct file.

Original comment by emets...@gmail.com on 14 Dec 2009 at 4:24

Attachments:

@GoogleCodeExporter

Original comment by azarot...@gmail.com on 15 Dec 2009 at 3:50

  • Changed state: Accepted

@GoogleCodeExporter

Test case added to the repository. I will accept a patch if one is available.

Original comment by azarot...@gmail.com on 15 Dec 2009 at 4:05

@GoogleCodeExporter

Thanks for accepting! I'm not sure yet what a good solution for this issue is.
I suspect that a complete solution would require a bit of coding. It would be
nice to take a ReM and check to see that it conforms to the assertions stated in
http://www.openarchives.org/ore/1.0/datamodel (e.g. the MUSTs in sections 3, 4).
I know that some are covered explicitly by Foresite (e.g. the protocol
requirement in URIs) but it seems that others are not (they are implicitly
covered by Jena).

That said, what I did as a hack was to add the following to the
ResourceMap parse(InputStream is) method of JenaOREParser:
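Model model = this.parseToModel( is );
// Serialize the model to a null output stream so that errors such as
// the BadURIException above are raised here, before the ReM is created
model.write( new OutputStream()
{
    @Override
    public void write( int b ) throws IOException
    {
        // do nothing: discard the serialized bytes
    }
} );
...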

I don't think this is the best solution, and I'm not sure what the performance
implications are. I'll think on this some more. I'm wondering if there are some
options to pass to the Jena ARP parser which may help, but I haven't explored
them in detail:
http://jena.sourceforge.net/ARP/standalone.html
http://jena.sourceforge.net/IO/iohowto.html#input
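
The hack above boils down to forcing a full serialization pass and discarding the bytes, so that any latent error is thrown at parse time instead of later. Below is a minimal sketch of that pattern, independent of Jena and Foresite. `LazyModel` is a hypothetical stand-in for a parsed model whose write method may throw, and `forceValidation` and `firstError` are illustrative helper names. `OutputStream.nullOutputStream()` requires Java 11 or later; on older JVMs an anonymous no-op OutputStream like the one above should do the same job.

```java
import java.io.IOException;
import java.io.OutputStream;

final class EagerCheckSketch {
    /** Hypothetical stand-in for a parsed model whose errors only surface when it is written. */
    interface LazyModel {
        void write(OutputStream out) throws IOException;
    }

    /** Serializes the model into a sink that discards every byte; any error reaches the caller. */
    static void forceValidation(LazyModel model) throws IOException {
        try (OutputStream sink = OutputStream.nullOutputStream()) {
            model.write(sink);
        }
    }

    /** Returns the message of the error raised while writing, or null if writing succeeded. */
    static String firstError(LazyModel model) {
        try {
            forceValidation(model);
            return null;
        } catch (IOException | RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        LazyModel good = out -> out.write("<rdf/>".getBytes());
        LazyModel bad = out -> {
            throw new IllegalStateException("bad URI: <info://figure:figure1>");
        };
        System.out.println(firstError(good)); // null
        System.out.println(firstError(bad));  // bad URI: <info://figure:figure1>
    }
}
```

The cost is one extra traversal of the model per parse, which is the performance question raised above; the sketch does not measure it.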


Original comment by emets...@gmail.com on 15 Dec 2009 at 4:58
