IRI with space #780

Closed
skodapetr opened this issue Mar 13, 2017 · 4 comments
@skodapetr
Contributor

The file input.ttl contains an IRI with encoded spaces:

_:b25978837	a <http://purl.bioontology.org/ontology/UATC/\u0020SERINE\u0020\u0020> .

The Turtle parser reads this IRI (with encoded spaces) and accepts it as valid.
However, the Turtle writer does not encode the spaces on output, resulting in:

_:genid-ca45343d5be841429acf1ca16a6fe928-b25978837 a <http://purl.bioontology.org/ontology/UATC/ SERINE  > .
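
For illustration, here is a minimal sketch of the escaping a writer would need to apply so that the output stays parseable. This is not RDF4J code: the class `TurtleIriEscaper` and its `escape` method are hypothetical names. It assumes the Turtle `IRIREF` grammar, which forbids raw characters in the range U+0000 to U+0020 as well as `<>"{}|^` and the backtick and backslash, but permits numeric escapes. Whether a writer should escape such characters or reject the IRI outright is exactly the open question of this issue.

```java
/** Hypothetical helper, not part of RDF4J: escapes characters that may not appear raw in a Turtle IRIREF. */
final class TurtleIriEscaper {

    private TurtleIriEscaper() {}

    // Printable characters excluded by the Turtle IRIREF production.
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    /** Returns the IRI text with every forbidden character replaced by a four-digit numeric escape. */
    static String escape(String iri) {
        StringBuilder sb = new StringBuilder(iri.length() + 16);
        for (int i = 0; i < iri.length(); i++) {
            char c = iri.charAt(i);
            if (c <= 0x20 || FORBIDDEN.indexOf(c) >= 0) {
                // Emit backslash, 'u', and four uppercase hex digits.
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String iri = "http://purl.bioontology.org/ontology/UATC/ SERINE  ";
        System.out.println("<" + escape(iri) + ">");
    }
}
```

Applied to the IRI above, this sketch turns each space back into the `\u0020` form used in input.ttl.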

When we try to load this statement, the parser fails with:

Exception in thread "main" org.eclipse.rdf4j.rio.RDFParseException: IRI included an unencoded space: '32' [line 2]
	at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportError(RDFParserHelper.java:322)
	at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportError(AbstractRDFParser.java:601)
...
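
The kind of check that rejects such an IRI can be sketched as follows. This is an assumption about the general approach, not the actual `RDFParserHelper` logic; the class `RawIriCheck`, its methods, and the error message are hypothetical.

```java
/** Hypothetical sketch, not RDF4J code: finds characters that may not appear raw between '<' and '>' in Turtle. */
final class RawIriCheck {

    private RawIriCheck() {}

    // Printable characters excluded by the Turtle IRIREF production.
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    /** Returns the index of the first forbidden raw character, or -1 if there is none. */
    static int firstForbiddenIndex(String rawIri) {
        for (int i = 0; i < rawIri.length(); i++) {
            char c = rawIri.charAt(i);
            if (c <= 0x20 || FORBIDDEN.indexOf(c) >= 0) {
                return i;
            }
        }
        return -1;
    }

    /** Throws if the raw IRI text contains a forbidden character, reporting its code point and line. */
    static void requireClean(String rawIri, long lineNo) {
        int idx = firstForbiddenIndex(rawIri);
        if (idx >= 0) {
            throw new IllegalArgumentException("IRI included a forbidden character: '"
                    + (int) rawIri.charAt(idx) + "' [line " + lineNo + "]");
        }
    }
}
```

A space has code point 32, which matches the `'32'` shown in the exception message above.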

The following program reproduces the error:

    public static void main(String[] args) throws Exception {
        // Round trip: parse input.ttl and write the statements back out as Turtle.
        try (InputStream input = new FileInputStream(new File("input.ttl"));
                OutputStream output = new FileOutputStream(new File("output.ttl"))) {
            RDFParser parser = Rio.createParser(RDFFormat.TURTLE);
            parser.setRDFHandler(Rio.createWriter(RDFFormat.TURTLE, output));
            parser.parse(input, "http://localhost/base/");
        }
        // Re-parse the written file; this second parse throws the RDFParseException.
        try (InputStream input = new FileInputStream(new File("output.ttl"))) {
            RDFParser parser = Rio.createParser(RDFFormat.TURTLE);
            parser.parse(input, "http://localhost/base/");
        }
    }

This problem also propagates to querying remote SPARQL endpoints.

The same issue occurs with N-Triples, whereas JSON-LD and RDF/XML give no error.
The comment in TurtleParser suggests that \n and similar characters (spaces?) should be handled as errors.

It's not clear what the intended behaviour is.

Tested with rdf4j 2.2.

@barthanssens
Contributor

I assume this is related to #69 (and that there is no strict checking when generating/writing IRIs because it would impact performance).

@abrokenjester
Contributor

Also related to #712

catch-point pushed a commit to catch-point/rdf4j that referenced this issue Jun 1, 2017
@KnowledgeGarden

Importing a ttl file with the TURTLE format, line 1 of which is this:
http://dbpedia.org/property/name http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/1999/02/22-rdf-syntax-ns#Property .
produced this log message:
org.eclipse.rdf4j.rio.RDFParseException: IRI included an unencoded space: '32' [line 1]
at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportError(RDFParserHelper.java:322)

@KnowledgeGarden

I now believe that my errors reported here were based on inappropriate use of the APIs. By switching to Model-based parsing before adding to the repo, all errors disappeared.

catch-point pushed a commit that referenced this issue Jun 7, 2017
Fix #780: Validate IRI by default when parsing RDF files
@catch-point catch-point added this to the 2.3 milestone Jun 7, 2017
heshanjse pushed a commit to heshanjse/rdf4j that referenced this issue Aug 27, 2017