ConjunctiveGraph doesn't handle parsing datasets with default graphs properly #436
Comments
It might make sense that one should simply parse into the default_context:
cg = rdflib.ConjunctiveGraph()
cg.default_context.parse(data=data, format='trig')
print(cg.serialize(format='trig'))
By doing it like this (along with a bunch of fairly recent fixes on RDFLib master), this could be considered good enough. It doesn't seem intuitive though. Leaving this open in case we want to redesign the parsing of datasets to make this more obvious.
Hmm, so maybe the 6.0.0 label was wrong? Can this go in 4.2.2 then (so no backwards incompatibility) and just be closed and re-opened if desired?
There would be no change by telling users to parse into default_context. I'd say leave this open (but for 5.0.0 maybe?) since it is about changing the parsing usage/behaviour when parsing dataset syntaxes (nquads, trig, json-ld and trix). The current wiring of graphs, contexts and underlying stores could really do with such an overhaul.
This issue is still a problem in RDFLib 6.0.2. The workaround of
We really do need to be able to say:
cg = Dataset()
cg.parse("some-quads-file.trig")  # RDF file type worked out by guess_format()
... and then have the default_context == whatever the TriG file said the default graph was.
Fix is more or less ready, please have a look:
When ConjunctiveGraph.parse is called, it wraps its underlying store in a regular Graph instance. This causes problems for parsers of datasets, e.g. NQuads, TriG and JSON-LD.
Specifically, the triples in the default graph of a dataset haphazardly end up in bnode-named contexts.
Example:
While I've attempted to overcome this by using the underlying graph.store in these parsers, they cannot access the default_context of ConjunctiveGraph through this store. It is there in the underlying store, but its identifier is inaccessible to the parser without further changes to the parse method of ConjunctiveGraph.

This becomes tricky because the contract for ConjunctiveGraph's parse method is:

I am not sure how we can change this behaviour, since client code may rely on this. We could either add a new method, e.g. parse_dataset, or a flag. That would not be obvious to all users though, and somehow I would like to change the behaviour to handle datasets as well. It is always possible to get/create a named graph from a conjunctive graph and parse data into that.

I have gotten further by adding publicID=cg.default_context.identifier to the parse invocation. This causes the TriG parser to behave properly (and it is easy to adapt the nquads parser to work from there on). But I am not sure if this is a wise solution to the problem.

I'll mull more on this given time, but it would be good to have more people consider a proper revision of the parsing mechanism for datasets.
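
To make the mechanism easier to follow, here is a minimal toy model in plain Python. It is not RDFLib code: ToyConjunctiveGraph, fresh_bnode, the public_id parameter and the sample data are all hypothetical names, and the model assumes (as a simplification of what is described above) that parse targets a freshly named bnode context unless an identifier is passed in. It only illustrates why default-graph triples miss the default context, and why passing the default context's identifier helps.

```python
import itertools

_bnode_ids = itertools.count(1)


def fresh_bnode():
    """Return a new blank-node-style identifier (hypothetical helper)."""
    return f"_:b{next(_bnode_ids)}"


class ToyConjunctiveGraph:
    """Toy stand-in for a conjunctive graph: a set of (s, p, o, context) quads."""

    def __init__(self):
        # Identifier of the default graph; assumed here to be a bnode.
        self.default_context = fresh_bnode()
        self.quads = set()

    def parse(self, dataset, public_id=None):
        # Assumption: parse() wraps the store in a context named by public_id,
        # or by a fresh bnode when no identifier is given.
        target = public_id if public_id is not None else fresh_bnode()
        for s, p, o, graph_name in dataset:
            # graph_name None marks a triple in the dataset's default graph.
            context = target if graph_name is None else graph_name
            self.quads.add((s, p, o, context))

    def contexts(self):
        """Return the identifiers of all contexts that hold triples."""
        return {c for (_, _, _, c) in self.quads}

    def triples_in(self, context):
        """Return the triples stored in the given context."""
        return {(s, p, o) for (s, p, o, c) in self.quads if c == context}


# One triple in the default graph, one in the named graph ex:g1.
DATA = [
    ("ex:a", "ex:p", "ex:b", None),
    ("ex:c", "ex:p", "ex:d", "ex:g1"),
]

# Plain parse: the default-graph triple lands in an unrelated bnode context.
plain = ToyConjunctiveGraph()
plain.parse(DATA)

# Workaround: pass the default context's identifier to parse.
fixed = ToyConjunctiveGraph()
fixed.parse(DATA, public_id=fixed.default_context)

print(sorted(plain.contexts()), plain.triples_in(plain.default_context))
print(sorted(fixed.contexts()), fixed.triples_in(fixed.default_context))
```

In this model the plain call leaves the default context empty, while the second call puts the default-graph triple where a reader of the dataset would expect it. Whether the real parsers should rely on publicID this way is exactly the open question raised above.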
This underlies the problems described in #432 and #433 (and is related to #428).
(Obviously, this in turn causes the serializers for the same formats to emit unexpected bnode-named graphs when data has been read through these parsers.)