Is possible to generate BlankNodes from data references? #271

Open
dachafra opened this issue Jul 13, 2022 · 10 comments
Labels
Question Further information is requested

Comments

@dachafra

The behavior should be similar to the one in RML:

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix ex: <http://example/> .
@prefix : <http://example.org/> .
@base <http://example.org/> .

:firstTM a rr:TriplesMap ;
    rml:logicalSource [
        rml:source "data.csv";
        rml:referenceFormulation ql:CSV
    ];
    rml:subjectMap [
        rml:reference "c1" ;
        rr:termType rr:BlankNode
    ];
    rr:predicateObjectMap [
        rr:predicate ex:p ;
        rml:objectMap [
            rr:template "http://example/{c2}"
        ]
    ] .

Input:

c1,c2
b0,A

Output:

_:b0 ex:p ex:A .
@enridaga
Member

You can just construct bnodes:

PREFIX ex:  <http://example/>
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

CONSTRUCT {
  [] ex:p ?A
} WHERE {
  SERVICE <x-sparql-anything:> {
    fx:properties fx:location "./data.csv" ; fx:csv.headers true .
    [] xyz:c2 ?A
  }
}

Or, if you want to control the bnode identifier for some reason:

PREFIX ex:  <http://example/>
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

CONSTRUCT {
  ?bnode ex:p ?A
} WHERE {
  SERVICE <x-sparql-anything:> {
    fx:properties fx:location "./data.csv" ; fx:csv.headers true .
    # one CSV row: c1 supplies the bnode label, c2 the object value
    [] xyz:c1 ?b0 ; xyz:c2 ?A
  }
  BIND ( BNODE ( ?b0 ) AS ?bnode )
}

@enridaga enridaga added the Question Further information is requested label Jul 13, 2022
@dachafra
Author

I've arrived at this point, yes, but you can not take the identifier of the BN from the input source, right?

@enridaga
Member

> I've arrived at this point, yes, but you can not take the identifier of the BN from the input source, right?

You can take it from there, as you can see in the second query. I am not sure I understand the use case here.
Do you mean that you want to keep the blank node identifier in the generated graph?
The generated blank node IDs depend on the serialiser. Blank node identifiers are meant to be local and are usually generated during serialisation or data loading, so what is the point of forcing them?
If you want to mint an identifier, you probably want an IRI instead. Am I getting it right?
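To illustrate why a forced label carries little meaning, here is a minimal Python sketch, assuming a toy model (hypothetical `BNode` class and `serialize` function, not Jena's API) in which a blank node is an identity-only object. The label is invented by the serializer at write time, so the same node can come out as `_:b0` in one document and `_:b1` in another.

```python
class BNode:
    """Identity-only blank node: it carries no label of its own."""


def serialize(triples):
    """Write N-Triples lines, minting _:b0, _:b1, ... in first-seen order.

    The label table lives only for this call, so labels are a property of
    the output document, not of the graph (toy model, not Jena's writer).
    """
    labels = {}

    def term(t):
        if isinstance(t, BNode):
            if t not in labels:
                labels[t] = f"_:b{len(labels)}"
            return labels[t]
        return f"<{t}>"  # assumption: every other term is an IRI string

    return [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples]


x, y = BNode(), BNode()
P, A = "http://example/p", "http://example/A"
print(serialize([(x, P, A)]))             # x is written as _:b0
print(serialize([(y, P, A), (x, P, A)]))  # the same x is now written as _:b1
```

A real writer may use a different naming scheme, but the point carries over: the label belongs to the serialization, which is why an IRI is the natural choice when the identifier itself matters.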

@justin2004
Contributor

justin2004 commented Jul 14, 2022

You could do this:

curl --silent 'http://localhost:3000/sparql.anything'  \
--header "Accept: text/csv" \
--data-urlencode 'query=
PREFIX  fx:   <http://sparql.xyz/facade-x/ns/>
SELECT  *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:location     "/app/input.csv" ;
                  fx:csv.headers  true .
        ?s        ?p              ?o
        BIND(iri(?s) AS ?s_iri)
      }
  }
'

yielding:

s     p                                                o                                   s_iri
_:b0  http://sparql.xyz/facade-x/data/c1               b0                                  _:file:/app/input.csv##row1
_:b0  http://sparql.xyz/facade-x/data/c2               A                                   _:file:/app/input.csv##row1
_:b1  http://www.w3.org/1999/02/22-rdf-syntax-ns#type  http://sparql.xyz/facade-x/ns/root  _:file:/app/input.csv#
_:b1  http://www.w3.org/1999/02/22-rdf-syntax-ns#_1    _:b0                                _:file:/app/input.csv#

@justin2004
Contributor

Oh, I know what you want now.
One minute.

@justin2004
Contributor

justin2004 commented Jul 14, 2022

It appears that Apache Jena does not let you synthesize a bnode identifier manually.
This is as close as I can get, but neither quad is what you are looking for (one isn't a well-formed quad and I'm not sure about the other).
That said, I think an actual IRI is what I would use in practice.

curl --silent 'http://localhost:3000/sparql.anything'  \
--header "Accept: application/n-quads" \
--data-urlencode 'query=
PREFIX  :     <http://example.com/>
PREFIX  xyz:  <http://sparql.xyz/facade-x/data/>
PREFIX  fx:   <http://sparql.xyz/facade-x/ns/>
CONSTRUCT 
  { 
    ?new_s_iri :p ?new_c2 .
    ?new_s_str :p ?new_c2 .
  }
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:location     "/app/input.csv" ;
                  fx:csv.headers  true .
        ?s        xyz:c1          ?c1 ;
                  xyz:c2          ?c2
        BIND(iri(concat("_:", ?c1)) AS ?new_s_iri)
        BIND(concat("_:", ?c1) AS ?new_s_str)
        BIND(iri(concat(str(:), ?c2)) AS ?new_c2)
      }
  }
'

yields:

"_:b0" <http://example.com/p> <http://example.com/A> .
<_:b0> <http://example.com/p> <http://example.com/A> .
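A rough way to see why neither line is valid RDF: a literal such as `"_:b0"` is not allowed in subject position, and `<_:b0>` is not an absolute IRI (which N-Triples/N-Quads require) because an IRI scheme must start with a letter, not `_`. The Python sketch below classifies the subject term with simplified regular expressions; the patterns and the hypothetical `subject_kind` function are approximations, not the full N-Triples grammar.

```python
import re

# Simplified N-Triples term patterns (assumption: not the complete grammar).
ABSOLUTE_IRI = re.compile(r'^<[A-Za-z][A-Za-z0-9+.\-]*:[^<>"{}|^`\\\s]*>$')
BNODE_LABEL = re.compile(r"^_:[A-Za-z0-9_](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?$")


def subject_kind(line):
    """Classify the subject of one N-Triples/N-Quads line (assumes it has no spaces)."""
    subject = line.split(" ", 1)[0]
    if subject.startswith('"'):
        return "literal: not allowed in subject position"
    if BNODE_LABEL.match(subject):
        return "blank node"
    if ABSOLUTE_IRI.match(subject):
        return "absolute IRI"
    return "not a valid IRI or blank node"


for line in [
    '"_:b0" <http://example.com/p> <http://example.com/A> .',
    '<_:b0> <http://example.com/p> <http://example.com/A> .',
]:
    print(subject_kind(line))
```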

@dachafra
Author

@justin2004 yeah, exactly! I was able to obtain the same results, but I don't think that any of the results are valid RDF, right?

For context, this comes from the R2RML test case https://www.w3.org/2001/sw/rdb2rdf/test-cases/#R2RMLTC0002b. I don't specifically need this feature in the engine; it is more about comparing both solutions. One of the main benefits of this feature is that identifiers do not have to be kept in memory during execution.
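The memory argument can be illustrated with a small Python sketch of the RML behaviour from the opening example, assuming a hypothetical `rows_to_ntriples` helper (not part of SPARQL Anything or any RML engine). Because the blank node label is copied from column `c1`, each row can be written and forgotten; there is no table from source values to generated nodes. `urllib.parse.quote` stands in, approximately, for the template's IRI-safe encoding of `c2`.

```python
import csv
import io
import re
from urllib.parse import quote

# Simplified check that a value is usable as an N-Triples blank node label.
LABEL_OK = re.compile(r"^[A-Za-z0-9_](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?$")


def rows_to_ntriples(csv_text, base="http://example/"):
    """Hypothetical streaming mapper: subject = _:{c1}, object = <base{c2}>.

    Rows are emitted one at a time; nothing is remembered between rows,
    because the blank node label comes straight from the data.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row["c1"]
        if not LABEL_OK.match(label):
            raise ValueError(f"{label!r} is not a valid blank node label")
        obj = base + quote(row["c2"], safe="")  # percent-encode the template value
        yield f"_:{label} <{base}p> <{obj}> ."


print(list(rows_to_ntriples("c1,c2\nb0,A\n")))
# ['_:b0 <http://example/p> <http://example/A> .']
```

The trade-off is that the labels are only meaningful within one output document: rows sharing a `c1` value share a node there, but a consumer is still free to relabel on load.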

@enridaga
Copy link
Member

I don't think it is possible to control the blank nodes that are generated by the serializer, but this is probably a question for users@jena.apache.org.

However, while playing with this use case I found an interesting issue when one wants to generate multiple triples with the same bnode across different instantiations of the CONSTRUCT template. At the moment, a new bnode is generated for each instantiation, even if we use the BNODE function. This is reproducible by adding more rows to the example CSV: a new bnode is created for each of them. I will open a separate issue for that.
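The per-row behaviour matches how SPARQL 1.1 defines `BNODE(str)`: the same string yields the same blank node only within one solution mapping, and calls in different solutions yield distinct blank nodes. The Python sketch below is a toy model of that scoping (hypothetical `bnode_subjects` function, plain dicts instead of Jena bindings), next to a hypothetical shared-scope mode that SPARQL itself does not provide.

```python
import itertools


def bnode_subjects(solutions, shared_scope=False):
    """Toy model of BIND(BNODE(?b0) AS ?bnode) over a list of solutions.

    shared_scope=False: one label table per solution, as SPARQL 1.1 defines
    BNODE(str), so equal strings in different rows give different nodes.
    shared_scope=True: hypothetical alternative with one table for the whole
    result, so equal strings always give the same node.
    """
    counter = itertools.count()
    global_table = {}
    triples = []
    for sol in solutions:
        table = global_table if shared_scope else {}
        label = sol["b0"]
        if label not in table:
            table[label] = f"_:g{next(counter)}"
        triples.append((table[label], "http://example/p", sol["A"]))
    return triples


rows = [{"b0": "b0", "A": "http://example/A"}, {"b0": "b0", "A": "http://example/B"}]
print([s for s, _, _ in bnode_subjects(rows)])                     # ['_:g0', '_:g1']
print([s for s, _, _ in bnode_subjects(rows, shared_scope=True)])  # ['_:g0', '_:g0']
```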

@justin2004
Copy link
Contributor

> At the moment, a new bnode is generated for every projection, even if we use the BNODE function.

I thought I just wasn't understanding how to use bnode() with an argument, but since you might also have expected different behavior, I opened an issue:
https://issues.apache.org/jira/browse/JENA-2340

@enridaga
Copy link
Member

> For letting you know, this is coming from this R2RML test-cases: https://www.w3.org/2001/sw/rdb2rdf/test-cases/#R2RMLTC0002b. It is not that I specifically want to have this feature in the engine but it is more for comparing both solutions.

Considering they are bnodes, the comparison can be done via graph isomorphism (there are some useful utils for this in Jena).
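A minimal sketch of what such an isomorphism check does, assuming graphs are small sets of `(s, p, o)` string tuples in which blank nodes are written with a `_:` prefix (the `isomorphic` function is hypothetical). It tries every relabelling of blank nodes, which is exponential, so it only suits tiny test graphs; for real data Jena's `Model.isIsomorphicWith` / `Graph.isIsomorphicWith` should be preferred.

```python
from itertools import permutations


def is_bnode(term):
    return term.startswith("_:")


def isomorphic(g1, g2):
    """Naive RDF graph isomorphism for tiny graphs of (s, p, o) string triples.

    Tries every bijection between the blank nodes of g1 and g2, so it is
    exponential in the number of blank nodes. Assumption: blank nodes are
    strings starting with "_:"; all other terms must match exactly.
    """
    g1, g2 = set(g1), set(g2)
    b1 = sorted({t for triple in g1 for t in triple if is_bnode(t)})
    b2 = sorted({t for triple in g2 for t in triple if is_bnode(t)})
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for image in permutations(b2):
        mapping = dict(zip(b1, image))
        if {tuple(mapping.get(t, t) for t in triple) for triple in g1} == g2:
            return True
    return False


expected = {("_:b0", "http://example/p", "http://example/A")}
actual = {("_:x7", "http://example/p", "http://example/A")}
print(isomorphic(expected, actual))  # True: only the blank node label differs
```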
