How to handle optional keys in JSON files #476
hey @Max-Bld,
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <https://example.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT
{
  ?business rdf:type ex:BusinessPlace .
  ?business ex:openingHours ?opening_hours .
  ?opening_hours ex:Monday ?monday .
  ?opening_hours ex:Thursday ?thursday .
  ?opening_hours ex:Wednesday ?wednesday .
  ?opening_hours ex:Tuesday ?tuesday .
  ?opening_hours ex:Friday ?friday .
  ?opening_hours ex:Saturday ?saturday .
  ?opening_hours ex:Sunday ?sunday .
}
WHERE
{ SERVICE <x-sparql-anything:location=/app/yelp.json>
  { ?root rdf:type fx:root ;
          fx:anySlot ?slot .
    ?slot xyz:business_id ?business_id
    BIND(iri(concat(str(ex:), "business/", encode_for_uri(?business_id))) AS ?business)
    OPTIONAL
    { ?slot xyz:hours ?hours
      BIND(bnode() AS ?opening_hours)
      OPTIONAL
      { ?hours xyz:Monday ?monday }
      OPTIONAL
      { ?hours xyz:Thursday ?thursday }
      OPTIONAL
      { ?hours xyz:Wednesday ?wednesday }
      OPTIONAL
      { ?hours xyz:Tuesday ?tuesday }
      OPTIONAL
      { ?hours xyz:Friday ?friday }
      OPTIONAL
      { ?hours xyz:Saturday ?saturday }
      OPTIONAL
      { ?hours xyz:Sunday ?sunday }
    }
  }
}
which produces:
@prefix ex: <https://example.com/> .
@prefix fx: <http://sparql.xyz/facade-x/ns/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xyz: <http://sparql.xyz/facade-x/data/> .
<https://example.com/business/mpf3x-BjTdTEA3yCZrAYPw>
rdf:type ex:BusinessPlace ;
ex:openingHours [ ex:Friday "8:0-18:30" ;
ex:Monday "0:0-0:0" ;
ex:Saturday "8:0-14:0" ;
ex:Thursday "8:0-18:30" ;
ex:Tuesday "8:0-18:30" ;
ex:Wednesday "8:0-18:30"
] .
<https://example.com/business/Pns2l4eNsfO8kk83dixA6A>
rdf:type ex:BusinessPlace .
<https://example.com/business/tUFrWirKiKi_TAnsVWINQQ>
rdf:type ex:BusinessPlace ;
ex:openingHours [ ex:Friday "8:0-23:0" ;
ex:Monday "8:0-22:0" ;
ex:Saturday "8:0-23:0" ;
ex:Sunday "8:0-22:0" ;
ex:Thursday "8:0-22:0" ;
ex:Tuesday "8:0-22:0" ;
ex:Wednesday "8:0-22:0"
] .
<https://example.com/business/qkRM_2X51Yqxk3btlwAQIg>
rdf:type ex:BusinessPlace .
btw, thanks for including the input data, the query, and the output data. makes it easier to jump in and help.
Thank you very much for the answer, I will try it soon.
Hello, I am reopening this thread because I keep coming back to this problem. Most of the time, I have a large .json file that I want to align to an ontology and convert to RDF. I write my mapping file with SPARQL Anything, which I find really nice, but here is my problem: only the instances that have every property mentioned in my mapping file are actually converted to RDF. All the others, where even a single property is missing, are ignored. The only solution I have found is to enclose every triple pattern in an OPTIONAL clause, but this is painful when the mapping file is complex. Furthermore, the OPTIONAL clauses seem to consume more resources. Is there any solution other than the OPTIONAL clause? Best,
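One way to picture why records disappear: every triple pattern written outside an OPTIONAL is mandatory, so a JSON object that lacks any mapped key yields no solution at all, whereas OPTIONAL behaves like a left join. Below is a minimal Python sketch of that behaviour, assuming plain dicts stand in for the Facade-X graph. The helper names `match_all` and `match_with_optional` and the two sample records are hypothetical illustrations, not part of SPARQL Anything.

```python
from typing import Iterable


def match_all(records: Iterable[dict], keys: list[str]) -> list[dict]:
    """Mimic a basic graph pattern: a record matches only if every key is present."""
    return [
        {k: r[k] for k in keys}
        for r in records
        if all(k in r for k in keys)
    ]


def match_with_optional(records: Iterable[dict], required: list[str],
                        optional: list[str]) -> list[dict]:
    """Mimic required patterns plus one OPTIONAL per extra key (a left join)."""
    solutions = []
    for r in records:
        if not all(k in r for k in required):
            continue  # a missing required key still drops the record
        row = {k: r[k] for k in required}
        # Optional keys are bound when present and left unbound otherwise.
        row.update({k: r[k] for k in optional if k in r})
        solutions.append(row)
    return solutions


# Hypothetical records: the second one has no "telephone" key.
records = [
    {"uuid": "a1", "nom": "Le Bouf'Tard", "telephone": "02.54.73.17.12"},
    {"uuid": "b2", "nom": "Tabac & Épicerie de Nouan"},
]

strict = match_all(records, ["uuid", "nom", "telephone"])
lenient = match_with_optional(records, ["uuid"], ["nom", "telephone"])
print(len(strict), len(lenient))
```

In the strict version the second record is dropped entirely; in the lenient version it is kept, with only the missing key left unbound.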
i'll be interested in what @enridaga and @luigi-asprino say, but one approach i have taken for this kind of thing is to write multiple .rq files (SPARQL CONSTRUCT query files) so that i don't need to put OPTIONAL clauses in them. this also allows me to run the queries in parallel.
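A rough sketch of that idea in Python: generate one small CONSTRUCT query per JSON key, each with only mandatory patterns, so that an object lacking a key is skipped by that one query only. The template, the `build_queries` helper, and the key-to-predicate pairs are illustrative assumptions (everything is attached to `?id` for brevity). The prefixes, file location, and IRI base are copied from queries quoted in this thread. The approach relies on a stable identifier such as `uuid` being present, so that every query mints the same subject IRI.

```python
from string import Template

# One OPTIONAL-free query per JSON key; $key, $predicate and $location are filled in.
QUERY_TEMPLATE = Template("""\
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
CONSTRUCT { ?id $predicate ?value }
WHERE {
  SERVICE <x-sparql-anything:> {
    fx:properties fx:location "$location" ;
                  fx:media-type "application/json" .
    ?s fx:anySlot ?o .
    ?o xyz:uuid ?uuid ;
       xyz:$key ?value .
  }
  BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", ?uuid)) AS ?id)
}
""")


def build_queries(location: str, mapping: dict[str, str]) -> dict[str, str]:
    """Return one query string per JSON key, keyed by that JSON key."""
    return {
        key: QUERY_TEMPLATE.substitute(location=location, key=key, predicate=predicate)
        for key, predicate in mapping.items()
    }


# Illustrative mapping from JSON keys to predicates (an assumption, not the real mapping).
mapping = {"nom": "rdfs:label", "telephone": "schema:telephone"}
queries = build_queries("./data/acceslibre/results-acceslibre-0.json", mapping)

for key, query in queries.items():
    # Each string could be saved as its own .rq file, e.g. f"{key}.rq".
    print(f"--- {key}.rq ---")
    print(query)
```

Because each query is independent, the resulting files can be run separately or in parallel and their outputs merged afterwards.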
Thank you for your interesting suggestion. I don't see how to link the optional properties to their corresponding objects, since I generate the objects' ids with the STRUUID() function. For now, I am looking at RML, which seems to be natively compatible with optional properties.
Oh, do you not have a way to get a stable identifier from which to mint IRIs? That is essential, or you can't use multiple different .rq files.
Are the fields you are trying to pluck out of the source file all at the same depth? If they are at different depths you can nest your OPTIONALs like this.
How big is your source JSON file? If it is really big you would probably benefit from slicing the file and operating on the slices.
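To make the point about stable identifiers concrete, here is a small Python sketch that mints the same IRI for a given source identifier on every run, which is what lets separate queries (or repeated runs) agree on the subject. The `mint_iri` helper and the `opening-hours` suffix are hypothetical; the base mirrors the `ex:` prefix and the business id comes from the output quoted earlier in this thread. A random value, such as one from STRUUID(), would differ on each evaluation.

```python
from urllib.parse import quote

BASE = "https://example.com/"  # mirrors the ex: prefix used in this thread


def mint_iri(*segments: str, base: str = BASE) -> str:
    """Build a deterministic IRI from path segments taken from the source data."""
    # Percent-encode each segment so unusual characters cannot break the IRI.
    return base + "/".join(quote(s, safe="") for s in segments)


business = mint_iri("business", "mpf3x-BjTdTEA3yCZrAYPw")
# A sub-resource derived from the same id (the suffix is a hypothetical choice).
hours = mint_iri("business", "mpf3x-BjTdTEA3yCZrAYPw", "opening-hours")
print(business)
print(hours)
```

Deriving sub-resource IRIs from the parent identifier in this way avoids the need for random ids when the same node has to be referenced from more than one query.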
Thank you again for your help. I took a look at your .rq files and they inspired me to "flatten" mine, which makes it easier to mark the triples I want as optional. It is very straightforward (I put each triple pattern in its own OPTIONAL block), but it could probably be improved from a performance standpoint.
You are right, I generate STRUUID() only for a couple of blank nodes. For the object identifier, I reuse the identifier present in the original file, so this is solved.
Both cases occur.
I see that your SPARQL-Anything mapping files are relatively flat compared to mine, which try to follow the nested JSON structure. I am now trying to do something similar to yours, and it seems that this syntax solved the problem.
It is data I receive from a paginated REST API, so I have 23 JSON files of about 170 KiB each. I then use pysparql_anything to process them in batch.
JSON and .rq files
For the sake of clarity, here is an excerpt of the JSON file I am trying to convert to RDF, my old .rq file, and the new one inspired by justin2004's.
[
{
"url":"https://acceslibre.beta.gouv.fr/api/erps/tabac-epicerie-de-nouan/",
"web_url":"https://acceslibre.beta.gouv.fr/app/41-saint-laurent-nouan/a/restauration-rapide/erp/tabac-epicerie-de-nouan/",
"uuid":"2643996a-105f-474e-a801-75e3151adacc",
"activite":{
"nom":"Restauration rapide",
"slug":"restauration-rapide"
},
"nom":"Tabac & \u00c9picerie de Nouan",
"slug":"tabac-epicerie-de-nouan",
"adresse":"53 Rue Nationale 41220 Saint-Laurent-Nouan",
"commune":"Saint-Laurent-Nouan",
"code_insee":"41220",
"code_postal":"41220",
"geom":{
"type":"Point",
"coordinates":[
1.560289,
47.685245
]
},
"ban_id":"41220_0100_00053",
"siret":null,
"telephone":null,
"site_internet":"https://lepiceriedenouan.eatbu.com/",
"contact_email":null,
"contact_url":null,
"user_type":"system",
"accessibilite":{
"url":"https://acceslibre.beta.gouv.fr/api/accessibilite/653858/",
"erp":"https://acceslibre.beta.gouv.fr/api/erps/tabac-epicerie-de-nouan/",
"transport":{
"stationnement_ext_presence":true,
"stationnement_ext_pmr":true
},
"entree":{
"entree_porte_presence":true,
"entree_plain_pied":true,
"entree_largeur_mini":80
}
},
"distance":null,
"source_id":"ChIJwfctxXG85EcRMQ3QQYrulv0",
"asp_id":null,
"updated_at":"2025-01-07T03:47:24.217200+01:00",
"created_at":"2025-01-07T03:47:24.217182+01:00",
"published":true,
"sources":[
{
"id":1176950,
"source":"outscraper",
"source_id":"ChIJwfctxXG85EcRMQ3QQYrulv0"
}
]
},
{
"url":"https://acceslibre.beta.gouv.fr/api/erps/le-bouftard/",
"web_url":"https://acceslibre.beta.gouv.fr/app/41-vendome/a/hotel-restaurant/erp/le-bouftard/",
"uuid":"766f0960-25b3-4fa0-975d-6e7ecef05e7b",
"activite":{
"nom":"H\u00f4tel restaurant",
"slug":"hotel-restaurant"
},
"nom":"Le Bouf'Tard",
"slug":"le-bouftard",
"adresse":"40 Rue du 20\u00e8me Chasseurs 41100 Vend\u00f4me",
"commune":"Vend\u00f4me",
"code_insee":null,
"code_postal":"41100",
"geom":{
"type":"Point",
"coordinates":[
1.063163,
47.803184
]
},
"ban_id":"41269_1850_00040",
"siret":null,
"telephone":"02.54.73.17.12",
"site_internet":"https://le-bouftard.fr/",
"contact_email":"le.bouftard@orange.fr",
"contact_url":null,
"user_type":"public",
"accessibilite":{
"url":"https://acceslibre.beta.gouv.fr/api/accessibilite/645526/",
"erp":"https://acceslibre.beta.gouv.fr/api/erps/le-bouftard/",
"transport":{
"transport_station_presence":true,
"transport_information":"BUS MOVE LIGNES C, F, H, J, M : ARRET JEAN EMOND",
"stationnement_presence":true,
"stationnement_pmr":false,
"stationnement_ext_presence":true,
"stationnement_ext_pmr":false
},
"cheminement_ext":{
"cheminement_ext_presence":false
},
"entree":{
"entree_reperage":true,
"entree_porte_presence":true,
"entree_porte_manoeuvre":"battante",
"entree_porte_type":"manuelle",
"entree_vitree":true,
"entree_vitree_vitrophanie":true,
"entree_plain_pied":true,
"entree_dispositif_appel":false,
"entree_largeur_mini":80,
"entree_pmr":true,
"entree_pmr_informations":"Les PMR peuvent entrer par l'entr\u00e9e de parking priv\u00e9 qui donne sur les chambres"
},
"accueil":{
"accueil_visibilite":true,
"accueil_cheminement_plain_pied":true
},
"commentaire":{
"commentaire":"Un local ferm\u00e9 est \u00e0 disposition pour les v\u00e9los ou deux roues motoris\u00e9s.\r\nL'h\u00f4tel dispose d'un acc\u00e8s PMR pour entrer."
}
},
"distance":null,
"source_id":null,
"asp_id":"",
"updated_at":"2024-12-06T14:12:43.646997+01:00",
"created_at":"2024-12-06T14:00:16.012419+01:00",
"published":true,
"sources":[
]
}
]
prefix fx: <http://sparql.xyz/facade-x/ns/>
prefix xyz: <http://sparql.xyz/facade-x/data/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix schema: <http://schema.org/>
prefix datatourisme: <https://www.datatourisme.fr/ontology/core#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix kb: <https://www.datatourisme.fr/resource/core#>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix acceslibre: <https://acceslibre.beta.gouv.fr/ontology#>
CONSTRUCT {
?id a ?class ;
rdfs:label ?nom ;
datatourisme:isLocatedAt ?place ;
acceslibre:hasAccessibility ?accessibility.
?place a datatourisme:Place ;
schema:address ?address ;
schema:geo ?geo .
?geo a schema:GeoCoordinates ;
schema:latitude ?lat ;
schema:longitude ?lon ;
datatourisme:latlon ?latlon.
?address a schema:PostalAddress, datatourisme:PostalAddress ;
schema:addressLocality ?commune ;
schema:postalCode ?codePostal ;
schema:streetAddress ?adresse.
?contact a foaf:Agent, datatourisme:Agent ;
foaf:homepage ?url ;
schema:telephone ?telephone.
?description a datatourisme:Description ;
dc:description ?histoire ;
datatourisme:shortDescription ?atout;
owl:topDataProperty ?interet.
?accessibility a acceslibre:Accessibility ;
acceslibre:parkingInTheFacility ?stationnement_presence ;
acceslibre:parkingNearTheFacility ?stationnement_ext_presence ;
acceslibre:adaptedParkingCloseToTheFacility ?stationnement_ext_pmr ;
acceslibre:accessAdditionalInformation ?transport_information ;
acceslibre:easilyIdentifiableEntrance ?entree_reperage ;
acceslibre:hasADoor ?entree_porte_presence ;
acceslibre:glazedEntrance ?entree_vitree ;
acceslibre:outsidePath ?cheminement_ext_presence ;
acceslibre:visibleReceptionArea ?accueil_visibilite;
acceslibre:pathwayBetweenEntranceAndReception ?accueil_cheminement_plain_pied
}
WHERE {
SERVICE <x-sparql-anything:> {
fx:properties
fx:location "./data/acceslibre/results-acceslibre-0.json" ;
fx:media-type "application/json".
[] fx:anySlot [
xyz:uuid ?uuid ;
xyz:nom ?nom ;
xyz:activite [
xyz:nom ?nomActivite
] ;
xyz:adresse ?adresse ;
xyz:commune ?commune ;
xyz:code_postal ?codePostal ;
xyz:geom [
xyz:coordinates [ rdf:_1 ?lon; rdf:_2 ?lat]
] ;
xyz:site_internet ?url ;
xyz:telephone ?telephone ;
xyz:accessibilite [
xyz:transport [
xyz:stationnement_presence ?stationnement_presence ;
xyz:stationnement_ext_presence ?stationnement_ext_presence ;
xyz:stationnement_ext_pmr ?stationnement_ext_pmr ;
xyz:transport_information ?transport_information
] ;
xyz:entree [
xyz:entree_porte_presence ?entree_porte_presence ;
xyz:entree_reperage ?entree_reperage ;
xyz:entree_vitree ?entree_vitree
] ;
xyz:cheminement_ext [
xyz:cheminement_ext_presence ?cheminement_ext_presence
] ;
xyz:accueil [
xyz:accueil_visibilite ?accueil_visibilite ;
xyz:accueil_cheminement_plain_pied ?accueil_cheminement_plain_pied
]
]
]
}
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", ?uuid)) AS ?id)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?place)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?accessibility)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?address)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?geo)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?contact)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?description)
BIND (
COALESCE(
IF(?nomActivite = "Restauration rapide", IRI("https://www.datatourisme.fr/ontology/core#FastFoodRestaurant"), 1/0),
IF(?nomActivite = "Droguerie", IRI("https://www.datatourisme.fr/ontology/core#Store"), 1/0),
IF(?nomActivite = "Office du tourisme", IRI("https://www.datatourisme.fr/ontology/core#LocalTouristOffice"), 1/0),
IF(?nomActivite = "Cimetière", IRI("https://www.datatourisme.fr/ontology/core#RemembranceSite"), 1/0),
IF((?nomActivite = "Ost\u00e9opathie" || ?nomActivite = "Herboristerie naturopathie"), IRI("https://www.datatourisme.fr/ontology/core#HealthcareProfessional"), 1/0),
IF(?nomActivite = "Caf\u00e9, bar, brasserie", IRI("https://www.datatourisme.fr/ontology/core#BarOrPub"), 1/0),
IF(?nomActivite = "Bien-être", IRI("https://www.datatourisme.fr/ontology/core#ServiceProvider"), 1/0),
IF(?nomActivite = "Hôtel restaurant", IRI("https://www.datatourisme.fr/ontology/core#HotelRestaurant"), 1/0),
IF(?nomActivite = "Hôtel", IRI("https://www.datatourisme.fr/ontology/core#Hotel"), 1/0),
IF(?nomActivite = "Bijouterie joaillerie", IRI("https://www.datatourisme.fr/ontology/core#CraftsmanShop"), 1/0),
IRI("https://www.datatourisme.fr/ontology/core#PointOfInterest")
) AS ?class
)
BIND(STRDT(CONCAT(STR(?lat), "#", STR(?lon)), <http://www.bigdata.com/rdf/geospatial/literals/v1#lat-lon>) AS ?latlon)
}
prefix fx: <http://sparql.xyz/facade-x/ns/>
prefix xyz: <http://sparql.xyz/facade-x/data/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix schema: <http://schema.org/>
prefix datatourisme: <https://www.datatourisme.fr/ontology/core#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix kb: <https://www.datatourisme.fr/resource/core#>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix acceslibre: <https://acceslibre.beta.gouv.fr/ontology#>
CONSTRUCT {
?id a ?class ;
rdfs:label ?nom ;
datatourisme:isLocatedAt ?place ;
acceslibre:hasAccessibility ?accessibility.
?place a datatourisme:Place ;
schema:address ?address ;
schema:geo ?geo .
?address a schema:PostalAddress, datatourisme:PostalAddress ;
schema:addressLocality ?commune ;
schema:postalCode ?codePostal ;
schema:streetAddress ?adresse.
?contact a foaf:Agent, datatourisme:Agent ;
foaf:homepage ?url ;
schema:telephone ?telephone.
?description a datatourisme:Description ;
dc:description ?histoire ;
datatourisme:shortDescription ?atout;
owl:topDataProperty ?interet.
?accessibility a acceslibre:Accessibility ;
acceslibre:parkingInTheFacility ?stationnement_presence ;
acceslibre:parkingNearTheFacility ?stationnement_ext_presence ;
acceslibre:adaptedParkingCloseToTheFacility ?stationnement_ext_pmr ;
acceslibre:accessAdditionalInformation ?transport_information ;
acceslibre:easilyIdentifiableEntrance ?entree_reperage ;
acceslibre:hasADoor ?entree_porte_presence ;
acceslibre:glazedEntrance ?entree_vitree ;
acceslibre:outsidePath ?cheminement_ext_presence ;
acceslibre:visibleReceptionArea ?accueil_visibilite;
acceslibre:pathwayBetweenEntranceAndReception ?accueil_cheminement_plain_pied
}
WHERE {
SERVICE <x-sparql-anything:> {
fx:properties
fx:location "./data/acceslibre/results-acceslibre-0.json" ;
fx:media-type "application/json".
?s fx:anySlot ?o .
optional { ?o xyz:uuid ?uuid }
optional { ?o xyz:nom ?nom }
optional { ?o xyz:activite [ xyz:nom ?nomActivite ] }
optional { ?o xyz:adresse ?adresse }
optional { ?o xyz:commune ?commune }
optional { ?o xyz:code_postal ?codePostal }
optional { ?o xyz:geom [ xyz:coordinates [ rdf:_1 ?lon; rdf:_2 ?lat] ] }
optional { ?o xyz:site_internet ?url }
optional { ?o xyz:telephone ?telephone }
optional { ?o xyz:accessibilite ?accessibilite }
optional { ?accessibilite xyz:transport [ xyz:stationnement_presence ?stationnement_presence] }
optional { ?accessibilite xyz:transport [ xyz:stationnement_ext_presence ?stationnement_ext_presence] }
optional { ?accessibilite xyz:transport [ xyz:stationnement_ext_pmr ?stationnement_ext_pmr] }
optional { ?accessibilite xyz:entree [ xyz:entree_porte_presence ?entree_porte_presence] }
optional { ?accessibilite xyz:entree [ xyz:entree_reperage ?entree_reperage] }
optional { ?accessibilite xyz:entree [ xyz:entree_vitree ?entree_vitree] }
optional { ?accessibilite xyz:cheminement_ext [ xyz:cheminement_ext_presence ?cheminement_ext_presence] }
optional { ?accessibilite xyz:accueil [ xyz:accueil_visibilite ?accueil_visibilite] }
optional { ?accessibilite xyz:accueil [ xyz:accueil_cheminement_plain_pied ?accueil_cheminement_plain_pied] }
}
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", ?uuid)) AS ?id)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?place)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?accessibility)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?address)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?geo)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?contact)
BIND(IRI(CONCAT("https://www.datatourisme.fr/ontology/core#", STRUUID())) AS ?description)
BIND (
COALESCE(
IF(?nomActivite = "Restauration rapide", IRI("https://www.datatourisme.fr/ontology/core#FastFoodRestaurant"), 1/0),
IF(?nomActivite = "Droguerie", IRI("https://www.datatourisme.fr/ontology/core#Store"), 1/0),
IF(?nomActivite = "Office du tourisme", IRI("https://www.datatourisme.fr/ontology/core#LocalTouristOffice"), 1/0),
IF(?nomActivite = "Cimetière", IRI("https://www.datatourisme.fr/ontology/core#RemembranceSite"), 1/0),
IF((?nomActivite = "Ost\u00e9opathie" || ?nomActivite = "Herboristerie naturopathie"), IRI("https://www.datatourisme.fr/ontology/core#HealthcareProfessional"), 1/0),
IF(?nomActivite = "Caf\u00e9, bar, brasserie", IRI("https://www.datatourisme.fr/ontology/core#BarOrPub"), 1/0),
IF(?nomActivite = "Bien-être", IRI("https://www.datatourisme.fr/ontology/core#ServiceProvider"), 1/0),
IF(?nomActivite = "Hôtel restaurant", IRI("https://www.datatourisme.fr/ontology/core#HotelRestaurant"), 1/0),
IF(?nomActivite = "Hôtel", IRI("https://www.datatourisme.fr/ontology/core#Hotel"), 1/0),
IF(?nomActivite = "Bijouterie joaillerie", IRI("https://www.datatourisme.fr/ontology/core#CraftsmanShop"), 1/0),
IRI("https://www.datatourisme.fr/ontology/core#PointOfInterest")
) AS ?class
)
}
If you need any explanation or have suggestions, don't hesitate to ask.
you can also nest like this:
and i think you can delete
how is the query performance? acceptable?
Hello,
I would like to triplify the Yelp business dataset, but I am facing an issue: it is composed of JSON objects with optional keys, which are sometimes missing, or incomplete when they themselves contain other optional keys. The problem is that SPARQL-Anything skips any JSON object with a missing or incomplete key.
For example, I would like to construct a knowledge graph with the business ID and its opening hours during the week.
This is a sample with four different JSON object cases, where the optional key "hours" contains a JSON object with a key for each day:
This is my construct query:
The result is that only JSON object case 3, which has the optional key "hours" with all its sub-keys filled, is constructed:
Is there a way to handle optional or missing keys in JSON files so that all the cases mentioned above are included in the constructed knowledge graph?
Thanks in advance.