NuriaQueralt changed the title to "[wd_to_neo4j] 'exact match' property of datatype string disappears in Neo4j" on Jul 18, 2019.
My opinion is that this is a bug (albeit a minor one). If those skos:exactMatch statements are dropped from the import into Neo4j, the round-trip export from Neo4j back to Wikibase will not reproduce exactly the same data.
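The round-trip argument can be stated as a set difference over (subject, property, object) triples. This is a minimal, hypothetical sketch: the function name is illustrative, and the triples are hard-coded from the IDs in this issue rather than read from Wikibase or Neo4j.

```python
from typing import Set, Tuple

# A statement as a (subject, property, object) triple.
Triple = Tuple[str, str, str]


def lost_in_round_trip(before: Set[Triple], after: Set[Triple]) -> Set[Triple]:
    """Statements present before the Wikibase -> Neo4j -> Wikibase round trip but absent after it."""
    return before - after


# Hypothetical statements on the merged item, reusing the IDs from this issue.
before = {
    ("MONDO:0014109", "skos:exactMatch", "OMIM:615273"),
    ("MONDO:0014109", "skos:exactMatch", "DOID:0060728"),
}
# Assumption: the import dropped both edges, so nothing comes back on export.
after: Set[Triple] = set()

for triple in sorted(lost_in_round_trip(before, after)):
    print(triple)
```

An empty difference is the property a faithful round trip would need to preserve.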
My tentative guess at a solution is the following:
The problem appeared after merging nodes in Wikibase: when translating the Wikibase graph into Neo4j, we lost cross-reference edges ('exact match' statements) because their target nodes were missing in Neo4j. If we then translate Neo4j back into Wikibase using neo4j_to_wd, this information will be lost in Wikibase too, and therefore from the graph for good.
To keep the Neo4j and Wikibase graphs consistent and avoid losing information, only properties of datatype item should be translated into Neo4j as statements (edges); properties of any other datatype should be translated into Neo4j as node attributes.
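A minimal sketch of this rule, assuming each statement is available together with its property's Wikibase datatype. The `Statement` class, the `split_statements` function, and the `example:item_property` / `EX:0001` values are hypothetical and not part of Krusty; only the 'exact match' IDs come from this issue.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Statement:
    subject: str   # node ID of the subject item, e.g. "MONDO:0014109"
    prop: str      # property label, used as relationship type or attribute key
    value: str     # target node ID (item datatype) or literal value (any other datatype)
    datatype: str  # Wikibase datatype of the property, e.g. "wikibase-item" or "string"


def split_statements(
    statements: List[Statement],
) -> Tuple[List[Tuple[str, str, str]], Dict[str, Dict[str, List[str]]]]:
    """Item-valued statements become edges; all others become node attributes."""
    edges: List[Tuple[str, str, str]] = []
    attributes: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for st in statements:
        if st.datatype == "wikibase-item":
            edges.append((st.subject, st.prop, st.value))
        else:
            # Multiple values of one property are kept as a list, in input order.
            attributes[st.subject][st.prop].append(st.value)
    # Convert to plain dicts so callers don't get auto-created keys on lookup.
    return edges, {node: dict(props) for node, props in attributes.items()}


statements = [
    Statement("MONDO:0014109", "skos:exactMatch", "OMIM:615273", "string"),
    Statement("MONDO:0014109", "skos:exactMatch", "DOID:0060728", "string"),
    # Hypothetical item-valued statement, for contrast.
    Statement("MONDO:0014109", "example:item_property", "EX:0001", "wikibase-item"),
]
edges, attributes = split_statements(statements)
print(edges)
print(attributes)
```

With this split, the exact-match IDs survive as a list on the MONDO:0014109 node even when no OMIM or DOID node exists, so the import has no dangling relationship to reject. neo4j-admin import can load such a list from an array-typed column (for example a header field like `exact_match:string[]`, with `;` as the default array delimiter).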
THE PROBLEM IN DETAIL
We came across this issue after merging all three 'NGLY1-deficiency' disease nodes into one in Wikibase, which caused the subsequent import into Neo4j to fail. Specifically, we did the following, in this order:
1. Merged nodes in Wikibase: on the Wikibase front page, via the special page 'Merge two items', we first merged (Item:Q8226|NGLY1-deficiency (OMIM:615273)) into (Item:Q6420|NGLY1-deficiency (MONDO:0014109)), then (Item:Q183|NGLY1-deficiency (DOID:0060728)) into (Item:Q6420|NGLY1-deficiency (MONDO:0014109)).
2. Dumped the graph from Wikibase: using Krusty:wd_to_neo4j.py, we dumped the Wikibase graph into CSV files:
$ python3 Krusty/wd_to_neo4j.py --mediawiki_api_url http://ngly1graph.org:8181/w/api.php --sparql_endpoint_url http://ngly1graph.org:8282/proxy/wdqs/bigdata/namespace/wdq/sparql --node-out-path neo4j/import/concepts.csv --edge-out-path neo4j/import/statements.csv
3. Imported the CSV files into Neo4j:
$ neo4j/bin/neo4j-admin import --id-type string --nodes neo4j/import/concepts.csv --relationships neo4j/import/statements.csv
The import failed with the following error messages:
MONDO:0014109 (global id space)-[skos:exactMatch]->OMIM:615273 (global id space) referring to missing node OMIM:615273
MONDO:0014109 (global id space)-[skos:exactMatch]->OMIM:615273 (global id space) referring to missing node OMIM:615273
MONDO:0014109 (global id space)-[skos:exactMatch]->DOID:0060728 (global id space) referring to missing node DOID:0060728
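The check that neo4j-admin import fails on can be run on the CSV dump beforehand, to list every relationship whose start or end node is missing. A minimal sketch, assuming the files follow the neo4j-admin import header convention (a node ID column ending in `:ID`, and `:START_ID`, `:TYPE` and `:END_ID` columns for relationships); the exact headers wd_to_neo4j.py writes are not shown in this issue, and the function names are hypothetical.

```python
import csv
import io
from typing import List, Set, TextIO, Tuple


def _column(header: List[str], suffix: str) -> int:
    """Index of the first header field ending in suffix, e.g. 'id:ID' for ':ID'."""
    for i, name in enumerate(header):
        if name.endswith(suffix):
            return i
    raise ValueError(f"no column ending in {suffix!r} in header {header}")


def node_ids(nodes_csv: TextIO) -> Set[str]:
    """Collect the IDs of all nodes in a neo4j-admin style node CSV."""
    reader = csv.reader(nodes_csv)
    idx = _column(next(reader), ":ID")
    return {row[idx] for row in reader if row}


def dangling_edges(edges_csv: TextIO, ids: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (start, type, end) for relationships whose start or end node is not in ids."""
    reader = csv.reader(edges_csv)
    header = next(reader)
    start, rel, end = (_column(header, s) for s in (":START_ID", ":TYPE", ":END_ID"))
    return [
        (row[start], row[rel], row[end])
        for row in reader
        if row and (row[start] not in ids or row[end] not in ids)
    ]


# Toy input mirroring the situation after the merge (illustrative data only).
nodes = io.StringIO("id:ID,:LABEL\nMONDO:0014109,disease\n")
edges = io.StringIO(
    ":START_ID,:TYPE,:END_ID\n"
    "MONDO:0014109,skos:exactMatch,OMIM:615273\n"
    "MONDO:0014109,skos:exactMatch,DOID:0060728\n"
)
for edge in dangling_edges(edges, node_ids(nodes)):
    print(edge)
```

Against the real dump, the same functions would take the files opened with `open("neo4j/import/concepts.csv", newline="")` and `open("neo4j/import/statements.csv", newline="")`, listing the affected relationships before the import runs, whichever flags are then used.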
The current, temporary workaround is to run the import with the relaxing flags --ignore-missing-nodes and --ignore-duplicate-nodes, which makes the importer skip the offending entries instead of failing:
$ neo4j/bin/neo4j-admin import --id-type string --nodes neo4j/import/concepts.csv --relationships neo4j/import/statements.csv --ignore-missing-nodes --ignore-duplicate-nodes
NOTE: the dump and import steps (2 and 3) are run as cron jobs on both servers.