Expected behavior and actual behavior
I'm trying to import ConceptNet 5 into OrientDB despite my historical lack of luck with graph databases. I've distilled it down to what I think is its simplest form: a tab-separated CSV file of (predicate, start node, end node) triples. There's more I'd consider including, but making this work would be the first step.
The file (simple.csv) contains one such triple per line.
Here's my ETL file (conceptnet-import.json), adapted from this StackOverflow question because none of the examples in your documentation take in simple lists of edges between a single type of node:
ConceptNet has no information in its nodes (terms), only in its edges (assertions).
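To make the shape of the data concrete, here is a minimal Python sketch of reading such triples and deriving the set of terms from the edges alone. It is only an illustration, not part of the ETL setup: the single sample row is taken from the error message in this report, and the helper names `read_triples` and `collect_terms` are made up for this example.

```python
import csv
import io

# One sample row, copied from the record shown in the error message of this
# report. Column order is assumed to be rel, start, end (as in the ETL log).
SAMPLE = "/r/Antonym\t/c/en/abash/v\t/c/en/reassure\n"


def read_triples(stream):
    """Yield (rel, start, end) tuples from a tab-separated stream."""
    reader = csv.reader(stream, delimiter="\t")
    for rel, start, end in reader:
        yield rel, start, end


def collect_terms(triples):
    """Return every term URI that appears as an endpoint of some edge."""
    terms = set()
    for _rel, start, end in triples:
        terms.add(start)
        terms.add(end)
    return terms


triples = list(read_triples(io.StringIO(SAMPLE)))
terms = collect_terms(triples)
print(triples)
print(sorted(terms))
```

The point of the sketch is that a term has no data of its own: its URI is the only thing that identifies it, and it only ever shows up in the `start` or `end` column of an assertion.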
I would expect that running ETL on this file would get me a simple graph of ConceptNet. Instead, it crashes:
$ ./oetl.sh conceptnet-import.json
OrientDB etl v.2.2.0 (build develop@r79d281140b01c0bc3b566a46a64f1573cb359783; 2016-05-18 14:14:32+0000) www.orientdb.com
[csv] INFO column types: {rel=STRING, start=STRING, end=STRING}
BEGIN ETL PROCESSOR
[file] INFO Reading from file /home/rspeer/conceptnet5/data/assertions/simple.csv with encoding UTF-8
Started execution with 1 worker threads
Error in Pipeline execution: com.orientechnologies.orient.core.storage.ORecordDuplicatedException: Cannot index record Term{rel:/r/Antonym,start:/c/en/abash/v,end:/c/en/reassure}: found duplicated key 'null' in index 'Term.uri' previously assigned to the record #17:0
Storage URL="plocal:/home/rspeer/conceptnet5/data/tmp/orient-conceptnet"INDEX=Term.uri RID=#17:0
+ extracted 503 rows (0 rows/sec) - 503 rows -> loaded 1 vertices (0 vertices/sec) Total time: 999ms [0 warnings, 1 errors]
+ extracted 503 rows (0 rows/sec) - 503 rows -> loaded 1 vertices (0 vertices/sec) Total time: 1999ms [0 warnings, 1 errors]
+ extracted 503 rows (0 rows/sec) - 503 rows -> loaded 1 vertices (0 vertices/sec) Total time: 2999ms [0 warnings, 1 errors]
+ extracted 503 rows (0 rows/sec) - 503 rows -> loaded 1 vertices (0 vertices/sec) Total time: 4s [0 warnings, 1 errors]
+ extracted 503 rows (0 rows/sec) - 503 rows -> loaded 1 vertices (0 vertices/sec) Total time: 5s [0 warnings, 1 errors]
Steps to reproduce the problem
Put the given tab-separated data in /home/rspeer/conceptnet5/data/assertions/simple.csv.
Save the above ETL file as conceptnet-import.json in the orientdb/bin directory (it can't find it if it's not in the same directory, it seems) and run:
./oetl.sh conceptnet-import.json
Important Questions
Running Mode
Embedded, using PLOCAL access mode
Embedded, using MEMORY access mode
Remote
Misc
I have a distributed setup with multiple servers. How many?
I'm using the Enterprise Edition
OrientDB Version
v2.0.x - Please specify last number:
v2.1.x - Please specify last number:
v2.2.x - Please specify last number: 0
Operating System
Linux
MacOSX
Windows
Other Unix
Other, name?
Java Version
6
7
8
The message is clear: "found duplicated key 'null' in index 'Term.uri' previously assigned to the record #17:0". You aren't setting the uri field, so it's null on every record, and a UNIQUE index cannot hold more than one record with the same key, null included.
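The failure mode can be reproduced with a toy model in Python. This is a sketch of the behavior described in the error message, not OrientDB's actual implementation: `UniqueIndex` and `DuplicateKeyError` are hypothetical names, the first record and the record id `#17:1` are invented for illustration, and the only assumption carried over from the report is that a missing field is indexed under a null key.

```python
class DuplicateKeyError(Exception):
    """Raised when a key is already present in the unique index."""


class UniqueIndex:
    """Toy unique index that treats a missing field (None) as an ordinary key."""

    def __init__(self, field):
        self.field = field
        self._entries = {}  # key -> record id

    def put(self, record_id, record):
        key = record.get(self.field)  # None when the field was never set
        if key in self._entries:
            raise DuplicateKeyError(
                f"found duplicated key {key!r} in index on {self.field!r} "
                f"previously assigned to the record {self._entries[key]}"
            )
        self._entries[key] = record_id


index = UniqueIndex("uri")

# Hypothetical first row; like every row from the CSV it has rel/start/end
# columns but no uri field.
first = {"rel": "/r/Example", "start": "/c/en/a", "end": "/c/en/b"}
# The row from the error message in this report.
second = {"rel": "/r/Antonym", "start": "/c/en/abash/v", "end": "/c/en/reassure"}

index.put("#17:0", first)  # accepted: the null key is not in the index yet

error = None
try:
    index.put("#17:1", second)  # rejected: the null key is already taken
except DuplicateKeyError as exc:
    error = exc

print(error)
```

In this model the first vertex is stored under the null key and the second one is rejected, which matches the "loaded 1 vertices" line in the ETL output.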
I intend to be setting the URI field. Of course I don't want it to be null. How should I set it?
If I've missed some documentation -- it seems strange that I would have to rely on StackOverflow answers for the case of loading a graph from a list of edges -- please point me to it.