Bulk_Data_Load
The DataLoader utility can be used to create and/or load RDF data into a local database instance. Directories are processed recursively. Data files may be compressed with zip or gzip, but the loader does not support archives containing multiple data files.
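Since the loader accepts gzip-compressed input but not multi-file archives, each data file should be compressed individually. A minimal sketch (the paths and sample triple are illustrative, not part of the loader):

```shell
# Hypothetical staging directory and sample N-Triples file.
mkdir -p /tmp/upload
printf '<http://example.org/s> <http://example.org/p> "o" .\n' > /tmp/upload/data.nt

# Compress each file on its own; gzip replaces data.nt with data.nt.gz.
# Do NOT bundle several data files into one archive -- the loader
# cannot read multiple files from a single archive.
gzip /tmp/upload/data.nt

ls /tmp/upload
```

The resulting `/tmp/upload/data.nt.gz` (or the whole directory) can then be passed to the DataLoader as shown below.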
Command line:
java -cp *:*.jar com.bigdata.rdf.store.DataLoader [-quiet][-closure][-verbose][-namespace namespace] propertyFile (fileOrDir)*
| parameter | definition |
|---|---|
| -quiet | Suppress all stdout messages. |
| -verbose | Show additional messages detailing the load performance. |
| -closure | Compute the RDF(S)+ closure. |
| -namespace | The namespace of the KB instance. |
| propertyFile | The configuration file for the database instance. |
| fileOrDir | Zero or more files or directories containing the data to be loaded. |
Examples:
1. Load all files from /opt/data/upload/ directory using /opt/data/upload/journal.properties properties file:
java -cp *:*.jar com.bigdata.rdf.store.DataLoader /opt/data/upload/journal.properties /opt/data/upload/
2. Load an archive /opt/data/data.nt.gz using /opt/data/upload/journal.properties properties file into a specified namespace:
java -cp *:*.jar com.bigdata.rdf.store.DataLoader -namespace someNameSpace /opt/data/upload/journal.properties /opt/data/data.nt.gz
If inference is enabled while loading data, a temporary file is created to compute the delta in entailments. This file can grow very large when loading a big data set, which may cause a "no space left on device" error and interrupt the load. To avoid this, it is strongly recommended to set the DataLoader.Options.CLOSURE property to ClosureEnum.None in the properties file:
com.bigdata.rdf.store.DataLoader.closure=None
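In context, a minimal journal.properties might look like the fragment below. The journal file path is an example; only the closure setting is required for this workflow:

```properties
# Example journal location -- adjust to your environment.
com.bigdata.journal.AbstractJournal.file=/opt/data/blazegraph.jnl
# Disable incremental closure during the bulk load; entailments will be
# computed afterwards in a single "database-at-once" pass.
com.bigdata.rdf.store.DataLoader.closure=None
```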
You may need to increase the Java heap size to match the data size. In most cases 6G is enough (add the Java parameter -Xmx6g). Avoid setting the heap larger than 8G, as this increases garbage-collection pressure.
Then load the data using the DataLoader and pass it the -closure option:
java -Xmx6g -cp *:*.jar com.bigdata.rdf.store.DataLoader -closure /opt/data/upload/journal.properties /opt/data/upload/
The DataLoader will not perform incremental truth maintenance during the load. Once the load is complete, it computes all entailments in a single pass. This is the "database-at-once" closure: it does not use a temporary store to compute the delta in entailments, so the temporary store will not "eat your disk".