Working with JSON-LD

Converting JSON-LD to One Object Per Line

Download and install jq from http://stedolan.github.io/jq/.
To convert a JSON array to one JSON object per line from the command line, run:

   jq ".[]" -c <filepath> > <outputfilepath>
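If jq is not available, the same conversion can be sketched in Python with only the standard library. This is a minimal sketch, not part of Karma; the function names `array_to_lines` and `convert_file` are made up for illustration, and it assumes the whole input file fits in memory.

```python
import json


def array_to_lines(array_text):
    """Return one compact JSON value per line for a JSON array given as text."""
    objects = json.loads(array_text)
    if not isinstance(objects, list):
        raise ValueError("expected a top-level JSON array")
    # separators=(",", ":") drops optional whitespace, similar to jq's -c flag.
    return "\n".join(
        json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
        for obj in objects
    )


def convert_file(input_path, output_path):
    """Read a JSON array from input_path and write one element per line."""
    with open(input_path, encoding="utf-8") as src:
        lines = array_to_lines(src.read())
    with open(output_path, "w", encoding="utf-8") as dst:
        if lines:
            dst.write(lines + "\n")
```

Call `convert_file("<filepath>", "<outputfilepath>")` with the same two paths you would pass to jq.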

Merging JSON-LD Files

Sometimes you model multiple sources and produce multiple JSON-LD files, and then you want to merge the JSON-LD files into a single file. There are two cases:

Reducing: combines top-level JSON-LD objects that share a URI. The reducer first combines objects at the top level, and then proceeds recursively to combine objects at all levels of the tree.
Joining: Need to provide an example.

Reducing JSON-LD
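
To make the idea of reducing concrete, here is a minimal Python sketch of combining objects by URI. It is not Karma's reducer and may differ from it in detail. It assumes the URI of an object is stored under the JSON-LD `@id` key, that objects without an `@id` are passed through unchanged, and that when two objects share a property their values are collected into a de-duplicated list. The function names are made up for illustration.

```python
def as_list(value):
    """Wrap a single value in a list; leave lists as they are."""
    return value if isinstance(value, list) else [value]


def add_value(values, new):
    """Add new to values, merging it into an existing object with the same @id."""
    for i, old in enumerate(values):
        if old == new:
            return  # exact duplicate, nothing to add
        if (isinstance(old, dict) and isinstance(new, dict)
                and "@id" in old and old["@id"] == new.get("@id")):
            values[i] = merge_objects(old, new)  # recurse into nested objects
            return
    values.append(new)


def merge_objects(a, b):
    """Combine two objects that share a URI, recursing into shared properties."""
    merged = dict(a)
    for key, b_value in b.items():
        if key not in merged:
            merged[key] = b_value
            continue
        values = []
        for value in as_list(merged[key]) + as_list(b_value):
            add_value(values, value)
        merged[key] = values[0] if len(values) == 1 else values
    return merged


def reduce_by_uri(objects):
    """Combine top-level objects that share the same @id, in first-seen order."""
    by_uri = {}
    without_uri = []
    for obj in objects:
        uri = obj.get("@id")
        if uri is None:
            without_uri.append(obj)
        elif uri in by_uri:
            by_uri[uri] = merge_objects(by_uri[uri], obj)
        else:
            by_uri[uri] = obj
    return list(by_uri.values()) + without_uri
```

For example, two objects with the same `@id` that each carry a nested object with the same `@id` come out as one object whose nested object holds the properties of both.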

Joining JSON-LD

To join JSON-LD files, you need to set up Hadoop and Hive on your machine and then run a script that joins your files.

Hadoop and Hive Setup

Download Hadoop 2.6.0: http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Download Hive: http://www.apache.org/dyn/closer.cgi/hive/

After you download the files, unpack them and copy them to a safe place. On a Mac, I recommend putting them in /usr/local. My setup looks as follows:

szeke (2):local szekely> pwd
/usr/local
szeke (2):local szekely> ls -1
apache-hive-1.1.0-bin
hadoop-2.6.0
# and many other files

For convenience, I put the following in my ~/.profile so that the hive command is on my PATH later:

# For HIVE
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HIVE_HOME=/usr/local/apache-hive-1.1.0-bin
export PATH=${HIVE_HOME}/bin:$PATH

export HADOOP_HEAPSIZE=4096
export HADOOP_CLIENT_OPTS=-Xmx4196m
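
Before moving on, it can help to confirm that the variables above are visible in a new shell. The following is a minimal Python sketch of such a check; the function name `check_hive_env` is made up for illustration, and it only looks at `HADOOP_HOME`, `HIVE_HOME` and whether a `hive` executable can be found on `PATH`.

```python
import os
import shutil


def check_hive_env(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    for name in ("HADOOP_HOME", "HIVE_HOME"):
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(path):
            problems.append(f"{name} points to a missing directory: {path}")
    # shutil.which searches the given PATH string for an executable named hive.
    if shutil.which("hive", path=env.get("PATH", "")) is None:
        problems.append("hive is not on PATH")
    return problems
```

Calling `check_hive_env(os.environ)` returns an empty list when nothing looks wrong, and a list of messages otherwise.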

Download and setup `hive-join-example`

Download hive-join-example.zip and unpack the file, which will create a folder with the following files:

szeke (2):hive-join-example szekely> ls -1
derby.log
lib
merged
metastore_db
scripts
source
target
szeke (2):hive-join-example szekely>

Build karma-mr-0.0.1-SNAPSHOT-shaded.jar

You need to create karma-mr-0.0.1-SNAPSHOT-shaded.jar using the following command in your Web-Karma directory:

mvn package -P shaded,cloudera

karma-mr-0.0.1-SNAPSHOT-shaded.jar is created in Web-Karma/karma-mr/target. Once you have karma-mr-0.0.1-SNAPSHOT-shaded.jar, copy it to the lib folder of your hive-join-example.

Define the target

The target is the file that will receive the JSON objects from the source. The target is a text file that contains one JSON object per line. If you are starting with a file that contains a JSON array of objects, you first need to convert it to one JSON object per line (see Converting JSON-LD to One Object Per Line on this page).

Once you have your target file, move it to the target folder.

Make sure that your target folder contains only the files you want to join into, because Hive will use every file in that folder.

Define the source

The source is the file that contains the JSON objects that you want to insert into the target. Like the target, the source must be a text file that contains one JSON object per line.

Move your source files into the source folder.
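
Because both the target and the source must contain one JSON object per line, it may be worth checking the files before running the join. Here is a minimal Python sketch of such a check. The function name `find_bad_lines` is made up for illustration, and reporting blank lines as problems is a cautious assumption of this sketch, not a documented requirement.

```python
import json


def find_bad_lines(lines):
    """Return (line_number, reason) pairs for lines that are not one JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            problems.append((number, "blank line"))
            continue
        try:
            value = json.loads(text)
        except json.JSONDecodeError as error:
            problems.append((number, f"invalid JSON: {error.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems
```

Since an open file iterates line by line, you can pass it directly, for example `find_bad_lines(open("<filepath>", encoding="utf-8"))`. An empty result means every line parsed as a single JSON object.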

Define `JSON_PATH_TO_MERGE`

Define `JSON_PATH_FOR_MERGE_URIS`

Define `ATID`

Run the `do-everything.sh` script

Explain how to run and where the results will be.

Loading JSON-LD into Elastic Search

a. git clone https://github.com/usc-isi-i2/dig-elasticsearch.git
b. Change directory to types/webpage/scripts
c. Type python loadDataElasticSearch.py -h. This will provide help for the script as below:

   usage: loadDataElasticSearch.py [-h] [-hostname HOSTNAME] [-port PORT]
                                   [-mappingFilePath MAPPINGFILEPATH]
                                   dataFileType filepath indexname doctype

   positional arguments:
      filepath            json file to be loaded in ElasticSearch
      indexname           desired name of the index in ElasticSearch
      doctype             type of the document to be indexed
      dataFileType        Specify '0' if every line in the data file is
                          different json object or '1' otherwise

   optional arguments:
      -h, --help                       show this help message and exit
      -hostname HOSTNAME               Elastic Search Server hostname, defaults to 'localhost'
      -port PORT                       Elastic Search Server port,defaults to 9200
      -mappingFilePath MAPPINGFILEPATH mapping/setting file for the index

d. Execute:

python loadDataElasticSearch.py <filepath> <index-name> WebPage

If you don't have Elastic Search, download it from https://www.elastic.co/products/elasticsearch and follow the installation instructions.
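
For readers curious what loading a one-object-per-line file involves, here is a minimal Python sketch that builds a request body in the format of Elastic Search's bulk API. It does not reproduce what loadDataElasticSearch.py does internally. The function name `build_bulk_body` is made up for illustration, and the `_type` field assumes an Elastic Search version that still uses document types, as the doctype argument above suggests.

```python
import json


def build_bulk_body(lines, index_name, doc_type):
    """Build a bulk-request body from one-JSON-object-per-line input."""
    # Every document is preceded by an action line that names the index and type.
    action = json.dumps({"index": {"_index": index_name, "_type": doc_type}})
    parts = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        doc = json.loads(text)  # fails early on malformed input
        parts.append(action)
        parts.append(json.dumps(doc))
    # The bulk format is newline-delimited and the body must end with a newline.
    return "".join(part + "\n" for part in parts)
```

The resulting string would be sent to the server's `_bulk` endpoint; sending it is left out here to keep the sketch free of network calls.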
