climbage edited this page Mar 19, 2013 · 4 revisions

The JSON Utilities provide helper methods for developers using Esri JSON file formats in MapReduce applications.

This library contains:

  • InputFormats - These are classes that Hadoop uses to decide how input data is split between mappers.
  • RecordReaders - These are classes that define how a data split is broken into records (key/value pairs). RecordReaders are used in conjunction with the InputFormats.
  • EsriFeatureClass - This class provides a data structure, representing a set of geometries and their associated attributes, that can be constructed directly from the Enclosed JSON file format.

InputFormats and RecordReaders

InputFormats and RecordReaders separate data into splits and records that can be distributed across multiple mappers and map tasks. Each split is given to a mapper, which loops through the records in the split and calls map<K,V>() on each one. K is the key associated with a record and V is the value. For our formats, the key is the character offset from the start of the file to the start of the record, and the value is the JSON text of the record. The key can usually be ignored, since the value is what we are really interested in.

Take this small dataset of two Unenclosed JSON files that live in the Hadoop file system (HDFS). Each file has two records, and each record contains the name of a U.S. state and a geometry representing that state's boundary.

  • /path/to/data/
      • data-1.json
        { "attributes" : { "Name" : "California" }, "geometry" : ... }
        { "attributes" : { "Name" : "Arizona" }, "geometry" : ... }
      • data-2.json
        { "attributes" : { "Name" : "Utah" }, "geometry" : ... }
        { "attributes" : { "Name" : "Colorado" }, "geometry" : ... }

For this dataset, we will use UnenclosedJsonInputFormat and UnenclosedJsonRecordReader. The input format creates a record reader for each input split behind the scenes.

In this case, each mapper will receive an entire .json file.

Mapper 1 (data-1.json)

  1. Record 1
      • Key 0
      • Value { "attributes" : { "Name" : "California" }, "geometry" : ... }
  2. Record 2
      • Key 62
      • Value { "attributes" : { "Name" : "Arizona" }, "geometry" : ... }

Mapper 2 (data-2.json)

  1. Record 1
      • Key 0
      • Value { "attributes" : { "Name" : "Utah" }, "geometry" : ... }
  2. Record 2
      • Key 56
      • Value { "attributes" : { "Name" : "Colorado" }, "geometry" : ... }
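
The offset/text pairing shown above can be imitated in plain Java without Hadoop. The following is a minimal sketch, not the library's record reader: the class name RecordOffsets is hypothetical, it assumes exactly one record per line and ASCII text, and the elided geometry is replaced by a null placeholder, so the offsets it prints will not match the illustrative keys listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a record reader: maps each record's starting
// character offset (the key) to the record's JSON text (the value).
// Assumes exactly one record per line, which the real reader may not require.
class RecordOffsets {

    static Map<Long, String> split(String fileContents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n", -1)) {
            if (!line.trim().isEmpty()) {
                records.put(offset, line);
            }
            // +1 accounts for the newline that split() removed
            offset += line.length() + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // Placeholder records: the geometry is null rather than a real boundary.
        String data =
              "{ \"attributes\" : { \"Name\" : \"Utah\" }, \"geometry\" : null }\n"
            + "{ \"attributes\" : { \"Name\" : \"Colorado\" }, \"geometry\" : null }\n";

        // Each entry corresponds to one map() call: key = offset, value = JSON text.
        for (Map.Entry<Long, String> record : split(data).entrySet()) {
            System.out.println("Key " + record.getKey() + " -> Value " + record.getValue());
        }
    }
}
```

In a real job the mapper would receive these pairs as LongWritable and Text arguments. The sketch only shows why the first key is 0 and why each later key depends on the length of the records before it.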

EsriFeatureClass

The class EsriFeatureClass is a direct mapping to the data contained in an Enclosed JSON document.

Creating a feature class object is as simple as calling EsriFeatureClass.fromJson(InputStream), where InputStream contains the entire JSON file. Here is an example of how a mapper would create a feature class in Hadoop in its setup method.

EsriFeatureClass featureClass;

@Override
public void setup(Context context)
{
	Configuration config = context.getConfiguration();
	FSDataInputStream iStream = null;

	try {
		// load the JSON file provided as argument 0; its path is read here
		// from the "com.esri.geometry" configuration property
		FileSystem hdfs = FileSystem.get(config);
		iStream = hdfs.open(new Path(config.get("com.esri.geometry")));

		// create feature class from stream
		featureClass = EsriFeatureClass.fromJson(iStream);
	}
	catch (Exception e)
	{
		// featureClass stays null if the file cannot be read or parsed
		e.printStackTrace();
	}
	finally
	{
		if (iStream != null)
		{
			try {
				iStream.close();
			} catch (IOException e) {
				// nothing useful can be done if close fails
			}
		}
	}
}

Now that you have the feature class loaded, each map method can access the geometries and attributes of its features. For example:

@Override
public void map(LongWritable key, Text val, Context context)
        throws IOException, InterruptedException {

    // code to process these values and pull out longitude and latitude
    ...

    // construct a point using the Point class from the esri-geometry-api
    // and set the x,y values to coordinates from our source data
    Point point = new Point(longitude, latitude);

    // loop through each feature in the feature class
    for (EsriFeature feature : featureClass.features)
    {
        // check to see if the feature geometry contains the point that we
        // are interested in (spatialReference is assumed to be defined
        // elsewhere in the mapper)
        if (GeometryEngine.contains(feature.geometry, point, spatialReference))
        {
            String name = (String)feature.attributes.get("name");

            // emit the name as a key and any data you want associated with that
            // key as the value
            context.write(new Text(name), ...);

            // stop at the first feature that contains the point
            break;
        }
    }
}
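
The lookup loop above can be tried outside Hadoop with a stand-in for the geometry test. The following sketch is an illustration under stated assumptions, not the esri-geometry-api: it swaps GeometryEngine.contains for java.awt.geom.Path2D.contains from the JDK, treats coordinates as planar with no spatial reference, and the FeatureLookup class, its Feature type, the findContaining method, and the rectangular outlines are all hypothetical.

```java
import java.awt.geom.Path2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the point-in-feature lookup in map().
// Path2D.contains replaces GeometryEngine.contains, so no spatial
// reference is involved and coordinates are treated as planar.
class FeatureLookup {

    // Simplified feature: a name attribute plus a polygon outline.
    static class Feature {
        final String name;
        final Path2D.Double geometry;

        Feature(String name, double[][] ring) {
            this.name = name;
            this.geometry = new Path2D.Double();
            geometry.moveTo(ring[0][0], ring[0][1]);
            for (int i = 1; i < ring.length; i++) {
                geometry.lineTo(ring[i][0], ring[i][1]);
            }
            geometry.closePath();
        }
    }

    // Returns the name of the first feature containing the point, or null.
    static String findContaining(List<Feature> features, double x, double y) {
        for (Feature feature : features) {
            if (feature.geometry.contains(x, y)) {
                return feature.name; // first match wins, like the break in map()
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Feature> features = new ArrayList<>();
        // Made-up rectangles, not real state boundaries.
        features.add(new Feature("West", new double[][] {{0, 0}, {10, 0}, {10, 10}, {0, 10}}));
        features.add(new Feature("East", new double[][] {{10, 0}, {20, 0}, {20, 10}, {10, 10}}));

        System.out.println(findContaining(features, 15.0, 5.0));
        System.out.println(findContaining(features, 50.0, 50.0));
    }
}
```

Because the loop scans every feature until one matches, each map call costs time proportional to the number of features. That is reasonable for a small feature class loaded once in setup, which is the situation the example above describes.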