JSON Utilities
The JSON Utilities provide helper methods for developers using Esri JSON file formats in MapReduce applications.
This library contains:
- InputFormats - These are classes that Hadoop uses to decide how input data is split between mappers.
- RecordReaders - These are classes that define how a data-split is broken into records (key/value pairs). RecordReaders are used in conjunction with the InputFormats.
- EsriFeatureClass - This class provides a data structure that represents a set of geometries and associated attributes that can be directly constructed from the Enclosed JSON file format.
InputFormats and RecordReaders separate data into splits and records that can be distributed across multiple mappers and map tasks. Each split is given to a mapper, which loops through each record in the split and calls `map(K, V)` on it. `K` is the key associated with a record and `V` is the value. For our formats, the key is the character offset from the start of the file to the start of the record, and the value is the JSON text of the record. Don't focus too much on the `K`, as we are really only interested in the `V`.
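Those keys can be illustrated with a short, Hadoop-free sketch that walks newline-delimited records and reports the offset at which each one starts. (The record text below is abbreviated, so the printed offsets will not match the real data exactly.)

```java
import java.util.ArrayList;
import java.util.List;

public class RecordOffsets {
    // Compute the starting offset of each newline-delimited record --
    // the same values an unenclosed-JSON record reader hands to map() as keys.
    static List<Integer> offsets(String data) {
        List<Integer> keys = new ArrayList<>();
        int offset = 0;
        for (String record : data.split("\n")) {
            keys.add(offset);
            offset += record.length() + 1; // +1 for the newline delimiter
        }
        return keys;
    }

    public static void main(String[] args) {
        // Two unenclosed records, as a record reader would see them
        String data =
            "{ \"attributes\" : { \"Name\" : \"California\" }, \"geometry\" : ... }\n" +
            "{ \"attributes\" : { \"Name\" : \"Arizona\" }, \"geometry\" : ... }\n";
        for (int key : offsets(data)) {
            System.out.println("key = " + key);
        }
    }
}
```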
Take this small dataset of two Unenclosed JSON files that live in the Hadoop file system (HDFS). Each file has a couple of records that contain a U.S. state name and a geometry representing the state's boundary.
- /path/to/data/
  - data-1.json

        { "attributes" : { "Name" : "California" }, "geometry" : ... }
        { "attributes" : { "Name" : "Arizona" }, "geometry" : ... }

  - data-2.json

        { "attributes" : { "Name" : "Utah" }, "geometry" : ... }
        { "attributes" : { "Name" : "Colorado" }, "geometry" : ... }
For this dataset, we will use `UnenclosedEsriJsonInputFormat` and `UnenclosedEsriJsonRecordReader`. The RecordReader is created for each input split by the InputFormat under the covers. In this case, each mapper will receive an entire `.json` file.
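The InputFormat is typically wired up in the job driver. Here is a minimal sketch of what that could look like; the driver and mapper class names are hypothetical, and the import path for the input format is assumed to be this library's `com.esri.json.hadoop` package.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import com.esri.json.hadoop.UnenclosedEsriJsonInputFormat;

public class StateJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "state-lookup");
        job.setJarByClass(StateJobDriver.class);

        // Split and read unenclosed Esri JSON records; each map() call
        // receives (LongWritable offset, Text jsonRecord)
        job.setInputFormatClass(UnenclosedEsriJsonInputFormat.class);
        job.setMapperClass(StateMapper.class); // hypothetical mapper class

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/path/to/data/"));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```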
Mapper 1 (data-1.json)
- Record 1
  - Key: 0
  - Value: `{ "attributes" : { "Name" : "California" }, "geometry" : ... }`
- Record 2
  - Key: 62
  - Value: `{ "attributes" : { "Name" : "Arizona" }, "geometry" : ... }`

Mapper 2 (data-2.json)
- Record 1
  - Key: 0
  - Value: `{ "attributes" : { "Name" : "Utah" }, "geometry" : ... }`
- Record 2
  - Key: 56
  - Value: `{ "attributes" : { "Name" : "Colorado" }, "geometry" : ... }`
The class `EsriFeatureClass` is a direct mapping to the data contained in an Enclosed JSON document. Creating a feature class object is as simple as calling `EsriFeatureClass.fromJson(InputStream)`, where the `InputStream` contains the entire JSON file. Here is an example of how a mapper would create a feature class in Hadoop in its `setup` method.
```java
// Requires: org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem,
// org.apache.hadoop.fs.FSDataInputStream, org.apache.hadoop.fs.Path,
// java.io.IOException, plus this library's EsriFeatureClass

EsriFeatureClass featureClass;

@Override
public void setup(Context context)
{
    Configuration config = context.getConfiguration();
    FSDataInputStream iStream = null;

    try {
        // load the JSON file provided as argument 0
        FileSystem hdfs = FileSystem.get(config);
        iStream = hdfs.open(new Path(config.get("com.esri.geometry")));

        // create feature class from stream
        featureClass = EsriFeatureClass.fromJson(iStream);
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
    finally
    {
        if (iStream != null)
        {
            try {
                iStream.close();
            } catch (IOException e) { }
        }
    }
}
```
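The snippet above reads the file path from the job configuration, which suggests the driver places it there before submission. A hypothetical sketch of that step (the configuration key is taken from the snippet above; how the path is actually supplied may differ):

```java
// in the job driver, before submitting the job:
// pass the HDFS path of the enclosed JSON file to every task
// through the job configuration, under the key the mapper reads
config.set("com.esri.geometry", args[0]);
```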
Now that you have the feature class loaded, each `map` call can access the geometries and attributes in the features. For example:
```java
@Override
public void map(LongWritable key, Text val, Context context)
        throws IOException, InterruptedException {

    // code to process these values and pull out longitude and latitude
    ...

    // construct a point using the Point class from the esri-geometry-api
    // and set the x,y values to coordinates from our source data
    Point point = new Point(longitude, latitude);

    // loop through each feature in the feature class
    for (EsriFeature feature : featureClass.features)
    {
        // check to see if the feature geometry contains the point that we
        // are interested in (spatialReference is assumed to have been
        // initialized elsewhere, e.g. in setup())
        if (GeometryEngine.contains(feature.geometry, point, spatialReference))
        {
            String name = (String) feature.attributes.get("Name");

            // emit the name as a key and any data you want associated with that
            // key as the value
            context.write(new Text(name), ...);
            break;
        }
    }
}
```
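The loop-until-contained pattern above can be exercised without Hadoop or the esri-geometry-api. The following self-contained sketch substitutes axis-aligned bounding boxes for real polygon geometries, purely to illustrate the lookup-and-emit flow; the box coordinates are rough, hypothetical values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PointLookup {
    // Simplified stand-in for a feature class: each entry maps a feature
    // name to a bounding box {xmin, ymin, xmax, ymax}. Real features carry
    // full polygon geometries and attribute maps.
    static String findContainingFeature(Map<String, double[]> features,
                                        double x, double y) {
        for (Map.Entry<String, double[]> f : features.entrySet()) {
            double[] b = f.getValue();
            // containment test, analogous to GeometryEngine.contains(...)
            if (x >= b[0] && x <= b[2] && y >= b[1] && y <= b[3]) {
                return f.getKey(); // first feature whose box contains the point
            }
        }
        return null; // point falls outside every feature
    }

    public static void main(String[] args) {
        // Very rough lon/lat bounding boxes for two of the sample states
        Map<String, double[]> features = new LinkedHashMap<>();
        features.put("Utah",     new double[]{-114.05, 37.0, -109.04, 42.0});
        features.put("Colorado", new double[]{-109.05, 37.0, -102.04, 41.0});

        // a point near Salt Lake City falls inside the Utah box
        System.out.println(findContainingFeature(features, -111.9, 40.76));
    }
}
```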