earthquakes.csv has different schema than sample expects #24

ddkaiser · 2015-03-05T03:42:50Z

The “create table earthquakes” instructions given at https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/point-in-polygon-aggregation-hive no longer align with the schema of the data located at https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/data/earthquake-data

(I’m guessing that the earthquake-data is occasionally pulled from a USGS or similar source, and they changed their column definitions?)

I had to insert an additional column “unknown” of type double in front of the Magnitude column.

For example, the instructions provide the following schema:

(earthquake_date STRING, latitude DOUBLE, longitude DOUBLE, magnitude DOUBLE)

Here is a random line from the file (the unknown column is 80.0 and the magnitude is 6.5):

1930/12/06 07:03:28.00,53.0,-172.0,80.0,6.5,ML,0,,,,AK,

The schema that I used:

(earthquake_date STRING, latitude DOUBLE, longitude DOUBLE, unknown DOUBLE, magnitude DOUBLE)

With the corrected schema, this query works:

hive> select min(magnitude), max(magnitude) from earthquakes;
OK
5.0 9.1

If magnitude still points to the wrong column (the fourth field, which holds depth), you will see:

hive> select min(magnitude), max(magnitude) from earthquakes;
OK
-5.0    700.0


randallwhitman · 2015-03-05T16:38:51Z

The version of earthquakes.csv with a header row contains the following header:

datetime,latitude,longitude,depth,magnitude,magtype,nbstations,gap,distance,rms,source,eventid
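To see which schema position each value lands in, the header above can be paired with the sample row from the first comment. This is a minimal Python sketch, not part of the Esri samples; `label_row` is a hypothetical helper name, and the two strings are copied from this thread.

```python
import csv
import io

# Header row of the full earthquakes.csv, as quoted in this thread.
HEADER = ("datetime,latitude,longitude,depth,magnitude,magtype,"
          "nbstations,gap,distance,rms,source,eventid")

# Sample data row quoted in the first comment.
SAMPLE = "1930/12/06 07:03:28.00,53.0,-172.0,80.0,6.5,ML,0,,,,AK,"


def label_row(header: str, row: str) -> dict:
    """Hypothetical helper: pair each header name with the field at the same position."""
    names = next(csv.reader(io.StringIO(header)))
    values = next(csv.reader(io.StringIO(row)))
    return dict(zip(names, values))


if __name__ == "__main__":
    labeled = label_row(HEADER, SAMPLE)
    # Print the zero-based position, column name, and raw value of each field.
    for position, (name, value) in enumerate(labeled.items()):
        print(position, name, repr(value))
```

With this header, the fourth field (`depth`, 80.0) sits where the original four-column DDL expects `magnitude`, and the real magnitude (6.5) is the fifth field, which matches the “unknown” column workaround described in the first comment.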

randallwhitman · 2015-03-05T17:02:02Z

The DDL (in the README and in run-sample.sql) matches a column-subset variant of the data that we also had. The mismatch can be resolved either by updating the DDL in both files or by uploading the column-subset version of the earthquake data.
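For the second option, a column-subset file matching the original four-column DDL could be produced roughly as follows. This is a hedged Python sketch: the exact layout of the column-subset variant mentioned here is not shown in this thread, so keeping `datetime`, `latitude`, `longitude`, `magnitude` (in the order of the DDL quoted in the first comment) and dropping the header row are assumptions. `subset_columns` is a hypothetical helper name.

```python
import csv
import io

# Assumed column order, taken from the original Hive DDL in this thread:
# (earthquake_date STRING, latitude DOUBLE, longitude DOUBLE, magnitude DOUBLE)
KEEP = ["datetime", "latitude", "longitude", "magnitude"]


def subset_columns(full_csv: str, keep=KEEP) -> str:
    """Hypothetical helper: return CSV text with only the `keep` columns.

    Assumes `full_csv` starts with the header row quoted in this thread.
    The header is not written to the output, on the assumption that the
    Hive table reads every line of the file as data.
    """
    reader = csv.DictReader(io.StringIO(full_csv))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in reader:
        # Look up each kept column by name so the input column order does not matter.
        writer.writerow([record[name] for name in keep])
    return out.getvalue()
```

For example, writing `subset_columns(open("earthquakes.csv").read())` to a new file would give rows such as `1930/12/06 07:03:28.00,53.0,-172.0,6.5` for the sample line. Selecting columns by header name rather than by position keeps the sketch working if the upstream column order changes again.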

randallwhitman added a commit that referenced this issue Mar 10, 2015

Hive sample earthquake schema updated to match complete earthquake CSV data as shared with the custom MapReduce sample (#24)

a905d49

JamesLMilner mentioned this issue Apr 8, 2015

Copy from HDFS not working #22 (open)

randallwhitman added this to the v2.1 milestone Mar 10, 2016
