-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using the spatial framework for hadoop with data stored in ORC files #85
Comments
Interesting. We will try to reproduce this. In the meantime, can you disable vectorization to try and get around the error?
This may affect the performance of the queries. |
Hi Michael Using set hive.vectorized.execution.enabled = false; and set hive.default.fileformat=TextFile; made the queries work. Having a look at the source code it looks like ORC does not know how to work with the spatial types in columns. 630 private ColumnVector More ...allocateColumnVector(String type, int defaultSize) { 631 if (type.equalsIgnoreCase("double")) { 632 return new DoubleColumnVector(defaultSize); 633 } else if (VectorizationContext.isStringFamily(type)) { 634 return new BytesColumnVector(defaultSize); 635 } else if (VectorizationContext.decimalTypePattern.matcher(type).matches()){ 636 int [] precisionScale = getScalePrecisionFromDecimalType(type); 637 return new DecimalColumnVector(defaultSize, precisionScale[0], precisionScale[1]); 638 } else if (type.equalsIgnoreCase("long") || 639 type.equalsIgnoreCase("date") || 640 type.equalsIgnoreCase("timestamp")) { 641 return new LongColumnVector(defaultSize); 642 } else { 643 throw new Error("Cannot allocate vector column for " + type); 644 } 645 } 646 Thank you very much for your help. You can close this issue. Regards Derck From: Michael Park [mailto:notifications@github.com] Interesting. We will try to reproduce this. In the meantime, can you disable vectorization to try and get around the error? set hive.vectorized.execution.enabled = false; This may affect the performance of the queries. — |
Hey we are able to run spatial data with ORC Files. I ran to the same problem as you. After Some research I figured that TEZ Engine uses Vectorization which does not support Binary Datatype. When we compute ST_Point or ST_Polygon the result is binary data. So just disabling vectorization for this step solves your problem |
I don't see this method that is called out on master: do we think this is still a problem on hive-master? It looks like it was changed in this commit: then the code in at some point was updated to include binary support. or it appears that way. |
Good Afternoon,
The ORC format allows for the efficient storage and retrieval of big data files. For more details see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC.
We have installed a Hadoop Cluster based on the Hortonworks Data Platform 2.2.6.0-2800.
When we work with csv files in hive we do not have any problems . When we use the ORC file format we get the following problems.
ORC Problem
[hive@srv-hc10 ~]$ hive
hive> add jar esri-geometry-api-1.2.1.jar spatial-sdk-hive-1.0.3-SNAPSHOT.jar spatial-sdk-json-1.0.3-SNAPSHOT.jar;
Added [esri-geometry-api-1.2.1.jar, spatial-sdk-hive-1.0.3-SNAPSHOT.jar, spatial-sdk-json-1.0.3-SNAPSHOT.jar] to class path
Added resources: [esri-geometry-api-1.2.1.jar, spatial-sdk-hive-1.0.3-SNAPSHOT.jar, spatial-sdk-json-1.0.3-SNAPSHOT.jar]
hive> create temporary function ST_Bin as 'com.esri.hadoop.hive.ST_Bin';
OK
Time taken: 0.636 seconds
hive> create temporary function ST_BinEnvelope as 'com.esri.hadoop.hive.ST_BinEnvelope';
OK
Time taken: 0.014 seconds
hive> describe formatted xxxxxxx.events_orc;
OK
col_name data_type comment
vehicle_id int
ignition smallint
event_ts bigint
event_description string
longitude double
latitude double
altitude string
speed smallint
bearing smallint
linear_g double
lateral_g double
trip_no int
Detailed Table Information
Database: xxxxxxx
Owner: root
CreateTime: Thu Jun 18 22:41:42 SAST 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://srv-hcm01.esri-southafrica.com:8020/apps/hive/warehouse/xxxxxxx.db/events_orc
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
auto.purge true
comment xxxxxxx analysis table
last_modified_by root
last_modified_time 1434727038
numFiles 62
numRows -1
orc.compress SNAPPY
rawDataSize -1
totalSize 1954173667
transient_lastDdlTime 1434727038
Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets: 62
Bucket Columns: [vehicle_id]
Sort Columns: [Order(col:event_ts, order:1)]
Storage Desc Params:
serialization.format 1
Time taken: 1.135 seconds, Fetched: 47 row(s)
hive> select ST_Bin(0.001, ST_Point(longitude, latitude)) as binvalue, count(*) as freq
> from xxxxxxx.events_orc
> where longitude is not null and latitude is not null and vehicle_id = 63962497
> group by ST_Bin(0.001, ST_Point(longitude, latitude));
Query ID = hive_20150623124949_0461acf6-46d8-41e4-99e1-6b62836abf6a
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1434395264469_0091)
Map 1 FAILED 68 0 0 68 153 67
Reducer 2 KILLED 8 0 0 8 0 8
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 23.79 s
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1434395264469_0091_1_00, diagnostics=[Task failed, taskId=task_1434395264469_0091_1_00_000011, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.Error: Cannot allocate vector column for None
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Error: Cannot allocate vector column for None
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.allocateColumnVector(VectorizedRowBatchCtx.java:643)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:606)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:339)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:109)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:49)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:58)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:33)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.createValue(TezGroupedSplitsInputFormat.java:141)
at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:150)
at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:609)
at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:588)
at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:140)
at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:361)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:134)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
... 13 more
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.Error: Cannot allocate vector column for None
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Error: Cannot allocate vector column for None
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.allocateColumnVector(VectorizedRowBatchCtx.java:643)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:606)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:339)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:109)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:49)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:58)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:33)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.createValue(TezGroupedSplitsInputFormat.java:141)
at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:150)
at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:609)
at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:588)
at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:140)
at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:361)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:134)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
... 13 more
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.Error: Cannot allocate vector column for None
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Error: Cannot allocate vector column for None
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.allocateColumnVector(VectorizedRowBatchCtx.java:643)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:606)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:339)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:109)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:49)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:58)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:33)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.createValue(TezGroupedSplitsInputFormat.java:141)
at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:150)
at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:609)
at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:588)
at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:140)
at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:361)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:134)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
... 13 more
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.Error: Cannot allocate vector column for None
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Error: Cannot allocate vector column for None
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.allocateColumnVector(VectorizedRowBatchCtx.java:643)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:606)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:339)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:109)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:49)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:58)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:33)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.createValue(TezGroupedSplitsInputFormat.java:141)
at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:150)
at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:609)
at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:588)
at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:140)
at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:361)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:134)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
... 13 more
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1434395264469_0091_1_00 [Map 1] killed/failed due to:null]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1434395264469_0091_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1434395264469_0091_1_01 [Reducer 2] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
hive>
Could you please investigate if this is viable.
Regards
Derck
The text was updated successfully, but these errors were encountered: