Anyway, I was trying the Aggregation sample (taxi demo). When I ran the aggregation (step 9) it took a very long time, and the progress (Map %) went like this:
0%, after 30 minutes still 0%, after another 30 minutes 89%, and the same thing three times (Map only); Reduce stayed at 0%. Is this okay? I had to turn off the laptop I was using for it, so I don't know.
But then I tried again, skipped step 9, ran steps 10 and 11, and got this:
Could not find job application_1430678691120_0002. The job might not be running yet.
Job job_1430678691120_0002 could not be found: {"RemoteException":{"exception":"NotFoundException","message":"java.lang.Exception: job, job_1430678691120_0002, is not found","javaClassName":"org.apache.hadoop.yarn.webapp.NotFoundException"}} (error 404)
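As an aside, the 404 body is JSON from the YARN web app, so the interesting part can be pulled out programmatically while debugging. A minimal sketch (the error string is copied from the response above):

```python
import json

# Error body returned by the YARN web app (copied from the 404 above).
err = ('{"RemoteException":{"exception":"NotFoundException",'
       '"message":"java.lang.Exception: job, job_1430678691120_0002, is not found",'
       '"javaClassName":"org.apache.hadoop.yarn.webapp.NotFoundException"}}')

info = json.loads(err)["RemoteException"]
print(info["exception"] + ": " + info["message"])
```

Here the message just confirms that YARN has no record of that job id, which would be consistent with the sandbox VM having been powered off before the job registered or finished.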
Steps 1-8 worked well ;) I don't know whether step 9 produced an error message, because I had to turn the machine off.
I will try it again (it's a really long process; it was running for more than an hour, still at 89% Map, then back to 30%). I'll rerun it and post the results.
Hadoop, HDF, and so on ... it would be great to integrate that into these tools.
BTW: in the taxi demo sample I am at step 9 again. I changed the value from 0.01 to 1 to make it faster, BUT it is still slow. Or is that okay? I'm just asking because I don't know. Isn't it weird?
hive> FROM (SELECT ST_Bin(1, ST_Point(dropoff_longitude, dropoff_latitude)) bin_id, * FROM taxi_demo) bins
> SELECT ST_BinEnvelope(1, bin_id) shape,
> COUNT(*) count
> GROUP BY bin_id;
Query ID = root_20150507120909_e000001c-4259-48dd-8e98-3684d0e94566
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1431013082714_0001, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1431013082714_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1431013082714_0001
Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 3
2015-05-07 12:11:05,397 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:12:09,782 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:13:25,554 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:14:37,769 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:15:38,096 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:16:42,504 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:17:25,371 Stage-1 map = 89%, reduce = 0%
2015-05-07 12:18:11,890 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:19:12,073 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:20:12,697 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:21:26,323 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:22:26,650 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:23:33,421 Stage-1 map = 11%, reduce = 0%
2015-05-07 12:23:35,272 Stage-1 map = 56%, reduce = 0%
2015-05-07 12:24:09,535 Stage-1 map = 89%, reduce = 0%
2015-05-07 12:24:41,054 Stage-1 map = 67%, reduce = 0%
2015-05-07 12:25:28,244 Stage-1 map = 44%, reduce = 0%
2015-05-07 12:26:22,278 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:28:36,400 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:29:46,988 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:30:47,851 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:31:48,892 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:33:13,617 Stage-1 map = 0%, reduce = 0%
2015-05-07 12:34:14,299 Stage-1 map = 0%, reduce = 0%
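For context, here is a rough Python sketch of what that query computes. This is illustrative only: it assumes ST_Bin assigns each point to a square grid cell of the given side length (the real spatial framework's bin-id encoding differs), and the sample points are made up.

```python
from collections import Counter

def st_bin(size, lon, lat):
    # Illustrative stand-in for Hive's ST_Bin: map a point to a square
    # grid cell of side `size`. The real bin-id encoding differs.
    return (int(lon // size), int(lat // size))

# Made-up dropoff points; the real data comes from the taxi_demo table.
points = [(-73.99, 40.75), (-73.98, 40.76), (-73.50, 40.70)]

def bin_counts(size, pts):
    # GROUP BY bin_id with COUNT(*): one count per occupied bin.
    return Counter(st_bin(size, lon, lat) for lon, lat in pts)

print(bin_counts(0.01, points))  # fine bins: many cells, small counts
print(bin_counts(1, points))     # coarse bins: fewer cells, larger counts
```

A larger bin size means far fewer distinct bin_ids, so there is less work on the reduce side; the map side still has to scan every row of taxi_demo either way, which is why a bigger bin alone does not make the scan itself faster.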
EDIT:
I guess that last run wasn't any good either, because it has just ended with this:
2015-05-07 12:56:44,267 Stage-1 map = 0%, reduce = 0%
2015-05-07 13:07:46,126 Stage-1 map = 89%, reduce = 0%
java.io.IOException: Job status not available
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
at org.apache.hadoop.mapreduce.Job.getStatus(Job.java:329)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:598)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:288)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Ended Job = job_1431013082714_0001 with exception 'java.io.IOException(Job status not available )'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
hive>
I completed the aggregation example in the Sandbox - and it did complete in a reasonable time (238 seconds).
What do you see when you check your tracking URL while running the job (example from above: Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1431013082714_0001/)?
What do you get when you run: select * from taxi_agg limit 2;
Migrated from Issue #25, from @TikoS