Map doesn't complete running aggregation example in Hortonworks sandbox #28

Closed
smambrose opened this issue May 12, 2015 · 1 comment

@smambrose

Migrated from Issue #25, from @TikoS

I was trying the Aggregation sample (taxi demo), and when I ran the aggregation (step 9) it took a very long time. The progress output (map and reduce %) looked like this:

0%, after 30 minutes still 0%, after another 30 minutes 89%, and the same again three times (Map only); Reduce stayed at 0%. Is this okay? I had to turn off the laptop I was using for it, so I don't know how it ended.

Then I tried again, skipped step 9, ran steps 10 and 11, and got this:

Could not find job application_1430678691120_0002. The job might not be running yet.

Job job_1430678691120_0002 could not be found: {"RemoteException":{"exception":"NotFoundException","message":"java.lang.Exception: job, job_1430678691120_0002, is not found","javaClassName":"org.apache.hadoop.yarn.webapp.NotFoundException"}} (error 404)

Steps 1-8 worked well ;) I don't know whether step 9 produced an error message, because I had to turn the machine off.
I will try it again. It is a really long process: it ran for more than 1 hour, with Map still at 89% and then back at 30%. I will write up the results.

Hadoop, HDF and so on ... It would be great to implement that in these tools ...


BTW, taxi demo sample: I am at step 9 again. I changed the value from 0.01 to 1 to make it faster, but it is still slow. Is that okay? I am just asking because I don't know. Isn't it weird?

hive> FROM (SELECT ST_Bin(1, ST_Point(dropoff_longitude,dropoff_latitude)) bin_id, * FROM taxi_demo) bins
    > SELECT ST_BinEnvelope(1, bin_id) shape,
    > COUNT(*) count
    > GROUP BY bin_id;
Query ID = root_20150507120909_e000001c-4259-48dd-8e98-3684d0e94566
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1431013082714_0001, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1431013082714_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1431013082714_0001
Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 3
2015-05-07 12:11:05,397 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:12:09,782 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:13:25,554 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:14:37,769 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:15:38,096 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:16:42,504 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:17:25,371 Stage-1 map = 89%,  reduce = 0%
2015-05-07 12:18:11,890 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:19:12,073 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:20:12,697 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:21:26,323 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:22:26,650 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:23:33,421 Stage-1 map = 11%,  reduce = 0%
2015-05-07 12:23:35,272 Stage-1 map = 56%,  reduce = 0%
2015-05-07 12:24:09,535 Stage-1 map = 89%,  reduce = 0%
2015-05-07 12:24:41,054 Stage-1 map = 67%,  reduce = 0%
2015-05-07 12:25:28,244 Stage-1 map = 44%,  reduce = 0%
2015-05-07 12:26:22,278 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:28:36,400 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:29:46,988 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:30:47,851 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:31:48,892 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:33:13,617 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:34:14,299 Stage-1 map = 0%,  reduce = 0%
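
The map percentage in the log above goes backwards several times (for example 89% to 0%, and 89% to 67%). The following is a minimal Python sketch, assuming progress lines in exactly the format shown above, that flags each such drop. The function names are made up for illustration. A drop commonly means that map task attempts failed or were killed and are being rerun, but the log alone does not say why.

```python
import re
from datetime import datetime

# Matches Hive CLI progress lines such as:
# 2015-05-07 12:17:25,371 Stage-1 map = 89%,  reduce = 0%
PROGRESS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"Stage-(\d+) map = (\d+)%,\s+reduce = (\d+)%"
)


def parse_progress(lines):
    """Yield (timestamp, stage, map_pct, reduce_pct) for each progress line."""
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            yield ts, int(m.group(2)), int(m.group(3)), int(m.group(4))


def find_map_regressions(lines):
    """Return (timestamp, previous_pct, current_pct) wherever map % decreased.

    Assumes all lines belong to a single stage, as in the log above.
    """
    regressions = []
    prev = None
    for ts, _stage, map_pct, _reduce_pct in parse_progress(lines):
        if prev is not None and map_pct < prev:
            regressions.append((ts, prev, map_pct))
        prev = map_pct
    return regressions


# A few lines copied from the log above.
sample = """\
2015-05-07 12:16:42,504 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:17:25,371 Stage-1 map = 89%,  reduce = 0%
2015-05-07 12:18:11,890 Stage-1 map = 0%,  reduce = 0%
2015-05-07 12:23:33,421 Stage-1 map = 11%,  reduce = 0%
2015-05-07 12:23:35,272 Stage-1 map = 56%,  reduce = 0%
2015-05-07 12:24:09,535 Stage-1 map = 89%,  reduce = 0%
2015-05-07 12:24:41,054 Stage-1 map = 67%,  reduce = 0%
""".splitlines()

for ts, before, after in find_map_regressions(sample):
    print(f"{ts:%H:%M:%S} map dropped from {before}% to {after}%")
```

The timestamps of the drops are a reasonable starting point for looking up the failed task attempts in the job's tracking page.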

EDIT:
I guess my last query doesn't work for me, because it has now ended with this:

2015-05-07 12:56:44,267 Stage-1 map = 0%,  reduce = 0%
2015-05-07 13:07:46,126 Stage-1 map = 89%,  reduce = 0%
java.io.IOException: Job status not available
        at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
        at org.apache.hadoop.mapreduce.Job.getStatus(Job.java:329)
        at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:598)
        at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:288)
        at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Ended Job = job_1431013082714_0001 with exception 'java.io.IOException(Job status not available )'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
hive>
@smambrose

Hi @TikoS,

I completed the aggregation example in the Sandbox, and it finished in a reasonable time (238 seconds).

What do you see when you check your tracking URL while running the job (example from above: Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1431013082714_0001/)?

What do you get when you run: select * from taxi_agg limit 2;
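
If the tracking page is awkward to read in the browser, the application state can also be queried from the YARN ResourceManager REST API. Below is a minimal Python sketch, assuming the ResourceManager is reachable at the host and port from the tracking URL above (sandbox.hortonworks.com:8088) and that the application ID is the job ID with the `job_` prefix swapped for `application_`, as in the log output. The function names are hypothetical.

```python
import json
import re
from urllib.request import urlopen

# Assumption: same host and port as the tracking URL in the log output.
DEFAULT_RM = "http://sandbox.hortonworks.com:8088"

JOB_ID_RE = re.compile(r"^job_(\d+)_(\d+)$")


def job_to_application_id(job_id):
    """Turn job_<timestamp>_<seq> into application_<timestamp>_<seq>."""
    m = JOB_ID_RE.match(job_id)
    if not m:
        raise ValueError(f"not a MapReduce job ID: {job_id!r}")
    return f"application_{m.group(1)}_{m.group(2)}"


def app_info_url(application_id, rm=DEFAULT_RM):
    """Build the ResourceManager REST URL for one application."""
    return f"{rm}/ws/v1/cluster/apps/{application_id}"


def fetch_app_state(application_id, rm=DEFAULT_RM, timeout=10):
    """Return (state, finalStatus) as reported by the ResourceManager.

    Needs network access to the ResourceManager; raises
    urllib.error.HTTPError (for example 404) if the application is unknown.
    """
    with urlopen(app_info_url(application_id, rm), timeout=timeout) as resp:
        app = json.load(resp)["app"]
    return app["state"], app["finalStatus"]


app_id = job_to_application_id("job_1431013082714_0001")
print(app_info_url(app_id))
# To query a running sandbox: print(fetch_app_state(app_id))
```

A 404 from this endpoint would suggest that the ResourceManager no longer knows the application, for example because the sandbox was restarted in the meantime.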
