PARQUET-529: Avoid evoking job.toString() in ParquetLoader #326
@julienledem @rdblue @liancheng @danielcweeks @aniket486 would you mind taking a look at this when you have time? This has been blocking [Parquet-401: Deprecate Log and move to SLF4J Logger][PR#319]. Thanks!
@Override
public void setLocation(String location, Job job) throws IOException {
  if (DEBUG) LOG.debug("LoadFunc.setLocation(" + location + ", " + job + ")");
job.getId or something?
Sure. Added getJobId() and getJobName().
Would you mind taking another look at this?
+1, thanks @proflin!
When run in a Hadoop 2 environment with the log level set to `DEBUG`, `ParquetLoader` invokes `job.toString()` in several methods, which can abort the whole application with:
```
java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:283)
at org.apache.hadoop.mapreduce.Job.toString(Job.java:452)
at java.lang.String.valueOf(String.java:2847)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at org.apache.parquet.pig.ParquetLoader.getSchema(ParquetLoader.java:260)
at org.apache.parquet.pig.TestParquetLoader.testSchema(TestParquetLoader.java:54)
...
```
The reason is that in the Hadoop 2.x branch, `org.apache.hadoop.mapreduce.Job.toString()` includes an `ensureState(JobState.RUNNING)` check; see [map-reduce: Job.java#452](http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.3.0/org/apache/hadoop/mapreduce/Job.java#452). The Hadoop 1.x branch has no such check, so `ParquetLoader` works there.
This PR simply avoids invoking `job.toString()` in `ParquetLoader`.
Author: proflin <proflin.me@gmail.com>
Author: Liwei Lin <proflin.me@gmail.com>
Closes apache#326 from proflin/PARQUET-529--Avoid-evoking-job.toString()-in-ParquetLoader and squashes the following commits:
f464c7b [proflin] Add jobToString
5d4c750 [proflin] PARQUET-529: Avoid evoking job.toString() in ParquetLoader.java
bb4283a [Liwei Lin] Merge branch 'master' of https://github.com/proflin/parquet-mr
839b458 [proflin] Merge remote-tracking branch 'refs/remotes/apache/master'
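
For readers who want to see the failure mode in isolation, here is a minimal, self-contained sketch. It does not use Hadoop: `StandInJob`, its `getId()`/`getName()` getters, and the `job[id=..., name=...]` output format are hypothetical stand-ins chosen for illustration, and the body of `jobToString` is an assumption about what such a helper might look like, not the code merged in this PR. The stand-in only mimics the state check reported in the stack trace above.

```java
public class JobToStringSketch {

    /** The two job states that matter for this illustration. */
    enum JobState { DEFINE, RUNNING }

    /** Hypothetical stand-in for org.apache.hadoop.mapreduce.Job; not the real class. */
    static final class StandInJob {
        private final String id;
        private final String name;
        private JobState state = JobState.DEFINE;

        StandInJob(String id, String name) {
            this.id = id;
            this.name = name;
        }

        String getId() { return id; }
        String getName() { return name; }
        void markRunning() { state = JobState.RUNNING; }

        // Mimics the reported behaviour: toString() verifies the state first.
        @Override
        public String toString() {
            if (state != JobState.RUNNING) {
                throw new IllegalStateException(
                    "Job in state " + state + " instead of " + JobState.RUNNING);
            }
            return "Job: " + id + " (" + name + ")";
        }
    }

    /** Builds a log-friendly description without calling job.toString(). */
    static String jobToString(StandInJob job) {
        return String.format("job[id=%s, name=%s]", job.getId(), job.getName());
    }

    public static void main(String[] args) {
        StandInJob job = new StandInJob("job_0001", "load-parquet");

        // Safe: only plain getters are used, so the DEFINE state does not matter.
        System.out.println("LoadFunc.setLocation(/data/in, " + jobToString(job) + ")");

        // Unsafe: string concatenation calls job.toString(), which throws
        // while the job is still in the DEFINE state.
        try {
            System.out.println("LoadFunc.setLocation(/data/in, " + job + ")");
        } catch (IllegalStateException e) {
            System.out.println("Concatenating the job directly failed: " + e.getMessage());
        }
    }
}
```

The point of the design is that the debug message is built only from accessors that do not depend on the job's lifecycle state, so enabling `DEBUG` logging cannot change whether the loader succeeds.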