Limit the max line size in LineRecordReader #11370
Conversation
@@ -107,6 +109,8 @@ public void updateConfiguration(Configuration config)
    {
        copy(resourcesConfiguration, config);

        config.setInt(LineRecordReader.MAX_LINE_LENGTH, toIntExact(new DataSize(100, DataSize.Unit.MEGABYTE).toBytes()));
Why not something closer to the original limit, which was 20x bigger?
With the original limit, many humongous byte[] arrays close to 2GB would be allocated and then collected (see org.apache.hadoop.io.Text.setCapacity()). That causes GC pressure even when the OutOfMemoryError on array size is not hit first. In one heap dump we got, I saw ~20 such 2GB byte[] arrays being collected.
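For illustration, a rough sketch of the growth pattern being described (a simplified approximation, not the actual Hadoop source): a Text-style buffer keeps reallocating its backing array as a very long line is appended, so each resize leaves another huge array behind for the GC.

// Simplified approximation of how a Text-style buffer grows while one very long
// line is appended chunk by chunk. Each resize allocates a new backing array and
// abandons the old one, which is what shows up as ~2GB byte[] garbage in heap dumps.
final class GrowingLineBuffer
{
    private byte[] bytes = new byte[16];
    private int length;

    void append(byte[] chunk, int chunkLength)
    {
        int needed = length + chunkLength;
        if (needed > bytes.length) {
            // grow to at least 'needed', typically by doubling (capped near the array-size limit)
            long doubled = Math.min(2L * bytes.length, Integer.MAX_VALUE - 5);
            byte[] larger = new byte[(int) Math.max(needed, doubled)];
            System.arraycopy(bytes, 0, larger, 0, length);
            bytes = larger; // the previous (possibly near-2GB) array becomes garbage
        }
        System.arraycopy(chunk, 0, bytes, length, chunkLength);
        length = needed;
    }
}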
I'm concerned that we may be overriding a value set in a hadoop.conf file, so perhaps we should only set this if it is not already set. Also, I think we should make this value configurable.
I'm not sure about the default value. @findepi what do you think is a good default for this?
After more thought, I think we should handle this like the dfsTimeout config option in this method: there is a value that comes from the HiveClientConfig, and we always apply that. We would still need a good default value.
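Roughly what that could look like (a sketch only; the class and field names here are illustrative, and maxLineLength is assumed to come from HiveClientConfig, mirroring the diff above):

import static java.lang.Math.toIntExact;

import io.airlift.units.DataSize;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Sketch: unconditionally apply the HiveClientConfig value, the same way the
// dfsTimeout option is applied in updateConfiguration().
class TextLineLengthConfigurer
{
    private final DataSize maxLineLength; // assumed to be supplied from HiveClientConfig

    TextLineLengthConfigurer(DataSize maxLineLength)
    {
        this.maxLineLength = maxLineLength;
    }

    void updateConfiguration(Configuration config)
    {
        config.setInt(LineRecordReader.MAX_LINE_LENGTH, toIntExact(maxLineLength.toBytes()));
    }
}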
Once this becomes configurable, I am not that concerned about the value. The only problem is that, in case of a too-long line, the exception message would not help trace the config switch governing the behavior.
As this is supposed to be an OOM-preventing safety mechanism only, we can start with 100MB as the default and revisit the decision if we face any problems.
@@ -892,6 +894,18 @@ public HiveClientConfig setAssumeCanonicalPartitionKeys(boolean assumeCanonicalP
        return this;
    }

    public DataSize getMaxLineLength()
Add:
@MinDataSize("1kB") ← technically 1B, but that's not a practical minimum value
@MaxDataSize(...) ← the highest value that won't OOM (may vary between JVMs; see io.airlift.slice.Slices#MAX_ARRAY_SIZE)
@findepi I couldn't find io.airlift.slice.Slices#MAX_ARRAY_SIZE, so I'm using @MaxDataSize("1GB") for now
        return maxLineLength;
    }

    @Config("mapreduce.input.linerecordreader.line.maxlength")
not sure what the config name should be, but all config names in this class begin with "hive."
This is the original config name in org.apache.hadoop.mapreduce.lib.input.LineRecordReader. @dain do you think it's ok to keep using it?
It should follow the convention of the other file formats (e.g., hive.orc.max-merge-distance), so something like hive.text.max-line-length.
@dain would it cause confusion if it's different than the original config in org.apache.hadoop.mapreduce.lib.input.LineRecordReader?
I don't think so. None of the configuration properties use the Hadoop names. Instead, I think it might be more confusing to have one property that is not following the existing convention.
Let's change the config name so that it's clear this is truncation. How about hive.text.line-truncation-limit.
@@ -107,7 +107,8 @@ public void testDefaults()
                .setCreatesOfNonManagedTablesEnabled(true)
                .setHdfsWireEncryptionEnabled(false)
                .setPartitionStatisticsSampleSize(100)
                .setCollectColumnStatisticsOnWrite(false));
                .setCollectColumnStatisticsOnWrite(false)
                .setMaxLineLength(new DataSize(100, DataSize.Unit.MEGABYTE)));
Please add this in the same order as in the HCC (before isUseParquetColumnNames). Having the same order everywhere reduces potential conflicts.
@@ -96,6 +96,8 @@

    private List<String> resourceConfigFiles = ImmutableList.of();

    private DataSize maxLineLength = new DataSize(100, DataSize.Unit.MEGABYTE);
@findepi, do you have a suggestion for a good default here?
@dain Piotr had this comment before:

findepi (13 hours ago, Member):
Once this becomes configurable, I am not that concerned about the value. The only problem is that, in case of a too-long line, the exception message would not help trace the config switch governing the behavior.
As this is supposed to be an OOM-preventing safety mechanism only, we can start with 100MB as the default and revisit the decision if we face any problems.
What does the error message look like when we hit this?
@dain It's just this line in the GC log:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
However, after this change we shouldn't be hitting this anymore.
Right. I mean when we hit this limit do we get something like "Test line larger than 100MB"?
Wow. I would not have expected that. @findepi does that seem reasonable?
With David's suggested config changes, I think this is reasonable.
Cool. As long as we agree that truncating the line at 100MB is fine.
"the rest of the line is silently discarded"
A bit scary. Is it really discarded, or is it taken to constitute the next line?
@findepi The characters after the limit are really discarded.
@@ -154,6 +155,7 @@ public void testExplicitPropertyMappings()
                .put("hive.force-local-scheduling", "true")
                .put("hive.max-concurrent-file-renames", "100")
                .put("hive.assume-canonical-partition-keys", "true")
                .put("mapreduce.input.linerecordreader.line.maxlength", "1MB")
Maybe pick a more "unnatural" value that could never be a possible default, like 13MB.
        return maxLineLength;
    }

    @Config("mapreduce.input.linerecordreader.line.maxlength")
Add a description
@ConfigDescription("Text file lines larger than this will be silently truncated")
@@ -78,6 +78,7 @@ public void testDefaults()
                .setMaxPartitionsPerWriter(100)
                .setMaxOpenSortFiles(50)
                .setWriteValidationThreads(16)
                .setMaxLineLength(new DataSize(100, DataSize.Unit.MEGABYTE))
The DataSize prefix is not necessary since Unit is already imported. Your IDE should have a warning for this.
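For example (illustrative only; the class name here is made up), with Unit imported the construction shortens to:

import io.airlift.units.DataSize;
import io.airlift.units.DataSize.Unit;

class DataSizeExample
{
    // With Unit imported, the enclosing DataSize. qualifier is redundant:
    static final DataSize DEFAULT_MAX_LINE_LENGTH = new DataSize(100, Unit.MEGABYTE);
}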
@@ -892,6 +894,20 @@ public HiveClientConfig setAssumeCanonicalPartitionKeys(boolean assumeCanonicalP
        return this;
    }

    @MinDataSize("1kB")
    @MaxDataSize("1GB")
    public DataSize getMaxLineLength()
Also add @NotNull
@@ -892,6 +894,20 @@ public HiveClientConfig setAssumeCanonicalPartitionKeys(boolean assumeCanonicalP
        return this;
    }

    @MinDataSize("1kB")
Why not make this 1B? It would be strange to set it this low, but I see no reason why we need to prevent it.
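Putting the suggestions from this thread together, the property could end up looking roughly like this (a sketch, not the merged code; the config name and description were still being debated, and the class name is illustrative):

import io.airlift.configuration.Config;
import io.airlift.configuration.ConfigDescription;
import io.airlift.units.DataSize;
import io.airlift.units.MaxDataSize;
import io.airlift.units.MinDataSize;

import javax.validation.constraints.NotNull;

import static io.airlift.units.DataSize.Unit.MEGABYTE;

// Sketch of the config property with the constraints suggested above.
public class HiveClientConfigSketch
{
    private DataSize maxLineLength = new DataSize(100, MEGABYTE);

    @NotNull
    @MinDataSize("1B")
    @MaxDataSize("1GB")
    public DataSize getMaxLineLength()
    {
        return maxLineLength;
    }

    @Config("hive.text.line-truncation-limit") // name still under discussion in this thread
    @ConfigDescription("Text file lines larger than this will be silently truncated")
    public HiveClientConfigSketch setMaxLineLength(DataSize maxLineLength)
    {
        this.maxLineLength = maxLineLength;
        return this;
    }
}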
        return maxLineLength;
    }

    @Config("hive.text.line-truncation-limit")
hive.text.line-truncation-threshold?
ok
Do we really want silent truncation? I feel like failing loudly for larger lines is better (in terms of correctness) than silently returning a subset of data. We have other similar limits where we fail loudly, e.g., ...
@nezihyigitbasi, I agree, but the problem is that the Text reader code comes from Hadoop, and it doesn't have a "fail" mode, so we are trying to figure out the best alternative to writing our own text reader. For perspective, the current behavior is that the whole JVM crashes because we get an OOM.
A JVM crash is a bad thing. However, if we don't crash, we let a query return wrong (truncated) results. Neither option looks great.
@findepi the phrase "wrong (truncated) results" depends on your definition of what correct results are. If we define correctness as what Hive/Hadoop produces, then that behavior is dependent on the ...
Probably the "functionally correct way" is to set this default limit as the array size VM limit, which is 2,147,483,645 on our system. Then we can avoid the error "java.lang.OutOfMemoryError: Requested array size exceeds VM limit", and lines longer than this is not supported by Hadoop anyways. But this may cause "java.lang.OutOfMemoryError: Heap space" on the other hand, because the Hadoop code keep allocating and collecting arrays of near 2GB size. In the heap dump we saw ~20 such arrays which takes 40GB. To see this look at org.apache.hadoop.io.Text.setCapacity(). But then in the hadoop.conf file we can put a stricter limit like 100MB and let users know the change, and monitor if this still causes any GC pressure. @dain @findepi @nezihyigitbasi @electrum What do you think about this? |
@dain Indeed, I didn't look at it that way.
@yingsu00 That was my thinking as well. However, as you observed, this doesn't really prevent the OOM from happening, so it might not solve the real problem. What about setting this to some large value, like 100MB, and sending a patch to Hadoop changing the default upstream as well?
@findepi Did you mean changing the truncation limit to 100MB in Hadoop as well? Anyway, I think we can enable this PR on the Presto side first; we can send a patch to Hadoop with both options later. Now, do we have agreement on setting the default truncation limit to 100MB in the Presto code? I asked the Hive team and they don't have such a limit, which means the limit is 2GB for them. But they don't really care, because each job uses a separate JVM. So we have two options here.
@dain @findepi Could you please let me know your opinion on this? Thanks!
I understand all that. If we set a limit of, say, ~100MB, can we detect in our record readers/cursor that the line we just read is ~100MB, and then fail the query?
I'm pretty sure we get the final decoded data back instead of the raw text line.
@yingsu00 We're considering overriding Hadoop's default behavior with something safer, at the cost of a discrepancy. Promoting this change upstream removes the discrepancy. As to the default value -- it's apparent there isn't a silver bullet here until we update ...
@findepi Hi Piotr, until LineRecordReader is able to introduce a fail mode, we won't be able to tell which files or queries are "bad"; these queries will now succeed.
@yingsu00 Yes, I know. I suspect that LineRecordReader won't support a fail mode unless we make it do that.
We had a discussion with @electrum and we will create our own copy of the LineReader. This will be done in two separate PRs on the presto-hadoop-apache2 repo.
Hi all, I have updated the PR to fail the query (throw IOException when the line is too long) in prestodb/presto-hadoop-apache2#30.
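A rough sketch of the "fail loudly" behavior described here (this is not the actual presto-hadoop-apache2 change; the class name and exception message are made up for illustration):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the loud-failure idea: stop buffering once a line exceeds the
// configured limit and throw, instead of silently truncating the rest.
final class BoundedLineReader
{
    private final InputStream in;
    private final int maxLineLength;

    BoundedLineReader(InputStream in, int maxLineLength)
    {
        this.in = in;
        this.maxLineLength = maxLineLength;
    }

    // returns the next line without the trailing '\n', or null at end of stream
    byte[] readLine()
            throws IOException
    {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b = in.read();
        if (b == -1) {
            return null;
        }
        while (b != -1 && b != '\n') {
            if (line.size() >= maxLineLength) {
                throw new IOException("Line length exceeds the maximum allowed of " + maxLineLength + " bytes");
            }
            line.write(b);
            b = in.read();
        }
        return line.toByteArray();
    }
}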
If the line is over the limit, the following error will be thrown:
This is the fix for prestodb#11367.
When reading large text files, in fairly rare cases Presto Hive requests allocation of a byte[] of size 2,147,483,643 to 2,147,483,647. This exceeds the VM limit, which is Integer.MAX_VALUE - 5. However, the Apache Hadoop library uses Integer.MAX_VALUE as the limit and allows such an allocation. This results in an OutOfMemoryError that is not caught by Presto Hive. As a result of our JVM configuration, this leads to JVM restarts and kills all queries running on the cluster.
This commit 1) fixes the above bug in Hadoop and allows loud failure when the line is over the max limit; 2) makes Presto depend on the snapshot version of the Hadoop library; 3) sets the default limit on text file line size to 100MB in Presto.
@@ -367,7 +367,7 @@
        <dependency>
            <groupId>com.facebook.presto.hadoop</groupId>
            <artifactId>hadoop-apache2</artifactId>
            <version>2.7.4-3</version>
            <version>2.7.4-4-SNAPSHOT</version>
Update to 2.7.4-4
Update to the release version of hadoop-apache2 before merging.
This is the fix for #11367.
When reading large text files, in fairly rare cases Presto Hive requests allocation of a byte[] of size 2,147,483,643 to 2,147,483,647. This exceeds the VM limit, which is Integer.MAX_VALUE - 5. However, the Apache Hadoop library uses Integer.MAX_VALUE as the limit and allows such an allocation. This results in an OutOfMemoryError that is not caught by Presto Hive. As a result of our JVM configuration, this leads to JVM restarts and kills all queries running on the cluster.
The hadoop snapshot library PR is at: prestodb/presto-hadoop-apache2#30