Limit the max line size in LineRecordReader #11370

Merged: 1 commit merged into prestodb:master on Sep 14, 2018

Conversation

@yingsu00 (Contributor) commented Aug 28, 2018

This is the fix for #11367.

When reading large text files, in fairly rare cases Presto Hive requests allocation of a byte[] of size 2,147,483,643 to 2,147,483,647. This exceeds the VM array size limit, which is Integer.MAX_VALUE - 5, but the Apache Hadoop library uses Integer.MAX_VALUE as its limit and allows such allocations. The resulting OutOfMemoryError is not caught by Presto Hive, and because of our JVM configuration it causes a JVM restart that kills all queries running on the cluster.

This commit:
1) fixes the above bug in Hadoop and allows a loud failure when a line is over the max limit;
2) makes Presto depend on the snapshot version of the Hadoop library;
3) sets the default text file line size limit to 100MB in Presto.

The Hadoop snapshot library PR is at prestodb/presto-hadoop-apache2#30.

@@ -107,6 +109,8 @@ public void updateConfiguration(Configuration config)
{
copy(resourcesConfiguration, config);

config.setInt(LineRecordReader.MAX_LINE_LENGTH, toIntExact(new DataSize(100, DataSize.Unit.MEGABYTE).toBytes()));
Contributor

Why not something closer to the original limit, which was 20x bigger?

Contributor Author

With the original limit, many humongous byte[] arrays close to 2GB would be allocated and then collected (see org.apache.hadoop.io.Text.setCapacity()). That would cause GC pressure even if the OutOfMemoryError on array size were not hit first. In one heap dump we took, I saw ~20 byte[] arrays of nearly 2GB being collected.
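
For context, the growth pattern behind those allocations looks roughly like the following sketch. This is not Hadoop's actual code, only an illustration of the doubling behavior in Text.setCapacity() described above; the class and method names here are made up.

import java.util.Arrays;

// Illustrative sketch only (not org.apache.hadoop.io.Text): each time the buffer
// is too small, it is reallocated to max(requested, 2 * current length), so lines
// approaching 2GB trigger repeated near-2GB byte[] allocations that the GC must
// later collect as humongous objects.
public class GrowableLineBuffer
{
    private byte[] bytes = new byte[0];
    private int length;

    public void append(byte[] chunk, int chunkLength)
    {
        int required = length + chunkLength;
        if (bytes.length < required) {
            // If doubling overflows, Math.max falls back to the exact required size.
            bytes = Arrays.copyOf(bytes, Math.max(required, bytes.length * 2));
        }
        System.arraycopy(chunk, 0, bytes, length, chunkLength);
        length = required;
    }
}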

Contributor

I'm concerned that we may be overriding a value set in a hadoop.conf file, so maybe we should only set this if it is not already set. Also, I think we should make this value configurable.

I'm not sure about the default value. @findepi what do you think is a good default for this?

Contributor

After more thought, I think we should handle this like we do the dfsTimeout config option in this method: there is a value that comes from the HiveClientConfig, and we always apply it (see the sketch below).
We would still need a good default value.
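
For illustration, the dfsTimeout-style wiring could look roughly like this. It is a hypothetical sketch, not the final code; the class and field names are made up, and only LineRecordReader.MAX_LINE_LENGTH and the DataSize usage come from the diff above.

import io.airlift.units.DataSize;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

import static java.lang.Math.toIntExact;

// Hypothetical sketch: the limit is injected from HiveClientConfig and always
// applied to the Hadoop Configuration, mirroring how dfsTimeout is handled.
public class TextLineLengthConfigurer
{
    private final DataSize textMaxLineLength;

    public TextLineLengthConfigurer(DataSize textMaxLineLength)
    {
        this.textMaxLineLength = textMaxLineLength;
    }

    public void updateConfiguration(Configuration config)
    {
        config.setInt(LineRecordReader.MAX_LINE_LENGTH, toIntExact(textMaxLineLength.toBytes()));
    }
}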

Contributor

Once this becomes configurable, I am not that concerned about the value. The only problem is that, in case of a too-long line, the exception message would not help in tracing the config switch governing the behavior.
As this is supposed to be an OOM-preventing safety mechanism only, we can start with 100MB as the default and revisit the decision if we face any problems with it.

@yingsu00 force-pushed the LineReaderMaxBytes branch from 52c5129 to e94eb91 on August 29, 2018 01:55
@@ -892,6 +894,18 @@ public HiveClientConfig setAssumeCanonicalPartitionKeys(boolean assumeCanonicalP
return this;
}

public DataSize getMaxLineLength()
Contributor

add

@MinDataSize("1kB")   ← technically 1B, but that's not practical min value
@MaxDataSize(...) ← highest value that won't OOM (may vary between JVMs, see io.airlift.slice.Slices#MAX_ARRAY_SIZE)

Contributor Author

@findepi I couldn't find io.airlift.slice.Slices#MAX_ARRAY_SIZE, so I'm using @MaxDataSize("1GB") for now

return maxLineLength;
}

@Config("mapreduce.input.linerecordreader.line.maxlength")
Contributor

not sure what the config name should be, but all config names in this class begin with "hive."

Contributor Author

This is the original config name in org.apache.hadoop.mapreduce.lib.input.LineRecordReader. @dain do you think it's ok to keep using it?

Contributor

It should follow the convention of the other file formats (e.g., hive.orc.max-merge-distance), so something like hive.text.max-line-length.

Contributor Author

@dain would it cause confusion if it's different from the original config in org.apache.hadoop.mapreduce.lib.input.LineRecordReader?

Contributor

I don't think so. None of the configuration properties use the Hadoop names. If anything, I think it would be more confusing to have one property that does not follow the existing convention.

Contributor

Let's change the config name so that it's clear this is truncation. How about hive.text.line-truncation-limit?

@@ -107,7 +107,8 @@ public void testDefaults()
.setCreatesOfNonManagedTablesEnabled(true)
.setHdfsWireEncryptionEnabled(false)
.setPartitionStatisticsSampleSize(100)
.setCollectColumnStatisticsOnWrite(false));
.setCollectColumnStatisticsOnWrite(false)
.setMaxLineLength(new DataSize(100, DataSize.Unit.MEGABYTE)));
Contributor

Please add this in the same order as in the HiveClientConfig (before isUseParquetColumnNames).

(Having the same order everywhere reduces potential conflicts.)

@yingsu00 force-pushed the LineReaderMaxBytes branch 2 times, most recently from 0e0630b to 7db27c0 on August 29, 2018 19:38
@@ -96,6 +96,8 @@

private List<String> resourceConfigFiles = ImmutableList.of();

private DataSize maxLineLength = new DataSize(100, DataSize.Unit.MEGABYTE);
Contributor

@findepi, do you have a suggestion for a good default here?

Contributor Author

@dain Piotr had this comment before:

findepi (13 hours ago): Once this becomes configurable, I am not that concerned about the value. The only problem is that, in case of a too-long line, the exception message would not help in tracing the config switch governing the behavior. As this is supposed to be an OOM-preventing safety mechanism only, we can start with 100MB as the default and revisit the decision if we face any problems with it.

Contributor

What does the error message look like when we hit this?

Contributor Author

@dain It's just this line in the GC log:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit

However, after this change we shouldn't hit this anymore.

Contributor

Right. I mean, when we hit this limit do we get something like "Text line larger than 100MB"?

Contributor

Wow. I would not have expected that. @findepi does that seem reasonable?

Contributor

With David's suggested config changes, I think this is reasonable.

Contributor Author

Cool. As long as we agree that truncating the line at 100MB is fine.

Contributor

the rest of the line is silently discarded.

A bit scary. Is it really discarded, or is it taken to constitute the next line?

Contributor Author

@findepi The characters after the limit are really discarded.

@@ -154,6 +155,7 @@ public void testExplicitPropertyMappings()
.put("hive.force-local-scheduling", "true")
.put("hive.max-concurrent-file-renames", "100")
.put("hive.assume-canonical-partition-keys", "true")
.put("mapreduce.input.linerecordreader.line.maxlength", "1MB")
Contributor

Maybe pick a more "unnatural" value that could never be a possible default, like 13MB.

return maxLineLength;
}

@Config("mapreduce.input.linerecordreader.line.maxlength")
Contributor

Add a description

@ConfigDescription("Text file lines larger than this will be silently truncated")

@@ -78,6 +78,7 @@ public void testDefaults()
.setMaxPartitionsPerWriter(100)
.setMaxOpenSortFiles(50)
.setWriteValidationThreads(16)
.setMaxLineLength(new DataSize(100, DataSize.Unit.MEGABYTE))
Contributor

The DataSize prefix is not necessary since Unit is already imported. Your IDE should have a warning for this.
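
In other words, assuming DataSize.Unit is imported in the test class, the call can be shortened to:

.setMaxLineLength(new DataSize(100, Unit.MEGABYTE))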

@@ -892,6 +894,20 @@ public HiveClientConfig setAssumeCanonicalPartitionKeys(boolean assumeCanonicalP
return this;
}

@MinDataSize("1kB")
@MaxDataSize("1GB")
public DataSize getMaxLineLength()
Contributor

Also add @NotNull
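
Putting the review suggestions together, the getter would end up looking something like this sketch (not necessarily the exact merged code):

@MinDataSize("1kB")
@MaxDataSize("1GB")
@NotNull
public DataSize getMaxLineLength()
{
    return maxLineLength;
}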

@@ -892,6 +894,20 @@ public HiveClientConfig setAssumeCanonicalPartitionKeys(boolean assumeCanonicalP
return this;
}

@MinDataSize("1kB")
Contributor

Why not make this 1B? It would be strange to set it this low, but I see no reason why we need to prevent it.

Contributor Author

@electrum @findepi suggested making this 1kB:

findepi (15 hours ago): add
@MinDataSize("1kB") ← technically 1B, but that's not a practical min value

@yingsu00 force-pushed the LineReaderMaxBytes branch from 7db27c0 to b6c90b3 on August 29, 2018 23:09
return maxLineLength;
}

@Config("hive.text.line-truncation-limit")
Contributor

hive.text.line-truncation-threshold?

Contributor Author

ok

@nezihyigitbasi (Contributor)

Do we really want silent truncation? I feel like failing loudly for larger lines is better (in terms of correctness) than silently returning a subset of the data. We have other similar limits where we fail loudly; e.g., in SliceDirectStreamReader::readBlock() we have a 1GB limit.

@dain (Contributor) commented Aug 30, 2018

@nezihyigitbasi, I agree, but the problem is that the Text reader code comes from Hadoop and doesn't have a "fail" mode, so we are trying to figure out the best alternative to writing our own text reader. For perspective, the current behavior is that the whole JVM crashes because we get an OOM.

@findepi (Contributor) commented Aug 30, 2018

For perspective, the current behavior is that the whole JVM crashes because we get an OOM

A JVM crash is a bad thing. However, if we don't crash, we let a query return wrong (truncated) results. Neither option looks great.

@dain (Contributor) commented Aug 30, 2018

@findepi the phrase "wrong (truncated) results" depends on your definition of what correct results are. If we define correctness as what Hive/Hadoop will produce, then that behavior depends on the hadoop.conf file, which can have this property set. I don't believe that Hadoop has a default for this, so maybe we should make the config null by default?

@yingsu00 (Contributor Author)

Probably the "functionally correct way" is to set this default limit as the array size VM limit, which is 2,147,483,645 on our system. Then we can avoid the error "java.lang.OutOfMemoryError: Requested array size exceeds VM limit", and lines longer than this is not supported by Hadoop anyways. But this may cause "java.lang.OutOfMemoryError: Heap space" on the other hand, because the Hadoop code keep allocating and collecting arrays of near 2GB size. In the heap dump we saw ~20 such arrays which takes 40GB. To see this look at org.apache.hadoop.io.Text.setCapacity().

But then in the hadoop.conf file we can put a stricter limit like 100MB and let users know the change, and monitor if this still causes any GC pressure. @dain @findepi @nezihyigitbasi @electrum What do you think about this?

@findepi (Contributor) commented Aug 31, 2018

the phrase "wrong (truncated) results" is dependent on your definition of what correct results are. If we define correctness as what Hive/Hadoop will produce ...

@dain indeed, I didn't look at it that way.

Probably the "functionally correct way" is to set this default limit as the array size VM limit, which is 2,147,483,645 on our system [..]

@yingsu00 that was my thinking as well. However, as you observed, this doesn't really prevent the OOM from happening, so it might not solve the real problem.

What about setting this to some large value, like 100MB, and sending a patch to Hadoop changing the default upstream as well?
Or what about changing LineRecordReader to throw on too-long lines and sending a patch upstream?
(Or both.)

@yingsu00 (Contributor Author)

@findepi Did you mean changing the truncation limit to 100MB in Hadoop as well? Anyway, I think we can proceed with this PR on the Presto side first. We can send a patch to Hadoop with both options later.

Now, do we have agreement on setting the default truncation limit to 100MB in the Presto code? I have asked the Hive team and they don't have such a limit, which means the limit is 2GB for them. But they don't really care, because each job uses a separate JVM. So we have two options here:

  1. Set the default limit to 100MB in the code, as this PR currently does.
  2. Set the default limit to 2GB-2, and set hive.text.line-truncation-limit to 100MB in the Hive configuration properties file.

@dain @findepi Could you please let me know your opinion on this? Thanks!

@nezihyigitbasi (Contributor)

@dain

I agree, but the problem is that the Text reader code comes from Hadoop and doesn't have a "fail" mode, so we are trying to figure out the best alternative to writing our own text reader

I understand all that. If we set a limit of, say, ~100MB, can we detect in our record readers/cursor, after we read a line, that the line we just read is ~100MB, and then fail the query?

@dain (Contributor) commented Sep 1, 2018

@nezihyigitbasi

I understand all that. If we set a limit of, say, ~100MB, can we detect in our record readers/cursor, after we read a line, that the line we just read is ~100MB, and then fail the query?

I'm pretty sure we get the final decoded data back rather than the raw text line.

@findepi (Contributor) commented Sep 1, 2018

Did you mean changing the truncation limit to 100MB in Hadoop as well? Anyway, I think we can proceed with this PR on the Presto side first. We can send a patch to Hadoop with both options later.

@yingsu00 we're considering overriding Hadoop's default behavior with something safer, at the cost of a discrepancy. Promoting this change upstream removes the discrepancy.
And yes, PR on the Presto side first.

As for the default value: it's apparent there isn't a silver bullet here until we update LineRecordReader to introduce an option to fail on too-long lines.
As I understand it, you had users running queries over files with very long lines (or maybe files that weren't in the expected format), leading to node crashes via OOM. I still need to understand what your workflow will be once you introduce the limit. The queries will stop crashing the nodes via OOM, but how will you identify those bad queries / bad files?

@yingsu00 (Contributor Author) commented Sep 4, 2018

@findepi Hi Piotr, until LineRecordReader introduces the fail mode, we won't be able to tell which files or queries are "bad". These queries will now succeed.

@findepi (Contributor) commented Sep 4, 2018

@yingsu00 yes, I know. I suspect that LineRecordReader won't support the fail mode, unless we make it do that.

@yingsu00 (Contributor Author) commented Sep 5, 2018

We had a discussion with @electrum and we will create our own copy of the LineReader. This will be done in two separate PRs on the presto-hadoop-apache2 repo.

@yingsu00 (Contributor Author) commented Sep 7, 2018

Hi all, I have updated PR prestodb/presto-hadoop-apache2#30 so that it fails the query (throws an IOException when the line is too long).

@dain assigned electrum and unassigned dain on Sep 10, 2018
@yingsu00 (Contributor Author)

If the line is over the limit, the following error will be thrown:
java.lang.RuntimeException: Too many bytes before newline: n
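
For reference, the loud failure itself lives in the patched line reader from prestodb/presto-hadoop-apache2#30. Conceptually it amounts to something like the following sketch; this is not the actual patched code, and the class and method names here are made up.

import java.io.IOException;

// Conceptual sketch: while scanning for the next newline, count the bytes
// consumed so far and fail loudly once the configured limit is exceeded,
// instead of growing the line buffer toward the VM array size limit.
public final class LineLimitChecker
{
    private LineLimitChecker() {}

    public static void checkLineLength(long bytesConsumed, long maxLineLength)
            throws IOException
    {
        if (bytesConsumed > maxLineLength) {
            throw new IOException("Too many bytes before newline: " + bytesConsumed);
        }
    }
}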

@@ -367,7 +367,7 @@
<dependency>
<groupId>com.facebook.presto.hadoop</groupId>
<artifactId>hadoop-apache2</artifactId>
<version>2.7.4-3</version>
<version>2.7.4-4-SNAPSHOT</version>
Contributor

Update to 2.7.4-4

@electrum (Contributor) left a comment

Update to the release version of hadoop-apache2 before merging

@yingsu00 merged commit 42ec5ca into prestodb:master on Sep 14, 2018