HADOOP-18383. Codecs with @DoNotPool annotation are not closed causing memory leak #4585

Merged
sunchao merged 10 commits into apache:trunk on Aug 12, 2022

kevins-29
Contributor

Description of PR

Explicitly call end() when returning Compressor or Decompressor implementations with DoNotPool annotation to the CodecPool.
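
As a rough illustration of the behaviour described above, the sketch below shows a pool whose return path calls end() on annotated objects instead of caching them. Every type in it is a hypothetical stand-in declared locally (including the `DoNotPool` annotation, `Resource`, and `SketchPool`); it is not the code from this patch or Hadoop's CodecPool.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayDeque;
import java.util.Deque;

// Local stand-in annotation for illustration only (assumption: not Hadoop's class).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DoNotPool {}

// Hypothetical minimal interface: reset() prepares for reuse, end() frees resources.
interface Resource {
  void reset();
  void end();
}

// A resource that must never be cached by the pool.
@DoNotPool
class UnpooledResource implements Resource {
  boolean ended = false;
  @Override public void reset() {}
  @Override public void end() { ended = true; }
}

// A resource that is safe to cache and reuse.
class PooledResource implements Resource {
  boolean ended = false;
  @Override public void reset() {}
  @Override public void end() { ended = true; }
}

class SketchPool {
  private final Deque<Resource> idle = new ArrayDeque<>();

  /** Returns true if the resource was kept for reuse, false if it was ended or null. */
  boolean giveBack(Resource r) {
    if (r == null) {
      return false;
    }
    if (r.getClass().isAnnotationPresent(DoNotPool.class)) {
      // Not pooled, so nothing else will ever release it: end it here.
      r.end();
      return false;
    }
    r.reset();
    idle.push(r);
    return true;
  }

  int idleCount() {
    return idle.size();
  }
}
```

The design point is that an object the pool refuses to keep has no other owner, so the return path is the last place where its resources can be released.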

How was this patch tested?

I created the following project to demo the leak. You can run the demo with

./gradlew run

and then monitor the memory usage of the running process every 10 seconds (substitute its process ID for <PID>) using

while true; do echo \"$(date +%Y-%m-%d' '%H:%M:%S)\",$(pmap -x <PID> | grep "total kB" | awk '{print $4}'); sleep 10; done;

Results - Before Patch

"2022-07-18 03:21:49",1113060
"2022-07-18 03:22:00",1126184
"2022-07-18 03:22:10",1126248
"2022-07-18 03:22:20",1126248
"2022-07-18 03:22:30",1130204
"2022-07-18 03:22:40",1130216
"2022-07-18 03:22:50",1130244
"2022-07-18 03:23:00",1130776
"2022-07-18 03:23:10",1130776
"2022-07-18 03:23:20",1130776
"2022-07-18 03:23:30",1130776
"2022-07-18 03:23:40",1130888
"2022-07-18 03:23:50",1130888
"2022-07-18 03:24:00",1130888
"2022-07-18 03:24:10",1130928
"2022-07-18 03:24:20",1130928
"2022-07-18 03:24:30",1130928
"2022-07-18 03:24:40",1131204
"2022-07-18 03:24:50",1131204
"2022-07-18 03:25:00",1131204
"2022-07-18 03:25:10",1131204
"2022-07-18 03:25:20",1139044
"2022-07-18 03:25:30",1140900
"2022-07-18 03:25:40",1140900
"2022-07-18 03:25:50",1140900
"2022-07-18 03:26:00",1140900
"2022-07-18 03:26:10",1141164
"2022-07-18 03:26:20",1141164
"2022-07-18 03:26:30",1141164
"2022-07-18 03:26:40",1141164
"2022-07-18 03:26:50",1141164
"2022-07-18 03:27:00",1141164
"2022-07-18 03:27:10",1141164

Results - After Patch

"2022-07-18 03:34:36",1098112
"2022-07-18 03:34:46",1098112
"2022-07-18 03:34:56",1098204
"2022-07-18 03:35:06",1098152
"2022-07-18 03:35:16",1098152
"2022-07-18 03:35:26",1098172
"2022-07-18 03:35:36",1098172
"2022-07-18 03:35:46",1098172
"2022-07-18 03:35:57",1098172
"2022-07-18 03:36:07",1098268
"2022-07-18 03:36:17",1098268
"2022-07-18 03:36:27",1098268
"2022-07-18 03:36:37",1098292
"2022-07-18 03:36:47",1098292
"2022-07-18 03:36:57",1098292
"2022-07-18 03:37:07",1098320
"2022-07-18 03:37:17",1098320
"2022-07-18 03:37:27",1098320
"2022-07-18 03:37:37",1098320
"2022-07-18 03:37:47",1098320
"2022-07-18 03:37:57",1098340
"2022-07-18 03:38:07",1098340
"2022-07-18 03:38:17",1098340
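
For a quick comparison of the two runs, the first and last samples of each can be turned into a growth figure. The sketch below does only that arithmetic on the values listed above; the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: compares the first and last sample of a run.
class MemoryGrowth {
  private static final DateTimeFormatter FMT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

  // Difference between the last and first reported totals, in kB.
  static long growthKb(long firstKb, long lastKb) {
    return lastKb - firstKb;
  }

  // Seconds between the first and last sample timestamps.
  static long elapsedSeconds(String first, String last) {
    return Duration.between(
        LocalDateTime.parse(first, FMT), LocalDateTime.parse(last, FMT)).getSeconds();
  }

  public static void main(String[] args) {
    System.out.println("before patch: +" + growthKb(1113060L, 1141164L) + " kB in "
        + elapsedSeconds("2022-07-18 03:21:49", "2022-07-18 03:27:10") + " s");
    System.out.println("after patch:  +" + growthKb(1098112L, 1098340L) + " kB in "
        + elapsedSeconds("2022-07-18 03:34:36", "2022-07-18 03:38:17") + " s");
  }
}
```

On these samples the total grows by 28104 kB over 321 seconds before the patch and by 228 kB over 221 seconds after it.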

@dbtsai
Member

dbtsai commented Jul 19, 2022

cc @sunchao

@sunchao
Member

sunchao commented Jul 19, 2022

Thanks @kevins-29, the fix looks good to me. However, does this address the issue mentioned in HADOOP-12007? My understanding is that the issue there is that CompressorStream overrides close() and, as a result, doesn't return the compressor to the pool.

gzipCodec.createOutputStream(new ByteArrayOutputStream(), compressor)) {
outputStream.write(1);
fail("Compressor from Codec with @DoNotPool should not be useable after returning to CodecPool");
} catch (NullPointerException exception) {
Member

NPE is the best we can do?

Contributor Author

Unfortunately, I couldn't find another way to test that the underlying Compressor/Decompressor has been closed. There is the finished flag, but that is set by reset() and has different semantics.

Member

Could we add a check in the place where we would encounter the null and trigger a more friendly exception from there?
Something like an already closed exception?
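
One way to picture that suggestion is a guard that runs before any use of the released state. This is only a sketch under assumptions: `AlreadyClosedException` and `GuardedCompressor` are hypothetical names, and the byte array merely stands in for whatever resource end() releases.

```java
// Hypothetical exception type; the name is an assumption for illustration.
class AlreadyClosedException extends IllegalStateException {
  AlreadyClosedException(String message) {
    super(message);
  }
}

// Hypothetical compressor-like class, not a Hadoop implementation.
class GuardedCompressor {
  private byte[] buffer = new byte[64]; // stands in for the resource freed by end()
  private int buffered = 0;

  void setInput(byte[] data, int off, int len) {
    ensureOpen();
    int n = Math.min(len, buffer.length);
    System.arraycopy(data, off, buffer, 0, n);
    buffered = n;
  }

  int buffered() {
    ensureOpen();
    return buffered;
  }

  // Releases the resource; calling it more than once is harmless.
  void end() {
    buffer = null;
  }

  // Fails with a descriptive exception instead of a later NullPointerException.
  private void ensureOpen() {
    if (buffer == null) {
      throw new AlreadyClosedException("compressor has been closed");
    }
  }
}
```

A test could then expect `AlreadyClosedException` rather than a NullPointerException, which states the intent more directly.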

@kevins-29
Contributor Author

> Thanks @kevins-29, the fix looks good to me. However, does this address the issue mentioned in HADOOP-12007? My understanding is that the issue there is that CompressorStream overrides close() and, as a result, doesn't return the compressor to the pool.

@sunchao Should I create a new Jira ticket for this?

@sunchao
Member

sunchao commented Jul 20, 2022

> @sunchao Should I create a new Jira ticket for this?

@kevins-29 , yes please create a new JIRA for this issue, and change the PR title afterwards. Thanks.

kevins-29 changed the title from "HADOOP-12007. GzipCodec native CodecPool leaks memory" to "HADOOP-18383. Codecs with @DoNotPool annotation are not closed causing memory leak" on Aug 1, 2022

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 46s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 0s | | xmllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 15m 39s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 27m 16s | | trunk passed |
| +1 💚 | compile | 23m 21s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | compile | 20m 45s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 4m 18s | | trunk passed |
| +1 💚 | mvnsite | 19m 8s | | trunk passed |
| +1 💚 | javadoc | 8m 17s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 7m 23s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 39m 8s | | trunk passed |
| +1 💚 | shadedclient | 53m 1s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 54s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 24m 28s | | the patch passed |
| +1 💚 | compile | 22m 43s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javac | 22m 43s | | the patch passed |
| +1 💚 | compile | 20m 50s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 20m 50s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 4m 14s | /results-checkstyle-root.txt | root: The patch generated 4 new + 19 unchanged - 0 fixed = 23 total (was 19) |
| +1 💚 | mvnsite | 18m 54s | | the patch passed |
| +1 💚 | javadoc | 8m 4s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 7m 27s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 39m 31s | | the patch passed |
| +1 💚 | shadedclient | 52m 37s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 792m 29s | | root in the patch passed. |
| +1 💚 | asflicense | 2m 44s | | The patch does not generate ASF License warnings. |
| | | 1154m 29s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4585/7/artifact/out/Dockerfile |
| GITHUB PR | #4585 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 53730c54daf8 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 622e5fd |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4585/7/testReport/ |
| Max. process+thread count | 3058 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-common-project . U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4585/7/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Aug 2, 2022
@apache apache deleted a comment from hadoop-yetus Aug 2, 2022
@apache apache deleted a comment from hadoop-yetus Aug 2, 2022
@apache apache deleted a comment from hadoop-yetus Aug 2, 2022
@apache apache deleted a comment from hadoop-yetus Aug 2, 2022
@apache apache deleted a comment from hadoop-yetus Aug 2, 2022
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 44s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 1s | | xmllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 15m 44s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 25m 25s | | trunk passed |
| +1 💚 | compile | 23m 20s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | compile | 20m 53s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | checkstyle | 4m 23s | | trunk passed |
| +1 💚 | mvnsite | 19m 18s | | trunk passed |
| +1 💚 | javadoc | 8m 17s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 7m 23s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 38m 51s | | trunk passed |
| +1 💚 | shadedclient | 52m 23s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 52s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 24m 36s | | the patch passed |
| +1 💚 | compile | 25m 36s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javac | 25m 36s | | the patch passed |
| +1 💚 | compile | 20m 56s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | javac | 20m 56s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 4m 17s | /results-checkstyle-root.txt | root: The patch generated 1 new + 19 unchanged - 0 fixed = 20 total (was 19) |
| +1 💚 | mvnsite | 18m 46s | | the patch passed |
| +1 💚 | javadoc | 8m 6s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 💚 | javadoc | 7m 25s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 💚 | spotbugs | 39m 39s | | the patch passed |
| +1 💚 | shadedclient | 52m 42s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 793m 33s | /patch-unit-root.txt | root in the patch passed. |
| +1 💚 | asflicense | 2m 37s | | The patch does not generate ASF License warnings. |
| | | 1156m 34s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4585/8/artifact/out/Dockerfile |
| GITHUB PR | #4585 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 7fa250d6d7a1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / ca8dfc3 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4585/8/testReport/ |
| Max. process+thread count | 2876 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-common-project . U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4585/8/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

Member

@sunchao sunchao left a comment

LGTM. cc @viirya too since he authored some of the Gzip code.

@viirya
Member

viirya commented Aug 12, 2022

LGTM, but why are there many failures in CI?

@sunchao
Member

sunchao commented Aug 12, 2022

There's one unit test failure:

hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy

which is unrelated.

@sunchao sunchao merged commit b737869 into apache:trunk Aug 12, 2022
@sunchao
Member

sunchao commented Aug 12, 2022

Merged to trunk, thanks @kevins-29 !

@sunchao
Member

sunchao commented Aug 12, 2022

@kevins-29 could you open a PR targeting branch-3.3 too? I tried to backport it there, but ran into conflicts because BuiltInGzipCompressor doesn't exist on that branch.

@kevins-29
Contributor Author

@sunchao will do.

@kevins-29
Contributor Author

@sunchao #4739

HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022