[Benchmarking][Java] new java.lang.OutOfMemoryError in Java benchmarks after local build cache change #40775

Closed
austin3dickey opened this issue Mar 25, 2024 · 5 comments

Comments

@austin3dickey
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

As part of the Arrow benchmarking suite, we have been running the Java microbenchmarks with Archery on every Arrow commit. They are run on a bare-metal machine with the characteristics listed in the "hardware" section of this Conbench page.

Note that the benchmarks run this Arrow CI script to build Java Arrow, with the environment variables in this file used for building and running benchmarks.

Starting with this PR's commit, the Java benchmarks have started to time out after the default timeout of 6 hours. They used to take less than an hour. Before timing out, the following traceback is printed:

Exception in thread "CommonsExecStreamPumper-pool-9-thread-2" java.lang.OutOfMemoryError: Java heap space
	at java.lang.StringCoding.decode(StringCoding.java:215)
	at java.lang.String.<init>(String.java:463)
	at java.lang.String.<init>(String.java:515)
	at com.gradle.d.a.a.a(SourceFile:83)
	at com.gradle.d.a.a.flush(SourceFile:78)
	at com.gradle.d.a.a.write(SourceFile:72)
	at java.io.PrintStream.write(PrintStream.java:480)
	at com.gradle.d.a.b.write(SourceFile:203)
	at org.apache.commons.exec.StreamPumper.run(StreamPumper.java:112)
	at java.lang.Thread.run(Thread.java:750)

I assume that the Java memory configs need to be changed, but I don't have any experience with Java. This could be an easy fix: change the environment variable file linked above, then comment "@ursabot please benchmark lang=Java" on the PR to see whether the change fixed anything.
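
If a memory setting is changed, one way to confirm that it actually reached the JVM is to print the heap ceiling from inside the process. The following is only a minimal sketch: the class name HeapLimitSketch is hypothetical and is not part of the Arrow build, and it assumes the limit is set through the standard -Xmx JVM option.

```java
// Hypothetical helper, not part of Arrow: reports the heap ceiling the JVM is running with.
final class HeapLimitSketch {
    /** Maximum heap the JVM will try to use, in MiB (as reported by Runtime.maxMemory()). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it once with and once without an explicit -Xmx value shows whether the configured limit is being picked up.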

Note that the microbenchmark results will not be tracked until this is fixed.

Component(s)

Benchmarking, Java

@austin3dickey
Contributor Author

Note: I consider it a (different) bug that the bot posted this success message on the PR in question: #39708 (comment)

Since no Java benchmark results were ever posted to Conbench, the bot saw that all posted results were successful, and did not comment on the lack of Java results.

@danepitkin
Member

danepitkin commented Mar 27, 2024

I am hitting the OOM when running locally. Hundreds of GB of this message are being printed:

LOGGER.debug("Writing buffer with size: {}", length);

...
15:51:50.577 [org.apache.arrow.vector.ipc.WriteChannelBenchmark.alignBenchmark-jmh-worker-1] DEBUG org.apache.arrow.vector.ipc.WriteChannel -- Writing buffer with size: 7
15:51:50.577 [org.apache.arrow.vector.ipc.WriteChannelBenchmark.alignBenchmark-jmh-worker-1] DEBUG org.apache.arrow.vector.ipc.WriteChannel -- Writing buffer with size: 1
...
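
A rough estimate of how fast a line like this adds up, as a sketch only: the sample line is copied from the output above, while the count of one billion lines is an assumption chosen for illustration, not a measured figure. The class name LogVolumeSketch is hypothetical.

```java
// Hypothetical back-of-the-envelope estimate of log volume; not part of Arrow.
final class LogVolumeSketch {
    // One DEBUG line copied from the output above.
    static final String SAMPLE_LINE =
        "15:51:50.577 [org.apache.arrow.vector.ipc.WriteChannelBenchmark.alignBenchmark-jmh-worker-1] "
            + "DEBUG org.apache.arrow.vector.ipc.WriteChannel -- Writing buffer with size: 7";

    /** Bytes needed to print the line `count` times with one newline each (ASCII: 1 byte per char). */
    static long bytesFor(String line, long count) {
        return (line.length() + 1L) * count;
    }

    public static void main(String[] args) {
        long lines = 1_000_000_000L; // assumption: one billion logged writes
        long bytes = bytesFor(SAMPLE_LINE, lines);
        System.out.printf("%d lines -> about %.0f GB%n", lines, bytes / 1e9);
    }
}
```

Under that assumption the output is already well above 100 GB, which is in line with the volume seen here.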

@danepitkin
Member

danepitkin commented Mar 27, 2024

I think some of the performance benchmarks might be configured incorrectly. A few Java benchmarks use Level.Invocation when setting up and tearing down, including WriteChannelBenchmark:

@Setup(Level.Invocation)
public void prepareInvoke() throws IOException {
  baos = new ByteArrayOutputStream(8);
  writeChannel = new WriteChannel(Channels.newChannel(baos));
  writeChannel.write(new byte[8 - alignSize]);
}

@TearDown(Level.Invocation)
public void tearDownInvoke() throws IOException {
  writeChannel.close();
  baos.close();
}

According to the javadocs, Level.Invocation has dragons:

https://javadoc.io/static/org.openjdk.jmh/jmh-core/1.1.1/org/openjdk/jmh/annotations/Level.html#Invocation

Level.Invocation is only meant for benchmark methods that take at least 1 ms. Our benchmark is zeroing out an unaligned 8-byte buffer. Running the benchmark shows billions of these buffer alignment operations, which makes me think there is some race condition happening on setup/teardown causing a near-infinite loop.

@danepitkin
Member

Nvm, I think this is okay. It's configured to run for 5 iterations of 10 seconds each. We are just printing out billions of debug lines, so we just need to disable DEBUG logging.
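
A minimal sketch of what a level threshold does to a debug call in a hot loop. It uses java.util.logging from the JDK as a stand-in so that it runs without dependencies; Arrow's own logging goes through a different API, so the logger names, the CountingHandler class, and the FINE level (standing in for DEBUG) are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical stand-in for the real logging setup; not part of Arrow.
final class DebugLevelSketch {
    /** Counts the records that get through instead of printing them. */
    static final class CountingHandler extends Handler {
        final AtomicLong count = new AtomicLong();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                count.incrementAndGet();
            }
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    /** Issues `writes` debug-level log calls and returns how many were actually emitted. */
    static long emitted(Level loggerLevel, int writes) {
        Logger logger = Logger.getLogger("sketch.WriteChannel." + loggerLevel.getName());
        logger.setUseParentHandlers(false); // keep the console quiet
        logger.setLevel(loggerLevel);
        CountingHandler handler = new CountingHandler();
        logger.addHandler(handler);
        try {
            for (int i = 0; i < writes; i++) {
                // FINE plays the role of DEBUG here.
                logger.log(Level.FINE, "Writing buffer with size: {0}", i % 8);
            }
        } finally {
            logger.removeHandler(handler);
        }
        return handler.count.get();
    }

    public static void main(String[] args) {
        System.out.println("threshold FINE: " + emitted(Level.FINE, 1000)); // prints 1000
        System.out.println("threshold INFO: " + emitted(Level.INFO, 1000)); // prints 0
    }
}
```

Raising the threshold drops every one of these records before it reaches a handler. The call itself still runs on each write, so removing the statement from the hot path avoids even that cost.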

@kou kou changed the title [Benchmarking] [Java] new java.lang.OutOfMemoryError in Java benchmarks after local build cache change [Benchmarking][Java] new java.lang.OutOfMemoryError in Java benchmarks after local build cache change Mar 28, 2024
@danepitkin danepitkin added this to the 16.0.0 milestone Apr 4, 2024
kou pushed a commit that referenced this issue Apr 5, 2024
### Rationale for this change

The Java build script was recently updated, and the change is affecting Conbench, which is now seeing timeouts when running the Java benchmarks. The logs are producing 100s of GB of data due to an unnecessary debug log message.

### What changes are included in this PR?

* Delete log message on write to memory

### Are these changes tested?

Yes, via conbench

### Are there any user-facing changes?

No
* GitHub Issue: #40775

Authored-by: Dane Pitkin <dane@voltrondata.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@kou
Member

kou commented Apr 5, 2024

Issue resolved by pull request 40786
#40786

@kou kou closed this as completed Apr 5, 2024
tolleybot pushed a commit to tmct/arrow that referenced this issue May 2, 2024
vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024