Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check results for new CI, allow some benchmarks to fail #211

Merged
merged 29 commits into from
Sep 20, 2023

Conversation

qinsoon
Copy link
Member

@qinsoon qinsoon commented Mar 28, 2023

This PR adds a yaml file that records our expected results for DaCapo tests. It also adds a script to check the results for each CI run. We should generally expect a CI run to succeed (which means the results are expected) once this PR is merged.

However, as we may have random failures that we haven't noticed, it is possible a CI run still fails. In such a case, we should update the ci-expected-results.yml, create an Github issue, and update #246.

@qinsoon
Copy link
Member Author

qinsoon commented Aug 14, 2023

I am taking this approach to figure out the expected results:

  1. Set expected results to 'pass' for every benchmark and every plan. Run it.
  2. If any benchmark fails, set the expected results to 'fail'. Run again.
  3. If any benchmark that is expected to fail passes, or if any benchmark that is expected to pass fails, set the expected results to 'ignore', and we do not check results for ignored benchmarks. Run again.
  4. Repeat Step 3 until the results are stable.

@caizixian
Copy link
Member

I am taking this approach to figure out the expected results:

1. Set expected results to 'pass' for every benchmark and every plan. Run it.

2. If any benchmark fails, set the expected results to 'fail'. Run again.

3. If any benchmark that is expected to fail passes, or if any benchmark that is expected to pass fails, set the expected results to 'ignore', and we do not check results for ignored benchmarks. Run again.

4. Repeat Step 3 until the results are stable.

Yes, this sounds reasonable. It's importantly that CI doesn't just fail all the time so we develop a habit of ignoring it instead of noticing real regressions.

For each (intermittently) failed benchmark, we should have an issue for it and put it in the comment of the yaml. Once the issues are fixed, the yaml should be updated.

@k-sareen
Copy link
Collaborator

k-sareen commented Aug 16, 2023

So I guess if a PR fixes a bug then we would have to update the expected results manually?

@qinsoon
Copy link
Member Author

qinsoon commented Aug 16, 2023

So I guess if a PR fixes a bug then we would have to update the expected results manually?

Yes.

@qinsoon
Copy link
Member Author

qinsoon commented Aug 22, 2023

I am still testing this. There are a few issues I have seen. I did not mark those tests as 'ignore' yet, I feel those are possibly related with the CI environment rather than the code we test.

Script quit after one plan

release-kafka in https://github.com/mmtk/mmtk-openjdk/actions/runs/5886494948/job/15997308570?pr=211. The job quit during running the second plan.

kafka 2500 608 0.

The log for the first plan (semispace)

===== DaCapo evaluation-git-04132797 kafka starting =====
Trogdor is running the workload....
Starting 1000000 requests...

5%
10%
15%
20%
25%
30%
35%
40%
45%
50%[2023-08-18 01:06:02,073] ERROR Error while appending records to foo1-8 in dir /tmp/runbms-hmqvnk5o/scratch/kafka-logs (kafka.server.LogDirFailureChannel)
java.io.IOException: No space left on device
	at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
	at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
	at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
	at org.apache.kafka.common.record.MemoryRecords.writeFullyTo(MemoryRecords.java:92)
	at org.apache.kafka.common.record.FileRecords.append(FileRecords.java:188)
	at kafka.log.LogSegment.append(LogSegment.scala:158)
	at kafka.log.LocalLog.append(LocalLog.scala:436)
	at kafka.log.UnifiedLog.append(UnifiedLog.scala:929)
	at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:740)
	at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1167)
	at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1155)
	at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:947)
	at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
	at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
	at scala.collection.mutable.HashMap.map(HashMap.scala:35)
	at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:935)
	at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:593)
	at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:665)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
	at java.base/java.lang.Thread.run(Thread.java:829)
[2023-08-18 01:06:02,104] ERROR Shutdown broker because all log dirs in /tmp/runbms-hmqvnk5o/scratch/kafka-logs have failed (kafka.log.LogManager)

The log for the second plan

===== DaCapo evaluation-git-04132797 kafka starting =====
Trogdor is running the workload....
Starting 1000000 requests...

5%
10%
15%
20%
25%
30%
35%
40%
45%
50%[2023-08-18 01:06:14,999] ERROR Error while appending records to foo1-1 in dir /tmp/runbm

All plans failed

fastdebug-h2o in https://github.com/mmtk/mmtk-openjdk/actions/runs/5886494948/job/15997306311?pr=211

h2o 2500 340 0.......

No extra logging about the error, just something like this for all the plans.

Using scaled threading model. 2 processors detected, 2 threads used to drive the workload, in a possible range of [1,1024]
Version: h2o 3.38.0.3
Nominal stats: NEP: 2, NES: 3, NET: 9, NEW: 1, NMH: 102, NML: 1, NMR: 1212, NMS: 20, NMT: 100, NMU: 130, NOA: 142, NOL: 152, NOM: 16, NOS: 16

@caizixian
Copy link
Member

actions/runner-images#2840
https://github.com/marketplace/actions/maximize-build-disk-space

We can apply these workarounds.

This shouldn't be a problem once we switch to DaCapo Chopin RC2 (there's a kafka issue in DaCapo) @steveblackburn

@qinsoon
Copy link
Member Author

qinsoon commented Aug 22, 2023

It seems often that many jobs are skipped because of out of disk space (showing failure at the step of 'Extract OpenJDK'): https://github.com/mmtk/mmtk-openjdk/actions/runs/5934085396

@k-sareen
Copy link
Collaborator

k-sareen commented Sep 8, 2023

Should we try and get this merged? I think applying the workarounds mentioned by @caizixian should allow us to merge this.

@qinsoon
Copy link
Member Author

qinsoon commented Sep 8, 2023

Should we try and get this merged? I think applying the workarounds mentioned by @caizixian should allow us to merge this.

I added a step to free up space, as what Zixian posted. It did not solve the problem. The runs may still fail due to out of disk. The tests are too flaky at the moment.

@qinsoon
Copy link
Member Author

qinsoon commented Sep 8, 2023

Feel free to push any changes or attempts to the PR if you are aware of a workaround.

@qinsoon
Copy link
Member Author

qinsoon commented Sep 14, 2023

Should we try and get this merged? I think applying the workarounds mentioned by @caizixian should allow us to merge this.

I added a step to free up space, as what Zixian posted. It did not solve the problem. The runs may still fail due to out of disk. The tests are too flaky at the moment.

The out-of-disk error should be fixed by #241.

@qinsoon
Copy link
Member Author

qinsoon commented Sep 15, 2023

It seems like even semispace could randomly fail. I feel if we ignore any benchmark that may randomly fail, we may end up ignoring a lot more than what we want.

@k-sareen
Copy link
Collaborator

k-sareen commented Sep 15, 2023

Hm. SemiSpace failing is odd. The CI uploads artifacts -- I'll take a look

EDIT: That's annoying. It just says Exception in thread "Thread-1" and then fails the digest verification. Perhaps the heap size is too small for SS, but I'm not sure.

@k-sareen
Copy link
Collaborator

@qinsoon You will have to rename it to fail_on_oom. "-" is reserved in modifier name

@caizixian
Copy link
Member

caizixian commented Sep 19, 2023

@qinsoon you can run DaCapo benchmarks with -preserve, and then use CopyFile to get the stdout.log and stderr.log out of scratch. Might be useful in debugging. https://anupli.github.io/running-ng/commands/runbms.html#copyfile

@qinsoon
Copy link
Member Author

qinsoon commented Sep 19, 2023

@qinsoon you can run DaCapo benchmarks with -preserve, and then use CopyFile to get the stdout.log and stderr.log out of scratch. Might be useful in debugging. https://anupli.github.io/running-ng/commands/runbms.html#copyfile

Sure. I haven't used plugins before. I am not sure if I use it in the right way. 37593c6

@caizixian
Copy link
Member

caizixian commented Sep 19, 2023

@qinsoon -preserve needs to be a ProgramArg. It's a DaCapo argument not runbms. The collected files will be saved in the usual log folder (not sure whether you are saving the entire log folder at the moment).

@qinsoon qinsoon marked this pull request as ready for review September 19, 2023 22:05
Copy link
Member

@caizixian caizixian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@caizixian caizixian merged commit 0e04c17 into mmtk:master Sep 20, 2023
45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants