Check results for new CI, allow some benchmarks to fail #211

qinsoon · 2023-03-28T06:36:14Z

This PR adds a yaml file that records our expected results for DaCapo tests. It also adds a script to check the results for each CI run. We should generally expect a CI run to succeed (which means the results are expected) once this PR is merged.

However, as we may have random failures that we haven't noticed, it is possible that a CI run still fails. In that case, we should update ci-expected-results.yml, create a GitHub issue, and update #246.


qinsoon · 2023-08-14T06:24:12Z

I am taking this approach to figure out the expected results:

1. Set expected results to 'pass' for every benchmark and every plan. Run it.
2. If any benchmark fails, set the expected results to 'fail'. Run again.
3. If any benchmark that is expected to fail passes, or if any benchmark that is expected to pass fails, set the expected results to 'ignore', and we do not check results for ignored benchmarks. Run again.
4. Repeat Step 3 until the results are stable.
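The steps above can be sketched roughly as follows. This is only an illustration of the procedure, not the script in this PR: the function names, the `(benchmark, plan)` keys, and the `run_ci` callback are hypothetical, and "stable" is simplified to "one run that changes nothing".

```python
# Hypothetical sketch of the procedure; names and data layout are assumptions.

def update_expected(expected, observed, first_run):
    """Return an updated copy of `expected` after one CI run.

    Both tables map (benchmark, plan) -> status. `expected` holds 'pass',
    'fail' or 'ignore'; `observed` holds 'pass' or 'fail'.
    """
    updated = dict(expected)
    for key, actual in observed.items():
        want = expected[key]
        if want == "ignore" or want == actual:
            continue  # ignored entries are not checked; matches need no change
        # Step 2: failures in the first run become 'fail'.
        # Step 3: any later mismatch means the result is unstable, so 'ignore'.
        updated[key] = "fail" if first_run else "ignore"
    return updated


def stabilise(run_ci, keys, max_rounds=10):
    """Repeat runs until one run leaves the expected table unchanged."""
    expected = {key: "pass" for key in keys}  # Step 1
    for round_no in range(max_rounds):
        updated = update_expected(expected, run_ci(), first_run=(round_no == 0))
        if updated == expected:
            return expected  # Step 4: results are stable
        expected = updated
    return expected
```

In practice each "run" is a full CI run and the table is edited by hand, so the loop is only a way of stating the rule precisely.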

caizixian · 2023-08-15T05:09:23Z

> I am taking this approach to figure out the expected results:
>
> 1. Set expected results to 'pass' for every benchmark and every plan. Run it.
> 2. If any benchmark fails, set the expected results to 'fail'. Run again.
> 3. If any benchmark that is expected to fail passes, or if any benchmark that is expected to pass fails, set the expected results to 'ignore', and we do not check results for ignored benchmarks. Run again.
> 4. Repeat Step 3 until the results are stable.

Yes, this sounds reasonable. It's important that CI doesn't just fail all the time; otherwise we develop a habit of ignoring it instead of noticing real regressions.

For each (intermittently) failed benchmark, we should have an issue for it and put it in the comment of the yaml. Once the issues are fixed, the yaml should be updated.

k-sareen · 2023-08-16T07:31:27Z

So I guess if a PR fixes a bug then we would have to update the expected results manually?

qinsoon · 2023-08-16T07:49:38Z

> So I guess if a PR fixes a bug then we would have to update the expected results manually?

Yes.
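To illustrate why a manual update is needed: a check of this kind treats an unexpected pass as a mismatch, just like an unexpected failure. The sketch below is a hypothetical stand-in for the check script, with assumed names and an assumed `(benchmark, plan)` key layout, not the actual code in this PR.

```python
# Hypothetical sketch of the comparison; not the actual check script.

def check_results(expected, observed):
    """Return a list of mismatch messages; an empty list means the run is as expected.

    `expected` maps (benchmark, plan) -> 'pass', 'fail' or 'ignore'.
    `observed` maps (benchmark, plan) -> 'pass' or 'fail'.
    """
    mismatches = []
    for bench, plan in sorted(expected):
        want = expected[(bench, plan)]
        if want == "ignore":
            continue  # results of ignored benchmarks are not checked
        got = observed.get((bench, plan), "missing")
        if got != want:
            # Covers both directions: expected pass but failed,
            # and expected fail but passed (e.g. after a bug fix).
            mismatches.append(f"{bench}/{plan}: expected {want}, got {got}")
    return mismatches
```

With this rule, a PR that fixes a bug turns an expected 'fail' into an actual 'pass', so the same PR has to change that entry to 'pass' for CI to go green.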

qinsoon · 2023-08-22T03:33:14Z

I am still testing this. There are a few issues I have seen. I have not marked those tests as 'ignore' yet, because I feel they are possibly related to the CI environment rather than the code we test.

Script quit after one plan

release-kafka in https://github.com/mmtk/mmtk-openjdk/actions/runs/5886494948/job/15997308570?pr=211. The job quit while running the second plan.

kafka 2500 608 0.

The log for the first plan (semispace)

===== DaCapo evaluation-git-04132797 kafka starting =====
Trogdor is running the workload....
Starting 1000000 requests...

5%
10%
15%
20%
25%
30%
35%
40%
45%
50%[2023-08-18 01:06:02,073] ERROR Error while appending records to foo1-8 in dir /tmp/runbms-hmqvnk5o/scratch/kafka-logs (kafka.server.LogDirFailureChannel)
java.io.IOException: No space left on device
	at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
	at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
	at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
	at org.apache.kafka.common.record.MemoryRecords.writeFullyTo(MemoryRecords.java:92)
	at org.apache.kafka.common.record.FileRecords.append(FileRecords.java:188)
	at kafka.log.LogSegment.append(LogSegment.scala:158)
	at kafka.log.LocalLog.append(LocalLog.scala:436)
	at kafka.log.UnifiedLog.append(UnifiedLog.scala:929)
	at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:740)
	at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1167)
	at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1155)
	at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:947)
	at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
	at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
	at scala.collection.mutable.HashMap.map(HashMap.scala:35)
	at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:935)
	at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:593)
	at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:665)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
	at java.base/java.lang.Thread.run(Thread.java:829)
[2023-08-18 01:06:02,104] ERROR Shutdown broker because all log dirs in /tmp/runbms-hmqvnk5o/scratch/kafka-logs have failed (kafka.log.LogManager)

The log for the second plan

===== DaCapo evaluation-git-04132797 kafka starting =====
Trogdor is running the workload....
Starting 1000000 requests...

5%
10%
15%
20%
25%
30%
35%
40%
45%
50%[2023-08-18 01:06:14,999] ERROR Error while appending records to foo1-1 in dir /tmp/runbm

All plans failed

fastdebug-h2o in https://github.com/mmtk/mmtk-openjdk/actions/runs/5886494948/job/15997306311?pr=211

h2o 2500 340 0.......

No extra logging about the error, just something like this for all the plans.

Using scaled threading model. 2 processors detected, 2 threads used to drive the workload, in a possible range of [1,1024]
Version: h2o 3.38.0.3
Nominal stats: NEP: 2, NES: 3, NET: 9, NEW: 1, NMH: 102, NML: 1, NMR: 1212, NMS: 20, NMT: 100, NMU: 130, NOA: 142, NOL: 152, NOM: 16, NOS: 16

caizixian · 2023-08-22T03:36:46Z

actions/runner-images#2840
https://github.com/marketplace/actions/maximize-build-disk-space

We can apply these workarounds.

This shouldn't be a problem once we switch to DaCapo Chopin RC2 (there's a kafka issue in DaCapo) @steveblackburn

qinsoon · 2023-08-22T23:27:51Z

It seems that many jobs are often skipped because they run out of disk space (showing a failure at the 'Extract OpenJDK' step): https://github.com/mmtk/mmtk-openjdk/actions/runs/5934085396
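One way to make this failure mode easier to spot would be to check free space before the expensive steps. The following is only a sketch of that idea, not something this PR does; the function name and the threshold in the usage note are assumptions.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes
```

A job could, for example, call `has_free_space("/tmp", 5 * 1024**3)` before extracting OpenJDK and report "out of disk" explicitly instead of failing midway through the step.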

k-sareen · 2023-09-08T00:22:19Z

Should we try and get this merged? I think applying the workarounds mentioned by @caizixian should allow us to merge this.

qinsoon · 2023-09-08T00:25:46Z

> Should we try and get this merged? I think applying the workarounds mentioned by @caizixian should allow us to merge this.

I added a step to free up space, as Zixian suggested. It did not solve the problem: the runs may still fail because they run out of disk space. The tests are too flaky at the moment.

qinsoon · 2023-09-08T01:47:53Z

Feel free to push any changes or attempts to the PR if you are aware of a workaround.

qinsoon · 2023-09-14T03:15:15Z

> > Should we try and get this merged? I think applying the workarounds mentioned by @caizixian should allow us to merge this.
>
> I added a step to free up space, as Zixian suggested. It did not solve the problem: the runs may still fail because they run out of disk space. The tests are too flaky at the moment.

The out-of-disk error should be fixed by #241.

qinsoon · 2023-09-15T05:22:37Z

It seems like even semispace could randomly fail. I feel if we ignore any benchmark that may randomly fail, we may end up ignoring a lot more than what we want.

k-sareen · 2023-09-15T05:36:26Z

Hm. SemiSpace failing is odd. The CI uploads artifacts -- I'll take a look

EDIT: That's annoying. It just says Exception in thread "Thread-1" and then fails the digest verification. Perhaps the heap size is too small for SS, but I'm not sure.

k-sareen · 2023-09-18T04:53:53Z

@qinsoon You will have to rename it to fail_on_oom. "-" is reserved in modifier names.

caizixian · 2023-09-19T06:13:35Z

@qinsoon you can run DaCapo benchmarks with -preserve, and then use CopyFile to get the stdout.log and stderr.log out of scratch. Might be useful in debugging. https://anupli.github.io/running-ng/commands/runbms.html#copyfile

qinsoon · 2023-09-19T06:22:44Z

> @qinsoon you can run DaCapo benchmarks with -preserve, and then use CopyFile to get the stdout.log and stderr.log out of scratch. Might be useful in debugging. https://anupli.github.io/running-ng/commands/runbms.html#copyfile

Sure. I haven't used plugins before, so I am not sure if I am using it the right way. 37593c6

caizixian · 2023-09-19T06:23:00Z

@qinsoon -preserve needs to be a ProgramArg. It's a DaCapo argument not runbms. The collected files will be saved in the usual log folder (not sure whether you are saving the entire log folder at the moment).
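Once stdout.log and stderr.log are collected, a small helper could label each log with the failure signatures already seen in this thread. This is a hypothetical sketch: the signature table, the labels, and the assumption that logs end in `.log` under a single folder are all illustrative and not part of running-ng or this PR.

```python
from pathlib import Path

# Signatures taken from the failures reported in this thread; labels are assumed.
SIGNATURES = {
    "No space left on device": "out of disk",
    "Exception in thread": "uncaught Java exception",
}


def classify_log(text):
    """Return the sorted labels of all known signatures found in `text`."""
    return sorted({label for needle, label in SIGNATURES.items() if needle in text})


def classify_log_dir(log_dir):
    """Map each *.log file under `log_dir` (searched recursively) to its labels."""
    results = {}
    for path in sorted(Path(log_dir).rglob("*.log")):
        results[str(path.relative_to(log_dir))] = classify_log(
            path.read_text(errors="replace")
        )
    return results
```

An empty label list for a failed run would point at a failure with no recognisable message, such as the h2o case above.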

caizixian

LGTM

qinsoon added 7 commits March 28, 2023 06:32

Add a yml file to store expected results for each benchmark, and use a script to check it.

be337c6

Format yml properly

471c2ce

Group results with linux-x64. Update expected results from last run.

a6a38b3

Add note for some benchmark results

899a97e

Update check script to allow ignoring benchmarks

1ee1118

Merge branch 'master' into check-ci-results

366cfbb

Reset expected results: expect all benchmark to pass

9dba124

qinsoon added 2 commits August 15, 2023 01:44

Fix script

0e7eaad

Update expected results: 1

a5d0249

qinsoon added 3 commits August 15, 2023 05:39

Update expected results: 2

02a72ed

Merge branch 'master' into check-ci-results

72fdab3

Set release-xalan-GenCopy to ignore

86a4c55

qinsoon added 3 commits August 17, 2023 03:38

Ignore a few more benchmarks

7c33b32

Set a few MS results to ignore

94e4564

Merge branch 'master' into check-ci-results

102653a

Add a step to free up build space

384de82

Merge branch 'master' into check-ci-results

87cb24a

qinsoon added 2 commits September 14, 2023 05:34

Ignore marksweep for release-h2

9e9044e

Set fastdebug-h2 to ignore

d281ef9

Ignore semispace/xalan/release and genimmix/tomcat/release

fc01230

Make tests crash on OOM

32dbaab

qinsoon added 4 commits September 18, 2023 04:55

fail_on_oom

4930e2e

Fix pattern matching in the script

de8e299

Make comparison easier to read

8a74185

Ignore two more tests

4d51e7e

qinsoon mentioned this pull request Sep 19, 2023

Tracking CI correctness #246

Open

Print logs for failed runs

55acac2

Keep stdout/stderr in the logs

37593c6

qinsoon added 2 commits September 19, 2023 06:26

Pass -preserve to DaCapo

22e5083

Ignore markcompact for fastdebug-kafka

be9ee6a

qinsoon marked this pull request as ready for review September 19, 2023 22:05

qinsoon requested a review from caizixian September 19, 2023 22:05

caizixian approved these changes Sep 20, 2023


caizixian merged commit 0e04c17 into mmtk:master Sep 20, 2023
45 checks passed
