Bad performance during TestStep creation #2666

jkronegg · 2022-12-20T13:46:36Z

👓 What did you see?

On my Java project with about 400 Cucumber test scenarios and about 150 step definitions, the IntelliJ flame graph shows that creating the TestStep takes about 10% of the total time (that's about 1 second on my project). Most of the time is consumed in the TestAbortedExceptions.createIsTestAbortedExceptionPredicate() method. See the flamegraph.png file in the annexed ZIP file.

✅ What did you expect to see?

I expect the TestAbortedExceptions.createIsTestAbortedExceptionPredicate() Predicate<Throwable> creation to be lightweight, with no noticeable impact on the overall execution time.

I implemented some variants and did some benchmarking to determine what is the best alternative.

Note: some optimisations have been produced using ChatGPT (on 19.12.2022). I could have written the code by myself but wanted to compare what an IA could produce. The produced code is impressive and could have been produced by human developers. But as human developers, ChatGPT makes mistakes (e.g. the first produced code was not able to identify subclasses), so it needs some code review and test harness.

All variants have been tested using the following JUnit5 test (contains a two positive and one negative example):

@ParameterizedTest
@MethodSource("methods")
void testPredicate(String name, Predicate<Throwable> myMethod) {
    assertFalse(myMethod.test(new RuntimeException()));
    assertTrue(myMethod.test(new SkipException()));
    assertTrue(myMethod.test(new SkipExceptionSubClass()));
}

Even if the implemented variants pass this tests, there are slight differences with the classloading behavior:

classloader: the original Cucumber version takes the current thread classloader (when existing). This way, one Thread could have a classloader with JUnit4 AssumptionViolatedException and another Thread could have a classloader with TestNG SkipException. Such cases are probably rare, but the original code support it. Variants which use a static classloader are not able to support this use-case.
caching classes: the original Cucumber version loads the aborted exceptions classes each time the method is called. Thus, Cucumber is able to manage dynamic classloading (e.g. at first the classloader doesn't know a class, but it loads the class afterwards). Again, this dynamic classloader behavior is probably rare, but the original code support it.

As these use-cases are very rare, we could agree on changing the original behavior if the performance improvement is significative (to be decided by Cucumber core team).

The table below summarizes the different variants:

Name	Description	classloader	cached classes	code readability	ops/s
createIsTestAbortedExceptionPredicate0	Original Cucumber 7.10.0 version	by thread	no	low	5'517
createIsTestAbortedExceptionPredicate1	ChatGPT (2nd trial) + human code review + minor human improvements	by thread	no	average	261'792'878
createIsTestAbortedExceptionPredicate6	ChatGPT (3rd trial) + human code review + IntelliJ refactoring + minor human refactoring to make it compile/behave like original	by thread	no	high	266'445'123
createIsTestAbortedExceptionPredicate5	Optimized by a human developer, with ThreadLocal cached Predicate	by thread	yes	average	260'493'066
createIsTestAbortedExceptionPredicate2	ChatGPT (3rd trial) + human code review + IntelliJ refactoring + minor human refactoring to make it compile	static	no	high	443'061'213
createIsTestAbortedExceptionPredicate3	ChatGPT (4rth trial with caching) + human code review + minor human refactoring to make it compile + minor human improvements	static	yes	low	443'764'493
createIsTestAbortedExceptionPredicate4	Optimized by a human developer	static	yes	average	443'207'453

Based on the results above, my recommendation would be to use createIsTestAbortedExceptionPredicate6 which keeps the same classloader behavior ("by thread"), without cached classes. This way, the original behavior is conserved, the code readability is improved and the performance is improved (about 48'000 times faster). We could have even better performance, but this would require to change the original behavior (e.g. createIsTestAbortedExceptionPredicate5 or createIsTestAbortedExceptionPredicate3), which may cause side effects on Cucumber users.

Below is the proposed method:

static Predicate<Throwable> createIsTestAbortedExceptionPredicate6() {
    ClassLoader defaultClassLoader = ClassLoaders.getDefaultClassLoader();
    return throwable -> Arrays.stream(TEST_ABORTED_EXCEPTIONS)
            .anyMatch(s -> {
                try {
                    Class<?> aClass = defaultClassLoader.loadClass(s);
                    return aClass.isInstance(throwable);
                } catch (Throwable t) {
                    rethrowIfUnrecoverable(t);
                    log.debug(t,
                            () -> String.format("Failed to load class %s: will not be supported for aborted executions.", s));
                }
                return false;
            });
}

I can do a PR with the solution preferred by the Cucumber core team.

📦 Which tool/library version are you using?

Cucumber 7.10.0

🔬 How could we reproduce it?

The ZIP file in annex contains the JMH benchmark and the implemented variants.
runner.zip

These files can be copied in a minimal Maven project with the following dependencies:

    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-java</artifactId>
        <version>${cucumber.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-junit-platform-engine</artifactId>
        <version>${cucumber.version}</version>
        <scope>test</scope>
        <exclusions>
            <exclusion><!-- conflicts with the version from junit-platform-suite -->
                <groupId>org.junit.platform</groupId>
                <artifactId>junit-platform-engine</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-picocontainer</artifactId>
        <version>${cucumber.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-generator-annprocess</artifactId>
        <version>1.36</version>
        <scope>test</scope>
    </dependency>

The text was updated successfully, but these errors were encountered:

mpkorstanje · 2022-12-20T15:57:38Z

Very interesting. I would be happy to accept option 6.

That said, I'm going to have to find some broader consensus on the use ChatGTP. It has been known to reproduce it's training data more or less verbatim.

jkronegg · 2022-12-20T21:57:12Z

@mpkorstanje : I'll do a PR with option 6.

Regarding ChatGPT, you may be interested by the transcript of option 3 (I finally did not include the last solution proposed by ChatGPT because it was becoming unreadable code) :
ChatGPT optimizes TestAbortedExceptions.pdf (my inputs are in bold text and the ChatGPT output are in unformatted text).
I would certainly not use ChatGPT output blindly, but I must admit that the quality is pretty good. And human developers also reproduce data structures and algorithms that are familiar to them 😉

Changed implementation to use variant 6 proposed in cucumber#2666 .

Refactored `TestAbortedExceptions` `Predicate` creation. Performance gain (`Predicate` creation is about 50'000 times faster). Fixes #2666

jkronegg · 2023-01-13T15:30:19Z

On my project with ~400 scenarios, the cucumber tests execution duration is on average 20.5 seconds instead of 24.3 seconds (that's a 16% improvement).

@fslev : could you please test cucumber 7.11.0 (which includes this patch) on your suite of ~1000 scenarios in dry-run mode ?

jkronegg added a commit to jkronegg/cucumber-jvm that referenced this issue Dec 20, 2022

Improved performance

6f86f99

Changed implementation to use variant 6 proposed in cucumber#2666 .

jkronegg mentioned this issue Dec 20, 2022

Improved TestStep creation performance #2667

Merged

7 tasks

mpkorstanje closed this as completed in #2667 Dec 22, 2022

mpkorstanje pushed a commit that referenced this issue Dec 22, 2022

Improved TestStep creation performance (#2667)

f53abb6

Refactored `TestAbortedExceptions` `Predicate` creation. Performance gain (`Predicate` creation is about 50'000 times faster). Fixes #2666

jkronegg mentioned this issue May 17, 2023

Initialization of io.cucumber.core.runner.TestStep causes performance issues #2755

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bad performance during TestStep creation #2666

Bad performance during TestStep creation #2666

jkronegg commented Dec 20, 2022 •

edited

Loading

mpkorstanje commented Dec 20, 2022

jkronegg commented Dec 20, 2022 •

edited

Loading

jkronegg commented Jan 13, 2023 •

edited

Loading

Bad performance during TestStep creation #2666

Bad performance during TestStep creation #2666

Comments

jkronegg commented Dec 20, 2022 • edited Loading

👓 What did you see?

✅ What did you expect to see?

📦 Which tool/library version are you using?

🔬 How could we reproduce it?

mpkorstanje commented Dec 20, 2022

jkronegg commented Dec 20, 2022 • edited Loading

jkronegg commented Jan 13, 2023 • edited Loading

jkronegg commented Dec 20, 2022 •

edited

Loading

jkronegg commented Dec 20, 2022 •

edited

Loading

jkronegg commented Jan 13, 2023 •

edited

Loading