Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad performance during TestStep creation #2666

Closed
jkronegg opened this issue Dec 20, 2022 · 3 comments · Fixed by #2667
Closed

Bad performance during TestStep creation #2666

jkronegg opened this issue Dec 20, 2022 · 3 comments · Fixed by #2667

Comments

@jkronegg
Copy link
Contributor

jkronegg commented Dec 20, 2022

👓 What did you see?

On my Java project with about 400 Cucumber test scenarios and about 150 step definitions, the IntelliJ flame graph shows that creating the TestStep takes about 10% of the total time (that's about 1 second on my project). Most of the time is consumed in the TestAbortedExceptions.createIsTestAbortedExceptionPredicate() method. See the flamegraph.png file in the annexed ZIP file.

✅ What did you expect to see?

I expect the TestAbortedExceptions.createIsTestAbortedExceptionPredicate() Predicate<Throwable> creation to be lightweight, with no noticeable impact on the overall execution time.

I implemented some variants and did some benchmarking to determine what is the best alternative.

Note: some optimisations have been produced using ChatGPT (on 19.12.2022). I could have written the code by myself but wanted to compare what an IA could produce. The produced code is impressive and could have been produced by human developers. But as human developers, ChatGPT makes mistakes (e.g. the first produced code was not able to identify subclasses), so it needs some code review and test harness.

All variants have been tested using the following JUnit5 test (contains a two positive and one negative example):

@ParameterizedTest
@MethodSource("methods")
void testPredicate(String name, Predicate<Throwable> myMethod) {
    assertFalse(myMethod.test(new RuntimeException()));
    assertTrue(myMethod.test(new SkipException()));
    assertTrue(myMethod.test(new SkipExceptionSubClass()));
}

Even if the implemented variants pass this tests, there are slight differences with the classloading behavior:

  • classloader: the original Cucumber version takes the current thread classloader (when existing). This way, one Thread could have a classloader with JUnit4 AssumptionViolatedException and another Thread could have a classloader with TestNG SkipException. Such cases are probably rare, but the original code support it. Variants which use a static classloader are not able to support this use-case.
  • caching classes: the original Cucumber version loads the aborted exceptions classes each time the method is called. Thus, Cucumber is able to manage dynamic classloading (e.g. at first the classloader doesn't know a class, but it loads the class afterwards). Again, this dynamic classloader behavior is probably rare, but the original code support it.

As these use-cases are very rare, we could agree on changing the original behavior if the performance improvement is significative (to be decided by Cucumber core team).

The table below summarizes the different variants:

Name Description classloader cached classes code readability ops/s
createIsTestAbortedExceptionPredicate0 Original Cucumber 7.10.0 version by thread no low 5'517
createIsTestAbortedExceptionPredicate1 ChatGPT (2nd trial) + human code review + minor human improvements by thread no average 261'792'878
createIsTestAbortedExceptionPredicate6 ChatGPT (3rd trial) + human code review + IntelliJ refactoring + minor human refactoring to make it compile/behave like original by thread no high 266'445'123
createIsTestAbortedExceptionPredicate5 Optimized by a human developer, with ThreadLocal cached Predicate by thread yes average 260'493'066
createIsTestAbortedExceptionPredicate2 ChatGPT (3rd trial) + human code review + IntelliJ refactoring + minor human refactoring to make it compile static no high 443'061'213
createIsTestAbortedExceptionPredicate3 ChatGPT (4rth trial with caching) + human code review + minor human refactoring to make it compile + minor human improvements static yes low 443'764'493
createIsTestAbortedExceptionPredicate4 Optimized by a human developer static yes average 443'207'453

Based on the results above, my recommendation would be to use createIsTestAbortedExceptionPredicate6 which keeps the same classloader behavior ("by thread"), without cached classes. This way, the original behavior is conserved, the code readability is improved and the performance is improved (about 48'000 times faster). We could have even better performance, but this would require to change the original behavior (e.g. createIsTestAbortedExceptionPredicate5 or createIsTestAbortedExceptionPredicate3), which may cause side effects on Cucumber users.

Below is the proposed method:

static Predicate<Throwable> createIsTestAbortedExceptionPredicate6() {
    ClassLoader defaultClassLoader = ClassLoaders.getDefaultClassLoader();
    return throwable -> Arrays.stream(TEST_ABORTED_EXCEPTIONS)
            .anyMatch(s -> {
                try {
                    Class<?> aClass = defaultClassLoader.loadClass(s);
                    return aClass.isInstance(throwable);
                } catch (Throwable t) {
                    rethrowIfUnrecoverable(t);
                    log.debug(t,
                            () -> String.format("Failed to load class %s: will not be supported for aborted executions.", s));
                }
                return false;
            });
}

I can do a PR with the solution preferred by the Cucumber core team.

📦 Which tool/library version are you using?

Cucumber 7.10.0

🔬 How could we reproduce it?

The ZIP file in annex contains the JMH benchmark and the implemented variants.
runner.zip

These files can be copied in a minimal Maven project with the following dependencies:

    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-java</artifactId>
        <version>${cucumber.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-junit-platform-engine</artifactId>
        <version>${cucumber.version}</version>
        <scope>test</scope>
        <exclusions>
            <exclusion><!-- conflicts with the version from junit-platform-suite -->
                <groupId>org.junit.platform</groupId>
                <artifactId>junit-platform-engine</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>io.cucumber</groupId>
        <artifactId>cucumber-picocontainer</artifactId>
        <version>${cucumber.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-generator-annprocess</artifactId>
        <version>1.36</version>
        <scope>test</scope>
    </dependency>
@mpkorstanje
Copy link
Contributor

Very interesting. I would be happy to accept option 6.

That said, I'm going to have to find some broader consensus on the use ChatGTP. It has been known to reproduce it's training data more or less verbatim.

@jkronegg
Copy link
Contributor Author

jkronegg commented Dec 20, 2022

@mpkorstanje : I'll do a PR with option 6.

Regarding ChatGPT, you may be interested by the transcript of option 3 (I finally did not include the last solution proposed by ChatGPT because it was becoming unreadable code) :
ChatGPT optimizes TestAbortedExceptions.pdf (my inputs are in bold text and the ChatGPT output are in unformatted text).
I would certainly not use ChatGPT output blindly, but I must admit that the quality is pretty good. And human developers also reproduce data structures and algorithms that are familiar to them 😉

jkronegg added a commit to jkronegg/cucumber-jvm that referenced this issue Dec 20, 2022
Changed implementation to use variant 6 proposed in cucumber#2666 .
mpkorstanje pushed a commit that referenced this issue Dec 22, 2022
Refactored `TestAbortedExceptions` `Predicate` creation.

Performance gain (`Predicate` creation is about 50'000 times faster).

Fixes #2666
@jkronegg
Copy link
Contributor Author

jkronegg commented Jan 13, 2023

On my project with ~400 scenarios, the cucumber tests execution duration is on average 20.5 seconds instead of 24.3 seconds (that's a 16% improvement).

@fslev : could you please test cucumber 7.11.0 (which includes this patch) on your suite of ~1000 scenarios in dry-run mode ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants