Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idle timeout occured sometimes on HTTP/2 client with InputStreamResponseListener #8558

Closed
clardeur opened this issue Sep 8, 2022 · 8 comments · Fixed by #8585
Closed

Idle timeout occured sometimes on HTTP/2 client with InputStreamResponseListener #8558

clardeur opened this issue Sep 8, 2022 · 8 comments · Fixed by #8585
Assignees
Labels
Bug For general bugs on Jetty side Outstanding Bug Report! The bug report was outstanding!

Comments

@clardeur
Copy link

clardeur commented Sep 8, 2022

Jetty version(s)
11.0.8

Java version/vendor
openjdk version "11.0.16" 2022-07-19
OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu118.04)
OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu118.04, mixed mode, sharing)

OS type/version
Linux 5.4.0-125-generic 141~18.04.1-Ubuntu x86_64 x86_64 x86_64 GNU/Linux

Description
We have switched in production the response listener of the HTTP/2 client, before we used the FutureResponseListener and now the InputStreamResponseListener, since this change we are observing regularly failures with timeouts and it happens randomly.

This is very similar in symptoms to the issue #7192

How to reproduce?
The problem is difficult to reproduce due to its randomness. As described in the comment of the similar issue (#7192), when we increase the log level to DEBUG, the problem is even less reproducible.

I still managed to reproduce the problem thanks to a test in the jetty project that triggers the timeout almost every time.

Test case - Code

public class HttpClientTransportOverHTTP2Test extends AbstractTest
{

    private static final Logger LOG = LoggerFactory.getLogger(HttpClientTransportOverHTTP2Test.class);

    @Test
    public void testInputStreamResponseListener() throws Exception
    {
        var bytes = 100_000;
        start(new ServerSessionListener.Adapter()
        {
            @Override
            public Stream.Listener onNewStream(Stream stream, HeadersFrame frame)
            {
                int streamId = stream.getId();
                MetaData.Response response = new MetaData.Response(HttpVersion.HTTP_2, HttpStatus.OK_200, HttpFields.EMPTY);
                HeadersFrame responseFrame = new HeadersFrame(streamId, response, null, false);
                Callback.Completable callback = new Callback.Completable();
                stream.headers(responseFrame, callback);
                callback.thenRun(() -> stream.data(new DataFrame(streamId, ByteBuffer.wrap(new byte[bytes]), true), Callback.NOOP));
                return null;
            }
        });

        var nbRequests = 100_000;
        var progress = new AtomicInteger(0);
        IntStream.range(0, nbRequests).forEach(i -> {
            try {
                if (progress.incrementAndGet() % 1000 == 0) {
                    LOG.info("progress {}/{}", progress.get(), nbRequests);
                }
                InputStreamResponseListener listener = new InputStreamResponseListener();
                client.newRequest("localhost", connector.getLocalPort()).headers(httpFields -> httpFields.put("X-Request-Id", Integer.toString(i))).send(listener);
                Response response = listener.get(60, TimeUnit.SECONDS);
                assertEquals(HttpStatus.OK_200, response.getStatus());
                assertEquals(bytes, listener.getInputStream().readAllBytes().length);
            } catch (Exception e) {
                LOG.error("request {} failed: {}", i, e.getMessage());
                fail(e);
            }
        });
    }

}

Test case - Failure Stacktrace

org.opentest4j.AssertionFailedError
	at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:47)
	at org.junit.jupiter.api.Assertions.fail(Assertions.java:157)
	at org.eclipse.jetty.http2.http.client.transport/org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2Test.lambda$testInputStreamResponseListener$1(HttpClientTransportOverHTTP2Test.java:77)
	at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
	at java.base/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:593)
	at org.eclipse.jetty.http2.http.client.transport/org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2Test.testInputStreamResponseListener(HttpClientTransportOverHTTP2Test.java:65)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
	at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:54)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:86)
	at org.junit.platform.launcher.core.DefaultLauncherSession$DelegatingLauncher.execute(DefaultLauncherSession.java:86)
	at org.junit.platform.launcher.core.SessionPerRequestLauncher.execute(SessionPerRequestLauncher.java:53)
	at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:71)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
	at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
Caused by: java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 5000 ms
	at org.eclipse.jetty.client/org.eclipse.jetty.client.util.InputStreamResponseListener$Input.toIOException(InputStreamResponseListener.java:342)
	at org.eclipse.jetty.client/org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:310)
	at java.base/java.io.InputStream.readNBytes(InputStream.java:409)
	at java.base/java.io.InputStream.readAllBytes(InputStream.java:346)
	at org.eclipse.jetty.http2.http.client.transport/org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2Test.lambda$testInputStreamResponseListener$1(HttpClientTransportOverHTTP2Test.java:74)
	... 72 more
Caused by: java.util.concurrent.TimeoutException: Idle timeout expired: 5000 ms
	at org.eclipse.jetty.http2.http.client.transport/org.eclipse.jetty.http2.client.http.HTTPSessionListenerPromise.onIdleTimeout(HTTPSessionListenerPromise.java:103)
	at org.eclipse.jetty.http2.common/org.eclipse.jetty.http2.HTTP2Session.notifyIdleTimeout(HTTP2Session.java:1165)
	at org.eclipse.jetty.http2.common/org.eclipse.jetty.http2.HTTP2Session$StreamsState.onIdleTimeout(HTTP2Session.java:1916)
	at org.eclipse.jetty.http2.common/org.eclipse.jetty.http2.HTTP2Session.onIdleTimeout(HTTP2Session.java:1006)
	at org.eclipse.jetty.http2.common/org.eclipse.jetty.http2.HTTP2Connection.onIdleExpired(HTTP2Connection.java:180)
	at org.eclipse.jetty.io/org.eclipse.jetty.io.AbstractEndPoint.onIdleExpired(AbstractEndPoint.java:407)
	at org.eclipse.jetty.io/org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
	at org.eclipse.jetty.io/org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:108)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)

Test case - Output

Full output

Interesting snippets

The failure happened after that the ContentNotifier will be marked as inactive and stalled in this code path :

            // Process only if there is demand.
            try (AutoLock l = lock.lock())
            {
                if (!resume && demand() <= 0)
                {
                    if (LOG.isDebugEnabled())
                        LOG.debug("Stalling processing, content available but no demand");
                    active = false;
                    stalled = true;
                    return;
                }
            }

After that the content notifier does not resume the processing and therefore does not notify the InputStreamResponseListener, hence the final timeout.

2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-27: [null][ContentNotifier@405c852b] Resuming(false) processing(true) of content [stalled=false]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-27: [ContentNotifier@405c852b] Stalling processing, content available but no demand
2022-09-08 11:33:03.461:INFO :oejc.HttpReceiver:client-29: [HttpReceiverOverHTTP2@52c3b6de(rsp=TRANSIENT,failure=null)] Response demand=1/1, resume=false [stalled=false]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@503c7006[DataFrame@2960b028#8493{length:16375,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@54b2c4c6[DataFrame@79facf75#8493{length:9,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@4af53c0c[DataFrame@6638348a#8493{length:16366,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@246e487f[DataFrame@b40aae0#8493{length:18,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@6836f2d8[DataFrame@4f8d1730#8493{length:16357,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@19f6930e[DataFrame@176f3d7a#8493{length:27,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@4890ad7d[DataFrame@6f0307c8#8493{length:16348,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@424097c5[DataFrame@2caae61a#8493{length:36,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@619c4db7[DataFrame@16563b0b#8493{length:16339,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@506e68bb[DataFrame@2b0fce9a#8493{length:45,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@287ddfb9[DataFrame@fd437de#8493{length:16330,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@2cdcad2f[DataFrame@7972cd30#8493{length:54,end=false}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [ContentNotifier@405c852b] Queueing content DataInfo@3f3445b0[DataFrame@11f0ef4f#8493{length:1696,end=true}]
2022-09-08 11:33:03.461:INFO :oejhch.HttpReceiverOverHTTP2:client-29: [4246][ContentNotifier@405c852b] Resuming(false) processing(false) of content [stalled=true]
2022-09-08 11:33:08.469:INFO :oejc.HttpReceiver:HttpClient@13e344d-scheduler-1: cleanup HttpReceiverOverHTTP2@52c3b6de(rsp=FAILURE,failure=java.util.concurrent.TimeoutException: Idle timeout expired: 5000 ms)
2022-09-08 11:33:08.469:ERROR:oejhch.HttpClientTransportOverHTTP2Test:main: request 4246 failed: java.util.concurrent.TimeoutException: Idle timeout expired: 5000 ms

After that my understanding of the code and the design of the components is not sufficient to go further in the analysis. I don't know if I'm completely on the wrong track or not, WDYT ?

@clardeur clardeur added the Bug For general bugs on Jetty side label Sep 8, 2022
@sbordet
Copy link
Contributor

sbordet commented Sep 12, 2022

@clardeur thanks for the detailed report!

I can definitely see a race condition, we're working to analyze it and fix it.

@joakime joakime moved this to To do in Jetty 9.4.49 FROZEN Sep 14, 2022
@joakime joakime moved this from To do to In progress in Jetty 10.0.12 / 11.0.12 FROZEN Sep 14, 2022
sbordet added a commit that referenced this issue Sep 14, 2022
…istener.

The issue was that HttpReceiverOverHTTP2.ContentNotifier.offer() was racy,
as a network thread could have offered a DATA frame, but not yet called
process() -- yet an application thread could have stolen the DATA frame
completed the response and started another response, causing the network
thread to interact with the wrong response.

The implementation has been changed so that HttpReceiverOverHTTP2.ContentNotifier
does not have a queue anymore and it demands DATA frames to the Stream
only when the application demands more -- a simpler model that just forwards
the demand.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@sbordet
Copy link
Contributor

sbordet commented Sep 14, 2022

@clardeur would you be able to try branch jetty-10.0.x-8858-idle-timeout-http2-inputstreamresponselistener and see if it solves the problem for you?

@clardeur
Copy link
Author

@clardeur would you be able to try branch jetty-10.0.x-8858-idle-timeout-http2-inputstreamresponselistener and see if it solves the problem for you?

Hi @sbordet, could you backport the commit on the 11.0.x branch ? We have dependencies like websocket-jakarta-server that does not exist on the branch 10.0.x, it's easier to test for us on our production stack with the same major version.

If it's not possible, I can backport the changes myself locally just for testing purpose. But in any case, we need the fix on the 11.0.x version. Thanks for your work.

sbordet added a commit that referenced this issue Sep 15, 2022
…istener.

Updates after review.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Repository owner moved this from In progress to Done in Jetty 10.0.12 / 11.0.12 FROZEN Sep 15, 2022
Repository owner moved this from To do to Done in Jetty 9.4.49 FROZEN Sep 15, 2022
@sbordet
Copy link
Contributor

sbordet commented Sep 15, 2022

@clardeur I just merged the changes to Jetty 11 too, so you can build locally branch jetty-11.0.x and test.
Let us know how it goes. Thanks!

@sbordet sbordet added the Outstanding Bug Report! The bug report was outstanding! label Sep 15, 2022
sbordet added a commit that referenced this issue Sep 15, 2022
…istener.

Fixed override of reset().

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@clardeur
Copy link
Author

Let us know how it goes.

I have re-run a job that crash every time due to this issue and it goes well 👍

My 2 cents
It could be benefit for the project to add more client/server chaotic "production" load test like the HttpClientLoadTest with some parametrized scenarios that mix different implementations like Listener.Adapter or ConnectionPool to detect race condition or just functionnal regressions. @sbordet WDYT ?

@sbordet
Copy link
Contributor

sbordet commented Sep 15, 2022

@clardeur we have added your test to our suite, if that's what you meant.

The problem is that we can have a million edge cases, with 7 different transports, with many different ways of doing the same thing, and while we try, we cannot possibly cover all of them.

Your case only happens with HTTP/2, only with InputStreamResponseListener, only with more than 1 request sent iteratively, and only with a very particular thread scheduling 😅

I'm open to suggestions, but the number of permutations (and some of them like thread scheduling can only be stochastic) is typically too large to have a perfect coverage.

@clardeur
Copy link
Author

@clardeur we have added your test to our suite, if that's what you meant.

IMO my test case is very specific and related to this issue. I'm not sure if it's a good test to add.

Your case only happens with HTTP/2, only with InputStreamResponseListener, only with more than 1 request sent iteratively, and only with a very particular thread scheduling

Hard to find, indeed 😅

The problem is that we can have a million edge cases, with 7 different transports, with many different ways of doing the same thing, and while we try, we cannot possibly cover all of them.

I'm open to suggestions, but the number of permutations (and some of them like thread scheduling can only be stochastic) is typically too large to have a perfect coverage.

Yeah I know, a complex problem that requires time and expertise in testing. Since we have been using Jetty in production for 7 years now, some issues (#7107) (486829) that have been reported previously could have been avoided with some "basic" scenario. It's likely that some issues are more related to the client part of the project which is maybe less used in production compared to the server part.

The first step might be to add a nominal test case for each transport layer and parametrize only one major component at a time to avoid an exponential explosion, like the connection pool, which is a core component in my opinion. And in a second step, adding some "chaos" like randomness, latency, etc.

I know it's far from being a perfect coverage but it's like a safety net that does not require too much effort.

In any case, @sbordet, it's always a pleasure to take time to explore the code, reproduce the problem, understand how it's works to submit a proper issue, open source maintainers deserve it !

@dsmiley
Copy link

dsmiley commented Nov 4, 2022

The problem is that we can have a million edge cases, with 7 different transports, with many different ways of doing the same thing, and while we try, we cannot possibly cover all of them.

A "randomized testing" philosophy is intended to address this. Given enough randomized executions, you may well cover all of the scenarios :-). Apache Lucene and Solr embrace this philosophy. We randomize the JDKs, GCs, TimeZones, and within Lucene & Solr, arbitrary choices in implementations or various things specific to a test. Like randomly picking some data to use to test on. The main requirement is an ability to have repeatable randomness so that there is a greater chance of being able to reproduce a problem that is discovered. Lucene & Solr use a testing utility (JUnit runners, etc.) with an apt name for this -- randomizedtesting. It's not all wonderful. Debugging a failure that does not reproduce, despite the seed, is a challenge. Sometimes the outcome is the test needing to be less random or guard against something that can occur based on the randomness -- thus not a real success. But it does find obscure bugs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side Outstanding Bug Report! The bug report was outstanding!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants