Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FISH-464 Race conditions and stability #26

Conversation

dmatej
Copy link

@dmatej dmatej commented Nov 20, 2020

This is a PR with commits going also into Eclipse's master (3.0.0-SNAPSHOT) backported also to Payara maintenance.

The release of 2.4.4.payara-p5 is not planned at this moment, but this branch was used to test changes with yesterday's state of Payara Server master and passed all available tests.

Current Payara 5.2020.7-SNAPSHOT with these commits in it's own patch of Grizzly passed following tests:

- 19001 is often occupied in Kubuntu
- formatting synced with eclipse/master
- if the queue size is not same as reserved space, and if it is the last content
  and the content is a trailer, send trailers
- if the queue size is same as reserved space, and record was already processed,
  don't send trailers.
- continue otherwise.
- TrailersTest - just formatting and finals
- as eclipse's master formatted code, this is more similar
- original code was throwing NPE
- handler can be null
  - then we have nothing to handle the reason
- closeReason can be null
  - then we use "locally closed" as a default (may be incorrect)
- even ignored exception should be visible somewhere (logging)
- based on h2spec results - client was able to cause ISE on the server simply
  by misformatted header
- assertions are worthless in production
- fixes another NPE seen in logs
- in this commit passed with current Payara 5.2020.7-SNAPSHOT
  - TCK websockets
  - TCK servlet
  - h2spec except 3.5 (h2spec error) and 8.2.1.6 (still race conditions)
  - HTTP2 Push reproducer "100 delphines"
@dmatej dmatej requested a review from MattGill98 November 20, 2020 17:34
@dmatej dmatej self-assigned this Nov 20, 2020
@dmatej dmatej requested a review from lprimak November 20, 2020 17:35
@dmatej
Copy link
Author

dmatej commented Nov 20, 2020

How to test (not a script)

cd grizzly; 
mvn clean install;
cd payara;
mvn clean install -Dgrizzly.version=2.4.4.payara-p5-SNAPSHOT -Pfastest,QuickBuild
unzip 
asadmin start-domain
enable http2 push - Configurations - server-config - Network Config - Protocols - http-listener-2 - HTTP - check HTTP/2 Push Enabled and save 
boost capacity: Configurations - server-config - Thread Pools - http-thread-pool - set Max Thread Pool Size to 100 and save
asadmin restart-domain

Then you can try Delphines and also PrimeFaces reproducers.

@dmatej dmatej requested a review from Pandrex247 November 24, 2020 09:28
@lprimak
Copy link

lprimak commented Nov 24, 2020

Looks good to me. I don't see any 'race conditions' fixed here though or am I missing something?
I do see exception thrown correctly when they were just ignored before

@dmatej
Copy link
Author

dmatej commented Nov 25, 2020

It is not easy to explain, I just went step by step after exceptions in logs, failing tests, reproducers, etc.

Some "race condition" is still possible, but it is even aceptable by the spec, because whole system is so much asynchronous and multithreaded, that when something goes wrong (exceptions like NPE, buffer underflow/overflow, misformatted header, etc.), server is allowed to respond to the client by GOAWAY header and closing the stream OR simply by closing the stream/connection.
That is pretty hard to test - the h2spec test set usually has some set of "acceptable results" of the test, so often it is both "socket closed" and "GOAWAY". BUT! The system is multithreaded, so it is also possible that the cliend even already accepted some data from other streams.

So those race conditions are not that "internal JVM classic" like wrong locking and synchronization we originally expected, but rather issues with causality and correlation.

I'm still not sure if it is everything, but probably the best way how to find out is to release it - maybe at the start of the next year, so I would have time to revisit the Apache Client reproducer and check once again that there is no other trap - and if it is, fix it :-)
But that should be in another standalone PR as this one was heavily tested.

@dmatej dmatej merged commit 9ccaef8 into payara:2.4.4.payara-maintenance Jan 26, 2021
@dmatej dmatej deleted the FISH-464-race-conditions-and-stability branch January 30, 2021 20:25
@Pandrex247 Pandrex247 mentioned this pull request Feb 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants