Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(archiveupload): improve performance of archive uploads and validation #742

Merged
merged 31 commits into from
Nov 3, 2021

Conversation

andrewazores
Copy link
Member

@andrewazores andrewazores commented Oct 29, 2021

Fixes #740

The primary fix in this PR is in the V1 RecordingPostHandler - the implementation of the function for validating JFR binary data has been changed from one that parses the data and constructs a list of events - potentially using a threadpool - to one that simply serially parses the JFR binary chunk-by-chunk, thereby ensuring that that binary is parseable as JFR data and contains at least 1 chunk of data. This is simpler, faster, and less resource intensive, and provides strong enough validation of the uploaded data for this use case.

Along the way I discovered and fixed some other minor deficiencies in uploaded file handling, for RecordingPostHandler and others using the Vert.x BodyHandler class - namely that body handlers in several instances were accepting file uploads (the default) where only a standard HTTP form upload was expected, or that invalid file uploads would result in a rejected API request but the uploaded file would remain in Vert.x's temporary file-upload storage.

Finally, I also determined that with large (50MB+) JFR files, POSTing to the V1 RecordingPostHandler is very slow - before the Cryostat handler implementation even gets to run. The Vertx BodyHandler itself is actually quite slow here and consumes significant memory while processing the multipart form-data file upload:

1. As an initial experimental workaround I have implemented the Beta API version of the RecordingPostHandler, which accepts the JFR data directly as the POST request body without multipart file boundaries etc., and which directly handles the streaming data instead of delegating to a Vert.x body handler. This has some caveats, including that the Expect: 100-continue header must be sent by the requesting client - this is respected by cURL by default but is not implemented by Firefox nor Chromium, so it is only reliably useful from the CLI and not the web-client. However, the upload performance using this handler is superior - there is no multipart decode occurring and the upload is handled using streaming IO to the filesystem, and then proceeds to the chunk-by-chunk JFR binary validation. Dropped from the PR
2. Diving deeper into the cause of the poor Vertx Body Handler performance I discovered there was a Netty performance regression in the multipart decoder (netty/netty#10508). Upgrading the Vert.x version from 3.9.7 to 3.9.9 also bumps the Netty version, and the original V1 API upload handler using multipart form-data uploads performs nearly as well as the Beta API binary file upload handler after the upgrade, while still remaining compatible with web browsers (not requiring the Expect:100-continue header).

This second point is responsible for the vast majority of the performance fix in the PR. Simply using a newer version of Netty (4.1.60.Final -> 4.1.67.Final) makes the BodyHandler run at a much more reasonable speed and I haven't triggered the OOM kill with it, either. Combined with the slimmed down JFR parsing, the overall resource overhead is greatly reduced.

I have a ~150MB JFR file that I can share privately for reviewers to test this PR with.

@andrewazores andrewazores added feat New feature or request fix labels Oct 29, 2021
@andrewazores andrewazores force-pushed the archive-upload-validation branch from 90f5ddd to 71e18e2 Compare October 29, 2021 19:49
@andrewazores andrewazores force-pushed the archive-upload-validation branch from 9f2889a to 6c52c80 Compare November 1, 2021 15:38
Includes a Netty version bump which fixes a performance regression in multipart form-data decoding
@andrewazores andrewazores changed the title fix(archiveupload): reduce memory consumption of archive uploading fix(archiveupload): improve performance of archive uploads and validation Nov 1, 2021
@andrewazores andrewazores removed the feat New feature or request label Nov 1, 2021
@andrewazores andrewazores marked this pull request as ready for review November 1, 2021 19:48
@andrewazores andrewazores requested a review from ebaron November 1, 2021 19:48
@andrewazores
Copy link
Member Author

@ebaron I'm not too attached to the new Beta API handler I added since the Vert.x 3.9.9 version upgrade brings the performance of the original handler back up to an acceptable level. In my testing using curl, uploading to either the V1 or the Beta endpoints and using the Expect:100-continue header in both cases (to narrow down the timing to file I/O differences and not incur additional roundtrip latency), the 150MB JFR file takes ~1.8s either way. With the Vert.x 3.9.7 V1 API handler this time is more like 10 minutes. So, I am open to simply dropping the Beta handler from the PR - the Vert.x version upgrade transitively pulling in a Netty version upgrade seems to fix the vast majority of the performance issue. What little improvement remains is easily applied to the V1 handler as well by the JFR parsing change I made.

@ebaron
Copy link
Member

ebaron commented Nov 2, 2021

I think if the Beta handler is simply a workaround for the Netty performance issue, then it makes sense to drop it. If it has a purpose going forward then we can keep it.

I did notice that the linked issue netty/netty#10508 was supposedly fixed in 4.1.60.Final, which is used by Vert.x 3.9.7. I wonder if there's another bug.

@andrewazores
Copy link
Member Author

I think if the Beta handler is simply a workaround for the Netty performance issue, then it makes sense to drop it. If it has a purpose going forward then we can keep it.

I did notice that the linked issue netty/netty#10508 was supposedly fixed in 4.1.60.Final, which is used by Vert.x 3.9.7. I wonder if there's another bug.

There was another Netty bug: netty/netty#11188

I updated the linked issue to mention this but forgot to update this PR's description of the problem too, sorry. That Netty fix is in 4.1.65.Final, or Vert.x 3.9.8+.

@andrewazores
Copy link
Member Author

andrewazores commented Nov 2, 2021

From what I can tell, it may have been 10508 that's responsible for the slow apparent upload speed I observed - although that should apparently be fixed in Vert.x 3.9.7, but it seems to match the description of what I see occuring. That's not actually an upload speed limitation but seems to be Netty being slow to process data out of TCP buffers, causing the transfer rate to suffer. There may be another related Vert.x/Netty bug causing the transfer rate to suffer however, other than 10508 - not sure. 11188 causes additional memory pressure when processing the file upload, so that's responsible for (or heavily contributes to) the OOM side of the original issue.

Copy link
Member

@ebaron ebaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look good. I just need to test it with that large JFR file.

@andrewazores
Copy link
Member Author

Code changes look good. I just need to test it with that large JFR file.

I sent you an email with a link to it.

HTTP_API.md Outdated Show resolved Hide resolved
Copy link
Member

@ebaron ebaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested out in CRC with the provided JFR file. Uploaded quickly.

@andrewazores andrewazores merged commit 460e2fd into cryostatio:main Nov 3, 2021
@andrewazores andrewazores deleted the archive-upload-validation branch November 3, 2021 18:26
mergify bot pushed a commit that referenced this pull request Nov 3, 2021
…tion (#742)

* write archive uploads into the existing archives directory

mount a testing archives directory on disk rather than using tmpfs

* perform faster and lighter validation of uploaded binaries

* implement binary file uploading beta recordings handler

* document beta api handlers

* pause/resume request to ensure all buffer data is correctly handled

* Set content-type in example

* Allow Content-Type headers in CORS

* refactor to use more vertx async io

* handle read timeouts

* use ctx.fail rather than throwing, to ensure cleanup endhandler is invoked

* fix(recordingupload): delete badly named uploaded temp files

* fix(templatesupload): delete badly named uploaded files

* fix(bodyhandlers): do not handle file uploads where not expected

* create and open file in one step rather than two

* improve stream-to-file handling

* ensure file-uploads dir is created

* Revert "improve stream-to-file handling"

This reverts commit 71e18e2.

* replace System.nanoTime with Clock.getMonotonicTime

* replace makeAsyncResult methods with FutureFactory

* fixup! ensure file-uploads dir is created

* reorder request header and param validations

* Update doc with required headers

* apply spotless formatting

* correct broken unit tests

* remove debug logging

* delete invalid uploaded files

* Upgrade vertx to 3.9.9

Includes a Netty version bump which fixes a performance regression in multipart form-data decoding

* Remove POST /api/beta/recordings handler

* apply spotless formatting

* remove unneeded CORS change after removal of Beta API handler

* doc(httpapi): remove reference to Beta POST /api/beta/recordings/:recordingName handler

(cherry picked from commit 460e2fd)

# Conflicts:
#	run.sh
andrewazores added a commit that referenced this pull request Nov 3, 2021
andrewazores added a commit that referenced this pull request Nov 3, 2021
…tion (backport #742) (#746)

* fix(archiveupload): improve performance of archive uploads and validation (#742)

* write archive uploads into the existing archives directory

mount a testing archives directory on disk rather than using tmpfs

* perform faster and lighter validation of uploaded binaries

* implement binary file uploading beta recordings handler

* document beta api handlers

* pause/resume request to ensure all buffer data is correctly handled

* Set content-type in example

* Allow Content-Type headers in CORS

* refactor to use more vertx async io

* handle read timeouts

* use ctx.fail rather than throwing, to ensure cleanup endhandler is invoked

* fix(recordingupload): delete badly named uploaded temp files

* fix(templatesupload): delete badly named uploaded files

* fix(bodyhandlers): do not handle file uploads where not expected

* create and open file in one step rather than two

* improve stream-to-file handling

* ensure file-uploads dir is created

* Revert "improve stream-to-file handling"

This reverts commit 71e18e2.

* replace System.nanoTime with Clock.getMonotonicTime

* replace makeAsyncResult methods with FutureFactory

* fixup! ensure file-uploads dir is created

* reorder request header and param validations

* Update doc with required headers

* apply spotless formatting

* correct broken unit tests

* remove debug logging

* delete invalid uploaded files

* Upgrade vertx to 3.9.9

Includes a Netty version bump which fixes a performance regression in multipart form-data decoding

* Remove POST /api/beta/recordings handler

* apply spotless formatting

* remove unneeded CORS change after removal of Beta API handler

* doc(httpapi): remove reference to Beta POST /api/beta/recordings/:recordingName handler

(cherry picked from commit 460e2fd)

# Conflicts:
#	run.sh

* fixup! fix(archiveupload): improve performance of archive uploads and validation (#742)

Co-authored-by: Andrew Azores <aazores@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

POST /api/v1/recordings of uploaded file consumes more memory than expected
2 participants