WebFlux MultipartParser Blocks When Uploading Large Files, Causing Uploads to Occasionally Hang #34178
Comments
Perhaps it would be more appropriate to raise this issue in the spring-framework project. I apologize for overlooking this. Could the developers please transfer this issue to the spring-framework repository?🤪
Several upload attempts of the same file:
1.7 GiB [####################] 6138071211851868018.multipart
1.5 GiB [################# ] 14413026015246421036.multipart
1.4 GiB [################ ] 15576621334079039706.multipart
1.1 GiB [############ ] 14246059143399172415.multipart
Thanks for the very high-quality reproducer, that's much appreciated. So far I am unable to reproduce after 10 tries with it.
Through my debugging process, I found that whenever upstream() != null && !this.sink.isCancelled() && this.sink.requestedFromDownstream() == 0 && !this.requestOutstanding.get() holds, the upload hangs 100% of the time in my local tests (several dozen runs). Whenever the upload is successful, that debug point is never reached.
@chenggangpro it would be interesting to know which condition doesn't match. Is it "this.sink.requestedFromDownstream() == 0" or "this.requestOutstanding == false"? This is probably a concurrency issue in the parser, and we need to pinpoint the exact cause to fix it. I think the proposed fix in #34388 accidentally fixes things by over-requesting data, but queuing "onNext" might cause other issues: too much memory consumption or even parsing errors?
@bclozel I think it's the condition "this.sink.requestedFromDownstream() == 0". My local debugging is as follows: I added some debug logging points to MultipartParser:
public static Flux<Token> parse(Flux<DataBuffer> buffers, byte[] boundary, int maxHeadersSize, Charset headersCharset) {
    return Flux.create(sink -> {
        MultipartParser parser = new MultipartParser(sink, boundary, maxHeadersSize, headersCharset);
        sink.onCancel(parser::onSinkCancel);
        sink.onRequest(l -> logger.warn("===== Sink On request : " + l)); // here is the debug logging point
        buffers.subscribe(parser);
    });
}
private void requestBuffer() {
    if (upstream() != null &&
            !this.sink.isCancelled() &&
            this.sink.requestedFromDownstream() > 0 &&
            this.requestOutstanding.compareAndSet(false, true)) {
        request(1);
    } else if (!this.requestOutstanding.get()) {
        // here is the debug logging point
        logger.warn("===== Request buffer called =================");
        logger.warn("===== Sink is cancelled :" + sink.isCancelled());
        logger.warn("===== Sink requested from down stream :" + sink.requestedFromDownstream());
        logger.warn("===== Request buffer called =================");
    }
}
Then I uploaded the file again. You can see the lines between the "===== Request buffer called" log markers whenever the upload hangs. I kept running PR #34388 for 6 hours yesterday, and there were no hanging issues or parsing errors. However, I am not sure if my fix is actually correct or if there are any potential errors that I haven't noticed. I hope my debugging process is useful for you all in solving this issue.
Thanks for looking into this! There isn't a fixed frequency for reproduction: it might take just 3 attempts sometimes, while other times it doesn't reproduce even after dozens of tries. It seems to require a bit of luck. However, in applications with a large user base, users do encounter this issue. Fortunately, they can usually resolve it by retrying, as seen in cases like halo-dev/halo#7170
@chemicL We suspect a concurrency issue here in MultipartParser's handling of downstream demand; #34388 was proposing to add a link between the sink's demand and the upstream request, but we have concerns about its side effects.
@sdeleuze I had just a very brief look; my observations follow. Let me know if you need more assistance and I can dedicate more time to help. My understanding is that there will be a situation where the downstream (the subscriber to the Flux of tokens produced by the sink) has not yet registered any demand at the moment requestBuffer() runs, so requestedFromDownstream() is 0 and no upstream request is issued. At the same time, the upstream (the Flux<DataBuffer> of incoming buffers) has no outstanding request, so it emits nothing further. Eventually, the downstream would issue its request(n), but nothing reacts to that late demand, and the stream stalls. Consider registering a connection between the sink's onRequest signal and the upstream request, so that late-arriving downstream demand still triggers a request for more buffers. Now, having seen the above PR (#34388, which aims to introduce exactly that link) and the concerns you raise, I think they are valid, and that's a similar case I have faced before. I don't know the specifics of the buffer chunking and buffering, but requesting more than one at a time is not bad per se, since reactive signals still happen sequentially; the source can just buffer up more independently. There would need to be some built-in understanding of how a downstream demand of N translates to an upstream demand of M, though, and it might be a safer bet to just request 1 at a time as it is now, and I understand the stalls are the current concern and not the throughput.
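For illustration only, here is a minimal sketch of the kind of link discussed above, reusing the parse(...) method and the requestBuffer() / requestOutstanding fields already shown earlier in this thread; it is an assumption about the shape of a fix, not the actual change in #34388:

public static Flux<Token> parse(Flux<DataBuffer> buffers, byte[] boundary, int maxHeadersSize, Charset headersCharset) {
    return Flux.create(sink -> {
        MultipartParser parser = new MultipartParser(sink, boundary, maxHeadersSize, headersCharset);
        sink.onCancel(parser::onSinkCancel);
        // Re-evaluate upstream demand whenever the downstream signals demand, so a
        // request that arrives after requestBuffer() has observed
        // requestedFromDownstream() == 0 is not lost.
        sink.onRequest(n -> parser.requestBuffer());
        buffers.subscribe(parser);
    });
}

Because requestBuffer() only issues an upstream request when requestedFromDownstream() > 0 and the compareAndSet on requestOutstanding succeeds, wiring it to onRequest would keep the existing request-1-at-a-time behaviour while closing the window in which downstream demand arrives with no upstream request in flight.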
Thanks for the detailed feedback; based on those insights I am going to reopen #34388 and close this issue. I will do more tests locally before merging it.
Description
When using WebFlux to upload large files, the upload process sometimes hangs before reaching the route's business logic. Specifically, the issue occurs during the client's file upload, while org.springframework.http.codec.multipart.MultipartParser is writing to a temporary file, causing the upload to become stuck. This problem is intermittent but has a certain probability of occurring when uploading files of around 200 MB, between 2-3 GB, and between 4-5 GB.

Steps to Reproduce
The temporary file xxx.multipart is incomplete, and the request remains in a pending state without completing the upload.

Additional Resources
Kapture.2024-12-30.at.16.32.36.mp4
In the screen recording, I used an ISO file that I downloaded from the Manjaro official website: https://download.manjaro.org/gnome/24.2.1/manjaro-gnome-24.2.1-241216-linux612.iso
The file upload progress in the video is stuck at 68% and does not complete, regardless of how long you wait, with no error messages displayed.
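For reference, a minimal WebFlux handler along these lines exercises the same multipart-to-temporary-file path; the endpoint mapping, part name, and temp-file naming below are hypothetical and not taken from the original reproducer:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.springframework.http.MediaType;
import org.springframework.http.codec.multipart.FilePart;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestPart;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class UploadController {

    // Hypothetical endpoint: the incoming multipart request is parsed by
    // MultipartParser on the way in, and the file part is streamed to disk.
    @PostMapping(value = "/upload", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    public Mono<String> upload(@RequestPart("file") Mono<FilePart> file) {
        return file.flatMap(part -> {
            try {
                Path tempFile = Files.createTempFile("upload-", ".tmp");
                return part.transferTo(tempFile).thenReturn(tempFile.toString());
            }
            catch (IOException ex) {
                return Mono.error(ex);
            }
        });
    }
}

Repeatedly uploading a multi-gigabyte file to such an endpoint, for example with curl -F "file=@large.iso" http://localhost:8080/upload, matches the reproduction scenario described above.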
Expected Behavior
Large files should upload and be processed smoothly without hanging or blocking at any stage.
Actual Behavior
When uploading large files, the upload process sometimes hangs while MultipartParser is writing to the temporary file, preventing the upload from completing.

Environment Information
spring-boot-starter-webflux
Note: The issue has been tested and reproduced across all the above-mentioned systems and JDK versions.
Additional Information
This issue is not caused by business logic but by the framework’s handling of large file uploads and writing to temporary files, which leads to blocking. We hope the development team can investigate and resolve this issue to enhance the stability and reliability of large file uploads.