[Proxy] Limit replay buffer size in AdminProxyHandler #10944
Thanks for this. Saw the issue but didn't get a chance to dig deeper with Summit :)
I wonder if we need at least a tweak or a different approach. If there is any place where a request with an arbitrary body can be redirected, I would be nervous that it would hit this limit and then send a truncated body on redirect.
I know that the upcoming https://github.com/apache/pulsar/wiki/PIP-64%3A-Introduce-REST-endpoints-for-producing%2C-consuming-and-reading-messages would likely be such a case, as we use redirects to get to the right broker, and a user sending a body larger than 1 MB would be an issue.
Partial requests are a total nightmare, so I don't know if there is any easy way to avoid sending the entire body before getting the redirect.
I think that means our options are:
- make it tunable, and the user needs to be aware of it if they are using HTTP endpoints for producing/consuming. Perhaps just setting the default to the 5 MB max message size would cover 90% of use cases? I think this is the minimum
- do something smarter and use an off-heap buffer or some smarter allocation strategy
- make the proxy smarter: if a request has a POST body and may be redirected, we do a lookup first and then always hit the correct broker. This might be a longer-term effort
I would say the second strategy is likely the best bet, but we could always start with the first one and improve the buffer later
@MarvinCai can you correct me if I am wrong about the rest endpoints using redirects?
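As a rough illustration of the first option (a tunable cap on the replay buffer), here is a minimal sketch. It is not the `AdminProxyHandler` code from this PR: the class `BoundedReplayInputStream`, its `maxReplayBytes` parameter, and the `canReplayAfterReading` helper are hypothetical names chosen for the example. The idea is to copy bytes into a replay buffer while they are read, and to drop the buffer entirely once the cap is exceeded, so a redirect either replays the full body or refuses to replay instead of sending a truncated one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Optional;

// Hypothetical sketch: NOT the AdminProxyHandler code from this PR.
class BoundedReplayInputStream extends FilterInputStream {
    // Illustrative stand-in for a configured replay buffer limit, in bytes.
    private final int maxReplayBytes;
    private ByteArrayOutputStream replay = new ByteArrayOutputStream();
    private boolean limitExceeded;

    BoundedReplayInputStream(InputStream in, int maxReplayBytes) {
        super(in);
        this.maxReplayBytes = maxReplayBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            record(new byte[] {(byte) b}, 0, 1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            record(buf, off, n);
        }
        return n;
    }

    @Override
    public boolean markSupported() {
        // Rewinding the source would record the same bytes twice, so advertise no support.
        return false;
    }

    // Copies the bytes just read into the replay buffer, unless the cap would be exceeded.
    private void record(byte[] buf, int off, int len) {
        if (limitExceeded) {
            return;
        }
        if (replay.size() + len > maxReplayBytes) {
            limitExceeded = true;
            replay = null; // release the memory; a partial body must never be replayed
            return;
        }
        replay.write(buf, off, len);
    }

    // Empty when the body was larger than the cap, so the caller cannot replay it.
    Optional<InputStream> replayStream() {
        if (limitExceeded) {
            return Optional.empty();
        }
        return Optional.of(new ByteArrayInputStream(replay.toByteArray()));
    }

    // Demo helper: drains a body through the wrapper and reports whether it can be replayed.
    static boolean canReplayAfterReading(byte[] body, int maxReplayBytes) {
        try (BoundedReplayInputStream in =
                new BoundedReplayInputStream(new ByteArrayInputStream(body), maxReplayBytes)) {
            byte[] scratch = new byte[4];
            while (in.read(scratch) != -1) {
                // discard; only the recording side effect matters here
            }
            return in.replayStream().isPresent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

What the caller does when `replayStream()` is empty (for example, failing the redirected request) is left out of the sketch; that is exactly the case the truncated-body concern above is about.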
Thanks for the great suggestions @addisonj .
Yes a different approach would be useful. It seems just wrong to buffer all request bodies in memory for the duration of the request handling.
Makes sense. I could add a configuration option. I thought it wouldn't be so common to have HTTP requests with large bodies and that it would be mainly the Pulsar Function jar uploads that could exceed 1MB. I guess with new APIs such as PIP-64, request bodies could exceed 1MB when sending large messages.
Allocating on heap isn't the biggest problem. The problem is the usage of ByteArrayOutputStream, which stores everything in a single byte array.
I think this would be a good solution.
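To make the point about `ByteArrayOutputStream` concrete: it keeps all data in one backing array and, when that array fills up, allocates a larger one and copies the contents over. One possible alternative, sketched below under the assumption that fixed-size chunks are acceptable, is to keep a list of small arrays so that no single large contiguous allocation is ever needed. `ChunkedBuffer` and its methods are hypothetical names for this example and are not part of Pulsar or of this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a chunked buffer; not code from Pulsar or from this PR.
class ChunkedBuffer {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private int usedInLast; // bytes already written into the last chunk
    private long size;      // total bytes written

    ChunkedBuffer(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.usedInLast = chunkSize; // forces a chunk allocation on the first write
    }

    // Appends len bytes from src, allocating a new fixed-size chunk whenever the last one is full.
    void write(byte[] src, int off, int len) {
        while (len > 0) {
            if (usedInLast == chunkSize) {
                chunks.add(new byte[chunkSize]);
                usedInLast = 0;
            }
            int n = Math.min(len, chunkSize - usedInLast);
            System.arraycopy(src, off, chunks.get(chunks.size() - 1), usedInLast, n);
            usedInLast += n;
            off += n;
            len -= n;
            size += n;
        }
    }

    long size() {
        return size;
    }

    int chunkCount() {
        return chunks.size();
    }

    // Random access across chunks, mainly to show the data stays addressable.
    byte byteAt(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return chunks.get((int) (index / chunkSize))[(int) (index % chunkSize)];
    }
}
```

Existing data is never copied when the buffer grows, and each allocation is at most `chunkSize` bytes. The total memory held is still proportional to the body size, so a cap on the buffered size remains useful either way.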
2cfdab1 to b52229e
@addisonj I added |
I am +1. Once again, I would see this more as a temporary solution; the only downside is that we may eventually have a deprecated config parameter, but it is sufficiently narrow in scope that it doesn't seem too unreasonable to me
I would like to get @merlimat's opinion to see if he has any concerns
We chatted a bit about this at the community meeting. Let's merge this; we may consider improvements later, as there are broader questions about making the proxy "smarter"
Fixes apache#10908

### Motivation

Pulsar Proxy uses a lot of heap memory when uploading large function jar files. This also leads to high GC activity since a contiguous block of memory (a byte array the size of the upload) is allocated. GC will have to compact the heap (which gets fragmented) to find a contiguous block of memory. This is the reason why allocating large arrays is costly from a GC perspective.

The buffering solution was added as part of apache#5361. It also buffers very large uploads in memory.

### Modifications

* Limit the replay buffer size to a configurable limit which defaults to 5MB. This is configured with the `httpInputMaxReplayBufferSize` proxy configuration parameter.
* Add unit test to see that buffer size gets limited
* Add unit test for apache#5361

(cherry picked from commit 2324618)
(cherry picked from commit 7fa88cc)
Fixes #10908
Motivation
Pulsar Proxy uses a lot of heap memory when uploading large function jar files. This also leads to high GC activity since a contiguous block of memory (a byte array the size of the upload) is allocated. GC will have to compact the heap (which gets fragmented) to find a contiguous block of memory. This is the reason why allocating large arrays is costly from a GC perspective.
The buffering solution was added as part of #5361. It also buffers very large uploads in memory.
Modifications
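* Limit the replay buffer size to a configurable limit which defaults to 5MB. This is configured with the `httpInputMaxReplayBufferSize` proxy configuration parameter.
* Add unit test to see that buffer size gets limited
* Add unit test for #5361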