Fallback to `readChunkSize` if `InputStream.available()` not implemented #2949

idelpivnitskiy · 2024-06-04T16:35:38Z

Motivation:

`FromInputStreamPublisher` attempts to read only 1 byte at a time when `InputStream.available()` is not implemented. As a result, users of the blocking streaming API who pass the payload body as an `InputStream` may write and flush 1 byte at a time if their `InputStream` implementation always returns 0 from `available()` (the default).

Modifications:

- Treat `available()` as a best-effort hint for avoiding blocking. If it returns 0, attempt to read up to `readChunkSize` bytes.

Result:

Improved efficiency of writing `InputStream` data for outgoing requests.
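To illustrate the effect described above, here is a minimal sketch. It is not the actual `FromInputStreamPublisher` code; the class `ChunkSizeSketch` and the helpers `withoutAvailable` and `chunkSizes` are hypothetical names introduced only for this example. It drains a stream whose `available()` always returns 0 and records how many chunks are produced for a given fallback read size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ChunkSizeSketch {

    /** Hypothetical helper: a stream that keeps the default available(), which always returns 0. */
    static InputStream withoutAvailable(byte[] data) {
        final ByteArrayInputStream delegate = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() {
                return delegate.read();
            }

            @Override
            public int read(byte[] b, int off, int len) {
                return delegate.read(b, off, len);
            }
        };
    }

    /**
     * Drains the stream and returns the size of every chunk read.
     * fallbackSize (assumed >= 1) is the number of bytes requested when available() reports 0.
     */
    static List<Integer> chunkSizes(InputStream stream, int fallbackSize) {
        List<Integer> sizes = new ArrayList<>();
        try {
            for (;;) {
                int available = stream.available();
                if (available == 0) {
                    available = fallbackSize;
                }
                byte[] buffer = new byte[available];
                int n = stream.read(buffer, 0, buffer.length);
                if (n < 0) {
                    return sizes; // EOF
                }
                sizes.add(n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[10_000];
        // Old behavior: fall back to 1 byte per read.
        System.out.println(chunkSizes(withoutAvailable(payload), 1).size());
        // New behavior: fall back to a larger chunk size (4096 is an arbitrary value for this sketch).
        System.out.println(chunkSizes(withoutAvailable(payload), 4096).size());
    }
}
```

With a 1-byte fallback every byte becomes its own chunk, and in the blocking streaming API each chunk can translate into a separate write and flush.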

bryce-anderson · 2024-06-04T22:27:38Z

...alk-concurrent-api/src/main/java/io/servicetalk/concurrent/api/FromInputStreamPublisher.java

@@ -171,7 +171,7 @@ private void readAndDeliver(final Subscriber<? super byte[]> subscriber) {
                    int available = stream.available();
                    if (available == 0) {
                        // Work around InputStreams that don't strictly honor the 0 == EOF contract.
-                        available = buffer != null ? buffer.length : 1;
+                        available = buffer != null ? buffer.length : readChunkSize;


As you mentioned offline, the default value is 64kB which could be rough on GC in adversarial cases. Do you plan to change that as well in a different PR, or are we fine trying it as-is?

idelpivnitskiy · 2024-06-05

Yes, I already experimented with different values with wire-logging enabled; 64KB is not a good default.
I still need to run some benchmarks and will update the value in a separate PR.

daschl · 2024-06-05

As a data point, in AHC the `InputStreamEntity` uses `OUTPUT_BUFFER_SIZE = 4096` as the chunk size for this. That said, the change dates back to 2013 and there doesn't seem to be an explanation of why 4k was chosen, so maybe we can come up with something better (larger?).

Motivation:

In #2949 we optimized a case when `available()` is not implemented and always returns `0`. However, we de-optimized a use-case when it's implemented because the last call to `available()` always returns 0, but we still allocate a buffer of size `readChunkSize` that won't be used.

Modifications:

- Enhance `doNotFailOnInputStreamWithBrokenAvailableCall(...)` test before any changes for better test coverage.
- Remove `byte[] buffer` from a class variable. It can be a local variable because it's never reused in practice. Only the last `buffer` won't be nullified, but we don't need it after that.
- When `available()` returns `0`, try reading a single byte and then recheck availability instead of always falling back to `readChunkSize`.
- Adjust `doNotFailOnInputStreamWithBrokenAvailableCall()` test to account for the 2nd call to `available()`.
- Add `singleReadTriggersMoreAvailability()` test to simulate when the 2nd call to `available()` returns a positive value.

Result:

1. No allocation of a `buffer` that won't be used at the EOF.
2. Account for new availability if it appears after a `read()`.
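The probe-then-recheck idea from the follow-up can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: `ProbeReadSketch`, `nextChunk`, `fill`, `chunkSizes`, and `withoutAvailable` are hypothetical names, and falling back to `readChunkSize` when the second `available()` call still returns 0 is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ProbeReadSketch {

    /** Hypothetical helper: a stream that keeps the default available(), which always returns 0. */
    static InputStream withoutAvailable(byte[] data) {
        final ByteArrayInputStream delegate = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() {
                return delegate.read();
            }

            @Override
            public int read(byte[] b, int off, int len) {
                return delegate.read(b, off, len);
            }
        };
    }

    /** Returns the next chunk, or null at EOF. Assumes readChunkSize >= 1. */
    static byte[] nextChunk(InputStream stream, int readChunkSize) {
        try {
            int available = stream.available();
            if (available > 0) {
                return fill(stream, new byte[Math.min(available, readChunkSize)], 0);
            }
            // 0 may mean EOF or "unknown": probe with a single-byte read before allocating.
            int firstByte = stream.read();
            if (firstByte < 0) {
                return null; // EOF: no buffer was allocated
            }
            // The single read may have made more bytes available, so ask again.
            available = stream.available();
            // Assumption of this sketch: if available() still reports 0, use readChunkSize.
            int size = available > 0 ? Math.min(available, readChunkSize - 1) + 1 : readChunkSize;
            byte[] buffer = new byte[size];
            buffer[0] = (byte) firstByte;
            return fill(stream, buffer, 1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Performs one bulk read into buffer at offset and trims the result to the bytes actually present. */
    private static byte[] fill(InputStream stream, byte[] buffer, int offset) throws IOException {
        int n = offset < buffer.length ? stream.read(buffer, offset, buffer.length - offset) : 0;
        int filled = offset + Math.max(n, 0);
        if (filled == 0) {
            return null; // unexpected EOF although available() was positive
        }
        return filled == buffer.length ? buffer : Arrays.copyOf(buffer, filled);
    }

    /** Drains the stream and returns the size of every chunk. */
    static List<Integer> chunkSizes(InputStream stream, int readChunkSize) {
        List<Integer> sizes = new ArrayList<>();
        for (byte[] chunk = nextChunk(stream, readChunkSize); chunk != null;
             chunk = nextChunk(stream, readChunkSize)) {
            sizes.add(chunk.length);
        }
        return sizes;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[10];
        System.out.println(chunkSizes(new ByteArrayInputStream(payload), 4));
        System.out.println(chunkSizes(withoutAvailable(payload), 4));
    }
}
```

In this sketch the EOF check happens through the single-byte probe, so a stream with a working `available()` never triggers a `readChunkSize` allocation just to discover that nothing is left.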

idelpivnitskiy requested review from daschl and bryce-anderson June 4, 2024 16:35

idelpivnitskiy self-assigned this Jun 4, 2024

idelpivnitskiy force-pushed the `is` branch from 4a180aa to a132f91 June 4, 2024 18:03

bryce-anderson approved these changes Jun 4, 2024

daschl approved these changes Jun 5, 2024

idelpivnitskiy merged commit 8949741 into apple:main Jun 5, 2024
15 checks passed

idelpivnitskiy deleted the `is` branch June 5, 2024 15:33

idelpivnitskiy mentioned this pull request Jun 12, 2024

FromInputStreamPublisher: avoid extra allocation of a buffer #2965

Merged

idelpivnitskiy Jun 5, 2024

daschl Jun 5, 2024 • edited
