Fallback to readChunkSize if InputStream.available() not implemented #2949

Merged
merged 1 commit into from
Jun 5, 2024

idelpivnitskiy
Member

Motivation:

FromInputStreamPublisher attempts to read only 1 byte when InputStream.available() is not implemented. As a result, users of the blocking streaming API who supply the payload body as an InputStream may write and flush 1 byte at a time if their InputStream implementation always returns 0 available bytes (the default).

Modifications:

  • Treat available() as a best-effort hint to avoid blocking. If it returns 0 bytes, attempt to read up to readChunkSize bytes.

Result:

Improved efficiency of writing InputStream data for outgoing requests.
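
The effect described above can be reproduced with a minimal sketch. This is not the ServiceTalk implementation: `AvailableFallbackSketch`, `NoAvailableStream`, `chunkSizes`, and `demo` are hypothetical names introduced only for illustration. The one fact it relies on is that the default `InputStream.available()` returns 0. The sketch drains a stream that does not override `available()` and records how many bytes each read produced for a given fallback size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not taken from the ServiceTalk code base.
final class AvailableFallbackSketch {

    /** Delegating stream that keeps the default InputStream.available(), which returns 0. */
    static final class NoAvailableStream extends InputStream {
        private final InputStream delegate;

        NoAvailableStream(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return delegate.read(b, off, len);
        }
    }

    /** Drains the stream and returns the size of each chunk that was read. */
    static List<Integer> chunkSizes(InputStream stream, int fallbackSize) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        while (true) {
            int available = stream.available();
            if (available == 0) {
                // available() is only a hint: 0 may mean "unknown", so use the fallback.
                available = fallbackSize;
            }
            byte[] buffer = new byte[available];
            int n = stream.read(buffer, 0, buffer.length);
            if (n < 0) {
                return sizes; // EOF
            }
            sizes.add(n);
        }
    }

    /** Runs chunkSizes over an all-zero payload wrapped in NoAvailableStream. */
    static List<Integer> demo(int payloadSize, int fallbackSize) {
        try {
            InputStream in = new NoAvailableStream(new ByteArrayInputStream(new byte[payloadSize]));
            return chunkSizes(in, fallbackSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 10-byte payload, `demo(10, 1)` produces ten 1-byte chunks (the old fallback), while `demo(10, 4)` produces chunks of 4, 4, and 2 bytes.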

@idelpivnitskiy idelpivnitskiy self-assigned this Jun 4, 2024
@@ -171,7 +171,7 @@ private void readAndDeliver(final Subscriber<? super byte[]> subscriber) {
                 int available = stream.available();
                 if (available == 0) {
                     // Work around InputStreams that don't strictly honor the 0 == EOF contract.
-                    available = buffer != null ? buffer.length : 1;
+                    available = buffer != null ? buffer.length : readChunkSize;
Contributor

As you mentioned offline, the default value is 64kB which could be rough on GC in adversarial cases. Do you plan to change that as well in a different PR, or are we fine trying it as-is?

Member Author

Yes, I already experimented with different values with wire-logging enabled, and 64kB is not a good default.
I still need to run some benchmarks and will update the value in a separate PR.

Contributor

@daschl daschl Jun 5, 2024

As a data point, in AHC the InputStreamEntity uses OUTPUT_BUFFER_SIZE = 4096 as the chunk size for this. That said, the change dates back to 2013 and there doesn't seem to be an explanation of why 4k was chosen, so maybe we can come up with something better (larger?).

@idelpivnitskiy idelpivnitskiy merged commit 8949741 into apple:main Jun 5, 2024
15 checks passed
@idelpivnitskiy idelpivnitskiy deleted the is branch June 5, 2024 15:33
idelpivnitskiy added a commit to idelpivnitskiy/servicetalk that referenced this pull request Jun 12, 2024
Motivation:

In apple#2949 we optimized a case when `available()` is not implemented and
always returns `0`. However, we de-optimized a use-case when it's
implemented because after that change the last call to `available()`
always returns 0, but we still allocate a buffer of size `readChunkSize`
that won't be used at all.

Modifications:
- Enhance `doNotFailOnInputStreamWithBrokenAvailableCall(...)` test
before any changes to have better test coverage.
- Remove `byte[] buffer` from a class variable. It can be a local
variable because it's never reused in practice. Only the last `buffer`
won't be nullified, but we don't need it after that.
- When `available()` returns `0`, try reading a single byte and then
check availability again instead of always falling back to
`readChunkSize`.
- Adjust `doNotFailOnInputStreamWithBrokenAvailableCall()` test to
account for the 2nd call to `available()`;
- Add `singleReadTriggersMoreAvailability()` test to simulate when the
2nd call to `available()` returns positive value;

Result:

1. No allocation of a `buffer` that won't be used at the EOF.
2. Account for new availability if it appears after a `read()`.
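
The follow-up idea in this commit message (probe with a single-byte read when `available()` returns `0`, then check availability again) can be sketched as follows. This is a hedged illustration, not the merged ServiceTalk code: `SingleByteProbeSketch`, `NoAvailableStream`, `nextChunk`, and `demo` are hypothetical names, and two details are assumptions made for the sketch: chunks are capped at `readChunkSize`, and when the second `available()` call still returns `0` the read falls back to `readChunkSize`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not taken from the ServiceTalk code base.
final class SingleByteProbeSketch {

    /** Delegating stream that keeps the default InputStream.available(), which returns 0. */
    static final class NoAvailableStream extends InputStream {
        private final InputStream delegate;

        NoAvailableStream(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return delegate.read(b, off, len);
        }
    }

    /** Returns the next chunk, or null at EOF. Requires readChunkSize >= 1. */
    static byte[] nextChunk(InputStream stream, int readChunkSize) throws IOException {
        int available = stream.available();
        if (available > 0) {
            // Assumption: cap each chunk at readChunkSize.
            byte[] buffer = new byte[Math.min(available, readChunkSize)];
            int n = stream.read(buffer, 0, buffer.length);
            return n < 0 ? null : Arrays.copyOf(buffer, n);
        }
        // 0 may mean EOF or "unknown": probe with a single-byte read before allocating.
        int firstByte = stream.read();
        if (firstByte < 0) {
            return null; // EOF reached without allocating a buffer
        }
        // The read may have made more bytes available, so check again.
        available = stream.available();
        // Assumption: if availability is still unknown, fall back to readChunkSize in total.
        int rest = available > 0 ? Math.min(available, readChunkSize - 1) : readChunkSize - 1;
        byte[] buffer = new byte[1 + rest];
        buffer[0] = (byte) firstByte;
        int n = rest == 0 ? 0 : stream.read(buffer, 1, rest);
        return Arrays.copyOf(buffer, 1 + Math.max(n, 0));
    }

    /** Drains an all-zero payload and returns the size of each chunk. */
    static List<Integer> demo(int payloadSize, int readChunkSize, boolean implementsAvailable) {
        try {
            InputStream in = new ByteArrayInputStream(new byte[payloadSize]);
            if (!implementsAvailable) {
                in = new NoAvailableStream(in);
            }
            List<Integer> sizes = new ArrayList<>();
            for (byte[] chunk = nextChunk(in, readChunkSize); chunk != null;
                 chunk = nextChunk(in, readChunkSize)) {
                sizes.add(chunk.length);
            }
            return sizes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a 10-byte payload and a `readChunkSize` of 4, both `demo(10, 4, true)` and `demo(10, 4, false)` yield chunks of 4, 4, and 2 bytes, and in both cases EOF is detected by the single-byte probe rather than by allocating a full buffer that is never used.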
idelpivnitskiy added a commit that referenced this pull request Jun 13, 2024
idelpivnitskiy added a commit that referenced this pull request Jun 14, 2024