Skip to content

Conversation

lauzadis
Copy link
Contributor

  • Implements gzip compression using zlib bindings (GzipCompressor)
  • Deduplicates cross-module definitions of decompressGzipBytes
    • Required making this function @InternalApi public, which I think is a fine tradeoff

Issue #

Description of changes

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

0marperez and others added 30 commits December 16, 2024 11:30
@lauzadis lauzadis requested a review from a team as a code owner January 29, 2025 21:05
@lauzadis lauzadis added the no-changelog Indicates that a changelog entry isn't required for a pull request. Use sparingly. label Jan 29, 2025
@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

Comment on lines 102 to 119
var finished = false

while (!finished) {
val outputPin = buffer.pin()
stream.next_out = outputPin.addressOf(0).reinterpret()
stream.avail_out = BUFFER_SIZE.toUInt()

val deflateResult = deflate(stream.ptr, Z_FINISH)
if (deflateResult != Z_STREAM_END && deflateResult != Z_OK) {
throw RuntimeException("Deflate failed during finish: $deflateResult")
}

val bytesWritten = BUFFER_SIZE - stream.avail_out.toInt()
outputBuffer.addAll(buffer.take(bytesWritten))

finished = deflateResult == Z_STREAM_END
outputPin.unpin()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Can this be replaced by a call to consume(availableForRead)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly, because the data is still inside the stream, not in the internal buffer we manage. It could possibly be replaced with update(byteArrayOf()) followed by consume(availableForRead)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually an empty update won't work, because we need to specify Z_FINISH rather than Z_NO_FLUSH to ensure the entire contents are flushed out

Comment on lines +15 to +17
companion object {
internal const val BUFFER_SIZE = 16384
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why is this the right buffer size? Would we ever want to use different sizes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason, I just chose this number. I don't think it matters too much, it just controls how often we call deflate (whether the input passed to update can fit in single deflate call or not)

Comment on lines 46 to 51
// If still no data is available and the channel is closed, we've hit EOF. Close the compressor and write the remaining bytes
if (compressor.availableForRead == 0 && channel.isClosedForRead) {
val terminationBytes = compressor.close()
sink.write(terminationBytes)
return terminationBytes.size.toLong()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: Is it always true that availableForRead == 0 after compressor.update(...) means EOF? Zlib's manual seems to imply that calls to deflate might result in data being internally buffered but not necessarily producing output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it doesn't, but if the underlying channel is closed (second half of the condition), then it's safe to flush the compressor (preparing all the internally-buffered data) and write the remaining bytes

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.


internal val isClosed
get() = _isClosed.value
private val outputBuffer = SdkByteChannel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do we have to use a channel instead of a source/sink here? This makes all of the data methods suspend but I cannot see why they need to be, conceptually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use an SdkSource or an SdkSink because those only support either read or write but not both. I'll try SdkBuffer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to use SdkBuffer

Comment on lines 45 to 47
runBlocking {
GzipSdkSource(bytes.source()).readToByteArray().asByteStream()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why do we need to invoke a suspend method here? We're pulling this all into memory and blocking until it's done so it seems like we can avoid runBlocking/suspend altogether...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to #1228 (comment)

@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@github-actions
Copy link

github-actions bot commented Feb 3, 2025

Affected Artifacts

Changed in size
Artifact Pull Request (bytes) Latest Release (bytes) Delta (bytes) Delta (percentage)
http-test-jvm.jar 61,604 58,690 2,914 4.97%
aws-signing-default-jvm.jar 53,497 51,946 1,551 2.99%
crt-util-jvm.jar 21,451 20,952 499 2.38%
runtime-core-jvm.jar 835,835 818,814 17,021 2.08%
aws-signing-tests-jvm.jar 456,687 456,627 60 0.01%
test-suite-jvm.jar 96,930 97,206 -276 -0.28%

@lauzadis lauzadis merged commit 99d1214 into kn-main Feb 3, 2025
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-changelog Indicates that a changelog entry isn't required for a pull request. Use sparingly.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants