Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ByteString implementation #148

Merged
merged 32 commits into from
Jun 30, 2023
Merged

ByteString implementation #148

merged 32 commits into from
Jun 30, 2023

Conversation

fzhinkin
Copy link
Collaborator

@fzhinkin fzhinkin commented Jun 23, 2023

Implemented ByteString - an immutable wrapper around byte sequence and added base support to the core IO library.

Missing stuff:

  • Module documentation

Closes #133

@fzhinkin fzhinkin added this to the 0.2.0 milestone Jun 23, 2023
@fzhinkin fzhinkin force-pushed the prototype-preview-byte-strings branch 2 times, most recently from eafb2dc to 6a535f8 Compare June 23, 2023 13:57
@fzhinkin fzhinkin requested a review from shanshin June 23, 2023 14:20
@fzhinkin fzhinkin self-assigned this Jun 23, 2023
@fzhinkin fzhinkin linked an issue Jun 23, 2023 that may be closed by this pull request
@fzhinkin fzhinkin force-pushed the prototype-preview-byte-strings branch 2 times, most recently from 08c8f2f to 77f27a8 Compare June 23, 2023 14:34
@fzhinkin fzhinkin marked this pull request as ready for review June 26, 2023 12:44
Copy link
Contributor

@whyoleg whyoleg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor questions, mainly regarding API parity with stdlib String/ByteArray

bytestring/build.gradle.kts Outdated Show resolved Hide resolved
bytestring/src/commonMain/kotlin/Annotations.kt Outdated Show resolved Hide resolved
bytestring/src/commonMain/kotlin/ByteString.kt Outdated Show resolved Hide resolved
bytestring/src/commonMain/kotlin/ByteStringBuilder.kt Outdated Show resolved Hide resolved
bytestring/src/commonMain/kotlin/ByteStringBuilder.kt Outdated Show resolved Hide resolved
bytestring/src/commonMain/kotlin/ByteStringBuilder.kt Outdated Show resolved Hide resolved
bytestring/src/jvmMain/kotlin/ByteStringJvmExt.kt Outdated Show resolved Hide resolved
core/common/src/ByteStringExt.kt Show resolved Hide resolved
@fzhinkin fzhinkin force-pushed the prototype-preview-byte-strings branch from 4684c07 to 43a9e76 Compare June 26, 2023 15:27
bytestring/common/src/ByteString.kt Outdated Show resolved Hide resolved
bytestring/common/src/ByteStringBuilder.kt Outdated Show resolved Hide resolved
bytestring/common/src/unsafe/UnsafeByteStringOperations.kt Outdated Show resolved Hide resolved
bytestring/jvm/src/ByteStringJvmExt.kt Outdated Show resolved Hide resolved
Base automatically changed from prototype-preview-trimmed-down-api to prototype-preview June 27, 2023 10:39
@fzhinkin fzhinkin force-pushed the prototype-preview-byte-strings branch from 9b44bd9 to 30e76b0 Compare June 27, 2023 11:52
* @param string the string to be encoded.
*/
public fun ByteString.Companion.fromUtf8String(string: String): ByteString {
return wrap(string.encodeToByteArray())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encodeToByteArray behaves differently on JVM and Native when encoding invalid code points.

It may be fine, but primitives from the code module behave similarly on all targets (and the behavior is the same as encodeToByteArray's on JVM).

So we have to either hoist UTF-8 support from the code module into a separate module and reuse it here or change the behavior in the core module to match stdlib's behavior.

Either way, I'd rather postpone a fix and discuss it separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, does it make sense to create an YouTrack issue on Kotlin side mentioning this inconsistency? Specifically in scope of ongoing efforts on stabilising K/N runtime/stdlib APIs and behaviour?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: K/JS behaviour of encodeToByteArray is the same as in K/N (not sure about K/WASM). In that case only K/JVM behaviour is differs from other platforms.
And now in kotlinx-io we adopting K/JVM behaviour.
While it could be fine, it's a little inconsistent

@fzhinkin fzhinkin force-pushed the prototype-preview-byte-strings branch from 30e76b0 to 7702cbd Compare June 27, 2023 15:13
bytestring/common/src/ByteStringBuilder.kt Outdated Show resolved Hide resolved
bytestring/common/test/ByteStringBuilderTest.kt Outdated Show resolved Hide resolved
core/common/src/ByteStringExt.kt Show resolved Hide resolved
core/common/src/ByteStringExt.kt Outdated Show resolved Hide resolved
core/common/test/AbstractSourceTest.kt Outdated Show resolved Hide resolved
bytestring/jvm/test/ByteStringJvmTest.kt Show resolved Hide resolved
}

/**
* Compares a byte sequence wrapped by this byte string to a byte sequence wrapped by [other]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is worth adding either a more detailed description or examples of exactly how the comparison works.

If I am not mistaken, the lexicographic order does not strictly define the relation of values such as (1, 2, 3) ? (1, 3)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll mention that the behavior is similar to String::compareTo and then add a sample comparing different byte strings in a separate PR (for #134)

bytestring/common/src/ByteString.kt Outdated Show resolved Hide resolved
bytestring/common/src/ByteString.kt Outdated Show resolved Hide resolved
@fzhinkin fzhinkin force-pushed the prototype-preview-byte-strings branch from 7702cbd to 5e0fb14 Compare June 28, 2023 09:08
@fzhinkin fzhinkin requested a review from qwwdfsad June 28, 2023 09:39
@fzhinkin fzhinkin changed the base branch from prototype-preview to develop June 28, 2023 15:29
@qwwdfsad qwwdfsad requested a review from shanshin June 28, 2023 16:30

import kotlinx.io.bytestring.ByteString
import kotlinx.io.bytestring.buildByteString

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed it during the original review.
We typically prefer extensions for class Foo either placed in the same file (-> classFooKt) for small and compact additions, or in plural form of the entity -- Foos (Channels.kt, Strings.kt, Serializers.kt). Bit nicer stacktraces, a bit more consistent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also use @file:JvmMultifileClass for files spread across platforms, but maybe it's a bit too redundant

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing it!


perPackageOption {
suppress.set(true)
matchingRegex.set(".*unsafe.*")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's worth excluding it from the official documentation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, it looks like a Forbidden fruit - we're asking not to use it by any means and then making it visible to everyone.

bytestring/common/src/ByteString.kt Show resolved Hide resolved
bytestring/common/src/ByteString.kt Outdated Show resolved Hide resolved
bytestring/common/src/ByteString.kt Outdated Show resolved Hide resolved
bytestring/common/src/ByteString.kt Show resolved Hide resolved
bytestring/common/src/unsafe/Annotations.kt Outdated Show resolved Hide resolved
bytestring/common/src/unsafe/UnsafeByteStringOperations.kt Outdated Show resolved Hide resolved
@fzhinkin fzhinkin requested a review from qwwdfsad June 28, 2023 18:14
fzhinkin added 3 commits June 28, 2023 21:39
Currently, we use ByteString.decodeToString and String.encodeToByteString as names for conversion methods between String and ByteString, where non-parameterized functions convert to/from UTF-8, and JVM-specific extensions accept Charset.
At the same time, the core module uses different naming for methods reading/writing UTF-8 string and methods reading/writing strings using specific Charset:
- Source.readUtf8/Sink.writeUtf8 to work with UTF-8 strings
- Source.readString/Sink.writeString to work with strings using the given Charset on JVM.

The naming is inconsistent and it seems reasonable to unify read/write methods naming with what we have for ByteString.
We can use readString/writeString w/o charset for UTF-8 strings (as these are treated as default in the library) and same-titled JVM-specific extensions accepting a Charset.
@fzhinkin fzhinkin merged commit 19db7f2 into develop Jun 30, 2023
@fzhinkin fzhinkin deleted the prototype-preview-byte-strings branch June 30, 2023 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement ByteStrings
4 participants