Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent and confusing Sink/Source read/write-methods naming #137

Closed
fzhinkin opened this issue Jun 12, 2023 · 7 comments · Fixed by #136
Closed

Inconsistent and confusing Sink/Source read/write-methods naming #137

fzhinkin opened this issue Jun 12, 2023 · 7 comments · Fixed by #136
Assignees
Milestone

Comments

@fzhinkin
Copy link
Collaborator

Sink and Source provide methods to read and write data; for some of them, the behavior could be unclear from the name itself. The naming could be improved to make what to expect from different methods more obvious.

Here is a table describing methods and their behavior:

Method Behavior
RawSource::read(sink: Buffer, bytesCount: Long): Long reads up to byteCount bytes into sink and returns the number of bytes read, or -1 if the source is exhausted
Source::read(sink: ByteArray, offset: Int, byteCount: Int): Int reads up to byteCount bytes and writes it into sink starting at offset, returns the number of bytes read, or -1 if the source is exhausted
Source::readFully(sink: RawSink, byteCount: Long): Unit reads byteCount bytes from the source and writes it into sink, throws exception if there are not enough data in the source
Source::readAll(sink: RawSink): Long Reads all data until exhaustion and writes it into sink, return the number of bytes read
Source::readFully(sink: ByteArray) Reads sink.length bytes from from the source
Source::readByteArray() Reads all data from the source into a newly allocated array
Source::readByteArray(byteCount: Long): ByteArray Reads byteCount bytes from the source into newly allocated array
RawSink::write(source: Buffer, byteCount: Long) writes byteCount bytes from source
Sink::write(source: ByteArray, offset: Int, byteCount: Int): Sink writes byteCount bytes from source starting from offset
Sink::writeAll(source: RawSource): Long reads data from source until its exhaustion and writes it to this sink, return the number of bytes written
Sink::write(source: RawSource, byteCount: Long): Sink reads byteCount bytes from source and writes into this sink, throws whatever exception source will throw during the read

There are some inconsistencies in the behavior of read/write method pairs, as well as in their naming:

  • Source::read(sink: ByteArray, offset: Int, byteCount: Int): Int is a unix-style read method that does not guarantee that the whole array's slice will be filled but instead returns how many bytes were actually read. Corresponding write-method, Sink::write(source: ByteArray, offset: Int, byteCount: Int): Sink, unlike it's unix ancestor, writes the whole slice.
  • Source::readFully(sink: RawSink, byteCount: Long): Unit contains Fully in the name, it's write counterpart, Sink::write(source: RawSource, byteCount: Long): Sink, doesn't.
  • At the same time, there are readAll write writeAll methods, which do conceptually the same thing as readFully/write, but use a different naming convention.

The behavior of Source::read(sink: ByteArray, offset: Int, byteCount: Int): Int and Sink::write(source: ByteArray, offset: Int, byteCount: Int): Sink mimics java.io.InputStream/OutputStream, so leaving names and behavior without any changes might be OK. Also, there are extension functions to fill the whole array with data.

However, other functions may have names allowing to distinguish them from each other and to provide a more obvious hint regarding expected behavior.

I looked at some other IO libraries, and here are the naming conventions adopted there:

I'm suggesting the following rename:

Old name New name
Source::read(sink: ByteArray, offset: Int, byteCount: Int): Int don't change
Source::readFully(sink: RawSink, byteCount: Long): Unit Source::readExactly(sink: RawSink, byteCount: Long): Unit
Source::readFully(sink: ByteArray) Source::readExactly(sink: ByteArray)
Source::readAll(sink: RawSink): Long Source::readAllTo(sink: RawSink): Long
Sink::write(source: ByteArray, offset: Int, byteCount: Int): Sink don't change
Sink::writeAll(source: RawSource): Long Sink::writeAllFrom(source: RawSource): Long
Sink::write(source: RawSource, byteCount: Long): Sink don't change

Write methods, except writeAll, have the same behavior so they could have the same name; readExactly better describes the operation compared to readFully; readAllTo/writeAllFrom additionally stress out the direction of an operation.

Your opinion will be welcomed if you have better ideas regarding the naming.

@fzhinkin fzhinkin added this to the 0.2.0 milestone Jun 12, 2023
@fzhinkin fzhinkin self-assigned this Jun 12, 2023
@fzhinkin fzhinkin mentioned this issue Jun 12, 2023
3 tasks
@swankjesse
Copy link

Nice improvements!

I’m a bit skeptical of readExactly. I might expect this to be like an assertion, where I’m telling the source what I expect it to contain?

val expected = "foo".toByteArray()
val source = Buffer().writeUtf8("bar")
source.readExactly(expected) // throws because it didn’t read what was expected

It doesn’t do this obviously, but I could be forgiven for expecting it.

@swankjesse
Copy link

I also like transferTo / transferFrom a lot. Okio gets it wrong here.

@fzhinkin
Copy link
Collaborator Author

I’m a bit skeptical of readExactly. I might expect this to be like an assertion, where I’m telling the source what I expect it to contain?

@swankjesse Another option is to adopt readNBytes from Java. Such a name should leave less space for wrong expectations. :)

@whyoleg
Copy link
Contributor

whyoleg commented Jun 13, 2023

Hey! Finally kotlinx-io is resurrecting!
I'm now only started to look at Okio and so kotlinx-io API (previously mostly worked with ktor-io and buffer-like IO like ByteBuffer, netty ButeBuf and panama MemorySegment), and so presenting some thoughts on how it can look like from my perspective.
IMO being a little more explicit in regard of will we fail or just return partial result is crucial for understanding IO
API, as there are a lot of different APIs which uses the same method name, but overloads do fully different things (f.e.
JDK MessageDigest::digest overloads or Cipher::doFinal overloads). So I like the direction in which this issue propose to change functions.

But, I believe that may be some more radical changes are needed. I could be wrong in some parts, as I was less involved in streaming IO comparing to working with fixed-size buffers. So here is my vision of core methods naming:

For writing:

Current New
RawSink::write(source: Buffer, byteCount: Long) RawSink::transferExactlyFrom(source: Buffer, byteCount: Long)
Sink::write(source: RawSource, byteCount: Long): Sink Sink::transferExactlyFrom(source: RawSource, byteCount: Long): Sink
nothing, optional change, just for consistency Sink::transferAvailableFrom(source: RawSource, byteCount: Long): Long
Sink::writeAll(source: RawSource): Long Sink::writeAllFrom(source: RawSource): Long
Sink::write(source: ByteArray, offset: Int = 0, byteCount: Int = source.size - offset): Sink Sink::writeExactlyFrom(source: ByteArray, offset: Int = 0, byteCount: Int = source.size - offset): Sink
nothing, optional change, just for consistency Sink::writeAvailableFrom(source: ByteArray, offset: Int = 0, byteCount: Int = source.size - offset): Int
Sink::writeUtf8(string: String, beginIndex: Int = 0, endIndex: Int = string.length): Sink no changes
Sink::writeByte(byte: Byte): Sink no changes
Sink::writeInt(int: Int): Sink no changes

For reading:

Current New
RawSource::read(sink: Buffer, byteCount: Long): Long RawSource::transferAvailableTo(sink: Buffer, byteCount: Long): Long
nothing, optional change, just for consistency Source::transferAvailableTo(sink: RawSink, byteCount: Long): Long
Source::readFully(sink: RawSink, byteCount: Long) Source::transferExactlyTo(sink: RawSink, byteCount: Long)
Source::readAll(sink: RawSink): Long Source::readAllTo(sink: RawSink): Long
Source::readFully(sink: ByteArray) Source::readExactlyTo(sink: ByteArray, offset: Int = 0, byteCount: Int = sink.size - offset)
Source::read(sink: ByteArray, offset: Int = 0, byteCount: Int = sink.size - offset): Int Source::readAvailableTo(sink: ByteArray, offset: Int = 0, byteCount: Int = sink.size - offset): Int
Source::readByteArray(): ByteArray no changes
Source::readByteArray(byteCount: Long): ByteArray no changes
Source::readUtf8(): String no changes
Source::readUtf8(byteCount: Long): String no changes
Source::readByte(): Byte no changes
Source::readInt(): Int no changes

Contract:

  • transfer* - MAY transfer PARTIAL data if it fails because of a source exhausted
  • read*/write* - SHOULD transfer NO data if they fail
  • *Available* - if there is not enough data/space, return successfully with numbers of bytes read (always < byteCount)
  • *Exactly* - if there is not enough data/space, throw exception
  • *Int/*Long/*Primitive - same contract as *Exactly*
    (so readInt/writeInt is overall readExactlyInt/writeExactlyInt)
  • *ByteArray/*Utf8 - same contract as *Exactly

Some notes:

  • may be Available naming is not good enough, though I haven't found better word :)
  • another naming of Available/Exactly is to use try as prefix: tryTransferTo/transferTo instead
    of transferAvailableTo/transferExactlyTo. In this case naming will be better aligned with *Int/*Long/*Utf8
    etc.
  • transfer* naming also relates to issue Source::readFully inconsistent with other read methods #139

Questions (maybe better to move those to separate issues):

  • While looking at API, I've always struggled: is it really needed for Sink.write* methods to return Sink?
    While this could be useful sometime, it somehow makes API inconsistent, as some methods return Unit or
    even Long/Int. In this case, chaining will be not available. In Kotlin, we have apply and other fancy ways to
    reduce duplication of code, so maybe chaining is not really necessary?
  • is Utf8 naming final? Why don't use String and later, if/when Charset will be introduced (I believe it should)
    just additional parameter can be introduced to functions

@JakeWharton
Copy link
Contributor

JakeWharton commented Jun 13, 2023

is it really needed for Sink.write* methods to return Sink?

Right, probably unnecessary in Kotlin where a scoping function can trivially make your Sink the receiver. In Okio it's a holdover from when it was written in Java where chaining was more helpful in reducing verbosity.

@fzhinkin
Copy link
Collaborator Author

write* methods will be updated not to return Sink from it, the proposal sounds good to me.

As for naming convention for read/write methods, we're gravitating towards the following:

  • use transferTo/transferFrom for methods moving all bytes from a source to a sink until exhaustion;
  • leave all other write* methods as it is right now: all these methods have the same semantics - write all bytes representing an argument to a sink, so there is no need to use any specific adjectives like fully, exactly, etc. as all these methods are "exact" writes; all such write-methods always write something passed as an argument, so it doesn't make a lot of sence to use From in its names.
  • adopt the following naming for read* methods:
    • until a name suggests the opposite, a read is an exact/fully read, so don't use additional adjectives to highlight it;
    • use additional adjectives only for non-exact reads;
    • read-methods reading at most N-bytes (like existing read methods) should be named as readAtMost; available is not a good choice here because such methods read some bytes, but it not necessarily bytes "available" in a buffer, or bytes "available" in an underlying source;
    • use read<type name> for reads returning data it reads (like readByte, readInt, readByteArray);
    • use readTo for reads where data is fed into a method's argument (like read(ByteArray)) - we can distinguish what type of data will be read from a method signature, so there is no need to explicitly mention it in the name (and it's always some sequence of bytes, so it seems narural to use the same name for different overloads here).
  • don't alter the naming to describe behavior in a failure-mode: the documentation is a better place to describe whether data will be consumed if read fails, or all bytes will remain in a buffered source.

This set of rules leads to following renaming:

Existing name Proposed name
RawSource::read(buffer: Buffer, byteCount: Long): Long RawSource::readAtMostTo(buffer: Buffer, byteCount: Long): Long
Source::read(sink: ByteArray, offset: Int, byteCount: Int): Int Source::readAtMostTo(sink: ByteArray, beginIndex: Int, endIndex: Int): Int
Source::readFully(sink: RawSink, byteCount: Long) Source::readTo(sink: RawSink, byteCount: Long)
Source::readFully(sink: ByteArray) Source::readTo(sink: ByteArray, beginIndex: Int, endIndex: Int)
Source::readByteArray(): ByteArray don't change
Source::readByteArray(count: Int): ByteArray don't change
Source::readAll(sink: RawSink): Long Source::transferTo(sink: RawSink): Long
Source::readByte/readInt/readShortLe/readLong/readSmthg don't change
RawSink::write(buffer: Buffer, byteCount: Long) don't change
Sink::write(source: ByteArray, offset: Int, byteCount: Int): Sink don't change
Sink::writeAll(source: RawSource): Long Sink::transferFrom(source: RawSource): Long
Sink::write(source: RawSource, byteCount: Long): Sink don't change
Sink::writeByte/writeInt/writeShortLe/writeLong/writeSmthg don't change

@swankjesse
Copy link

Strong thumbs up on these naming improvements.

@fzhinkin fzhinkin linked a pull request Jun 23, 2023 that will close this issue
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants