Inconsistent and confusing Sink/Source read/write-methods naming #137

fzhinkin · 2023-06-12T13:44:09Z

Sink and Source provide methods to read and write data; for some of them, the behavior could be unclear from the name itself. The naming could be improved to make what to expect from different methods more obvious.

Here is a table describing methods and their behavior:

Method	Behavior
`RawSource::read(sink: Buffer, bytesCount: Long): Long`	reads up to `byteCount` bytes into `sink` and returns the number of bytes read, or `-1` if the source is exhausted
`Source::read(sink: ByteArray, offset: Int, byteCount: Int): Int`	reads up to `byteCount` bytes and writes it into `sink` starting at `offset`, returns the number of bytes read, or `-1` if the source is exhausted
`Source::readFully(sink: RawSink, byteCount: Long): Unit`	reads `byteCount` bytes from the source and writes it into `sink`, throws exception if there are not enough data in the source
`Source::readAll(sink: RawSink): Long`	Reads all data until exhaustion and writes it into `sink`, return the number of bytes read
`Source::readFully(sink: ByteArray)`	Reads `sink.length` bytes from from the source
`Source::readByteArray()`	Reads all data from the source into a newly allocated array
`Source::readByteArray(byteCount: Long): ByteArray`	Reads `byteCount` bytes from the source into newly allocated array
`RawSink::write(source: Buffer, byteCount: Long)`	writes `byteCount` bytes from `source`
`Sink::write(source: ByteArray, offset: Int, byteCount: Int): Sink`	writes `byteCount` bytes from `source` starting from `offset`
`Sink::writeAll(source: RawSource): Long`	reads data from `source` until its exhaustion and writes it to this sink, return the number of bytes written
`Sink::write(source: RawSource, byteCount: Long): Sink`	reads `byteCount` bytes from `source` and writes into this sink, throws whatever exception `source` will throw during the read

There are some inconsistencies in the behavior of read/write method pairs, as well as in their naming:

Source::read(sink: ByteArray, offset: Int, byteCount: Int): Int is a unix-style read method that does not guarantee that the whole array's slice will be filled but instead returns how many bytes were actually read. Corresponding write-method, Sink::write(source: ByteArray, offset: Int, byteCount: Int): Sink, unlike it's unix ancestor, writes the whole slice.
Source::readFully(sink: RawSink, byteCount: Long): Unit contains Fully in the name, it's write counterpart, Sink::write(source: RawSource, byteCount: Long): Sink, doesn't.
At the same time, there are readAll write writeAll methods, which do conceptually the same thing as readFully/write, but use a different naming convention.

The behavior of Source::read(sink: ByteArray, offset: Int, byteCount: Int): Int and Sink::write(source: ByteArray, offset: Int, byteCount: Int): Sink mimics java.io.InputStream/OutputStream, so leaving names and behavior without any changes might be OK. Also, there are extension functions to fill the whole array with data.

However, other functions may have names allowing to distinguish them from each other and to provide a more obvious hint regarding expected behavior.

I looked at some other IO libraries, and here are the naming conventions adopted there:

to read exactly N bytes of data: java.io.InputStream::readNBytes, System.IO.Stream::readExactly, std::io::Read::read_exact
to copy all data from a source to a sink: java.io.InputStream::transferTo, System.IO.Stream::CopyTo

I'm suggesting the following rename:

Old name	New name
`Source::read(sink: ByteArray, offset: Int, byteCount: Int): Int`	don't change
`Source::readFully(sink: RawSink, byteCount: Long): Unit`	`Source::readExactly(sink: RawSink, byteCount: Long): Unit`
`Source::readFully(sink: ByteArray)`	`Source::readExactly(sink: ByteArray)`
`Source::readAll(sink: RawSink): Long`	`Source::readAllTo(sink: RawSink): Long`
`Sink::write(source: ByteArray, offset: Int, byteCount: Int): Sink`	don't change
`Sink::writeAll(source: RawSource): Long`	`Sink::writeAllFrom(source: RawSource): Long`
`Sink::write(source: RawSource, byteCount: Long): Sink`	don't change

Write methods, except writeAll, have the same behavior so they could have the same name; readExactly better describes the operation compared to readFully; readAllTo/writeAllFrom additionally stress out the direction of an operation.

Your opinion will be welcomed if you have better ideas regarding the naming.

The text was updated successfully, but these errors were encountered:

swankjesse · 2023-06-13T11:05:43Z

Nice improvements!

I’m a bit skeptical of readExactly. I might expect this to be like an assertion, where I’m telling the source what I expect it to contain?

val expected = "foo".toByteArray()
val source = Buffer().writeUtf8("bar")
source.readExactly(expected) // throws because it didn’t read what was expected

It doesn’t do this obviously, but I could be forgiven for expecting it.

swankjesse · 2023-06-13T11:07:26Z

I also like transferTo / transferFrom a lot. Okio gets it wrong here.

fzhinkin · 2023-06-13T11:12:05Z

I’m a bit skeptical of readExactly. I might expect this to be like an assertion, where I’m telling the source what I expect it to contain?

@swankjesse Another option is to adopt readNBytes from Java. Such a name should leave less space for wrong expectations. :)

whyoleg · 2023-06-13T14:25:53Z

Hey! Finally kotlinx-io is resurrecting!
I'm now only started to look at Okio and so kotlinx-io API (previously mostly worked with ktor-io and buffer-like IO like ByteBuffer, netty ButeBuf and panama MemorySegment), and so presenting some thoughts on how it can look like from my perspective.
IMO being a little more explicit in regard of will we fail or just return partial result is crucial for understanding IO
API, as there are a lot of different APIs which uses the same method name, but overloads do fully different things (f.e.
JDK MessageDigest::digest overloads or Cipher::doFinal overloads). So I like the direction in which this issue propose to change functions.

But, I believe that may be some more radical changes are needed. I could be wrong in some parts, as I was less involved in streaming IO comparing to working with fixed-size buffers. So here is my vision of core methods naming:

For writing:

Current	New
RawSink::write(source: Buffer, byteCount: Long)	RawSink::transferExactlyFrom(source: Buffer, byteCount: Long)
Sink::write(source: RawSource, byteCount: Long): Sink	Sink::transferExactlyFrom(source: RawSource, byteCount: Long): Sink
nothing, optional change, just for consistency	Sink::transferAvailableFrom(source: RawSource, byteCount: Long): Long
Sink::writeAll(source: RawSource): Long	Sink::writeAllFrom(source: RawSource): Long
Sink::write(source: ByteArray, offset: Int = 0, byteCount: Int = source.size - offset): Sink	Sink::writeExactlyFrom(source: ByteArray, offset: Int = 0, byteCount: Int = source.size - offset): Sink
nothing, optional change, just for consistency	Sink::writeAvailableFrom(source: ByteArray, offset: Int = 0, byteCount: Int = source.size - offset): Int
Sink::writeUtf8(string: String, beginIndex: Int = 0, endIndex: Int = string.length): Sink	no changes
Sink::writeByte(byte: Byte): Sink	no changes
Sink::writeInt(int: Int): Sink	no changes

For reading:

Current	New
RawSource::read(sink: Buffer, byteCount: Long): Long	RawSource::transferAvailableTo(sink: Buffer, byteCount: Long): Long
nothing, optional change, just for consistency	Source::transferAvailableTo(sink: RawSink, byteCount: Long): Long
Source::readFully(sink: RawSink, byteCount: Long)	Source::transferExactlyTo(sink: RawSink, byteCount: Long)
Source::readAll(sink: RawSink): Long	Source::readAllTo(sink: RawSink): Long
Source::readFully(sink: ByteArray)	Source::readExactlyTo(sink: ByteArray, offset: Int = 0, byteCount: Int = sink.size - offset)
Source::read(sink: ByteArray, offset: Int = 0, byteCount: Int = sink.size - offset): Int	Source::readAvailableTo(sink: ByteArray, offset: Int = 0, byteCount: Int = sink.size - offset): Int
Source::readByteArray(): ByteArray	no changes
Source::readByteArray(byteCount: Long): ByteArray	no changes
Source::readUtf8(): String	no changes
Source::readUtf8(byteCount: Long): String	no changes
Source::readByte(): Byte	no changes
Source::readInt(): Int	no changes

Contract:

transfer* - MAY transfer PARTIAL data if it fails because of a source exhausted
read*/write* - SHOULD transfer NO data if they fail
*Available* - if there is not enough data/space, return successfully with numbers of bytes read (always < byteCount)
*Exactly* - if there is not enough data/space, throw exception
*Int/*Long/*Primitive - same contract as *Exactly*
(so readInt/writeInt is overall readExactlyInt/writeExactlyInt)
*ByteArray/*Utf8 - same contract as *Exactly

Some notes:

may be Available naming is not good enough, though I haven't found better word :)
another naming of Available/Exactly is to use try as prefix: tryTransferTo/transferTo instead
of transferAvailableTo/transferExactlyTo. In this case naming will be better aligned with *Int/*Long/*Utf8
etc.
transfer* naming also relates to issue Source::readFully inconsistent with other read methods #139

Questions (maybe better to move those to separate issues):

While looking at API, I've always struggled: is it really needed for Sink.write* methods to return Sink?
While this could be useful sometime, it somehow makes API inconsistent, as some methods return Unit or
even Long/Int. In this case, chaining will be not available. In Kotlin, we have apply and other fancy ways to
reduce duplication of code, so maybe chaining is not really necessary?
is Utf8 naming final? Why don't use String and later, if/when Charset will be introduced (I believe it should)
just additional parameter can be introduced to functions

JakeWharton · 2023-06-13T14:44:41Z

is it really needed for Sink.write* methods to return Sink?

Right, probably unnecessary in Kotlin where a scoping function can trivially make your Sink the receiver. In Okio it's a holdover from when it was written in Java where chaining was more helpful in reducing verbosity.

fzhinkin · 2023-06-16T08:38:18Z

write* methods will be updated not to return Sink from it, the proposal sounds good to me.

As for naming convention for read/write methods, we're gravitating towards the following:

use transferTo/transferFrom for methods moving all bytes from a source to a sink until exhaustion;
leave all other write* methods as it is right now: all these methods have the same semantics - write all bytes representing an argument to a sink, so there is no need to use any specific adjectives like fully, exactly, etc. as all these methods are "exact" writes; all such write-methods always write something passed as an argument, so it doesn't make a lot of sence to use From in its names.
adopt the following naming for read* methods:
- until a name suggests the opposite, a read is an exact/fully read, so don't use additional adjectives to highlight it;
- use additional adjectives only for non-exact reads;
- read-methods reading at most N-bytes (like existing read methods) should be named as readAtMost; available is not a good choice here because such methods read some bytes, but it not necessarily bytes "available" in a buffer, or bytes "available" in an underlying source;
- use read<type name> for reads returning data it reads (like readByte, readInt, readByteArray);
- use readTo for reads where data is fed into a method's argument (like read(ByteArray)) - we can distinguish what type of data will be read from a method signature, so there is no need to explicitly mention it in the name (and it's always some sequence of bytes, so it seems narural to use the same name for different overloads here).
don't alter the naming to describe behavior in a failure-mode: the documentation is a better place to describe whether data will be consumed if read fails, or all bytes will remain in a buffered source.

This set of rules leads to following renaming:

Existing name	Proposed name
`RawSource::read(buffer: Buffer, byteCount: Long): Long`	`RawSource::readAtMostTo(buffer: Buffer, byteCount: Long): Long`
`Source::read(sink: ByteArray, offset: Int, byteCount: Int): Int`	`Source::readAtMostTo(sink: ByteArray, beginIndex: Int, endIndex: Int): Int`
`Source::readFully(sink: RawSink, byteCount: Long)`	`Source::readTo(sink: RawSink, byteCount: Long)`
`Source::readFully(sink: ByteArray)`	`Source::readTo(sink: ByteArray, beginIndex: Int, endIndex: Int)`
`Source::readByteArray(): ByteArray`	don't change
`Source::readByteArray(count: Int): ByteArray`	don't change
`Source::readAll(sink: RawSink): Long`	`Source::transferTo(sink: RawSink): Long`
`Source::readByte/readInt/readShortLe/readLong/readSmthg`	don't change
`RawSink::write(buffer: Buffer, byteCount: Long)`	don't change
`Sink::write(source: ByteArray, offset: Int, byteCount: Int): Sink`	don't change
`Sink::writeAll(source: RawSource): Long`	`Sink::transferFrom(source: RawSource): Long`
`Sink::write(source: RawSource, byteCount: Long): Sink`	don't change
`Sink::writeByte/writeInt/writeShortLe/writeLong/writeSmthg`	don't change

swankjesse · 2023-06-16T20:46:36Z

Strong thumbs up on these naming improvements.

fzhinkin added this to the 0.2.0 milestone Jun 12, 2023

fzhinkin self-assigned this Jun 12, 2023

fzhinkin mentioned this issue Jun 12, 2023

A trimmed-down API #136

Merged

3 tasks

fzhinkin linked a pull request Jun 23, 2023 that will close this issue

A trimmed-down API #136

Merged

3 tasks

fzhinkin closed this as completed Jun 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent and confusing Sink/Source read/write-methods naming #137

Inconsistent and confusing Sink/Source read/write-methods naming #137

fzhinkin commented Jun 12, 2023

swankjesse commented Jun 13, 2023

swankjesse commented Jun 13, 2023

fzhinkin commented Jun 13, 2023

whyoleg commented Jun 13, 2023 •

edited

Loading

JakeWharton commented Jun 13, 2023 •

edited

Loading

fzhinkin commented Jun 16, 2023

swankjesse commented Jun 16, 2023

Inconsistent and confusing Sink/Source read/write-methods naming #137

Inconsistent and confusing Sink/Source read/write-methods naming #137

Comments

fzhinkin commented Jun 12, 2023

swankjesse commented Jun 13, 2023

swankjesse commented Jun 13, 2023

fzhinkin commented Jun 13, 2023

whyoleg commented Jun 13, 2023 • edited Loading

JakeWharton commented Jun 13, 2023 • edited Loading

fzhinkin commented Jun 16, 2023

swankjesse commented Jun 16, 2023

whyoleg commented Jun 13, 2023 •

edited

Loading

JakeWharton commented Jun 13, 2023 •

edited

Loading