Make IO#skip IO#write returns the number of bytes it skipped/written #9233

bcardiff · 2020-05-05T21:44:56Z

Closes #9185

IO#skip still allows any Int as the argument, but UInt64 is returned
Fix build (🤞) and added a couple of assertions w.r.t. IO#skip returns the number of bytes it skipped #9185
Make IO#write_* : UInt64
Make IO#printf(...) : UInt64

bcardiff · 2020-05-07T19:16:28Z

I added a tiny amount of specs for write.

I am in doubt whether all write operations should return the byte counts or not.

If we limit to only IO#write(Slice) it would seem rather limited.
If we allow all write_* methods there is a need to for a BytesCounter to allow nested writes accounting. Which is nice, but it adds some overhead clearly.
I added the same mechanism to count for IO#printf, but if that method returns the bytes written I feel that the same should happen with IO#puts and IO#print. I see puts and print as more high level operations, but I am not sure about were does printf fall.

It's a bit weird that write would return UInt64 but read Int32. I understand it is convenient, but it still feels odd.

I think this is ready for a round of feedback before going further.

Return bytes only in write_* operations. printf, puts, etc return Nil

bcardiff · 2020-05-13T22:05:38Z

Rebased and updated. I removed the wrapper and implement the accounting in the ByteFormat and Encoding. These are the only cases needed if we keep the printf, puts and other methods to return Nil. Making only the write_* return the number of bytes written.

@carlhoerberg , how does this sound for your use cases?

carlhoerberg · 2020-05-13T22:08:29Z

How's the performance for the nested byte accounting?

bcardiff · 2020-05-13T22:41:37Z

@carlhoerberg not good. It adds an extra layer. We also check that in ruby printf/puts return Nil.

I wanted to have a composable solution, hence I went on BytesCounter. But if the API is enough as it is right now, there is no need for it.

asterite · 2020-05-13T22:45:13Z

Maybe &+ should be used instead of +? I don't know if that's where the overhead comes from. Do we want the write to fail on overflow if exceeding UInt64 max?

bcardiff · 2020-05-13T22:47:43Z

Well, it can't overflow. There is not way to write more that UInt64 bytes in those method.
Since it can't overflow using &+ should be safe.

The overhead i refer was due to a IO wrapper, not the addition itself.

carlhoerberg · 2020-05-13T22:59:17Z

Ok, let's leave the ByteCounter out then, performance is way more important here. Let's just return the bytes where it's easy and cheap.

asterite · 2020-05-13T23:27:12Z

Why is the written length needed? What's a use case?

asterite · 2020-05-13T23:29:03Z

If we allow all write_* methods there is a need to for a BytesCounter

If the nested write calls return the size, why a wrapper is needed?

asterite · 2020-05-13T23:30:05Z

If we allow all write_* methods there

Also, what other write_ methods are there? I only found write_byte and write_bytes.

waj · 2020-05-13T23:35:11Z

I only found write_byte and write_bytes

There is also write_utf8 and write(Slice). Those are the low level functions to write bytes, and where counting is more likely to be useful. If you do puts or printf the output is then more text oriented.

Note that IO#write in Ruby or Write in Go, also returns the number of bytes. For these functions is quite simple and direct to obtain this number.

asterite · 2020-05-14T12:02:55Z

@waj Oh, I see.

I'm still curious which methods needed a ByteCounter.

asterite · 2020-05-14T12:04:57Z

Also a note again that in Go for example the return value is an int, not an unsigned int. My guess is that if it works fine for Go, it will work fine for Crystal. But if you think UInt64 is better, then go ahead.

bcardiff · 2020-05-14T13:33:01Z

The return type of these methods should be UInt64. This will match #7660.
ref: (#9185 (review))

It bugs me a little the inconsistency with read operations. But it bugs me more if it would be with copy that is a write operation :-).

I think it's mainly low level so UInt64 is fine here.

@asterite making printf return number of bytes required (or was easier) with the wrapper. The implementation of String::Formatter would've be tainted with accounting of those bytes otherwise. And its implementation uses io <<, which returns the IO and not the amount of bytes written.

I changed the implementation to use &+ when possible.

asterite · 2020-05-14T13:39:52Z

Oh, I see. Yeah, for textual things maybe returning the number of bytes is not needed. In Go printf returns the number of bytes written, though. I just don't know what one does with that number anyway :-)

src/compress/deflate/writer.cr

src/compress/gzip/writer.cr

src/compress/zlib/writer.cr

didactic-drunk · 2020-05-22T16:57:42Z

Is every write(slice) method supposed to return slice.bytesize.to_u64? How much overhead if any does that add?

HankAnderson · 2020-05-22T17:48:45Z

In Go Write returns the number of bytes written and an optional error. That means that you might get less bytes written in case of an error, and you know that.

Since in Crystal if write is successful it means all the slice was written, it's absolutely pointless to return slice.size. It's actually more job for the runtime for when you don't need it, and you can always know it by doing slice.size, as @didactic-drunk mentions.

What would be needed here is for exceptions to capture how many bytes were written before the exception happened.

That said, no use case has been provided for this change other than "avoiding to call size on the given slice instead of getting the value from the call".

This is required for Crystal 0.35.0 (see crystal-lang/crystal#9233)

carlhoerberg · 2020-06-11T08:23:31Z

@didactic-drunk @HankAnderson that's not true, consider this:

class CustomProtcol < IO
  def write(slice)
    bytes = 0_i64
    bytes += @io.write "prefix".to_slice
    bytes += @io.write slice
    bytes += @io.write "suffix".to_slice
  end
end

more than slice.size is written, so you can't just blindly return slice.size. Even more important in to_io, eg.

struct ShortString
  def initialize(@string)
  end

  def to_io(io)
    io.write_byte @string.bytesize.to_u8
    io.write(@string.to_slice) + sizeof(UInt8)
  end
end

socket = TCPSocket
bytes_written = socket.write_bytes ShortString.new("foo")
bytes_written # 4, not 3

It vastly simplifies position accounting.

asterite · 2020-06-11T09:21:55Z

I don't think a custom protocol should be an IO, really... you are using IO in a wrong way. For example if you print an object, or puts an object to that IO, you will get many prefixes and suffixes, which is probably not expected.

If you want to track written slice size in the custom protocol (which is not an IO!) then do it, but all IOs just write the full slice as a byte size.

For that matter, Flate, Gzip, etc. currently return the passed slice size, but in reality they might write less data (because it is compressed!). So already writer's return value is confusing. Is it returning how much data was actually written, or how much of the slice was written? If it's the latter, then your custom protocol should return the slice size, regardless of what else was written. If it's the former, then the stdlib is already broken.

If you check what Go does, https://golang.org/pkg/io/#Writer (maybe you wanted this is Crystal because of Go or Ruby?) you'll notice that the return value is always in bounds with the passed slice size. It's returning how many bytes were written from the slice, not to the actual device (this is hard or impossible to know when data is buffered and compressed).

So your custom protocol should return slice.size.

In fact all IOs on crystal will return that because if they are successful, all the slice will be written out. If they are not, an exception will be raised. The problem is that this exception doesn't tell you how many bytes were written, and this is the only useful thing that we should add if we wanted to.

But right now, IO#write return value is completely broken and nonsense.

straight-shoota · 2020-06-11T09:30:50Z

But right now, IO#write return value is completely broken and nonsense.

I agree. It's also completely lacking any documentation.

In what use case would it not return slice.size?

carlhoerberg · 2020-06-11T11:51:08Z

yes, it was a bad example, i'm not implementing a protocol with a custom IO, but the write_bytes is correct, you can't know how many bytes that was actually written if write_bytes/to_io doesn't return the actual number of written written, with prefixes/compressed or whatever. The "problem" it solves is that you don't have to have to repeatedly call IO#pos, which gets expensive on files and impossible on sockets, or riddle your code with implementation details (how many extra bytes does a ShortString add?)

But I agree, the lack of enforcing is problematic.

carlhoerberg · 2020-06-11T11:59:33Z

To me it's a nice to have, not a need to have, you can hack around it with manual position accounting. And its current form seems to cause a lot of confusion and is hard (impossible?) to enforce.

asterite · 2020-06-11T12:28:44Z

The "problem" it solves is that you don't have to have to repeatedly call IO#pos

Maybe you can show us your code to understand why you need to know the position at every point?

asterite · 2020-06-11T12:29:41Z

you can't know how many bytes that was actually written if write_bytes/to_io doesn't return the actual number of written written

Then those methods should return ~~how many bytes were written~~ how many bytes they translate to, but there's no need to do it for IO#write.

bcardiff · 2020-06-11T13:56:18Z

We can enforce that the write returns always the slice size if we change the design of IO and add write_impl(io : IO) : Nil (or whatever name) that can be override. Then IO#write will take care of returning the slice.

But that will be half way. The write_bytes implementation is also subject to not behave as expected.

There are some contracts that can't be enforced by types. We can discuss a design that is less fragile, but it can always be broken in some way by the implementation.

I agree that protocols are better to be implemented by using and IO rather than inheriting one. But if we think of some transformation that can be activated or not to decorate an IO, inheriting make sense in the current state of the std-lib. If these decorators change the the amount of bytes written from the input maybe it is desired to still keep track of the underlying position count. I know is a maybe. But that is why I feel more comfortable with the proposed change.

waj · 2020-06-11T15:11:52Z

I think the changes on the IO interface were a mistake and we should revert it. Let me explain why:

Trivial: the value that should be returned by IO#write is always slice.size, otherwise the implementation is incorrect. Returning a value that could be trivially calculated at the call site sounds bad to me.
Confusing: this could be mitigated with documentation, but the interface alone is confusing, I think mainly because of the previous reason. Is it the returned value the number of bytes meant to be write or the bytes actually written? Because the first option it's too trivial to be true, then it feels the answer should be the second one. (but note that is impossible in the general case)
Error prone: as we could see already, if someone extends IO but it doesn't return slice.size, then the implementation is wrong.

At the end we have something that makes programmers less happy without too much gain.

Now, for the cases where having this value is actually useful, those could be solved in a different way:

Make a (private) alternative method to write bytes, that also returns the slice.size
Create a AccountingIO that keeps count of bytes written so it always know the position without constantly calling pos.

I know going back and forward with changes like this is far from ideal. But since we're not at 1.0 yet it's a good opportunity to assume our mistakes and fix them before it's too late.

straight-shoota · 2020-06-11T15:34:19Z

Just for reference, because the original discussion about adding byte count to IO#write isn't linked here: It's at #9186

…written (crystal-lang#9233)" This reverts commit 7f13250.

* Revert "Change IO#write, IO#skip, IO.copy to return Int64 (#9363)" This reverts commit 6380fa9. * Revert "Make IO#skip IO#write returns the number of bytes it skipped/written (#9233)" This reverts commit 7f13250. * Keep IO::Stabled#skip_to_end * Keep IO#copy returning Int64 * Code cleanup

bcardiff added kind:feature breaking-change topic:stdlib:networking labels May 5, 2020

bcardiff force-pushed the feature/io-write-skip-return branch from 72b57bc to 7da54a3 Compare May 7, 2020 19:14

bcardiff marked this pull request as ready for review May 7, 2020 19:17

carlhoerberg and others added 7 commits May 13, 2020 18:30

IO#skip returns the number of bytes it skipped

61e7d2a

make IO#skip return UInt64

73d1649

Allow Int arguments in IO#skip

0d00cf9

Add specs assertions for skip / skip_to_end

38c834d

Refactor: delegate IO::Stapled#skip_to_end directly to reader

05b1aa2

Make IO#write return bytes written

b0752c5

Add specs, make printf return written bytes

32e7526

bcardiff force-pushed the feature/io-write-skip-return branch from 7da54a3 to 4243c5e Compare May 13, 2020 21:55

Avoid using the BytesCounter wrapper

65d41d1

Return bytes only in write_* operations. printf, puts, etc return Nil

bcardiff force-pushed the feature/io-write-skip-return branch from cb3c2c3 to 65d41d1 Compare May 13, 2020 22:05

Use &+

eb78013

bcardiff added this to the 0.35.0 milestone May 14, 2020

waj reviewed May 14, 2020

View reviewed changes

src/compress/deflate/writer.cr Outdated Show resolved Hide resolved

src/compress/gzip/writer.cr Outdated Show resolved Hide resolved

src/compress/zlib/writer.cr Outdated Show resolved Hide resolved

waj approved these changes May 14, 2020

View reviewed changes

Remove CHECK comments

da03c5e

bcardiff merged commit 7f13250 into crystal-lang:master May 18, 2020

bcardiff deleted the feature/io-write-skip-return branch May 18, 2020 19:18

mamantoha mentioned this pull request May 24, 2020

Upgrade support to Crystal 0.35 naqvis/crystar#7

Closed

bcardiff mentioned this pull request May 27, 2020

Change IO#write, IO#skip, IO.copy to return Int64 #9363

Merged

straight-shoota added a commit to straight-shoota/baked_file_system that referenced this pull request Jun 9, 2020

Return written size in StringEncoder#write

a81f245

This is required for Crystal 0.35.0 (see crystal-lang/crystal#9233)

Blacksmoke16 mentioned this pull request Jun 10, 2020

Error: def write_byte must return Int64 but it is returning (Int64 | Nil) makenowjust/crustache#27

Closed

straight-shoota mentioned this pull request Jun 11, 2020

Upgrade to Crystal 0.35.0 schovi/baked_file_system#36

Closed

bcardiff pushed a commit to bcardiff/crystal that referenced this pull request Jun 12, 2020

Revert "Make IO#skip IO#write returns the number of bytes it skipped/…

0d18b14

…written (crystal-lang#9233)" This reverts commit 7f13250.

bcardiff mentioned this pull request Jun 12, 2020

Revert IO#write changes in 0.35.0 and let it return Nil #9469

Merged

l3kn mentioned this pull request Jun 14, 2020

Fixed for 0.35.0 stumpycr/stumpy_png#29

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make IO#skip IO#write returns the number of bytes it skipped/written #9233

Make IO#skip IO#write returns the number of bytes it skipped/written #9233

bcardiff commented May 5, 2020 •

edited

Loading

bcardiff commented May 7, 2020

bcardiff commented May 13, 2020

carlhoerberg commented May 13, 2020

bcardiff commented May 13, 2020

asterite commented May 13, 2020

bcardiff commented May 13, 2020

carlhoerberg commented May 13, 2020

asterite commented May 13, 2020

asterite commented May 13, 2020

asterite commented May 13, 2020

waj commented May 13, 2020

asterite commented May 14, 2020

asterite commented May 14, 2020

bcardiff commented May 14, 2020

asterite commented May 14, 2020

didactic-drunk commented May 22, 2020

HankAnderson commented May 22, 2020

carlhoerberg commented Jun 11, 2020

asterite commented Jun 11, 2020

straight-shoota commented Jun 11, 2020

carlhoerberg commented Jun 11, 2020

carlhoerberg commented Jun 11, 2020

asterite commented Jun 11, 2020

asterite commented Jun 11, 2020 •

edited

Loading

bcardiff commented Jun 11, 2020

waj commented Jun 11, 2020

straight-shoota commented Jun 11, 2020

Make IO#skip IO#write returns the number of bytes it skipped/written #9233

Make IO#skip IO#write returns the number of bytes it skipped/written #9233

Conversation

bcardiff commented May 5, 2020 • edited Loading

bcardiff commented May 7, 2020

bcardiff commented May 13, 2020

carlhoerberg commented May 13, 2020

bcardiff commented May 13, 2020

asterite commented May 13, 2020

bcardiff commented May 13, 2020

carlhoerberg commented May 13, 2020

asterite commented May 13, 2020

asterite commented May 13, 2020

asterite commented May 13, 2020

waj commented May 13, 2020

asterite commented May 14, 2020

asterite commented May 14, 2020

bcardiff commented May 14, 2020

asterite commented May 14, 2020

didactic-drunk commented May 22, 2020

HankAnderson commented May 22, 2020

carlhoerberg commented Jun 11, 2020

asterite commented Jun 11, 2020

straight-shoota commented Jun 11, 2020

carlhoerberg commented Jun 11, 2020

carlhoerberg commented Jun 11, 2020

asterite commented Jun 11, 2020

asterite commented Jun 11, 2020 • edited Loading

bcardiff commented Jun 11, 2020

waj commented Jun 11, 2020

straight-shoota commented Jun 11, 2020

bcardiff commented May 5, 2020 •

edited

Loading

asterite commented Jun 11, 2020 •

edited

Loading