Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make IO#skip IO#write returns the number of bytes it skipped/written #9233

Merged
merged 10 commits into from
May 18, 2020

Conversation

bcardiff
Copy link
Member

@bcardiff bcardiff commented May 5, 2020

Closes #9185

@bcardiff
Copy link
Member Author

bcardiff commented May 7, 2020

I added a tiny amount of specs for write.

I am in doubt whether all write operations should return the byte counts or not.

  • If we limit to only IO#write(Slice) it would seem rather limited.
  • If we allow all write_* methods there is a need to for a BytesCounter to allow nested writes accounting. Which is nice, but it adds some overhead clearly.
  • I added the same mechanism to count for IO#printf, but if that method returns the bytes written I feel that the same should happen with IO#puts and IO#print. I see puts and print as more high level operations, but I am not sure about were does printf fall.

It's a bit weird that write would return UInt64 but read Int32. I understand it is convenient, but it still feels odd.

I think this is ready for a round of feedback before going further.

@bcardiff bcardiff marked this pull request as ready for review May 7, 2020 19:17
@bcardiff bcardiff force-pushed the feature/io-write-skip-return branch from 7da54a3 to 4243c5e Compare May 13, 2020 21:55
Return bytes only in write_* operations.
printf, puts, etc return Nil
@bcardiff bcardiff force-pushed the feature/io-write-skip-return branch from cb3c2c3 to 65d41d1 Compare May 13, 2020 22:05
@bcardiff
Copy link
Member Author

Rebased and updated. I removed the wrapper and implement the accounting in the ByteFormat and Encoding. These are the only cases needed if we keep the printf, puts and other methods to return Nil. Making only the write_* return the number of bytes written.

@carlhoerberg , how does this sound for your use cases?

@carlhoerberg
Copy link
Contributor

How's the performance for the nested byte accounting?

@bcardiff
Copy link
Member Author

@carlhoerberg not good. It adds an extra layer. We also check that in ruby printf/puts return Nil.

I wanted to have a composable solution, hence I went on BytesCounter. But if the API is enough as it is right now, there is no need for it.

@asterite
Copy link
Member

Maybe &+ should be used instead of +? I don't know if that's where the overhead comes from. Do we want the write to fail on overflow if exceeding UInt64 max?

@bcardiff
Copy link
Member Author

Well, it can't overflow. There is not way to write more that UInt64 bytes in those method.
Since it can't overflow using &+ should be safe.

The overhead i refer was due to a IO wrapper, not the addition itself.

@carlhoerberg
Copy link
Contributor

Ok, let's leave the ByteCounter out then, performance is way more important here. Let's just return the bytes where it's easy and cheap.

@asterite
Copy link
Member

Why is the written length needed? What's a use case?

@asterite
Copy link
Member

If we allow all write_* methods there is a need to for a BytesCounter

If the nested write calls return the size, why a wrapper is needed?

@asterite
Copy link
Member

If we allow all write_* methods there

Also, what other write_ methods are there? I only found write_byte and write_bytes.

@waj
Copy link
Member

waj commented May 13, 2020

I only found write_byte and write_bytes

There is also write_utf8 and write(Slice). Those are the low level functions to write bytes, and where counting is more likely to be useful. If you do puts or printf the output is then more text oriented.

Note that IO#write in Ruby or Write in Go, also returns the number of bytes. For these functions is quite simple and direct to obtain this number.

@asterite
Copy link
Member

@waj Oh, I see.

I'm still curious which methods needed a ByteCounter.

@asterite
Copy link
Member

Also a note again that in Go for example the return value is an int, not an unsigned int. My guess is that if it works fine for Go, it will work fine for Crystal. But if you think UInt64 is better, then go ahead.

@bcardiff
Copy link
Member Author

The return type of these methods should be UInt64. This will match #7660.
ref: (#9185 (review))

It bugs me a little the inconsistency with read operations. But it bugs me more if it would be with copy that is a write operation :-).


I think it's mainly low level so UInt64 is fine here.

@asterite making printf return number of bytes required (or was easier) with the wrapper. The implementation of String::Formatter would've be tainted with accounting of those bytes otherwise. And its implementation uses io <<, which returns the IO and not the amount of bytes written.


I changed the implementation to use &+ when possible.

@asterite
Copy link
Member

Oh, I see. Yeah, for textual things maybe returning the number of bytes is not needed. In Go printf returns the number of bytes written, though. I just don't know what one does with that number anyway :-)

@bcardiff bcardiff added this to the 0.35.0 milestone May 14, 2020
src/compress/deflate/writer.cr Outdated Show resolved Hide resolved
src/compress/gzip/writer.cr Outdated Show resolved Hide resolved
src/compress/zlib/writer.cr Outdated Show resolved Hide resolved
@bcardiff bcardiff merged commit 7f13250 into crystal-lang:master May 18, 2020
@bcardiff bcardiff deleted the feature/io-write-skip-return branch May 18, 2020 19:18
@didactic-drunk
Copy link
Contributor

Is every write(slice) method supposed to return slice.bytesize.to_u64? How much overhead if any does that add?

@HankAnderson
Copy link

In Go Write returns the number of bytes written and an optional error. That means that you might get less bytes written in case of an error, and you know that.

Since in Crystal if write is successful it means all the slice was written, it's absolutely pointless to return slice.size. It's actually more job for the runtime for when you don't need it, and you can always know it by doing slice.size, as @didactic-drunk mentions.

What would be needed here is for exceptions to capture how many bytes were written before the exception happened.

That said, no use case has been provided for this change other than "avoiding to call size on the given slice instead of getting the value from the call".

@carlhoerberg
Copy link
Contributor

@didactic-drunk @HankAnderson that's not true, consider this:

class CustomProtcol < IO
  def write(slice)
    bytes = 0_i64
    bytes += @io.write "prefix".to_slice
    bytes += @io.write slice
    bytes += @io.write "suffix".to_slice
  end
end

more than slice.size is written, so you can't just blindly return slice.size. Even more important in to_io, eg.

struct ShortString
  def initialize(@string)
  end

  def to_io(io)
    io.write_byte @string.bytesize.to_u8
    io.write(@string.to_slice) + sizeof(UInt8)
  end
end

socket = TCPSocket
bytes_written = socket.write_bytes ShortString.new("foo")
bytes_written # 4, not 3

It vastly simplifies position accounting.

@asterite
Copy link
Member

I don't think a custom protocol should be an IO, really... you are using IO in a wrong way. For example if you print an object, or puts an object to that IO, you will get many prefixes and suffixes, which is probably not expected.

If you want to track written slice size in the custom protocol (which is not an IO!) then do it, but all IOs just write the full slice as a byte size.

For that matter, Flate, Gzip, etc. currently return the passed slice size, but in reality they might write less data (because it is compressed!). So already writer's return value is confusing. Is it returning how much data was actually written, or how much of the slice was written? If it's the latter, then your custom protocol should return the slice size, regardless of what else was written. If it's the former, then the stdlib is already broken.

If you check what Go does, https://golang.org/pkg/io/#Writer (maybe you wanted this is Crystal because of Go or Ruby?) you'll notice that the return value is always in bounds with the passed slice size. It's returning how many bytes were written from the slice, not to the actual device (this is hard or impossible to know when data is buffered and compressed).

So your custom protocol should return slice.size.

In fact all IOs on crystal will return that because if they are successful, all the slice will be written out. If they are not, an exception will be raised. The problem is that this exception doesn't tell you how many bytes were written, and this is the only useful thing that we should add if we wanted to.

But right now, IO#write return value is completely broken and nonsense.

@straight-shoota
Copy link
Member

But right now, IO#write return value is completely broken and nonsense.

I agree. It's also completely lacking any documentation.

In what use case would it not return slice.size?

@carlhoerberg
Copy link
Contributor

yes, it was a bad example, i'm not implementing a protocol with a custom IO, but the write_bytes is correct, you can't know how many bytes that was actually written if write_bytes/to_io doesn't return the actual number of written written, with prefixes/compressed or whatever. The "problem" it solves is that you don't have to have to repeatedly call IO#pos, which gets expensive on files and impossible on sockets, or riddle your code with implementation details (how many extra bytes does a ShortString add?)

But I agree, the lack of enforcing is problematic.

@carlhoerberg
Copy link
Contributor

To me it's a nice to have, not a need to have, you can hack around it with manual position accounting. And its current form seems to cause a lot of confusion and is hard (impossible?) to enforce.

@asterite
Copy link
Member

The "problem" it solves is that you don't have to have to repeatedly call IO#pos

Maybe you can show us your code to understand why you need to know the position at every point?

@asterite
Copy link
Member

asterite commented Jun 11, 2020

you can't know how many bytes that was actually written if write_bytes/to_io doesn't return the actual number of written written

Then those methods should return how many bytes were written how many bytes they translate to, but there's no need to do it for IO#write.

@bcardiff
Copy link
Member Author

We can enforce that the write returns always the slice size if we change the design of IO and add write_impl(io : IO) : Nil (or whatever name) that can be override. Then IO#write will take care of returning the slice.

But that will be half way. The write_bytes implementation is also subject to not behave as expected.

There are some contracts that can't be enforced by types. We can discuss a design that is less fragile, but it can always be broken in some way by the implementation.

I agree that protocols are better to be implemented by using and IO rather than inheriting one. But if we think of some transformation that can be activated or not to decorate an IO, inheriting make sense in the current state of the std-lib. If these decorators change the the amount of bytes written from the input maybe it is desired to still keep track of the underlying position count. I know is a maybe. But that is why I feel more comfortable with the proposed change.

@waj
Copy link
Member

waj commented Jun 11, 2020

I think the changes on the IO interface were a mistake and we should revert it. Let me explain why:

  • Trivial: the value that should be returned by IO#write is always slice.size, otherwise the implementation is incorrect. Returning a value that could be trivially calculated at the call site sounds bad to me.
  • Confusing: this could be mitigated with documentation, but the interface alone is confusing, I think mainly because of the previous reason. Is it the returned value the number of bytes meant to be write or the bytes actually written? Because the first option it's too trivial to be true, then it feels the answer should be the second one. (but note that is impossible in the general case)
  • Error prone: as we could see already, if someone extends IO but it doesn't return slice.size, then the implementation is wrong.

At the end we have something that makes programmers less happy without too much gain.

Now, for the cases where having this value is actually useful, those could be solved in a different way:

  • Make a (private) alternative method to write bytes, that also returns the slice.size
  • Create a AccountingIO that keeps count of bytes written so it always know the position without constantly calling pos.

I know going back and forward with changes like this is far from ideal. But since we're not at 1.0 yet it's a good opportunity to assume our mistakes and fix them before it's too late.

@straight-shoota
Copy link
Member

Just for reference, because the original discussion about adding byte count to IO#write isn't linked here: It's at #9186

bcardiff pushed a commit to bcardiff/crystal that referenced this pull request Jun 12, 2020
bcardiff pushed a commit that referenced this pull request Jun 16, 2020
* Revert "Change IO#write, IO#skip, IO.copy to return Int64 (#9363)"

This reverts commit 6380fa9.

* Revert "Make IO#skip IO#write returns the number of bytes it skipped/written (#9233)"

This reverts commit 7f13250.

* Keep IO::Stabled#skip_to_end

* Keep IO#copy returning Int64

* Code cleanup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants