modular IO #19
This is a very interesting proposal, that I strongly support. This especially resonated with me:
In practice, I always end up writing everything in Format (even when not use pretty-printing) just to benefit from composable formatters. |
There are a lot of extra features in built-in formats that correspond to usage modes they appear to have been designed for:
It would be nice to get feedback on which of those features we can get rid of, and which are important to preserve, because they may also constrain the design of the user-exposed interface. For example:
|
My understanding is that these two fields could be removed (
This is used to implement
|
yes, there is a mention of that there. Seek only makes sense on some channels (even now), when the user knows it's mapped to a file underneath. Seek would just be a partial function that raises on user defined channels. Of course an alternative is to have
Seems very bad practice to me. If you want a lock, you should use one (see rust again about a locked wrapper around stdin/stdout). I dislike the list of all channels (again, bad practice, There is also a global map, in unix, from raw file descriptors to channels, to implement |
We discussed this RFC at a developer meeting, not in depth but rather to get quick feedback from people who hadn't already looked at it. One remark was that it was not completely clear that the proposed API would, indeed, enable the cool applications mentioned (in particular, efficient middle-layer adapters for compression or encryption), and that this could be conclusively answered without doing a PR that modifies the stdlib, just with a third-party prototype of the interface. Would you (@c-cube) be interested in doing this? Some related works were briefly mentioned:
Having a prototype would also have the nice property that it's easy to see the "final" state of the proposal to see how issues are addressed. (For example I would see |
A proof of concept is being developed there, FYI. |
will the PoC be discussed at some developer meeting? |
I looked at the proof-of-concept again and raised a couple small issues; I'm planning next to encourage others to give feedback. |
I updated the poc, btw. |
Looking at the RFCs and the proof-of-concept, I think this is a worthwhile extension: this is both fully backward compatible and improves the interoperability of existing libraries. |
type t

(** Obtain a slice of the current buffer. Empty iff EOF was reached *)
val fill_buf : t -> (bytes * int * int)
One small difference with the previous interface is that the user cannot choose an upper bound on the size of the slice. It seems that it could matter in terms of latency. Is that a limitation in practice?
hmm, I'm not sure I follow where latency is involved.
The contract is that `fill_buf` returns a non-empty slice, but it doesn't have to be the "full" underlying buffer. The name might be suboptimal.
My thought process is that if the client is latency-sensitive (a graphical or an audio client, for instance), it might make sense to provide an upper bound on the amount of work done by a call to `fill_buf`, to get some data now rather than an unknown amount later.
The current interface also doesn't give you anything to control that, does it? It's best effort in both cases. As long as you return a non-empty slice you can be as lazy as possible (typically, one syscall for reading).
edit: to be more clear, `input chan buf i n` asks to read at most `n` bytes into the buffer (and at least one, unless EOF is reached). `fill_buf chan` asks to return a non-empty slice, unless EOF is reached. I don't think one is lazier than the other.
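For reference, here is a minimal sketch of the existing contract using only the stdlib's `input`: each call reads at most the requested number of bytes, and a return value of 0 means EOF. The helper name `read_all_with_input` is made up for illustration.

```ocaml
(* Drain a channel with the stdlib's [input], asking for at most [chunk]
   bytes per call. [input] returns 0 only when EOF is reached. *)
let read_all_with_input (ic : in_channel) (chunk : int) : string =
  let buf = Bytes.create chunk in
  let acc = Buffer.create 64 in
  let rec loop () =
    let got = input ic buf 0 chunk in
    if got > 0 then begin
      (* [got] may be smaller than [chunk]; only copy what was read. *)
      Buffer.add_subbytes acc buf 0 got;
      loop ()
    end
  in
  loop ();
  Buffer.contents acc
```

The caller picks the upper bound (`chunk`), but has no say in how many bytes below that bound each call actually returns.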
With the current interface, one can ask for at most one byte of data, so there is at least (very) theoretically some negotiation between the client and the buffer in terms of latency versus throughput. The added complexity is probably not worth it, however.
My guess would be that a natural implementation of `fill_buf` would stop before a blocking operation, unless there is no data to read at all, similarly to how `input` would return less data in that case.
That is indeed the current implementation in the proof of concept: no syscall is made unless everything has been consumed. I think the name is misleading; it doesn't try to refill the whole buffer on every call.
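A rough sketch of that behaviour, not the proof of concept's actual code: the record layout, the `consume` function, and the `refill` callback (standing in for the underlying read syscall) are all assumptions made for illustration.

```ocaml
(* Hypothetical buffered reader. [refill] stands in for the low-level read
   (e.g. one syscall); it fills the buffer from offset 0 and returns the
   number of bytes written, 0 at EOF. *)
type reader = {
  buf : bytes;
  mutable off : int;   (* start of the unconsumed data *)
  mutable len : int;   (* amount of unconsumed data *)
  refill : bytes -> int;
}

(* Return the unconsumed slice; call [refill] only when nothing is left. *)
let fill_buf r =
  if r.len = 0 then begin
    r.off <- 0;
    r.len <- r.refill r.buf
  end;
  (r.buf, r.off, r.len)

(* Mark [n] bytes of the current slice as read (assumed companion call). *)
let consume r n =
  assert (n >= 0 && n <= r.len);
  r.off <- r.off + n;
  r.len <- r.len - n

(* Toy source over a string that counts how often [refill] is invoked. *)
let of_string ?(chunk = 4) (s : string) : reader * int ref =
  let pos = ref 0 and calls = ref 0 in
  let refill b =
    incr calls;
    let n = min (Bytes.length b) (String.length s - !pos) in
    Bytes.blit_string s !pos b 0 n;
    pos := !pos + n;
    n
  in
  ({ buf = Bytes.create chunk; off = 0; len = 0; refill }, calls)
```

With this shape, calling `fill_buf` repeatedly without consuming everything never triggers another `refill`, which is why the name reads as somewhat misleading.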
Indeed, `fill_buf` seems misleading; might something along the lines of `get_segment` or `get_slice` be better?
Like Florian, I wonder if it might make sense to let the user specify a lower bound and/or an upper bound on the size of the slice that is returned. E.g., specifying a lower bound would save the user the trouble of writing a loop. (The operation would block or raise an exception if there is not enough data.) Specifying an upper bound may help prevent reading data which the user knows is not needed.
`get_slice` is probably better. I think specifying bounds complicates the whole design and contract between consumer and producer: the underlying producer could be, for example, decompressing a stream and not in control of how much is decompressed in one batch. Similarly, reading a TCP stream doesn't give you a lot of control (same as the current API).
To avoid writing a loop, there can be helpers like a `read_exactly` that would try and read n bytes, for example. I think Lwt has a similar thing (see "read_into_exactly").
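Such a helper could look roughly like the sketch below. The `reader` record with `get_slice` and `consume` closures is a hypothetical stand-in for the proposed interface, and raising `End_of_file` on a short read is one possible choice among several.

```ocaml
(* Hypothetical slice-based reader: [get_slice ()] returns (buf, off, len),
   with len = 0 only at EOF; [consume n] marks n bytes as read. *)
type reader = {
  get_slice : unit -> bytes * int * int;
  consume : int -> unit;
}

(* Read exactly [n] bytes, looping over as many slices as needed.
   Raises [End_of_file] if the source runs dry first. *)
let read_exactly (r : reader) (n : int) : bytes =
  let out = Bytes.create n in
  let rec loop filled =
    if filled < n then begin
      let buf, off, len = r.get_slice () in
      if len = 0 then raise End_of_file;
      let k = min len (n - filled) in
      Bytes.blit buf off out filled k;
      r.consume k;
      loop (filled + k)
    end
  in
  loop 0;
  out

(* Toy reader over a string, handing out at most [chunk] bytes per slice. *)
let reader_of_string ~chunk (s : string) : reader =
  let data = Bytes.of_string s in
  let pos = ref 0 in
  {
    get_slice = (fun () -> (data, !pos, min chunk (Bytes.length data - !pos)));
    consume = (fun k -> pos := !pos + k);
  }
```

The producer stays free to return slices of whatever size suits it; only the helper deals with sizes.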
One possibility which could come from exposing |
The proposal suggests re-implementing |
@fpottier we would need to keep |
This RFC was discussed at today's OCaml developer meeting, and there was a general consensus that this is a step in the right direction in terms of interoperability. There were some API design questions on the usability of There were also questions about the possibility of completely switching to the OO-interface and removing the builtin implementations (probably at the cost of bytecode performance). Overall, it seems that the remaining finer points can be discussed in a PR. |
This RFC proposes to update the stdlib's types in_channel and out_channel to make them user-definable and composable.
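To make "user-definable and composable" concrete, here is a small illustrative sketch. It is not the RFC's API: the `in_source` class type, its single `input` method (mirroring the stdlib's `input` contract), and the `uppercase` adapter are all assumptions chosen to show how one user-defined source can wrap another.

```ocaml
(* Hypothetical user-definable input: one method with the same contract as
   the stdlib's [input] (returns 0 at EOF). Not the RFC's actual interface. *)
class type in_source = object
  method input : bytes -> int -> int -> int
end

(* A user-defined source backed by an in-memory string. *)
let of_string (s : string) : in_source =
  let pos = ref 0 in
  object
    method input buf off len =
      let n = min len (String.length s - !pos) in
      Bytes.blit_string s !pos buf off n;
      pos := !pos + n;
      n
  end

(* Adapter: wraps any source and uppercases the bytes flowing through it. *)
let uppercase (src : in_source) : in_source =
  object
    method input buf off len =
      let n = src#input buf off len in
      for i = off to off + n - 1 do
        Bytes.set buf i (Char.uppercase_ascii (Bytes.get buf i))
      done;
      n
  end

(* Drain a source into a string, four bytes at a time. *)
let read_all (src : in_source) : string =
  let buf = Bytes.create 4 in
  let acc = Buffer.create 16 in
  let rec loop () =
    let n = src#input buf 0 (Bytes.length buf) in
    if n > 0 then (Buffer.add_subbytes acc buf 0 n; loop ())
  in
  loop ();
  Buffer.contents acc
```

A compression or encryption layer would have the same shape as `uppercase`: it takes a source and returns a source, so adapters stack freely.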