-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace use of InputStream in transport APIs #99
Comments
Thanks for the suggestion! We tried to get rid of InputStream multiple times, but we couldn't find a better alternative. TL;DR: although InputStream is not without its downfalls, InputStream turns out to perform at least as well as its opponents. The InputStream wraps the native byte buffer implementation of the underlying transport, which is generally a collection of memory blocks virtually concatenated that are zero-copy from the network. Because of details such as not being guaranteed to be able to get a byte[] from the block of memory, we can't pass a simple byte[] or byte[][], and may need a destination byte[] to get data out of the buffers. Our core does not depend on any particular protobuf implementation (the proto package does, and it is in core, but we just haven't moved it out yet). Protobuf's CodedInputStream can consume byte[], ByteBuffer, or InputStream (no public method for ByteString!). Nano proto and Wire only can consume byte[]. So those are really the destination types that matter for us. byte[] has the disadvantage of requiring a large, contiguous copy of the data from our buffers, and there is really no real magic available to make it perform better. I'll note here that when compression is being used, we don't know the uncompressed length of the message and it is decompressed on-the-fly. ByteBuffer is no better than byte[], because 1) we don't actually have ByteBuffers and 2) because ByteBuffers can't be concatenated as ByteBuffer is a final class without such support. The current InputStream is read-once; once bytes have been consumed, we release the backing bytes. Read-once does not seem to be a problem. We also ensure that if the InputStream was not closed when the call returned, we close it. If the InputStream is fully consumed, then closing has no effect. ByteSource is actually fairly reasonable, and has been in Guava long enough to not cause too many problems. The only interesting method I see on ByteSource though for our use case is read() (returning byte[]), as we could optimize the method to size the array correctly for uncompressed data. But, actually, you can do the same optimization using InputStream.available(). However, for compressed data, it will require decompressing into byte[]s, and then copying into a new, perfectly-sized byte[] at the end. A user of InputStream would not need a perfectly-sized byte[], so read() doesn't actually look like a win. Our byte buffers are generally pooled, so we can't rely on garbage collection to free them. Thus, we need an interface or API that can communicate some sort of life cycle. For this reason alone ByteSource is a weak choice. Initially, Guava made a ton of sense, but as the project has matured, we are using less and less of it naturally (creating your own Buffer type means you won't find a library to help you, for instance). There also seems to be a steady push to reduce/remove our Guava dependency from outside the core team. At this point, the only API that will have Guava types is the Future Stub (for ListenableFuture and similar). Guava is still used internally many places, but having it as part of an API would need a convincing reason. You made two points noting the issues with InputStream:
I agree with both of these points. However, ByteSource skirts around #1 by not closing our resource (the backing bytes), which doesn't really help us much. Also, in practice, although a bit icky, #1 does not seem like it will be a problem. ByteSource is no better for #2, as the copies still have to occur. |
I agree with not leaking Guava into the API. Preferably implementation as On 20 February 2015 at 07:40, Eric Anderson notifications@github.com
|
Thanks for the information. Can you explain why the stream needs to be read-once? If that is not a hard requirement, then even your own ByteSource-like interface would be an improvement. e.g.: interface Payload {
InputStream openStream() throws IOException;
}
The underlying resources could still be released as soon as the method returns. Even if you stick with InputStream, it is strange to document that the stream "must be closed by the listener" if that is not actually true. Also, it should probably say something about the stream not being usable after the method returns. |
Transport is really an internal-only set of interfaces; normal users should have no reason to use them. Thus, in some ways we don't care how ugly they look as long as the get the job done. There are very few implementations of StreamListener: basically just ChannelImpl and ServerImpl. I was actually a bit wrong about closing the InputStream when the method returns. We do make sure to close it, but from the implementation in Channel/Server, not when calling messageRead(). And so, the documentation is actually correct; full ownership has been passed, as we expect it to be consumed on a different thread. So we must have some way to tell the transport that we are done with the bytes. ByteSource doesn't fit the bill, but "interface Payload" that was Closeable would. Note that in this case, "closing the InputStream" would do nothing, so we aren't any less likely to have bugs than before as now we are passing ownership with Payload instead of InputStream. Try-with-resources is not useful to us in the near future, as it isn't available on Android and ChannelImpl is used on Android. Guava users aren't negatively impacted by the current API because they won't be using transport directly; instead, they will be using Channel/Call which provides a demarshalled object via a Marshaller. If you look at Marshaller, you can see a ByteSource TODO there. For retries it could be nice to have a ByteSource-like object. It may not, though. The serialized representation is likely more compact in memory, and so may be better to retain while allowing the object to be GCed. However, in either case the library would consume the ByteSource; there would still be no benefit to Guava users. |
Thanks for the background, that explanation makes more sense. |
The current use of InputStream has a few problems:
I haven't looked at the implementation, but since the comments say that the InputStream is non-blocking, I assume that it is used instead of byte[] simply to prevent modifications of the data when it is passed to multiple listeners (a reasonable concern).
Since there already appears to be a Guava dependency, the obvious alternative would be to pass a (inherently read-only) ByteSource instead.
(ByteString from protobuf would be another option, but for various reasons ByteSource is probably a better choice)
The text was updated successfully, but these errors were encountered: