zero-copy and lazy rows deserialization #571
Comments
I believe we already have a very similar issue: #462. I agree that the eager allocations are a problem.

Both your approach and the one described in #462 address the issue by pointing to the memory of the original, unserialized frame. However, the main difference is with respect to lifetimes: you suggest using reference counting, and I suggest using explicit lifetimes. I think there is value in both: reference counting is easier to use, but explicit lifetimes get rid of the (AFAIK atomic) reference counting and make it possible to use standard library types such as

We could unify both approaches if we used the interface from #462. Instead of

There is one potential argument against reference counting that I can see. Let's say that you fetch a large page of results, but you only decide to keep one

P.S: I have a work-in-progress branch for #462. I worked on it from time to time, but it's quite untidy and I didn't even manage to make it compile yet: https://github.com/piodul/scylla-rust-driver/tree/462-more-efficient-deserialization
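To make the trade-off between the two lifetime strategies concrete, here is a minimal, std-only sketch. `SharedSlice` and `borrow_value` are hypothetical names invented for this illustration; `SharedSlice` is only a rough stand-in for a `bytes::Bytes`-like handle, built on `Arc<[u8]>`, and is not the driver's API. It also shows one plausible reading of the concern about keeping a single value from a large page: the kept handle pins the whole frame in memory.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical reference-counted view into a shared frame
/// (a rough stand-in for a `bytes::Bytes`-like handle).
#[derive(Clone)]
pub struct SharedSlice {
    frame: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedSlice {
    /// Creates a view over `range` of `frame`. Panics if the range is out of bounds.
    pub fn new(frame: Arc<[u8]>, range: Range<usize>) -> Self {
        assert!(range.start <= range.end && range.end <= frame.len());
        SharedSlice { frame, range }
    }

    /// The bytes this view points at.
    pub fn as_slice(&self) -> &[u8] {
        &self.frame[self.range.clone()]
    }

    /// Size of the whole allocation kept alive by this view.
    pub fn frame_len(&self) -> usize {
        self.frame.len()
    }
}

/// Explicit-lifetime alternative: no counter at all, but the result
/// cannot outlive the frame it borrows from.
pub fn borrow_value<'frame>(frame: &'frame [u8], range: Range<usize>) -> &'frame [u8] {
    &frame[range]
}
```

With the reference-counted version, a 4-byte value taken from a 1024-byte frame keeps all 1024 bytes allocated until the value is dropped. With the borrowed version, the compiler instead forces the caller to keep the frame around, or to copy the value out explicitly.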
I don't know how I missed #462 ... -_-'
Agree 100% with it! It's indeed way better this way. I went with
I agree too, but as stated above, giving the choice to the user solves this issue IMO.
Actually, our works are pretty similar ... too similar, in fact; my inspiration could look questionable 😅 At least, it seems to be the right direction. Before continuing, we need to answer some questions:
Maybe I can open a draft PR to start a more precise discussion about implementation/naming/etc.
Actually, I've just realized that it's not really necessary to deserialize metadata. In fact, there could be a column type parsing iterator, the same way as there will be a row parsing iterator. Strings would just be ignored.
Using
I can see how CqlValue could be changed to a generic, however I can see some challenges:
I'm OK with postponing solving those issues for later, as the API suggested in #462 would allow deserializing query results directly to the types requested by the users. The CqlValue would still be a type which owns its data.
Why would you want to deprecate/remove them?
I agree, the error hierarchy needs some rethinking and simplification...
Let's keep both issues open and close them later in one go. Both of them contain valuable information IMO.
Sure, sounds like a good idea. We can continue the discussion on the PR.
In fact, you can't borrow a slice and calling
I'm fine with that.
The error type will change, as it will have to include a parsing error, but that's a minor concern. However, they also take There is also a discussion about
OK, I see the problem now... I'm not sure what would be the best way to deal with that. It sounds like we would like to have something that behaves as a slice, but allows taking ownership of it via
I remember that I got this problem while working on my work-in-progress implementation and I had to introduce an
I agree about
Note: this should be fixed by #665 when it is merged. It follows some ideas that were discussed here.
Is this still realistic for 0.15.0?
This is the main addition for 0.15, so it makes no sense to release 0.15 without it. Is the date currently set in the milestone realistic? Definitely not. We'll need to move it.
Done in #1057.
Currently, `QueryResult` deserialization deserializes all rows eagerly, and does a lot of allocations:

The last point, which can be quite important in workflows with a lot of strings/heavy blobs, is pretty easy to address. Indeed, the crate already uses `bytes::Bytes`, so the query result's bytes could be kept around during deserialization, and blob columns could be deserialized as `Bytes`, while string columns could use `string::String<Bytes>`. I don't know why raw `&[u8]` are used, maybe it's because of the API of `byteorder`, but the `bytes` crate also provides an endian-aware API, so `byteorder` could be dropped in favor of `bytes`.

Allocating a vector for all rows is also a relative overhead for queries returning only one row. Instead of deserialized rows, raw `Bytes` could be stored in `QueryResult`. It could then have a method returning an iterator of rows, each deserialized at iteration time. It would still be possible to obtain a vector of rows just by collecting the iterator.

Column deserialization could also avoid using a vector, using a trait system similar to `FromRow` to deserialize rows into tuples. By the way, compatibility of the tuple could be checked only once before iterating the rows (row deserialization would still return a `Result` because it must still check that there are enough bytes). The old API could still be accessible by making `Vec<Option<CqlValue>>` implement the row deserialization trait; actually, there could also be an `Iterator<Item=Option<CqlValue>>` implementing the trait.

To illustrate these points, I've implemented a quick and dirty POC in a branch in order to run some quick benchmarks; they show a (very) significant improvement in terms of memory consumption and performance. I can open a draft PR to make the POC simpler to visualize.

Of course, some of these changes would be breaking:
- `Response`, but it has to be modified to use `Bytes` instead of `&[u8]`;
- `QueryResult.rows` is public, but it would have to be replaced, for example by `bytes: Bytes, row_count: usize`;
- `CqlValue::Blob`/`CqlValue::Text`/`CqlValue::Ascii` should use `Bytes`/`String<Bytes>`; actually, this change should not be required, as raw `CqlValue` may not be used so much anymore.

On the other hand, the typed API `rows_typed`/`single_row_typed`/etc. (and even `rows()`) could stay relatively untouched, as `FromRow` could implement the deserialization trait too, and the breaking changes above seem quite minor to me.

P.S. There are also some strings in `ColumnSpec` and `ColumnType` which could also be modified to use `string::String<Bytes>`; it would save the few remaining allocations, while being a minor breaking change.
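The lazy, trait-driven row deserialization described above could look roughly like the following std-only sketch. All names here (`ParseError`, `FromCqlBytes`, `DeserializeRow`, `RowIter`, `read_value`, `encode_value`) are hypothetical and invented for this illustration; they are not the driver's API, and plain borrowed slices are used in place of `Bytes` to keep the example free of dependencies. The only protocol detail assumed is the CQL `[bytes]` encoding: a big-endian `i32` length followed by that many bytes, with a negative length meaning null.

```rust
use std::marker::PhantomData;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ParseError { UnexpectedEof, UnexpectedNull, InvalidUtf8, ColumnCountMismatch }

/// Reads one CQL `[bytes]` value and advances `buf` past it, without copying.
pub fn read_value<'a>(buf: &mut &'a [u8]) -> Result<Option<&'a [u8]>, ParseError> {
    let data: &'a [u8] = *buf;
    if data.len() < 4 { return Err(ParseError::UnexpectedEof); }
    let len = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    let rest = &data[4..];
    if len < 0 { *buf = rest; return Ok(None); } // negative length encodes null
    let len = len as usize;
    if rest.len() < len { return Err(ParseError::UnexpectedEof); }
    let (value, rest) = rest.split_at(len);
    *buf = rest;
    Ok(Some(value))
}

/// Test helper: appends one `[bytes]` value to `out`.
pub fn encode_value(out: &mut Vec<u8>, value: Option<&[u8]>) {
    match value {
        Some(bytes) => {
            out.extend_from_slice(&(bytes.len() as i32).to_be_bytes());
            out.extend_from_slice(bytes);
        }
        None => out.extend_from_slice(&(-1i32).to_be_bytes()),
    }
}

/// Column-level conversion from raw bytes borrowed from the frame.
pub trait FromCqlBytes<'a>: Sized {
    fn from_cql(raw: Option<&'a [u8]>) -> Result<Self, ParseError>;
}
impl<'a> FromCqlBytes<'a> for &'a [u8] {
    fn from_cql(raw: Option<&'a [u8]>) -> Result<Self, ParseError> {
        raw.ok_or(ParseError::UnexpectedNull)
    }
}
impl<'a> FromCqlBytes<'a> for &'a str {
    fn from_cql(raw: Option<&'a [u8]>) -> Result<Self, ParseError> {
        let bytes = raw.ok_or(ParseError::UnexpectedNull)?;
        std::str::from_utf8(bytes).map_err(|_| ParseError::InvalidUtf8)
    }
}
impl<'a, T: FromCqlBytes<'a>> FromCqlBytes<'a> for Option<T> {
    fn from_cql(raw: Option<&'a [u8]>) -> Result<Self, ParseError> {
        match raw { None => Ok(None), some => T::from_cql(some).map(Some) }
    }
}

/// Row-level trait in the spirit of `FromRow`, here only for 2-tuples.
pub trait DeserializeRow<'a>: Sized {
    const COLUMNS: usize;
    fn deserialize_row(buf: &mut &'a [u8]) -> Result<Self, ParseError>;
}
impl<'a, A: FromCqlBytes<'a>, B: FromCqlBytes<'a>> DeserializeRow<'a> for (A, B) {
    const COLUMNS: usize = 2;
    fn deserialize_row(buf: &mut &'a [u8]) -> Result<Self, ParseError> {
        Ok((A::from_cql(read_value(buf)?)?, B::from_cql(read_value(buf)?)?))
    }
}

/// Lazy iterator: nothing is parsed until `next` is called.
pub struct RowIter<'a, R> { remaining: usize, buf: &'a [u8], _row: PhantomData<R> }

impl<'a, R: DeserializeRow<'a>> RowIter<'a, R> {
    /// The row shape is checked once here, not on every row.
    pub fn new(buf: &'a [u8], row_count: usize, column_count: usize) -> Result<Self, ParseError> {
        if column_count != R::COLUMNS { return Err(ParseError::ColumnCountMismatch); }
        Ok(RowIter { remaining: row_count, buf, _row: PhantomData })
    }
}
impl<'a, R: DeserializeRow<'a>> Iterator for RowIter<'a, R> {
    type Item = Result<R, ParseError>;
    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 { return None; }
        let row = R::deserialize_row(&mut self.buf);
        // Stop after the first error: the cursor position is no longer reliable.
        if row.is_err() { self.remaining = 0 } else { self.remaining -= 1 }
        Some(row)
    }
}
```

Collecting the iterator into a `Vec` recovers the eager behavior. A real implementation would also check column types against the result metadata in `new`, which this sketch reduces to a column count check.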