feat: IO trait (to permit plugging in cloud blob storage) #42
Comments
Would you mind sharing how you would expect this to look? I'm thinking of Tree being generic over a trait F that requires Seek + Read + WhateverElseIsNeeded, similar to the way LevelDB defines its files (https://github.com/google/leveldb/blob/068d5ee1a3ac40dabd00d211d5013af44be55bea/helpers/memenv/memenv.cc#L200, https://github.com/google/leveldb/blob/068d5ee1a3ac40dabd00d211d5013af44be55bea/helpers/memenv/memenv.cc#L185). The crate is definitely very hard-coded to use std::fs right now, so getting rid of it all would be quite a huge refactor. Plus, with V2 there's another crate that would need the same treatment. Contributions are greatly appreciated.
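For illustration, perhaps something like this, loosely mirroring LevelDB's file abstractions (a rough sketch only; none of these names exist in lsm-tree today):

```rust
use std::io::{Read, Seek, Write};

// Illustrative only -- not an actual lsm-tree API. Loosely mirrors
// LevelDB's RandomAccessFile/WritableFile split.
pub trait RandomAccessFile: Read + Seek {}
impl<T: Read + Seek> RandomAccessFile for T {}

pub trait FileSystem {
    type ReadFile: RandomAccessFile;
    type WriteFile: Write;

    fn open(&self, path: &str) -> std::io::Result<Self::ReadFile>;
    fn create(&self, path: &str) -> std::io::Result<Self::WriteFile>;
    fn remove(&self, path: &str) -> std::io::Result<()>;
    fn exists(&self, path: &str) -> bool;
}

// Tree could then be generic over the file system:
// pub struct Tree<F: FileSystem> { fs: F, /* ... */ }
```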
Been thinking a bit on this. On the code-size front, https://matklad.github.io/2021/09/04/fast-rust-builds.html suggests keeping the amount of generic code at a crate's interface very thin, which points towards a thin IO boundary here rather than making everything generic.

However, on reflection, the bigger problem is going to be function colouring: if the traits are synchronous, then the impl for cloud blob storage is going to be holding an async runtime of some sort, and then blocking on calls into it everywhere (even if masked via a channel as I described). This is not ideal :/ If the core itself were actually async, with the existing sync interface as a thin shim over the top, that would work pretty nicely, I think.

There are some good reasons to want the core to be async, by the way. For prior art I'll point you at FoundationDB, which builds the entire system around async kernel IO and gets some very nice testing and performance benefits as a result. On Linux, io_uring, and on Windows, IO Completion Ports, offer non-blocking IO even for local disk; with SSDs offering deeper and deeper IO queues, this unlocks a reduction in thread count and context switching even in very IO-heavy situations. All of which seems applicable to fjall in its embedded use case, rather than being a special case for the cloud blob storage scenario I've described :)
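To make the colouring cost concrete, here is a minimal sketch of the sync-over-async shape (assuming tokio for the runtime; the blob client is a stub standing in for a real cloud SDK):

```rust
use std::io::Read;
use std::sync::Arc;

// Stand-in for an async blob storage client (a real impl would wrap an
// Azure/S3/GCS SDK). Purely illustrative.
struct BlobClient;

impl BlobClient {
    async fn read_range(&self, _offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
        // A real client would issue an HTTP range request here.
        Ok(vec![0u8; len])
    }
}

// A sync `Read` adapter that blocks on the async client: every read
// pays the function-colouring toll described above.
struct BlobFile {
    handle: tokio::runtime::Handle, // handle to a runtime owned elsewhere
    client: Arc<BlobClient>,
    pos: u64,
}

impl Read for BlobFile {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        // Note: Handle::block_on must not be called from within the
        // runtime's own async context, or it will panic.
        let data = self
            .handle
            .block_on(self.client.read_range(self.pos, buf.len()))?;
        let n = data.len().min(buf.len());
        buf[..n].copy_from_slice(&data[..n]);
        self.pos += n as u64;
        Ok(n)
    }
}
```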
I don't think an async core can be a general, silver-bullet solution. Should it be based on Tokio? Would it even improve performance for non-sync writes and cached reads, which can complete in under 1 µs? I don't think a general solution is possible here. I'll point to this explanation in the sled repo: spacejam/sled#1123 (comment), this experiment done for RocksDB: facebook/rocksdb#11017, and this TiKV experiment with io_uring: https://openinx.github.io/ppt/io-uring.pdf

That being said, I think fjall-rs/fjall#102 could allow some interesting interactions with cloud storage, where you could do (a)synchronous log shipping to an object store, or put WAL entries into a Kafka stream, for example...
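Purely as an illustration of that log-shipping idea (the actual shape of fjall-rs/fjall#102 may be entirely different, and every name here is hypothetical):

```rust
// Hypothetical hook for shipping WAL entries to external storage.
pub trait LogShipper: Send + Sync {
    /// Called after an entry is durable locally; `seqno` orders entries.
    fn ship(&self, seqno: u64, entry: &[u8]);
}

/// Example strategy: buffer entries, then periodically upload a batch
/// object keyed by its sequence-number range.
pub struct ObjectStoreShipper {
    buffer: std::sync::Mutex<Vec<(u64, Vec<u8>)>>,
}

impl LogShipper for ObjectStoreShipper {
    fn ship(&self, seqno: u64, entry: &[u8]) {
        let mut buf = self.buffer.lock().unwrap();
        buf.push((seqno, entry.to_vec()));
        // A real impl would flush `buf` as e.g. `wal/{first}-{last}.bin`
        // via an object storage SDK, or publish to a Kafka topic.
    }
}
```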
Is your feature request related to a problem? Please describe.
I find that for many small services, most of their running cost ends up being a PostgreSQL server which they barely use. Direct blob storage starts to look very attractive, and while multi-instance services couldn't use fjall, slapping a gRPC front-end onto a single service that provides the data model would work very well, I think. But only if the IO layer would work on blob storage rather than requiring local disk.
Describe the solution you'd like
An IO trait compatible with e.g. the Azure/AWS/Google Rust SDKs for blob storage. It needn't be async, since an internal channel can be used to bridge to async, and I understand lsm-tree is not async internally (a sketch of that bridge is below).
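A minimal sketch of the channel bridge (assuming tokio for the runtime; the actual blob fetch is stubbed out):

```rust
use std::sync::mpsc;
use std::thread;

// Requests the sync side can make of the IO worker.
enum Request {
    ReadRange {
        offset: u64,
        len: usize,
        reply: mpsc::Sender<std::io::Result<Vec<u8>>>,
    },
}

// A dedicated thread owns the async runtime; sync callers never touch it.
fn spawn_io_worker() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .expect("failed to build runtime");
        for req in rx {
            match req {
                Request::ReadRange { offset, len, reply } => {
                    // Stand-in for an async SDK call (e.g. an HTTP range
                    // GET against a blob store).
                    let _ = offset;
                    let result: std::io::Result<Vec<u8>> =
                        rt.block_on(async { Ok(vec![0u8; len]) });
                    let _ = reply.send(result);
                }
            }
        }
    });
    tx
}

// Sync caller: send a request, then block until the worker replies.
fn read_range(worker: &mpsc::Sender<Request>, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
    let (reply_tx, reply_rx) = mpsc::channel();
    worker
        .send(Request::ReadRange { offset, len, reply: reply_tx })
        .map_err(|_| std::io::Error::new(std::io::ErrorKind::BrokenPipe, "IO worker gone"))?;
    reply_rx
        .recv()
        .map_err(|_| std::io::Error::new(std::io::ErrorKind::BrokenPipe, "IO worker gone"))?
}
```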
Describe alternatives you've considered
Writing a new, similar project that natively targets blob stores.