feat: IO trait (to permit plugging in cloud blob storage) #42

rbtcollins · 2024-06-11T09:45:57Z

Is your feature request related to a problem? Please describe.

I find that for many small services, most of the cost ends up being a running PostgreSQL server which they barely use. Direct blob storage starts to look very attractive - and while multiple-instance services couldn't use fjall, slapping a gRPC front-end onto a single service that provides the data model would work very well, I think. But only if the IO layer works on blob storage rather than requiring local disk.

Describe the solution you'd like

An IO trait compatible with e.g. the Azure/AWS/Google Rust SDKs for blob storage. It needn't be async, since an internal channel can be used to bridge to async, and I understand lsm-tree is not async internally.

Describe alternatives you've considered

Writing a new, similar project natively targeting blob stores.

marvin-j97 · 2024-06-11T10:28:44Z

Would you mind sharing what you would expect this to look like? I'm thinking of Tree being generic over a trait F that requires Seek + Read + WhateverElseIsNeeded, similar to the way LevelDB defines its files (https://github.com/google/leveldb/blob/068d5ee1a3ac40dabd00d211d5013af44be55bea/helpers/memenv/memenv.cc#L200, https://github.com/google/leveldb/blob/068d5ee1a3ac40dabd00d211d5013af44be55bea/helpers/memenv/memenv.cc#L185)?
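To make the "generic over a trait F" idea concrete, here is a minimal sketch. It is not lsm-tree's actual API: `BlockSource`, `SegmentReader`, `read_block` and `read_from_memory` are hypothetical names, and the trait only covers the `Seek + Read` part mentioned above. The point is that the same reader code works for `std::fs::File` or an in-memory `Cursor`.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical trait (not part of lsm-tree): anything blocks can be read from.
pub trait BlockSource: Read + Seek {}

/// Blanket impl, so `std::fs::File`, `Cursor<Vec<u8>>`, etc. all qualify.
impl<T: Read + Seek> BlockSource for T {}

/// Hypothetical stand-in for a segment reader, generic over its storage.
pub struct SegmentReader<F: BlockSource> {
    file: F,
}

impl<F: BlockSource> SegmentReader<F> {
    pub fn new(file: F) -> Self {
        Self { file }
    }

    /// Reads exactly `len` bytes starting at byte `offset`.
    pub fn read_block(&mut self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        self.file.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        self.file.read_exact(&mut buf)?;
        Ok(buf)
    }
}

/// Usage with an in-memory buffer instead of a file on disk.
pub fn read_from_memory() -> io::Result<Vec<u8>> {
    let mut reader = SegmentReader::new(Cursor::new(b"hello blob storage".to_vec()));
    reader.read_block(6, 4)
}
```

A blob-store backend would have to emulate `Seek` with ranged reads, which is one reason the real trait would probably need more than `Seek + Read`.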

The crate is definitely very hard-coded to use std::fs right now, so it would be quite a huge refactor to get rid of it all. Plus with V2 there's another crate that would need the same treatment. Contributions are greatly appreciated.

rbtcollins · 2024-06-30T20:26:50Z

Been thinking a bit on this.

Code golf bits:

https://matklad.github.io/2021/09/04/fast-rust-builds.html suggests keeping the amount of generic code at crate interfaces very thin.

So this suggests:
A generic struct / structs that expresses the pluggable nature with traits as you describe.
An inner struct that is not generic but holds a dyn implementation of the trait
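One possible reading of that split, as a rough sketch only: the generic part is reduced to a constructor, and everything else lives in a non-generic inner struct behind `Box<dyn ...>`, so it is compiled once rather than once per backend. `Storage`, `MemoryStorage`, `TreeInner` and this `Tree` are hypothetical names for illustration, not fjall or lsm-tree types.

```rust
use std::collections::HashMap;

/// Hypothetical storage trait; object-safe so it can be used as `dyn Storage`.
pub trait Storage {
    fn put(&mut self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory backend, used here only for illustration.
#[derive(Default)]
pub struct MemoryStorage {
    blobs: HashMap<String, Vec<u8>>,
}

impl Storage for MemoryStorage {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.blobs.insert(key.to_owned(), value);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.blobs.get(key).cloned()
    }
}

/// Non-generic core: holds the backend as a trait object.
struct TreeInner {
    storage: Box<dyn Storage>,
}

impl TreeInner {
    fn insert(&mut self, key: &str, value: &[u8]) {
        self.storage.put(key, value.to_vec());
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.storage.get(key)
    }
}

/// Thin public wrapper: only `open` is generic.
pub struct Tree {
    inner: TreeInner,
}

impl Tree {
    pub fn open<S: Storage + 'static>(storage: S) -> Self {
        Tree { inner: TreeInner { storage: Box::new(storage) } }
    }

    pub fn insert(&mut self, key: &str, value: &[u8]) {
        self.inner.insert(key, value);
    }

    pub fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.inner.get(key)
    }
}
```

The trade-off is a virtual call per storage operation in exchange for less monomorphised code in downstream crates.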

However, on reflection, the bigger problem is going to be function colouring: if the traits are synchronous, then the impl for cloud blob storage is going to hold an async runtime of some sort and block on calls into it everywhere (even if masked via a channel as I described). This is not ideal :/.
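A rough sketch of the channel bridge and why it blocks. Everything here is hypothetical (`BlobIo`, `ChannelIo`, `Request`), and a plain worker thread with a `HashMap` stands in for an async runtime driving a cloud SDK. Every call on the synchronous trait sends a request and then parks the caller until the reply arrives.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical synchronous IO trait, as the storage engine would see it.
pub trait BlobIo {
    fn read(&self, key: &str) -> Option<Vec<u8>>;
    fn write(&self, key: &str, data: Vec<u8>);
}

/// Messages sent to the worker; each carries its own reply channel.
enum Request {
    Read { key: String, reply: Sender<Option<Vec<u8>>> },
    Write { key: String, data: Vec<u8>, reply: Sender<()> },
}

/// Bridges sync callers to a background worker over a channel.
pub struct ChannelIo {
    requests: Sender<Request>,
}

impl ChannelIo {
    /// Spawns the worker. In a real backend this thread would own the async
    /// runtime and the blob-store client (assumption); here it owns a map.
    pub fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Request>();
        thread::spawn(move || {
            let mut blobs: HashMap<String, Vec<u8>> = HashMap::new();
            // The loop ends once every `Sender<Request>` has been dropped.
            for request in rx {
                match request {
                    Request::Read { key, reply } => {
                        let _ = reply.send(blobs.get(&key).cloned());
                    }
                    Request::Write { key, data, reply } => {
                        blobs.insert(key, data);
                        let _ = reply.send(());
                    }
                }
            }
        });
        Self { requests: tx }
    }
}

impl BlobIo for ChannelIo {
    fn read(&self, key: &str) -> Option<Vec<u8>> {
        let (reply, response) = mpsc::channel();
        self.requests
            .send(Request::Read { key: key.to_owned(), reply })
            .expect("worker thread is alive");
        // The calling thread blocks here until the worker answers.
        response.recv().expect("worker replied")
    }

    fn write(&self, key: &str, data: Vec<u8>) {
        let (reply, response) = mpsc::channel();
        self.requests
            .send(Request::Write { key: key.to_owned(), data, reply })
            .expect("worker thread is alive");
        response.recv().expect("worker replied");
    }
}
```

The blocking `recv()` in every method is the function-colouring cost described above: the trait looks synchronous, but each call pays a round trip through another thread.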

If the core itself were actually async, with the existing sync interface as a thin shim over the top, that would work pretty nicely, I think. There are some good reasons to want the core to be async, btw - for prior art I'll point you at FoundationDB, which builds the entire system around async kernel IO and has some very nice testing and performance benefits as a result. On Linux, io_uring, and on Windows, I/O Completion Ports, offer non-blocking IO even for local disk - and with SSDs supporting deeper and deeper IO queues, this unlocks a reduction in thread count and context switching even in very high IO situations. All of which seems applicable to fjall in its embedded use case, rather than being a special case for the cloud blob storage scenario I've described :)
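For illustration, a minimal sketch of "async core, sync shim" using only the standard library. `AsyncTree` and this `Tree` are hypothetical stand-ins rather than fjall types, the core does no real IO, and `block_on` is a deliberately tiny single-future executor; a real shim would more likely delegate to an existing runtime.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: polls one future to completion on the current thread.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async core; real code would await disk or network IO here.
pub struct AsyncTree {
    data: HashMap<String, Vec<u8>>,
}

impl AsyncTree {
    pub async fn insert(&mut self, key: &str, value: &[u8]) {
        self.data.insert(key.to_owned(), value.to_vec());
    }

    pub async fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }
}

/// Thin synchronous shim over the async core.
pub struct Tree {
    core: AsyncTree,
}

impl Tree {
    pub fn new() -> Self {
        Tree { core: AsyncTree { data: HashMap::new() } }
    }

    pub fn insert(&mut self, key: &str, value: &[u8]) {
        block_on(self.core.insert(key, value))
    }

    pub fn get(&self, key: &str) -> Option<Vec<u8>> {
        block_on(self.core.get(key))
    }
}
```

Sync callers keep the same API, while async callers could use `AsyncTree` directly and avoid the blocking layer.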

marvin-j97 · 2024-12-10T18:24:20Z

I don't think an async core can be a general, silver-bullet solution. Should it be based on Tokio? Would it even improve performance for non-sync writes and cached reads, which can complete in under 1µs? I don't think it's possible to have a general solution here.

I will point to this explanation in the sled repo: spacejam/sled#1123 (comment)

and this experiment done for RocksDB: facebook/rocksdb#11017

and this TiKV experiment: https://openinx.github.io/ppt/io-uring.pdf

That being said, I think fjall-rs/fjall#102 could allow some interesting interactions with cloud storage, where you could do (a)synchronous log shipping to object storage, or put WAL entries into a Kafka stream, for example...

marvin-j97 added the enhancement, help wanted, possibly breaking, and epic labels Jun 11, 2024

marvin-j97 added the idea label Dec 14, 2024
