
Possible concurrent version #43

Open
MirroredLens opened this issue Nov 7, 2024 · 17 comments

@MirroredLens

Hey I didn't see a discussion section, so opened an issue.

Just wondering if a concurrent version of the ART tree could be implemented with this library, and whether anyone could give a guide/hint/reference on how to do so.

@declanvk
Owner

I think a good spot to start for a concurrent implementation of ART would be The ART of practical synchronization by the same authors of the original ART paper.

I think you could also approach the problem with some other strategies, like fine-grained read/write locks per node.
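
For a rough sense of the paper's approach, here is a minimal sketch of the per-node optimistic version lock that optimistic lock coupling builds on. Nothing here exists in blart today; the names are illustrative, and a real implementation also needs the node-level restart loops around it.

```rust
// Illustrative sketch of a per-node optimistic lock in the style of
// "The ART of practical synchronization"; these names are hypothetical
// and not part of blart. The low bit of `version` means "write-locked",
// and the remaining bits count committed writes.
use std::sync::atomic::{AtomicU64, Ordering};

pub struct OptimisticLock {
    version: AtomicU64,
}

impl OptimisticLock {
    pub fn new() -> Self {
        Self { version: AtomicU64::new(0) }
    }

    /// Readers: snapshot the version, or signal a restart if a writer
    /// currently holds the lock (low bit set).
    pub fn read_lock_or_restart(&self) -> Result<u64, ()> {
        let v = self.version.load(Ordering::Acquire);
        if v & 1 == 1 { Err(()) } else { Ok(v) }
    }

    /// Readers: after reading the node, confirm the version is unchanged;
    /// otherwise the read raced with a writer and must be retried.
    pub fn read_unlock_or_restart(&self, seen: u64) -> Result<(), ()> {
        if self.version.load(Ordering::Acquire) == seen { Ok(()) } else { Err(()) }
    }

    /// Writers: upgrade the previously observed version to a write lock,
    /// restarting if anyone committed a write in between.
    pub fn upgrade_to_write_lock_or_restart(&self, seen: u64) -> Result<(), ()> {
        self.version
            .compare_exchange(seen, seen + 1, Ordering::AcqRel, Ordering::Acquire)
            .map(|_| ())
            .map_err(|_| ())
    }

    /// Writers: publish the change by bumping the version past the lock bit.
    pub fn write_unlock(&self) {
        self.version.fetch_add(1, Ordering::Release);
    }
}
```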

@declanvk added the enhancement and question labels Nov 10, 2024
@declanvk
Owner

#27 (comment) is also a previous comment talking about this subject. Cc @Gab-Menezes

@Gab-Menezes
Collaborator

IMO the concurrent version is kinda hard to implement, or at least I wasn't able to fully understand the paper/other implementations.
Basically we need to do all operations in a loop and check at every step for a stale version, which is kinda strange.
If I had to use blart in a concurrent setting I would do something similar to DashMap and keep multiple locks.
Of course this has a lot of downsides, like extra memory and using locks, but at least it's a starting point.
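
For what it's worth, a minimal sketch of that DashMap-style idea, with std's BTreeMap standing in for the per-shard tree (blart's TreeMap could slot in the same way if its key bounds are satisfied); the shard count, key type, and names are only illustrative:

```rust
// Hash the key to pick a shard, and only take that shard's lock, so
// unrelated keys rarely contend. BTreeMap is a stand-in for the tree.
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

pub struct ShardedTree<V> {
    shards: Vec<RwLock<BTreeMap<Vec<u8>, V>>>,
}

impl<V> ShardedTree<V> {
    pub fn new(num_shards: usize) -> Self {
        let shards = (0..num_shards.max(1))
            .map(|_| RwLock::new(BTreeMap::new()))
            .collect();
        Self { shards }
    }

    // Route a key to its shard by hashing it.
    fn shard_for(&self, key: &[u8]) -> &RwLock<BTreeMap<Vec<u8>, V>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    pub fn insert(&self, key: Vec<u8>, value: V) -> Option<V> {
        self.shard_for(&key).write().unwrap().insert(key, value)
    }

    pub fn get_cloned(&self, key: &[u8]) -> Option<V>
    where
        V: Clone,
    {
        self.shard_for(key).read().unwrap().get(key).cloned()
    }
}
```

One trade-off worth noting: hashing keys to shards gives up cross-shard key ordering, so range scans and prefix searches would have to touch every shard.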

@Gab-Menezes
Collaborator

Or maybe do something similar to MVCC, but I don't know much about the subject.

@MirroredLens
Author

This stuff is a little out of my league, I assume 😅
But if I had to give it a try, out of curiosity, which would be easier to modify: turning blart into a concurrent version, or making congee allow variable-size keys?

@Gab-Menezes
Collaborator

Probably change blart to be concurrent. Congee requires fixed-length keys because it does a lot of optimization based on that.

@Gab-Menezes
Collaborator

As I said in that comment, what probably makes it hard to implement are operations that hold a reference for an extended period of time, like the iterator/entry API. Especially the iterators, since IMO iterators should work in a sane way where the contents of the tree don't change while iterating over it. That's why congee doesn't have iterators.
I know that congee uses crossbeam-epoch as a GC mechanism.
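
For context, here is a minimal crossbeam-epoch sketch showing the basic pin/defer pattern (this is just the crate's standard usage, not anything taken from congee or blart internals): readers pin the current epoch, and a retired allocation is only reclaimed once no pinned reader can still reach it.

```rust
// Minimal epoch-based reclamation example with crossbeam-epoch.
use std::sync::atomic::Ordering;

use crossbeam_epoch::{self as epoch, Atomic, Owned};

fn main() {
    // A shared slot holding a heap allocation (stand-in for a tree node).
    let slot = Atomic::new(1u64);

    {
        // Readers pin the epoch; pointers loaded under this guard stay
        // valid until the guard is dropped.
        let guard = epoch::pin();
        let current = slot.load(Ordering::Acquire, &guard);
        assert_eq!(unsafe { *current.as_ref().unwrap() }, 1);
    }

    {
        // A writer swaps in a new value and defers destruction of the old
        // one until all readers pinned before the swap have unpinned.
        let guard = epoch::pin();
        let old = slot.swap(Owned::new(2u64), Ordering::AcqRel, &guard);
        unsafe { guard.defer_destroy(old) };
    }

    // Reclaim the final value still held by the slot.
    unsafe { drop(slot.into_owned()) };
}
```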

@MirroredLens
Author

MirroredLens commented Nov 13, 2024

If you find it difficult, I think I'd better look elsewhere haha.

Really wanted to use a lock-free concurrent ART tree for a multi-column routing table; I think it would have been the best choice, as far as data structures go.

If you don't mind, could you give your opinion on using a lock-free concurrent B+ tree (concurrent_map, sled's internal index) or a lock-free concurrent skiplist (crossbeam's SkipMap) vs blart as is.

Not sure if I should leave the issue open, feel free to close it if this is outside the scope currently.

Thanks.

@MirroredLens changed the title from "Possible concurrent version?" to "Possible concurrent version" Nov 13, 2024
@declanvk
Owner

If you don't mind, could you give your opinion on using a lock-free concurrent B+ tree (concurrent_map, sled's internal index) or a lock-free concurrent skiplist (crossbeam's SkipMap) vs blart as is.

I'm sorry, I haven't used or researched those much, so I can't really give an opinion.

Not sure if I should leave the issue open, feel free to close it if this is outside the scope currently.

Feel free to leave this open, I think it's useful to track as a feature request.

@MirroredLens
Author

What a feature it will be...one day 🍻

@Gab-Menezes
Collaborator

@MirroredLens I have a few thoughts even though I haven't used concurrent_map nor crossbeam_skiplist.

  1. If you need some of the blart features (fuzzy, prefix, entry, ...), your workload is read-heavy or at least writes can be buffered and batched into a single shot, and the memory cost of cloning the tree is not a problem, then I would use arc_swap with blart (rough sketch after this list). This basically gives you a lock-free version, with the drawback of using a little more memory whenever you want to write to the tree.
  2. Of course locks slow things down, but the biggest problem with a lock is hitting the same cache line every time; that's what causes contention. DashMap solves this by sharding the data and keeping a lock around each shard, which makes the locks almost "free", since you are spreading the load. Hence the almost linear scaling with threads.
  3. And it goes without saying, benchmarking your specific case is best; I don't know the performance characteristics of these two crates. It's not always true that a lock-free version will be faster than a locking one. AFAIK DashMap is one of the fastest implementations precisely because of this: it's basically Rust's HashMap (which is one of the fastest in the world) with a small perf loss because of the locks.
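
A rough sketch of that arc_swap pattern from point 1, assuming blart's TreeMap clones cheaply enough for your sizes and exposes roughly BTreeMap-like new/get/insert for the chosen key type (the CString key and u64 value here are placeholders only):

```rust
// Copy-on-write map: readers load the current snapshot without locking;
// each write clones the tree, mutates the clone, and publishes it.
// NOTE: assumes blart's TreeMap implements Clone and BTreeMap-like
// get/insert for CString keys; adjust to the crate's actual bounds.
use std::ffi::CString;

use arc_swap::ArcSwap;
use blart::TreeMap;

pub struct CowTree {
    inner: ArcSwap<TreeMap<CString, u64>>,
}

impl CowTree {
    pub fn new() -> Self {
        Self { inner: ArcSwap::from_pointee(TreeMap::new()) }
    }

    // Lock-free read path: grab the currently published snapshot.
    pub fn get(&self, key: &CString) -> Option<u64> {
        self.inner.load().get(key).copied()
    }

    // Write path: clone the whole tree, insert, then swap the new version
    // in atomically. `rcu` re-runs the closure if another writer published
    // in the meantime, which is why the key is cloned inside it.
    pub fn insert(&self, key: CString, value: u64) {
        self.inner.rcu(|current| {
            let mut next = (**current).clone();
            next.insert(key.clone(), value);
            next
        });
    }
}
```

Readers never block, and writers pay for a full clone per publish, which is exactly the extra-memory trade-off mentioned above.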

@MirroredLens
Author

MirroredLens commented Nov 16, 2024

  1. 'arc_swap' looks interesting for a read-heavy or analytical workload. But this wide-column routing table I'm working on is also expected to be updated by multiple writers in real time, so that's not really viable for me.

  2. Yeah, it's true that most "lock free" designs aren't truly lock-free, especially under contention. I did look into DashMap as well, but prefix search and range scans are terrible with hash maps in general.

But I did do some research on methods of scaling with shards: partitioning by key hashing, key range, or a custom scheme. And I'm not sure how to best approach it for blart.
• I ruled out key-hashing partitions: even though it would give an even, unbiased distribution across the cores, it would lose locality of reference and be quite terrible for range scans and prefix searches.
• Key-range partitioning initially felt great for range scans and prefix search, but I was concerned about the workload being unevenly distributed over the keys across the cores, which I presume would result in underutilized cores and maybe overwhelm certain shards (based on "hot" keys)?
• A custom partitioning strategy might be the best way to go, but I'm not sure how to get the best of both worlds: ordered keys and even distribution across the cores.

Overall though, I think key-range partitioning might be a good technique and relatively easy to build a sharding strategy around, compared to key hashing or optimistic lock coupling.
This design may just be much faster and more scalable for many types of workloads, and especially for what ART trees are good at: prefix search and range scans.
I think what you said about the iterator APIs could be ported well to this model too.
What do you think about all of this?

@declanvk
Owner

Key-range partitioning initially felt great for range scans and prefix search, but I was concerned about the workload being unevenly distributed over the keys across the cores, which I presume would result in underutilized cores and maybe overwhelm certain shards (based on "hot" keys)?

I think you're right, key partitioning would be a natural way to shard the TreeMap based on key. Without prior knowledge of the key distribution/access distribution, I also think it would be difficult to assign key ranges to shards at construction.

Maybe you could devise some way to periodically re-balance key ranges across shards? If you had some simple statistics per-key (or more aggregated), maybe you could periodically lock some adjacent shards and do the re-balance with a dedicated thread? You could re-balance only based on shard size or you could attempt to re-balance based on heat (number of read/write accesses). Not really sure, this is more just brainstorming.

I definitely want to echo @Gab-Menezes's point that some good profiling/benchmarking would be very important to trial different ideas.

I think what you said about the iterator APIs could be ported well to this model too.

With the sharding model, I think the iterators would be pretty natural. Just lock each shard in key-range order and iterate over the contents.

The trouble comes when the (hypothetical) application expects that the iterator represents a snapshot view of the data structure, which I think is pretty difficult to accomplish (probably why congee leaves it out). As long as the application is able to compromise on that point, I think iterators will still be viable.
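
To make the "lock each shard in key-range order and iterate" idea concrete, here is a rough sketch with std's BTreeMap standing in for the per-shard tree and hypothetical split points; as noted, it is not a snapshot, since each shard reflects whatever it contains at the moment its lock is taken.

```rust
// Key-range sharding with ordered iteration; nothing here is part of blart.
use std::collections::BTreeMap;
use std::sync::RwLock;

pub struct RangeShardedTree<V> {
    // Upper (exclusive) bound of each shard's key range, kept sorted;
    // the last shard implicitly covers everything above the last bound.
    bounds: Vec<Vec<u8>>,
    shards: Vec<RwLock<BTreeMap<Vec<u8>, V>>>,
}

impl<V> RangeShardedTree<V> {
    pub fn new(bounds: Vec<Vec<u8>>) -> Self {
        let shards = (0..=bounds.len())
            .map(|_| RwLock::new(BTreeMap::new()))
            .collect();
        Self { bounds, shards }
    }

    // Route a key to the first shard whose upper bound is above it.
    fn shard_index(&self, key: &[u8]) -> usize {
        self.bounds.partition_point(|b| b.as_slice() <= key)
    }

    pub fn insert(&self, key: Vec<u8>, value: V) {
        let idx = self.shard_index(&key);
        self.shards[idx].write().unwrap().insert(key, value);
    }

    // Visit entries in global key order by locking one shard at a time,
    // lowest key range first. Not a consistent snapshot: each shard is
    // read as of the moment its lock is acquired.
    pub fn for_each_in_order(&self, mut f: impl FnMut(&[u8], &V)) {
        for shard in &self.shards {
            let guard = shard.read().unwrap();
            for (k, v) in guard.iter() {
                f(k.as_slice(), v);
            }
        }
    }
}
```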

@MirroredLens
Author

MirroredLens commented Nov 17, 2024

So,

  1. Without prior knowledge of the keys, we have no way to pre-order the key ranges across the shards.
  2. A re-balancing of key ranges might be required to spread the load across the cores. Like you said, perhaps based on some heuristics like shard size, read/write access, or a combination of several.
  3. Making iterators consistent across shards may require some kind of consistency mechanism like ACID semantics, SSI, or logical/vector clocks.

Given these requirements and limitations, maybe a kind of "adaptive key range sharder" would work where keys and key ranges are re-balanced dynamically based on several heuristics,

  1. Keys/key ranges are assigned and re-balanced across shards dynamically as keys fill the tree, and/or optionally the user can specify some kind of key range configuration for the shards if the distribution is known ahead of time. (?)
  2. Shard size can be monitored to decide things like, when to start using more shards or when to re-balance bigger shards to even out key distribution. (?)
  3. Read/write access over time can be monitored as well to decide when or how to re-balance overloaded shards to even out the core utilization (see the counter sketch after this list). (?)
  4. Not sure which consistency mechanism is the best here, but I'm assuming a vector/logical clock might be the most performant. Especially if we're watching read/write access, we could use that info to time the iterations with an internal clock to maintain snapshot like consistency across the shards. (?)
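
As a tiny sketch of the counters point 3 would need (names and usage are made up for illustration), relaxed atomics are enough since they only feed a heuristic:

```rust
// Hypothetical per-shard "heat" tracking for a re-balancing heuristic.
use std::sync::atomic::{AtomicU64, Ordering};

pub struct ShardStats {
    reads: AtomicU64,
    writes: AtomicU64,
}

impl ShardStats {
    pub fn new() -> Self {
        Self { reads: AtomicU64::new(0), writes: AtomicU64::new(0) }
    }

    // Relaxed ordering is fine: these counters never synchronize data.
    pub fn record_read(&self) {
        self.reads.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_write(&self) {
        self.writes.fetch_add(1, Ordering::Relaxed);
    }

    // A background thread could poll this and trigger a re-balance when
    // one shard is much hotter than the average across all shards.
    pub fn heat(&self) -> u64 {
        self.reads.load(Ordering::Relaxed) + self.writes.load(Ordering::Relaxed)
    }
}
```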

I hope I'm right at least theoretically, but even if this is right and also a very scalable architecture, the complexity seems pretty high to get it all working correctly 🤯

@MirroredLens
Author

Let me know what you guys think of this strategy. And if you guys feel it's worth a try.
Currently on the frontend of things and will focus on the backend in a while. I'd be happy to attempt it in a few weeks.

@declanvk
Owner

Let me know what you guys think of this strategy.

I'm sorry, but I can't really evaluate the strategy without knowing more about your constraints and goals, and I think that is starting to get a bit off-topic for this thread. I think your project does sound cool! And if it ends up using a sharded map with blart underneath, then that would be great.

However, I'd like to keep this thread focused on a "possible concurrent version" of blart, and any data points/suggestions/interest relating to that topic.

@MirroredLens
Author

You're right that I'm focusing on my use case more than a general version (I have a known ordering of keys) 😅
I'd love to share more in the future and thanks for your time!
