We have encountered a deadlock when attempting to nest caches, as in a key-key-value multi-dimensional map, where many spawned tasks interact with the cache at once and each task calls wait after inserting the inner value.
All of the tokio (non-blocking) worker threads become blocked trying to obtain a lock on the cache. The await inside wait allows another task that is destined to block to be scheduled, eventually causing deadlock. In the code below, inner_cache.wait() happens while a read lock on lru is held; avoiding the inner_cache.wait() call prevents the deadlock.
We have since restructured our code so it no longer uses nested caches and avoids this situation, but I wanted to share this discovery. This may not be a usage pattern that is intended to work or be supported, but to my eyes it is not immediately obvious that it would deadlock. I speculate that if the CacheProcessor were spawned on a dedicated thread instead of a green thread it might be able to complete the wait group and unstick things (we did not experience a deadlock with this same design pattern when using the sync Cache).
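For reference, here is a rough sketch of the kind of flattening we ended up with. It assumes the same AsyncCache API used in the test below and that the builder accepts a tuple key; it is illustrative only, not our exact code:

```rust
// Illustrative only, not our exact restructuring: flatten the two key dimensions
// into one composite (outer, inner) key so there is no inner cache, and therefore
// no wait() call running while an outer cache entry (and its read lock) is held.
async fn flat_composite_key_example() {
    let max_cost = 10_000;
    let cost = 1;
    let lru = AsyncCacheBuilder::new(max_cost * 10, max_cost as i64)
        .set_ignore_internal_cost(true)
        .finalize(tokio::spawn)
        .expect("failed to create cache");

    let outer_key = 1;
    for inner_key in 1..=10_000 {
        // One flat cache keyed by (outer, inner); no nested cache to wait on.
        lru.insert((outer_key, inner_key), 123, cost).await;
    }
    lru.wait().await.unwrap();
    assert!(lru.get(&(outer_key, 1)).is_some());
}
```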
Here is an example test that can demonstrate the deadlock:
```rust
#[tokio::test(flavor = "multi_thread")]
async fn test_wait_with_read_lock() {
    let max_cost = 10_000;
    let lru = Arc::new(
        AsyncCacheBuilder::new(max_cost * 10, max_cost as i64)
            .set_ignore_internal_cost(true)
            .finalize(tokio::spawn)
            .expect("failed to create cache"),
    );
    let key = 1;
    let cost = 1;
    let mut tasks = Vec::new();
    for i in 1..=10_000 {
        let lru = Arc::clone(&lru);
        tasks.push(tokio::spawn(async move {
            // Get the inner cache for `key`, creating it on first use.
            let inner_cache = match lru.get(&key) {
                Some(v) => v,
                None => {
                    let inner_lru = AsyncCacheBuilder::new(max_cost * 10, max_cost as i64)
                        .set_ignore_internal_cost(true)
                        .finalize(tokio::spawn)
                        .expect("failed to create cache");
                    lru.insert(key, inner_lru, cost).await;
                    lru.wait().await.unwrap();
                    lru.get(&key).unwrap()
                }
            };
            // `inner_cache` is a value reference that holds a read lock on `lru`.
            let inner_cache = inner_cache.value();
            inner_cache.insert(i, 123, cost).await;
            eprintln!("i = {i}, len before wait = {}", inner_cache.len());
            // removing this wait avoids deadlock
            inner_cache.wait().await.unwrap();
        }));
    }
    for task in tasks {
        task.await.unwrap();
    }
}
```
Hi @jrray, I think this is caused by the AsyncWaitGroup implementation in https://github.com/al8n/wg. I suspect we need to replace parking_lot::Mutex and parking_lot::RwLock with async locks for the AsyncCache.
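For context, here is a minimal sketch of the hazard being described and the direction of that fix. It uses tokio::sync::Mutex purely as an illustration (hypothetical code, not the cache's actual internals): a blocking lock guard held across an .await parks the task while other tasks block entire worker threads waiting for the same lock, whereas an async lock makes waiters yield back to the runtime.

```rust
// Illustrative sketch only (hypothetical example, not the cache's actual internals).
// If this used a blocking lock (std / parking_lot) and the guard were held across
// the .await, every worker thread could end up blocked in lock() while the task
// that owns the guard sits parked at the await point, never polled again to
// release it. tokio's async Mutex makes contending tasks yield to the runtime
// instead of blocking the OS thread, so progress continues.
use std::sync::Arc;
use tokio::sync::Mutex;

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    let shared = Arc::new(Mutex::new(0u64));
    let mut tasks = Vec::new();
    for _ in 0..1_000 {
        let shared = Arc::clone(&shared);
        tasks.push(tokio::spawn(async move {
            // Async lock: a contended acquisition suspends the task, freeing the thread.
            let mut guard = shared.lock().await;
            // Await point while the guard is held: fine with an async lock,
            // deadlock-prone with a blocking one under enough concurrency.
            tokio::task::yield_now().await;
            *guard += 1;
        }));
    }
    for t in tasks {
        t.await.unwrap();
    }
}
```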