Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loffli concurrent bug #2351

Closed
tang-qh opened this issue Sep 23, 2024 · 6 comments · Fixed by #2357
Closed

loffli concurrent bug #2351

tang-qh opened this issue Sep 23, 2024 · 6 comments · Fixed by #2357

Comments

@tang-qh
Copy link

tang-qh commented Sep 23, 2024

Required information

the abaCounter in loffli is not unique:

do
    {
        // we are empty if next points to an element with index of Size
        if (oldHead.indexToNextFreeIndex >= m_size || !m_nextFreeIndex)
        {
            return false;
        }

        // NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-pointer-arithmetic) upper limit of index set by m_size
        newHead.indexToNextFreeIndex = m_nextFreeIndex.get()[oldHead.indexToNextFreeIndex];
        newHead.abaCounter += 1;
    } while (!m_head.compare_exchange_weak(oldHead, newHead, std::memory_order_acq_rel, std::memory_order_acquire));

when the swap failed,may be the m_head have been changed multiple times, so newHead.abaCounter += 1 is not enough;

and we did encounter this problem(although the probability is very low):
Mempool [m_chunkSize = 56, numberOfChunks = 5214, used_chunks = 193 ] has no more space left
2024-09-22_13-32-46.367778 [ Error ]ICEORYX error! POSH__MEMPOOL_POSSIBLE_DOUBLE_FREE

newHead.abaCounter += 1 should be -> newHead.abaCounter = oldHead.abaCounter +1
it can fix it;

Operating system:
E.g. Ubuntu 18.04 LTS

Compiler version:
E.g. GCC 7.4.0

Eclipse iceoryx version:
E.g. v1.2.3 or main branch

Observed result or behaviour:
A clear and precise description of the observed result.

Expected result or behaviour:
What do you expect to happen?

Conditions where it occurred / Performed steps:
Describe how one can reproduce the bug.

Additional helpful information

If there is a core dump, please run the following command and add the output to the issue in a separate comment

gdb --batch \
   --ex "shell printf '\n\033[33m#### Local Variables ####\033[m\n'"  --ex "info locals" \
   --ex "shell printf '\n\033[33m#### Threads ####\033[m\n'"          --ex "info threads" \
   --ex "shell printf '\n\033[33m#### Shared Libraries ####\033[m\n'" --ex "info sharedlibrary" \
   --ex "shell printf '\n\033[33m#### Stack Frame ####\033[m\n'"      --ex "info frame" \
   --ex "shell printf '\n\033[33m#### Register ####\033[m\n'"         --ex "info register" \
   --ex "shell printf '\n\033[33m#### Backtrace ####\033[m'"          --ex "thread apply all bt" \
   --core coreDumpFile binaryFile
@elfenpiff
Copy link
Contributor

@tang-qh this is an amazing finding! You are completely right and we need to fix this - the algorithm is not correct here.

And thank you a lot for providing the fix with it!

Btw. when we ported the lock-free code to next-gen iceoryx2 we fixed this by accident, see this here: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/iceoryx2-bb/lock-free/src/mpmc/unique_index_set.rs#L474

@elfenpiff
Copy link
Contributor

@tang-qh Since you found the bug, would you like to have the honor of fixing it and creating a pull request?

If not, I would fix it by applying your solution.

@tang-qh
Copy link
Author

tang-qh commented Sep 24, 2024

@elfenpiff I don't know which branch to submit to, please fix it directly~

@tang-qh
Copy link
Author

tang-qh commented Sep 24, 2024

diff --git a/iceoryx_hoofs/source/concurrent/loffli.cpp b/iceoryx_hoofs/source/concurrent/loffli.cpp
index 54e178c..6d382cc 100644
--- a/iceoryx_hoofs/source/concurrent/loffli.cpp
+++ b/iceoryx_hoofs/source/concurrent/loffli.cpp
@@ -62,7 +62,7 @@ bool LoFFLi::pop(Index_t& index) noexcept
 
         // NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-pointer-arithmetic) upper limit of index set by m_size
         newHead.indexToNextFreeIndex = m_nextFreeIndex.get()[oldHead.indexToNextFreeIndex];
-        newHead.abaCounter += 1;
+        newHead.abaCounter = oldHead.abaCounter + 1;
     } while (!m_head.compare_exchange_weak(oldHead, newHead, std::memory_order_acq_rel, std::memory_order_acquire));
 
     /// comes from outside, is not shared and therefore no synchronization is needed
@@ -104,7 +104,7 @@ bool LoFFLi::push(const Index_t index) noexcept
         // NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-pointer-arithmetic) index is limited by capacity
         m_nextFreeIndex.get()[index] = oldHead.indexToNextFreeIndex;
         newHead.indexToNextFreeIndex = index;
-        newHead.abaCounter += 1;
+        newHead.abaCounter = oldHead.abaCounter + 1;
     } while (!m_head.compare_exchange_weak(oldHead, newHead, std::memory_order_acq_rel, std::memory_order_acquire));

@elBoberido
Copy link
Member

@tang-qh can you create a pull request?

If you cannot sign the eclipse contribution agreement, we could of course also create a PR and merge this. But since you found and fixed the bug you should also have the honor to be the author of the commit.

@tang-qh
Copy link
Author

tang-qh commented Oct 8, 2024

@elBoberido hi, I have submitted a repaired pull request;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants