Skip to content
This repository has been archived by the owner on Sep 22, 2022. It is now read-only.

Unexpected 'MDBX_PROBLEM' error when reusing cursors from read-only transactions #272

Closed
mriccobene opened this issue Feb 25, 2022 · 7 comments

Comments

@mriccobene
Copy link

I am getting occasionally exceptions (MDBX_PROBLEM) on mdbx_cursor_bind.
For some reasons, I have a cursor with thread storage duration (thread_local) and I rebind it using mdbx_cursor_bind when I need to re-use it for another operation on a different transaction.

The code where the error MDBX_PROBLEM is generated is this:

libmdbx/src/core.c

Lines 16468 to 16474 in d01e44d

if (mc->mc_signature == MDBX_MC_LIVE) {
if (unlikely(!mc->mc_txn ||
mc->mc_txn->mt_signature != MDBX_MT_SIGNATURE)) {
mdbx_error("Wrong cursor's transaction %p 0x%x",
__Wpedantic_format_voidptr(mc->mc_txn),
mc->mc_txn ? mc->mc_txn->mt_signature : 0);
return MDBX_PROBLEM;

In a my failed run the checks of this code are as follows:

  1. mc->mc_signature == MDBX_MC_LIVE => ok, the cursor has thread local duration, it will be closed at program termination (!), so no clean up code was called since the last use
  2. mc->mc_txn != 0 => it is possible: the cursor was previously used so has an old txn address
  3. mc->mc_txn->mt_signature != MDBX_MT_SIGNATURE => this condition is WRONG according to the code but it is NOT_IMPORTANT according to me: the previous transaction was destroyed, we are binding the cursor to another transaction so the memory area of the old transaction may have been reclaimed for other purposes so we are reading dirty data
    On some runs the memory area casually can be untouched so I get no error, sometimes on the contrary can be dirty and I got the error.

I think that the check on the mc->mc_txn is wrong here, it can be omitted.

This is the state of variables:
image

I got the same on the releases: v0.11.4, v0.11.5

@erthink
Copy link
Owner

erthink commented Feb 25, 2022

I will deal with your issue later, but it is reasonable to clarify the essence beforehand:

  • transaction has a linked list of binded cursors;
  • the transaction has a linked list of binded cursors (the mc_txn becomes zero);
  • as another way a cursor can be closed or unbinded explicitly.

But in all these cases, the mc_signature == MDBX_MC_LIVE meas that mc_txn is non-zero and points to the live transaction with valid mt_signature == MDBX_MT_SIGNATURE.

@erthink
Copy link
Owner

erthink commented Feb 26, 2022

Basically, libmdbx's cursors is allocated (malloc/free) internally, and visible outside only as a pointers.
So I didn't understand the reasoning of mentioning thread_local and then about (automatic?) closing the cursor when the program termination.

As I wrote above, a binded cursor is linked to a transaction object and unlink from it on transaction termination or explicitly during cursor rebind or close.

Please provide a test case or a pseudo code of your scenario.

@mriccobene
Copy link
Author

mriccobene commented Mar 2, 2022

I have a simple test case now. Premise: I do not know if I am misusing mdbx API, in this case accept my apologies and please tell me when and how to use the mdbx_cursor_bind.

Try this code:

mdbx::env_managed env = ...
mdbx::cursor_managed cursor;

{
    mdbx::txn_managed txn1 = env.start_read();   // read-only!
    auto map = txn1.create_map(...);
    cursor.bind(txn1, map);
    [[maybe_unused]] auto data = cursor.find(mdbx::slice(...), false);
} // txn1 closed here

[[maybe_unused]] auto dummy = new char[4096];    // hopefully avoid that txn2 will be created at the same address of old txn1

{
    mdbx::txn_managed txn2 = env.start_read();   // read-only!
    auto map = txn2.create_map(...);
    cursor.bind(txn2, map);                      // ERROR HERE!
    [[maybe_unused]] auto data = cursor.find(mdbx::slice(...), false);
} // txn2 closed here

If the second transaction, txn2, will be allocated at a different address than txn1, in the second cursor bind you will see that the cursor has a member mc_txn pointing to the old, destroyed, txn1 so it is a dangling pointer. So when the code that I quoted in my first post is ran, the dangling pointer is used and likely an error is raised.

These are the variables the debug sees in the first and the second bind:

image

@erthink
Copy link
Owner

erthink commented Mar 2, 2022

Seems a bug, I'ill check/fix.

@erthink
Copy link
Owner

erthink commented Mar 3, 2022

Briefly, this commit fixes a missed flaw:

  • Cursor tracking is required to replacing shaded pages and adjusting the positions in writing transactions;
  • Thus, historically, an internal linked list was maintained for a read-write transactions, but not for a read-only.
    For this reason, the API for using cursors should be different for writing and reading transactions;
  • However, the libmdbx's API has been significantly improved, including the ability to reuse cursors and a uniform cursors behavior for any kind of transactions.
    My mistake is that due to working with MithrilDB, I forgot to make a same changes to libmdbx.

@mriccobene
Copy link
Author

Thanks! My test case now works.

@erthink
Copy link
Owner

erthink commented Mar 3, 2022

Thank you for reporting.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants