Strange behavior when often resizing #379

Open
skinkie opened this issue Mar 7, 2025 · 4 comments

skinkie commented Mar 7, 2025

I have some Windows users that cannot use sparse files, and I was asked to implement slowly growing files myself. I am doing this via env.set_mapsize(self.initial_size): effectively, every time lmdb.MapFullError is caught, the database size increases linearly. I am ending up with very odd errors.
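Roughly, the resize path looks like this (a simplified sketch; the names and the growth step are illustrative, not the actual code):

import lmdb

GROWTH_STEP = 256 * 1024 * 1024  # how much the map grows on each MapFullError

def write_batch(env, batch):
    # Write one batch, growing the map and retrying whenever it fills up.
    while True:
        try:
            with env.begin(write=True) as txn:
                for key, value in batch:
                    txn.put(key, value)
            return
        except lmdb.MapFullError:
            # The failed write transaction was aborted by the context manager;
            # grow the map linearly and retry the whole batch.
            current = env.info()['map_size']
            print(f"Resizing LMDB from {current} to {current + GROWTH_STEP} bytes")
            env.set_mapsize(current + GROWTH_STEP)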

The day before yesterday I opened an issue on the upstream bug tracker, which was closed with the statement that this is not a bug in LMDB. The error then was Assertion 'IS_BRANCH(mc->mc_pg[mc->mc_top])' failed in mdb_cursor_sibling(); now I am ending up with:

LMDB full, resizing...
Resizing LMDB from 805306368 to 1073741824 bytes
build/lib/mdb.c:5966: Assertion 'IS_LEAF(mp)' failed in mdb_cursor_next()

My application in essence does the following:

  1. There are two envs: one is read-only, the other one is writing.
  2. The input is read following an extract-transform step; the loading happens in the second env, batched in a separate thread.
  3. The writer resizes each time it hits MapFullError (sketched below).
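In code, the structure is roughly the following (again a sketch; write_batch is the grow-and-retry helper from the sketch above, and GROWTH_STEP and the inline transform are stand-ins):

import threading
import queue
import lmdb

read_env = lmdb.open('input.lmdb', readonly=True)           # env 1: read-only
write_env = lmdb.open('output.lmdb', map_size=GROWTH_STEP)  # env 2: writer

batches = queue.Queue(maxsize=4)

def loader():
    # Separate thread: drains batches into the second env, growing the
    # map on MapFullError via write_batch() from the sketch above.
    while True:
        batch = batches.get()
        if batch is None:
            return
        write_batch(write_env, batch)

thread = threading.Thread(target=loader)
thread.start()

batch = []
with read_env.begin() as txn:
    for key, value in txn.cursor():
        batch.append((key, value + b'-transformed'))  # stand-in for the real transform
        if len(batch) >= 10_000:
            batches.put(batch)
            batch = []
if batch:
    batches.put(batch)
batches.put(None)  # sentinel: stop the loader thread
thread.join()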

The issue can be fully reproduced: given the same input, it results in the same error at the same position. It does not consistently occur upon resizing.

In the upstream bug I was referred to the documentation:

Set the size of the memory map to use for this environment.

The size should be a multiple of the OS page size. The default is 10485760 bytes. The size of the memory map is also the maximum size of the database. The value should be chosen as large as possible, to accommodate future growth of the database. This function should be called after mdb_env_create() and before mdb_env_open(). It may be called at later times if no transactions are active in this process. Note that the library does not check for this condition, the caller must ensure it explicitly.

The new size takes effect immediately for the current process but will not be persisted to any others until a write transaction has been committed by the current process. Also, only mapsize increases are persisted into the environment.

If the mapsize is increased by another process, and data has grown beyond the range of the current mapsize, mdb_txn_begin() will return MDB_MAP_RESIZED. This function may be called with a size of zero to adopt the new size.

Any attempt to set a size smaller than the space already consumed by the environment will be silently changed to the current size of the used space.

My understanding of the above:

"It may be called at later times if no transactions are active in this process. Note that the library does not check for this condition, the caller must ensure it explicitly." I can call the resize each time a transaction fails due to MapFullError, given that there are no other operations on that env. But if I read carefully: in this process. This suggests that no transactions at all may be active, not even reads in the other env.
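If that stricter reading is correct, every transaction in the process, including the reads on the other env, would have to be fenced off from the resize, roughly like this (a sketch of what I think the docs demand, not something I have implemented; the lock is illustrative):

import threading

txn_gate = threading.Lock()  # illustrative: taken around every transaction and every resize

def grow(env, step):
    # set_mapsize may only run while no transaction is active anywhere in
    # this process; the library does not verify this, the caller must.
    with txn_gate:
        env.set_mapsize(env.info()['map_size'] + step)

def read_all(env):
    with txn_gate:  # even read-only transactions on the other env are excluded
        with env.begin() as txn:
            return list(txn.cursor())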

I hope someone can help me in the right direction.

__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44                                                                                                                
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff76a56d3 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:89
#2  0x00007ffff764bba0 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7633582 in __GI_abort () at abort.c:73
#4  0x00007ffff61e56db in mdb_assert_fail (env=0x55555882b8b0, expr_txt=expr_txt@entry=0x7ffff61fc0df "IS_LEAF(mp)", func=func@entry=0x7ffff61fc920 <__func__.11> "mdb_cursor_next", line=line@entry=5966, 
    file=0x7ffff61fc000 "build/lib/mdb.c") at build/lib/mdb.c:1571
#5  0x00007ffff61e6ab9 in mdb_cursor_next (mc=0x555558cfd840, key=0x7fffa2613880, data=0x7fffa2613890, op=MDB_NEXT) at build/lib/mdb.c:5966
#6  0x00007ffff61e9d9b in mdb_cursor_get (mc=0x555558cfd840, key=key@entry=0x7fffa2613880, data=data@entry=0x7fffa2613890, op=op@entry=MDB_NEXT) at build/lib/mdb.c:6480
#7  0x00007ffff61faab8 in _cursor_get_c (op=MDB_NEXT, self=0x7fffa2613830) at lmdb/cpython.c:1977
#8  iter_next (self=0x7fff98484270) at lmdb/cpython.c:3111
#9  0x00007ffff7afb82a in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/bytecodes.c:2324
#10 0x00007ffff7b03d83 in PyEval_EvalCode (co=0x5555556aca50, globals=<optimized out>, locals=0x7ffff7606080) at Python/ceval.c:578
#11 0x00007ffff7b07878 in run_eval_code_obj (tstate=tstate@entry=0x7ffff7e4a090 <_PyRuntime+458992>, co=co@entry=0x5555556aca50, globals=globals@entry=0x7ffff7606080, locals=locals@entry=0x7ffff7606080) at Python/pythonrun.c:1722
#12 0x00007ffff7b079aa in run_mod (mod=mod@entry=0x55555567eb68, filename=filename@entry=0x7ffff7559060, globals=globals@entry=0x7ffff7606080, locals=locals@entry=0x7ffff7606080, flags=flags@entry=0x7fffffffdc30, 
    arena=arena@entry=0x7ffff752be30) at Python/pythonrun.c:1743
#13 0x00007ffff7b07ad9 in pyrun_file (fp=fp@entry=0x5555555a9290, filename=filename@entry=0x7ffff7559060, start=start@entry=257, globals=globals@entry=0x7ffff7606080, locals=locals@entry=0x7ffff7606080, closeit=closeit@entry=1, 
    flags=0x7fffffffdc30) at Python/pythonrun.c:1643
#14 0x00007ffff7b1decc in _PyRun_SimpleFileObject (fp=0x5555555a9290, filename=0x7ffff7559060, closeit=1, flags=0x7fffffffdc30) at Python/pythonrun.c:433
#15 0x00007ffff7b1ea59 in _PyRun_AnyFileObject (fp=0x5555555a9290, filename=0x7ffff7559060, closeit=1, flags=0x7fffffffdc30) at Python/pythonrun.c:78
#16 0x00007ffff7b23cbf in pymain_run_file_obj (program_name=0x7ffff75596f0, filename=0x7ffff7559060, skip_source_first_line=<optimized out>) at Modules/main.c:360
#17 pymain_run_file (config=0x7ffff7decc70 <_PyRuntime+77008>) at Modules/main.c:379
#18 pymain_run_python (exitcode=0x7fffffffdc00) at Modules/main.c:633
#19 Py_RunMain () at Modules/main.c:713
#20 0x00007ffff7635488 in __libc_start_call_main (main=main@entry=0x555555555020 <main>, argc=argc@entry=7, argv=argv@entry=0x7fffffffde58) at ../sysdeps/nptl/libc_start_call_main.h:58
#21 0x00007ffff763554c in __libc_start_main_impl (main=0x555555555020 <main>, argc=7, argv=0x7fffffffde58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffde48) at ../csu/libc-start.c:360
#22 0x0000555555555055 in _start ()
@vEpiphyte

The issue can be fully reproduced.

Do you have a reproducer that can be shared?


skinkie commented Mar 9, 2025

It is in a public GitHub repository, but it is not a "few lines of code". I might have found the issue, though: I think it actually occurs when there is both a reader and a writer on the same table. But I still need to validate that it fails at that exact point.


jnwatson commented Mar 9, 2025

Do you have 2 separate Environment objects in the same process? If so, that's not allowed.


skinkie commented Mar 10, 2025

@jnwatson Two separate objects (one read-only, and one read-write). And with that, I have not seen any issue.
