This repository has been archived by the owner on Sep 22, 2022. It is now read-only.

Crash on APPENDDUP #126

Closed
AskAlexSharov opened this issue Oct 23, 2020 · 16 comments

@AskAlexSharov
Contributor

AskAlexSharov commented Oct 23, 2020

backtrace.log
Oops, it's with -O2, will create one with -O0 in 2-3 hours.

@erthink
Owner

erthink commented Oct 23, 2020

> backtrace.log
> Oops, it's with -O2, will create one with -O0 in 2-3 hours.

-Og is enough in most cases, while running much faster than -O0.

Nonetheless, the current backtrace is enough.
More precisely, it is already clear that a backtrace will not help, since the problem happened earlier; i.e. I need to look for a way to reproduce the bug through simple tests.

@erthink
Owner

erthink commented Oct 23, 2020

Hypothetically, we can try enabling internal audit.
But in your scenarios (and volumes), it may take weeks for the test to reach the bug.

If you have a persistent dataset and a reproducible sequence of operations, we can use an audit-enabled build, but turn audit on only immediately before the transaction in which the error occurs.

However, I would like to have a lean testcase based on the current "test framework", so it can be included in CI.
So I will try to reproduce the problem (i.e. find a testcase), at least today and over the weekend.
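
For reference, enabling the internal audit at runtime looks roughly like the sketch below. This is illustrative only: the exact constants should be checked against the mdbx.h of the build in use, and the audit machinery is part of the debug facilities, so it only does work in builds that include them.

```c
#include "mdbx.h"

/* Sketch: turn on internal audit (plus assertions) shortly before the
 * suspect transaction. MDBX_LOGGER_DONTCHANGE keeps whatever logger was
 * installed previously; availability of audit depends on the build. */
static void enable_internal_audit(void) {
    mdbx_setup_debug(MDBX_LOG_NOTICE,
                     MDBX_DBG_AUDIT | MDBX_DBG_ASSERT,
                     MDBX_LOGGER_DONTCHANGE);
}
```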

@AskAlexSharov
Contributor Author

mdbx_chk didn't find errors in that db. Loading very similar (but not identical) data into an empty db passed successfully. I will collect the full dataset tonight, then try to reduce it.

@erthink
Owner

erthink commented Oct 23, 2020

> mdbx_chk didn't find errors in that db.

This is the expected behavior, because the erroneous transaction was not committed (and cannot be committed).

@AskAlexSharov
Contributor Author

Here is the data set, 14 GB (42 GB): http://188.166.233.190/alex_full.log.gz
Smaller data sets don't reproduce the problem.

@AskAlexSharov
Contributor Author

AskAlexSharov commented Oct 24, 2020

FYI: Something more is broken in the latest devel (I just did git pull): one test in our project failed, but the test is complex and I don't yet understand what's wrong (it looks like a cursor returned too short a value, shorter than anything I put into the DBI, but I'm not 100% sure yet). The current master works.

@AskAlexSharov
Contributor Author

On today's devel my test passed.

@erthink
Owner

erthink commented Oct 26, 2020

I still don't know if there was a problem/error, what caused it, or if anything was fixed.
So I'll continue working with the tests and your test data.

@erthink
Owner

erthink commented Oct 26, 2020

Some methods and results of the checks (besides the rework and running mdbx_test):

  1. I converted your "logs" into a BerkeleyDB text dump, the format historically used by mdbx_dump and mdbx_load.

The HEADER file:

VERSION=3
geometry=l268435456,c268435456,u25769803776,s268435456,g268435456
mapsize=756375552
maxreaders=120
format=bytevalue
database=TBL0001
type=btree
db_pagesize=4096
duplicates=1
dupsort=1
HEADER=END
$ (cat HEADER && gzip -d < alex_full.log.gz | sed 's/^\([0-9a-f]\+\),\([0-9a-f]\+\)$/ \1\n \2/g' && echo DATA=END) | lz4 > src.dump.lz4
  2. I loaded this dump into a fresh DB via mdbx_load.
    Technically, this uses the same cursor-based append operations as your code.
$ lz4 -d < src.dump.lz4 | ./mdbx_load -an proba.db
  3. Then I dumped the DB again, computed the sha256 digest and compared it with the original:
$ ./mdbx_dump -a proba.db | tail -n +4 | sha256sum
mdbx_dump v0.9.1-60
Running for proba.db...
ac8a2c7f9acc2201a4c1fc64829aed476821dbfb4c5685e12a422939c80efb09  -
$ lz4 -d < src.dump.lz4 | tail -n +4 | sha256sum
ac8a2c7f9acc2201a4c1fc64829aed476821dbfb4c5685e12a422939c80efb09  -

The tail -n +4 filter is needed to cut off the header lines, which depend on the DB size; after loading, that size is determined by the data itself.
Accordingly, if the digests match, we can expect the source data and the DB contents to be identical.

Just in case:

$ sha256sum alex_full.log.gz
	0cc400fbeda306d89c4e68aa120a2b5983624e0874578699e33914f9a4a0ae5c  alex_full.log.gz

This is for the current devel branch.
A bit later I will try to check previous versions via git reflog, to understand whether the problem existed and which commit fixed it.

@erthink
Owner

erthink commented Oct 27, 2020

So, I didn't find trouble at commits faddc71, ca2ecf2, 2120e39, 112ce74, etc. (afa264b6, 44b1a3b).

However, there is room for human error.
So, point me to the commit where you had problems, if you have that information.

@AskAlexSharov
Contributor Author

Thanks, a very useful guide on how to get and share mdbx dumps. The only problem is that I can't use mdbx_dump if APPEND_DUP returned an error (because the last key, where the error happened, is not written into the db).

Let me run that script which failed on the last devel again (next time I will add the commit hash to my messages): faddc71

@AskAlexSharov
Contributor Author

Reproduced the MDBX_CORRUPTED problem after APPEND_DUP on faddc71 (with the data I sent you, into a DB which was not empty, but the DBI was empty).
Do I need to send you a full dump of the db?

@erthink
Owner

erthink commented Oct 27, 2020

> Reproduced the MDBX_CORRUPTED problem after APPEND_DUP on faddc71 (with the data I sent you, into a DB which was not empty, but the DBI was empty).
> Do I need to send you a full dump of the db?

I need a way to reproduce the problem.
So in general "yes": I would prefer to get the specific DB in its entirety (not just a dump), if that reproduces the problem.

@erthink
Owner

erthink commented Oct 29, 2020

Please, let's deal with the problem together, even if it requires some effort.
I am waiting for any reasonable information from you.

@AskAlexSharov
Contributor Author

Yes, I'm working on sharing that db file... (I broke my VPN setup and am re-creating it now).

@AskAlexSharov
Contributor Author

AskAlexSharov commented Oct 30, 2020

Good news: I've succeeded in reproducing the problem without the original DB, by loading alex_full.log into an empty DB.

To do so:

  • put data only via APPENDDUP (without APPEND)
  • don't do intermediate commits (only one commit at the end, but the script stops earlier).

mdbx_load does intermediate commits and can't reproduce it (when I do intermediate commits, the problem doesn't reproduce either).
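
The recipe above can be sketched in C against the libmdbx cursor API. This is a sketch only: `read_next_record` is a hypothetical reader for the alex_full.log records, and error handling is trimmed to the essentials; the key point is a single write transaction with MDBX_APPENDDUP and no intermediate commits.

```c
#include "mdbx.h"

/* Hypothetical reader for the alex_full.log records (not shown);
 * returns nonzero while records remain. */
extern int read_next_record(MDBX_val *key, MDBX_val *val);

/* Append everything through a cursor with MDBX_APPENDDUP inside ONE
 * write transaction, with no intermediate commits. */
int load_all(MDBX_env *env) {
    MDBX_txn *txn;
    MDBX_dbi dbi;
    MDBX_cursor *cur;
    int rc = mdbx_txn_begin(env, NULL, 0, &txn);
    if (rc != MDBX_SUCCESS)
        return rc;
    rc = mdbx_dbi_open(txn, "TBL0001", MDBX_CREATE | MDBX_DUPSORT, &dbi);
    if (rc == MDBX_SUCCESS)
        rc = mdbx_cursor_open(txn, dbi, &cur);
    if (rc != MDBX_SUCCESS) {
        mdbx_txn_abort(txn);
        return rc;
    }
    MDBX_val key, val;
    while (read_next_record(&key, &val)) {
        rc = mdbx_cursor_put(cur, &key, &val, MDBX_APPENDDUP);
        if (rc != MDBX_SUCCESS) {   /* MDBX_CORRUPTED surfaced here */
            mdbx_cursor_close(cur);
            mdbx_txn_abort(txn);
            return rc;
        }
    }
    mdbx_cursor_close(cur);
    return mdbx_txn_commit(txn);    /* the single commit at the end */
}
```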
