Skip to content
This repository has been archived by the owner on Oct 15, 2024. It is now read-only.

augeas plugin might crash #3120

Closed
markus2330 opened this issue Oct 27, 2019 · 20 comments
Closed

augeas plugin might crash #3120

markus2330 opened this issue Oct 27, 2019 · 20 comments
Milestone

Comments

@markus2330
Copy link
Contributor

When doing kdb ls I got a crash in Augeas:

==14172== Callgrind, a call-graph generating cache profiler
==14172== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==14172== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==14172== Command: kdb ls /
==14172== 
==14172== For interactive control, run 'callgrind_control -h'.
==14172== brk segment overflow in thread #1: can't grow to 0x4830000
==14172== (see section Limitations in user manual)
==14172== NOTE: further instances of this message will not be shown
free(): double free detected in tcache 2

Sorry, I crashed by the signal SIGABRT
This should not have happened!

Please report the issue at https://issues.libelektra.org/
==14172== 
==14172== Process terminating with default action of signal 6 (SIGABRT): dumping core
==14172==    at 0x4B097BB: raise (raise.c:51)
==14172==    by 0x4AF4534: abort (abort.c:79)
==14172==    by 0x1D631B: catchSignal(int) (main.cpp:110)
==14172==    by 0x4B0983F: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
==14172==    by 0x4B097BA: raise (internal-signals.h:84)
==14172==    by 0x4AF4534: abort (abort.c:79)
==14172==    by 0x4B4B507: __libc_message (libc_fatal.c:181)
==14172==    by 0x4B51C19: malloc_printerr (malloc.c:5341)
==14172==    by 0x4B536FC: _int_free (malloc.c:4193)
==14172==    by 0x4B41880: fclose@@GLIBC_2.2.5 (iofclose.c:77)
==14172==    by 0x4F09313: elektraAugeasGet (augeas.c:540)
==14172==    by 0x4908B3C: kdbGet (kdb.c:554)
==14172== 
==14172== Events    : Ir
==14172== Collected : 30693614075
==14172== 
==14172== I   refs:      30,693,614,075
zsh: abort      valgrind --tool=callgrind kdb ls /

The problem was quite obviously introduced by 339d34d which does not return correctly in the error cases anymore.

A code review is needed if other places with return in macros are affected.

@markus2330 markus2330 assigned ghost Oct 27, 2019
@ghost
Copy link

ghost commented Oct 28, 2019

Could you please give me a reproducible example on how to trigger this error?
I pulled the latest master, installed it (with augeas plugin) and do not experience any problems.

@markus2330
Copy link
Contributor Author

markus2330 commented Oct 28, 2019

The error obviously only occurs if fopen fails. It could be reproduced by using LD_PRELOAD (e.g. see https://stackoverflow.com/questions/35771395/why-doesnt-ld-preload-trick-catch-open-when-called-by-fopen) with a shared library that fails on the Nth call of fopen. If you really want to do that, I can help you how to do it. It might also help to find other segfaults or unhelpful error messages.

Otherwise, as already written above, I think code review is the more efficient approach to find these problems.

@markus2330 markus2330 added the bug label Nov 11, 2019
@markus2330
Copy link
Contributor Author

Any progress here? I got the same segfault again.

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: Datei oder Verzeichnis nicht gefunden.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fdad623e535 in __GI_abort () at abort.c:79
#2  0x000055e1872b330c in catchSignal (signum=<optimized out>) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/main.cpp:110
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x00007fdad623e535 in __GI_abort () at abort.c:79
#6  0x00007fdad6295508 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fdad63a028d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#7  0x00007fdad629bc1a in malloc_printerr (str=str@entry=0x7fdad63a1f58 "free(): double free detected in tcache 2") at malloc.c:5341
#8  0x00007fdad629d6fd in _int_free (av=0x7fdad63d7c40 <main_arena>, p=0x55e18e5b2c30, have_lock=<optimized out>) at malloc.c:4193
#9  0x00007fdad628b881 in _IO_new_fclose (fp=fp@entry=0x55e18e5b2c40) at iofclose.c:77
#10 0x00007fdad5fa3314 in elektraAugeasGet (handle=<optimized out>, returned=0x55e1892bcdd0, parentKey=0x55e1891f61d0)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/augeas/augeas.c:540
#11 0x00007fdad659dc7d in elektraGetDoUpdate (parentKey=0x55e1891f61d0, split=0x55e1892b8320) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/kdb.c:592
#12 kdbGet (handle=<optimized out>, ks=0x55e1891b0390, parentKey=0x55e1891f61d0) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/kdb.c:1236
#13 0x000055e187243a2a in kdb::KDB::get (this=<optimized out>, returned=..., parentKey=...)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/bindings/cpp/include/keyset.hpp:592
#14 0x000055e1872612b2 in FindCommand::execute (this=<optimized out>, cl=...) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/find.cpp:33
#15 0x000055e18724194d in main (argc=<optimized out>, argv=0x7ffc15dd7bc8) at /usr/include/c++/8/bits/unique_ptr.h:342

@ghost
Copy link

ghost commented Nov 11, 2019

I will add it higher in my priority

@ghost ghost mentioned this issue Nov 13, 2019
16 tasks
@markus2330 markus2330 added this to the 0.9.1 milestone Nov 15, 2019
@markus2330
Copy link
Contributor Author

Any progress here? It is only about adding a few return statements and such crashes are really severe.

@ghost
Copy link

ghost commented Nov 15, 2019

I will Look at it in sunday/monday morning...

@markus2330
Copy link
Contributor Author

Thank you. The faster the better as this (and #3197) are release blockers and lcdproc is waiting for a release.

@ghost
Copy link

ghost commented Nov 17, 2019

If you really want to do that, I can help you how to do it. It might also help to find other segfaults or unhelpful error messages.

I fear that you need to help me because I cannot reproduce this error. Do you have a mounted yml file which has some problems in it or why exactly is the augeas plugin called?

@markus2330
Copy link
Contributor Author

Please look at the code. It is easy to see in the code what was done wrongly but such error is difficult to reproduce in all its forms.

Without looking at the code, we can never get Elektra as stable as it was before introducing the new errors. Plenty of regressions were introduced in #2686, this cannot be fixed without understanding the code. return also exists in Java, so the semantics should be clear to you.

@ghost
Copy link

ghost commented Nov 17, 2019

Please look at the code.

I did it more than once ... The whole stacktrace tells me that when I try to close the file descriptor, there is a double "free()" call (free(): double free detected in tcache 2). Though I absolutely cannot relate this to anything I have changed.

It is easy to see in the code what was done wrongly

No it is absolutely not for me because this is related to errors in memory management. Also your sentence is partly insulting as you assume I am too dumb to find the problem because it is "easy"...

return also exists in Java, so the semantics should be clear to you.

Yes but not memory management or this kind of file opening/closing mechanism and as far as I understand this error it is related to this

@markus2330
Copy link
Contributor Author

markus2330 commented Nov 17, 2019

Also your sentence is partly insulting as you assume I am too dumb to find the problem because it is "easy"...

This is your interpretation. "easy" was about that the code changes should not change any semantics, so it is enough to check if semantics were changed. What I said (and still say) is that you should look at the code without any assumptions of free/malloc/memory.

So forget everything about the stacktrace. Only look at 339d34d and about where return has been before and where it is missing now. One more hint: Macros are simple textual replacements, returning from them means to return from wherever the macro is used.

@ghost
Copy link

ghost commented Nov 18, 2019

So forget everything about the stacktrace. Only look at 339d34d and about where return has been before and where it is missing now. One more hint: Macros are simple textual replacements, returning from them means to return from wherever the macro is used.

Ok thank you, this was the information I was missing. I was on the wrong path before.
I opened a PR here: #3244

@markus2330
Copy link
Contributor Author

Was this the only occurrence of a macro that had a return in the whole error changes?

@ghost
Copy link

ghost commented Nov 18, 2019

Btw, augeas was the only case which I remember that had this sort of macro replacement. I replaced all three occurrences.

I will take an extra look this evening in the old revision to check for that

@markus2330
Copy link
Contributor Author

Thank you!

@ghost
Copy link

ghost commented Nov 18, 2019

I have looked through the error settings in v0.8.26 and as far as I can tell it did not occur somewhere else than in the augeas section.

So I think after we merge the PR this can be closed

@markus2330
Copy link
Contributor Author

Thank you for looking! So mounting and augeas were the only regressions.

@ghost
Copy link

ghost commented Nov 20, 2019

I think we may close this issue

@markus2330
Copy link
Contributor Author

Looks good, it does not crash anymore, instead I get:

Sorry, module augeas issued the error C01200:
Installation: Lens not found

@markus2330
Copy link
Contributor Author

Actually this is another known bug: the name of the lens should be printed. #203

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant