Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

terminate called after throwing an instance of 'BadMagicNumberException' #231

Closed
Lithopsian opened this issue Dec 3, 2023 · 26 comments
Closed

Comments

@Lithopsian
Copy link
Contributor

To be specific, from the log:
2023-12-03 21:39:19.922 [30721] DirTreeModel.cpp:645 parent(): THROW Exception: Magic number check failed for address 0x58fb8f221ac0

Note that the line number seems to have changed since the Debian version was built. but the location in parent() should be obvious. I can also reproduce it with a build from master.

To reproduce, fairly reliably but not every time:

  1. Run a package query on all packages.
  2. Open up the tree for one of the packages down to a leaf (I've tried xserver-xorg-core, xserver-xorg, libc6:amd64, mate-themes, and libgl-dev, and crashed them all, but quite a few attempts with firefox-esr didn't crash)
  3. Run a filtered package query (eg. "xserver" or "libc6").

It doesn't always crash, but it seems to be much more reliable after opening some packages than others. I don't see what they have in common. Or it might be just packages that I happen to have been looking at.

@shundhammer
Copy link
Owner

First observations:

I tried it on my desktop PC where I run OpenSUSE Leap 15.5 (rpm based), using a freshly built QDirStat from the current master branch. I could not reproduce the problem.

I tried it with qdirstat pkg:/ from the command line, opening one branch all the way down to a leaf, then using "File" -> "Show Installed Packages" and selecting "xorg", resulting in 12 'xorg-*' packages; no problem.

I tried it again, restarting qdirstat pkg:/ (I also tried that purely from the menus), this time selecting the /usr/bin/Xorg binary from the xorg-x11-server package, then reloading only "xorg" packages; again, no problem. I had thought that maybe a leftover selection in the SelectionModel might interfere if it's the same path as the one before, but that was not the case.

I'll try again with my Laptop that still has an ancient Xubuntu 18.04 LTS (yes, out of support, I know), but also the latest QDirStat.

@shundhammer
Copy link
Owner

It's the same on the Laptop with Xubuntu 18.04 LTS (dpkg based) and a freshly built QDirStat from master: I cannot reproduce this.

Please try a completely fresh rebuild; sometimes even today's sophisticated build systems fail in subtle ways and link some ancient object file to a binary. Please make sure everthing you want to keep is checked in, then

cd src/qdirstat
git checkout master
git clean -dfx
qmake
make -j 8 && sudo make install

(-j 8 for 8 parallel compile jobs, or however many CPU cores your machine has)

And then try again with that freshly built QDirStat.

@Lithopsian
Copy link
Contributor Author

Just to confirm since I didn't make it explicit. This is using a dpkg-based install. Probably not relevant, but who knows.

Some more attempts today. A package filter that returns no packages (eg. "xorg1") can also cause the exception. The package query doesn't have to be against the same package that was opened in the full tree. Might be a red herring, but going with packages near the end of the alphabet seems to help. Still haven't ever got this with firefox-esr or linux-image which sit right up at the top of the list. Sorting the list first (twice, to get alphabetical descending by name) might help.

Just once, after much ctrl-P-ing and opening of packages without hitting the exception, qdirstat just disappeared with no hint in the log. The problem might be entirely elsewhere, but the last logged line is:
2023-12-04 15:31:35.481 [7564] DirTreeModel.cpp:662 sort(): Sorting by NameCol descending

With a completely fresh build, the only difference is the line number for the exception is 588, which matches the code. Obviously lots of library differences, but the Qt version might make a big difference. I have 5.15.8 here. I have about 1,500 packages which is fairly modest. I have amd64 and i386 architectures set in dpkg. Can't think what else might be relevant and major. I have the treemap open. Closed it and it took several attempts but eventually got the exception. Open the treemap pane again and it crashed on the first attempt. Not a statistical sample, but does seem to make a difference.

@shundhammer
Copy link
Owner

shundhammer commented Dec 4, 2023

OK. I tried all kinds of things on both machines, but no problem so far.

My Leap 15.5 uses Qt 5.15, the Xubuntu 18.04 LTS a much older version, of course. But I really don't think that this makes any difference in this context.

The obvious suspect is some FileInfo / DirInfo / PkgInfo pointers pointing to a now-invalid object; something that was nuked as the DirTree was cleared and rebuilt.

That's why all of those objects have a short magic field as their first element, just for safety; because that is usually safest to check if the object is (still or already) a valid one: That field is set to 42 in the constructor and reset to 0 in the destructor, and the CHECK_MAGIC() macro (see Exception.h) checks if the pointer is non-zero, and if it is, if there is a 42 at that position. That might still lead to a segfault, but since it's a read-only access, that is much less likely.

Now, rebuilding the DirTree sends lots of signals to all the world that old pointers are becoming invalid; but that's not so trivial to handle always and everywhere. In some cases, QDirStat does not store pointers, but a path name / URL to prevent that; for example, for selections that are being processed in a batch job: Cleanup actions that iterate over many selected FileInfo items come to mind. If one of them nukes a subtree, another selected one might still be affected, for example if a user selects a directory and also one of its children. If the directory is processed first and then that branch is re-read from disk, the selected child probably became invalid, but is still stored in that set of items that are waiting to be processed.

Such things should happen only very rarely, and definitely not when the whole tree is re-read; that should clear everything. In particular, no dangling FileInfo pointers should be left over.

And from your description, that seems to be what happened to you: Somehow, one or more FileInfo pointers was still left over, not set to NULL, and it tried to traverse up the tree to its parent; which was no longer a valid DirInfo object, and a check revealed that and threw that exception.

When, how and why that happens is the question now. But that will be difficult to investigate if the problem is hard to reproduce.

Maybe start the whole thing in a debugger and set a breakpoint to the code location where that exception is thrown to get a backtrace? gdb is not very user-friendly, but the the embedded version in Qt Creator is.

@shundhammer
Copy link
Owner

You didn't by any chance trigger a new read while one was still running? I just saw that the actions are not disabled, as they should, while reading.

@shundhammer
Copy link
Owner

According to your log snippet in the first comment here, the code location should really be this:

https://github.com/shundhammer/qdirstat/blob/master/src/DirTreeModel.cpp#L588

This is a quite some lines away from DirTreeModel.cpp:645, so it looks like very old code. Does the same thing happen with the latest master or at least with the last released version V1.8.1 (there is a tag for it)?

@shundhammer
Copy link
Owner

shundhammer commented Dec 8, 2023

The problem with this code location is that it's a reimplemented virtual method that is called a gazillion times inside the Qt code, so it's not at all clear when, how and why it was called.

The Qt model/view classes for those widgets all support one internal pointer in the central QModelIndex class so the application can do the mapping between the anonymous QModelIndex and its own data; in this case, a FileInfo pointer that points to one item in QDirStat's DirTree.

The QModelIndex only knows a row and a column number for each item, and it stores the application-supplied internalPointer along with it. For everything else, it asks the application's data model; for example with the QModelIndex::parent() method. That way, it can traverse the tree without knowing anything about it.

And somewhere along that way, it gives us a QModelIndex that has an outdated internalPointer; a non-null one (or we would get a NullPointerException instead) that was already deleted because it's in the FileInfo destructor that the magic field is reset from the holy 42 to the very mortal 0.

It is not completely impossible that a newer Qt version holds on to some more QModelIndex instances that exceed the lifetime of the underlying object that the internalPointer points to; but I am very doubtful about that. I haven't seen that before; which of course does not mean that it never happens.

@Lithopsian
Copy link
Contributor Author

Yes, line 588 with the latest build. The Debian build claims to be 1.8.1, but the changelog for it hails from January.

I don't think I was interrupted ongoing reads. The treemap had displayed which pretty much guarantees the read is complete. Then I started opening up the tree, then reloaded with a package filter. I've tried quite a lot and can't reproduce this with anything except the package queries.

@Lithopsian
Copy link
Contributor Author

I'll have a play with this on a different system at the weekend. I can go with a rather old Ubuntu. I might have to build a new qdirstat for it.

@Lithopsian
Copy link
Contributor Author

So I tried an Ubuntu machine. The distro version was 1.6.1, but I thought I'd give that a try anyway. Got the exception first time:

  1. Run a package query with no filter.
  2. Sort descending for easy access to xserver-xorg-core.
  3. Open up some of the tree, the /usr/bin and /usr/lib directories, and click on a few items.
  4. Run a package query with filter "xserver".

Magic number exception. I had the details pane and treemap open for time out, but the same thing happens with them closed.

@Lithopsian
Copy link
Contributor Author

So I've managed to catch this thing in the debugger. Not sure how much it helps.

It is being called from collapseAll() from readPkg() in MainWindow. Hasn't the tree already been cleared at that point? Looks like Qt didn't get the memo in time? I can dig out more data about the state of variables if you want, but I'm not sure how much it will help.

@shundhammer
Copy link
Owner

Thanks for investigating this further. I'll have a look.

@shundhammer
Copy link
Owner

shundhammer commented Dec 11, 2023

This is the code of QTreeView::collapseAll() in Qt 5.15 (the LTS version of Qt 5.x):

https://github.com/qt/qtbase/blob/5.15/src/widgets/itemviews/qtreeview.cpp#L2740-L2755

I don't see where this would call QAbstractItemModel::parent() directly or indirectly.

Maybe a Debian patch on top of that Qt version does?

@shundhammer
Copy link
Owner

Anyway, commit e4b3f11 is a more defensive approach to checking the internalPointer of that QModelIndex: It simply returns an invalid QModelIndex which is Qt's way of saying "no parent" in this case.

It does leave a bad taste in my mouth, though, since somewhere somehow somebody holds on to a FileInfo (or PkgInfo) pointer that has become invalid.

Why? I don't know.

Is it dangerous, or is it only in that transitional phase where the DirTree and the DirTreeModel and the SelectionModel are cleared? I don't know. With a little bit of luck, it is; but I learned the hard way that relying on luck is not a good strategy in software development.

Please test this with the current master; I could still not reproduce the problem, not on my (aging, admittedly) Xubuntu 18.04 LTS nor on my openSUSE Leap 15.5 (which has that latest Qt 5.15 LTS).

@Lithopsian
Copy link
Contributor Author

I tested this for a few minutes and nothing bad happened. I'm fairly sure the problem I was seeing is gone.

I also had a quick look in the Debian patch tracker. Lots of Qt patches, but I couldn't see anything that looked like it affected QTreeView. In the debugger, there was a fairly substantial stack trace of Qt function calls between collapseAll() and the magic check, but not very informative without a debug version of the library.

@shundhammer
Copy link
Owner

OK. Thank you for coming back to this.

So, for the time being, I'll close this, even if it's just a workaround, not a real fix. Let's reopen it if the problem reappears. And let's have a watchful eye on it.

@shundhammer
Copy link
Owner

I can now reproduce it, too; on my openSUSE Leap 15.5 with RPM.

The problem moved on, of course, to the next CHECK_MAGIC() macro; it's now in

DirTreeModel.cpp:531 flags():  THROW Exception: Magic number check failed for address 0x5349f70

It fails pretty reliably when I do this:

  • Start QDirStat with qdirstat pkg:/

  • When the treemap becomes visible, click into some big tiles - leave the focus with that widget

  • Use Ctrl-P to load a different set of packages, e.g. xorg or emacs

-> Exception and crash.

@shundhammer
Copy link
Owner

I was trying to reconstruct how the signal / slot protocol works between the DirTreeModel and two different views, the DirTreeView and the TreemapView, and the SelectionModel.

I saw that the TreemapView uses a SelectionModelProxy to prevent signal ping-pong; the DirTreeView does not use one. So I wondered if the problem might be specific to the TreemapView being the initiator of the signals.

In past attempts, I had probably always used the DirTreeView and selected items there with the mouse or with the keyboard. That might indeed be the big difference.

Working Theory

Maybe the DirTreeView also needs a SelectionModelProxy to manage blocking and unblocking signals in critical phases; if a signal telling the connected views does not arrive, that view might not get the message that FileInfo pointers are now invalid, and still hold on to them.

@shundhammer
Copy link
Owner

shundhammer commented Dec 16, 2023

After enabling core dumps on that systemd-controlled system, coredumpctl info got me a backtrace. It's basically in the DirTreeView's paint event.

And it happens reliably since I added that BusyPopup to readPkg(): That opens a small splash screen (a.k.a. "toast" in the Android world) and processes some events to ensure it gets displayed, then proceeds to do the time-consuming read operations to query the package database, go through all packages and to lstat() to get the actual file sizes etc. for all the files that belong to each package.

And it appears that those paint operations, needing data from the model, use stored QModelIndexes that hold internal pointers to now outdated FileInfo objects.

Backtrace

#0   raise (libc.so.6 + 0x4ad2b)
#1   abort (libc.so.6 + 0x4c3e5)
#2   n/a (libstdc++.so.6 + 0xa84a7)
#3   n/a (libstdc++.so.6 + 0xb7c1c)
#4   _ZSt9terminatev (libstdc++.so.6 + 0xb7c87)
#5   __cxa_throw (libstdc++.so.6 + 0xb7ee8)
#6   _Z13_throw_helperI23BadMagicNumberExceptionEvRKT_P6LoggerRK7QStringiS8_ (qdirstat + 0x6e0cf)
#7   _ZNK8QDirStat12DirTreeModel5flagsERK11QModelIndex (qdirstat + 0x6b5c9)
          QDirStat::DirTreeModel::flags(QModelIndex const&) const

#8   _ZNK16QTreeViewPrivate18hasVisibleChildrenERK11QModelIndex (libQt5Widgets.so.5 + 0x468c34)
           QTreeViewPrivate::hasVisibleChildren(QModelIndex const&) const

#9   _ZNK16QTreeViewPrivate18itemDecorationRectERK11QModelIndex (libQt5Widgets.so.5 + 0x46c855)
#10  _ZNK16QTreeViewPrivate16itemDecorationAtERK6QPoint (libQt5Widgets.so.5 + 0x46cb00)
#11  _ZNK9QTreeView8drawTreeEP8QPainterRK7QRegion (libQt5Widgets.so.5 + 0x46cd45)
          QTreeView::drawTree(QPainter*, QRegion const&) const

#12  _ZN9QTreeView10paintEventEP11QPaintEvent (libQt5Widgets.so.5 + 0x471285)
#13  _ZN7QWidget5eventEP6QEvent (libQt5Widgets.so.5 + 0x1dfda8)
#14  _ZN6QFrame5eventEP6QEvent (libQt5Widgets.so.5 + 0x28729e)
#15  _ZN17QAbstractItemView13viewportEventEP6QEvent (libQt5Widgets.so.5 + 0x407c93)
#16  _ZN9QTreeView13viewportEventEP6QEvent (libQt5Widgets.so.5 + 0x4721ac)
#17  _ZN23QCoreApplicationPrivate29sendThroughObjectEventFiltersEP7QObjectP6QEvent (libQt5Core.so.5 + 0x2d6e00)
#18  _ZN19QApplicationPrivate13notify_helperEP7QObjectP6QEvent (libQt5Widgets.so.5 + 0x19f512)
#19  _ZN12QApplication6notifyEP7QObjectP6QEvent (libQt5Widgets.so.5 + 0x1a62ff)
#20  _ZN16QCoreApplication15notifyInternal2EP7QObjectP6QEvent (libQt5Core.so.5 + 0x2d7073)
#21  _ZN14QWidgetPrivate14sendPaintEventERK7QRegion (libQt5Widgets.so.5 + 0x1d8235)
#22  _ZN14QWidgetPrivate10drawWidgetEP12QPaintDeviceRK7QRegionRK6QPoint6QFlagsINS_14DrawWidgetFlagEEP8QPainterP21QWidgetRepaintManager (libQt5Widgets.so.5 + 0x1d8e47)
#23  n/a (libQt5Widgets.so.5 + 0x1afa7b)
#24  n/a (libQt5Widgets.so.5 + 0x1b0148)
#25  _ZN14QWidgetPrivate16syncBackingStoreEv (libQt5Widgets.so.5 + 0x1d990f)
#26  _ZN7QWidget5eventEP6QEvent (libQt5Widgets.so.5 + 0x1dff28)
#27  _ZN11QMainWindow5eventEP6QEvent (libQt5Widgets.so.5 + 0x2f1590)
#28  _ZN19QApplicationPrivate13notify_helperEP7QObjectP6QEvent (libQt5Widgets.so.5 + 0x19f53c)
#29  _ZN12QApplication6notifyEP7QObjectP6QEvent (libQt5Widgets.so.5 + 0x1a62ff)
#30  _ZN16QCoreApplication15notifyInternal2EP7QObjectP6QEvent (libQt5Core.so.5 + 0x2d7073)
#31  _ZN23QCoreApplicationPrivate16sendPostedEventsEP7QObjectiP11QThreadData (libQt5Core.so.5 + 0x2d9a91)
#32  n/a (libQt5Widgets.so.5 + 0x4b3f22)
#33  n/a (libQt5Widgets.so.5 + 0x4b8959)
#34  _ZN7QObject5eventEP6QEvent (libQt5Core.so.5 + 0x30b50b)
#35  _ZN14QGraphicsScene5eventEP6QEvent (libQt5Widgets.so.5 + 0x4c444b)
#36  _ZN19QApplicationPrivate13notify_helperEP7QObjectP6QEvent (libQt5Widgets.so.5 + 0x19f53c)
#37  _ZN12QApplication6notifyEP7QObjectP6QEvent (libQt5Widgets.so.5 + 0x1a62ff)
#38  _ZN16QCoreApplication15notifyInternal2EP7QObjectP6QEvent (libQt5Core.so.5 + 0x2d7073)
#39  _ZN23QCoreApplicationPrivate16sendPostedEventsEP7QObjectiP11QThreadData (libQt5Core.so.5 + 0x2d9a91)
#40  n/a (libQt5Core.so.5 + 0x33a313)
#41  g_main_context_dispatch (libglib-2.0.so.0 + 0x5582b)
#42  n/a (libglib-2.0.so.0 + 0x55bd0)
#43  g_main_context_iteration (libglib-2.0.so.0 + 0x55c5c)
#44  _ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt5Core.so.5 + 0x3399bc)
#45  _ZN10QEventLoop13processEventsE6QFlagsINS_17ProcessEventsFlagEEi (libQt5Core.so.5 + 0x2d5537)
#46  _ZN8QDirStat9BusyPopup13processEventsEi (qdirstat + 0x37c94)
#47  _ZN8QDirStat9BusyPopup4postEv (qdirstat + 0x37ce6)
#48  _ZN8QDirStat9BusyPopupC2ERK7QStringP7QWidgetb (qdirstat + 0x37d9c)
#49  _ZN10MainWindow7readPkgERKN8QDirStat9PkgFilterE (qdirstat + 0xd51e6)
#50  _ZN10MainWindow10askOpenPkgEv (qdirstat + 0xd56f4)
#51  _ZN10MainWindow18qt_static_metacallEP7QObjectN11QMetaObject4CallEiPPv (qdirstat + 0x14234b)
#52  n/a (libQt5Core.so.5 + 0x3147e5)
#53  _ZN7QAction9triggeredEb (libQt5Widgets.so.5 + 0x198f82)
#54  _ZN7QAction8activateENS_11ActionEventE (libQt5Widgets.so.5 + 0x19b5cc)
#55  _ZN7QAction5eventEP6QEvent (libQt5Widgets.so.5 + 0x19bf61)
#56  _ZN19QApplicationPrivate13notify_helperEP7QObjectP6QEvent (libQt5Widgets.so.5 + 0x19f53c)
#57  _ZN12QApplication6notifyEP7QObjectP6QEvent (libQt5Widgets.so.5 + 0x1a62ff)
#58  _ZN16QCoreApplication15notifyInternal2EP7QObjectP6QEvent (libQt5Core.so.5 + 0x2d7073)
#59  _ZN12QShortcutMap13dispatchEventEP9QKeyEvent (libQt5Gui.so.5 + 0x196ac6)
#60  _ZN12QShortcutMap11tryShortcutEP9QKeyEvent (libQt5Gui.so.5 + 0x19753a)
#61  _ZN22QWindowSystemInterface19handleShortcutEventEP7QWindowmi6QFlagsIN2Qt16KeyboardModifierEEjjjRK7QStringbt (libQt5Gui.so.5 + 0x143503)
#62  _ZN22QGuiApplicationPrivate15processKeyEventEPN29QWindowSystemInterfacePrivate8KeyEventE (libQt5Gui.so.5 + 0x164687)
#63  _ZN22QGuiApplicationPrivate24processWindowSystemEventEPN29QWindowSystemInterfacePrivate17WindowSystemEventE (libQt5Gui.so.5 + 0x169075)

@shundhammer
Copy link
Owner

shundhammer commented Dec 16, 2023

Commit 4ac7e75 changes the order in readPkg():

  • First clear all data
  • Post the BusyPopup which will process events (including the QTreeView::paintEvent()
  • Collapse the tree items (which are now none since the model is empty) unneeded
  • Read the list of packages from the package manager
  • Iterate over those packages and filter out the requested ones:
    • For each relevant one, fetch the file list and do `lstat()´ to get the current file size

@Lithopsian
Copy link
Contributor Author

I couldn't reproduce this by the same method as you, but it does seem like the same thing. Just taking out the collapseAll() call in the original code made my issue go away. I can't provoke it at all in the latest code. Do you think this fixes the whole problem?

@shundhammer
Copy link
Owner

AFAICS yes.

A look into the Qt lib sources seems to indicate that it's a problem of persistent indexes where the QTreeView stores information which branches are opened vs. which ones are collapsed. How and why that information is not deleted when the view's data model is reset (i.e. cleared) I don't understand; all kinds of nasty things could happen if the view holds on to outdated pointers.

@Lithopsian
Copy link
Contributor Author

So, the latest thing. I don't know if it is the same or not.

  1. Open up a big filesystem. Or a package view of everything.
  2. Open up the tree to level 5.
  3. Click on tile in the treemap.
    Infinite loop, 100% cpu. I don't think it ever comes back. Seems to reproduce reliably here, but not so much with smaller filesystems. I just did it on the view of all unpackaged files on root. Happens on both 1.8.1 and on the latest master, so might be unrelated to the latest fixes. Doesn't seem to happen if I open the tree manually and then select a tile in the opened portion of the tree, although I haven't opened the whole filesystem tree by hand.

@shundhammer
Copy link
Owner

No, that's definitely something completely different; it's s different code path.

Opening a large directory tree to tree level 5 is asking for trouble: It needs to iterate over all directories in the entire tree down to that level and sort the items in that subdirectory (which has O(n * ln(n))) and store the sorted FileInfo pointers in a QList.

If you have enough RAM and swap space, it will return eventually, but it will take a looong time, and it will consume a lot of RAM. If you might begin to swap like crazy.

Initially, that "open to tree level" action was available even down to tree level 9; I reduced it to that much more reasonable level 5. Still, if your tree is large, it might explode in your face. That is a function to be used with caution. And a lot of RAM. ;-)

@Lithopsian
Copy link
Contributor Author

Opening the tree I can do, it takes no time at all. I have RAM to burn although it doesn't seem to use up that much. I can navigate in the opened tree and select items in the DirTreeView, but as soon as I click on a tile, it freezes.

I waited it out and it did come back eventually. The tree was closed back down again except for the selected branch, as expected I suppose, but I think that is what is taking the time. It is closing the open branches one by one and there are thousands. See closeAllExcept() in DirTreeView.

@shundhammer
Copy link
Owner

For DirTree::closeAllExcept(), it needs to iterate over all the open tree branches (of which there are a lot after opening all branches down to tree level 5) and close them all, using the persistent indexes that the QTreeView stores for each open branch. So it is to be expected that it takes a while. How long? I don't know; the QTreeView does mysterious things.

This appears to me to be a clear case of "Doctor, it hurts when I do that!" - "Then don't do that." ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants