
High cpu usage after viewing a CryFS FUSE filesystem #1543

Closed
peesock opened this issue Dec 14, 2023 · 15 comments

Comments

@peesock

peesock commented Dec 14, 2023

CryFS is a program to encrypt a folder of files and view them in a FUSE mounted directory.

After lf even glances at a CryFS directory, it starts using about 14.5% CPU on my 8-thread machine, sometimes dropping down to ~3% for a while before going back up. Obviously CryFS has to use some CPU to build the file tree, but that finishes quite early on, as shown by the cryfs process, and the problem persists even when navigating far away from the CryFS dir.

btop also shows lf's thread count going up over time. One day I woke up to find it using over 1000 threads, which actually seemed to lag my system.

Pressing F1 to open the help text in the pager temporarily halts all CPU usage, which climbs back up when I quit the pager. Any ideas?

@Michael-Gallo
Contributor

Michael-Gallo commented Dec 17, 2023

I'm able to replicate this: about 10% CPU usage if I ever touch the mountdir directory.

It seems that the dirChan channel is somehow getting flooded with a ridiculous number of directories.

When I add a global counter that is incremented every time we reach line 384 (the dirChan receive in app.go) and print it when the program ends, I get a number in the thousands if I open lf in the mountdir for even a few seconds, versus 6 if I open any other directory.

Edit: also, adding even a 5 millisecond delay takes my CPU usage from 10% to 0.8%. The time is actually being spent in app.ui.draw on line 418.

Edit pt2: I think I see why this is happening. When a dir is received on dirChan, we call app.nav.checkDir. Then checkDir goes down a code path where it creates a newDir and passes it back into the same channel, so it loops on itself.

I'm unsure what checkDir is doing and what the purpose of that go func is, so I'm hesitant to propose a solution.
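
To make the cycle easier to see in isolation, here is a small standalone Go program. It is not lf's actual code, only a simplified illustration of the loop described above: every receive from the channel queues another send to the same channel, so the receive loop never goes idle.

package main

import (
	"fmt"
	"time"
)

func main() {
	dirChan := make(chan string, 16)
	dirChan <- "/path/to/mountdir" // the initial load

	receives := 0
	deadline := time.After(100 * time.Millisecond)
	for {
		select {
		case d := <-dirChan:
			receives++
			// The "check" always sees the dir as modified and queues a reload,
			// which re-enters this loop on the next iteration.
			dirChan <- d
		case <-deadline:
			fmt.Println("receives in 100ms:", receives)
			return
		}
	}
}

Running this prints a very large receive count after only 100ms, which is the same unbounded behavior as the counter observation above.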

@joelim-work
Collaborator

checkDir is simply used to check if a directory has been changed (e.g. a new file was created inside it) since the last time it was loaded by lf, and send an update to the dirChan channel to reload it if so. It is called frequently, through calls like app.nav.renew and app.ui.loadFile.

If the directory has been modified, the load timestamp is updated, so theoretically it should not cause an infinite loop:

lf/nav.go

Line 527 in 2a4adb3

dir.loadTime = now

You might want to check the modified time of the directory and the load timestamp to investigate further.
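
For anyone who wants to investigate this on their own machine, below is a tiny standalone probe (hypothetical, not part of lf; the modifiedSince helper and its CLI usage are made up for illustration). It performs the comparison described above: stat the directory and check whether its mtime is newer than a recorded load time, which is the condition that decides whether a reload is queued.

package main

import (
	"fmt"
	"os"
	"time"
)

// modifiedSince reports whether path's mtime is newer than loadTime.
func modifiedSince(path string, loadTime time.Time) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	return info.ModTime().After(loadTime), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: staleprobe <dir>")
		return
	}
	loadTime := time.Now()
	for i := 0; i < 3; i++ {
		time.Sleep(time.Second)
		stale, err := modifiedSince(os.Args[1], loadTime)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		fmt.Println("modified since load:", stale)
		if stale {
			loadTime = time.Now() // mimic dir.loadTime = now
		}
	}
}

On an ordinary directory this keeps printing false; on a directory whose mtime always reads as the current time it prints true on every pass, even though the load time is updated each iteration.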

@Michael-Gallo
Contributor

Michael-Gallo commented Dec 26, 2023

The modified time of a mounted cryfs directory always seems to be the current time. This is without any processes (other than cryfs) touching it


Just a thought: the Blocks count from os.Stat().Sys() on the mounted directory always shows 0, and I can't find any other directories on my system that behave like this. Perhaps we can use that as a sign to return early?
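
A quick standalone probe for that observation (Linux-specific, since it type-asserts to *syscall.Stat_t; not lf code): print the Blocks value that stat reports for each path given on the command line.

package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	for _, path := range os.Args[1:] {
		info, err := os.Stat(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		// Sys() exposes the raw stat data; Blocks is the allocated block count.
		if st, ok := info.Sys().(*syscall.Stat_t); ok {
			fmt.Printf("%s: %d blocks\n", path, st.Blocks)
		}
	}
}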

@joelim-work
Collaborator

That makes sense: if the modified time is constantly updated to the current time, then the directory will constantly be reloaded. I think it would be acceptable to add a check similar to the one below to perform an early return for special types of filesystems:

lf/nav.go

Lines 519 to 523 in 4882945

// XXX: Linux builtin exFAT drivers are able to predict modifications in the future
// https://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/1872504
if s.ModTime().After(now) {
	return
}

Thank you for looking into the root cause.

@joelim-work
Collaborator

joelim-work commented Jan 4, 2024

So it turns out the root cause comes from cryfs (I'm guessing it's due to some underlying library), and the issue was raised a few years ago, but there haven't been any updates. I think it is fine to ignore mtime for cryfs directories as a workaround, though I would prefer it if the upstream issue were fixed.

@Michael-Gallo
Contributor

Following up on detecting this

Due to this weird behavior, the access, modify, and change times are always the same.

This code successfully flags CryFS directories for me without giving a false positive in /tmp:

// isCryfs reports whether the directory looks like a CryFS mount, based on the
// observation above that atime, mtime, and ctime are always identical there.
func isCryfs(directory string) bool {
	directoryStat, err := os.Stat(directory)
	if err != nil {
		return false
	}

	// The raw stat data is only available on platforms that expose syscall.Stat_t.
	syscallStat, isSysCallStat := directoryStat.Sys().(*syscall.Stat_t)
	if !isSysCallStat {
		return false
	}

	return syscallStat.Atim == syscallStat.Ctim && syscallStat.Ctim == syscallStat.Mtim
}

@joelim-work
Collaborator

Hi @Michael-Gallo

I think it's possible that we can try this approach, though I would still be inclined to keep the block count = 0 check. I'm not exactly keen on this kind of solution, but I think having users run into this high CPU usage situation is much worse.

cc @gokcehan
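
For illustration, a sketch of what the combined check could look like (hypothetical; it assumes the same os and syscall imports as the isCryfs function above, and simply adds the block count = 0 condition on top of the timestamp comparison):

// isCryfsStrict is a hypothetical stricter variant: it also requires the
// zero block count observed earlier, so an ordinary directory whose three
// timestamps happen to coincide is not misclassified.
func isCryfsStrict(directory string) bool {
	directoryStat, err := os.Stat(directory)
	if err != nil {
		return false
	}

	syscallStat, isSysCallStat := directoryStat.Sys().(*syscall.Stat_t)
	if !isSysCallStat {
		return false
	}

	sameTimes := syscallStat.Atim == syscallStat.Ctim && syscallStat.Ctim == syscallStat.Mtim
	return sameTimes && syscallStat.Blocks == 0
}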

@Michael-Gallo
Contributor

Just a thought

Maybe whether we run this check could be a boolean option of some sort? I think only a very small minority of users are using CryFS, so for the overwhelming majority this check can only cause unintended behavior, and it's an extra comparison they don't need.

@gokcehan
Owner

@joelim-work Unfortunately, I can't think of a good solution for this either. I think I'm ok with using @Michael-Gallo's new isCryfs function. I'm not a fan of workarounds specific to different filesystems, but sometimes it is inevitable, as with the exFAT case. Otherwise, lf becomes basically unusable on these filesystems. This was the first time I heard about CryFS, so I'm not sure if it is common enough to justify the workaround, but we can still give it a try.

A boolean option might also be a good idea, I'm not sure. But if we are thinking of a new option, how about a more general one to disable the checkDir function altogether, so that it can be used on any filesystem where it becomes a problem. Would that be acceptable?

@Michael-Gallo
Contributor

The project only has 1.9k stars on GitHub (https://github.com/cryfs/cryfs), I had also never heard of it prior to reading this issue, and fundamentally I would say the bug is on their end. So yes, I'm going to agree that we probably shouldn't have a filesystem-specific fix here.

Since the issue is fundamentally that checkDir inserts a directory into dirChan, which then calls checkDir again and causes an infinite loop: what if we made it so dirs have a boolean property with a name like loops? It would be true in almost all parts of the program, but when checkDir creates nd it would set that property to false; then *app.loop() could check that property so it doesn't call checkDir again when it's triggered by a dir that was itself created by checkDir.
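
Here is a standalone sketch of that guard (hypothetical and heavily simplified, reusing the loops field name from the proposal; not a patch against lf): values produced by the check itself are flagged so the receive loop skips re-checking them, which breaks the cycle.

package main

import (
	"fmt"
	"time"
)

type dir struct {
	path  string
	loops bool // false when this dir was produced by checkDir itself
}

func main() {
	dirChan := make(chan *dir, 16)
	dirChan <- &dir{path: "/path/to/mountdir", loops: true}

	checks := 0
	checkDir := func(d *dir) {
		checks++
		// Pretend the mtime always looks newer: queue a reload, but mark it
		// so the loop will not check it again.
		dirChan <- &dir{path: d.path, loops: false}
	}

	deadline := time.After(50 * time.Millisecond)
	for {
		select {
		case d := <-dirChan:
			if d.loops {
				checkDir(d)
			}
		case <-deadline:
			fmt.Println("checkDir calls:", checks) // 1, instead of unbounded
			return
		}
	}
}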

@joelim-work
Collaborator

After spending some more time investigating the issue, I'm not actually sure whether it's necessary to call checkDir again on directories received from dirChan, which is what allows such an infinite loop.

lf/app.go

Lines 381 to 384 in cf99626

case d := <-app.nav.dirChan:
	app.nav.checkDir(d)

According to the commit history, it looks like it was added in 3e36500 to fix issue #338. But the root cause appears to be the fact that directories were loaded asynchronously (via nav.getDirs) before reading the config file, so there is a race condition depending on whether the directories are loaded first, or the config file is read first. In this case the proper fix would have been to ensure the config file is read first before loading any directories, and this was done in 089e6dc.

@Michael-Gallo
Contributor

I opened PR #1607 to remove the call to checkDir and confirmed locally that it does fix the high CPU usage. Will wait for @gokcehan to confirm that this solution isn't likely to cause other problems.

@gokcehan
Owner

Good catch @joelim-work, thanks for digging through the history, and thanks @Michael-Gallo for submitting the patch. The only case I can think of is that maybe this can now be triggered if an option is changed dynamically, without the config file, while a directory is being loaded. Unfortunately, I have not had a chance to try it myself yet. In any case, we can consider those niche cases, so I think I'm ok with the patch as it is.

@joelim-work
Collaborator

joelim-work commented Feb 20, 2024

I did some testing with the new changes, and it looks like a race condition is possible again if set hidden is run dynamically with the following config file definition:

&lf -remote "send $id set hidden"

  1. The directory is loaded and sorted in nav.loadDirInternal with the default settings.
  2. set hidden is resolved, so now the setting value has changed. However, at this time the list of files is still empty, so there is nothing to sort.
  3. The loaded directory is then sent to nav.dirChan (I had to manually add a sleep here to ensure a race condition), but the files are sorted with the original settings, which are now out of date.

I still think it's a niche scenario though. Maybe a more proper fix would be to have nav.checkDir only check mtime and just sort the directory when receiving it from nav.dirChan, but that would end up being a bigger change.

@joelim-work
Collaborator

This is now fixed by #1607
