"juicefs gc --delete" fall into infinite loop #5335
Comments
Does the edge exist? It seems like the clone process is not complete.
It seems the edge does not exist either?
It seems that after cleaning up the orphan records from the DB, "gc --delete" can succeed and I am seeing the object count in Ceph go down.
So it seems to be detachedNode entries left over from the clone process. The CleanupDetachedNodesBefore call in the GC tool should remove them. The quota warning log is not the critical issue. You can check the detachedNode table.
It seems the table contains the orphan inode 25152661. How do I call CleanupDetachedNodesBefore? I am not seeing any option in "juicefs gc" related to this function.
gc with the --delete flag will call CleanupDetachedNodesBefore.
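A minimal sketch for inspecting those entries before running gc, assuming the PostgreSQL metadata engine stores them in a table named jfs_detached_node with inode and added columns (this naming is an assumption; verify it against your actual schema, e.g. with \dt jfs_* in psql):

-- hypothetical table/column names, following the jfs_ prefix seen in this thread
SELECT inode, added FROM jfs_detached_node ORDER BY added LIMIT 20;

Then running gc with --delete, as in the original report, should remove them:

juicefs gc postgres://jfs_admin:'xxxx'@jfs_meta_url:5432/jfs --delete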
But the original problem is that "gc --delete" encounters the error below and falls into an infinite loop. "gc --delete" could only clean up those detached nodes after I manually deleted the orphan inodes from the jfs_node table. This suggests that the "clone" feature needs to be used with care.
@frostwind I don't think clone caused the problem.
@zhijian-pro
I think this may just be a large amount of data leading to a slow gc. In addition, the repeated log makes you think that you have entered a dead loop, but in fact gc is running; once you manually deleted a large amount of data, the operation became faster.
Makes sense. Thanks!
What happened:
When running gc with the "--delete" option, e.g.
"juicefs gc postgres://jfs_admin:'xxxx'@jfs_meta_url:5432/jfs --delete"
it falls into an infinite loop, as shown below.
What you expected to happen:
It should stop or skip the non-existent inode.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?
I am using the following mount options during most of my tests:
juicefs mount -d -o allow_other --writeback --backup-meta 0 --buffer-size 2000 --cache-partial-only
I used the SQL above to dump the broken directory structure and saw it return about 1.2M records.
From the timestamp (e.g. mtime = 1728496776274948, which is Oct 9 2024), the broken records seem to belong to a directory created by "juicefs clone". For example, inode 25358790 is a directory with 100 files under it, and 25358800 is one of those files; this matches how I created the test directories, each containing 100 empty files. During my test I created a directory "dir1" with about 20 million files in total, where each layer has many subdirectories and each subdirectory has 100 empty files directly under it. After creating "dir1", I used "juicefs clone" to clone dir1 to dir2. My best guess is that these broken inodes are somehow related to the cloned directory.
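For illustration only (not the exact SQL referenced above), a sketch of how nodes with no directory entry could be listed in the PostgreSQL metadata engine. jfs_node and jfs_edge are the tables named in this thread, but the column names below (inode, mtime) are assumptions to check against the actual schema:

-- nodes that no edge points to; the root inode, trash inodes, and nodes detached
-- by an in-progress clone can legitimately show up here, so review before deleting
SELECT n.inode, n.mtime
FROM jfs_node n
LEFT JOIN jfs_edge e ON e.inode = n.inode
WHERE e.inode IS NULL
  AND n.inode > 1;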
I also tried "juicefs fsck --path / --repair --recursive", but it does not seem to fix the issue.
Environment:
JuiceFS version (use juicefs --version) or Hadoop Java SDK version: juicefs version 1.2.1+2024-08-30.cd871d19
Cloud provider or hardware configuration running JuiceFS: on-prem hardware with a Ceph storage backend.
OS (e.g. cat /etc/os-release): CentOS Linux release 7.9.2009 (Core)
Kernel (e.g. uname -a): 5.4.206-200.el7.x86_64 #1 SMP Thu Jul 28 14:58:01 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Object storage (cloud provider and region, or self maintained): Ceph, self hosted
Metadata engine info (version, cloud provider managed or self maintained): PostgreSQL 17.2
Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage): all local network in the same datacenter.
Others: