Metadata backup failed on large volume #5276

Open · polyrabbit opened this issue Nov 4, 2024 · 8 comments
Labels: kind/feature (New feature or request)

Comments

@polyrabbit (Contributor)

We have a volume with 500M+ inodes, and the metadata backup always fails with the following error:

2024/11/04 18:02:41.794812 juicefs[43] <WARNING>: backup metadata failed: GC life time is shorter than transaction duration, transaction starts at 2024-11-04 17:46:12.324 +0800 CST, GC safe point is 2024-11-04 17:52:34.174 +0800 CST [backup.go:84]
polyrabbit added the kind/feature (New feature or request) label on Nov 4, 2024
@dongjiang1989

Which metadata engine is used: TiKV, Redis, or SQL?

@polyrabbit (Contributor, Author)

TiKV. I suppose the GC safe point is something that comes up a lot in the TiKV engine.
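For context, a minimal sketch of the failing check, assuming TiKV's documented snapshot-isolation behavior (illustrative Go, not TiKV client code): a snapshot read is only valid while its start timestamp is newer than the GC safe point, so any transaction that outlives the GC life time aborts with exactly this error.

package main

import (
	"fmt"
	"time"
)

// checkSnapshot is an illustrative stand-in for the validation TiKV
// performs: a snapshot read stays legal only while its start timestamp
// is newer than the GC safe point, which keeps advancing.
func checkSnapshot(txnStart, gcSafePoint time.Time) error {
	if txnStart.Before(gcSafePoint) {
		return fmt.Errorf("GC life time is shorter than transaction duration, "+
			"transaction starts at %s, GC safe point is %s", txnStart, gcSafePoint)
	}
	return nil
}

func main() {
	// A backup transaction opened 16 minutes ago has fallen behind a
	// safe point that advanced 10 minutes ago, so its reads are rejected.
	now := time.Now()
	fmt.Println(checkSnapshot(now.Add(-16*time.Minute), now.Add(-10*time.Minute)))
}

So a dump that holds a single transaction open longer than the GC life time will fail this way regardless of engine load; at 500M+ inodes it simply becomes likely.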

@davies (Contributor) commented Nov 8, 2024

@polyrabbit Can you try #5080?

@polyrabbit (Contributor, Author)

Unfortunately #5080 still fails with:

2024/11/08 11:01:39.893074 juicefs[50149] <FATAL>: GC life time is shorter than transaction duration, transaction starts at 2024-11-08 10:50:43.874 +0800 CST, GC safe point is 2024-11-08 10:51:34.174 +0800 CST [main.go:31]

But this time it ran longer (13+ minutes) than before. I suppose another transaction is being held open for too long?

@polyrabbit (Contributor, Author)

Update: a second test is working now. The progress shows it needs 10+ hours to finish, so I'll wait to see whether it succeeds tomorrow.

The difference between the two tests is that for the first I rebased #5080 this morning, while for the second I cherry-picked #5080. I suppose there are some conflicts between those commits.

Also, I noticed the backup spends a lot of time sorting large directories:

sort.Slice(entries, func(i, j int) bool { return entries[i].Name < entries[j].Name })

Is it necessary?
[attached: profiling screenshot]
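For reference, a sketch of a cheaper variant, assuming Go 1.21+ and that Name is a string (the simplified Entry type below is hypothetical): slices.SortFunc avoids the reflection-based swapping that sort.Slice relies on, and the sort could be skipped entirely if the dump format does not require ordered entries.

package main

import (
	"fmt"
	"slices"
	"strings"
)

// Entry is a simplified stand-in for the dumped directory entry type.
type Entry struct {
	Name string
}

// sortEntries orders entries by name with the generics-based
// slices.SortFunc, avoiding sort.Slice's reflection overhead.
func sortEntries(entries []*Entry) {
	slices.SortFunc(entries, func(a, b *Entry) int {
		return strings.Compare(a.Name, b.Name)
	})
}

func main() {
	entries := []*Entry{{Name: "b"}, {Name: "a"}}
	sortEntries(entries)
	fmt.Println(entries[0].Name, entries[1].Name) // prints: a b
}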

@polyrabbit (Contributor, Author)

It took 7+ hours to back up 318039153 files.

@davies (Contributor) commented Nov 12, 2024

We are working on a faster dump into a binary format; we'll let you know when it's ready.
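For illustration only, a generic sketch of why a binary dump is faster; the record layout below is hypothetical, not the actual JuiceFS format: length-prefixed raw key/value records skip per-record JSON marshaling entirely.

package main

import (
	"bufio"
	"encoding/binary"
	"os"
)

// writeRecord encodes one key/value pair as
// [4-byte key length][key bytes][4-byte value length][value bytes].
// The layout is hypothetical, shown only to illustrate the idea.
func writeRecord(w *bufio.Writer, key, val []byte) error {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(key)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	if _, err := w.Write(key); err != nil {
		return err
	}
	binary.BigEndian.PutUint32(hdr[:], uint32(len(val)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(val)
	return err
}

func main() {
	w := bufio.NewWriter(os.Stdout)
	defer w.Flush()
	_ = writeRecord(w, []byte("inode:1"), []byte("attr-bytes"))
}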

@polyrabbit (Contributor, Author)

Why not consider merging #5080? Does it have any critical drawbacks? I suppose a streaming scan would also benefit other cases.
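For illustration, a hedged sketch of the chunked-scan idea (the KV interface and names below are hypothetical, not the actual API of #5080): scanning the key range in bounded batches, each under a fresh short-lived snapshot, keeps every transaction comfortably inside the GC life time.

package scan

// KV abstracts a transactional store; ScanRange is assumed to open a
// fresh snapshot per call (hypothetical interface).
type KV interface {
	// ScanRange returns up to limit key/value pairs from [start, end),
	// in key order.
	ScanRange(start, end []byte, limit int) (keys, vals [][]byte, err error)
}

// scanAll streams an entire range in short batches so that no single
// transaction outlives the TiKV GC life time.
func scanAll(kv KV, start, end []byte, batch int, emit func(k, v []byte)) error {
	cursor := start
	for {
		keys, vals, err := kv.ScanRange(cursor, end, batch)
		if err != nil {
			return err
		}
		for i := range keys {
			emit(keys[i], vals[i])
		}
		if len(keys) < batch {
			return nil // range exhausted
		}
		// Resume just after the last key seen: appending a zero byte
		// forms the smallest key strictly greater than it.
		cursor = append(append([]byte{}, keys[len(keys)-1]...), 0)
	}
}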
