Some rename changelogs do not update fileclasses #140

Open
thiell opened this issue Jun 25, 2024 · 0 comments

In some cases, with Lustre 2.16 and Robinhood 3.1.7 plus patches from GerritHub (see our branch at https://github.com/stanford-rc/robinhood/ ), and with the RENME/RNMTO Lustre changelog records enabled, a rename does not always update fileclasses. This is quite frequent when using MinIO on top of Lustre, because each uploaded file is renamed to its final destination once the upload is complete.

For example, we had a fileclass like this, which we used to exclude files from a policy:

FileClass miniosys {
    definition { tree == "/elm/*/*/*/*/minio/*/*/.minio.sys" }
}

Files are first created within .minio.sys/, so they initially get the miniosys fileclass. After being renamed out of that tree, however, they sometimes (quite often with MinIO) keep the miniosys fileclass:

     file,             new,   97.92 MB, minio_p-srcc, elm_p-srcc, mr+p-srcc+minio_n2+miniosys+mr_srcc_minio_n2, /elm/stanford/mr/projects/srcc/minio/n2/disk0/sherlock-groups-weekly/eewhite.tar/e5c64363-61fc-4aa0-9616-b8339a25e30e/part.16
# lfs path2fid /elm/stanford/mr/projects/srcc/minio/n2/disk0/sherlock-groups-weekly/eewhite.tar/e5c64363-61fc-4aa0-9616-b8339a25e30e/part.16
[0x280000c5c:0x1942e:0x0]
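
To make the mismatch concrete, here is a minimal Python sketch that checks a path against the fileclass pattern. It is only an approximation: `in_tree` is a hypothetical helper, it assumes each `*` matches exactly one path component (which may not be exactly how Robinhood evaluates `tree`), and `TMP_PATH` is a made-up pre-rename location used for illustration. `FINAL_PATH` is the path from the report line above.

```python
from fnmatch import fnmatchcase

PATTERN = "/elm/*/*/*/*/minio/*/*/.minio.sys"

# Final path, taken from the report line above.
FINAL_PATH = (
    "/elm/stanford/mr/projects/srcc/minio/n2/disk0/sherlock-groups-weekly/"
    "eewhite.tar/e5c64363-61fc-4aa0-9616-b8339a25e30e/part.16"
)
# Hypothetical pre-rename path somewhere below .minio.sys (illustration only).
TMP_PATH = "/elm/stanford/mr/projects/srcc/minio/n2/disk0/.minio.sys/tmp/example-upload/part.16"


def in_tree(path, tree_pattern):
    """Approximate `tree == tree_pattern`: True if `path` is the matched
    directory or lies below it. Assumes '*' matches one path component."""
    pat_parts = tree_pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    if len(path_parts) < len(pat_parts):
        return False
    # zip() stops at the end of the pattern, so only the leading
    # components of the path are compared against it.
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pat_parts))


if __name__ == "__main__":
    print("pre-rename in miniosys tree: ", in_tree(TMP_PATH, PATTERN))
    print("post-rename in miniosys tree:", in_tree(FINAL_PATH, PATTERN))
```

Under these assumptions the final path falls outside the miniosys tree, so an entry whose fileclasses were re-evaluated after the rename should no longer carry miniosys.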

Full logs with this FID:

2024/06/20 13:24:04 [1197304/3] ChangeLog | elm-MDT0002: 59965585 01CREAT 1718915044.658417518 0x0 t=[0x280000c5c:0x1942e:0x0] p=[0x280000c5c:0x1942d:0x0] part.16
2024/06/20 13:24:07 [1197304/3] ChangeLog | elm-MDT0002: 59965586 17MTIME 1718915047.297580513 0x6 t=[0x280000c5c:0x1942e:0x0]
2024/06/20 13:24:07 [1197304/3] ChangeLog | elm-MDT0002: 59965587 11CLOSE 1718915047.297599588 0xc2 t=[0x280000c5c:0x1942e:0x0]
2024/06/20 13:24:07 [1197304/3] ChangeLog | elm-MDT0002: 59965588 08RENME 1718915047.343062670 0x0 t=[0:0x0:0x0] p=[0x280000c5c:0x155ad:0x0] part.16 s=[0x280000c5c:0x1942e:0x0] sp=[0x280000c5c:0x1942d:0x0] part.16
2024/06/20 13:24:07 [1197304/3] ChangeLog | Rename: object=[0x280000c5c:0x1942e:0x0], old parent/name=[0x280000c5c:0x1942d:0x0]/part.16, new parent/name=[0x280000c5c:0x155ad:0x0]/part.16
2024/06/20 13:24:14 [1197304/17] EntryProc | [0x280000c5c:0x1942e:0x0]: run_all_cl_cb=none
2024/06/20 13:24:14 [1197304/17] EntryProc | RECORD: CREAT [0x280000c5c:0x1942e:0x0] 0 part.16 => getstripe=1, getattr=1, getpath=1, readlink=0, getstatus()
2024/06/20 13:24:17 [1197304/16] EntryProc | RECORD: RENME [0x280000c5c:0x1942e:0x0] 0 part.16 => getstripe=0, getattr=0, getpath=0, readlink=0, getstatus()
2024/06/20 13:24:17 [1197304/16] EntryProc | Parent dir for entry [0x280000c5c:0x1942e:0x0] is unknown (parent: [0x280000c5c:0x1942d:0x0], child name: 'part.16'): updating entry path info
2024/06/20 13:24:17 [1197304/16] EntryProc | [0x280000c5c:0x1942e:0x0]: run_all_cl_cb=none
2024/06/20 13:24:17 [1197304/16] EntryProc | RECORD: RNMTO [0x280000c5c:0x1942e:0x0] 0 part.16 => getstripe=0, getattr=1, getpath=0, readlink=0, getstatus()
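
When filtering these logs by FID, note that in the RENME record above `t=` is null and the renamed object appears in `s=` (with the old parent in `sp=` and the new parent in `p=`), which agrees with the "Rename:" line. Below is a rough Python sketch for pulling the raw changelog records for one FID out of a Robinhood log; the regular expressions are derived only from the lines in this excerpt, and `parse_record`, `object_fid` and `records_for_fid` are hypothetical helper names.

```python
import re

# Raw changelog lines as they appear in the excerpt above.
RECORD_RE = re.compile(
    r"ChangeLog \| (?P<mdt>\S+): (?P<index>\d+) (?P<code>\d{2})(?P<type>\w+) "
    r"(?P<time>\d+\.\d+) (?P<flags>0x[0-9a-f]+)(?P<rest>.*)$"
)
FID_FIELD_RE = re.compile(r"\b(t|p|s|sp)=(\[[^\]]+\])")

FID = "[0x280000c5c:0x1942e:0x0]"
SAMPLE_LINES = [
    "2024/06/20 13:24:04 [1197304/3] ChangeLog | elm-MDT0002: 59965585 01CREAT "
    "1718915044.658417518 0x0 t=[0x280000c5c:0x1942e:0x0] p=[0x280000c5c:0x1942d:0x0] part.16",
    "2024/06/20 13:24:07 [1197304/3] ChangeLog | elm-MDT0002: 59965588 08RENME "
    "1718915047.343062670 0x0 t=[0:0x0:0x0] p=[0x280000c5c:0x155ad:0x0] part.16 "
    "s=[0x280000c5c:0x1942e:0x0] sp=[0x280000c5c:0x1942d:0x0] part.16",
]


def parse_record(line):
    """Return a dict for a raw changelog line, or None for any other log line."""
    m = RECORD_RE.search(line)
    if m is None:
        return None
    rec = {"mdt": m["mdt"], "index": int(m["index"]), "type": m["type"]}
    rec.update(dict(FID_FIELD_RE.findall(m["rest"])))  # t=, p=, s=, sp=
    return rec


def object_fid(rec):
    """FID of the object a record is about: s= for RENME, t= otherwise."""
    return rec.get("s") if rec["type"] == "RENME" else rec.get("t")


def records_for_fid(lines, fid):
    """Parsed changelog records whose object FID equals `fid`."""
    parsed = (parse_record(line) for line in lines)
    return [rec for rec in parsed if rec is not None and object_fid(rec) == fid]


if __name__ == "__main__":
    for rec in records_for_fid(SAMPLE_LINES, FID):
        print(rec["index"], rec["type"], "new parent:", rec.get("p"), "old parent:", rec.get("sp"))
```

A plain search on `t=<FID>` would miss the RENME record, which is why the sketch looks at `s=` for renames.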

I have been trying to troubleshoot this, without success so far. Simple rename cases just work, but at scale with MinIO there seems to be a race condition in which the fileclasses are not updated. One particularity of MinIO is that the file is created in a temporary directory under .minio.sys, and that directory is deleted right after the file is renamed to its final destination. Thus, by the time the rename changelog record is processed, the file's original parent directory no longer exists. This could be why we see this race so often with MinIO, and perhaps why "Parent dir for entry [0x280000c5c:0x1942e:0x0] is unknown" is logged here, but I am not 100% sure.

I am opening this ticket to keep track of the issue. For now, I will work around it by including tree != "/elm/*/*/*/*/minio/*/*/.minio.sys" directly in the policy condition instead of using a fileclass.

Note that running a full scan does fix the fileclasses.
