When using a global file system as cache, we need to examine the number of metadata files that SCR creates for each dataset. SCR stores a filemap for each process as well as a number of files resulting from the redundancy encoding, even when using SINGLE. In all, SCR writes 9 metadata files per process, per dataset.
```
ls -ltr /dev/shm/$USER/scr.${SLURM_JOBID}/scr.dataset.12/.scr
-rw------- 1 520 Mar 5 14:35 filemap_0
-rw------- 1  44 Mar 5 14:35 reddescmap.er.er
-rw------- 1 312 Mar 5 14:35 reddescmap.er.shuffile
-rw------- 1 178 Mar 5 14:35 reddescmap.er.0.redset
-rw------- 1 548 Mar 5 14:35 reddescmap.er.0.single.grp_1_of_2.mem_1_of_1.redset
-rw------- 1  44 Mar 5 14:35 reddesc.er.er
-rw------- 1 303 Mar 5 14:35 reddesc.er.shuffile
-rw------- 1 178 Mar 5 14:35 reddesc.er.0.redset
-rw------- 1 548 Mar 5 14:35 reddesc.er.0.single.grp_1_of_2.mem_1_of_1.redset
```
When cache is node-local storage, these files are distributed among the compute nodes. Each node only holds a small subset of the files, and they are written in parallel. However, these files all pile into a single scr.dataset.<id>/.scr directory when cache is a global file system. The number of files written to this single directory scales as O(9*P) where P is the number of processes.
That feels extreme, especially since the application may write just a single shared file in the dataset. For a large-scale run where P=16,000, SCR would produce 144,000 metadata files!
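For a quick sanity check of that count on a real run, here is a minimal sketch (not part of SCR) that tallies the entries in a dataset's `.scr` directory and compares them against 9 files per process. The directory path and process count are passed on the command line and are assumptions for illustration.

```c
/* Sketch: count entries in a dataset's .scr metadata directory and
 * compare against the expected 9 files per process.
 * Usage (hypothetical): ./count_scr_meta <path-to-.scr-dir> <num-processes> */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv)
{
    const char* dir = (argc > 1) ? argv[1] : ".scr";
    int ranks       = (argc > 2) ? atoi(argv[2]) : 1;

    DIR* d = opendir(dir);
    if (d == NULL) {
        perror("opendir");
        return 1;
    }

    long count = 0;
    struct dirent* e;
    while ((e = readdir(d)) != NULL) {
        /* skip "." and ".." */
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0) {
            continue;
        }
        count++;
    }
    closedir(d);

    printf("%s holds %ld files; expected about %ld (9 per process * %d processes)\n",
           dir, count, 9L * ranks, ranks);
    return 0;
}
```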
We have a few options:
- Modify SCR to keep those metadata files in node-local storage even when cache is a global file system.
- Modify er/shuffile/redset to avoid creating (so many) redundancy files in SINGLE.
- Modify scr/er/shuffile/redset to merge data into fewer physical files, combining data from multiple processes or compute nodes into each file (see the sketch after this list).
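As a rough illustration of the third option, the sketch below assumes each rank's metadata can be treated as an opaque byte blob. It gathers all blobs to a single writer, which emits one combined file plus a per-rank offset index instead of one set of files per process. The helper `pack_rank_metadata` and the file name `scr_meta_combined` are hypothetical, not existing SCR/er/shuffile/redset APIs.

```c
/* Sketch of option 3: gather per-rank metadata blobs to one writer and
 * produce a single combined file per dataset, so the directory holds
 * O(1) files instead of O(P). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for packing a rank's filemap/redset entries. */
static void pack_rank_metadata(char** buf, int* size, int rank)
{
    char tmp[64];
    int n = snprintf(tmp, sizeof(tmp), "metadata for rank %d", rank);
    *buf  = strdup(tmp);
    *size = n;
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank, ranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ranks);

    /* Each rank packs its metadata into a byte buffer. */
    char* buf = NULL;
    int size  = 0;
    pack_rank_metadata(&buf, &size, rank);

    /* The root learns each contribution size, then gathers the blobs. */
    int*  sizes  = NULL;
    int*  displs = NULL;
    char* all    = NULL;
    if (rank == 0) {
        sizes  = malloc(ranks * sizeof(int));
        displs = malloc(ranks * sizeof(int));
    }
    MPI_Gather(&size, 1, MPI_INT, sizes, 1, MPI_INT, 0, MPI_COMM_WORLD);

    long total = 0;
    if (rank == 0) {
        for (int i = 0; i < ranks; i++) {
            displs[i] = (int) total;
            total += sizes[i];
        }
        all = malloc(total);
    }
    MPI_Gatherv(buf, size, MPI_BYTE,
                all, sizes, displs, MPI_BYTE, 0, MPI_COMM_WORLD);

    /* One writer produces a single file: a header with the rank count,
     * the per-rank offsets, then the concatenated blobs. */
    if (rank == 0) {
        FILE* fh = fopen("scr_meta_combined", "wb");
        if (fh != NULL) {
            fwrite(&ranks, sizeof(int), 1, fh);
            fwrite(displs, sizeof(int), ranks, fh);
            fwrite(all, 1, total, fh);
            fclose(fh);
        }
        free(sizes);
        free(displs);
        free(all);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

A per-node variant of the same idea would gather over a communicator created with `MPI_Comm_split_type(..., MPI_COMM_TYPE_SHARED, ...)`, giving one combined file per compute node rather than per job.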