Bloated git repository #2458
In case it's of interest, here's what I found in terms of repository growth over the last couple of years. These are the data sizes pulled down for each version:

- 4.6.0: 13.55 MiB
- 5.0.0: 17.79 MiB
- 5.4.0-alpha01: 26.98 MiB
- 5.4.0-alpha16: 29.35 MiB
- 5.4.0-alpha22: 30.33 MiB
- 5.4.0-alpha24: 30.50 MiB
- 5.4.0-alpha25: 69.55 MiB
- 5.4.0-alpha26: 70.47 MiB
- master: 70.71 MiB
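A quick check of the numbers above shows that almost all of the growth came in a single step rather than gradually. The sketch below just takes the reported sizes and computes the change between consecutive versions; the names `growth_steps` and `largest_jump` are illustrative, not part of any CIME tooling.

```python
# Repository sizes (MiB) as reported in the comment, in release order.
SIZES_MIB = [
    ("4.6.0", 13.55),
    ("5.0.0", 17.79),
    ("5.4.0-alpha01", 26.98),
    ("5.4.0-alpha16", 29.35),
    ("5.4.0-alpha22", 30.33),
    ("5.4.0-alpha24", 30.50),
    ("5.4.0-alpha25", 69.55),
    ("5.4.0-alpha26", 70.47),
    ("master", 70.71),
]


def growth_steps(sizes):
    """Return (from_version, to_version, delta_mib) for each consecutive pair."""
    return [
        (prev_tag, tag, round(size - prev_size, 2))
        for (prev_tag, prev_size), (tag, size) in zip(sizes, sizes[1:])
    ]


def largest_jump(sizes):
    """Return the consecutive step with the largest size increase."""
    return max(growth_steps(sizes), key=lambda step: step[2])


if __name__ == "__main__":
    for start, end, delta in growth_steps(SIZES_MIB):
        print(f"{start:>14} -> {end:<14} {delta:+6.2f} MiB")
    print("largest jump:", largest_jump(SIZES_MIB))
```

The alpha24 to alpha25 step accounts for about 39 MiB, more than half of the current total.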
@billsacks, thanks for this investigation. I will keep an eye on the acme splits to make sure they aren't adding too many commits. Maybe we should go back to squashing to reduce history bloat?
Yes, do squash commits each way.
I think @billsacks asked for the full history at one point. |
To clarify, I said that squashing didn't seem ideal, but that I was okay with it (#2177 (comment) and #2177 (comment)). Based on a few spot-checks of the last two e3sm split PRs, it looks like the only commits being added to history are ones that actually touched cime, which seems like the right behavior. I'd be concerned if somehow all of the e3sm commits were being added to history, and/or if the number of commits coming from the e3sm splits were, say, an order of magnitude larger than the number of commits being added directly to cime. To summarize: I'm fine with the status quo as long as you keep an eye on this, as you suggest, @jgfouca. But I'm also fine with you squashing them if you prefer that.
I'll go ahead and close this because I think we've resolved it enough; feel free to reopen if you want to discuss further. |
Currently, MPI task-to-compute-node mapping information is output in two places, and only for these two components: in CAM, where it is truncated after the first 256 MPI tasks, and in CLM, where it is truncated after the first 100 MPI tasks. This is not useful in current production runs. Environment variables such as MPICH_CPUMASK_DISPLAY on Cray systems generate data that are unnecessarily verbose for our needs.

Here a share routine is introduced that writes out one line per compute node. Each line contains the compute node name and the list of MPI tasks assigned to that node for a given communicator. The driver calls this routine to write out the task-to-node mapping for the entire coupled model. Separate branches will then introduce it into the individual components, replacing the current logic in CAM and CLM, for example. The share routine also optionally returns the number of compute nodes and the task-to-node mapping, which the internal CAM load balancing needs.

With the call to the shr_taskmap_write routine in the driver, the mapping data the system generates when the corresponding environment variable is set are redundant, so that variable setting is removed for the systems that currently set it.

Fixes #2457

BFB

* origin/worleyph/cime/taskmap:
  Avoid empty env blocks
  Remove unnecessary white space in task-to-node map output
  Modify driver output format
  Uncomment MV2_CPU_MAPPING definition for Anvil
  Modify task map output format
  Unset environment variables to output task-to-node mapping
  Output MPI task to compute node mapping
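The grouping described above (one line per compute node, listing the MPI tasks on that node, plus an optional node count and task-to-node map) can be sketched as follows. This is only an illustration of the idea, not shr_taskmap_write itself, which is a Fortran routine using MPI. It assumes the node name of every rank has already been gathered into a list indexed by rank, and the `"<node> : <ranks>"` output format and the name `build_task_map` are hypothetical.

```python
def build_task_map(node_names):
    """Group MPI ranks by compute node.

    node_names[r] is the (assumed already gathered) node name of rank r.
    Returns (lines, num_nodes, task_to_node):
      lines        -- one "<node> : <rank> <rank> ..." line per node (hypothetical format)
      num_nodes    -- number of distinct compute nodes
      task_to_node -- task_to_node[r] is the 0-based index of rank r's node
    """
    ranks_by_node = {}  # dicts keep insertion order, so nodes appear in first-seen order
    for rank, node in enumerate(node_names):
        ranks_by_node.setdefault(node, []).append(rank)

    node_index = {node: i for i, node in enumerate(ranks_by_node)}
    task_to_node = [node_index[node] for node in node_names]
    lines = [
        f"{node} : " + " ".join(map(str, ranks))
        for node, ranks in ranks_by_node.items()
    ]
    return lines, len(ranks_by_node), task_to_node


if __name__ == "__main__":
    lines, num_nodes, task_to_node = build_task_map(
        ["nid001", "nid001", "nid002", "nid002", "nid001"]
    )
    print("\n".join(lines))
    print(num_nodes, task_to_node)
```

Because the output is one line per node rather than one per task, its length scales with the node count, which is what keeps it readable for full coupled-model runs.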
This issue of repository bloat came up again on today's CIME call. Current repository size:
@jgfouca raised the idea of, at some point, cutting off history older than a certain date and force pushing to master. (We could still keep the old history in an archived repo somewhere.) We didn't decide whether this is worth doing; we can revisit it later.
Just a little additional info: as of today, it takes 12 s to clone the repo on a local file system with a fast network.
I noticed that recent clones of cime are much bigger than they used to be. I tracked the problem back to 92ccb83 (Merge pull request #2312 from ESMCI/jgfouca/branch-for-acme-split-2018-02-22), which added 5,296 commits to history. #2357 also added a lot of commits, and there was a brief discussion in that PR about the issue. It looks like the two most recent e3sm splits (#2406 and #2433) only added a small number of commits. Maybe the changes in #2367 fixed this? There probably isn't much we can do about it at this point, but I want to make sure that the e3sm split process adds relatively few commits going forward, so we don't experience runaway repository bloat. @jgfouca?
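To make "how many commits did a merge add" concrete: the commits a merge brings in are those reachable from its second parent but not from its first, which in git is `git rev-list --count <merge>^1..<merge>^2` (the merge commit itself not included). The Python below is only a toy model of that computation on a hypothetical commit graph, meant to show what the count measures; it is not a tool from the cime repo.

```python
def ancestors(parents, start):
    """Return every commit reachable from `start` (inclusive) via parent links."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def commits_added_by_merge(parents, merge):
    """Count commits reachable from the merge's second parent but not its first.

    Same quantity as `git rev-list --count <merge>^1..<merge>^2`.
    """
    first_parent, second_parent = parents[merge][:2]
    return len(ancestors(parents, second_parent) - ancestors(parents, first_parent))


# Hypothetical toy history: master is A-B-C, a split branch X-Y-Z forked
# from A, and M merges the branch into master.
TOY_PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "X": ["A"],
    "Y": ["X"],
    "Z": ["Y"],
    "M": ["C", "Z"],
}

if __name__ == "__main__":
    print(commits_added_by_merge(TOY_PARENTS, "M"))  # prints 3 (X, Y, Z)
```

A squash merge, by contrast, records the branch as one ordinary single-parent commit, so it adds exactly one commit regardless of how many the branch contained.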