Document GIT.PROJECT_SUBMODULES_DEPTH #2087

Open
dbeltrankyl opened this issue Jan 30, 2025 · 7 comments
Labels
documentation Improvements or additions to documentation

@dbeltrankyl
Contributor

Hello @BSC-ES/autosubmit

I was looking for %GIT.PROJECT_SUBMODULES_DEPTH% on Read the Docs and saw that it is undocumented.

On Mattermost, tiggi asked:

one general question about the jobs Autosubmit sends: some of them have multiple checked-out Git repos with code. Do they need to keep the .git/ directory? For AQUA's .git, it is 2 GB of incompressible data for every instance.

cc: @ainagaya, @franra9, Leo (I don't know the GitHub username)

Basically, that parameter lets you fetch only the most recent commits of the submodules (a shallow clone) instead of their full history.

GIT:
 PROJECT_SUBMODULES_DEPTH: 1  # any integer
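
For context, here is a minimal sketch of what a depth of 1 would mean on the Git side. It assumes the value is simply forwarded to `git submodule update --depth`; I have not verified this against the Autosubmit code, and `submodule_update_cmd` is a hypothetical helper, not an Autosubmit function.

```python
from typing import List, Optional


def submodule_update_cmd(depth: Optional[int] = None) -> List[str]:
    """Build a `git submodule update` command line.

    Hypothetical helper: assumes the configured depth maps directly onto
    Git's own `--depth` option for `git submodule update`.
    """
    cmd = ["git", "submodule", "update", "--init", "--recursive"]
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be a positive integer")
        # A shallow fetch: keep only the last `depth` commits per submodule.
        cmd += ["--depth", str(depth)]
    return cmd


# With PROJECT_SUBMODULES_DEPTH: 1 this would build:
#   git submodule update --init --recursive --depth 1
print(" ".join(submodule_update_cmd(1)))
```

Leaving the depth unset builds the plain command, which fetches the full history of every submodule.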

More info on the parameter is in this issue: #581

@dbeltrankyl dbeltrankyl added the documentation Improvements or additions to documentation label Jan 30, 2025
@dbeltrankyl dbeltrankyl added this to the 4.1.13 milestone Jan 30, 2025
@kinow
Member

kinow commented Jan 30, 2025

+1 Dani! I replied to @tiggi on Mattermost but forgot to check our docs.

I think that even with a shallow clone, his concern is about data that is not diff-able (I guess they might have some binary test data?) and that creates large Git repos.

My copy of AQUA, which I occasionally use to look at the code in PyCharm, is nearly 1 GB (1.8 GB if I include the venv folder):

(base) bdepaula@bsces107921:~/Development/python/workspace/AQUA$ du -sh .
944M	.

With nearly 680M of that in the .git directory:

(base) bdepaula@bsces107921:~/Development/python/workspace/AQUA$ du -h . | sort -h | tail -n 10
14M	./docs/sphinx
14M	./docs/sphinx/source
19M	./AQUA_tests/models
28M	./notebooks
163M	./AQUA_tests/weights
188M	./AQUA_tests
673M	./.git/objects/pack
676M	./.git/objects
678M	./.git
944M	.

If the .git directory is removed, and considering that most model experiments may include AQUA, I think we are probably talking about a few GB that we could save on the VM.
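
To put a number on this for other checkouts, here is a rough sketch that computes the fraction of a working copy taken up by `.git` (for the figures above, 678M out of 944M, roughly 72%). The helper names `dir_size` and `git_share` are made up for illustration. It sums file sizes rather than disk blocks, so the totals will not match `du` exactly.

```python
import os


def dir_size(path: str) -> int:
    """Sum the sizes in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so that link targets are not counted.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def git_share(checkout: str) -> float:
    """Fraction of the checkout's bytes that live under `.git`."""
    total = dir_size(checkout)
    git_bytes = dir_size(os.path.join(checkout, ".git"))
    return git_bytes / total if total else 0.0


# Example (path is illustrative):
#   print(f"{git_share('/path/to/AQUA'):.0%} of the checkout is .git")
```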

I wonder if it'd make sense to have an option that deletes the git repo after cloning the repo?
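
A rough sketch of what such an option could do after the clone finishes. This is purely hypothetical: `strip_git_dir` is an invented name and nothing in this thread indicates Autosubmit has such a feature. Note that once `.git` is gone the directory is no longer a Git repository, so it cannot be committed from or updated with `git pull`.

```python
import os
import shutil


def strip_git_dir(checkout: str) -> bool:
    """Delete `<checkout>/.git` if present.

    Returns True if something was removed, False otherwise.
    Hypothetical post-clone step, not an existing Autosubmit function.
    """
    git_path = os.path.join(checkout, ".git")
    if os.path.isdir(git_path):
        # Normal clone: .git is a directory holding the object database.
        shutil.rmtree(git_path)
        return True
    if os.path.isfile(git_path):
        # Submodules and worktrees use a .git *file* that points elsewhere.
        os.remove(git_path)
        return True
    return False
```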

@dbeltrankyl
Contributor Author

I wonder if it'd make sense to have an option that deletes the git repo after cloning the repo?

Then we would risk breaking things for people who work locally, since they would not be able to commit their changes. I think @mcastril discussed this many years ago; we can talk about it in the meeting.

@dbeltrankyl
Contributor Author

Another option would be to ask the AQUA developers to repack their Git repository.
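
As a sketch of what "repack" could look like in practice, the helper below builds an aggressive `git repack` command line. The function name and the default values are assumptions for illustration, and the gain may be small if most of the size is binary data that does not delta-compress well.

```python
from typing import List


def repack_cmd(window: int = 250, depth: int = 250) -> List[str]:
    """Build an aggressive `git repack` command line.

    -a  pack everything into a single pack
    -d  delete the packs made redundant by the new one
    -f  recompute deltas instead of reusing existing ones
    Here --depth is the maximum delta chain length; it is unrelated to
    the shallow-clone depth discussed in this issue.
    """
    if window < 1 or depth < 1:
        raise ValueError("window and depth must be positive integers")
    return ["git", "repack", "-a", "-d", "-f",
            f"--window={window}", f"--depth={depth}"]


# To run inside a repository (illustrative):
#   subprocess.run(repack_cmd(), cwd="/path/to/AQUA", check=True)
print(" ".join(repack_cmd()))
```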

@ainagaya
Contributor

Hi @dbeltrankyl, @kinow!

If the problem is AQUA, this shouldn't be an issue in the long term: on the HPC we are using the container, and we only keep the local copy for the task that runs in the VM, aqua-push. I expect to have an AQUA container deployed in the VM soon, and I think the Mattermost conversation you mention is a good moment to bring it up.

On the other hand, we could set this key by default in the minimal.yml template that Autosubmit uses. With the Climate DT workflow we might have a small issue: the DVC repository has multiple branches, and the one chosen by the user is checked out in the local setup. But it's something we could discuss.

Best!

@kinow
Member

kinow commented Jan 30, 2025

Thank you @ainagaya ! @dbeltrankyl I think the solution will be the containers, so we can close it or leave it open until that's solved in the VM. WDYT?

@dbeltrankyl
Contributor Author

dbeltrankyl commented Jan 30, 2025

Thanks everyone,

@kinow well, the %git.project_submodules_depth% directive has to be documented anyway, so we can keep it open until then.

@kinow
Member

kinow commented Jan 30, 2025

True!!
