Skip to content

Latest commit

 

History

History
321 lines (236 loc) · 20.6 KB

08-VendoringDependenciesAsSubmodules.adoc

File metadata and controls

321 lines (236 loc) · 20.6 KB

Vendoring dependencies as submodules

In this chapter you will learn how to maintain dependencies on other Git-based software projects within your own using Git submodules.

  • When submodules are useful

  • How to add a submodule to a repository

  • How to view the status of the submodules in a repository

  • How to update and initialize all submodules in a repository

  • How to run a command in every submodule in a repository

When are submodules useful?

Almost all software projects will make use of other software projects as libraries or tools. For example, say you’re using Git and writing a desktop application in C, and you want to communicate with a server that provides a JSON API. Rather than writing the JSON handling yourself, you find an open source project on GitHub that provides a library for accessing JSON APIs with C. You want to include this open source library into your project and update it when they’ve released new versions with new bug fixes you need.

There are generally two approaches to handling other software projects (usually known as dependencies) and which versions work with your own software:

  1. Write documentation for what other software projects are required, what versions they should be, and where they should be installed so other developers building the project know how to set it up correctly.

  2. Include the dependencies in the project’s repository so they’re always available to anyone when cloning the repository. This is known as vendoring the dependencies.

There are pros and cons of both approaches. Adopting #1 means that the software source code repository can avoid including other software projects. In the C++ application example, it would mean documenting for other developers where and how they should download the external JSON library’s Git repository from rather than storing anything related to it in the C++ application’s Git repository.

Adopting #2 means that you always have the various dependencies available but can increase the space used by the version control system. In the C++ application example (without submodules), you might copy the source code of the external JSON library into the application’s Git repository. When you wanted to update the library version, you’d copy in the new code and commit it.

As Git stores the complete history of a repository and downloads it all when cloned, too many large dependencies can result in a repository that takes a long time to clone and is unclear about any other repositories whose source code was used to populate some of this repository. This makes updating versions of things like external libraries a painful, manual process. For this reason, submodules were created.

A Git repository can contain submodules. Submodules allow you to reference other Git repositories at specific revisions. This is most commonly used to reference external Git repositories that are dependencies for software projects. In the C++ application example, instead of documenting the location or copying the source code of the external JSON library into the application’s Git repository, you could use submodules to reference the external JSON library’s Git repository.

Git’s submodules store the reference to a specific SHA-1 reference in another remote Git repository and store these in subdirectories of the current repository. All that is actually committed in the current repository are some small pieces of metadata which the git submodule command uses to clone, update, and interact with the various submodules inside a Git repository.

Note
What is git subtree?
You may have heard about git subtree, which is an alternate method of managing Git subprojects inside a Git repository. Instead of just referencing other Git repositories, git subtree will store the contents of the remote Git repository. It’s a contributed command to Git, which means it’s not documented or supported to the same extent as git submodule, so I won’t be covering it in this book. If you want to read more, you can view the git subtree documentation on GitHub: https://github.com/git/git/blob/master/contrib/subtree/git-subtree.txt

Add a git submodule: git submodule add

Let’s start by creating a new repository that can be used as a submodule in our existing GitInPracticeRedux repository.

Create a new repository on GitHub that we can use as a submodule by following these steps:

  1. Create a new repository with git init and pass the path to a new directory; for example, git init /Users/mike/GitInPracticeReduxSubmodule/

  2. Change to the directory containing your new submodule repository; in this example, cd /Users/mike/GitInPracticeReduxSubmodule/.

  3. Create a file named TODO.md with the echo command by running echo "# TODO\n1. Add something useful to this submodule." > TODO.md.

  4. Commit the new TODO.md as the initial commit by running git commit --message "Initial commit of submodule." TODO.md.

  5. Create a new repository on GitHub (or another Git hosting provider).

  6. Add the new remote reference to the GitHub repository by running git remote add origin https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git.

  7. Push the repository to GitHub by running git push --set-upstream origin master.

The output of all these commands should resemble the following:

New submodule repository creation output
# git init /Users/mike/GitInPracticeReduxSubmodule/

Initialized empty Git repository in
/Users/mike/GitInPracticeReduxSubmodule/.git/ (1)

# cd /Users/mike/GitInPracticeReduxSubmodule/

# echo "# TODO\n1. Add something useful to this submodule." > TODO.md

# git commit --message "Initial commit of submodule."

[master (root-commit) e95b4cd] Initial commit of submodule. (2)
 1 file changed, 2 insertions(+)
 create mode 100644 TODO.md

# git remote add origin
  https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git

# git push --set-upstream origin master

Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 272 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git
 * [new branch]      master -> master (3)
Branch master set up to track remote branch master from origin.
  1. New repository

  2. Initial commit

  3. Push repository

From the new submodule repository creation output:

  • "New repository (1)" shows the creation of a new Git repository on disk to be used as a new submodule repository for the GitInPracticeRedux repository. It has been created outside the GitInPracticeRedux directory, so it can be added later as if it were just another GitHub repository.

  • "Initial commit (2)" shows the first commit to the new submodule repository of the TODO.md file.

  • "Push repository (3)" shows the push of the initial commit to the newly created GitHub repository.

The new submodule repository has been created and pushed to GitHub. Note that it’s not yet a submodule of the GitInPracticeRedux repository; this was just to create a new repository that could be added as a submodule repository afterward.

Now that the submodule repository has been created and pushed to GitHub, it can be removed from your local machine with rm -rf GitInPracticeReduxSubmodule/. Don’t worry; remember a complete copy is stored on GitHub (which we will use next).

Now that we’ve created a new submodule repository, let’s add it as a submodule to the existing repository.

Problem

You wish to add a the GitInPracticeReduxSubmodule repository as a submodule of the GitInPracticeRedux repository in the master branch.

Solution

  1. Change to the directory containing your repository; on my machine, cd /Users/mike/GitInPracticeRedux/.

  2. Run git checkout master.

  3. Run git submodule add https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git submodule.

  4. Commit the new submodule changes to the repository by running git commit --message "Add submodule."

The output of all these commands should resemble the following:

Submodule addition output
# git submodule add
  https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git
  submodule

Cloning into 'submodule'... (1)
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 3 (delta 0)
Unpacking objects: 100% (3/3), done.
Checking connectivity... done.

# git commit --message "Add submodule."

[master cc206b5] Add submodule.
 2 files changed, 4 insertions(+)
 create mode 100644 .gitmodules (2)
 create mode 160000 submodule (3)
  1. Submodule clone

  2. .gitmodules file

  3. Submodule directory

From the submodule addition output:

  • "Submodule clone (1)" shows the clone of the GitInPracticeReduxSubmodule into the directory named submodule in the local repository. After this was done, it also created a .gitmodules file in the root of the repository’s working directory.

  • ".gitmodules file (2)" shows the file that contains the submodule metadata, such as the directory path and the URL.

  • "Submodule directory (3)" shows the new directory named submodule that was created to store the contents of the new submodule repository. Note that you’d normally not call this submodule but we’re just using this name for these examples.

You have successfully added the GitInPracticeReduxSubmodule submodule to the GitInPracticeRedux repository. We will now refer to GitInPracticeRedux as the "superproject" i.e. the Git repository containing the submodule.

Discussion

The new directory named submodule behaves like any other Git repository. If you change into its directory, you can run services like GitX, git log, and even make changes and push them to the GitInPracticeReduxSubmodule repository (provided you have commit access).

Git makes use of the .gitmodules file and special metadata for the directory named submodule to reference the submodule and the current submodule commit. This is used to ensure that anyone else cloning this repository can access the same submodules at the same version after initializing the submodule(s).

Initializing all submodules can be done by running git submodule init, which copies all the submodule names and URLs from the .gitmodules file to the local repository Git configuration file (in .git/config). Note that this was done for you when you ran git add.

Let’s take a closer look at the last commit:

git show submodule output
# git show
commit cc206b5c9b30eef23578e48dadfa3b194a50cfe7
Author: Mike McQuaid <mike@mikemcquaid.com>
Date:   Fri Apr 18 16:16:30 2014 +0100

    Add submodule.

diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..c63f995
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "submodule"] (1)
+       path = submodule (2)
+       url = https://github.com/MikeMcQuaid/GitInPracticeReduxS... (3)
diff --git a/submodule b/submodule
new file mode 160000
index 0000000..e95b4cd
--- /dev/null
+++ b/submodule
@@ -0,0 +1 @@
+Subproject commit e95b4cd02cafa486a7baec19ab26edec28e9eddc (4)
  1. Submodule name

  2. Submodule path

  3. Submodule URL

  4. Submodule commit

From the git show submodule output:

  • "Submodule name (1)" shows the name of the submodule that was created in the repository: submodule. This is used to reference this particular submodule with any additional submodule commands.

  • "Submodule path (2)" shows the directory location where the submodule is cloned into. This is where the submodule files will be accessed.

  • "Submodule URL (3)" shows the remote repository location for the submodule that was added.

  • "Submodule commit (4)" shows the commit SHA-1 for the submodule. Even if there are changes to the submodule, this will always be the commit that is checked out by anyone using this submodule in this repository. This is to ensure that the submodule only uses a known, tested version and that changes to the submodule’s Git repository (which may be something you don’t have any control over) doesn’t change anything in the current repository.

git submodule add can also take some parameters to affect its behavior:

  • The --quiet (or -q) flag can be passed to make git submodule add only print out error messages and no status information.

  • The --force (or -f) flag can be passed to allow adding a submodule path that would otherwise be ignored by .gitignore rules.

  • The --depth is passed to the git clone of the submodule to allow creating a shallow clone with only the requested number of revisions within it. This can be used to shrink the size of the submodule on disk. This flag for git clone was mentioned previously in 02-RemoteGit.adoc and can be useful for reducing the clone time for very large repositories.

Show the status of submodules: git submodule status

Now that we’ve added a submodule to the repository, it can be useful to query what submodules have been added and what their current status is. This can be done with the git submodule status command.

Problem

You wish to show the current states of all submodules of a repository.

Solution

  1. Change to the directory containing your repository; for example, cd /Users/mike/GitInPracticeRedux/.

  2. Run git submodule status. The output should resemble the following:

Submodule status output
# git submodule status

 e95b4cd02cafa486a7baec19ab26edec28e9eddc submodule (heads/master) (1)
  1. Submodule status

From the submodule status output:

  • "Submodule status (1)" shows the SHA-1 of the pinned submodule, the name and the ref that it’s pointing to (the master branch in this case). This matches the SHA-1 you saw earlier in the submodule directory metadata.

Discussion

git submodule status can take a --recursive flag, which will run git submodule status inside each of the submodules directories too. This is useful, as submodules can themselves contain submodules and you may wish to query the status of the submodules within the submodules.

Update and initialize all submodules: git submodule update --init

We have initialized a submodule (copied the submodule names and URLs .gitmodules to .git/config) when we ran git submodule add earlier. But initialization won’t be done automatically for anyone else with a clone of this repository: they must run git submodule init.

Let’s simulate this situation by making a new clone of the GitInPracticeRedux repository.

  1. Change to the parent directory of the directory containing your repository; on my machine, cd /Users/mike/GitInPracticeRedux/...

  2. Run git clone GitInPracticeRedux GitInPracticeReduxClone.

Problem

You wish to initialize all submodules in your repository and populate their working tree according to the submodule commit recorded in the GitInPracticeRedux superproject.

Solution

  1. Change to the directory containing your newly cloned repository; for example, cd /Users/mike/GitInPracticeReduxClone/.

  2. Run git submodule update --init. The output should resemble the following:

Submodule initialize and update output
# git submodule update --init

Submodule 'submodule'
  (https://github.com/MikeMcQuaid/GitInPracticeReduxSubmodule.git)
  registered for path 'submodule' (1)
Cloning into 'submodule'...
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 3 (delta 0)
Unpacking objects: 100% (3/3), done.
Checking connectivity... done. (2)
Submodule path 'submodule': checked out
  'e95b4cd02cafa486a7baec19ab26edec28e9eddc' (3)
  1. Submodule init

  2. Submodule clone

  3. Submodule checkout

From the submodule initialize and update output:

  • "Submodule init (1)" shows the registration of the submodule into the Git repository.

  • "Submodule clone (2)" shows the submodule being cloned into the local Git repository.

  • "Submodule checkout (3)" shows the submodule contents being checked out into the submodule directory for the currently stored revision.

Discussion

git submodule update can take some parameters to customize its behavior:

  • The --recursive flag, which will run git submodule update --init inside each of the submodules directories too. This is useful when there are nested submodules inside submodules.

  • The --force (or -f) flag can be passed to update the submodules to the commit recorded in the superproject by running the equivalent of git checkout --force --to discard any uncommitted changes made to the submodule.

  • The --depth is passed to the git clone of the submodule to allow creating a shallow clone with only the requested number of revisions within it. This can be used to shrink the size of the submodule on disk.

git clone can also take a --recurse-submodules (or --recursive) flag to automatically run git submodule update --init on any submodules within the repository. Typically if you’re cloning a repository you know contains submodules, then you’ll use git clone --recursive-submodules to clone it and all the necessary submodules (and the submodules of the submodules, if they exist).

When you wish to update the submodule to the latest upstream revision to incorporate any changes that were made in the upstream, submodule repository you can use the following git submodule update parameters:

  • The --remote flag will fetch and checkout the latest upstream revision in the local submodule repository. This would then require another commit to update this on the local GitInPracticeRedux repository and a push to update this on the remote GitInPracticeRedux repository. This should only be done after testing that the changes made to the GitInPracticeReduxSubmodule repository remain compatible with the GitInPracticeRedux project.

  • The --no-fetch flag will attempt to update the submodule without running git fetch. This will only update the submodule to a later revision if this has already been fetched. This is useful if you want to fetch the changes to a submodule now and then update and test this update at a later point.

Run a command in every submodule: git submodule foreach

Sometimes you may wish to perform a command or query within every submodule. For example, you may want to iterate through all the submodules in a repository (and their submodules) and run a Git command to ensure they have all checked out the master branch, and have fetched the latest remote repository commits or print status information. Git provides the git submodule foreach command for this case: it takes a command (or commands) as an argument and then iterates through each Git submodule (and their submodules) and runs the same command.

Problem

You wish to output some status information for every submodule in the GitInPracticeRedux repository.

Solution

  1. Change to the directory containing your repository; for example, cd /Users/mike/GitInPracticeRedux/.

  2. Run git submodule foreach 'echo $name: $toplevel/$path [$sha1]'. The output should resemble the following:

submodule loop output
# git submodule foreach 'echo $name: $toplevel:$path [$sha1]'

Entering 'submodule' (1)
submodule: /Users/mike/Documents/GitInPracticeRedux:submodule (2)
  [e95b4cd02cafa486a7baec19ab26edec28e9eddc] (3)
  1. Current submodule

  2. Submodule name, path

  3. Submodule SHA-1

From the submodule loop output:

  • "Current submodule (1)" shows a message showing the name of each submodule that is iterated through.

  • "Submodule name, path (2)" shows the use of the git submodule foreach $name, $toplevel, and $path variables to print out the name of the submodule, the top level repository it belongs to, and the path within that repository..

  • "submodule SHA-1 (3)" shows the use of the git submodule foreach $sha1 variable to print the current SHA-1 of the submodule.

You have successfully iterated through the submodules in the GitInPracticeRedux repository and used all the git submodule foreach variables to print some status information.

Discussion

git submodule foreach can take:

  • The --quiet flag to only print any command output and not print the "Entering 'submodule'" message as it runs on each submodule.

  • The --recursive flag to also iterate through any submodules that exist for any of the submodules.

Summary

In this chapter you hopefully learned:

  • How to use submodules to vendor project dependencies

  • How to use git submodule add to add a submodule and commit its metadata

  • How to use git submodule status to view all submodules and their current revision

  • How to use git submodule update --init to initialize all submodules, fetch any changes, and update them to the latest revision

  • How to use git submodule foreach and its variables to run commands and print metadata for every submodule in a repository