When converting a large git repo into smaller git repos, preserving history can be difficult if files from different subdirectories need to be combined into the same smaller repo. To support this kind of operation, the two utility scripts in this repo can be used together or independently.
repo_reorg.py creates a new repo from pieces of an existing repo by selectively moving files to new locations in the new repo while preserving their history. It reads mapping files written by hand or generated by repo_map_gen.py.
repo_reorg.py clones a copy of ORIGIN, then calculates the minimum number of fragments of ORIGIN necessary to create a new repo. It clones a new copy of ORIGIN for each fragment, then filters each with git filter-branch --subdirectory-filter. Then, repo_reorg.py reorganizes the files in each fragment into their new locations using git mv and deletes any files which are not in the map. Finally, it merges each fragment into the new repo.
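As a rough illustration of the fragment step, the sketch below groups mapping entries by the top-level directory of their old path and lists the git commands that would be run for each group. It is not taken from repo_reorg.py: the names plan_fragments and fragment_commands are hypothetical, and treating one top-level source directory as one fragment is an assumption about how the minimum set of fragments might be chosen.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_fragments(mappings):
    """Group (old_path, new_path) pairs into fragments.

    Assumption: one fragment per top-level directory of old_path.
    Files at the root of the origin repo go into the "." fragment.
    """
    fragments = defaultdict(list)
    for old, new in mappings:
        parts = PurePosixPath(old).parts
        top = parts[0] if len(parts) > 1 else "."
        fragments[top].append((old, new))
    return dict(fragments)


def fragment_commands(subdir, pairs):
    """Return the git commands for one fragment as argument lists.

    Nothing is executed here; the lists only show the order of operations.
    """
    commands = []
    if subdir != ".":
        commands.append(["git", "filter-branch", "--subdirectory-filter", subdir])
    for old, new in pairs:
        # After --subdirectory-filter, paths are relative to subdir.
        rel = old if subdir == "." else str(PurePosixPath(old).relative_to(subdir))
        # git mv expects the parent directory of the destination to exist.
        commands.append(["git", "mv", rel, new])
    return commands
```

Calling plan_fragments on a parsed mapping and then fragment_commands on each group gives a dry-run view of what would happen in each clone before anything is rewritten.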
    usage: repo_reorg.py [-h] -o ORIGIN [-b BRANCH] [-d DESTINATION] [-n NAME]
                         [-f FILE]
                         [P [P ...]]

    Create a new git repo, filtering the contents out of an existing one

    positional arguments:
      P                     Paths to include in the new repo. Paths must be of the
                            form: <old repo path>:<new repo path>

    optional arguments:
      -h, --help            show this help message and exit
      -o ORIGIN, --origin ORIGIN
                            The source repository
      -b BRANCH, --origin-branch BRANCH
                            The branch of the origin to clone
      -d DESTINATION, --destination DESTINATION
                            The destination repository (optional)
      -n NAME, --name NAME  The name of the repo to create
      -f FILE, --file FILE  Load the paths to merge out of FILE. The format of the
                            path arguments must be followed, one path per line.
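To make the mapping format concrete, here is a minimal sketch of reading such a file: one <old repo path>:<new repo path> entry per line. The function name parse_mapping is hypothetical and this is not the parser used by repo_reorg.py. Skipping blank lines and lines that start with '#', and splitting at the first colon, are assumptions of this sketch.

```python
def parse_mapping(lines):
    """Parse '<old repo path>:<new repo path>' entries, one per line.

    Assumptions: blank lines and lines starting with '#' are skipped,
    and each entry is split at its first colon.
    """
    pairs = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        old, sep, new = line.partition(":")
        if not sep or not old or not new:
            raise ValueError(
                f"line {lineno}: expected <old repo path>:<new repo path>, got {line!r}"
            )
        pairs.append((old, new))
    return pairs
```

For example, parse_mapping(open("paths.map")) would return a list of (old, new) tuples, where paths.map is a placeholder file name.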
Generating the mapping files used by repo_reorg.py can be tedious. Instead, copy files manually to their new locations and use repo_map_gen.py to generate the mapping file. Once the mapping file is created, run repo_reorg.py to generate a new repo which preserves history for the copied files in their new locations.
There are two exceptional conditions that repo_map_gen.py handles explicitly:
- If a file has no source element in the origin repo, a warning will be printed and no mapping will be generated for that file.
- If a file has more than one possible source element in the origin repo, repo_map_gen.py will try to match the file's path to determine the correct file to use. If that fails, repo_map_gen.py will print a warning that multiple matches were found for the file, and generate a mapping for each possible match, marking each with '#' at the beginning of the line.
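The two conditions above can be illustrated with a small sketch that matches destination files to origin files. It is not the logic of repo_map_gen.py: the function names are hypothetical, and matching on file name, preferring an identical relative path when several candidates exist, and skipping .git directories are all assumptions made for the example.

```python
import os
from collections import defaultdict


def index_by_name(root):
    """Map each file name under root to its relative paths ('/'-separated)."""
    index = defaultdict(list)
    for dirpath, dirs, files in os.walk(root):
        dirs[:] = [d for d in dirs if d != ".git"]  # assumption: ignore git metadata
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            index[name].append(rel.replace(os.sep, "/"))
    return index


def generate_map(origin, destination):
    """Return (mapping_lines, warnings) for files found under destination."""
    origin_index = index_by_name(origin)
    lines, warnings = [], []
    for name, dest_paths in sorted(index_by_name(destination).items()):
        candidates = sorted(origin_index.get(name, []))
        for dest in sorted(dest_paths):
            if not candidates:
                # No source element: warn and emit no mapping.
                warnings.append(f"no source found for {dest}")
            elif len(candidates) == 1:
                lines.append(f"{candidates[0]}:{dest}")
            elif dest in candidates:
                # Assumed tie-break: an identical relative path wins.
                lines.append(f"{dest}:{dest}")
            else:
                # Ambiguous: emit every candidate, each marked with '#'.
                warnings.append(f"multiple matches found for {dest}")
                lines.extend(f"#{c}:{dest}" for c in candidates)
    return lines, warnings
```

Lines marked with '#' are left for a person to review: keep the correct candidate, remove its marker, and delete the others before passing the file on.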
    usage: repo_map_gen.py [-h] -o ORIGIN -d DESTINATION [-e EXCLUDE_ORIGIN]
                           [-E EXCLUDE_DESTINATION] [-f OUTPUT_FILE]

    Generate a mapping file to translate between two directories

    optional arguments:
      -h, --help            show this help message and exit
      -o ORIGIN, --origin ORIGIN
                            The source directory
      -d DESTINATION, --destination DESTINATION
                            The destination directory
      -e EXCLUDE_ORIGIN, --exclude-origin EXCLUDE_ORIGIN
                            Exclude DIR from origin path search
      -E EXCLUDE_DESTINATION, --exclude-destination EXCLUDE_DESTINATION
                            Exclude DIR from destination path search
      -f OUTPUT_FILE, --output-file OUTPUT_FILE
                            Output File
If there is a repo which has already lost history due to a reorganization, it's possible to get it back:
- Check out the first commit of files to the new repo
- Use repo_map_gen.py to generate a mapping file
- Use repo_reorg.py to create a new repo with history
- Cherry-pick all commits from the lost-history repo into the recovered-history repo
- Use git push --force to overwrite the lost-history repo