The GitHub Content Sync tool is a command-line script written in Go that allows you to compare the contents of two folders within a GitHub repository.
It helps identify file differences between the two folders.
Basically, if A and B are the two folders, the tool will output:
- files present in A but not in B
- files present in B but not in A
- files present in both A and B but with newer commits in A
Note
You can also do cross-branch comparisons by specifying the branches for both directories.
This tool has been specifically developed to assist the Special Interest Groups (SIGs) responsible for glossary management within the CNCF.
The purpose of the tool is to facilitate the comparison of folder contents within a GitHub repository.
It is specifically meant for repositories that contain documentation in various languages (divided into different folders), where you need a fast way to know the deltas.
In this case, the reference folder and "source of truth" is usually the English one (for a real-world example take a look at this repo; for a test playground we use this one).
Generally, it can be useful in scenarios where you have two folders within a repository and you want to identify the differences between them, such as missing files or files with newer commits.
The script requires the following environment variables to be set:
- REPO_URL: The URL of the GitHub repository to analyze. [MANDATORY]
- REPO_FOLDER_1: The name of the reference folder (source of truth, or folder A). [MANDATORY]
- REPO_FOLDER_2: The name of the second folder to compare to the reference folder (folder B). [MANDATORY]
- TOKEN: An access token with appropriate permissions to read and open issues on the target repo. [MANDATORY]
- FOLDER_1_BRANCH: The branch for the first folder. If not specified, the default is main. [OPTIONAL]
- FOLDER_2_BRANCH: The branch for the second folder. If not specified, the default is main. [OPTIONAL]
- OPEN_ISSUE: If set to true, the script opens a "synchronization issue" on the target repo, specifying the folder differences. The opened issues are structured like this one. [OPTIONAL]
- MULTIPLE_ISSUES: If OPEN_ISSUE is set to true and this variable is also set to true, the script creates multiple issues, one for every file difference. [OPTIONAL]
Warning
Be careful when setting the MULTIPLE_ISSUES var to true: if you execute this script against two folders with many files, it will create many issues on your target repo.
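As a rough illustration of how these environment variables fit together, here is a minimal Go sketch that validates the mandatory ones and applies the documented defaults. The `Config` type and the `loadConfig` and `orDefault` functions are hypothetical names chosen for this example and are not taken from the tool's source; the lookup function is injected so that the sketch can run without touching the real environment (in a real program you would pass `os.Getenv`).

```go
package main

import "fmt"

// Config mirrors the documented environment variables (hypothetical type).
type Config struct {
	RepoURL, Folder1, Folder2, Token string
	Branch1, Branch2                 string
	OpenIssue, MultipleIssues        bool
}

// orDefault returns fallback when value is empty.
func orDefault(value, fallback string) string {
	if value == "" {
		return fallback
	}
	return value
}

// loadConfig reads the settings through getenv (normally os.Getenv).
func loadConfig(getenv func(string) string) (Config, error) {
	for _, name := range []string{"REPO_URL", "REPO_FOLDER_1", "REPO_FOLDER_2", "TOKEN"} {
		if getenv(name) == "" {
			return Config{}, fmt.Errorf("missing mandatory environment variable %s", name)
		}
	}
	cfg := Config{
		RepoURL: getenv("REPO_URL"),
		Folder1: getenv("REPO_FOLDER_1"),
		Folder2: getenv("REPO_FOLDER_2"),
		Token:   getenv("TOKEN"),
		// Both branches default to main when not specified.
		Branch1:   orDefault(getenv("FOLDER_1_BRANCH"), "main"),
		Branch2:   orDefault(getenv("FOLDER_2_BRANCH"), "main"),
		OpenIssue: getenv("OPEN_ISSUE") == "true",
	}
	// MULTIPLE_ISSUES only has an effect when OPEN_ISSUE is also true.
	cfg.MultipleIssues = cfg.OpenIssue && getenv("MULTIPLE_ISSUES") == "true"
	return cfg, nil
}

func main() {
	// Fake environment for demonstration; the token value is a placeholder.
	env := map[string]string{
		"REPO_URL":      "https://github.com/cncf/glossary",
		"REPO_FOLDER_1": "content/en",
		"REPO_FOLDER_2": "content/it",
		"TOKEN":         "placeholder-token",
		"OPEN_ISSUE":    "true",
	}
	cfg, err := loadConfig(func(name string) string { return env[name] })
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("branches:", cfg.Branch1, cfg.Branch2)
	fmt.Println("open issue:", cfg.OpenIssue, "multiple issues:", cfg.MultipleIssues)
}
```

Treating any value other than the exact string true as false is an assumption of this sketch.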
The script performs the following steps:
- Checks the presence of the required environment variables and their values.
- Creates a GitHub client using the provided access token.
- Retrieves the content of the two specified folders via the GitHub client object.
- Compares the contents of the two specified folders within the repository.
- Prints the files that are present in the first folder but not in the second folder.
- Prints the files with newer commits in the first folder compared to the same files in the second folder.
- Prints the files that are present in the second folder but not in the first folder.
- If the OPEN_ISSUE env var is present and set to true, opens a "synchronization issue" on the target repo.
You can run this utility in many ways:
Download the release that you want and run it:
export REPO_URL=https://github.com/R3DRUN3/content-sync-tester
export REPO_FOLDER_1=en
export REPO_FOLDER_2=it
export TOKEN=<your-github-token-here>
./github-content-sync
Output:
__ __ _____ _ __ _ __ ___ __ _ _ __ _____ ___ _ __ _____ ___ _ __ _ __ __
,'_/ / //_ _/ /// / /// / / o.) ,'_/ ,' \ / |/ //_ _/ / _/ / |/ //_ _/ ,' _/ | |/,' / |/ / ,'_/
/ /_n / / / / / ` / / U / / o \ / /_ / o | / || / / / / _/ / || / / / _\ `. | ,' / || / / /_
|__,'/_/ /_/ /_n_/ \_,' /___,' |__/ |_,' /_/|_/ /_/ /___/ /_/|_/ /_/ /___,' /_/ /_/|_/ |__/
[ ALL ENVIRONMENT VARIABLES ARE CONFIGURED ]
[ TARGET REPO URL: https://github.com/R3DRUN3/content-sync-tester ]
[ FILES PRESENT IN en BUT NOT IN it ]
not_present_in_it.md
not_present_in_it_2.md
test.md
[ FILES PRESENT IN BOTH en AND it WITH NEWER COMMITS IN en ]
doc2.md
last.md
___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
/__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__//__/
This repo also contains a Dockerfile, so you can launch the script as a Docker container.
Clone the repo locally and build the image:
git clone https://github.com/r3drun3/github-content-sync \
&& cd github-content-sync \
&& docker build -t github-content-sync:latest .
Run the docker container (change env vars accordingly):
docker run -it --rm -e REPO_URL=https://github.com/cncf/glossary -e REPO_FOLDER_1=content/en -e REPO_FOLDER_2=content/it -e TOKEN=<your-github-token-here> github-content-sync:latest
Alternatively, this repo already contains an action to publish the script's OCI image to GitHub Packages.
Pull the version that you want:
docker pull ghcr.io/r3drun3/github-content-sync:1.5.0
Run the docker container (change env vars accordingly):
docker run -it --rm -e REPO_URL=https://github.com/cncf/glossary -e REPO_FOLDER_1=content/en -e REPO_FOLDER_2=content/it -e TOKEN=<your-github-token-here> ghcr.io/r3drun3/github-content-sync:1.5.0
The script in this repo can also be executed inside a GitHub Action; for an example, take a look at the goaction GitHub Action associated with this repo.
For development and debugging, I suggest using the VS Code IDE.
In order to debug the script locally, you can create the .vscode/launch.json file with the following structure:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Launch",
            "type": "go",
            "request": "launch",
            "mode": "auto",
            "program": "${workspaceFolder}/main.go",
            "env": {
                "REPO_URL": "<your-github-repo-target-url>",
                "REPO_FOLDER_1": "<path-of-the-reference-folder-inside-target-repo>",
                "REPO_FOLDER_2": "<path-of-the-folder-to-compare-to-the-reference>",
                "TOKEN": "<your-github-token-here>",
                "OPEN_ISSUE": "false",
                "MULTIPLE_ISSUES": "false"
            }
        }
    ]
}
- It could be useful to add the possibility of comparing more than two folders at the same time.
This script is released under the MIT License.
Feel free to modify and distribute it as per your needs.