chore: Diff-tool -> Comparing directory files #8447
generation/diff_directory.sh
Outdated
#!/bin/bash
Add usage.
The usage section of diff_files.sh already mentions that diff_directory.sh is just a helper script. You do not need any instructions to run it: diff_files.sh calls it, so you never invoke it yourself.
I have now added these instructions to the comments in this script.
generation/diff_files.sh
Outdated
# Run the stage job for google-cloud-java, on a branch which does not have any snapshot versions in it.
# search the stage job logs for the latest-repo-ids and edit them below under `cloudRepoId`, `apiRepoId`, `analyticsRepoId`
Can you describe these as steps?
Step 1. Run ...
Step 2. Check log for ...
Step 3. Update xxx
Step 4. Run diff_files.sh
generation/diff_files.sh
Outdated
# This script calls ./generation/diff_directory.sh for every pair, but you do not need to do anything. You just need to run this script.
# Search total-diff.txt for any artifact, and it will show you a complete scenario, what all files exist etc etc.

set -x
Can you remove '-x'? When someone needs "-x", they can do it with "bash -x ..."
removed.
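The reviewer's point about opt-in tracing can be illustrated with a minimal sketch. The throwaway script below is a hypothetical stand-in for diff_files.sh; it only shows that a script without `set -x` stays quiet by default and can still be traced on demand with `bash -x`.

```shell
# Hypothetical stand-in for ./generation/diff_files.sh (not the real script).
demo_script="$(mktemp)"
printf '%s\n' '#!/bin/bash' 'echo "comparing artifacts"' > "$demo_script"

# Normal run: only the script's own output appears.
quiet_output="$(bash "$demo_script" 2>&1)"

# Opt-in tracing: bash -x writes each command to stderr, prefixed with "+".
traced_output="$(bash -x "$demo_script" 2>&1)"

rm -f "$demo_script"

echo "Without -x:"
echo "$quiet_output"
echo "With -x:"
echo "$traced_output"
```

Leaving tracing out of the script keeps regular runs readable, and anyone debugging can still get the full trace from the command line.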
Can you address the usage parts? I don't want you to continue enhancing features in one big pull request.
The previous commit was not "enhancing" features. It was about covering more modules for a more accurate result. As for the usage, I have explained it better now.
generation/diff_files.sh
Outdated
## HOW TO USE THIS SCRIPT ##
# 1. Run the stage job for google-cloud-java, on a branch which does not have any snapshot versions in it.
# 2. search the stage job logs for
# 2. search the stage job logs for
# 2. Search the stage job logs for
Be consistent about capitalization.
# search the stage job logs for the latest-repo-ids and edit them below under `cloudRepoId`, `apiRepoId`, `analyticsRepoId`

## HOW TO USE THIS SCRIPT ##
# 1. Run the stage job for google-cloud-java, on a branch which does not have any snapshot versions in it.
Does this mean we merge the Release Please pull request? Or do we run it manually via Fusion? (Update the source code comment.)
We can merge the release-please pull request once it has all the modules in it; right now it does not. Running the stage job manually via Fusion, on a branch which does not have any -snapshot version in it, is the only way we have right now.
Update source code comment
## This is a helper script invoked by ./generation/diff_files.sh
## All the inputs to this script are provided by diff_files.sh
## You do not need to do anything for this script to run.
Explain the arguments and output, even when you don't think you'll invoke this script separately now.
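One way to document the arguments even for a helper script is a usage message plus an argument-count check. The sketch below is only an illustration: the argument names, their order, and the count of four are placeholders (the hunks in this thread only show that `$2` and `$4` are used), not the script's actual interface.

```shell
# Sketch only: placeholder argument names, not the real interface of diff_directory.sh.
print_usage() {
  cat >&2 <<'EOF'
Usage: diff_directory.sh <maven_url> <sonatype_url> <artifact_id> <file_pattern>
  (placeholder names; this helper is normally invoked by diff_files.sh)
EOF
}

check_args() {
  # Assumed argument count; adjust to whatever the script really takes.
  local expected_count=4
  if [ "$#" -ne "$expected_count" ]; then
    print_usage
    return 1
  fi
}

# Example: calling the check with too few arguments is rejected.
if check_args "only-one-argument" 2>/dev/null; then
  check_status="accepted"
else
  check_status="rejected"
fi
echo "$check_status"
```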
generation/diff_directory.sh
Outdated
sed -n 's/.*href="\([^"]*\).*/\1/p' mavenFile >mavenContents.txt
sed -n 's/.*href="\([^"]*\).*/\1/p' sonatypeFile >sonatypeContents.txt

awk "/$4/" sonatypeContents.txt >temp.txt
Create the variables first. Name the arguments. Don't reference them as numbers (e.g., '$4') after that.
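A minimal sketch of what named arguments could look like for the extraction and filtering steps in this hunk. The function name, the variable names, and the sample listing are hypothetical. The sketch also passes the pattern to awk with `-v` instead of interpolating it into the awk program, which is a substitution for illustration and not what the PR does.

```shell
# Sketch only: names are hypothetical, not taken from the PR.
list_matching_links() {
  local listing_html="$1"      # path to a saved HTML directory listing
  local artifact_pattern="$2"  # regular expression used to filter the links

  # Extract the href target from each line, then keep only matching links.
  sed -n 's/.*href="\([^"]*\).*/\1/p' "$listing_html" |
    awk -v pattern="$artifact_pattern" '$0 ~ pattern'
}

# Usage with a small made-up listing.
sample_listing="$(mktemp)"
cat > "$sample_listing" <<'EOF'
<a href="../">../</a>
<a href="demo-1.0.0.jar">demo-1.0.0.jar</a>
<a href="demo-1.0.0.pom">demo-1.0.0.pom</a>
EOF

matching_links="$(list_matching_links "$sample_listing" 'demo-1[.]0[.]0')"
rm -f "$sample_listing"
echo "$matching_links"
```

Assigning each positional parameter to a descriptive local variable once, at the top, means the rest of the script never has to refer to `$2` or `$4` directly.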
wget -O sonatypeFile --recursive -nd --no-parent $2

wget -O mavenFile --referer --recursive -nd --no-parent \
--header="User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" \
Explain why User-Agent matters
# search the stage job logs for the latest-repo-ids and edit them below under `cloudRepoId`, `apiRepoId`, `analyticsRepoId`

## HOW TO USE THIS SCRIPT ##
# 1. Run the stage job for google-cloud-java, on a branch which does not have any snapshot versions in it.
Update source code comment
I'm seeing a lot of errors like this when running the script:
--2022-09-26 18:09:51-- https://repo1.maven.org/maven2/com/google/cloud/google-cloud-bigqueryreservation-parent/2.4.4/
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 2200 (2.1K) [text/html]
Saving to: ‘mavenFile’
mavenFile 100%[=======================================================================================================================================>] 2.15K --.-KB/s in 0s
2022-09-26 18:09:51 (27.0 MB/s) - ‘mavenFile’ saved [2200/2200]
sed: can't read 1d: No such file or directory
cat: finalSonatype.txt: No such file or directory
diff: finalSonatype.txt: No such file or directory
diff: finalSonatype.txt: No such file or directory
WARNING: combining -O with -r or -p will mean that all downloaded content
will be placed in the single file you specified.
This error is generated when you run it on a branch with SNAPSHOT versions in it. You can verify this in the logs if you see a staging directory URL with a snapshot in it. @meltsufin Which branch are you running this on? It will only work on
Merge it.
This PR introduces 2 scripts:
Attaching the output of this script:
diff-files-summary.txt -> This is the generated output file, which summarizes the diff. For each artifact, it reports whether the files are the same (success) and, if not, which files differ. (If you look for errorreporting, there is a failure, which is a valid failure.)
diff-files-summary.txt
total-diff.txt -> This is the output generated for one complete run. There is a section for each artifact, which describes the files and the URL for both sources (maven and sonatype). This is more like a full log.
total-diff.txt
Also, as a sanity check, you can pick any artifact from total-diff.txt, paste both of the given URLs into your browser, and check manually whether the files are the same.
Please let me know if this is what we need, or if I need to do something else.
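The per-artifact comparison described above can be sketched as follows. This is a rough illustration under stated assumptions, not the logic of the scripts in this PR: the listing file names and their contents are made up, and each listing is assumed to hold one file name per line.

```shell
# Sketch only: hypothetical listing files with made-up contents.
work_dir="$(mktemp -d)"
printf '%s\n' 'demo-1.0.0.jar' 'demo-1.0.0.pom' \
  > "$work_dir/maven_listing.txt"
printf '%s\n' 'demo-1.0.0.pom' 'demo-1.0.0.jar' 'demo-1.0.0-sources.jar' \
  > "$work_dir/sonatype_listing.txt"

# Sort both listings so that ordering differences are not reported as diffs.
LC_ALL=C sort "$work_dir/maven_listing.txt" > "$work_dir/maven_sorted.txt"
LC_ALL=C sort "$work_dir/sonatype_listing.txt" > "$work_dir/sonatype_sorted.txt"

# diff exits with 0 when the listings match and non-zero otherwise.
if diff_output="$(diff "$work_dir/maven_sorted.txt" "$work_dir/sonatype_sorted.txt")"; then
  comparison_result="success"
else
  comparison_result="failure"
fi

rm -rf "$work_dir"

echo "$comparison_result"
echo "$diff_output"
```

Lines starting with `>` in the diff output are files present only in the second listing, and lines starting with `<` are files present only in the first.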