Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Diff-tool -> Comparing directory files #8447

Merged
merged 5 commits into from
Sep 27, 2022
Merged

Conversation

ddixit14
Copy link
Contributor

@ddixit14 ddixit14 commented Sep 23, 2022

This PR introduces 2 scripts:

  1. diff_files.sh -> This is the main script for comparing the files on maven central v/s google-sonatype. This calls the other script, diff_directory.sh for each artifact once.
  2. diff_directory.sh -> helper script

Attaching the output of this script:

  1. diff-files-summary.txt -> This is the output file generated, which summarizes the diff. For each artifact, it tells if the files are same (success) and if not, tells which files differ. (If you look for errorreporting, there is a failure, which is a valid failure)
    diff-files-summary.txt

  2. total-diff.txt -> This is the output generated for one complete run. There is a section for each artifact, which describes the files and URL for both the sources (maven and sonatype). This is more like a full-log.
    total-diff.txt

Also, just as a sanity check, you can pick up any artifact from total-diff.txt, copy paste both the URL's given into your browser, and check manually if the files are same or not.

Please let me know if this what we need, or do I need to do something else.

Comment on lines 1 to 2
#!/bin/bash

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already mentioned in diff_files.sh usage that diff_directory.sh is just a helper script. You do not need any instructions to run it. It will just be called by diff_files.sh and do its work, you do not need to invoke/run it yourself.
I have included these instructions in the comments for this script now.

Comment on lines 7 to 8
# Run the stage job for google-cloud-java, on a branch which does not have any snapshot versions in it.
# search the stage job logs for the latest-repo-ids and edit them below under `cloudRepoId`, `apiRepoId`, `analyticsRepoId`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe these as steps?

Step 1 . Run ...
Step 2. Check log for ...
Step 3. Update xxx
Step 4. Run diff_files.sh

# This script calls ./generation/diff_directory.sh for every pair, but you do not need to do anything. You just need to run this script.
# Search total-diff.txt for any artifact, and it will show you a complete scenario, what all files exist etc etc.

set -x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove '-x'? When someone needs "-x", they can do it with "bash -x ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

@suztomo
Copy link
Member

suztomo commented Sep 23, 2022

Can you address the usage parts? I don't want you to continue enhancing features in one big pull request.

@ddixit14
Copy link
Contributor Author

ddixit14 commented Sep 23, 2022

Can you address the usage parts? I don't want you to continue enhancing features in one big pull request.

The previous commit was not "enhancing" features. It is about covering more modules for a more accurate result. About the usage, I have explained it better now.


## HOW TO USE THIS SCRIPT ##
# 1. Run the stage job for google-cloud-java, on a branch which does not have any snapshot versions in it.
# 2. search the stage job logs for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# 2. search the stage job logs for
# 2. Search the stage job logs for

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be consistent about capitalization.

# search the stage job logs for the latest-repo-ids and edit them below under `cloudRepoId`, `apiRepoId`, `analyticsRepoId`

## HOW TO USE THIS SCRIPT ##
# 1. Run the stage job for google-cloud-java, on a branch which does not have any snapshot versions in it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does mean mean we merge the Release Please pull request? Or manually via Fusion. (Update source code comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge the release-please pull request when it has all the modules in it, right now it does not. Doing it manually via fusion on a branch which does not have any -snapshot version in it is the only way we have right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update source code comment

Comment on lines +2 to +4
## This is a helper script invoked by ./generation/diff_files.sh
## All the inputs to this script are provided by diff_files.sh
## You do not need to do anything for this script to run.
Copy link
Member

@suztomo suztomo Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain the arguments and output, even when you don't think you'll invoke this script separately now.

sed -n 's/.*href="\([^"]*\).*/\1/p' mavenFile >mavenContents.txt
sed -n 's/.*href="\([^"]*\).*/\1/p' sonatypeFile >sonatypeContents.txt

awk "/$4/" sonatypeContents.txt >temp.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create the variables first. Name the arguments. Don't reference them as numbers (e.g., '$4') after that.

wget -O sonatypeFile --recursive -nd --no-parent $2

wget -O mavenFile --referer --recursive -nd --no-parent \
--header="User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain why User-Agent matters

# search the stage job logs for the latest-repo-ids and edit them below under `cloudRepoId`, `apiRepoId`, `analyticsRepoId`

## HOW TO USE THIS SCRIPT ##
# 1. Run the stage job for google-cloud-java, on a branch which does not have any snapshot versions in it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update source code comment

@ddixit14 ddixit14 requested a review from suztomo September 26, 2022 17:36
Copy link
Member

@meltsufin meltsufin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing a lot of errors like this when running the script:

--2022-09-26 18:09:51--  https://repo1.maven.org/maven2/com/google/cloud/google-cloud-bigqueryreservation-parent/2.4.4/
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 2200 (2.1K) [text/html]
Saving to: ‘mavenFile’

mavenFile                                                  100%[=======================================================================================================================================>]   2.15K  --.-KB/s    in 0s      

2022-09-26 18:09:51 (27.0 MB/s) - ‘mavenFile’ saved [2200/2200]

sed: can't read 1d: No such file or directory
cat: finalSonatype.txt: No such file or directory
diff: finalSonatype.txt: No such file or directory
diff: finalSonatype.txt: No such file or directory
WARNING: combining -O with -r or -p will mean that all downloaded content
will be placed in the single file you specified.

@ddixit14
Copy link
Contributor Author

ddixit14 commented Sep 26, 2022

I'm seeing a lot of errors like this when running the script:

--2022-09-26 18:09:51--  https://repo1.maven.org/maven2/com/google/cloud/google-cloud-bigqueryreservation-parent/2.4.4/
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 2200 (2.1K) [text/html]
Saving to: ‘mavenFile’

mavenFile                                                  100%[=======================================================================================================================================>]   2.15K  --.-KB/s    in 0s      

2022-09-26 18:09:51 (27.0 MB/s) - ‘mavenFile’ saved [2200/2200]

sed: can't read 1d: No such file or directory
cat: finalSonatype.txt: No such file or directory
diff: finalSonatype.txt: No such file or directory
diff: finalSonatype.txt: No such file or directory
WARNING: combining -O with -r or -p will mean that all downloaded content
will be placed in the single file you specified.

This error is generated when you run it on a branch with SNAPSHOT versions in it. You can verify this in the logs if you see a staging directory URL with a snapshot in it.

@meltsufin Which branch are you running this on? It will only work on staging-branch branch right now, since that's the only one which doesn't have -SNAPSHOT versions in it. Running this script for a valid result is a bit tricky, you have to take these 2 scripts and run it in staging-branch. But I have opened a new PR with just these 2 scripts in it so that we can merge this in, and run it on release-please PR branch. Tomo wanted these scripts to be pushed to main so we can test it on release please PR. And I cannot merge staging-branch as I have manually synced up the versions there, just for testing purposes (did it when release-please wasn't working)

@suztomo
Copy link
Member

suztomo commented Sep 27, 2022

Merge it.

@suztomo suztomo merged commit b08762b into main Sep 27, 2022
@suztomo suztomo deleted the diffDirectoryNew branch September 27, 2022 15:03
suztomo pushed a commit to suztomo/google-cloud-java that referenced this pull request Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants