Checks the ETag of the files being requested to prevent downloading a… #1383

Tro95 · 2020-02-24T18:01:43Z

Checks the ETag of the files being requested to prevent downloading and replacing files with no updates. #1249

Description of changes: Added caching functionality to the get_url_content() helper function, which stores the ETag values from the response headers and compares them on later requests.
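A minimal sketch of the ETag comparison described above, not the code from this pull request. The function names, the single JSON metadata file, and the injected `fetch_headers` / `fetch_body` callables are all assumptions made for illustration; in practice they would wrap real HTTP requests.

```python
import json
import os


def load_metadata(path):
    """Return the cached metadata dict, or an empty dict if no file exists."""
    if not os.path.exists(path):
        return {}
    with open(path, 'r') as f:
        return json.load(f)


def save_metadata(path, metadata):
    """Write the metadata dict back to disk as JSON."""
    with open(path, 'w') as f:
        json.dump(metadata, f)


def get_content_if_changed(url, metadata_path, fetch_headers, fetch_body):
    """Download url only when its ETag differs from the cached one.

    fetch_headers(url) and fetch_body(url) are hypothetical callables:
    the first returns a dict of response headers, the second the body.
    Returns the body, or None when the cached ETag still matches.
    """
    metadata = load_metadata(metadata_path)
    etag = fetch_headers(url).get('ETag')
    if etag is not None and metadata.get(url, {}).get('etag') == etag:
        return None  # nothing new, skip the download
    body = fetch_body(url)
    if etag is not None:
        metadata[url] = {'etag': etag}
        save_metadata(metadata_path, metadata)
    return body
```

If the server sends no ETag header, this sketch always downloads, which seems the safer default.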

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tro95 · 2020-02-24T18:37:21Z

src/cfnlint/helpers.py

@@ -165,10 +166,52 @@
    'AWS::Redshift::Cluster'
 ]

+CACHING_DIR = os.path.join(os.path.dirname(__file__), 'cache/')


This can easily be changed to a different directory or moved into a config option if desired. I'm also unsure of the consequences of using __file__ rather than cfnlint.__file__ as is used in some places, but there doesn't seem to be a consistent approach being used, and there were circular dependency issues with the latter option.

Wondering if it should be under the data folder? Would we want to commit that file as we do the update-specs so the most recent version is included in the package?

I'm happy for it to live under the data folder; the primary reason for using a separate cache directory was just to signify the temporary, non-essential nature of the files. It definitely could be committed and used like a lock file, and would save an initial fetch of every spec file when it's first run.

Tro95 · 2020-02-24T18:39:07Z

test/unit/module/helpers/test_get_url_content.py

+import random
+import string


These can both be removed if people were fine with hard-coding the ETag value in the tests. I just used them to generate a random string.
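For reference, a sketch of how those two imports are typically combined to build a random string; the helper name and the default length are assumptions, not taken from the test file.

```python
import random
import string


def random_etag(length=32):
    """Build a random alphanumeric string to stand in for an ETag in a test."""
    alphabet = string.ascii_letters + string.digits
    return ''.join(random.choice(alphabet) for _ in range(length))
```

A hard-coded value would make the tests deterministic, which is the trade-off raised in the comment above.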

Tro95 · 2020-02-24T18:41:35Z

With this pull request I'm seeing a reduction from ~33 seconds to download all the specifications to ~20 seconds when none of them need redownloading. Not exactly the kind of increase I was hoping for, but still an improvement, and I'm sure someone can make it even better.

PatMyron · 2020-02-25T17:14:11Z

With this pull request I'm seeing a reduction from ~33 seconds to download all the specifications to ~20 seconds when none of them need redownloading. Not exactly the kind of increase I was hoping for, but still an improvement, and I'm sure someone can make it even better.

Especially as the number of regions continues to grow, I think updating the regions in parallel will be the key to keeping cfn-lint --update-specs completion times within a couple of seconds.

https://github.com/aws-cloudformation/cfn-python-lint/blob/52864add9eac85a42cc6866cdf05c4d3b4dcd149/src/cfnlint/maintenance.py#L50

kddejong · 2020-03-07T14:11:52Z

I think parallelization is a great idea here. Another option would be to combine the region and update-specs options, allowing a user to filter down to the specs they care about. By default we would still have to update all of them, which may make this a little trickier to implement.

Tro95 · 2020-03-07T19:40:39Z

I'm happy to give parallelisation a go. Is there any particular parallelisation module you'd prefer to use? The constant I/O on the downloads metadata file concerns me, both from a mutex point of view and because of the inefficiency of reading and writing a whole file for a single-line change. What are your thoughts on having an individual metadata file per canonical URL, i.e. <sha256 of url>.meta.json holding data specifically related to that URL, rather than a single, centralised file?
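The per-URL metadata file proposed above could look roughly like this. It is a sketch under assumptions: the function names and the JSON layout are hypothetical, and `os.makedirs(..., exist_ok=True)` is Python 3 only.

```python
import hashlib
import json
import os


def metadata_filename(url, directory):
    """Map a URL to <sha256 of url>.meta.json inside directory."""
    digest = hashlib.sha256(url.encode('utf-8')).hexdigest()
    return os.path.join(directory, digest + '.meta.json')


def load_url_metadata(url, directory):
    """Return the metadata stored for url, or an empty dict if none exists."""
    path = metadata_filename(url, directory)
    if not os.path.exists(path):
        return {}
    with open(path, 'r') as f:
        return json.load(f)


def save_url_metadata(url, directory, metadata):
    """Write the metadata for url to its own file."""
    os.makedirs(directory, exist_ok=True)  # Python 3 only
    with open(metadata_filename(url, directory), 'w') as f:
        json.dump(metadata, f)
```

Because each URL owns exactly one file, parallel workers handling different URLs never write to the same path, which sidesteps the mutex concern.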

Tro95 · 2020-03-10T17:31:14Z

I've added parallelisation and changed some basic things like the directory the caching files were being pulled into.

Updating all files speed:

$ time cfn-lint --update-specs

real	0m11.139s
user	0m22.762s
sys	0m2.131s

Subsequent update with no specs to update (as per cache check):

$ time cfn-lint --update-specs

real	0m5.726s
user	0m11.829s
sys	0m1.781s
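A rough sketch of updating regions in parallel. The thread does not say which module the pull request uses, so `concurrent.futures` is an assumption here (and it is Python 3 only); `update_all` and the `update_one` callable are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def update_all(regions, update_one, max_workers=8):
    """Run update_one(region) for every region concurrently.

    Returns a dict mapping each region to the result of update_one.
    """
    regions = list(regions)  # allow any iterable, since it is used twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(update_one, regions)  # preserves input order
        return dict(zip(regions, results))
```

Threads suit this workload if it is dominated by network waits; if parsing and patching the specs dominate, a process pool may be the better fit.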

PatMyron · 2020-04-07T07:02:55Z

Is cfn-lint --update-specs still generating src/cfnlint/data/CloudSpecs/ changes like the ones here?

pip3 install -e . --user
cfn-lint --update-specs
git status

Tro95 · 2020-04-07T08:09:58Z

It should be making changes (assuming there are any to make), so I'm unsure why it's not generating any for you. At the very least it should be creating the src/cfnlint/data/DownloadsMetadata/ directory and files within that.

$ git show
commit 706369e0e0aacbe90c150c1a7b761d880ce601c4 (HEAD -> cfn-issue-1249, dev/cfn-issue-1249)
Merge: ab46e27a be3867c6
Author: Kevin DeJong <kddejong@amazon.com>
Date:   Sat Mar 14 07:25:39 2020 -0500

    Merge branch 'master' into cfn-issue-1249

$ git status
On branch cfn-issue-1249
Your branch is up to date with 'dev/cfn-issue-1249'.

nothing to commit, working tree clean
$ pip3 install -e . --user
$ cfn-lint --update-specs
$ git status
On branch cfn-issue-1249
Your branch is up to date with 'dev/cfn-issue-1249'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   src/cfnlint/data/CloudSpecs/ap-east-1.json
	modified:   src/cfnlint/data/CloudSpecs/ap-northeast-1.json
	modified:   src/cfnlint/data/CloudSpecs/ap-northeast-2.json
	modified:   src/cfnlint/data/CloudSpecs/ap-northeast-3.json
	modified:   src/cfnlint/data/CloudSpecs/ap-south-1.json
	modified:   src/cfnlint/data/CloudSpecs/ap-southeast-1.json
	modified:   src/cfnlint/data/CloudSpecs/ap-southeast-2.json
	modified:   src/cfnlint/data/CloudSpecs/ca-central-1.json
	modified:   src/cfnlint/data/CloudSpecs/cn-north-1.json
	modified:   src/cfnlint/data/CloudSpecs/cn-northwest-1.json
	modified:   src/cfnlint/data/CloudSpecs/eu-central-1.json
	modified:   src/cfnlint/data/CloudSpecs/eu-north-1.json
	modified:   src/cfnlint/data/CloudSpecs/eu-west-1.json
	modified:   src/cfnlint/data/CloudSpecs/eu-west-2.json
	modified:   src/cfnlint/data/CloudSpecs/eu-west-3.json
	modified:   src/cfnlint/data/CloudSpecs/me-south-1.json
	modified:   src/cfnlint/data/CloudSpecs/sa-east-1.json
	modified:   src/cfnlint/data/CloudSpecs/us-east-1.json
	modified:   src/cfnlint/data/CloudSpecs/us-east-2.json
	modified:   src/cfnlint/data/CloudSpecs/us-gov-east-1.json
	modified:   src/cfnlint/data/CloudSpecs/us-gov-west-1.json
	modified:   src/cfnlint/data/CloudSpecs/us-west-1.json
	modified:   src/cfnlint/data/CloudSpecs/us-west-2.json

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	src/cfnlint/data/DownloadsMetadata/

no changes added to commit (use "git add" and/or "git commit -a")

PatMyron · 2020-04-08T03:57:45Z

@Tro95 sorry, I forgot I was testing the homebrew version recently, was able to generate changes here after uninstalling that

kddejong · 2020-04-08T15:09:56Z

@PatMyron this looking good to merge?

codecov · 2020-04-10T19:37:41Z

Codecov Report

Merging #1383 into master will increase coverage by 0.01%.
The diff coverage is 91.93%.

@@            Coverage Diff             @@
##           master    #1383      +/-   ##
==========================================
+ Coverage   87.52%   87.54%   +0.01%     
==========================================
  Files         156      156              
  Lines        8731     8782      +51     
  Branches     2095     2101       +6     
==========================================
+ Hits         7642     7688      +46     
- Misses        654      657       +3     
- Partials      435      437       +2

Impacted Files               Coverage Δ
src/cfnlint/helpers.py       78.60% <87.80%> (+1.76%) ⬆️
src/cfnlint/maintenance.py   98.00% <100.00%> (+0.27%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 91b4586...5c309d6.

miparnisari

Two minor changes and this looks good to me :)

PatMyron · 2020-05-18T00:57:45Z

@kddejong @miparnisari if the resource specifications haven't changed, some of the other data sources won't be applied without modifying src/cfnlint/data/DownloadsMetadata:

pip3 install -e .
scripts/update_specs_from_pricing.py # requires Boto3 and Credentials
cfn-lint --update-specs # won't apply changes if metadata matches

Tro95 · 2020-05-18T10:36:06Z

@kddejong @miparnisari if the resource specifications haven't changed, some of the other data sources won't be applied without modifying src/cfnlint/data/DownloadsMetadata:
pip3 install -e .
scripts/update_specs_from_pricing.py # requires Boto3 and Credentials
cfn-lint --update-specs # won't apply changes if metadata matches

@PatMyron Is the problem the fact that --update-specs does both the downloading of updated specs, and the merging of the specs with the patches produced by the scripts? If so, the fix should be relatively straightforward: replace https://github.com/aws-cloudformation/cfn-python-lint/blob/cbf972fcb3d45dd71bdac4575e81433cbd36dcc6/src/cfnlint/maintenance.py#L46-L53 with something like

    # Check to see if we already have the latest version, otherwise download it
    if url_has_newer_version(url):
        spec_content = get_url_content(url, caching=True)
        spec = json.loads(spec_content)
    else:
        with open(filename, 'r') as f:
            spec = json.load(f)

This would allow the download of updated spec files to be skipped when there are no updates, whilst still patching the specs every time. Alternatively, should we separate the patching of the specs, and include the patching with the scripts you are running?
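A self-contained sketch of that first option, with the download check, the fetch and the patch step passed in as callables. The two function names and the `apply_patches` parameter are hypothetical; only `url_has_newer_version` and `get_url_content` appear in the snippet above, and here they are stand-in parameters rather than the real helpers.

```python
import json


def load_spec(url, filename, url_has_newer_version, get_url_content):
    """Return the spec from the network if it changed, else from disk."""
    if url_has_newer_version(url):
        return json.loads(get_url_content(url))
    with open(filename, 'r') as f:
        return json.load(f)


def update_spec(url, filename, url_has_newer_version, get_url_content,
                apply_patches):
    """Load the spec, always apply patches, then write the result back."""
    spec = load_spec(url, filename, url_has_newer_version, get_url_content)
    spec = apply_patches(spec)
    with open(filename, 'w') as f:
        json.dump(spec, f, indent=2, sort_keys=True)
    return spec
```

One caveat: when nothing is downloaded, the file on disk has already been patched, so this only works if the patches can safely be applied a second time. The thread does not establish whether that holds.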

PatMyron · 2020-05-18T16:11:27Z

Is the problem the fact that --update-specs does both the downloading of updated specs, and the merging of the specs with the patches produced by the scripts?

Yeah pretty much. --update-specs also merges the specs with manually written patches in addition to patches produced by the scripts

Alternatively, should we separate the patching of the specs, and include the patching with the scripts you are running?

Need some way for manually written patches to be applied in addition to patches produced by the scripts

Thomas O'Brien added 2 commits February 24, 2020 17:54

Checks the ETag of the files being requested to prevent downloading a…

babfafc

…nd replacing files with no updates. aws-cloudformation#1249

Fixing the pylint errors

537b102

Tro95 commented Feb 24, 2020

kddejong requested a review from PatMyron February 28, 2020 01:41

Tro95 added 2 commits March 10, 2020 17:25

Parallelisation and minor refactor

480d7c4

Merge branch 'master' of https://github.com/aws-cloudformation/cfn-py…

c902ea6

…thon-lint into cfn-issue-1249

Tro95 commented Mar 10, 2020

src/cfnlint/maintenance.py (resolved)

Tro95 and others added 2 commits March 11, 2020 10:18

Fixing Python2.7 update_resource_specs()

ab46e27

Merge branch 'master' into cfn-issue-1249

706369e

Tro95 added 2 commits April 7, 2020 09:13

Merge branch 'master' of https://github.com/aws-cloudformation/cfn-py…

f7e0e5b

…thon-lint into cfn-issue-1249

Adding the downloads metadata directory and the associated cache files

ff8ed45

miparnisari suggested changes Apr 8, 2020

Tro95 added 2 commits April 10, 2020 19:35

Typo

20cc9b6

Improving code quality

d095af4

Tro95 added 5 commits April 10, 2020 20:38

Merge branch 'master' of https://github.com/aws-cloudformation/cfn-py…

650e2a3

…thon-lint into cfn-issue-1249

More tests to improve coverage

06823d1

Adding tests for the download metadata methods

81f9043

Adding tests for update_resource_secs

10ca234

Testing all paths of update_resource_specs explicitly, as CodeCov doe…

29a4717

…sn't identify python2 vs 3 paths

Tro95 requested a review from miparnisari April 11, 2020 11:06

miparnisari suggested changes Apr 12, 2020

...DownloadsMetadata/1c9ead4af49b3a8f39632f5a30578ead5310da0b5a68ae4cf93b4be6a9a05278.meta.json (outdated, resolved)

src/cfnlint/helpers.py (outdated, resolved)

src/cfnlint/maintenance.py (resolved)

Tro95 added 2 commits April 12, 2020 10:28

Removing metadata files from codebase

9c5e417

Renaming metadata methods

b4f6424

Tro95 requested a review from miparnisari April 12, 2020 13:22

miparnisari approved these changes Apr 12, 2020

miparnisari mentioned this pull request Apr 16, 2020

Speed up "update-specs" #1410

Closed

Tro95 and others added 5 commits April 17, 2020 09:04

Merge branch 'master' into cfn-issue-1249

93a8657

Merge branch 'master' into cfn-issue-1249

ae96ced

Merge branch 'master' into cfn-issue-1249

26bb78a

Merge branch 'master' into cfn-issue-1249

daca1d7

Merge branch 'master' into cfn-issue-1249

5c309d6

kddejong merged commit 3b706b2 into aws-cloudformation:master Apr 24, 2020

Tro95 deleted the cfn-issue-1249 branch May 1, 2020 12:17

PatMyron added a commit that referenced this pull request May 29, 2020

--update-specs even if official CloudFormation Resource Specification…

935ec67

… hasn't changed #1383 (comment)

PatMyron mentioned this pull request May 29, 2020

--update-specs even if official CloudFormation Resource Specification hasn't changed #1554

Closed

PatMyron added a commit that referenced this pull request May 29, 2020

--update-specs even if official CloudFormation Resource Specification…

2deb44a

… hasn't changed #1383 (comment)
