Checks the ETag of the files being requested to prevent downloading a… #1383
…nd replacing files with no updates. aws-cloudformation#1249
src/cfnlint/helpers.py
@@ -165,10 +166,52 @@ | |||
'AWS::Redshift::Cluster' | |||
] | |||
|
|||
CACHING_DIR = os.path.join(os.path.dirname(__file__), 'cache/') |
This can easily be changed to a different directory or moved into a config option if desired. I'm also unsure of the consequences of using __file__ rather than cfnlint.__file__ as is used in some places, but there doesn't seem to be a consistent approach being used, and there were circular dependency issues with the latter option.
Wondering if it should be under the data folder? Would we want to commit that file, as we do with update-specs, so the most recent version is included in the package?
I'm happy for it to live under the data folder; the primary reason for using a separate cache directory was just to signify the temporary, non-essential nature of the files. It definitely could be committed and used like a lock file, and would save an initial fetch of every spec file when it's first run.
import random | ||
import string |
These can both be removed if people were fine with hard-coding the ETag value in the tests. I just used them to generate a random string.
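For illustration, the two imports could produce a random ETag-like string for a test roughly as follows. This is only a sketch: the helper name `random_etag`, the default length of 32, and the placeholder constant are assumptions, not code taken from this pull request.

```python
import random
import string


def random_etag(length=32):
    """Return a random alphanumeric string to stand in for an ETag (hypothetical helper)."""
    alphabet = string.ascii_letters + string.digits
    return ''.join(random.choice(alphabet) for _ in range(length))


# A hard-coded value like this would make both imports unnecessary.
HARD_CODED_ETAG = '"0123456789abcdef"'  # arbitrary placeholder, not a real ETag
```

Either approach exercises the same comparison logic; the random variant only guards against a test accidentally depending on one specific value.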
With this pull request I'm seeing a reduction from ~33 seconds to ~20 seconds to download all the specifications when none of them need redownloading. Not exactly the kind of speed-up I was hoping for, but still an improvement, and I'm sure someone can make it even better.
Especially as the number of regions continues to grow, I think updating the regions in parallel will be the main key to keeping |
I think parallelization is a great idea here. Another option would be to combine the region and update specs together, allowing a user to filter down to the specs they care about. By default we would have to keep them all, which may make this a little trickier to implement.
I'm happy to give parallelisation a go; is there any particular parallelisation module you'd prefer to use? The constant I/O of the downloads metadata file concerns me, both from a mutex point of view and because of the inefficiency of reading and writing a whole file for a single-line change. What are your thoughts on having an individual metadata file per canonical URL, i.e.
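One way to picture the per-URL metadata idea together with parallel updates is sketched below. It assumes Python 3, `concurrent.futures.ThreadPoolExecutor` as the parallelisation module, a SHA-256 hash of the URL as the file name, and a `.meta.json` suffix. All function names and the `update_one` callback are hypothetical and are not the code in this pull request.

```python
import hashlib
import json
import os
from concurrent.futures import ThreadPoolExecutor


def metadata_path(metadata_dir, url):
    """Map a URL to its own metadata file (assumed naming: sha256 of the URL)."""
    digest = hashlib.sha256(url.encode('utf-8')).hexdigest()
    return os.path.join(metadata_dir, digest + '.meta.json')


def load_metadata(metadata_dir, url):
    """Return the stored metadata for a URL, or an empty dict if none exists."""
    path = metadata_path(metadata_dir, url)
    if not os.path.exists(path):
        return {}
    with open(path, 'r') as f:
        return json.load(f)


def save_metadata(metadata_dir, url, metadata):
    """Write one URL's metadata; no other URL's file is touched."""
    os.makedirs(metadata_dir, exist_ok=True)
    with open(metadata_path(metadata_dir, url), 'w') as f:
        json.dump(metadata, f)


def update_all(urls, update_one, max_workers=8):
    """Run update_one(url) for every URL in a thread pool and collect the results."""
    urls = list(urls)  # allow generators; the list is iterated twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(update_one, urls)))
```

Because each worker reads and writes only the file belonging to its own URL, this layout would avoid contention on a single shared metadata file, which is the mutex concern raised above.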
I've added parallelisation and changed some basic things like the directory the caching files were being pulled into. Updating all files speed:
Subsequent update with no specs to update (as per cache check):
Is pip3 install -e . --user
cfn-lint --update-specs
git status |
It should be making changes (assuming there are any to make), so I'm unsure why it's not generating any for you. At the very least it should be creating the
@PatMyron this looking good to merge? |
...DownloadsMetadata/1c9ead4af49b3a8f39632f5a30578ead5310da0b5a68ae4cf93b4be6a9a05278.meta.json
Codecov Report
@@ Coverage Diff @@
## master #1383 +/- ##
==========================================
+ Coverage 87.52% 87.54% +0.01%
==========================================
Files 156 156
Lines 8731 8782 +51
Branches 2095 2101 +6
==========================================
+ Hits 7642 7688 +46
- Misses 654 657 +3
- Partials 435 437 +2
…sn't identify python2 vs 3 paths
Two minor changes and this looks good to me :)
...DownloadsMetadata/1c9ead4af49b3a8f39632f5a30578ead5310da0b5a68ae4cf93b4be6a9a05278.meta.json
@kddejong @miparnisari if the resource specifications haven't changed, some of the other data sources won't be applied without modifying
pip3 install -e .
scripts/update_specs_from_pricing.py # requires Boto3 and Credentials
cfn-lint --update-specs # won't apply changes if metadata matches
@PatMyron Is the problem the fact that
# Check to see if we already have the latest version, otherwise download it
if url_has_newer_version(url):
    spec_content = get_url_content(url, caching=True)
    spec = json.loads(spec_content)
else:
    with open(filename, 'r') as f:
        spec = json.load(f)
This would allow the downloading of updated spec files to be skipped if there are no updates, whilst patching the specs every time regardless. Alternatively, should we separate the patching of the specs, and include the patching with the scripts you are running?
Yeah pretty much.
Need some way for manually written patches to be applied in addition to patches produced by the scripts |
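The flow discussed here, skipping the download when nothing changed but still applying every patch, could look roughly like the sketch below. The function names, the injected `has_newer_version` and `download` callables, and the representation of a patch as a plain callable are all assumptions made for illustration; they are not cfn-lint's actual patching mechanism.

```python
import json


def load_spec(url, filename, has_newer_version, download):
    """Return the spec, downloading only when the remote copy has changed."""
    if has_newer_version(url):
        return json.loads(download(url))
    with open(filename, 'r') as f:
        return json.load(f)


def apply_patches(spec, patches):
    """Apply each patch in order; a patch here is simply a callable spec -> spec."""
    for patch in patches:
        spec = patch(spec)
    return spec


def update_spec(url, filename, has_newer_version, download, patches):
    """Skip the download when possible, but patch and rewrite the file every time."""
    spec = load_spec(url, filename, has_newer_version, download)
    spec = apply_patches(spec, patches)
    with open(filename, 'w') as f:
        json.dump(spec, f, indent=2, sort_keys=True)
    return spec
```

Note that when the download is skipped, the local file has already been patched on an earlier run, so this arrangement only works if the patches are idempotent.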
Checks the ETag of the files being requested to prevent downloading and replacing files with no updates. #1249
Description of changes: Added caching functionality to the get_url_content() helper function that stores and compares ETag values in the response headers.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
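The ETag comparison described in this pull request can be sketched as follows. This is a minimal illustration assuming Python 3's `urllib.request`, a HEAD request to read the header, and a single JSON file mapping URLs to ETags. The names `fetch_etag` and `has_newer_version` and the `get_etag` parameter are hypothetical and differ from the helper actually added here.

```python
import json
import os
from urllib.request import Request, urlopen


def fetch_etag(url):
    """Issue a HEAD request and return the ETag response header (or None)."""
    with urlopen(Request(url, method='HEAD')) as response:
        return response.headers.get('ETag')


def has_newer_version(url, metadata_file, get_etag=fetch_etag):
    """Compare the remote ETag with the stored one and record the new value.

    Returns True when the content should be downloaded again.
    """
    stored = {}
    if os.path.exists(metadata_file):
        with open(metadata_file, 'r') as f:
            stored = json.load(f)

    remote_etag = get_etag(url)
    # Without an ETag there is nothing to compare, so download to be safe.
    if remote_etag is None or stored.get(url) != remote_etag:
        if remote_etag is not None:
            stored[url] = remote_etag
            with open(metadata_file, 'w') as f:
                json.dump(stored, f)
        return True
    return False
```

The sketch records the new ETag before any download happens; a fuller version would probably store it only after the download succeeds, so that a failed fetch is retried on the next run.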