-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tarball #1159
Tarball #1159
Conversation
be82a39
to
ec69c47
Compare
0fc7e24
to
622cece
Compare
This class will serve as a base class for creating tarball environment and module types.
Trying to make sure the global cachedir option gets respected by tarball
It's not just a development dependency; it's an r10k dependency too. Really it always has been because it's a puppetforge dependency.
This permits modules of type tarball to be used. Tarball modules will be extracted with no modification into the modules target directory. Tarballs should not be packed to include a top-level containing folder. modules: lwf-remote_file: type: tarball source: https://forge.puppet.com/v3/files/lwf-remote_file-1.1.3.tar.gz version: bce01b849631926b452900eff5dc385c893c4320c95b290f992e3c529f417f0e
This permits core environment content to be sourced from tarballs. The tarball environment type supports defining modules as well. --- production: type: tarball source: https://example.com/puppet-code-env-1.0.0.tar.gz version: 99a906c99c2f144de43f2ae500509a7474ed11c583fb623efa8e5b377a3157f0
These aren't fully featured tests, but it at least gets the skeleton up and structured.
I'm honestly not sure what this does, but it seems to be the cool new thing; all the other modules have it.
lib/r10k/tarball.rb
Outdated
|
||
# @!attribute [r] sha256digest | ||
# @return [String] The tarball's expected sha256 digest | ||
attr_accessor :sha256digest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just call this sha256
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I'm leaning towards version
being the term that gets documented.
If we shorten it, I think digest
(or the commonly used but slightly more technically ambiguous checksum
) would be the preferred shortening. That's the part that conveys what this parameter is really about—the output of a hash function which can be used to validate the expected content.
I went with the very specific and technical sha256digest
because I was trying to avoid the user needing to look up which algorithm to use for the checksum
. However, we could probably expose that later by offering a checksum_algorithm
if a use case arose for it.
I've put in a commit to change this to just checksum
for now.
lib/r10k/tarball.rb
Outdated
# tarball content | ||
# | ||
# @return [Boolean] | ||
def insync?(target_dir, purge: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a yardoc for the purge
param? It's not super clear to me what it's being used for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed the parameter to ignore_untracked_files
and added a yarddoc.
lib/r10k/tarball.rb
Outdated
# tarball content | ||
# | ||
# @return [Boolean] | ||
def insync?(target_dir, purge: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def insync?(target_dir, purge: true) | |
def in_sync?(target_dir, purge: true) |
don't love smushing words together 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I tend to agree, but insync
as a term seems to be the r10k convention. As such, I think it's maybe best to leave it this way in service to the greater goal of consistency.
lib/r10k/tarball.rb
Outdated
# | ||
# @return [Boolean] | ||
def insync?(target_dir, purge: true) | ||
target_dir_entries = Find.find(target_dir).map(&:to_s) - [target_dir] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at https://ruby-doc.org/stdlib-2.6.3/libdoc/find/rdoc/Find.html, I don't think this is giving you a collection of entries. You might just want Dir[File.join(target_dir, '**', '*')]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look into this, and while I'm not 100% sure why it works this way, it seems that Find.find()
returns an iterator that will hit all the files it is supposed to, and if that iterator is passed to #map
, it does what's expected. The end result is an array of strings.
The globbing behavior is actually not complete, as it misses dot-files. Given that Find.find
is working, I think it's best to leave it. I suspect it's more reliable than globbing can be.
lib/r10k/tarball.rb
Outdated
tempfile.close | ||
tempfile.unlink |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these need to be in an ensure
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that would be preferred. Pushed a commit to address this by switching to Tempfile::open
with a block.
lib/r10k/tarball.rb
Outdated
unless sha256digest.nil? | ||
File.exist?(cache_path) && sha256digest == file_digest(cache_path) | ||
else | ||
File.exist?(cache_path) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would switch these around and use an if
instead of unless
. Using unless
with else
is hard to read imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or you could just do
unless sha256digest.nil? | |
File.exist?(cache_path) && sha256digest == file_digest(cache_path) | |
else | |
File.exist?(cache_path) | |
end | |
return false unless File.exist?(cache_path) | |
return false if sha256digest && sha256digest != file_digest(cache_path) | |
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Adding a commit to simplify.
lib/r10k/tarball.rb
Outdated
# | ||
# @param reader [String] An object that responds to #read | ||
# @return [String] The read data's sha256 hex digest | ||
def reader_digest(reader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to put these digest methods into some utility class that can be shared? They don't necessarily seem specific to this tarball feature.
Also I think they should include sha256
in the names, since digest
is fairly general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The names were originally left generic in order to be applicable in the event the digest algorithm ever changes, or is made selectable per-tarball. I've tried making that more evident by using a @checksum_algorithm
instance variable, and references such as the following.
# At this time, the only checksum type supported is sha256. In the future,
# we may decide to support other algorithms if a use case arises. TBD.
@checksum_algorithm = :SHA256
def reader_digest(reader)
digest = Digest(@checksum_algorithm)
lib/r10k/tarball.rb
Outdated
digest.hexdigest | ||
end | ||
|
||
def sanitized_dirname(string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this defined elsewhere? We should probably pull it into a place that can be shared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an R10K::Util::Cacheable
module to share this method.
lib/r10k/tarball.rb
Outdated
string.gsub(/[^@\w\.-]/, '-') | ||
end | ||
|
||
def http_get(uri, method: Net::HTTP::Get, redirect_limit: 10, &block) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want this to be a generic http request method, I would change the method name to reflect that. Otherwise, if you want to keep it specific to GET, I would remove method as a param.
This also seems like something that should be shared, rather than specific to this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed method:
as a user opt, and hard-coded to Net::HTTP::Get
.
Regarding sharing, at the moment I don't know of anything else in R10k that needs to download files this way. I've added a comment to the method which clarifies exactly what it is doing:
# Start a Net::HTTP::Get connection, then yield the Net::HTTPSuccess object
# to the caller's block. Follow redirects if Net::HTTPRedirection responses
# are encountered, and honor settings[:proxy].
I think this could potentially be generalized for sharing, but I'm hesitant to break it off unless/until there's another consumer in r10k that wants to use it, for fear of over engineering. As near as I can tell R10K currently offloads all download-esque activity to other libraries. I'm considering Net::HTTP
to be such a library, though, it's probably possible to use Faraday instead... I think we already have that as a dependency, if it gets us something Net::HTTP
doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing Faraday, it looks like that library can do everything we need it to and more succinctly than Net::HTTP
. Given that it's already an r10k dependency, the right thing to do may be to reduce this method to using Faraday instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a stab at using Faraday, but it looks like:
- We want/need streaming downloads, so I think we need a newer version of Faraday than r10k currently supports, due to the puppet-forge gem dependency constraints.
- Using the newer version of Faraday, we run into a bit of a problem for testing; I can't figure out how to make Faraday's test stubs work with streaming downloads.
If these two problems didn't exist, it seems like we could fairly handily offload this to Faraday. I'm stopping that line of investigation for now though. Seems pointless given item 1.
If it did work, it might look something like this.
# @param uri [URI] The URI to download from
# @param output The file or path to save to
# @return [String] The downloaded file's sha256 hex digest
def download_to_file(uri, output)
digest = DIGEST_CLASS.new
File.open(output, 'wb') do |output_stream|
faraday_connection.get(uri) do |req|
req.options.on_data = Proc.new do |chunk, _|
output_stream.write(chunk)
digest.update(chunk)
end
end
end
digest.hexdigest
end
# Defined separately to enable test stubbing
def faraday_connection
Faraday.new do |faraday|
faraday.response(:follow_redirects)
faraday.adapter(:net_http)
faraday.proxy(settings[:proxy]) unless settings[:proxy].nil?
end
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, last tweak for now: went ahead and moved all the file download/copy/checksum related methods out of R10K::Tarball
and into a new utility module, R10K::Util::Downloader
.
ugh not sure how my review comments got submitted before I was done.. More to come.. 😅 also @reidmv if/when you make updates, if you could make them in separate commits, that would be helpful for review (and then we can squash where relevant at the end). |
Per PR review suggestion
110ba3f
to
b1e30bc
Compare
Method used by both R10K::Git::Cache and R10K::Tarball
Co-authored-by: Molly Waggett <mwaggett@alumni.reed.edu>
We aren't doing anything besides Get, so just hard-code it for ease of understanding. The comment clarifies what this method does, and how it is used by consuming methods.
R10K::Tarball only supports sha256 at this time, but the door is being left open to potentially support other types of checksums in the future.
Co-authored-by: Molly Waggett <mwaggett@alumni.reed.edu>
Fix proxy and other setting propogation for R10K::Tarball. Use R10K::Util::Cacheable to centrally default a default cachedir setting. Can't set the default centrally yet because that might invalidate existing caches, may need to do it on a major version change.
For consistency, clarity
Taking methods originally defined in R10K::Tarball and placing them into a sharable utility module.
This completes a move of generalizable methods from R10K::Tarball into the sharable utility module R10K::Util::Downloader
Just name cache files by either their checksum (if known), or download URL (if no checksum given). While it's not gonna be super easy to eyeball the cache and know what is or isn't there, it better guarantees that the cache is fully efficient.
Previously, R10K::Module::Tarball#statically_defined_version only accepted the :version key. However, :checksum also constitutes a statically defined version, if it is passed. Thought: #statically_defined_version should be called after #initialize. Otherwise it's repeating work that's handled setting instance variables.
@mwaggett ready for re-review! I solemnly promise to leave this alone and stop fiddling with it in my spare time until then.* 😇 * Unlikely to happen but if the mood strikes me I will not refrain from adding additional tests. |
This commit implements tests for R10K::Util::Downloader#http_get, and fixes bugs found while doing that.
I noticed that after the first review, this PR expanded from being a fairly new-files-only change into something that modifies how some existing structures work. @mwaggett if it's helpful for review and approval, I could probably break the PR apart into smaller requests, and split the structure changing stuff from the new feature stuff. Let me know. |
Closing in favor of #1244 |
This PR implements:
Example usage:
Later enhancements built on top of this might include Artifactory types, or S3 types, with options for real versions, depending on how the remote repository works.
Solves #567