Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tarball #1159

Closed
wants to merge 26 commits into from
Closed

Tarball #1159

wants to merge 26 commits into from

Conversation

reidmv
Copy link
Contributor

@reidmv reidmv commented May 6, 2021

This PR implements:

  • An R10K::Tarball base class
  • An R10K::Module::Tarball class, providing a "tarball" module type
  • An R10K::Environment::Tarball class, providing a "tarball" environment type

Example usage:

---
production:
  type: tarball
  source: https://example.com/puppet-code-env-v1.0.0.tar.gz
  version: 99a906c99c2f144de43f2ae500509a7474ed11c583fb623efa8e5b377a3157f0 # sha256digest
  modules:
    puppetlabs-peadm:
      type: forge
      version: 2.5.0
    reidmv-xampl:
      type: git
      source: https://github.com/reidmv/reidmv-xampl.git
      version: 62d07f2
    reidmv-mymodule:
      type: tarball
      source: https://example.com/reidmv-mymodule-1.0.0.tar.gz
      version: 6128ada158622cd90f8e1360fb7c2c3830a812d1ec26ddf0db7eb16d61b7293f # sha256digest

Later enhancements built on top of this might include Artifactory types, or S3 types, with options for real versions, depending on how the remote repository works.

Solves #567

@reidmv reidmv force-pushed the tarball branch 4 times, most recently from be82a39 to ec69c47 Compare May 7, 2021 02:26
@reidmv reidmv force-pushed the tarball branch 6 times, most recently from 0fc7e24 to 622cece Compare August 27, 2021 20:51
@reidmv reidmv marked this pull request as ready for review August 27, 2021 21:05
@reidmv reidmv requested a review from a team August 27, 2021 21:05
reidmv added 9 commits August 27, 2021 17:33
This class will serve as a base class for creating tarball environment
and module types.
Trying to make sure the global cachedir option gets respected by tarball
It's not just a development dependency; it's an r10k dependency too.
Really it always has been because it's a puppetforge dependency.
This permits modules of type tarball to be used. Tarball modules will
be extracted with no modification into the modules target directory.
Tarballs should not be packed to include a top-level containing folder.

    modules:
      lwf-remote_file:
        type: tarball
        source: https://forge.puppet.com/v3/files/lwf-remote_file-1.1.3.tar.gz
        version: bce01b849631926b452900eff5dc385c893c4320c95b290f992e3c529f417f0e
This permits core environment content to be sourced from tarballs. The
tarball environment type supports defining modules as well.

    ---
    production:
      type: tarball
      source: https://example.com/puppet-code-env-1.0.0.tar.gz
      version: 99a906c99c2f144de43f2ae500509a7474ed11c583fb623efa8e5b377a3157f0
These aren't fully featured tests, but it at least gets the skeleton up
and structured.
I'm honestly not sure what this does, but it seems to be the cool new
thing; all the other modules have it.
lib/r10k/tarball.rb Outdated Show resolved Hide resolved

# @!attribute [r] sha256digest
# @return [String] The tarball's expected sha256 digest
attr_accessor :sha256digest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just call this sha256?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I'm leaning towards version being the term that gets documented.

If we shorten it, I think digest (or the commonly used but slightly more technically ambiguous checksum) would be the preferred shortening. That's the part that conveys what this parameter is really about—the output of a hash function which can be used to validate the expected content.

I went with the very specific and technical sha256digest because I was trying to avoid the user needing to look up which algorithm to use for the checksum. However, we could probably expose that later by offering a checksum_algorithm if a use case arose for it.

I've put in a commit to change this to just checksum for now.

# tarball content
#
# @return [Boolean]
def insync?(target_dir, purge: true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a yardoc for the purge param? It's not super clear to me what it's being used for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the parameter to ignore_untracked_files and added a yarddoc.

# tarball content
#
# @return [Boolean]
def insync?(target_dir, purge: true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def insync?(target_dir, purge: true)
def in_sync?(target_dir, purge: true)

don't love smushing words together 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I tend to agree, but insync as a term seems to be the r10k convention. As such, I think it's maybe best to leave it this way in service to the greater goal of consistency.

#
# @return [Boolean]
def insync?(target_dir, purge: true)
target_dir_entries = Find.find(target_dir).map(&:to_s) - [target_dir]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at https://ruby-doc.org/stdlib-2.6.3/libdoc/find/rdoc/Find.html, I don't think this is giving you a collection of entries. You might just want Dir[File.join(target_dir, '**', '*')]?

Copy link
Contributor Author

@reidmv reidmv Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look into this, and while I'm not 100% sure why it works this way, it seems that Find.find() returns an iterator that will hit all the files it is supposed to, and if that iterator is passed to #map, it does what's expected. The end result is an array of strings.

The globbing behavior is actually not complete, as it misses dot-files. Given that Find.find is working, I think it's best to leave it. I suspect it's more reliable than globbing can be.

Comment on lines 120 to 121
tempfile.close
tempfile.unlink
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to be in an ensure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that would be preferred. Pushed a commit to address this by switching to Tempfile::open with a block.

Comment on lines 130 to 134
unless sha256digest.nil?
File.exist?(cache_path) && sha256digest == file_digest(cache_path)
else
File.exist?(cache_path)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would switch these around and use an if instead of unless. Using unless with else is hard to read imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or you could just do

Suggested change
unless sha256digest.nil?
File.exist?(cache_path) && sha256digest == file_digest(cache_path)
else
File.exist?(cache_path)
end
return false unless File.exist?(cache_path)
return false if sha256digest && sha256digest != file_digest(cache_path)
return true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Adding a commit to simplify.

#
# @param reader [String] An object that responds to #read
# @return [String] The read data's sha256 hex digest
def reader_digest(reader)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to put these digest methods into some utility class that can be shared? They don't necessarily seem specific to this tarball feature.

Also I think they should include sha256 in the names, since digest is fairly general.

Copy link
Contributor Author

@reidmv reidmv Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names were originally left generic in order to be applicable in the event the digest algorithm ever changes, or is made selectable per-tarball. I've tried making that more evident by using a @checksum_algorithm instance variable, and references such as the following.

# At this time, the only checksum type supported is sha256. In the future,
# we may decide to support other algorithms if a use case arises. TBD.
@checksum_algorithm = :SHA256
def reader_digest(reader)
  digest = Digest(@checksum_algorithm)

digest.hexdigest
end

def sanitized_dirname(string)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this defined elsewhere? We should probably pull it into a place that can be shared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an R10K::Util::Cacheable module to share this method.

lib/r10k/tarball.rb Outdated Show resolved Hide resolved
@mwaggett mwaggett self-assigned this Aug 30, 2021
string.gsub(/[^@\w\.-]/, '-')
end

def http_get(uri, method: Net::HTTP::Get, redirect_limit: 10, &block)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want this to be a generic http request method, I would change the method name to reflect that. Otherwise, if you want to keep it specific to GET, I would remove method as a param.

This also seems like something that should be shared, rather than specific to this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed method: as a user opt, and hard-coded to Net::HTTP::Get.

Regarding sharing, at the moment I don't know of anything else in R10k that needs to download files this way. I've added a comment to the method which clarifies exactly what it is doing:

# Start a Net::HTTP::Get connection, then yield the Net::HTTPSuccess object
# to the caller's block. Follow redirects if Net::HTTPRedirection responses
# are encountered, and honor settings[:proxy].

I think this could potentially be generalized for sharing, but I'm hesitant to break it off unless/until there's another consumer in r10k that wants to use it, for fear of over engineering. As near as I can tell R10K currently offloads all download-esque activity to other libraries. I'm considering Net::HTTP to be such a library, though, it's probably possible to use Faraday instead... I think we already have that as a dependency, if it gets us something Net::HTTP doesn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing Faraday, it looks like that library can do everything we need it to and more succinctly than Net::HTTP. Given that it's already an r10k dependency, the right thing to do may be to reduce this method to using Faraday instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a stab at using Faraday, but it looks like:

  1. We want/need streaming downloads, so I think we need a newer version of Faraday than r10k currently supports, due to the puppet-forge gem dependency constraints.
  2. Using the newer version of Faraday, we run into a bit of a problem for testing; I can't figure out how to make Faraday's test stubs work with streaming downloads.

If these two problems didn't exist, it seems like we could fairly handily offload this to Faraday. I'm stopping that line of investigation for now though. Seems pointless given item 1.

If it did work, it might look something like this.

# @param uri [URI] The URI to download from
# @param output The file or path to save to
# @return [String] The downloaded file's sha256 hex digest
def download_to_file(uri, output)
  digest = DIGEST_CLASS.new

  File.open(output, 'wb') do |output_stream|
    faraday_connection.get(uri) do |req|
      req.options.on_data = Proc.new do |chunk, _|
        output_stream.write(chunk)
        digest.update(chunk)
      end
    end
  end

  digest.hexdigest
end

# Defined separately to enable test stubbing
def faraday_connection
  Faraday.new do |faraday|
    faraday.response(:follow_redirects)
    faraday.adapter(:net_http)
    faraday.proxy(settings[:proxy]) unless settings[:proxy].nil?
  end
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, last tweak for now: went ahead and moved all the file download/copy/checksum related methods out of R10K::Tarball and into a new utility module, R10K::Util::Downloader.

@mwaggett
Copy link
Contributor

ugh not sure how my review comments got submitted before I was done.. More to come.. 😅

also @reidmv if/when you make updates, if you could make them in separate commits, that would be helpful for review (and then we can squash where relevant at the end).

@reidmv reidmv force-pushed the tarball branch 3 times, most recently from 110ba3f to b1e30bc Compare September 13, 2021 03:23
@reidmv reidmv requested a review from mwaggett September 13, 2021 03:28
reidmv and others added 3 commits September 13, 2021 21:31
Method used by both R10K::Git::Cache and R10K::Tarball
Co-authored-by: Molly Waggett <mwaggett@alumni.reed.edu>
We aren't doing anything besides Get, so just hard-code it for ease of
understanding.

The comment clarifies what this method does, and how it is used by
consuming methods.
reidmv and others added 5 commits September 15, 2021 17:59
R10K::Tarball only supports sha256 at this time, but the door is being
left open to potentially support other types of checksums in the future.
Co-authored-by: Molly Waggett <mwaggett@alumni.reed.edu>
Fix proxy and other setting propogation for R10K::Tarball. Use
R10K::Util::Cacheable to centrally default a default cachedir setting.
Can't set the default centrally yet because that might invalidate
existing caches, may need to do it on a major version change.
Taking methods originally defined in R10K::Tarball and placing them into
a sharable utility module.
This completes a move of generalizable methods from R10K::Tarball into
the sharable utility module R10K::Util::Downloader
Just name cache files by either their checksum (if known), or download
URL (if no checksum given). While it's not gonna be super easy to
eyeball the cache and know what is or isn't there, it better guarantees
that the cache is fully efficient.
Previously, R10K::Module::Tarball#statically_defined_version only
accepted the :version key. However, :checksum also constitutes a
statically defined version, if it is passed.

Thought: #statically_defined_version should be called after #initialize.
Otherwise it's repeating work that's handled setting instance variables.
@reidmv
Copy link
Contributor Author

reidmv commented Sep 17, 2021

@mwaggett ready for re-review! I solemnly promise to leave this alone and stop fiddling with it in my spare time until then.* 😇

* Unlikely to happen but if the mood strikes me I will not refrain from adding additional tests.

This commit implements tests for R10K::Util::Downloader#http_get, and
fixes bugs found while doing that.
@reidmv
Copy link
Contributor Author

reidmv commented Oct 4, 2021

I noticed that after the first review, this PR expanded from being a fairly new-files-only change into something that modifies how some existing structures work. @mwaggett if it's helpful for review and approval, I could probably break the PR apart into smaller requests, and split the structure changing stuff from the new feature stuff. Let me know.

@reidmv
Copy link
Contributor Author

reidmv commented Nov 8, 2021

Closing in favor of #1244

@reidmv reidmv closed this Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants