Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(maint) Do not use visitors to synchronize modules #1161

Merged
merged 11 commits into from
May 10, 2021

Conversation

justinstoller
Copy link
Member

@justinstoller justinstoller commented May 7, 2021

In removing the visit_module functionality logic that was duplicated within actions has been consolidated. Notably the info message in each action before it called module.sync was one of:

"Deploying %{origin} content %{path}"
"Deploying module %{mod_path}"
"Updating module %{mod_path}"

And now is:

"Checking module %{path}"

(Open to to a better message, btw).

By moving the control_branch logic from the puppetfile install command into Module::Git users will see the warning about invalid control branches if applicable during regular deploys as well (eg, if they are deploying git modules with the control_branch pattern in an SVN or Yaml backed environment).

Giving the Module::Base have a method like will_sync?seems a bit unnecessary for the single use case of the deploy module action, however I think it will provide a useful start to the kind of behavior that Reid mentions in #1145 (comment).

The first commit is a separate refactor that changes the API of Module.new to work more like the (Source|Environment).from_hash that is very similar to. I'm torn on whether that level of breaking change should go in. Would love to hear comments about how much it might be used in the wild (vs being a private API to R10K). I'm also happy to remove that commit as it isn't directly related to the remaining changes.

Please add all notable changes to the "Unreleased" section of the CHANGELOG in the format:

- Summary of changes. [Issue or PR #](link to issue or PR)

@justinstoller justinstoller requested a review from a team May 7, 2021 17:01

@full_path = File.join(@basedir, @dirname)
@path = Pathname.new(File.join(@basedir, @dirname))

@puppetfile = R10K::Puppetfile.new(@full_path,
{overrides: @options[:overrides],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug that was uncaught earlier, but exposed with the additional testing that comes from attempting to actually use this data at the Module level. It accidentally got rolled into a refactor commit but I'm happy to break it out if that would be better.

@@ -28,7 +28,7 @@ def initialize(opts, argv, settings)
generate_types: @generate_types
},
modules: {
requested_modules: @argv,
requested_modules: @argv.map.to_a,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug in that was exposed with further functional testing. @argv here acts like an array but is actually a CRI::ArgumentList that includes the Enumerable module. This causes problems with logging that calls inspect on the value. This change was accidentally rolled into another refactor, but I can break it out if that's best.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to add a docs comment to this method indicating the type as it comes in?

@justinstoller justinstoller changed the title (maint) Do not use the visitors to synchronize modules (maint) Do not use visitors to synchronize modules May 7, 2021
def sync(opts={})
raise NotImplementedError
end

def will_sync?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should_sync? Update instance variable to match method name (sans '?')?

@justinstoller
Copy link
Member Author

mmm... I was testing with Ruby 2.7 locally. It looks like I may have used something that is specific to that version (everything lower than that version of Ruby failed in Travis).

@Magisus Magisus self-assigned this May 7, 2021
Prior to this a Puppetfile could declare a module with:
```
mod 'name'
mod 'name', :latest
mod 'name', 'version'
mod 'name', key_one: 'foo', key_two: ...
```

The optional second parameter(s) were then passed down through the
Puppetfile#add_module, Module.implements, and Module.new signatures
as the `args` parameter.

Being able to assume that the `args` parameter is a Hash simplifies the
code in several places - the exception being the Forge Module
implementation (which is the only module where anything but a Hash as the
args was ever valid) and the DSL methods that convert user input into
The Git implementation was the only user of options to sync, and the
value passed is already passed into the Module's initialization.
This consolidates the logging notifying users that a module is being
synchronized. Each caller provided slightly different messages so this
creates a new, generic message that should work for previous uses.
Prior to this when running `puppetfile install` users would receive a
warning whenever processing a git module with a control_branch setting
of :control_repo. This is because the puppetfile subcommands have no
concept of a containing environment that they operate within.

However, this warning should also be applicable when deploying git
modules with non-git backed environments (eg SVN or "bare"
environments). This warning is also only applicable to git backed
modules.

This patch moves the warning from the `puppetfile install` command to
the git module initialization and changes the message to be more general.
Previously, the deploy module action would determine in the visit_module
method if the visited module should be synchronized.

This patch moves that logic into the Module::Base class and makes it
available to subclasses via the `will_sync?` method. It also updates
each module implementation's sync method to only act if `will_sync?`
returns true.
For every module deployed via `deploy module`, if the `generate-types`
flag was given, we would attempt to generate types for the environments
containing that module. This could lead to type generation happening
multiple times for each environment, though in practice users typically
use this action to update a single module at a time.

This moves the logic regarding type generation from the visit_module
method on the deploy modules action to the visit_environment method
where it makes more sense to do environment wide updates.
Previously, the module.sync method would be called within a visit_module
method on each action controlling the sync.

This removes the visit_module methods from actions and has the locations
where `module.accept(visitor)` as called before call `module.sync`
instead.

The visitor interface would wrap every visit_* method in a rescue block
and set the `visit_ok` instance variable to false if there was an issue.
To keep this general behavior for each action we've add this behavior to
the `call` method of the action.
@justinstoller
Copy link
Member Author

Looks like Array#intersection wasn't added until 2.7. I've replaced it with the (IMO less readable) Array#&.

@justinstoller
Copy link
Member Author

I should do a bit more functional testing as well, but I wanted folks to be able to start reviewing asap.

if !args.is_a?(Hash)
args = { version: args }
end

add_module(name, args)
end
end

# @param [String] name
# @param [*Object] args
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should update the docs string if we require this to be a hash.

def sync(opts={})
raise NotImplementedError
logger.info _("Checking module %{path}") % {path: path}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I like the word "checking" or not... So this gets logged any time we go to sync a module, right? And that is when we're deploying either a whole env or just this module, and the module may

  1. not exist
  2. exist but need updating
  3. exist and already be at the right version

Is there additional logging elsewhere to indicate which of these states we were in?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for those higher level cases and not at info level. However, there are debug messages by lower level objects that can differentiate between those. I was going to use "Synchronizing" since internally it's called "sync" but I something stopped me from doing that - buy I don't remember why anymore.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I kind of agree that synchronizing might not be good, because sometimes we don't do anything (e.g. if the module is already in the right state). In that sense, "checking" is fine. I guess my mind just says "Checking for what?"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also be fine moving back to the term "Deploying" since that was used previously in two the three previous messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For least-change, I'd also suggest going back to "Deploying". Note on the purpose of %{origin}, that helps indicate in the logging where a module was defined, which can be useful when, for example, a module is defined both in the Puppetfile and by the environment, and the environment's version is overriding the definition from the Puppetfile.

@@ -28,7 +28,7 @@ def initialize(opts, argv, settings)
generate_types: @generate_types
},
modules: {
requested_modules: @argv,
requested_modules: @argv.map.to_a,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to add a docs comment to this method indicating the type as it comes in?

@Magisus
Copy link
Collaborator

Magisus commented May 7, 2021

Personally I'm pro consolidating APIs so they don't take multiple types, particularly a language with heavy type inference like Ruby. The variability adds complexity that usually feels unnecessary. However, I understand it could be a breaking change. Are people using r10k as a library? Does Bolt, maybe?

@justinstoller
Copy link
Member Author

Would also love to get @reidmv 's feedback on this work.

Copy link
Contributor

@reidmv reidmv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mixed feelings about this.

I don't like calling sync() on all modules, along with settings, and having them self-decide what to do with that.

Would we be able to simplify this if we instead sort of "demoted" the Puppetfile?

Alternate algorith:

  • [Action]
    Visit deployment
    • [Deployment]
      Choose, then visit, environments
      • [Environment]
        Call environment.puppetfile.load()
        Choose, then concurrently visit, modules
        • [Module]
          Call module.sync()

Would that work? Would demoting the Puppetfile and removing it from the visitor pattern accomplish the reduction in complexity that's needed?

Side note: laying it out like this, it seems like we could also decide to visit chosen environments concurrently, if we wanted to. We'd probably just need to move our queuing/concurrency code into an R10K::Action::Visitor#concurrently_visit method, which accepted an array of other objects instead of just one.

def sync(opts={})
raise NotImplementedError
logger.info _("Checking module %{path}") % {path: path}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For least-change, I'd also suggest going back to "Deploying". Note on the purpose of %{origin}, that helps indicate in the logging where a module was defined, which can be useful when, for example, a module is defined both in the Puppetfile and by the environment, and the environment's version is overriding the definition from the Puppetfile.

end
else
logger.debug1(_("Only updating modules %{modules}, skipping module %{mod_name}") % {modules: requested_mods.inspect, mod_name: mod.name})
mod.sync
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going through this commit-by-commit, so maybe I'll come around to this as I move forward, but... at first blush, I feel like this change (1901284) actually makes the code more confusing, rather than less. Admittedly, I've been working with r10k for a little while, so may be looking a little more favorably on visitor stuff than I would have when I first started trying to figure it out.

In general, I don't like the idea of putting decision-making code into module objects. I am in favor of keeping module objects dumb, being given only simple instructions and doing what they're told. I think we should be able to find a way to simplify the visitor stuff without losing that simplicity for module objects.

@justinstoller
Copy link
Member Author

Would we be able to simplify this if we instead sort of "demoted" the Puppetfile?

I don't quite follow what you mean by "demoting" the Puppetfile. But, if you'd like my idea of where this is going it's this:

Is something like that what you're thinking of when you say "demote" the Puppetfile?

I don't like calling sync() on all modules, along with settings, and having them self-decide what to do with that.

As far as the calling sync and letting them decide...

I don't really care. Having the module container make the call of what module gets sync called on it was my approach in #1145. In that case the container (in that case a Puppetfile) will need to separately track all the modules loaded, the content to be managed (and not purged), and the modules to actually sync. Having the module itself know if it should be updated is also fine by me and I believe closely tracks your suggestion in this comment: #1145 (comment). I don't think there's a huge difference between calling

  mod.sync

when mod#sync is defined by:

def sync
  if should_sync?
    do stuff...
end

vs

mod.sync if mod.should_sync?

and

def sync
  do stuff...
end

Assuming the should_sync? method isn't otherwise used by callers of mod.sync. (fwiw, I believe you used the more Puppet-y insync? in your comment).

I think it's a better separation of concerns for the caller of module.sync to decide but I also think it makes the caller more complicated for logic that is pretty trivially inlined into the module. So 🤷. I have a feeling you were less concerned with which object had the responsibility and more to do with the fact that the particular object being updated was the Puppetfile class?

Regardless of the end state, I do think that having the module determine whether or not it syncs will be advantageous to the further refactors that I mentioned at the top of this comment (breaking up the Puppetfile class).

Copy link
Contributor

@reidmv reidmv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked with @justinstoller about this, I'm 👍 to move it forward re: design.

There are a lot of good fixes in here. If we can move it forward, as well as get a solution for CODEMGMT-1415 in place, there will be room afterwards to start more gentle refactoring, if we want to and if it's important enough.

Small steps = good. 🙂

@Magisus Magisus merged commit 5574915 into puppetlabs:main May 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants