Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow default value lookup in Hiera/Data in Modules #250

Open
SimonHoenscheid opened this issue Aug 9, 2020 · 18 comments · May be fixed by #273
Open

Allow default value lookup in Hiera/Data in Modules #250

SimonHoenscheid opened this issue Aug 9, 2020 · 18 comments · May be fixed by #273

Comments

@SimonHoenscheid
Copy link

Use Case

By now, puppet-strings is able to look up a default value of a variable in puppet code

Describe the Solution You Would Like

extend the used ruby code

Describe Alternatives You've Considered

none

Additional Context

context should be clear

@ekohl
Copy link
Contributor

ekohl commented Aug 10, 2020

How do you imagine this should be represented? I'm not sure how this could be done reliably. You have to determine that there is no possibility of fact interpolation to get a default, or say "may be set via hiera" if there is some hiera entry. Consider Optional[String] $myvar and it is in data/os/RedHat/7.yaml, but nowhere else. What's "the default"?

@binford2k
Copy link
Contributor

This would require using rspec-puppet-facts and iterating over on_supported_os and generating a table of default values. I'm not sure the UX would be any better than just looking at the code.

@ekohl
Copy link
Contributor

ekohl commented Aug 10, 2020

And even that would not be guaranteed correct since there can be custom facts.

@bastelfreak
Copy link
Collaborator

What about displaying all possible values for one puppet parameter / hiera key in the REFERECNE.md?

@ekohl
Copy link
Contributor

ekohl commented Aug 11, 2020

What about displaying all possible values for one puppet parameter / hiera key in the REFERECNE.md?

I've been thinking about that for a while. So one theory I have is that you can convert interpolation into a regex. In https://gist.github.com/ekohl/6b136d1dab4313edd5bd8eab5a7aacc0 I took a stab at this.

The general idea is that hiera.yaml can contain hierarchies and every level can contain one or more paths. In he first section the files are shown and which interpolations we've found. Since these are regexes, I don't know how reliable they are but it looks fairly OK.

Then it builds a list of overrides, showing which variables map to which value in each file. This is the short overview.

The long version combines both into a single overview. I think it does a good job of showing how many things can be involved, but it's based on my somewhat limited Hiera knowledge.

You can probably aggregate on interpolations to build a minimal table, but I'll defer to others what they prefer to see. Do you want to see it parameter centric? You can show default with a table that lists possible overrides and which file has them. Each file can be a link to a Hiera section which lists each file and the interpolations it found. Currently I discard the name of the level in the hierarchy, but that could also help.

@SimonHoenscheid
Copy link
Author

How do you imagine this should be represented? I'm not sure how this could be done reliably. You have to determine that there is no possibility of fact interpolation to get a default, or say "may be set via hiera" if there is some hiera entry. Consider Optional[String] $myvar and it is in data/os/RedHat/7.yaml, but nowhere else. What's "the default"?

I did not think about details, but as I spoke with @bastelfreak the past week, we figured out, if we would migrate a module completely to Data in Modules, it would lack documentation. I would be fine with just analyzing common.yaml or first match.

@SimonHoenscheid
Copy link
Author

What about displaying all possible values for one puppet parameter / hiera key in the REFERECNE.md?

I've been thinking about that for a while. So one theory I have is that you can convert interpolation into a regex. In https://gist.github.com/ekohl/6b136d1dab4313edd5bd8eab5a7aacc0 I took a stab at this.

The general idea is that hiera.yaml can contain hierarchies and every level can contain one or more paths. In he first section the files are shown and which interpolations we've found. Since these are regexes, I don't know how reliable they are but it looks fairly OK.

Then it builds a list of overrides, showing which variables map to which value in each file. This is the short overview.

The long version combines both into a single overview. I think it does a good job of showing how many things can be involved, but it's based on my somewhat limited Hiera knowledge.

You can probably aggregate on interpolations to build a minimal table, but I'll defer to others what they prefer to see. Do you want to see it parameter centric? You can show default with a table that lists possible overrides and which file has them. Each file can be a link to a Hiera section which lists each file and the interpolations it found. Currently I discard the name of the level in the hierarchy, but that could also help.

I like this approach, let me think about this for a while. Ideally this would also reflect merge behavior (if any)

@bastelfreak
Copy link
Collaborator

I like the idea from @ekohl . Maybe someone from Puppet (@scotje ? :)) can look into this.

@SimonHoenscheid
Copy link
Author

I like the idea from @ekohl . Maybe someone from Puppet (@scotje ? :)) can look into this.

I think it's a good way to go

@bastelfreak
Copy link
Collaborator

hello maintainers, could one of you look into this and provide some feedback? The issue is open since a few months now (and probably discussed at various places for two years).

@bastelfreak
Copy link
Collaborator

hey people. Is it possible to get a statement here? @b4ldr made some comments about this in voxpupuli/puppet-unbound#265. @binford2k are you able to take a look here?

@binford2k
Copy link
Contributor

I'm going to close this as won't-do for now. The UX of exploding defaults from in-module hiera to populate defaults is super long lists, confusing UX, and inconsistent presentation. Here's a mockup of the really neat POC data that @ekohl generated as it might be rendered for that class. https://gist.github.com/binford2k/c4e9277f040bde1634086ac43c3bc2a9

First off, the list becomes much longer. It's 71% longer. Not quite as noticeable with 7 parameters, but imagine that on puppetlabs-apache which has 86 parameters.

Second, there are many inconsistencies within the list. For example, on FreeBSD desktop_package_name is set to open-vm-tools-nox11 and on every other platform it's.... nothing? That seems odd, so I have to look at the code and there I see that all other platforms default to open-vm-tools-desktop. But what are those other platforms? As a user, I can extrapolate relatively easily, but for the computer to do it programatically, it would have to iterate all the platforms listed in metadata.json (35 of them)

Note; that could be somewhat alleviated by listing a "default default" and then iterate the auto-discovered default value overrides. That's shown in override.md. But that seems more confusing than just looking at the data directory, since that will be more obvious which yaml file applies to your usage, and then from there you can see specifically which values have overridden default values.

Third, and related to the note ☝️, every parameter listed has a different list of the discovered overrides. That means that as a reader, I already have to mental post-processing of what I'm reading to figure out what it's actually saying. To me, that mental energy could be better spent with a skim through the data directory to see what overrides apply to the platforms you care about.

Finally, this is a pretty constrained example. There are a lot of ways that it could go wrong in a misleading way. For example, if the hierarchy had a typo in the OS Major Version layer, then all of those auto-discovered values would be misleading and wouldn't actually calculate out to the values listed when run on a real system. To be sure, that problem exists currently, but right now we aren't trying to tell the reader that we know the default because we've calculated it. To alleviate that, we'd need to print the original layer that resolved for each override too (as in the long overrides example), and that makes the information presented even worse! An example of that is in typo.md and it's 144 lines long all on it's own!

That said I am not opposed to someone taking this POC and turning it into a full PR. It would need to address the UX concerns here, especially the last one where data source paths that don't resolve properly lead to displaying misleading values in strings output.

@ekohl
Copy link
Contributor

ekohl commented Apr 22, 2021

First off, the list becomes much longer. It's 71% longer. Not quite as noticeable with 7 parameters, but imagine that on puppetlabs-apache which has 86 parameters.

I agree with this but note that you can have expandable blocks in HTML. This means you can build this:

desktop_package_name

Name of the desktop package. Only set this if your platform is not supported or you know what you are doing.

Puppet default: 'open-vm-tools-desktop'

Hiera overrides
Filename Value
data/Debian.Debian.7.yaml true
data/Debian.Debian.8.yaml true
data/Debian.Debian.9.yaml true
data/Debian.Ubuntu.14.yaml true
data/Debian.Ubuntu.16.yaml true
data/Debian.Ubuntu.18.yaml true
data/FreeBSD.FreeBSD.10.yaml true
data/FreeBSD.FreeBSD.11.yaml true
data/FreeBSD.FreeBSD.12.yaml true
data/RedHat.CentOS.6.yaml true
data/RedHat.CentOS.7.yaml true
data/RedHat.CentOS.8.yaml true
data/RedHat.Fedora.19.yaml true
data/RedHat.Fedora.20.yaml true
data/RedHat.Fedora.21.yaml true
data/RedHat.Fedora.22.yaml true
data/RedHat.Fedora.23.yaml true
data/RedHat.Fedora.24.yaml true
data/RedHat.Fedora.25.yaml true
data/RedHat.OracleLinux.6.yaml true
data/RedHat.OracleLinux.7.yaml true
data/RedHat.OracleLinux.8.yaml true
data/RedHat.RedHat.6.yaml true
data/RedHat.RedHat.7.yaml true
data/RedHat.RedHat.8.yaml true
data/Suse.OpenSUSE.11.yaml true
data/Suse.OpenSUSE.12.yaml true
data/Suse.OpenSUSE.13.yaml true
data/Suse.OpenSUSE.15.yaml true
data/Suse.OpenSUSE.42.yaml true
data/Suse.SLES.12.yaml true
data/Suse.SLES.13.yaml true
data/Suse.SLES.14.yaml true
data/Suse.SLES.15.yaml true

(Note that I cheated a bit and got 2 non-matching ones just to show what can be done. Long lists can be hidden this way).

Second, there are many inconsistencies within the list. For example, on FreeBSD desktop_package_name is set to open-vm-tools-nox11 and on every other platform it's.... nothing? That seems odd, so I have to look at the code and there I see that all other platforms default to open-vm-tools-desktop. But what are those other platforms? As a user, I can extrapolate relatively easily, but for the computer to do it programatically, it would have to iterate all the platforms listed in metadata.json (35 of them)

Note that my code did this the other way around. It converts the Hiera hierarchy to regular expressions and then matches that to filenames. It can then translate that back to interpolation. No need for metadata.json. This may break down for very complex hierarchies but really, how complicated is it for most modules?

Third, and related to the note point_up, every parameter listed has a different list of the discovered overrides. That means that as a reader, I already have to mental post-processing of what I'm reading to figure out what it's actually saying. To me, that mental energy could be better spent with a skim through the data directory to see what overrides apply to the platforms you care about.

Just to show, I added an additional override for Debian 10:

openvmtools::desktop_package_name
  data/Debian.Debian.10.yaml => open-vm-tools2
    facts.os.family => Debian
    facts.os.name => Debian
    facts.os.release.major => 10
  data/Debian.yaml => open-vm-toolbox
    facts.os.family => Debian
  data/FreeBSD.yaml => open-vm-tools
    facts.os.family => FreeBSD

This means you can render it like this:

openvmtools::desktop_package_name

Hiera overrides in a detailed table
Filename Interpolations Value
data/Debian.Debian.10.yaml facts.os.family => Debian
facts.os.name => Debian
facts.os.release.major => 10
open-vm-tools2
data/Debian.yaml facts.os.family => Debian open-vm-toolbox
data/FreeBSD.yaml facts.os.family => FreeBSD open-vm-tools

Note that in the order it should respect the Hiera order.

So I certainly understand there are some UX concerns. It all depends on how clean you write your Hierarchy. For example, in openvmtools you could actually write some code like this (using load_module_metadata):

$metadata = load_module_metadata($module_name)
$supported = $metadata['operatingsystem_support'].any |$os| {
  $os['operatingsystem'] == $facts['operatingsystem'] and $facts['operatingsystemmajrelease'] in $os['operatingsystemrelease']
}

In fact, this would make a good function in itself. I've got distracted and submitted voxpupuli/puppet-openvmtools#41. Perhaps the function deserves to be in stdlib, but gives a quicker turnaround time.

So in short: I do believe it is a challenge, but it's not too complicated. The forge probably needs to verify if it can indeed render details in a reference.

@ekohl ekohl linked a pull request Apr 22, 2021 that will close this issue
@ekohl
Copy link
Contributor

ekohl commented Apr 22, 2021

I couldn't help myself and opened #273 which implements this.

@b4ldr
Copy link
Contributor

b4ldr commented Apr 24, 2021

seems to me we have three high level states which puppet-strings could report.

  1. the defaults that are present in the manifest
  2. the value one would get from hiera data with facts = {}
  3. the default values for each supported OS

Currently puppet-strings does 1 the issue with this is it guides people to place defaults in the manifest which IMO (and one that seems to be shared by puppetlabs) is not the best place to put defaults. For me the main reason for this is that it reduces the full power of hiera and prevents one from doing things like:

common.yaml
unbound::service_name: unbound
unbound::restart_cmd: "systemctl restart %{lookup('unbound::service_name')}"
OpenBSD.yaml (i.e. we dont need to redefine service_name)
unbound::restart_cmd: "service restart %{lookup('unbound::service_name')}"

I think options 3 is the ideal state however as has been pointed out it comes with some UX issue (although ekohl work looks promising, assuming forge etc can support this). further i think there may be some edge case which will likely tickle a few bugs

however im not sure option 2 has been given much thought other then early on. This could be as simple as parsing the common.yaml file or some merge of hiera files which are not based on facts (this should be easy to ascertain from the hiera.yaml file). however i also wondered if we could just use the equivalent of loookup('openvmtools::desktop_package_name', {'default_value' => $DEFAULT_FROM_MANIFEST).

Assuming someone knows a simple way to call lookup from ruby* i think this looks like it would be a fairly easy thing to add to the current code and has the additional benefit that it:

  • encourages people to store puppet default in hiera
  • allows people to use the full force of heira
  • Lays the ground work to present per os defaults

The down side, as pointed out is that it doesn't necessarily present the true default for any one os and you still need to check the yaml files. However this is still no worse the the current situation (famous last words). As such i feel that we should not let the enemy be the good of perfect here (of course if ekohl PR gets merged this is all mute)

*ruby hiera lookup

It use to be fairly straight forward to hack around hiera lookups in ruby however im hitting a brick wall trying to do this using the hiera 5 puppet API's. below is what i have so far which dosn't work. may be going in the complete wrong direction. however i guess puppet-rspec hooks into the lookup function so i guess it should be achievable. anyway thought i would leave this here in case it sparks any more thoughts or feedback.

require 'puppet'
Puppet.initialize_settings
environment = Puppet::Node::Environment.create(:puppetstrings, ['modules'])
options = {classes: ['unbound'], facts: {}, environment: environment}
node = Puppet::Node.new('foo.example.org', options)
compiler = Puppet::Parser::Compiler.new(node)
hiera = Puppet::Pops::Lookup::ModuleDataProvider.new('unbound')
scope = Puppet::Parser::Scope.new(compiler)
invocation =  Puppet::Pops::Lookup::ScopeLookupCollectingInvocation.new(scope)
config = Puppet::Pops::Lookup::HieraConfig.create(invocation, Pathname.new('hiera.yaml'), hiera)
hiera.key_lookup_in_default('unbund::package_name', invocation, config)

And of course thanks for all the effort :)

@binford2k
Copy link
Contributor

Yes, the Forge will display <details> tags. And if not, we'll fix it so it will. And all modern browsers will render it too.

The approach that @ekohl takes with #273 basically "reverse engineers" facts by correlating datasource paths and the interpolations from hiera.yaml. I think this approach is workable and we'll likely 🤞 accept the patch, though without reviewing it I can't promise that it won't need any work to get past the gate.

I'll add it to the schedule for our next Community Monday and get some eyeballs on it.

@binford2k binford2k reopened this Apr 27, 2021
@binford2k
Copy link
Contributor

@b4ldr if there's more work to be done, would you be interested in helping?

@b4ldr
Copy link
Contributor

b4ldr commented Apr 28, 2021

@b4ldr if there's more work to be done, would you be interested in helping?

sure i have taken a look at @ekohl branch and sent a commit which make some progress towards also parsing the common.yaml file and overriding the @register[:defaults] ill try to take another look later in the week

lelutin added a commit to lelutin/puppet-fail2ban that referenced this issue Jan 8, 2022
When I asked around about how folks were using hiera for default
parameters, I was pointed to this discussion which is still unresolved:

puppetlabs/puppet-strings#250

I also think that moving all default values to hiera makes it more
complicated to figure out what's happening in most cases since it makes
users of the module always need to open up at least two files to figure
out what parameters and their default values are. It's way easier if the
default values are right there in the code. Only the overrides should be
in hiera.

The downside of this approach is that knowing what gets overridden is
rather hidden. Ppl need to remember to check within data/ if there's any
relevant overrides for the platforms and releases that are relevant to
them.

In our case, the only param that's currently getting overridden is the
jail.conf template path for RedHat. Otherwise all the rest in hiera is
about values for creating default jails.

The other argument that I think makes me think that having values in the
class definition is better is that for defined types we can't use hiera
at all, so all default values must be in the code. So in that sense
pushing all default values for classes only creates an inconsistency
and it makes reading and comprehending the code harder. Things are just
easier if all code follows the same pattern/style.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants