Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #33446: Allow setting the time until a worker is considered lost #227

Closed
wants to merge 1 commit into from

Conversation

ehelms
Copy link
Member

@ehelms ehelms commented Sep 9, 2021

This change implements the WORKER_TTL setting which lets Pulp decide
how long to wait for a worker to respond before considering it gone
and cleaning up it's database and re-assiging work.

@@ -132,6 +132,9 @@
# available, likely results in performance degradation due to I/O blocking and is not recommended in most cases. Modifying this parameter should
# be done incrementally with benchmarking at each step to determine an optimal value for your deployment.
#
# @param worker_ttl
# The number of seconds before a worker should be considered lost.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify here that this applies to the pulpcore workers for the tasking system, not the content or API gunicorn workers? Since that is likely to be confusing to many/most users.

It would be great to offer some guidance as well regarding tuning this parameter and in what circumstances might that need arise (during large content migrations? other workloads as well?). Can this be disabled by setting it to 0? What are the advantages and drawbacks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify here that this applies to the pulpcore workers for the tasking system, not the content or API gunicorn workers? Since that is likely to be confusing to many/most users.

I can add a bit more info. This text was taken directly from Pulp.

It would be great to offer some guidance as well regarding tuning this parameter and in what circumstances might that need arise (during large content migrations? other workloads as well?). Can this be disabled by setting it to 0? What are the advantages and drawbacks?

It would be great to have that, but since we don't know that I would opt not to include anything related to it for now.

Copy link
Collaborator

@wbclark wbclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec/acceptance/basic_spec.rb contains a way to test Django settings directly:

  describe command("PULP_SETTINGS=/etc/pulp/settings.py pulpcore-manager diffsettings") do
    its(:stdout) { is_expected.to match(/^USE_NEW_WORKER_TYPE = True/) }
    its(:exit_status) { is_expected.to eq 0 }
  end

Note that while this is diffsettings, that difference is computed from the defaults for Django, not for Pulpcore specifically. So even Pulpcore defaults may be tested here as long as they are not Django defaults.

With this, even in the case where the value is undefined here you can test the running configuration should match WORKER_TTL = 30. (Although I recommended setting an explicit default, I still think this is neat and worth drawing attention to)

It would be nice, and quite simple, to have the defaults as well as overrides tested in both spec/classes and spec/acceptance

This change implements the WORKER_TTL setting which lets Pulp decide
how long to wait for a worker to respond before considering it gone
and cleaning up it's database and re-assiging work.
Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the logic of only setting values if we explicitly want to override the default. @wbclark?

@@ -79,6 +79,11 @@ class { 'pulpcore':
its(:exit_status) { is_expected.to eq 0 }
end

describe command("PULP_SETTINGS=/etc/pulp/settings.py pulpcore-manager diffsettings") do
its(:stdout) { is_expected.to match(/^WORKER_TTL = 30/) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this in the diffsettings command above? That makes the execution faster since it doesn't need to call diffsettings multiple times.

Copy link
Collaborator

@wbclark wbclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the logic of only setting values if we explicitly want to override the default. @wbclark?

My biggest concern with it is discoverability, so I'd like to see at least documentation like If this is undefined, Pulp will use a default value of 30 seconds

That has the same concern as explicitly setting the same default -- if the default changes on Pulp's side then how would we know?

In fact, since we have the diffsettings test now, that test would inform us if Pulp's default changed, and then we could update both the test and the parameter documentation with the new default from Pulp.

So thinking about it more, I think leaving it optional and undefined by default is ideal, with documentation explaining the behavior.

@ekohl
Copy link
Member

ekohl commented Sep 14, 2021

My biggest concern with it is discoverability, so I'd like to see at least documentation like If this is undefined, Pulp will use a default value of 30 seconds

I think we should not mention 30 seconds. That happens to be Pulp's default today, but what if 3.16 moves it 60? We can update the docs, but if we want to support both 3.15 and 3.16 then it becomes complex. I'd be fine with something like Undefined means Pulp's built in default is used.

That has the same concern as explicitly setting the same default -- if the default changes on Pulp's side then how would we know?

Why would we care? Let's trust Pulp to make reasonable choices.

@wbclark
Copy link
Collaborator

wbclark commented Sep 15, 2021

I think we should not mention 30 seconds. That happens to be Pulp's default today, but what if 3.16 moves it 60? We can update the docs, but if we want to support both 3.15 and 3.16 then it becomes complex. I'd be fine with something like Undefined means Pulp's built in default is used.

My concern comes from past frustrations as an installer user, before I was an installer developer, reading options and having no idea what they do and what is the default behavior. I think you should be able to look at foreman-installer --help and get a reasonable idea for what each option does, what is the default, and what are the likely consequences of using non-default configuration, without referring to Pulp documentation (which many users would not think to do, would not know where to locate it, etc)

If Pulp changes the default to 60 and we support for a time Pulp releases with different defaults, the diffsettings test should catch that once the release with the change becomes available in repositories. Then we still have a lot of options, such as Uses Pulp's default of 30 seconds when undefined --> Uses Pulp's default when undefined: 60 seconds for Pulp v3.16 and later, or 30 seconds for prior releases.

As an example of where it would be useful, let's say I am an installer user and I have seen some worker timeouts. I want to try tuning this setting to see if it alleviates the problem. I find the setting in foreman-installer --help output and see that it is actually undefined, and uses Pulp's default in that case, but there is no information about what Pulp's default is. Already I might be frustrated that point, but I could get over it and try to locate the Pulp documentation and learn how it is laid out and finally locate the relevant bit which contains the default.

What if, as in your example, Pulp changed its default? I now have to check what version of Pulp my system has and make sure that I am looking at the correct version of the Pulp documentation. Of course I /should/ do that, but it would be easy to overlook. I might change this value to 45 seconds, thinking that is an increase over the default 30, when in case it was a decrease over the newer default of 60. Now I might even be more confused and frustrated as the timeouts became much more frequent.

I think the effort on our end would be small, compared to removing a lot of uncertainty for users and reducing the effort required for them to find clear and helpful information.

Why would we care? Let's trust Pulp to make reasonable choices.

I don't think it has anything to do with a lack of trust in Pulp. It is more about making understanding the configuration transparent and accessible for installer users.

@ekohl
Copy link
Member

ekohl commented Sep 17, 2021

To be fair, this module doesn't show up at all since it's an internal module. Based on my experience I think in this case it's good to have a preference for a very minimal config. IMHO this is not a setting that many users will change and really classify as an advanced feature. If you're playing around with that, I think it's reasonable to expect users to know Pulp already.

@ekohl
Copy link
Member

ekohl commented Sep 27, 2021

@jlsherrill do you know how often this would be changed? To me it looks like pulp_installer doesn't expose it (https://docs.pulpproject.org/pulpcore/configuration/settings.html). If a decent number of users would change this, we would be more open that this is an option but if it's just a handful it would be better to keep it slightly hidden. That has some implementation implications.

@evgeni
Copy link
Member

evgeni commented Sep 29, 2021

I know that we had to tune the similar setting for Pulp2 quite often on "less than optimal but still relevant" setups, but I think your question is "hiera vs installer option", and I'd answer Hiera.

@wbclark
Copy link
Collaborator

wbclark commented Sep 29, 2021

I know that we had to tune the similar setting for Pulp2 quite often on "less than optimal but still relevant" setups, but I think your question is "hiera vs installer option", and I'd answer Hiera.

Although Pulp2 and Pulp3 are different technologies and we should not necessarily expect the same amount of tuning to be required for that reason, the diversity of deployments and workloads is essentially the same so I think it would be unsurprising if tuning is frequently required. After all, we expect it to be required in at least some cases, which is the reason for this PR and Redmine #33446 in the first place.

That being said, I have a strong belief that installer option is greatly preferable to hiera, for the reason that an installer option does not require any puppet knowledge (in my downstream experience, a majority of users do not possess this, even many senior level engineers for whom this is their primary technical area that they work with on a daily basis).

Also, the installer option is not difficult to implement or maintain and requires exposing the parameter in one more module, which we already do for similar parameters, and already did for similar parameters for Pulp2.

Finally if the concern is reducing the sprawl of foreman-installer --help output, it can be marked as an advanced parameter where it can still easily be found by users who have no puppet knowledge at all (which I cannot emphasize enough, is a strong majority in my experience) with foreman-installer --full-help | grep ...

To be honest, I am surprised to see so many people voicing a preference for hiera in this case, as I have not seen articulated any particular drawbacks to the installer option while the benefits are very significant from my perspective. (Hence my attempt to speculate on what those might be, in the preceding paragraph... please do point out if there is something else I am missing).

@wbclark
Copy link
Collaborator

wbclark commented Sep 29, 2021

@jlsherrill do you know how often this would be changed? To me it looks like pulp_installer doesn't expose it (https://docs.pulpproject.org/pulpcore/configuration/settings.html). If a decent number of users would change this, we would be more open that this is an option but if it's just a handful it would be better to keep it slightly hidden. That has some implementation implications.

IIRC this option was added to Pulp to begin with, due to Katello needs. So based on that alone, I wouldn't expect adding it to their installer to be a priority, as their installer is not used by Katello. Also IIRC, Pulp has implemented two very different versions of this feature, in order to also deliver a hardcoded version which didn't introduce any new setting at all to an older Pulp3 release that was still used by Katello at the time. So perhaps that development history may have something to do with why their installer does not expose it? I wouldn't assume that is intentional (and I would guess that pulp_installer would accept such a PR if anybody went out of their way to create it) and even if it does turn out to be deliberate I don't see a particular reason that should matter here, given that this module doesn't aim for 1-to-1 correspondence with pulp_installer.

@wbclark
Copy link
Collaborator

wbclark commented Sep 29, 2021

Also, I did promise @jlsherrill that I would pick this one up as @ehelms is currently on leave. I will do that shortly.

That version will keep the default as undef which it is here (not my original vision, but after some discussion I was convinced of this), will include any possible improvements I can find to the parameter documentation (I agree there are challenges based on our own lack of expertise in the area), and will move the diffsettings test into the existing diffsettings test.

@wbclark
Copy link
Collaborator

wbclark commented Sep 30, 2021

Ultimately, this is what I would be very happy with as a starting point: #230

I previously opened #229 as well, but I've marked that as a draft because it also contains quite a different style for the tests while delivering the feature using the current style should be a higher priority, hence the follow up linked above.

@evgeni
Copy link
Member

evgeni commented Oct 2, 2021

I know that we had to tune the similar setting for Pulp2 quite often on "less than optimal but still relevant" setups, but I think your question is "hiera vs installer option", and I'd answer Hiera.

Although Pulp2 and Pulp3 are different technologies and we should not necessarily expect the same amount of tuning to be required for that reason, the diversity of deployments and workloads is essentially the same so I think it would be unsurprising if tuning is frequently required. After all, we expect it to be required in at least some cases, which is the reason for this PR and Redmine #33446 in the first place.

They are different, sure, but the logic "worker sends heartbeats, is considered dead when not in X seconds" is the same, so given similar setups, I wouldn't expect drastically different behavior.

But it's also probably not that important right now

That being said, I have a strong belief that installer option is greatly preferable to hiera, for the reason that an installer option does not require any puppet knowledge (in my downstream experience, a majority of users do not possess this, even many senior level engineers for whom this is their primary technical area that they work with on a daily basis).

Also, the installer option is not difficult to implement or maintain and requires exposing the parameter in one more module, which we already do for similar parameters, and already did for similar parameters for Pulp2.

True, and I kinda see the argument for --full-help in the next paragraph… That all said, this PR won't (and can't) expose it in the installer, as we don't let the pulpcore module be directly managed. So we can safely get this (or rather #230) in and then discuss the exposure in a follow up, while the users will be able to use Hiera after this PR ;)

Finally if the concern is reducing the sprawl of foreman-installer --help output, it can be marked as an advanced parameter where it can still easily be found by users who have no puppet knowledge at all (which I cannot emphasize enough, is a strong majority in my experience) with foreman-installer --full-help | grep ...

To be honest, I am surprised to see so many people voicing a preference for hiera in this case, as I have not seen articulated any particular drawbacks to the installer option while the benefits are very significant from my perspective. (Hence my attempt to speculate on what those might be, in the preceding paragraph... please do point out if there is something else I am missing).

I think, my main pain point with it, is that any installer --(full-)help | grep is also undiscoverable 🤷‍♀️

@ehelms ehelms closed this Oct 11, 2021
@ehelms
Copy link
Member Author

ehelms commented Oct 11, 2021

Merged as of 69bdebc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants