Configuration reloading logic and customizability #5966

Open · swiatekm opened this issue Aug 25, 2022 · 19 comments

@swiatekm (Contributor) commented Aug 25, 2022

Is your feature request related to a problem? Please describe.
Right now, the collector reloads configuration whenever it gets a notification from any of its configuration sources. In practice, this never happens, as watching is not implemented in any of the existing config providers, but it will be the moment that support is added.

I don't think this is behaviour users would universally want. In many contexts, you only want to reload the configuration on demand. Kubernetes is one example of this, where mounted ConfigMaps can be remounted if they change, but it's not desirable behaviour for an application to reload the config without the Pod having been recreated. Running as a systemd unit is another example.

I'd like to figure out the best way of letting the user control how the collector reloads configuration, and to explore different approaches - for example, many applications reload on receipt of a SIGHUP signal.

Describe the solution you'd like
I don't know what the right set of solutions is, but I think it should allow:

  • the user to decide if they want automatic configuration reload based on provider notification, maybe per provider
  • the user to be able to reload configuration on demand without restarting the collector process

In particular, I'm not sure what the default should be.

Describe alternatives you've considered
Some possible solutions:

  • Let automatic reloading be controlled by command line options
    for example:
    otelcol --config config.yaml --enable-hot-reload
  • Let automatic reloading be controlled by a section in the configuration
    service:
      config:
        hotReload: true
  • unconditionally reload configuration on receipt of a SIGHUP signal

I'm open to other ideas.

Additional context
This originally came up because I wanted to add notifications to the fileprovider in #5945. @TylerHelmuth suggested it should start off behind a feature flag, and I reconsidered whether it should even ultimately be enabled for all users. I looked at how this is done in some other agents, and this issue is the result.

@bogdandrutu (Member)

> Kubernetes is one example of this, where mounted ConfigMaps can be remounted if they change, but it's not desirable behaviour for an application to reload the config without the Pod having been recreated. Running as a systemd unit is another example.

I am confused about this; I don't think the remount happens for existing Pods. Imagine an app crash: wouldn't the restart pick up the new version anyway? Can you investigate a bit more?

> (maybe?) for the collector to keep using older working configuration if the new one is invalid
> keep the last known working configuration and fall back to it if the new one is invalid

I think that is almost impossible in the case of a restart, for example, and it also becomes impossible to know which version of the config is in use, etc. For use cases like this we should create a provider backed by a proper "configuration" system that deals with all these complications.

> unconditionally reload configuration on receipt of a SIGHUP signal

Where is this coming from? SIGHUP seems to mean other things, but I don't have that much experience with this signal.

> the user to be able to reload configuration on demand without restarting the collector process

What does it mean? Is this what we have right now?

Can you please rework/clarify the goals section? Then we can clarify the options, because right now it is not clear that I can pick one entry from the "possible solutions" section and it will solve the problem; it feels more or less like a random list of possible things to do and combine that may get us to the "solution".

@swiatekm (Contributor, Author)

> I am confused about this; I don't think the remount happens for existing Pods. Imagine an app crash: wouldn't the restart pick up the new version anyway? Can you investigate a bit more?

I was also surprised, but it absolutely does happen: https://kubernetes.io/docs/concepts/configuration/configmap/#mounted-configmaps-are-updated-automatically. To make things even more confusing, whether this happens also depends on the mount type. All in all, it's kind of a mess.

Either way, this was just an example of a situation where the user may explicitly not want to reload their config file when it changes.

> Where is this coming from? SIGHUP seems to mean other things, but I don't have that much experience with this signal.

It's conventionally used for configuration reloading by daemons, which aren't attached to a terminal and therefore have no other use for it. Some examples from similar projects:

This is actually something I think we could easily and safely add. We already catch signals in the main collector loop and configuration reloading is already implemented.
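For illustration, a minimal sketch of what that could look like, assuming a hypothetical reloadConfig() hook standing in for the collector's existing reload path:

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

// reloadConfig is a hypothetical stand-in for the collector's existing
// reload path (re-resolve the config, restart the service pipeline).
func reloadConfig() error {
	log.Println("re-resolving configuration")
	return nil
}

func main() {
	sigCh := make(chan os.Signal, 1)
	// Deliver SIGHUP to the channel instead of the default action
	// (which would terminate the process).
	signal.Notify(sigCh, syscall.SIGHUP)

	for sig := range sigCh {
		log.Printf("received %v, reloading configuration", sig)
		if err := reloadConfig(); err != nil {
			log.Printf("reload failed, keeping previous config: %v", err)
		}
	}
}
```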

@swiatekm (Contributor, Author)

> What does it mean? Is this what we have right now?

What we have right now is automatic reload based on notifications from the config provider. I want to let users have more control over this. Right now, there's not even a way to opt-out of it.
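
For concreteness, a rough sketch of the notification side of the confmap API (details may have drifted since this was written): a provider's Retrieve receives a watcher callback, and calling that callback is what triggers the automatic reload. The fsnotify part is left as a comment because no built-in provider implements it yet.

```go
package fileprovider

import (
	"context"
	"os"
	"strings"

	"go.opentelemetry.io/collector/confmap"
)

type provider struct{}

func (*provider) Retrieve(_ context.Context, uri string, watcher confmap.WatcherFunc) (*confmap.Retrieved, error) {
	path := strings.TrimPrefix(uri, "file:")
	content, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	// A watching fileprovider would start something like an fsnotify
	// goroutine here and, on change, call: watcher(&confmap.ChangeEvent{})
	// Today no built-in provider does this, which is what this issue is about.
	_ = watcher
	return confmap.NewRetrievedFromYAML(content)
}

func (*provider) Scheme() string                 { return "file" }
func (*provider) Shutdown(context.Context) error { return nil }
```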

> Can you please rework/clarify the goals section? Then we can clarify the options, because right now it is not clear that I can pick one entry from the "possible solutions" section and it will solve the problem; it feels more or less like a random list of possible things to do and combine that may get us to the "solution".

Sorry about that; this is not a very streamlined issue, and I'm not sure what the right set of solutions is. I want to give users more control over configuration reloading, and I'd like to see what options are on the table for that.

@portertech (Contributor)

@swiatekm-sumo are you thinking something like this? sensu@995cba5

@bogdandrutu (Member)

@portertech I would prefer not to jump ahead and add more "legacy" things to the collector. We should define the "solution" first, then add more functionality.

@bogdandrutu (Member)

@swiatekm-sumo I thought some more about this; I think we should do the following:

  1. Add a setting to our "service.CollectorSettings" for the Collector instance to watch or not watch for config provider updates. Maybe disabled by default, and allow a flag to set this, since it cannot be part of the config itself (see the sketch after this list).
  2. Not sure about SIGHUP, but it seems to be a standard? Can some more folks confirm that? Also, if that is the case, I still believe we want to have this configured in "service.CollectorSettings".
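
To make option 1 concrete, a sketch of the shape this could take; the field name WatchConfig and the flag wiring are assumptions for illustration, not the actual collector API:

```go
package settings

import "flag"

// Hypothetical addition to service.CollectorSettings; the existing
// fields are elided and the field name is an assumption.
type CollectorSettings struct {
	// WatchConfig controls whether the Collector acts on change
	// notifications from its config providers. Proposed default: false.
	WatchConfig bool
}

// A distribution could wire the setting to a CLI flag, since a knob that
// governs config reloading cannot itself live in the reloadable config.
func settingsFromFlags(fs *flag.FlagSet, args []string) (CollectorSettings, error) {
	watch := fs.Bool("config-watch", false, "reload when a config provider reports a change")
	if err := fs.Parse(args); err != nil {
		return CollectorSettings{}, err
	}
	return CollectorSettings{WatchConfig: *watch}, nil
}
```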

@codeboten (Contributor)

> 1. Add a setting to our "service.CollectorSettings" for the Collector instance to watch or not watch for config provider updates. Maybe disabled by default, and allow a flag to set this, since it cannot be part of the config itself.
>
> 2. Not sure about SIGHUP, but it seems to be a standard? Can some more folks confirm that? Also, if that is the case, I still believe we want to have this configured in "service.CollectorSettings".

+1 to @bogdandrutu's suggestions. And from the examples stated above, I do believe using SIGHUP is appropriate here. See the following excerpt from the Wikipedia entry https://en.wikipedia.org/wiki/Signal_(IPC)#SIGHUP:

> Many daemons (who have no controlling terminal) interpret receipt of this signal as a request to reload their configuration files and flush/reopen their logfiles instead of exiting. nohup is a command to make a command ignore the signal.

@swiatekm (Contributor, Author)

> Maybe disabled by default, and allow a flag to set this, since it cannot be part of the config itself.

Technically it can, as we read the config normally before starting the main collector loop, unless I'm misunderstanding something. A command-line switch is probably the better choice in general, though.

I can try to add this if we have agreement on the mechanism.

> Not sure about SIGHUP, but it seems to be a standard? Can some more folks confirm that? Also, if that is the case, I still believe we want to have this configured in "service.CollectorSettings".

Do we need this to be configurable? I can't think of any circumstance where you'd need to disable it. There's no reason to send this signal to the collector other than this functionality.

@bogdandrutu (Member)

> Do we need this to be configurable? I can't think of any circumstance where you'd need to disable it. There's no reason to send this signal to the collector other than this functionality.

I think it is fine as is; if it causes trouble we can add a flag/config option to disable it.

@swiatekm (Contributor, Author)

Cool, in that case we have an implementation ready in #6000. CC @portertech

@bogdandrutu (Member)

@swiatekm-sumo I think that is just the initial part. There is more to clarify, e.g. what is the behavior/interaction with watchers?

  1. Do we reload if no watcher has triggered before? Assuming all the configs have proper watching implemented and we know nothing changed, do we still load and check against the previous config?
  2. Do we reload even if the new loaded config == the old loaded config?
  3. If watching is enabled, do we ignore SIGHUP?
  4. If watching is enabled and a reload is in progress, do we ignore the SIGHUP?
  5. If a reload is in progress because of SIGHUP, do we ignore a new request?

@swiatekm (Contributor, Author)

> @swiatekm-sumo I think that is just the initial part. There is more to clarify, e.g. what is the behavior/interaction with watchers?
>
> 1. Do we reload if no watcher has triggered before? Assuming all the configs have proper watching implemented and we know nothing changed, do we still load and check against the previous config?
> 2. Do we reload even if the new loaded config == the old loaded config?
> 3. If watching is enabled, do we ignore SIGHUP?
> 4. If watching is enabled and a reload is in progress, do we ignore the SIGHUP?
> 5. If a reload is in progress because of SIGHUP, do we ignore a new request?

I'd just go with the naive solution for now - we reload on every reload event (be it from a watcher or a signal). If we get an event during a reload, we finish the reload and then reload again. And we don't care if the config actually changed or not.

We could try to be clever about it and implement some kind of throttling, but that seems like premature optimization to me. Ultimately all of these events should be controlled by the user, and it should be on them not to overload the collector with signals or configuration changes. At the very least I'd wait until someone complains before thinking about mitigations for hypothetical problems.

The other benefit is that the naive implementation (as seen in #6000) is both simple and easy to understand.
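
For reference, the naive behavior described above fits in a few lines; reload() here is a stand-in for the collector's actual reload, and the channel buffering is what makes "finish the current reload, then reload again" fall out naturally:

```go
package reloader

import "log"

// runReloadLoop reloads once per event, in order. An event that arrives
// while a reload is running sits in the channel buffer and is handled on
// the next iteration, so the in-progress reload always finishes first.
// No deduplication: we don't care whether the config actually changed.
func runReloadLoop(events <-chan struct{}, reload func() error) {
	for range events {
		if err := reload(); err != nil {
			// Keep running with the previous config on failure.
			log.Printf("config reload failed: %v", err)
		}
	}
}
```

Both the SIGHUP handler and any provider watcher would just send into events.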

@swiatekm (Contributor, Author) commented Mar 15, 2023

Summary of recent sig discussion re: config reloading:

  • We don't really have a good idea of how we want this to work
  • Because of the above, it should be disabled by default
  • We should probably let users who want to enable it do so, but it's not clear whether that should be a single flag, as in "Add a flag for enabling config watch" #6300, or a more fine-grained mechanism
  • It would be good to get a sense for how much demand there is from users for hot reloading
  • We should make sure that the current reloading mechanism actually works correctly; @TylerHelmuth claimed it doesn't reload metrics, for one
  • We need a comprehensive proposal for how this should work before enabling it for all users again
  • There should be some mechanism (possibly separate) for components to do this, mostly for remote management. You can currently do it with SIGHUP, but that's kind of hacky.

@tigrannajaryan I was told you had some thoughts on this as well. @dmitryax you mentioned you wanted to make hot-reloading configurable per config source - how would that work in practice?

@TylerHelmuth (Member)

I was referring to this issue, but it is fixed now.

@MaciejGrynda

Hello,
The last update on this was over a year ago; did anything change? SIGHUP is still supported, but it would be great to have OS-independent functionality - something like exposing a REST API to force a reload, or defining a file to watch for modification timestamp changes (see the sketch below).
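
A sketch of what the REST variant could look like; the /-/reload path mirrors the convention Prometheus uses, and the port and reload hook here are illustrative, not an existing collector endpoint:

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// Buffered so a request arriving mid-reload queues exactly one more.
	reloadCh := make(chan struct{}, 1)

	go func() {
		for range reloadCh {
			log.Println("reload requested via HTTP")
			// The collector's actual reload hook would be invoked here.
		}
	}()

	http.HandleFunc("/-/reload", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "use POST", http.StatusMethodNotAllowed)
			return
		}
		select {
		case reloadCh <- struct{}{}:
		default: // a reload is already pending; coalesce
		}
		w.WriteHeader(http.StatusAccepted)
	})

	// The port is arbitrary for this sketch.
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```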

@swiatekm (Contributor, Author) commented Nov 3, 2023

@MaciejGrynda I think the consensus was that automatically reloading whenever config files changed wasn't something we wanted to do, but mechanisms where the user explicitly requests the reload (like SIGHUP) were fine. I suggest you open a new issue if you want one of those. I know some other CNCF projects take the REST API approach, so that would be an argument in favor.

@vikrambe (Contributor)

Hello, do we have a plan to support dynamic reload capability soon? I see a lot of interest and dangling threads on this topic. Here are a few questions I have:

  1. Do we want to make confmap/providers pluggable?
  2. I do see a watch event in the providers, but none of the providers actually notify.
  3. Components also need to listen for the event and reload their configs without the collector restarting.

@bogdandrutu @tigrannajaryan any suggestions?

@agardnerIT commented Nov 11, 2024

Reading through this and other threads, I'm not 100% sure of the status here, but one more option is feature flagging, as changing config at runtime is precisely what feature flags are designed to do.

flagd is already part of the OpenTelemetry demo app, so it is well understood (and, as a CNCF project and part of OpenFeature, trusted).

flagd has a Go in-process evaluator, which means the collector could simply read a JSON/YAML file to achieve this.
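
Not the real flagd SDK - just a hand-rolled stand-in to show the shape of the idea: poll a JSON flag file and fire a callback when a (hypothetical) hotReload flag flips on:

```go
package flagwatch

import (
	"encoding/json"
	"log"
	"os"
	"time"
)

// flags mirrors a minimal flag file such as {"hotReload": true};
// the flag name is hypothetical.
type flags struct {
	HotReload bool `json:"hotReload"`
}

// Watch polls path at the given interval and calls onEnable each time
// hotReload transitions from false to true.
func Watch(path string, interval time.Duration, onEnable func()) {
	var last bool
	for range time.Tick(interval) {
		raw, err := os.ReadFile(path)
		if err != nil {
			log.Printf("read %s: %v", path, err)
			continue
		}
		var f flags
		if err := json.Unmarshal(raw, &f); err != nil {
			log.Printf("parse %s: %v", path, err)
			continue
		}
		if f.HotReload && !last {
			onEnable()
		}
		last = f.HotReload
	}
}
```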

If anyone is at Kubecon this week, ping me on the CNCF slack and we can catch up about this.
