Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Antrea should take into account changes to the configmap without requiring agent restart #723

Closed
antoninbas opened this issue May 19, 2020 · 10 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@antoninbas
Copy link
Contributor

Describe the bug
When updating the antrea-agent.conf in the Antrea YAML manifest and re-applying the manifest, all changes are ignored. One possible solution is to update the ConfigMap name (or update build/yamls/base/conf/antrea-agent.conf and run generate-manifest.sh again, which will take care of consistently updating the ConfigMap name); however, this requires an Agent restart.

I believe that we should have the following:

  • for a configuration parameter like enablePrometheusMetrics, we should be able to update the value without having to restart Agents. On a related note, should we have the ability to turn metrics on/off easily on a per-Agent basis (e.g. using antctl)?
  • for configuration parameters that cannot be changed at runtime, we should still detect the change and log an error.

To Reproduce
Apply "vanilla" Antrea manifest, uncomment enablePrometheusMetrics in the configuration and set it to true, then apply the manifest again: metrics will not be enabled.

Versions:
Confirmed with Antrea v0.6.0

@antoninbas antoninbas added bug kind/bug Categorizes issue or PR as related to a bug. labels May 19, 2020
@srikartati
Copy link
Member

@antoninbas Currently Antrea agent loads its config map (antrea-agent.conf) as subpath volume.
From k8s documentation, kubelet will not update subpath volumes.

Note: A container using a ConfigMap as a subPath volume will not receive ConfigMap updates.

Does it make sense to have a separate config map called antrea-agent-updatable.conf and mount it as a volume with no subpath? In Antrea agent, we need to create a poll that keeps on checking this file and various keys for updates and act accordingly.
In this way, we do not have to worry about metrics that cannot be changed as they are in antrea-agent.conf.

In addition, I am trying to understand why antrea-agent.conf is considered as subPath volume. Do you have any insights?

@antoninbas
Copy link
Contributor Author

I think we can stick with a single ConfigMap, but use different volumes, instead of a single volume with subPaths:

...
      volumeMounts:
        - name: antrea-config-agent
          mountPath: /etc/antrea/antrea-agent.conf
...
  volumes:
    - name: antrea-config-agent
      configMap:
        name: antrea-config-agent
        items:
          - key: antrea-agent.conf
            path: antrea-agent.conf
...

Then we can have another volume for the other files (controller, cni).

I haven't tried it, but I feel like it should work. Maybe @tnqn can also provide some insight.

@tnqn
Copy link
Member

tnqn commented May 27, 2020

@antoninbas @srikartati I don't know a real reason why we used subPath volume, guess it was just straightforward to use subPath to mount a single key of a ConfigMap and we didn't consider dynamic config reloading at that time.
If the approach @antoninbas raised can work, I think it would be perfect.

@jayunit100
Copy link
Contributor

jayunit100 commented Jul 30, 2020

Ya that would work i guess - from my perspective the reason i was thinking it would be awesome to have this,

  • Calico allows you to quickly toggle a protocol w/o restarting things, so gives you an instant read out of the new routing tables -- helps users in having the same feature parity
  • Without this change, there is an inconsitency in the state of the antrea configmap and the actual state of the routing tables
  • makes for really cool demos :)

@jayunit100
Copy link
Contributor

You also could just have the agent crash immediately on configmap hash changes probably without having to make any other major changes....

@antoninbas antoninbas added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 2, 2020
@antoninbas antoninbas added this to the Antrea v0.11.0 release milestone Sep 2, 2020
@antoninbas antoninbas self-assigned this Oct 22, 2020
@antoninbas antoninbas added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Oct 22, 2020
@antoninbas antoninbas removed this from the Antrea v0.12.0 release milestone Dec 15, 2020
@mrporcles
Copy link

@antoninbas Any idea what release this is now planned for?

@antoninbas antoninbas removed the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jan 26, 2021
@antoninbas
Copy link
Contributor Author

@mrporcles I am not actively working on this at the moment and it is not scoped for any release.
I did have a PR out to automatically restart the Antrea containers when the ConfigMap changes, but we have found a couple of drawbacks with that approach. At this time, my favorite solution would be treat the ConfigMap as immutable and offer support in the Antrea CLI (antctl) to edit configuration parameters. These edits would cause a new ConfigMap to be generated (with a new map), and trigger a rolling update of the Antrea components. I believe this is a very clean approach but a rolling update can be a bit "expensive" depending on which configuration parameter you want to modify. Specifically, in the context of this issue, it would still require a restart of the Antrea components, but you wouldn't have to do it manually any more.

@antoninbas
Copy link
Contributor Author

The "immutable ConfigMap approach" is described here: #1427 (comment)

@antoninbas antoninbas removed the bug label Apr 19, 2021
@antoninbas antoninbas assigned luolanzone and unassigned antoninbas Sep 9, 2021
@luolanzone
Copy link
Contributor

Hi @antoninbas @jianjuns @tnqn I would like propose a new way to detect configmap change:
add a new deployment with single replica to watch configmap change and update the agent/controller Daemonset/Deployment annotation which is similar like Helm chart: https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments, then we can rely on K8s rolling update strategy to update agent one by one. Could you help to check and give your feedback? thanks.

@github-actions
Copy link
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
None yet
Development

No branches or pull requests

6 participants