[services] Fix Delay Start of SNMP And Telemetry #5211

Merged
36 changes: 19 additions & 17 deletions files/build_templates/init_cfg.json.j2
@@ -17,26 +17,28 @@
 {% endfor %}
         }
     },
-{%- set features = [("bgp", "enabled", "enabled"),
-                    ("database", "enabled", "disabled"),
-                    ("dhcp_relay", "enabled", "enabled"),
-                    ("lldp", "enabled", "enabled"),
-                    ("pmon", "enabled", "enabled"),
-                    ("radv", "enabled", "enabled"),
-                    ("snmp", "enabled", "enabled"),
-                    ("swss", "enabled", "enabled"),
-                    ("syncd", "enabled", "enabled"),
-                    ("teamd", "enabled", "enabled")] %}
-{%- if include_iccpd == "y" %}{% do features.append(("iccpd", "disabled", "enabled")) %}{% endif %}
-{%- if include_mgmt_framework == "y" %}{% do features.append(("mgmt-framework", "enabled", "enabled")) %}{% endif %}
-{%- if include_nat == "y" %}{% do features.append(("nat", "disabled", "enabled")) %}{% endif %}
-{%- if include_restapi == "y" %}{% do features.append(("restapi", "enabled", "enabled")) %}{% endif %}
-{%- if include_sflow == "y" %}{% do features.append(("sflow", "disabled", "enabled")) %}{% endif %}
-{%- if include_system_telemetry == "y" %}{% do features.append(("telemetry", "enabled", "enabled")) %}{% endif %}
+{%- set features = [("bgp", "enabled", false, "enabled"),
+                    ("database", "enabled", false, "disabled"),
+                    ("dhcp_relay", "enabled", false, "enabled"),
+                    ("lldp", "enabled", false, "enabled"),
+                    ("pmon", "enabled", false, "enabled"),
+                    ("radv", "enabled", false, "enabled"),
+                    ("snmp", "enabled", true, "enabled"),
+                    ("swss", "enabled", false, "enabled"),
+                    ("syncd", "enabled", false, "enabled"),
+                    ("teamd", "enabled", false, "enabled")] %}
+{%- if include_iccpd == "y" %}{% do features.append(("iccpd", "disabled", false, "enabled")) %}{% endif %}
+{%- if include_mgmt_framework == "y" %}{% do features.append(("mgmt-framework", "enabled", false, "enabled")) %}{% endif %}
+{%- if include_nat == "y" %}{% do features.append(("nat", "disabled", false, "enabled")) %}{% endif %}
+{%- if include_restapi == "y" %}{% do features.append(("restapi", "enabled", false, "enabled")) %}{% endif %}
+{%- if include_sflow == "y" %}{% do features.append(("sflow", "disabled", false, "enabled")) %}{% endif %}
+{%- if include_system_telemetry == "y" %}{% do features.append(("telemetry", "enabled", true, "enabled")) %}{% endif %}
     "FEATURE": {
-{%- for feature, state, autorestart in features %}
+{# has_timer field if set, will start the feature systemd .timer unit instead of .service unit #}
+{%- for feature, state, has_timer, autorestart in features %}
         "{{feature}}": {
             "state": "{{state}}",
+            "has_timer" : {{has_timer | lower()}},
             "auto_restart": "{{autorestart}}",
             "high_mem_alert": "disabled"
         }{% if not loop.last %},{% endif -%}
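
For orientation, the FEATURE table that this template loop emits can be approximated in plain Python. This is only a sketch of the loop's effect, not the build's actual rendering path: `render_feature_table` is a hypothetical helper, and only three of the feature tuples are reproduced.

```python
import json

# (feature, state, has_timer, auto_restart), mirroring the tuples in the template.
FEATURES = [
    ("bgp", "enabled", False, "enabled"),
    ("database", "enabled", False, "disabled"),
    ("snmp", "enabled", True, "enabled"),
]


def render_feature_table(features):
    """Build the FEATURE dict the template loop would emit (hypothetical helper)."""
    table = {}
    for feature, state, has_timer, auto_restart in features:
        table[feature] = {
            "state": state,
            "has_timer": has_timer,  # serialized as a JSON boolean (true/false)
            "auto_restart": auto_restart,
            "high_mem_alert": "disabled",
        }
    return {"FEATURE": table}


if __name__ == "__main__":
    print(json.dumps(render_feature_table(FEATURES), indent=4))
```

Only entries whose third field is true (here `snmp`) would be started through their `.timer` unit; everything else keeps using the `.service` unit.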
24 changes: 13 additions & 11 deletions files/image_config/hostcfgd/hostcfgd
@@ -41,12 +41,13 @@ def obfuscate(data):
     return data


-def update_feature_state(feature_name, state):
+def update_feature_state(feature_name, state, has_timer=False):
+    feature_suffix = "timer" if has_timer else "service"
Contributor

@tahmed-dev wondering whether it would be better to check if a timer unit exists for any service and start it if present? That way it can be dynamic and we don't need to pre-define it in init_cfg.json.j2, since this can always break if a new service is added but the init_cfg file is not updated accordingly.

Contributor Author

@abdosi that would work as well. The same argument applies to features in general: a feature that is missing from init_cfg.json would also break, as it would not be started. I think it is simpler for hostcfgd not to assume any knowledge of systemd internals or of where .service/.timer files live on disk, in case systemd ever relocates those files. After all, this is a one-time configuration and it should be well defined during development.

That said, if you feel strongly about it, please go ahead and put out a PR to that effect.

Contributor

@jleveque Aug 20, 2020

@abdosi: I made the same suggestion above. I think the current solution (relying on init_cfg.json) is better than explicitly specifying the names of the services which have a .timer file. I'm still open to checking for the presence of a .timer file. The more foolproof and maintenance-free we can make the codebase, the better.

Contributor

Thanks @tahmed-dev and @jleveque.
I was thinking we could just check the return value of the command below and, based on that, use either .service or .timer:
"sudo systemctl list-unit-files | grep {}.timer".format(feature_name)

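The check suggested above could look roughly like the following. It is a sketch only: `unit_suffix_for` is a hypothetical helper, the listing text is passed in so the logic can run without systemd, and `SAMPLE_LISTING` is made-up illustrative output rather than a captured log. Unlike the plain `grep`, it matches the unit name exactly.

```python
def unit_suffix_for(feature_name, unit_files_listing):
    """Return "timer" if <feature_name>.timer appears in the listing, else "service".

    unit_files_listing is the text output of `systemctl list-unit-files`,
    passed in as a string (hypothetical interface for this sketch).
    """
    timer_unit = "{}.timer".format(feature_name)
    for line in unit_files_listing.splitlines():
        fields = line.split()
        # The first column is the unit file name; compare it exactly instead
        # of substring-matching, where "." would also match any character.
        if fields and fields[0] == timer_unit:
            return "timer"
    return "service"


# Illustrative listing only (not real output from a device).
SAMPLE_LISTING = """\
UNIT FILE        STATE
snmp.service     enabled
snmp.timer       enabled
swss.service     enabled
"""

if __name__ == "__main__":
    for name in ("snmp", "swss"):
        print(name, unit_suffix_for(name, SAMPLE_LISTING))
```
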
Contributor Author

@abdosi Thanks! I did not know about this command.

The only thing holding me back is the boot-time cost: running such a check for every service would consume precious CPU cycles on the boot path.

Contributor

@tahmed-dev: I was also concerned about that downside to checking for the unit files. I guess we could check the runtime of the sudo systemctl list-unit-files | grep ... command to understand how intensive it is. But as above, I'm OK with expecting this information to be added to init_cfg.json -- now, all new services should be added there. It's one location and it's a data file. What I really wanted to avoid (and this implementation does that) is the need to add new service names into various code files if they are exceptions to the norm (e.g., they have a .timer file).

Contributor

@jleveque As I was discussing with Tamer, one concern I had:

  1. If in the future we add a timer to an existing service, it is not intuitive to go and update init_cfg.json accordingly.

Also, regarding boot-time performance, we can run this command only once, rather than once per service, and save the state/output.
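
A minimal sketch of the "run it once and save the output" idea follows. `TimerUnitCache` and its injectable `list_unit_files` callable are hypothetical names introduced for illustration; the default runner assumes `systemctl` is on the PATH and that one listing at startup is representative for the rest of the run.

```python
import subprocess


class TimerUnitCache(object):
    """Remember which features ship a .timer unit after a single listing."""

    def __init__(self, list_unit_files=None):
        # list_unit_files is injectable so the class can be tested without
        # systemd; by default the listing comes from one systemctl call.
        self._list_unit_files = list_unit_files or self._run_systemctl
        self._timers = None

    @staticmethod
    def _run_systemctl():
        return subprocess.check_output(
            ["systemctl", "list-unit-files", "--type=timer", "--no-legend"],
            universal_newlines=True,
        )

    def has_timer(self, feature_name):
        if self._timers is None:
            # First lookup: run the listing once and keep the parsed names.
            listing = self._list_unit_files()
            names = [line.split()[0] for line in listing.splitlines() if line.split()]
            self._timers = {n[:-len(".timer")] for n in names if n.endswith(".timer")}
        return feature_name in self._timers
```

Every feature lookup after the first is a set membership test, so the cost on the boot path would be one `systemctl` invocation regardless of how many features exist. Whether that single call is cheap enough at boot would still need to be measured.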

Contributor

@abdosi Aug 20, 2020

Had an offline chat with @tahmed-dev; since this approach can have a boot-time impact, we can park this discussion for now.

Contributor

OK. Definitely something to reconsider in the future.

     if state == "enabled":
         start_cmds = []
-        start_cmds.append("sudo systemctl unmask {}.service".format(feature_name))
-        start_cmds.append("sudo systemctl enable {}.service".format(feature_name))
-        start_cmds.append("sudo systemctl start {}.service".format(feature_name))
+        start_cmds.append("sudo systemctl unmask {}.{}".format(feature_name, feature_suffix))
+        start_cmds.append("sudo systemctl enable {}.{}".format(feature_name, feature_suffix))
+        start_cmds.append("sudo systemctl start {}.{}".format(feature_name, feature_suffix))
         for cmd in start_cmds:
             syslog.syslog(syslog.LOG_INFO, "Running cmd: '{}'".format(cmd))
             try:
@@ -55,12 +56,12 @@ def update_feature_state(feature_name, state):
                 syslog.syslog(syslog.LOG_ERR, "'{}' failed. RC: {}, output: {}"
                               .format(err.cmd, err.returncode, err.output))
                 continue
-        syslog.syslog(syslog.LOG_INFO, "Feature '{}' is enabled and started".format(feature_name))
+        syslog.syslog(syslog.LOG_INFO, "Feature '{}.{}' is enabled and started".format(feature_name, feature_suffix))
     elif state == "disabled":
         stop_cmds = []
-        stop_cmds.append("sudo systemctl stop {}.service".format(feature_name))
-        stop_cmds.append("sudo systemctl disable {}.service".format(feature_name))
-        stop_cmds.append("sudo systemctl mask {}.service".format(feature_name))
+        stop_cmds.append("sudo systemctl stop {}.{}".format(feature_name, feature_suffix))
Contributor

@jleveque Aug 20, 2020

When stopping the service, if the service has a .timer file, I believe we need to stop both the timer AND the service. If the timer has already started the service, we need to stop the service. If the timer is currently running and hasn't started the service, we need to stop the timer. Thus, we should always stop both to be safe.
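
One way to act on this suggestion is sketched below. It is not what the PR implements: `build_stop_cmds` is a hypothetical helper, and stopping the timer before the service is an assumption made so that the timer cannot fire between the two stops. Disable and mask still target only the unit selected by `has_timer`, as in the diff.

```python
def build_stop_cmds(feature_name, has_timer=False):
    """List the systemctl commands used to stop and disable a feature."""
    feature_suffix = "timer" if has_timer else "service"
    cmds = []
    if has_timer:
        # Stop the timer first so it cannot start the service afterwards.
        cmds.append("sudo systemctl stop {}.timer".format(feature_name))
    # Always stop the service: the timer may already have started it.
    cmds.append("sudo systemctl stop {}.service".format(feature_name))
    cmds.append("sudo systemctl disable {}.{}".format(feature_name, feature_suffix))
    cmds.append("sudo systemctl mask {}.{}".format(feature_name, feature_suffix))
    return cmds


if __name__ == "__main__":
    for cmd in build_stop_cmds("snmp", has_timer=True):
        print(cmd)
```
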

+        stop_cmds.append("sudo systemctl disable {}.{}".format(feature_name, feature_suffix))
+        stop_cmds.append("sudo systemctl mask {}.{}".format(feature_name, feature_suffix))
         for cmd in stop_cmds:
             syslog.syslog(syslog.LOG_INFO, "Running cmd: '{}'".format(cmd))
             try:
@@ -71,7 +72,8 @@ def update_feature_state(feature_name, state):
                 continue
         syslog.syslog(syslog.LOG_INFO, "Feature '{}' is stopped and disabled".format(feature_name))
     else:
-        syslog.syslog(syslog.LOG_ERR, "Unexpected state value '{}' for feature '{}'".format(state, feature_name))
+        syslog.syslog(syslog.LOG_ERR, "Unexpected state value '{}' for feature '{}.{}'"
+                      .format(state, feature_name, feature_suffix))


 class Iptables(object):
@@ -284,7 +286,7 @@ class HostConfigDaemon:
                 syslog.syslog(syslog.LOG_WARNING, "Eanble state of feature '{}' is None".format(feature_name))
                 continue

-            update_feature_state(feature_name, state)
+            update_feature_state(feature_name, state, feature_table[feature_name]['has_timer'])

     def aaa_handler(self, key, data):
         self.aaacfg.aaa_update(key, data)
@@ -326,7 +328,7 @@ class HostConfigDaemon:
             syslog.syslog(syslog.LOG_WARNING, "Enable state of feature '{}' is None".format(feature_name))
             return

-        update_feature_state(feature_name, state)
+        update_feature_state(feature_name, state, feature_table[feature_name]['has_timer'])

     def start(self):
         # Update all feature states once upon starting
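
Both call sites in this file index `feature_table[feature_name]['has_timer']` directly. If an entry lacks the field, or if the value arrives as a string rather than a boolean (both are assumptions about the runtime data, not something this PR states), a small normalizing helper along these lines could guard the lookup. `parse_has_timer` is a hypothetical name.

```python
def parse_has_timer(feature_entry):
    """Interpret a FEATURE entry's has_timer field as a bool, defaulting to False."""
    value = feature_entry.get("has_timer", False)
    if isinstance(value, bool):
        return value
    # Treat string forms such as "true" or "True" as set; anything else as unset.
    return str(value).strip().lower() == "true"
```

The point of the string comparison is that a non-empty string such as `"false"` is truthy in Python, so passing it straight into `"timer" if has_timer else "service"` would select the timer unit.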