State of mux feature is un-rendered in config_db.json #8484

Closed
wangxin opened this issue Aug 16, 2021 · 11 comments
Labels: Dual ToR Platform ♊ (Issues found on dual ToR platforms), Issue for 202012

Comments

@wangxin
Contributor

wangxin commented Aug 16, 2021

Description

State of mux feature is un-rendered in config_db.json

Steps to reproduce the issue:

  1. Check content of /etc/sonic/config_db.json

Describe the results you received:

State of mux feature is un-rendered in config_db.json:

    "FEATURE": {
...
        "mux": {
            "auto_restart": "enabled",
            "has_global_scope": "True",
            "has_per_asic_scope": "False",
            "has_timer": "False",
            "high_mem_alert": "disabled",
            "set_owner": "local",
            "state": "{% if 'subtype' in DEVICE_METADATA['localhost'] and DEVICE_METADATA['localhost']['subtype'] == 'DualToR' %}enabled{% else %}always_disabled{% endif %}"
        },
...

Describe the results you expected:

State should be rendered, not raw Jinja template.
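
One way to spot this condition is to scan the FEATURE table for values that still contain Jinja delimiters. The following is a minimal sketch, not part of SONiC: the helper name `find_unrendered_states` and the shortened inline sample are hypothetical, and only the `FEATURE`/`state` layout is taken from the snippet above.

```python
import json

# Jinja2's default delimiters; a rendered value should contain none of them.
JINJA_MARKERS = ("{%", "{{", "{#")


def find_unrendered_states(config):
    """Return {feature: state} for FEATURE entries whose state is still a template."""
    unrendered = {}
    for name, attrs in config.get("FEATURE", {}).items():
        state = attrs.get("state", "")
        if any(marker in state for marker in JINJA_MARKERS):
            unrendered[name] = state
    return unrendered


# Hypothetical, shortened sample mirroring the layout shown above. On a device,
# the input would instead be loaded from /etc/sonic/config_db.json.
sample = json.loads("""
{
  "FEATURE": {
    "mux": {"state": "{% if 'subtype' in DEVICE_METADATA['localhost'] %}enabled{% else %}always_disabled{% endif %}"},
    "lldp": {"state": "enabled"}
  }
}
""")

print(sorted(find_unrendered_states(sample)))
```

An empty result would mean every feature state in the file is already a plain value.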

Output of show version:

admin@str2-7260cx3-acs-10:~$ show ver

SONiC Software Version: SONiC.20201231.16
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: decfa37df2
Build date: Wed Aug 11 18:12:46 UTC 2021
Built by: AzDevOps@sonic-int-build-workers-0002L6

Platform: x86_64-arista_7260cx3_64
HwSKU: Arista-7260CX3-D108C8
ASIC: broadcom
ASIC Count: 1
Serial Number: JPE20255203
Uptime: 14:14:29 up  3:06,  2 users,  load average: 3.03, 2.74, 2.59

Docker images:
REPOSITORY                 TAG                 IMAGE ID            SIZE
docker-mux                 20201231.16         b1d56bbf4f0c        455MB
docker-mux                 latest              b1d56bbf4f0c        455MB
docker-acms                20201231.16         6469aad23ae0        197MB
docker-acms                latest              6469aad23ae0        197MB
docker-syncd-brcm          20201231.16         9461c120d381        695MB
docker-syncd-brcm          latest              9461c120d381        695MB
docker-dhcp-relay          20201231.16         2b2ba6bcfa51        410MB
docker-dhcp-relay          latest              2b2ba6bcfa51        410MB
docker-teamd               20201231.16         e16467565ee2        413MB
docker-teamd               latest              e16467565ee2        413MB
docker-router-advertiser   20201231.16         63895992bd93        403MB
docker-router-advertiser   latest              63895992bd93        403MB
docker-platform-monitor    20201231.16         9ae8f204f810        612MB
docker-platform-monitor    latest              9ae8f204f810        612MB
docker-lldp                20201231.16         9c793ee0d7f3        443MB
docker-lldp                latest              9c793ee0d7f3        443MB
docker-database            20201231.16         225d58f28324        403MB
docker-database            latest              225d58f28324        403MB
docker-orchagent           20201231.16         e1893ce9d37a        432MB
docker-orchagent           latest              e1893ce9d37a        432MB
docker-snmp                20201231.16         e940783d8e01        444MB
docker-snmp                latest              e940783d8e01        444MB
docker-sonic-telemetry     20201231.16         abce2b1e60be        492MB
docker-sonic-telemetry     latest              abce2b1e60be        492MB
docker-fpm-frr             20201231.16         6fcacdc3cca5        432MB
docker-fpm-frr             latest              6fcacdc3cca5        432MB
k8s.gcr.io/pause           3.4.1               0f8457a4c2ec        683kB

Output of show techsupport:

Additional information you deem important (e.g. issue happens only occasionally):

@tahmed-dev
Contributor

Is it rendered when using sonic-cfggen -d --print-data? Maybe the configuration was not saved...

@wangxin
Contributor Author

wangxin commented Aug 17, 2021

I regenerated the minigraph using "testbed-cli.sh gen-mg". This tool deploys the generated minigraph to the DUT, loads the configuration from the minigraph, and saves the config to /etc/sonic/config_db.json with the command sonic-cfggen -d --print-data. The state was still not rendered.

@wangxin
Contributor Author

wangxin commented Aug 17, 2021

When the mux feature state is incorrect, the same mux can be active on both ToRs or standby on both ToRs.
After I changed the mux feature state to "enabled" on both ToRs and reloaded the config, the mux state became normal: for a given mux, if it is active on the upper ToR, then it is standby on the lower ToR, and vice versa.

@lguohan
Collaborator

lguohan commented Aug 20, 2021

@tahmed-dev, can you follow up on this one? It is causing all failures on the 7260 dual ToR testbed.

@wangxin
Contributor Author

wangxin commented Aug 20, 2021

Observed a similar issue for the dhcp_relay feature on the master image.

Version:

admin@str2-7050cx3-acs-10:~$ show ver

SONiC Software Version: SONiC.master.30026-2348794ef
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: 2348794ef
Build date: Thu Aug 19 13:05:16 UTC 2021
Built by: AzDevOps@sonic-build-workers-000M28

Platform: x86_64-arista_7050cx3_32s
HwSKU: Arista-7050CX3-32S-D48C8
ASIC: broadcom
ASIC Count: 1
Serial Number: JPE20437840
Model Number: DCS-7050CX3-32S-SSD
Hardware Revision: N/A
Uptime: 08:44:13 up 17 min,  1 user,  load average: 2.58, 3.22, 2.24

Issue in /etc/sonic/config_db.json:

    "FEATURE": {
        ...
        "dhcp_relay": {
            "auto_restart": "enabled",
            "has_global_scope": "True",
            "has_per_asic_scope": "False",
            "has_timer": "False",
            "high_mem_alert": "disabled",
            "set_owner": "local",
            "state": "{% if not (DEVICE_METADATA is defined and DEVICE_METADATA['localhost'] is defined and DEVICE_METADATA['localhost']['type'] is defined and DEVICE_METADATA['localhost']['type'] != 'ToRRouter') %}enabled{% else %}disabled{% endif %}"
        },
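
The condition in this template is easy to misread because of the leading `not`. As a rough aid, the sketch below mirrors the same logic in plain Python. It is hand-translated from the template string above rather than taken from SONiC; the function name `dhcp_relay_state` is hypothetical, and Jinja's `is defined` tests are approximated with dictionary lookups.

```python
def dhcp_relay_state(config):
    """Hand-translated mirror of the dhcp_relay state template (illustrative only)."""
    localhost = config.get("DEVICE_METADATA", {}).get("localhost", {})
    device_type = localhost.get("type")
    # Inner condition: metadata, localhost and type are all defined,
    # and the type is something other than 'ToRRouter'.
    typed_non_tor = device_type is not None and device_type != "ToRRouter"
    # The template negates the inner condition before choosing 'enabled'.
    return "disabled" if typed_non_tor else "enabled"


print(dhcp_relay_state({"DEVICE_METADATA": {"localhost": {"type": "ToRRouter"}}}))
print(dhcp_relay_state({"DEVICE_METADATA": {"localhost": {"type": "LeafRouter"}}}))
print(dhcp_relay_state({}))
```

Under this reading, the feature resolves to enabled on a ToRRouter or when the device metadata is missing, and to disabled for any other defined device type.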

@wangxin
Contributor Author

wangxin commented Aug 20, 2021

Same issue on 7050 running 202012.18:

@tahmed-dev
Contributor

@wangxin, rendering takes place after hostcfgd is restarted. I can see that config_db after first boot has the template; however, when hostcfgd is started, the mux service will start, and both the in-memory config DB and show feature status will be correct. It might take a while for the mux service to start. Can you please provide the output of sonic-cfggen -d --print-data 2 min after load minigraph? Also, please provide the output of show feature status.

@tahmed-dev
Contributor

PR:8117 delays the start of hostcfgd by 90 sec. @wangxin, it is not clear to me why the un-rendered value would be the cause of the underlying failures.

Please elaborate on why you expect template rendering to be the issue rather than the delayed start of the mux service.

@tahmed-dev
Contributor

@qiluo-msft can you please revert PR:8117?

@tahmed-dev
Contributor

> @tahmed-dev , can you follow up this one. it is causing all failures on 7260 dual tor testbed.

@lguohan I do not think this issue is critical. Whether or not the state field is rendered in config_db.json, it is consumed by hostcfgd, which renders the field anyway, obtains the target state, and enables the systemd config accordingly. Subsequent load_minigraph runs will not affect the timing, as systemd is already configured, assuming the switch type does not change.

The first boot is when those containers are going to experience a delay, because they will not be systemd-enabled until hostcfgd processes the FEATURE table.

@wangxin, why is this observation considered an issue, and what behavior do you expect instead? Can you please provide details?
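
To make the resolution step concrete, here is a hedged sketch of the decision encoded in the mux `state` template from the issue description, hand-translated to plain Python. It is not hostcfgd's code: `resolve_mux_state` is a hypothetical name, and per the comment above the daemon renders the Jinja template itself rather than using logic like this.

```python
def resolve_mux_state(device_metadata):
    """Mirror of the mux template: enabled only when localhost.subtype is 'DualToR'."""
    localhost = device_metadata["localhost"]
    if "subtype" in localhost and localhost["subtype"] == "DualToR":
        return "enabled"
    return "always_disabled"


# Hypothetical DEVICE_METADATA contents for a dual-ToR and a non-dual-ToR switch.
print(resolve_mux_state({"localhost": {"subtype": "DualToR"}}))
print(resolve_mux_state({"localhost": {"type": "ToRRouter"}}))
```

The target state therefore depends only on the device metadata, which is consistent with the remark that it stays stable as long as the switch type does not change.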

@wangxin
Contributor Author

wangxin commented Dec 1, 2021

Issue already fixed. Closing this issue.

@wangxin wangxin closed this as completed Dec 1, 2021