[services] Fix Delay Start of SNMP And Telemetry #5211

@tahmed-dev tahmed-dev commented Aug 18, 2020

SNMP and Telemetry services are not critical to switch startup.
They also cause fast-reboot to miss its timing requirements.
To delay their start, these services are associated with systemd
timer units. However, when hostcfgd initiates a service start, it starts
the service and not the timer. This PR fixes the issue by
starting the timer associated with the systemd unit.

Fixes #5172
closes #5172

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

- Why I did it
Mellanox reports that fast-reboot is failing because the snmp/telemetry services are started too early.

- How I did it
Enabled systemd timer unit instead of systemd service unit

- How to verify it
fast-reboot :

admin@str-s6000-acs-14:~$ docker ps -a
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS               NAMES
73a0ac3d8809        docker-snmp:latest                   "/usr/bin/supervisord"   11 seconds ago      Up 8 seconds                            snmp
e00c3b98073d        docker-sonic-telemetry:latest        "/usr/bin/supervisord"   11 seconds ago      Up 10 seconds                           telemetry
e9adb6db457d        docker-router-advertiser:latest      "/usr/bin/docker-ini…"   2 minutes ago       Up 2 minutes                            radv
307918645b4f        docker-sonic-mgmt-framework:latest   "/usr/bin/supervisord"   2 minutes ago       Up 2 minutes                            mgmt-framework
fedd5b3265a5        docker-lldp:latest                   "/usr/bin/docker-lld…"   2 minutes ago       Up 2 minutes                            lldp
585327cfb624        docker-dhcp-relay:latest             "/usr/bin/docker_ini…"   2 minutes ago       Up 2 minutes                            dhcp_relay
7af84c5ed228        docker-syncd-brcm:latest             "/usr/bin/supervisord"   2 minutes ago       Up 2 minutes                            syncd
3d20336c910c        docker-teamd:latest                  "/usr/bin/supervisord"   2 minutes ago       Up 2 minutes                            teamd
0e43c8188cd6        docker-orchagent:latest              "/usr/bin/docker-ini…"   2 minutes ago       Up 2 minutes                            swss
e4df6993ac97        docker-fpm-frr:latest                "/usr/bin/docker_ini…"   2 minutes ago       Up 2 minutes                            bgp
d16b86ef127d        docker-platform-monitor:latest       "/usr/bin/docker_ini…"   2 minutes ago       Up 2 minutes                            pmon
aefd87ac2fef        docker-database:latest               "/usr/local/bin/dock…"   3 minutes ago       Up 3 minutes                            database
admin@str-s6000-acs-14:~

syslog file shows delayed start succeeds:

Aug 18 18:40:43.594545 str-s6000-acs-14 INFO systemd[1]: Started Delays snmp container until SONiC has started.
Aug 18 18:41:46.093939 str-s6000-acs-14 INFO hostcfgd: Running cmd: 'sudo systemctl unmask snmp.timer'
Aug 18 18:41:47.866326 str-s6000-acs-14 INFO hostcfgd: Running cmd: 'sudo systemctl enable snmp.timer'
Aug 18 18:41:49.610696 str-s6000-acs-14 INFO hostcfgd: Running cmd: 'sudo systemctl start snmp.timer'
Aug 18 18:41:49.777290 str-s6000-acs-14 INFO hostcfgd: Feature 'snmp.timer' is enabled and started
Aug 18 18:44:04.788079 str-s6000-acs-14 INFO systemd[1]: snmp.timer: Succeeded.
Aug 18 18:44:04.788643 str-s6000-acs-14 INFO systemd[1]: Stopped Delays snmp container until SONiC has started.
Aug 18 18:44:05.128882 str-s6000-acs-14 INFO snmp.sh[4422]: Starting existing snmp container with HWSKU Force10-S6000
Aug 18 18:44:06.372863 str-s6000-acs-14 INFO snmp.sh[4422]: 1
Aug 18 18:44:07.068655 str-s6000-acs-14 INFO snmp.sh[4422]: snmp
Aug 18 18:44:10.422535 str-s6000-acs-14 INFO snmp#rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="19" x-info="https://www.rsyslog.com"] start
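
The command sequence visible in the syslog above (unmask, enable, then start, all against `snmp.timer`) can be sketched as follows. This is a minimal illustration rather than the actual hostcfgd code: `build_start_cmds` is a hypothetical helper, and only the `has_timer` flag and the `feature_suffix` expression are taken from this PR's diff.

```python
def build_start_cmds(feature_name, has_timer=False):
    """Return the systemctl commands used to enable and start a feature.

    Hypothetical helper for illustration only; the function changed in
    this PR is update_feature_state(feature_name, state, has_timer=False).
    """
    # Features with a delay timer are started through their .timer unit,
    # so systemd (not hostcfgd) decides when the service itself starts.
    feature_suffix = "timer" if has_timer else "service"
    unit = "{}.{}".format(feature_name, feature_suffix)
    return [
        "sudo systemctl unmask {}".format(unit),
        "sudo systemctl enable {}".format(unit),
        "sudo systemctl start {}".format(unit),
    ]


if __name__ == "__main__":
    for cmd in build_start_cmds("snmp", has_timer=True):
        print(cmd)
```

With `has_timer=True` the helper targets `snmp.timer`; without it, the same three commands target the `.service` unit.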

- Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006

- A picture of a cute animal (not mandatory but encouraged)

lguohan previously approved these changes Aug 19, 2020
@tahmed-dev tahmed-dev marked this pull request as ready for review August 19, 2020 19:10
@tahmed-dev tahmed-dev force-pushed the taahme/fix-delay-start-of-snmp-telemetry branch from a9b6881 to 26e5564 Compare August 19, 2020 19:32
@tahmed-dev tahmed-dev requested review from jleveque and lguohan August 19, 2020 19:33
@jleveque
Contributor

Just wanted to add a note that we should probably now also check the has_timer field when restarting services in the config command: https://github.com/Azure/sonic-utilities/blob/master/config/main.py#L238

@tahmed-dev
Contributor Author

> Just wanted to add a note that we should probably now also check the has_timer field when restarting services in the config command: https://github.com/Azure/sonic-utilities/blob/master/config/main.py#L238

Good point! I was thinking about it last night, however the rationale is to delay non critical service during boot time in order to meet fast/warm boot time requirements. During config load/reload, we don't have such urgency. What do you think?

@jleveque
Contributor

> Good point! I was thinking about it last night, however the rationale is to delay non critical service during boot time in order to meet fast/warm boot time requirements. During config load/reload, we don't have such urgency. What do you think?

If that is the only rationale, and there is no other reason for the delay (race conditions, etc.), then I agree we don't need to be concerned about config load/reload.

@tahmed-dev tahmed-dev merged commit e484ae9 into sonic-net:master Aug 20, 2020
@@ -41,12 +41,13 @@ def obfuscate(data):
     return data


-def update_feature_state(feature_name, state):
+def update_feature_state(feature_name, state, has_timer=False):
+    feature_suffix = "timer" if has_timer else "service"
Contributor

@tahmed-dev I'm wondering whether it would be better to check if a timer unit exists for any service and start it if present. That way it can be dynamic and we don't need to pre-define it in init_cfg.json.j2, since this can always break if a new service is added but the init_cfg file is not updated accordingly.

Contributor Author

@abdosi that would work as well. The argument is applicable to features as well; that would also break a feature, as it would not be started. I think it is simpler for hostcfgd not to assume any knowledge about systemd internals or where the .service/.timer files are on disk, in case systemd relocates those files. After all, this is a one-time configuration and it should be well defined during development.

After all, if you feel strongly about it, please go ahead and put out a PR to that effect.

Contributor

@jleveque jleveque Aug 20, 2020

@abdosi: I made the same suggestion above. I think the current solution (relying on init_cfg.json) is better than explicitly specifying the names of the services which have a .timer file. I'm still open to checking for the presence of a .timer file. The more foolproof and maintenance-free we can make the codebase, the better.

Contributor

Thanks @tahmed-dev and @jleveque.
I was thinking we could just check the return value of the command below and, based on that, use either .service or .timer:
"sudo systemctl list-unit-files | grep {}.timer".format(feature_name)

Contributor Author

@abdosi Thanks! I did not know about this command.

The only thing that would hold me back is that this comes at a cost during boot: running such a check for every service will consume precious CPU cycles in the boot path.

Contributor

@tahmed-dev: I was also concerned about that downside to checking for the unit files. I guess we could check the runtime of the sudo systemctl list-unit-files | grep ... command to understand how intensive it is. But as above, I'm OK with expecting this information to be added to init_cfg.json -- now, all new services should be added there. It's one location and it's a data file. What I really wanted to avoid (and this implementation does that) is the need to add new service names into various code files if they are exceptions to the norm (e.g., they have a .timer file).

Contributor

@jleveque As I was discussing with Tamer, one concern I had:

  1. If in the future we add a timer to an existing service, it is not intuitive to also go and update init_cfg.json accordingly.

Also, regarding boot-time performance, we can run this command only once, rather than for every service, and save the state/output.
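
The idea discussed above, running `systemctl list-unit-files` once and reusing the result for every feature, might look like the sketch below. The helper names (`parse_timer_units`, `load_timer_units`, `unit_suffix`) are hypothetical and not part of hostcfgd, and the parser assumes the usual column layout in which the first field of each line is the unit file name.

```python
import subprocess


def parse_timer_units(list_unit_files_output):
    """Extract the feature names that have a .timer unit file."""
    timers = set()
    for line in list_unit_files_output.splitlines():
        fields = line.split()
        # The first column is the unit file name, e.g. "snmp.timer".
        if fields and fields[0].endswith(".timer"):
            timers.add(fields[0][:-len(".timer")])
    return timers


def load_timer_units():
    """Run systemctl once and return the set of features with timers."""
    output = subprocess.check_output(
        ["systemctl", "list-unit-files", "--type=timer", "--no-legend"],
        universal_newlines=True,
    )
    return parse_timer_units(output)


def unit_suffix(feature_name, timer_units):
    """Pick "timer" or "service" using the cached set of timer units."""
    return "timer" if feature_name in timer_units else "service"
```

Calling `load_timer_units()` once at daemon startup would keep the boot-path cost to a single `systemctl` invocation; whether even that cost is acceptable was left open in this thread.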

Contributor

@abdosi abdosi Aug 20, 2020

Had an offline chat with @tahmed-dev; since this approach can have a boot-time impact, we can park this discussion for now.

Contributor

OK. Definitely something to reconsider in the future.

@abdosi
Contributor

abdosi commented Aug 20, 2020

> > Good point! I was thinking about it last night, however the rationale is to delay non critical service during boot time in order to meet fast/warm boot time requirements. During config load/reload, we don't have such urgency. What do you think?
>
> If that is the only rationale, and there is no other reason for the delay (race conditions, etc.), then I agree we don't need to be concerned about config load/reload.

@tahmed-dev Adding to @jleveque's point: currently, when doing a config reload, I am seeing the warning below. Maybe we need to check whether we want to handle .timer in the config commands as well.

sudo config reload -y
Executing stop of service telemetry...
Warning: Stopping telemetry.service, but it can still be activated by:
telemetry.timer

-stop_cmds.append("sudo systemctl stop {}.service".format(feature_name))
-stop_cmds.append("sudo systemctl disable {}.service".format(feature_name))
-stop_cmds.append("sudo systemctl mask {}.service".format(feature_name))
+stop_cmds.append("sudo systemctl stop {}.{}".format(feature_name, feature_suffix))
Contributor

@jleveque jleveque Aug 20, 2020

When stopping the service, if the service has a .timer file, I believe we need to stop both the timer AND the service. If the timer has already started the service, we need to stop the service. If the timer is currently running and hasn't started the service, we need to stop the timer. Thus, we should always stop both to be safe.
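
A minimal sketch of the "always stop both" suggestion, assuming a hypothetical `build_stop_cmds` helper. The stop/disable/mask sequence mirrors the commands in the diff above; handling the timer before the service is an assumption made here so that the timer cannot re-activate the service after it has been stopped.

```python
def build_stop_cmds(feature_name, has_timer=False):
    """Return systemctl commands that stop, disable and mask a feature.

    Hypothetical helper for illustration only, not the hostcfgd code.
    """
    # Handle the timer first (assumed ordering) so it cannot trigger the
    # service again, then the service in case the timer already fired.
    suffixes = ["timer", "service"] if has_timer else ["service"]
    stop_cmds = []
    for suffix in suffixes:
        unit = "{}.{}".format(feature_name, suffix)
        stop_cmds.append("sudo systemctl stop {}".format(unit))
        stop_cmds.append("sudo systemctl disable {}".format(unit))
        stop_cmds.append("sudo systemctl mask {}".format(unit))
    return stop_cmds
```

For a feature without a timer, the helper reduces to the three `.service` commands that existed before this PR.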

abdosi pushed a commit that referenced this pull request Aug 20, 2020
noaOrMlnx added a commit to noaOrMlnx/sonic-buildimage that referenced this pull request Aug 26, 2020
* [BFN] Add support pcied daemon for Montara and Newport (sonic-net#5199)

Signed-off-by: Petro Bratash <petrox.bratash@intel.com>

* [cfggen] Allow Write To Redis DB With Template/Batch Mode (sonic-net#5203)

Argument to write to config-db is not allowed when using template.
This PR allows cfggen to write to redis db when using template
mode.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

* [submodule]: Advance sonic-snmpagent. (sonic-net#5213)

Update sonic-snmpagent submodule to include below commits:
1a2b62a [Namespace]: Fix SAI_ID key used in cpfcIfTable and csqIfQosGroupStatsTable implementation (sonic-net#138)
d06f00c [pytest/coverage]: add coverage support (sonic-net#156)
90e9f2e [Namespace]: Simplify sync_d functions to use higher order (sonic-net#154)
b5815d9 [LLDP]: Modify OID index of LLDPRemTableUpdater MIB (sonic-net#155)
d5f2b92 [Multiasic]: Provide namespace support for ipNetToMediaPhysAddress (sonic-net#129)
166c221 [Namespace]: Fix interface counters in RFC 1213 (sonic-net#145)

Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>

* [cfggen] Conform With Python 3 Syntax (sonic-net#5154)

Preparing sonic-cfggen for migration to Python 3.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

* [redis-dump-load] Update submodule (sonic-net#5215)

* src/redis-dump-load 832a645...7585497 (2):
  > Merge pull request sonic-net#63 from jleveque/update_gitignore
  > Merge pull request sonic-net#59 from breser/redis-load-empty

* [services] Fix Delay Start of SNMP And Telemetry (sonic-net#5211)

SNMP and Telemetry services are not critical to switch startup.
They also cause fast-reboot not to meet timing requirements.
In order to delay start those service are associated with systemd
timer units, however when hostcfgd initiate service start, it start
the service and not the timer. This PR fixes this issue by
starting the timer associated with systemd unit.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

* [sonic-py-common][multi ASIC] API to get a list of frontend ports (sonic-net#5221)

* [sonic-py-common][multi ASIC] utility to get a list of frontend ports from a given list of ports

* [sonic-config-engine] Update .gitignore (sonic-net#5223)

- Ignore directories generated by building Python wheel package
- Move all sonic-config-engine ignores from the root .gitignore to src/sonic-config-engine/.gitignore

* Advance swss-common submodule. (sonic-net#5222)

9a7c9d Dbconnector namespace support (sonic-net#376)
c32f0b5 add state db entry for fgnhg route entry (sonic-net#374)

* [caclmgrd] Add support for multi-ASIC platforms (sonic-net#5022)

* Support for Control Plane ACL's for Multi-asic Platforms.
Following changes were done:
 1) Moved from using blocking listen() on Config DB to the select() model
 via python-swsscommon since we have to wait on event from multiple
 config db's
 2) Since python-swsscommon is not available on the host, added libswsscommon and python-swsscommon
    and dependent packages to the base image (host environment)
 3) Made iptables programmed in all namespace using ip netns exec

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix Review Comments

* Fix Comments

* Added Change for Multi-asic to have iptables
rules to accept internal docker tcp/udp traffic
needed for syslog and redis-tcp connection.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix Review Comments

* Added more comments on logic.

* Fixed all warning/errors reported by http://pep8online.com/
other than line > 80 characters.

* Fix Comment
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Verified with swsscommon package. Fix issue for single asic platforms.

* Moved to new python package

* Address Review Comments.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments.

* Add support to VS platform for platform.json and DPB CLI Tests (sonic-net#5192)

- Reverts commit 457674c
- Creates "platform.json" for vs docker
- Adds test case for port breakout CLI
- Explicitly sets admin status of all the VS interfaces to down to be compatible with SWSS test cases, specifically vnet tests and sflow tests

Signed-off-by: Sangita Maity <sangitamaity0211@gmail.com>

* [iccpd] Fix uninitialized variable. (sonic-net#5112)

Declaring *tb[] without initializing it is risky: we hit an iccpd exception while processing arp/nd events. Initialize it to {0}.

* Fix unwanted python exception in syslog during database container (sonic-net#5227)

startup when doing redis PING since database_config.json getting
generated from jinja2 template is still not ready.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* [hostcfgd] Handle Both Service And Timer Units (sonic-net#5228)

Commit e484ae9 introduced systemd .timer units to hostcfgd.
However, when stopping a service that has a timer, it is possible that
the timer is not running, in which case the service would not be stopped. This PR
addresses this situation by handling both .timer and .service units.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

* [arista] Update driver submodules (sonic-net#5147)

- fix watchdog timeout units
- fix import path for thermal_manager
- remove arista bind mounts for docker-snmp
- improve arista bind mounts for pmon

* [docker-radv] Fix startup issues (sonic-net#5230)

**- Why I did it**

PR sonic-net#4599 introduced two bugs in the startup of the router advertiser container:

1. References to the `wait_for_intf.sh` script were changed to `wait_for_link.sh`, but the actual script was not renamed
2. The `ipv6_found` Jinja2 variable added to the supervisor config file goes out of scope before it is read.

**- How I did it**
1. Rename the `wait_for_intf.sh` script to `wait_for_link.sh`
2. Use the Jinja2 "namespace" construct to fix the scope issue

**- How to verify it**

Ensure all processes in the radv container start properly under the correct conditions (i.e., whether or not there is at least one VLAN with an IPv6 address assigned).

* [sonic-utilities] Update submodule (sonic-net#5233)

* src/sonic-utilities d5fdd74...17fb378 (7):
  > [sonic-installer] Import re module (sonic-net#1061)
  > [fast-reboot]: Fix fail to execute fast-reboot problem (sonic-net#1047)
  > [config] Reduce Calls to SONiC Cfggen (sonic-net#1052)
  > [filter-fdb] Call Filter FDB Main From Within Test Code (sonic-net#1051)
  > [sflow_test.py]: Fix show sflow display. (sonic-net#1054)
  > Change fast-reboot script to use swss and radv service script (sonic-net#1036)
  > Common functions for show CLI support on multi ASIC (sonic-net#999)

* [sonic-host-service]: Add SONiC Host Services infrastructure (sonic-net#4840)

- Why I did it

When SONiC is configured with the management framework and/or telemetry services, the applications running inside those containers need to access some functionality on the host system. The following is a non-exhaustive list of such functionality:

  • Image management
  • Configuration save and load
  • ZTP enable/disable and status
  • Show tech support

- How I did it

The host service is a Python process that listens for requests via D-Bus. It will then service those requests and send a response back to the requestor.

This PR only introduces the host service infrastructure. Applications that need access to the host services must add applets that will register on D-Bus endpoints to service the appropriate functionality.

- How to verify it

- Description for the changelog

Add SONiC Host Service for container to execute select commands in host

Signed-off-by: Nirenjan Krishnan <Nirenjan.Krishnan@dell.com>

* Add common functions applicable to single/multi asic platforms (sonic-net#5224)

* Add common functions applicable to single/multi asic platforms
* Raise exception if invalid namespace is given as input.

* [sonic-swss] Update submodule (sonic-net#5231)

* src/sonic-swss d2bab10...c4949a2 (34):
  > [dvs] Add new common issues and TOC to DVS README (sonic-net#1405)
  > Avoid adding loopback interface (ip link add) when setting nat zone on loopback interface (sonic-net#1411)
  > [portsorch] add buffer drop FC group (sonic-net#1368)
  > [dvs/chassis] Bring up SONiC interfaces in virtual chassis (sonic-net#1410)
  > [chassis/dvs] Add support for virtual chassis to DVS testbed (sonic-net#1345)
  > [sonic-swsss] Fix the issue of field "next_hop_ip" not getting updated in state DB in ERSPAN Mirror (sonic-net#1375)
  > [intfmgr] Fix OA crash issue due to link local configurations (sonic-net#1195)
  > Fix the issue when persistent DVS is used to run pytest which has number of front-panel ports < 32 (sonic-net#1373)
  > [dvs] Refactor AsicDbValidator (sonic-net#1402)
  > [fec] Get FEC mode when port is already admin down (sonic-net#1403)
  > [fec] added logic that put port down before applying fec onfiguration (sonic-net#1399)
  > [dvs] Add performance test for adding and deleting routes (sonic-net#1392)
  > Ignore IPv6 link-local and multicast entries as Vnet routes (sonic-net#1401)
  > [vlanmgr] Support Jumbo Frame By Default (sonic-net#1393)
  > Fix log/syslog not being correct when last test fails for given module (sonic-net#1395)
  > Get initial speed from ASIC DB  (sonic-net#1390)
  > [dvs] Add options to limit CPU usage (sonic-net#1394)
  > [intfsorch] Retrieve Port object before setting NAT zone on router interfaces. (sonic-net#1372)
  > [.gitignore] Ignore gearsyncd binary (sonic-net#1381)
  > Added Max Nexthopgroup/ECMP Count supported by device into State DB. (sonic-net#1383)
  > [dvs] Upload logs even if failure occurs during startup (sonic-net#1389)
  > [rates] fix issue with rates init (sonic-net#1387)
  > [dvs] Validate that SWSS is ready to receive input before starting tests (sonic-net#1385)
  > [dvs] Convert sflow and speed tests to use dvslib (sonic-net#1382)
  > [dvs_acl] Refactor and document dvs_acl library (sonic-net#1378)
  > [dvs] Fix install instructions in README (sonic-net#1379)
  > [dvs] Update README with new flags, options, and known issues (sonic-net#1380)
  > swss: gearsyncd should return 0 on exit (sonic-net#1376)
  > Remove 00-copp.config.json from swss debian package. (sonic-net#1366)
  > fix undefined var in rates lua scripts (sonic-net#1365)
  > [fdborch] Fixed Orchagent crash in FDB flush on port disable. (sonic-net#1369)
  > [tlm_teamd]: Try to add LAG again, when teamd is not ready first time (sonic-net#1347)
  > [vs] Incorporate python3 best practices into DVSLib (sonic-net#1357)
  > [dvs] Mark unstable tests as xfail (sonic-net#1356)

* [arista/aboot]: Zero out 1st MB before repartitioning (sonic-net#5220)

The first partition starting point was changed to be 1M as part of this
commit: 6ba2f97. On systems that are misaligned before conversion
(partition start is the first sector), the relic partition that is
left in the first MB can cause problems in Aboot and result in corruption
of the filesystem on the new aligned partition.

Zeroing this old relic makes sure that there is nothing left of the old
partition lying around. There won't be any risk of having Aboot corrupt
the new filesystem because of the old relic.

Signed-off-by: Baptiste Covolato <baptiste@arista.com>

* [sonic-py-common] Add unit test framework (sonic-net#5238)

**- Why I did it**

To install the framework for adding unit tests to the sonic-py-common package and report coverage.

**- How I did it**

- Incorporate pytest and pytest-cov into sonic-py-common package build
- Upgrade version of 'mock' installed to version 3.0.5, the last version which supports Python 2. This fixes a bug where the file object returned from `mock_open()` was not iterable (see https://bugs.python.org/issue32933)
- Add support for Python 3 setuptools and pytest in sonic-slave-buster environment
- Add tests for `device_info.get_machine_info()` and `device_info.get_platform()` functions
- Also add a .gitignore in the root of the sonic-py-common directory, move all related ignores from main .gitignore file, and add ignores for files and dirs generated by pytest-cov

* Add switch for synchronous mode (sonic-net#5237)

Add a master switch so that the sync/async mode can be configured.
Example usage of the switch:
1.  Configure mode while building an image
    `make ENABLE_SYNCHRONOUS_MODE=y <target>`
2. Configure when the device is running 
    Change CONFIG_DB with `sonic-cfggen -a '{"DEVICE_METADATA":{"localhost": {"synchronous_mode": "enable"}}}' --write-to-db`
    Restart swss with `systemctl restart swss`

* [enable counters] Enable port buffer drops by default and update MLNX SAI submodule (sonic-net#5059)

* Enable port buffer drops by default
* [Mellanox] Update SAI_Implementation

Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>

* Platform monitor changes in daemon_base for multi_asic (sonic-net#4932)

Adding namespace support for db connect API.

Co-authored-by: Petro Bratash <68950226+bratashX@users.noreply.github.com>
Co-authored-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Co-authored-by: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com>
Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>
Co-authored-by: Mahesh Maddikayala <10645050+smaheshm@users.noreply.github.com>
Co-authored-by: judyjoseph <53951155+judyjoseph@users.noreply.github.com>
Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
Co-authored-by: Sangita Maity <sangitamaity0211@gmail.com>
Co-authored-by: Kelly Chen <kelly_chen@edge-core.com>
Co-authored-by: Samuel Angebault <staphylo@arista.com>
Co-authored-by: nirenjan <nirenjan@users.noreply.github.com>
Co-authored-by: Baptiste Covolato <b.covolato@gmail.com>
Co-authored-by: shi-su <67605788+shi-su@users.noreply.github.com>
Co-authored-by: Mykola F <37578614+mykolaf@users.noreply.github.com>
noaOrMlnx added a commit to noaOrMlnx/sonic-buildimage that referenced this pull request Aug 27, 2020
* [BFN] Add support pcied daemon for Montara and Newport (sonic-net#5199)

Signed-off-by: Petro Bratash <petrox.bratash@intel.com>

* [cfggen] Allow Write To Redis DB With Template/Batch Mode (sonic-net#5203)

Argument to write to config-db is not allowed when using template.
This PR allows cfggen to write to redis db when using template
mode.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

* [submodule]: Advance sonic-snmpagent. (sonic-net#5213)

Update sonic-snmpagent submodule to include below commits:
1a2b62a [Namespace]: Fix SAI_ID key used in cpfcIfTable and csqIfQosGroupStatsTable implementation (sonic-net#138)
d06f00c [pytest/coverage]: add coverage support (sonic-net#156)
90e9f2e [Namespace]: Simplify sync_d functions to use higher order (sonic-net#154)
b5815d9 [LLDP]: Modify OID index of LLDPRemTableUpdater MIB (sonic-net#155)
d5f2b92 [Multiasic]: Provide namespace support for ipNetToMediaPhysAddress (sonic-net#129)
166c221 [Namespace]: Fix interface counters in RFC 1213 (sonic-net#145)

Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>

* [cfggen] Conform With Python 3 Syntax (sonic-net#5154)

Preparing sonic-cfggen for migration to Python 3.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

* [redis-dump-load] Update submodule (sonic-net#5215)

* src/redis-dump-load 832a645...7585497 (2):
  > Merge pull request sonic-net#63 from jleveque/update_gitignore
  > Merge pull request sonic-net#59 from breser/redis-load-empty

* [services] Fix Delay Start of SNMP And Telemetry (sonic-net#5211)

SNMP and Telemetry services are not critical to switch startup.
They also cause fast-reboot not to meet timing requirements.
In order to delay start those service are associated with systemd
timer units, however when hostcfgd initiate service start, it start
the service and not the timer. This PR fixes this issue by
starting the timer associated with systemd unit.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

* [sonic-py-common][multi ASIC] API to get a list of frontend ports (sonic-net#5221)

* [sonic-py-common][multi ASIC] utility to get a list of frontend ports from a given list of ports

* [sonic-config-engine] Update .gitignore (sonic-net#5223)

- Ignore directories generated by building Python wheel package
- Move all sonic-config-engine ignores from the root .gitignore to src/sonic-config-engine/.gitignore

* Advance swss-common submodule. (sonic-net#5222)

9a7c9d Dbconnector namespace support (sonic-net#376)
c32f0b5 add state db entry for fgnhg route entry (sonic-net#374)

* [caclmgrd] Add support for multi-ASIC platforms (sonic-net#5022)

* Support for Control Plane ACL's for Multi-asic Platforms.
Following changes were done:
 1) Moved from using blocking listen() on Config DB to the select() model
 via python-swsscommon since we have to wait on event from multiple
 config db's
 2) Since  python-swsscommon is not available on host added libswsscommon and python-swsscommon
    and dependent packages in the base image (host enviroment)
 3) Made iptables programmed in all namespace using ip netns exec

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix Review Comments

* Fix Comments

* Added Change for Multi-asic to have iptables
rules to accept internal docker tcp/udp traffic
needed for syslog and redis-tcp connection.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fix Review Comments

* Added more comments on logic.

* Fixed all warning/errors reported by http://pep8online.com/
other than line > 80 characters.

* Fix Comment
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Verified with swsscommon package. Fix issue for single asic platforms.

* Moved to new python package

* Address Review Comments.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments.

* Add support to VS platform for platform.json and DPB CLI Tests (sonic-net#5192)

- Reverts commit 457674c
- Creates "platform.json" for vs docker
- Adds test case for port breakout CLI
- Explicitly sets admin status of all the VS interfaces to down to be compatible with SWSS test cases, specifically vnet tests and sflow tests

Signed-off-by: Sangita Maity <sangitamaity0211@gmail.com>

* [iccpd] Fix uninitialized variable. (sonic-net#5112)

To declare *tb[] but do not initialize it, it might be very risky. We get iccpd exception during processing arp/nd event. Initialize it to {0};

* Fix unwanted python exception in syslog during database container (sonic-net#5227)

startup when doing redis PING since database_config.json getting
generated from jinja2 template is still not ready.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* [hostcfgd] Handle Both Service And Timer Units (sonic-net#5228)

Commit e484ae9 introduced systemd .timer unit to hostcfgd.
However, when stopping service that has timer, there is possibility that
timer is not running and the service would not be stopped. This PR
address this situation by handling both .timer and .service units.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

* [arista] Update driver submodules (sonic-net#5147)

- fix watchdog timeout units
- fix import path for thermal_manager
- remove arista bind mounts for docker-snmp
- improve arista bind mounts for pmon

* [docker-radv] Fix startup issues (sonic-net#5230)

**- Why I did it**

PR sonic-net#4599 introduced two bugs in the startup of the router advertiser container:

1. References to the `wait_for_intf.sh` script were changed to `wait_for_link.sh`, but the actual script was not renamed
2. The `ipv6_found` Jinja2 variable added to the supervisor config file goes out of scope before it is read.

**- How I did it**
1. Rename the `wait_for_intf.sh` script to `wait_for_link.sh`
2. Use the Jinja2 "namespace" construct to fix the scope issue

**- How to verify it**

Ensure all processes in the radv container start properly under the correct conditions (i.e., whether or not there is at least one VLAN with an IPv6 address assigned).

* [sonic-utilities] Update submodule (sonic-net#5233)

* src/sonic-utilities d5fdd74...17fb378 (7):
  > [sonic-installer] Import re module (sonic-net#1061)
  > [fast-reboot]: Fix fail to execute fast-reboot problem (sonic-net#1047)
  > [config] Reduce Calls to SONiC Cfggen (sonic-net#1052)
  > [filter-fdb] Call Filter FDB Main From Within Test Code (sonic-net#1051)
  > [sflow_test.py]: Fix show sflow display. (sonic-net#1054)
  > Change fast-reboot script to use swss and radv service script (sonic-net#1036)
  > Common functions for show CLI support on multi ASIC (sonic-net#999)

* [sonic-host-service]: Add SONiC Host Services infrastructure (sonic-net#4840)

- Why I did it

When SONiC is configured with the management framework and/or telemetry services, the applications running inside those containers need to access some functionality on the host system. The following is a non-exhaustive list of such functionality:

  - Image management
  - Configuration save and load
  - ZTP enable/disable and status
  - Show tech support

- How I did it

The host service is a Python process that listens for requests via D-Bus. It will then service those requests and send a response back to the requestor.

This PR only introduces the host service infrastructure. Applications that need access to the host services must add applets that will register on D-Bus endpoints to service the appropriate functionality.

- How to verify it

- Description for the changelog

Add SONiC Host Service for container to execute select commands in host

Signed-off-by: Nirenjan Krishnan <Nirenjan.Krishnan@dell.com>

* Add common functions applicable to single/multi asic platforms (sonic-net#5224)

* Add common functions applicable to single/multi asic platforms
* Raise exception if invalid namespace is given as input.

* [sonic-swss] Update submodule (sonic-net#5231)

* src/sonic-swss d2bab10...c4949a2 (34):
  > [dvs] Add new common issues and TOC to DVS README (sonic-net#1405)
  > Avoid adding loopback interface (ip link add) when setting nat zone on loopback interface (sonic-net#1411)
  > [portsorch] add buffer drop FC group (sonic-net#1368)
  > [dvs/chassis] Bring up SONiC interfaces in virtual chassis (sonic-net#1410)
  > [chassis/dvs] Add support for virtual chassis to DVS testbed (sonic-net#1345)
  > [sonic-swsss] Fix the issue of field "next_hop_ip" not getting updated in state DB in ERSPAN Mirror (sonic-net#1375)
  > [intfmgr] Fix OA crash issue due to link local configurations (sonic-net#1195)
  > Fix the issue when persistent DVS is used to run pytest which has number of front-panel ports < 32 (sonic-net#1373)
  > [dvs] Refactor AsicDbValidator (sonic-net#1402)
  > [fec] Get FEC mode when port is already admin down (sonic-net#1403)
  > [fec] added logic that put port down before applying fec onfiguration (sonic-net#1399)
  > [dvs] Add performance test for adding and deleting routes (sonic-net#1392)
  > Ignore IPv6 link-local and multicast entries as Vnet routes (sonic-net#1401)
  > [vlanmgr] Support Jumbo Frame By Default (sonic-net#1393)
  > Fix log/syslog not being correct when last test fails for given module (sonic-net#1395)
  > Get initial speed from ASIC DB  (sonic-net#1390)
  > [dvs] Add options to limit CPU usage (sonic-net#1394)
  > [intfsorch] Retrieve Port object before setting NAT zone on router interfaces. (sonic-net#1372)
  > [.gitignore] Ignore gearsyncd binary (sonic-net#1381)
  > Added Max Nexthopgroup/ECMP Count supported by device into State DB. (sonic-net#1383)
  > [dvs] Upload logs even if failure occurs during startup (sonic-net#1389)
  > [rates] fix issue with rates init (sonic-net#1387)
  > [dvs] Validate that SWSS is ready to receive input before starting tests (sonic-net#1385)
  > [dvs] Convert sflow and speed tests to use dvslib (sonic-net#1382)
  > [dvs_acl] Refactor and document dvs_acl library (sonic-net#1378)
  > [dvs] Fix install instructions in README (sonic-net#1379)
  > [dvs] Update README with new flags, options, and known issues (sonic-net#1380)
  > swss: gearsyncd should return 0 on exit (sonic-net#1376)
  > Remove 00-copp.config.json from swss debian package. (sonic-net#1366)
  > fix undefined var in rates lua scripts (sonic-net#1365)
  > [fdborch] Fixed Orchagent crash in FDB flush on port disable. (sonic-net#1369)
  > [tlm_teamd]: Try to add LAG again, when teamd is not ready first time (sonic-net#1347)
  > [vs] Incorporate python3 best practices into DVSLib (sonic-net#1357)
  > [dvs] Mark unstable tests as xfail (sonic-net#1356)

* [arista/aboot]: Zero out 1st MB before repartitioning (sonic-net#5220)

The first partition starting point was changed to be 1M as part of this
commit: 6ba2f97. On systems that are misaligned before conversion
(partition start is the first sector), the relic of the old partition that
is left in the first MB can cause problems in Aboot and result in corruption
of the filesystem on the new aligned partition.

Zeroing this old relic makes sure that nothing is left of the old
partition. There is then no risk of Aboot corrupting the new filesystem
because of the old relic.
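
As an illustration only (the actual change lives in the Aboot boot script and is not shown here), zeroing the first MB of a device amounts to overwriting its first 1 MiB with zero bytes. The sketch below does this on an ordinary temporary file standing in for the flash device; `zero_first_mib` is a hypothetical name.

```python
import os
import tempfile

ONE_MIB = 1024 * 1024


def zero_first_mib(path):
    """Overwrite the first 1 MiB of `path` with zeros (hypothetical helper)."""
    with open(path, "r+b") as dev:
        dev.write(b"\x00" * ONE_MIB)
        dev.flush()
        os.fsync(dev.fileno())  # make sure the zeros reach the storage


# Demo on a regular file standing in for the block device.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xff" * (2 * ONE_MIB))
    demo_path = tmp.name

zero_first_mib(demo_path)

with open(demo_path, "rb") as f:
    head = f.read(ONE_MIB)
    tail = f.read()
os.remove(demo_path)

print(head == b"\x00" * ONE_MIB, tail == b"\xff" * ONE_MIB)
```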

Signed-off-by: Baptiste Covolato <baptiste@arista.com>

* [sonic-py-common] Add unit test framework (sonic-net#5238)

**- Why I did it**

To install the framework for adding unit tests to the sonic-py-common package and report coverage.

**- How I did it**

- Incorporate pytest and pytest-cov into sonic-py-common package build
- Upgrade the installed version of 'mock' to 3.0.5, the last version which supports Python 2. This fixes a bug where the file object returned from `mock_open()` was not iterable (see https://bugs.python.org/issue32933)
- Add support for Python 3 setuptools and pytest in sonic-slave-buster environment
- Add tests for `device_info.get_machine_info()` and `device_info.get_platform()` functions
- Also add a .gitignore in the root of the sonic-py-common directory, move all related ignores from main .gitignore file, and add ignores for files and dirs generated by pytest-cov
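
To show why an iterable `mock_open()` file object matters, here is a small sketch of a test that patches `open` and loops over the fake file line by line. It uses the standard-library `unittest.mock` and assumes Python 3.8 or later, where `mock_open()` supports iteration; `read_key_value_file`, the path, and the sample data are hypothetical stand-ins, not the real `device_info` code.

```python
from unittest import mock


def read_key_value_file(path):
    """Parse key=value lines into a dict (hypothetical code under test)."""
    result = {}
    with open(path) as f:
        for line in f:  # this loop needs an iterable file object
            key, sep, value = line.strip().partition("=")
            if sep:
                result[key] = value
    return result


FAKE_CONTENTS = "onie_platform=x86_64-test-r0\nonie_machine=test\n"

# Replace the built-in open() so no real file is read.
with mock.patch("builtins.open", mock.mock_open(read_data=FAKE_CONTENTS)):
    info = read_key_value_file("/hypothetical/machine.conf")

print(info)
```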

* Add switch for synchronous mode (sonic-net#5237)

Add a master switch so that the sync/async mode can be configured.
Example usage of the switch:
1.  Configure mode while building an image
    `make ENABLE_SYNCHRONOUS_MODE=y <target>`
2. Configure when the device is running 
    Change CONFIG_DB with `sonic-cfggen -a '{"DEVICE_METADATA":{"localhost": {"synchronous_mode": "enable"}}}' --write-to-db`
    Restart swss with `systemctl restart swss`
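
The runtime step can also be scripted. The sketch below only builds the `sonic-cfggen` argument list shown above and does not run it; the helper name is hypothetical, and accepting `"disable"` as a value is an assumption (only `"enable"` appears in this description).

```python
import json


def synchronous_mode_command(mode):
    """Build the sonic-cfggen argv for setting the mode (hypothetical helper)."""
    # "disable" is assumed to be the counterpart of "enable".
    if mode not in ("enable", "disable"):
        raise ValueError("mode must be 'enable' or 'disable'")
    payload = {"DEVICE_METADATA": {"localhost": {"synchronous_mode": mode}}}
    return ["sonic-cfggen", "-a", json.dumps(payload), "--write-to-db"]


argv = synchronous_mode_command("enable")
print(argv)
```

On a device, the list could be passed to `subprocess.run(argv, check=True)`, followed by `systemctl restart swss` as described above.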

* [enable counters] Enable port buffer drops by default and update MLNX SAI submodule (sonic-net#5059)

* Enable port buffer drops by default
* [Mellanox] Update SAI_Implementation

Signed-off-by: Mykola Faryma <mykolaf@mellanox.com>

* Platform monitor changes in daemon_base for multi_asic (sonic-net#4932)

Adding namespace support for db connect API.

* [py-swsssdk] Submodule Update (sonic-net#5249)

Change:
  c25d492 Merge pull request sonic-net#83 from tahmed-dev/taahme/add-redis-pipeline-operation
  198d143 review comments - part of [configdb] Add Ability to Query/Update Redis Using Pipelines
  994851c review comments - part of [configdb] Add Ability to Query/Update Redis Using Pipelines
  2d2b7e1 making lgtm happy - part of [configdb] Add Ability to Query/Update Redis Using Pipelines
  fa9093c [configdb] Add Ability to Query/Update Redis Using Pipelines

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

* [cfggen] Use Redis Pipeline (sonic-net#5250)

This PR enables cfggen to read from and write to Redis DB using pipelines.
Pipelines enable batched reads and writes against Redis DB.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

Co-authored-by: Petro Bratash <68950226+bratashX@users.noreply.github.com>
Co-authored-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Co-authored-by: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com>
Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>
Co-authored-by: Mahesh Maddikayala <10645050+smaheshm@users.noreply.github.com>
Co-authored-by: judyjoseph <53951155+judyjoseph@users.noreply.github.com>
Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
Co-authored-by: Sangita Maity <sangitamaity0211@gmail.com>
Co-authored-by: Kelly Chen <kelly_chen@edge-core.com>
Co-authored-by: Samuel Angebault <staphylo@arista.com>
Co-authored-by: nirenjan <nirenjan@users.noreply.github.com>
Co-authored-by: Baptiste Covolato <b.covolato@gmail.com>
Co-authored-by: shi-su <67605788+shi-su@users.noreply.github.com>
Co-authored-by: Mykola F <37578614+mykolaf@users.noreply.github.com>
santhosh-kt pushed a commit to santhosh-kt/sonic-buildimage that referenced this pull request Feb 25, 2021
SNMP and Telemetry services are not critical to switch startup.
They also cause fast-reboot not to meet timing requirements.
To delay their start, these services are associated with systemd
timer units. However, when hostcfgd initiates a service start, it starts
the service and not the timer. This PR fixes the issue by
starting the timer associated with the systemd unit.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Successfully merging this pull request may close these issues.

snmp and telemetry services are not delayed on fast-boot/warm-boot