xcvrd daemon is crashing in sonic mainline #4886

Closed
BaluAlluru opened this issue Jul 2, 2020 · 2 comments · Fixed by sonic-net/sonic-platform-daemons#63 or #4887
BaluAlluru commented Jul 2, 2020

Description
The xcvrd daemon, which runs in the pmon container, is crashing.
Tested on the SONiC Jenkins 327 image.

Steps to reproduce the issue:

  1. Load the latest SONiC image from Jenkins.
  2. Open a bash shell in the pmon Docker container by running "docker exec -it pmon bash".
  3. Run "ps ax" inside the pmon container.
  4. Observe that xcvrd is not running.

Describe the results you received:
root@sonic:/home/admin# docker exec -it pmon bash

root@sonic:/# ps ax
  PID TTY      STAT   TIME COMMAND
    1 pts/0    Ss+    0:00 /usr/bin/python /usr/bin/supervisord
   13 pts/0    S      0:00 python /usr/bin/supervisor-proc-exit-listener --conta
   18 pts/0    Sl     0:00 /usr/sbin/rsyslogd -n -iNONE
   25 pts/0    S      0:00 /usr/bin/python /usr/bin/ledd
   27 pts/0    S      0:00 /usr/bin/python /usr/bin/psud
   28 pts/0    S      0:00 /usr/bin/python /usr/bin/syseepromd
   93 pts/1    Ss     0:00 bash
   98 pts/1    R+     0:00 ps ax

Describe the results you expected:
root@sonic:/home/admin# docker exec -it pmon bash

root@sonic:/usr/bin# ps ax
  PID TTY      STAT   TIME COMMAND
    1 pts/0    Ss+    0:00 /usr/bin/python /usr/bin/supervisord
   13 pts/0    S      0:00 python /usr/bin/supervisor-proc-exit-listener --conta
   17 pts/0    Sl     0:00 /usr/sbin/rsyslogd -n -iNONE
   24 pts/0    S      0:00 /usr/bin/python /usr/bin/ledd
   26 pts/0    S      0:00 /usr/bin/python /usr/bin/psud
   27 pts/0    S      0:00 /usr/bin/python /usr/bin/syseepromd
   96 pts/1    Ss     0:00 bash
  109 pts/0    Sl     0:00 /usr/bin/python /usr/bin/xcvrd
  145 pts/0    S      0:00 /usr/bin/python /usr/bin/xcvrd
  146 pts/1    R+     0:00 ps ax
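
For anyone who wants to script the check from the reproduction steps, here is a minimal sketch that scans captured "ps ax" output for the pmon daemons. The function name `missing_daemons` and the `EXPECTED_DAEMONS` list are assumptions made for illustration, based only on the process names shown in this issue; they are not part of SONiC.

```python
# Hypothetical helper, not part of SONiC. The daemon list is taken from the
# "ps ax" output in this issue and may differ on other platforms.
EXPECTED_DAEMONS = ("ledd", "psud", "syseepromd", "xcvrd")


def missing_daemons(ps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemons that do not appear in `ps ax` output."""
    running = set()
    for line in ps_output.splitlines():
        for token in line.split():
            # Compare only the last path component, e.g. /usr/bin/xcvrd -> xcvrd
            name = token.rsplit("/", 1)[-1]
            if name in expected:
                running.add(name)
    return [daemon for daemon in expected if daemon not in running]


# Sample lines copied from the "results you received" output in this issue.
received = """\
25 pts/0 S 0:00 /usr/bin/python /usr/bin/ledd
27 pts/0 S 0:00 /usr/bin/python /usr/bin/psud
28 pts/0 S 0:00 /usr/bin/python /usr/bin/syseepromd
98 pts/1 R+ 0:00 ps ax
"""
print(missing_daemons(received))
```

The output text would have to be captured separately, for example by saving what "ps ax" prints inside the pmon container.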

Additional information you deem important (e.g. issue happens only occasionally):
We have narrowed the issue down to a particular commit.
$ git show ced0f7b
commit ced0f7b
Author: Joe LeVeque jleveque@users.noreply.github.com
Date: Sat Jun 27 22:57:26 2020 -0700

This commit changed some transceiver info key names in the xcvrd script.
Some places in the script still use the old names, so xcvrd crashes.
We tried running xcvrd manually. The result is below.

root@sonic:/# xcvrd
Traceback (most recent call last):
  File "/usr/bin/xcvrd", line 1171, in <module>
    main()
  File "/usr/bin/xcvrd", line 1168, in main
    xcvrd.run()
  File "/usr/bin/xcvrd", line 1132, in run
    self.init()
  File "/usr/bin/xcvrd", line 1111, in init
    post_port_sfp_dom_info_to_db(is_warm_start, self.stop_event)
  File "/usr/bin/xcvrd", line 404, in post_port_sfp_dom_info_to_db
    notify_media_setting(logical_port_name, transceiver_dict, app_port_tbl)
  File "/usr/bin/xcvrd", line 630, in notify_media_setting
    key = get_media_settings_key(physical_port, transceiver_dict)
  File "/usr/bin/xcvrd", line 526, in get_media_settings_key
    vendor_name_str = transceiver_dict[physical_port]['manufacturename']
KeyError: 'manufacturename'
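
The failure mode can be illustrated with a small sketch: a lookup that accepts either key name does not raise KeyError when the dictionary uses the renamed key. This is not the actual fix. The helper `get_vendor_name` is hypothetical, and the new key name 'manufacturer' is an assumption, since the traceback only confirms the old name 'manufacturename'.

```python
# Hypothetical helper, not part of xcvrd. The new key name "manufacturer"
# is an assumption; only the old name "manufacturename" appears in the
# traceback above.
VENDOR_KEYS = ("manufacturer", "manufacturename")


def get_vendor_name(transceiver_dict, physical_port):
    """Return the vendor name for a port, or None if no known key is present."""
    port_info = transceiver_dict.get(physical_port, {})
    for key in VENDOR_KEYS:
        if key in port_info:
            return port_info[key]
    return None


old_style = {1: {"manufacturename": "ACME"}}  # key name before the rename
new_style = {1: {"manufacturer": "ACME"}}     # assumed key name after the rename

print(get_vendor_name(old_style, 1))
print(get_vendor_name(new_style, 1))
print(get_vendor_name({}, 1))
```

A fallback like this only hides the mismatch. Updating every remaining use of the old names in xcvrd keeps the script consistent with the renamed keys.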

**Output of `show version`:**

root@sonic:/home/admin# show version

SONiC Software Version: SONiC.master.327-dd4cf912
Distribution: Debian 10.4
Kernel: 4.19.0-6-2-amd64
Build commit: dd4cf91
Build date: Mon Jun 29 16:35:50 UTC 2020
Built by: johnar@jenkins-worker-4

**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```
jleveque commented Jul 2, 2020

Reopening. Will be closed once submodule update merges.

@jleveque jleveque reopened this Jul 2, 2020
abdosi pushed a commit to sonic-net/sonic-platform-daemons that referenced this issue Jul 8, 2020
stephenxs added a commit to stephenxs/sonic-buildimage that referenced this issue Aug 15, 2020
…lation

This is to backport the sonic-net#4886 to 201911

Calculate pool size in t1 as 24 * downlink port + 8 * uplink port

- Take both port and peer MTU into account when calculating headroom
- Worst case factor is decreased to 50%
- Mellanox-SN2700-C28D8 t0, assume 48 * 50G/5m + 8 * 100G/40m ports
- Mellanox-SN2700 (C32)
  - t0: 16 * 100G/5m + 16 * 100G/40m
  - t1: 16 * 100G/40m + 16 * 100G/300m

Signed-off-by: Stephen Sun <stephens@nvidia.com>
abdosi pushed a commit that referenced this issue Aug 15, 2020
…lation (#5194)

This is to backport the #4886 to 201911

Calculate pool size in t1 as 24 * downlink port + 8 * uplink port

- Take both port and peer MTU into account when calculating headroom
- Worst case factor is decreased to 50%
- Mellanox-SN2700-C28D8 t0, assume 48 * 50G/5m + 8 * 100G/40m ports
- Mellanox-SN2700 (C32)
  - t0: 16 * 100G/5m + 16 * 100G/40m
  - t1: 16 * 100G/40m + 16 * 100G/300m

Signed-off-by: Stephen Sun <stephens@nvidia.com>
StormLiangMS pushed a commit to StormLiangMS/sonic-buildimage that referenced this issue Jul 18, 2022
…lation (sonic-net#5194)

This is to backport the sonic-net#4886 to 201911

Calculate pool size in t1 as 24 * downlink port + 8 * uplink port

- Take both port and peer MTU into account when calculating headroom
- Worst case factor is decreased to 50%
- Mellanox-SN2700-C28D8 t0, assume 48 * 50G/5m + 8 * 100G/40m ports
- Mellanox-SN2700 (C32)
  - t0: 16 * 100G/5m + 16 * 100G/40m
  - t1: 16 * 100G/40m + 16 * 100G/300m

Signed-off-by: Stephen Sun <stephens@nvidia.com>