
iccpd service is not running in the image built from the latest SONiC mainline code. #5310

Open
BaluAlluru opened this issue Sep 3, 2020 · 27 comments

Comments

@BaluAlluru

BaluAlluru commented Sep 3, 2020

iccpd service is not running in the image built from the latest SONiC mainline code.

I cloned the latest SONiC mainline code and built the image with INCLUDE_ICCPD = y set in the rules/config file.

I loaded the image on the box and confirmed from the "show version" output that docker-iccpd is part of the image.

I tried to start iccpd manually with systemctl but got an error:

```
root@sonic:/home/admin# systemctl start iccpd
Failed to start iccpd.service: Unit iccpd.service is masked.
```

I also observed that iccpd.service in the /etc/systemd/system directory is a symlink to /dev/null.
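
A unit is masked when its file in /etc/systemd/system is a symlink to /dev/null, and systemd refuses to start a masked unit, which produces the error above. The Python sketch below checks for that condition. `is_unit_masked` is a hypothetical helper, not part of SONiC or systemd, and it assumes the mask lives in /etc/systemd/system; runtime masks under /run/systemd/system are not checked.

```python
import os


def is_unit_masked(unit: str, unit_dir: str = "/etc/systemd/system") -> bool:
    """Return True if `unit` inside `unit_dir` is a symlink resolving to /dev/null.

    Hypothetical helper: only one directory is inspected, although systemd
    also honours masks placed under /run/systemd/system.
    """
    path = os.path.join(unit_dir, unit)
    if not os.path.islink(path):
        return False
    # realpath follows the whole symlink chain to its final target.
    return os.path.realpath(path) == os.devnull
```

On the switch, `is_unit_masked("iccpd.service")` would be expected to return True until `systemctl unmask iccpd` removes the symlink.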

I then tried to start the iccpd container with "docker start docker-iccpd":

```
root@sonic:/home/admin# docker start docker-iccpd
Error response from daemon: No such container: docker-iccpd
Error: failed to start containers: docker-iccpd
```

Is there any specific reason to point iccpd.service to /dev/null?
How can we start iccpd.service in SONiC mainline?

Steps to reproduce the issue:
1. Load the latest SONiC image.
2. Observe that iccpd.service is not running.

Describe the results you received:
iccpd service was not running.

Describe the results you expected:
iccpd service should run.

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**

```
root@QFX5200-SONiC-SW1:/home/admin# show version

SONiC Software Version: SONiC.master.0-dirty-20200901.063245
Distribution: Debian 10.5
Kernel: 4.19.0-9-2-amd64
Build commit: ca3e71d
Build date: Tue Sep 1 07:01:28 UTC 2020
Built by: regress@ubuntu

Docker images:
REPOSITORY                    TAG                             IMAGE ID       SIZE
docker-sflow                  latest                          40e12d69a078   390MB
docker-sflow                  master.0-dirty-20200901.063245  40e12d69a078   390MB
docker-teamd                  latest                          ad4f0110c71d   386MB
docker-teamd                  master.0-dirty-20200901.063245  ad4f0110c71d   386MB
docker-nat                    latest                          bad3b9ef725a   389MB
docker-nat                    master.0-dirty-20200901.063245  bad3b9ef725a   389MB
docker-router-advertiser      latest                          59c7df5b0f13   355MB
docker-router-advertiser      master.0-dirty-20200901.063245  59c7df5b0f13   355MB
docker-platform-monitor       latest                          19b5010a5de3   429MB
docker-platform-monitor       master.0-dirty-20200901.063245  19b5010a5de3   429MB
docker-lldp                   latest                          4aba01fa0c39   383MB
docker-lldp                   master.0-dirty-20200901.063245  4aba01fa0c39   383MB
docker-orchagent              latest                          66c709ebd229   400MB
docker-orchagent              master.0-dirty-20200901.063245  66c709ebd229   400MB
docker-dhcp-relay             latest                          4155721f59b2   362MB
docker-dhcp-relay             master.0-dirty-20200901.063245  4155721f59b2   362MB
docker-sonic-telemetry        latest                          99a568a213f4   425MB
docker-sonic-telemetry        master.0-dirty-20200901.063245  99a568a213f4   425MB
docker-sonic-mgmt-framework   latest                          4a33b0714186   481MB
docker-sonic-mgmt-framework   master.0-dirty-20200901.063245  4a33b0714186   481MB
docker-fpm-frr                latest                          dd3a5fbcf940   402MB
docker-fpm-frr                master.0-dirty-20200901.063245  dd3a5fbcf940   402MB
docker-iccpd                  latest                          6b127b402227   386MB
docker-iccpd                  master.0-dirty-20200901.063245  6b127b402227   386MB
docker-database               latest                          a52fff7d4939   355MB
docker-database               master.0-dirty-20200901.063245  a52fff7d4939   355MB
docker-snmp                   latest                          9e2fed4b1929   395MB
docker-snmp                   master.0-dirty-20200901.063245  9e2fed4b1929   395MB
docker-syncd-brcm             latest                          fef2251e4340   447MB
docker-syncd-brcm             master.0-dirty-20200901.063245  fef2251e4340   447MB
```

Debug file from `sudo generate_dump`:
sonic_dump_QFX5200-SONiC-SW1_20200903_181322.tar.gz
@BaluAlluru BaluAlluru changed the title Docker container iccpd is not running in the image built from latest SONIC mainline code. iccpd service is not running in the image built from latest SONIC mainline code. Sep 4, 2020
@BaluAlluru
Author

@shine4chen @sdddean @tylerlinp, can you please let us know how to proceed with this issue?
iccpd is not running in SONiC mainline.

@jianjundong
Contributor

@BaluAlluru
Before running 'systemctl start iccpd', run 'systemctl unmask iccpd' first. Also, 'docker start docker-iccpd' is not correct: docker-iccpd is the image name, while the container is named iccpd, so run 'docker start iccpd' instead.
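
The suggested order (unmask, then start) can be captured in a small Python sketch. `start_iccpd` and `ICCPD_STEPS` are hypothetical names, and the command runner is injectable so the sequence can be checked without a switch. Starting the service is what brings up the container named iccpd; docker-iccpd is only the image name listed by show version.

```python
import subprocess
from typing import Callable, List

# Hypothetical step list: a masked unit must be unmasked before it can start.
ICCPD_STEPS: List[List[str]] = [
    ["systemctl", "unmask", "iccpd"],
    ["systemctl", "start", "iccpd"],
]


def start_iccpd(run: Callable[..., object] = subprocess.run) -> List[List[str]]:
    """Run each step in order and return the commands that were executed.

    With the default subprocess.run, check=True makes a failing step raise
    CalledProcessError, so the remaining steps are skipped.
    """
    executed: List[List[str]] = []
    for argv in ICCPD_STEPS:
        run(argv, check=True)
        executed.append(argv)
    return executed
```

Run as root on the switch, `start_iccpd()` performs both steps; `docker ps --filter name=iccpd` can then confirm that the container is up.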

@BaluAlluru
Author

@jianjundong,
Thanks for clarifying. I ran systemctl unmask iccpd followed by
systemctl start iccpd, and the iccpd docker container is now running.

@shine4chen @sdddean @tylerlinp, @jianjundong
The attached file TopologyDiagram_MCLAGconfig-SW1-SW2 contains the topology diagram along with the MC-LAG configurations of switches SW1 and SW2.

Orchagent is crashing on our setup when MCLAG negotiation happens.

Also attached are the generate_dump tar files for switches SW1 and SW2.

TopologyDiagram_MCLAGconfig-SW1-SW2.docx
sonic_dump_QFX5200-SONiC-SW1_20200911_185608.tar.gz
sonic_dump_QFX5200-SONiC-SW2_20200911_185612.tar.gz

@BaluAlluru
Author

A few more observations:

  1. Initially, all the docker containers are running after iccpd is started.
  2. After a few seconds, orchagent crashes and many of the docker containers stop running.

@jianjundong
Contributor

@BaluAlluru
Sep 11 18:51:54.120675 QFX5200-SONiC-SW1 ERR syncd#syncd: [none] _brcm_sai_create_acl_table:5334 OUT PORTS not supported on this platform.
Sep 11 18:51:54.120723 QFX5200-SONiC-SW1 ERR syncd#syncd: [none] brcm_sai_create_acl_table:109 create table entry failed with error -327680.
Sep 11 18:51:54.121148 QFX5200-SONiC-SW1 ERR syncd#syncd: :- run: Runtime error: :- processQuadEvent: failed to execute api: create, key: SAI_OBJECT_TYPE_ACL_TABLE:oid:0x70000000007f3, status: SAI_STATUS_ATTR_NOT_SUPPORTED_0

Jianjun: Once the MCLAG peer connection is established, an ACL is installed.

  • In the ASIC, an ACL rule is used to isolate the peer link from the MC-LAG ports: traffic received on the peer link whose output port is an MC-LAG member port must be dropped. For chips whose ACL rules cannot match on the output port, there is a workaround in the SAI layer that combines an ingress ACL with an egress ACL. An alternative approach is to use an isolation group, but that approach still has some weaknesses. First, isolation groups cannot support tunnel ports, and orchagent currently has no isolation-group logic. Second, isolation groups may not be supported by all ASIC vendors. Using an ACL is a more generic way to implement the isolation function. We will refine this function to use isolation groups later if required.
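
The isolation rule can be expressed as a predicate over the ingress and egress ports. This is a behavioural model in Python, not SAI or orchagent code; the function names are hypothetical, and the port names used to exercise it (Ethernet68 as the peer link, Ethernet0 as the MC-LAG member) are borrowed from the iccpd log posted in this thread.

```python
from typing import AbstractSet, Iterable, List


def should_drop(in_port: str, out_port: str, peer_link: str,
                mclag_members: AbstractSet[str]) -> bool:
    """Isolation rule: traffic from the peer link must not leave on an MC-LAG member."""
    return in_port == peer_link and out_port in mclag_members


def allowed_egress(in_port: str, candidates: Iterable[str], peer_link: str,
                   mclag_members: AbstractSet[str]) -> List[str]:
    """Filter flood candidates, skipping the ingress port and isolated ports."""
    return [
        port for port in candidates
        if port != in_port
        and not should_drop(in_port, port, peer_link, mclag_members)
    ]
```

For a frame arriving on Ethernet68, only non-MC-LAG ports survive the filter; a frame arriving on any other port may still be flooded to Ethernet0.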

@ciju-juniper
Contributor

@jianjundong Could you provide an example configuration that will work on TH1 platforms?

@docker2017713

@jianjundong I found the following configuration example, but I don't know whether it is correct:

```json
"ACL_TABLE": {
    "mclag": {
        "policy_desc" : "Mclag egress port isolate acl",
        "type" : "MCLAG",
        "ports" : [
            "PortChannel0001"
        ]
    }
},
"ACL_RULE": {
    "mclag|mclag": {
        "OUT_PORTS" : "Ethernet0",
        "IP_TYPE" : "ANY",
        "PACKET_ACTION" : "DROP"
    }
},
```

Can you tell me what the "ports" and "OUT_PORTS" fields in this example represent? Many thanks!
P.S. I tried putting the peer-link port or the MC-LAG port channel in the "ports" field and the MC-LAG member port in "OUT_PORTS", but the container still crashed.

@jianjundong
Contributor

> @jianjundong Could you provide an example configuration that will work on TH1 platforms?

@ciju-juniper
Please refer to https://github.com/Azure/SONiC/blob/master/doc/mclag/Sonic-mclag-hld.md.
If you still have a problem, would you mind posting your configuration and topology diagram so we can see what the problem is?

@jianjundong
Contributor

@docker2017713
In this example, "ports" is the peer link, and "OUT_PORTS" are the member ports of the MCLAG-enabled port channel.
This ACL table is created automatically and does not need to be configured manually.
There are usually two reasons for the container crash:
1. The "OUT_PORTS" attribute is not supported; SAI needs to be modified to support it.
2. When deleting an FDB entry that does not exist, SAI must not return a failure.
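
To make the field mapping concrete, the following Python sketch builds entries with the same shape as the example above from a peer link and a list of MC-LAG member ports. It only illustrates the mapping, since the table is created automatically; `mclag_isolation_acl` is a hypothetical helper, and joining several OUT_PORTS with commas is an assumption about the CONFIG_DB format.

```python
from typing import Dict, List


def mclag_isolation_acl(peer_link: str, mclag_members: List[str]) -> Dict[str, dict]:
    """Build ACL_TABLE/ACL_RULE entries shaped like the example in this thread.

    "ports" binds the table to the ingress peer link; "OUT_PORTS" names the
    MC-LAG member ports that peer-link traffic must not egress on.
    Comma-joining multiple OUT_PORTS is an assumption.
    """
    return {
        "ACL_TABLE": {
            "mclag": {
                "policy_desc": "Mclag egress port isolate acl",
                "type": "MCLAG",
                "ports": [peer_link],
            }
        },
        "ACL_RULE": {
            "mclag|mclag": {
                "OUT_PORTS": ",".join(mclag_members),
                "IP_TYPE": "ANY",
                "PACKET_ACTION": "DROP",
            }
        },
    }
```

`json.dumps(mclag_isolation_acl("PortChannel0001", ["Ethernet0"]), indent=4)` reproduces the fields of the example above.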

@BaluAlluru
Author

@jianjundong

I attached the topology diagram, configs, and generate_dump files to this issue a few days ago; please scroll up.

You should see the following attachments:
TopologyDiagram_MCLAGconfig-SW1-SW2.docx
sonic_dump_QFX5200-SONiC-SW1_20200911_185608.tar.gz
sonic_dump_QFX5200-SONiC-SW2_20200911_185612.tar.gz

@jianjundong
Contributor

> @jianjundong
>
> I did attach Topology Diagram, config's and generate dump few days ago in this issue page. Please scroll up the conversations.
>
> you should see below attachments
> TopologyDiagram_MCLAGconfig-SW1-SW2.docx sonic_dump_QFX5200-SONiC-SW1_20200911_185608.tar.gz sonic_dump_QFX5200-SONiC-SW2_20200911_185612.tar.gz

@BaluAlluru
Your attached syslog contains the following log messages:
Sep 11 18:56:32.110825 QFX5200-SONiC-SW1 NOTICE iccpd#iccpd: [iccp_csm_transit.NOTICE] csm 1 change state from CONNECTING to OPERATIONAL.
//jianjun: The ICCP connection is established, so your configuration is OK.

Sep 11 18:56:34.281952 QFX5200-SONiC-SW1 NOTICE iccpd#iccpd: [update_peerlink_isolate_from_all_csm_lif.NOTICE] Send port isolate msg to mclagsyncd, src port Ethernet68, dst port Ethernet0
Sep 11 18:56:34.283292 QFX5200-SONiC-SW1 NOTICE swss#orchagent: :- bindAclTable: Bind table mclag to ports
Sep 11 18:56:34.283664 QFX5200-SONiC-SW1 NOTICE swss#orchagent: :- createBindAclTableGroup: Create ingress ACL table group and bind port Ethernet68 to it
Sep 11 18:56:34.284019 QFX5200-SONiC-SW1 NOTICE swss#orchagent: :- bind: Successfully bound port oid: 100000000001d, group member oid:c0000000007f5
Sep 11 18:56:34.284019 QFX5200-SONiC-SW1 NOTICE swss#orchagent: :- addAclTable: Created ACL table mclag oid:70000000007f3
Sep 11 18:56:34.284383 QFX5200-SONiC-SW1 NOTICE swss#orchagent: :- set: setting attribute 0x10000004 status: SAI_STATUS_SUCCESS
Sep 11 18:56:34.285803 QFX5200-SONiC-SW1 ERR syncd#syncd: [none] _brcm_sai_create_acl_table:5334 OUT PORTS not supported on this platform.
Sep 11 18:56:34.285881 QFX5200-SONiC-SW1 NOTICE swss#orchagent: :- add: Successfully created ACL rule mclag in table mclag
Sep 11 18:56:34.285968 QFX5200-SONiC-SW1 ERR syncd#syncd: [none] brcm_sai_create_acl_table:109 create table entry failed with error -327680.
Sep 11 18:56:34.286248 QFX5200-SONiC-SW1 ERR syncd#syncd: :- run: Runtime error: :- processQuadEvent: failed to execute api: create, key: SAI_OBJECT_TYPE_ACL_TABLE:oid:0x70000000007f3, status: SAI_STATUS_ATTR_NOT_SUPPORTED_0
Sep 11 18:56:34.286248 QFX5200-SONiC-SW1 NOTICE syncd#syncd: :- sendShutdownRequest: sending switch_shutdown_request notification to OA for switch: oid:0x21000000000000
Sep 11 18:56:34.286295 QFX5200-SONiC-SW1 NOTICE syncd#syncd: :- sendShutdownRequestAfterException: notification send successfull
//jianjun: The messages above indicate that your device uses a Broadcom chip that does not support installing an ACL with the OUT_PORTS attribute. SAI returns a failure, which causes syncd to hang and then notify orchagent to shut down. To solve this problem, Broadcom SAI needs to support the OUT_PORTS ACL attribute. If OUT_PORTS cannot be supported currently, the temporary workaround is not to install the ACL: comment out the following code in the function update_peerlink_isolate_from_all_csm_lif():

```c
/* if (sys->sync_fd)
       write(sys->sync_fd, msg_buf, msg_hdr->len); */

return;
```

This modification may result in the CE receiving duplicated BUM traffic.
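
The duplicate-BUM side effect can be reproduced with a toy flooding model. In the Python sketch below, each of two hypothetical peers SW1 and SW2 has a peer link, one MC-LAG member toward the dual-homed CE, and one orphan host port; a frame entering SW1's host port is flooded across both switches. The topology and all names are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical two-switch MC-LAG pair; each switch has the same three ports.
PEER = {"SW1": "SW2", "SW2": "SW1"}
PORTS = ("peer", "mclag", "host")  # peer link, member toward the CE, orphan host


def flood(switch: str, in_port: str, isolate: bool) -> List[Tuple[str, str]]:
    """Return every (switch, port) a BUM frame is sent out of, on both peers."""
    out: List[Tuple[str, str]] = []
    for port in PORTS:
        if port == in_port:
            continue  # never flood back out of the ingress port
        if isolate and in_port == "peer" and port == "mclag":
            continue  # isolation: peer-link traffic must not reach MC-LAG members
        out.append((switch, port))
        if port == "peer":
            # The copy crosses the peer link and the peer floods it in turn.
            out.extend(flood(PEER[switch], "peer", isolate))
    return out


def ce_copies(isolate: bool) -> int:
    """Copies the dual-homed CE receives for one frame entering SW1's host port."""
    return sum(1 for _, port in flood("SW1", "host", isolate) if port == "mclag")
```

With isolation the CE receives one copy; with it disabled, SW2 forwards the peer-link copy out of its own member port and the CE receives two.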

@ciju-juniper
Contributor

@jianjundong Which ASIC platforms are currently supported with MCLAG?

On this platform, we have a Broadcom TH1 (Tomahawk-1) ASIC. Who can we ask about SAI support for the OUT_PORTS ACL attribute? We would like to know whether this can be fixed in the SAI layer or is a hardware limitation.

@jianjundong
Contributor

> @jianjundong Which are the ASIC platforms currently supported with mclag?
>
> In this platform, we have a Broadcom TH1 (Tomahawk-1) asic. Who can we ask about the SAI support for OUT_PORTS attribute of ACL? We would like to know if this can be fixed in the SAI layer or hardware limitation.

@ciju-juniper
MCLAG is supported on Nephos ASIC platforms.

If the OUT_PORTS attribute cannot be supported currently, the temporary workaround is not to install the ACL: comment out the following code in the function update_peerlink_isolate_from_all_csm_lif():

```c
//if (sys->sync_fd)
//    write(sys->sync_fd, msg_buf, msg_hdr->len);

return;
```

This modification may result in the CE receiving duplicated BUM traffic, but it will not have a bad impact on the test results.

@ciju-juniper
Contributor

@smaheshm Would you know if Broadcom SAI supports the 'OUT_PORTS' attribute? Who would be the point of contact for this query? Could you add them to this thread?

@davidfordouce

I've hit similar problems and concluded that OUT_PORTS is not supported (or supportable) by Broadcom, which is why an enhanced version is under development. sonic-net/sonic-swss#810 details the initial discussion when this was committed.

I suspect therefore that getting the MC-Lag enhancements merged is the solution to this issue.

@ciju-juniper
Contributor

@davidfordouce Thanks for the pointer. Would you know if Broadcom chips can workaround this as mentioned in sonic-net/sonic-swss#810?

@adyeung @prsunny @shine4chen @lguohan Would you know the configuration to make mclag work on Broadcom chips?

@davidfordouce

My understanding (based on sonic-net/SONiC#550 and https://github.com/Azure/SONiC/wiki/Release-Progress-Tracking-202012) is that not only will Broadcom chips support this, but code to do so, written by Broadcom, is awaiting review.

I did attempt to build a private image using that code, but unfortunately I lack sufficient knowledge of the SONiC codebase to rebase the various PRs successfully.

@Praveen-Brcm
Contributor

> @davidfordouce Thanks for the pointer. Would you know if Broadcom chips can workaround this as mentioned in Azure/sonic-swss#810?
>
> @adyeung @prsunny @shine4chen @lguohan Would you know the configuration to make mclag work on Broadcom chips?

@ciju-juniper @BaluAlluru @jianjundong and all,
What version of SAI is being used?
The ACL attribute "OUT_PORTS" is not supported by Broadcom; Broadcom uses isolation groups to block traffic received on the peer link from reaching the MCLAG port channel.
Code to support isolation groups has been submitted for review; the related PRs are below.
HLD: sonic-net/SONiC#596
Code PRs:
#4819
sonic-net/sonic-swss#1331
sonic-net/sonic-swss#1349
Thanks,
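
By contrast with the OUT_PORTS ACL, an isolation group is a set of egress ports bound to an ingress port. The Python sketch below models that idea for the case described above, with the MCLAG port channel isolated from traffic arriving on the peer link. The class names are hypothetical, and the model is a simplified reading of the concept, not the code in the linked PRs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class IsolationGroup:
    """Hypothetical model: egress ports that bound ingress ports may not reach."""
    members: Set[str] = field(default_factory=set)


@dataclass
class Switch:
    # Ingress port -> isolation group bound to it (one group per port here).
    bindings: Dict[str, IsolationGroup] = field(default_factory=dict)

    def bind(self, ingress_port: str, group: IsolationGroup) -> None:
        self.bindings[ingress_port] = group

    def can_forward(self, ingress_port: str, egress_port: str) -> bool:
        group: Optional[IsolationGroup] = self.bindings.get(ingress_port)
        return group is None or egress_port not in group.members
```

Binding a group containing the MCLAG port channel to the peer link blocks only that ingress/egress pair; traffic from other ports can still reach the port channel, which matches the behaviour the OUT_PORTS ACL was meant to provide.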

@ciju-juniper
Contributor

@Praveen-Brcm Thank you for the details.

Looking at the PRs, they are not yet in a state to be committed. Do you have a time frame in mind for making this available in the master branch?

The SAI version is the one currently available in the master branch: BRCM_SAI = libsaibcm_3.7.5.1-3_amd64.deb

@carrierone

It's been quite a few months since this was opened, and iccpd is still masked in the repository by default. @davidfordouce @ciju-juniper @jianjundong, are there any recent improvements to MCLAG/iccpd, and is Broadcom support working?

@ciju-juniper
Contributor

@carrierone AFAIK iccpd support is not yet enabled for Broadcom platforms.
@ben-gale Please let us know the latest status.

@adyeung
Collaborator

adyeung commented May 17, 2021 via email

@haslersn

Is this still the case with the latest release (I think 202106)?

@BaluAlluru @adyeung @Praveen-Brcm

@Praveen-Brcm
Contributor

> Is this still the case with the latest release (I think 202106)?
>
> @BaluAlluru @adyeung @Praveen-Brcm

@haslersn: iccpd is not included in the build by default. To include it, modify the rules/config file and change INCLUDE_ICCPD = n to INCLUDE_ICCPD = y.
Alternatively, please consider using the systemctl commands to start iccpd, as jianjundong mentioned at the start of the conversation thread.
Thanks.
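
The rules/config change can also be scripted. This Python sketch rewrites an `INCLUDE_ICCPD = n` line to `INCLUDE_ICCPD = y` and leaves comments and other flags alone; `enable_iccpd` is a hypothetical helper, and the spacing and assignment operators it accepts are assumptions about the file's format.

```python
import re

# Hypothetical pattern for a line such as "INCLUDE_ICCPD = n"; the accepted
# spacing and the optional "?=" / ":=" operators are assumptions.
_ICCPD_OFF = re.compile(
    r"^([ \t]*INCLUDE_ICCPD[ \t]*[?:]?=[ \t]*)n[ \t]*$", re.MULTILINE
)


def enable_iccpd(config_text: str) -> str:
    """Return rules/config text with INCLUDE_ICCPD switched from n to y."""
    return _ICCPD_OFF.sub(r"\1y", config_text)
```

For example, `p = pathlib.Path("rules/config"); p.write_text(enable_iccpd(p.read_text()))` before running the build. Applying it twice is harmless, because a line that already reads `y` does not match.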

@haslersn

haslersn commented Nov 16, 2021

@Praveen-Brcm Thank you. Should the systemctl command be available in the default image (sonic-aboot-broadcom.swi)? I get the following output:

```
$ sudo systemctl unmask iccpd
Unit iccpd.service does not exist, proceeding anyway.
```

@jianjundong
Contributor

@haslersn You need to build the image with INCLUDE_ICCPD = y set in the rules/config file.

@ccy

ccy commented Sep 16, 2023

The proper way to activate iccpd.service is:

  • Build the SONiC image with INCLUDE_ICCPD = y
  • Enable iccpd at SONiC runtime with config feature state iccpd enabled
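
After `config feature state iccpd enabled` and `config save`, the setting should appear in the FEATURE table of /etc/sonic/config_db.json. The Python sketch below checks for it; the helper names are hypothetical, and the FEATURE → iccpd → state layout is an assumption about how the saved CONFIG_DB dump records feature state.

```python
import json
from typing import Any, Dict


def load_config_db(path: str = "/etc/sonic/config_db.json") -> Dict[str, Any]:
    """Read a saved CONFIG_DB dump (written by `config save`)."""
    with open(path) as f:
        return json.load(f)


def iccpd_feature_enabled(config_db: Dict[str, Any]) -> bool:
    """True if the FEATURE table marks iccpd as enabled (layout assumed)."""
    state = config_db.get("FEATURE", {}).get("iccpd", {}).get("state")
    return state == "enabled"
```

On a switch, `iccpd_feature_enabled(load_config_db())` would be expected to return False on an image built without INCLUDE_ICCPD = y, since no iccpd entry exists there.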

10 participants