Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loopback Interface changes for multi ASIC devices #4825

Merged
merged 4 commits into from
Jul 2, 2020

Conversation

arlakshm
Copy link
Contributor

@arlakshm arlakshm commented Jun 22, 2020

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com

- Why I did it
multi ASIC will have 2 Loopback Interfaces

  • Loopback0 has globally unique IP address, which is advertised by the multi ASIC device to its peers.
    This way all the external devices will see this device as a single device.

  • Loopback4096 is assigned an IP address which has a scope is within the device. Each ASIC has a different ip address for Loopback4096. This ip address will be used as Router-Id by the bgp instance on multi ASIC devices.

This PR implements this change for multi ASIC devices

- How I did it
The following changes are done

  • Update the minigraph parser to update the ASIC configuration with the host/device Loopback address
  • Update the bgp template to use Loopback1 as router-id and advertise the Loopback4096 address only to internal neighbors if its multi_npu device.
  • Add the route-map HIDE_INTERNAL when advertising the Loopback4096 ip address
  • Update the sonic-cfggen to pass the is_multi_npu function as enviornment global while rendering this templates.
  • Update the sonic_device_util.py to get_num_npus both on the host and inside the containers.
  • Added Unit tests for generating Loopback interface configurations for multi ASIC devices

- How to verify it

  • Verify if the Loopback interface and bgp configuration are generated properly.
  • Verify the internal bgp and external bgp sessions are established.

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
@@ -8,13 +8,21 @@
! TSA configuration
!
ip prefix-list PL_LoopbackV4 permit {{ get_ipv4_loopback_address(LOOPBACK_INTERFACE, "Loopback0") | ip }}/32
{% if multi_npu() == true %}
ip prefix-list PL_LoopbackV4 permit {{ get_ipv4_loopback_address(LOOPBACK_INTERFACE, "Loopback1") | ip }}/32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are currently discussions on adding multiple loopback interfaces to Sonic and if there is a regular "Loopback1", say got added from previous config_db etc, it could result in some conflicts. My suggestion is, can we have it renamed to something specific?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prsunny , changed from Loopback1 to Loopback4096 for ASIC's loopback interface

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
@@ -859,6 +870,7 @@ def parse_xml(filename, platform=None, port_config_file=None, asic_name=None):
else:
if child.tag == str(QName(ns, "DpgDec")):
(intfs, lo_intfs, mvrf, mgmt_intf, vlans, vlan_members, pcs, pc_members, acls, vni) = parse_dpg(child, asic_name)
host_lo_intfs = parse_host_loopback(child, hostname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get this change. As far as minigraph is concerned, it is yet another loopback interface, correct? Why do we need to have a special parsing for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get this change. As far as minigraph is concerned, it is yet another loopback interface, correct? Why do we need to have a special parsing for this?

In the multi ASIC minigraph, each ASIC is modeled as different device. The Loopback0 interface is in the DeviceDataPlaneInfo of the device or host, while the Loopback4096 is in the DeviceDataPlaneInfo of the ASIC. Here we are trying to get Loopback0 interface when generating the configuration of the ASIC.

{% if multi_npu() == true %}
{% if get_ipv6_loopback_address(LOOPBACK_INTERFACE, "Loopback4096") != 'None' %}
address-family ipv6
network {{ get_ipv6_loopback_address(LOOPBACK_INTERFACE, "Loopback4096") | ip }}/64 route-map HIDE_INTERNAL
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pavel-shirshov , do we need to add this in the bgpcfgd unit test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lguohan yes. when it is ready.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we advertising this loopback 4096 to the outside world?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lguohan, the route-map HIDE_INTERNAL will prevent advertising Loopback4096 to the external devices.

prsunny
prsunny previously approved these changes Jun 30, 2020
Copy link
Contributor

@prsunny prsunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, please check other comments.

Copy link
Contributor

@pavel-shirshov pavel-shirshov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please answer my questions?

@@ -8,13 +8,21 @@
! TSA configuration
!
ip prefix-list PL_LoopbackV4 permit {{ get_ipv4_loopback_address(LOOPBACK_INTERFACE, "Loopback0") | ip }}/32
{% if multi_npu() == true %}
ip prefix-list PL_LoopbackV4 permit {{ get_ipv4_loopback_address(LOOPBACK_INTERFACE, "Loopback4096") | ip }}/32
{% endif %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pavel-shirshov, I think this might be needed to support TSA for the multi-asic platforms.
Let me know if my understanding is wrong. I will remove this change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arlakshm I think you're not going to announce your internal loopbacks at all, right? This prefix-list is used to announce loopback only from the device in TSA mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in the latest commit

{% if get_ipv6_loopback_address(LOOPBACK_INTERFACE, "Loopback4096") != 'None' %}
ipv6 prefix-list PL_LoopbackV6 permit {{ get_ipv6_loopback_address(LOOPBACK_INTERFACE, "Loopback4096") | replace('/128', '/64') | ip_network }}/64
{% endif %}
{% endif %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pavel-shirshov, I think this might be needed to support TSA for the multi-asic platforms.
Let me know if my understanding is wrong. I will remove this change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in the latest commit

!
!
{% if DEVICE_METADATA['localhost']['sub_role'] == 'FrontEnd' %}
{% if multi_npu() == true %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is possible to write {% if multi_npu() %}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in the latest commit

{% if multi_npu() == true %}
{% if get_ipv6_loopback_address(LOOPBACK_INTERFACE, "Loopback4096") != 'None' %}
address-family ipv6
network {{ get_ipv6_loopback_address(LOOPBACK_INTERFACE, "Loopback4096") | ip }}/64 route-map HIDE_INTERNAL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lguohan yes. when it is ready.

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
pavel-shirshov
pavel-shirshov previously approved these changes Jul 1, 2020
Copy link
Contributor

@pavel-shirshov pavel-shirshov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arlakshm
Copy link
Contributor Author

arlakshm commented Jul 1, 2020

retest this please

@lguohan
Copy link
Collaborator

lguohan commented Jul 2, 2020

retest vsimage please

@arlakshm arlakshm merged commit cae65a4 into sonic-net:master Jul 2, 2020
@arlakshm arlakshm deleted the mnpu_loopback_address_chgs branch July 2, 2020 18:37
lguohan added a commit that referenced this pull request Jul 3, 2020
arlakshm added a commit to arlakshm/sonic-buildimage that referenced this pull request Jul 4, 2020
arlakshm added a commit that referenced this pull request Jul 6, 2020
Resubmitting the changes for (#4825) with fixes for sonic-bgpcdgd test failures
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
abdosi pushed a commit that referenced this pull request Jul 8, 2020
Resubmitting the changes for (#4825) with fixes for sonic-bgpcdgd test failures
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
bbinxie added a commit to SONIC-DEV/sonic-buildimage that referenced this pull request Jul 12, 2020
* [brcmsai]: Updated BRCM SAI Debina package to 3.7.5.1-2 (sonic-net#4916)

Fix for Copp Rules not having Policer Rate-Limit applied.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* [nephos]: upgrade Nephos SAI version to c749df (sonic-net#4814)

Verified with Nephos nps8365 based platform Accton AS7116-54x.

* "[config]: Multi ASIC loopback changes (sonic-net#4895)

Resubmitting the changes for (sonic-net#4825) with fixes for sonic-bgpcdgd test failures
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>

* [mellanox]: Support warm reboot on MSN4700 (sonic-net#4910)

* [Submodule Update] Sonic-platform-common
[sfp_base] Update return value documentation of channel-specific methods
(#98)
[SfpBase] Fix key name typo in docstring (#99)
[sfp] Tweak key names of some transceiver info fields (#97)
[sfputil] Make SfpUtilHelper.get_physical_to_logical noexcept as
in SfpUtilBase (#96)

* [Submodule update] sonic-platform-daemons
[xcvrd] Update key names in 'get_media_settings_value()' (#63)
[xcvrd] Tweak some transceiver info key names (#62)

* [Submodule update] sonic-utilities
[sfpshow][mock_state_db] Tweak key names of some transceiver info fields
(sonic-net#958)
[config] Fix syntax error (sonic-net#966)
[config] Fix indentation level in _get_disabled_services_list() (sonic-net#965)

* [Submodule Update] sonic-swss
[aclorch] Use IPv6 Next Header internally for protocol number on MLNX
platform (sonic-net#1343)

* [Submodule Update]
 Add support for attribute capability query in lua script (sonic-net#362)

* Cherry-pick was not clean. Fixing it.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* [telemetry] Call sonic-cfggen Once (sonic-net#4901)

sonic-cfggen call is slow and this is taking place in the SONiC
boot up process. The change uses templates to assemble all required
vars into single template file. With this change, telemetry now calls
once into sonic-cfggen.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

* [mgmt docker] move pycryptodome installation to the end of the docker building (sonic-net#4917)

* [mgmt docker] move pycryptodome installation to the end of the docker building

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* pin down the version to current: 3.9.8

* comment

* Add support for bcmsh and bcmcmd utlitites in multi ASIC devices (sonic-net#4926)

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
This PR has changes to support accessing the bcmsh and bcmcmd utilities on multi ASIC devices
Changes done
- move the link of /var/run/sswsyncd from docker-syncd-brcm.mk to docker_image_ctl.j2
- update the bcmsh and bcmcmd scripts to take -n [ASIC_ID] as an argument on multi ASIC platforms

* [caclmgrd] Improve code reuse (sonic-net#4931)

Improve code reuse in `generate_block_ip2me_traffic_iptables_commands()` function.

* [Submodule Update] sonic-utilities
Intf table migration for APP_DB entries during warmboot (sonic-net#980)
[Multi NPU] Time Improvements to the config reload/load_minigraph
commands  (sonic-net#917)

* [Submodule Update] sonic-py-swssdk
 [MultiDB]: use python class composition to avoid confusion in base
class (#74)

* [Submodule update] sonic-snmpagent. Movent to 201911 Branch with with
following PR's :
Implement cbgpPeer2State in CiscoBgp4MIB (#119)
Fix index nodes in LLDP tables whose access right is not-accessible.
(#112)
 Fix quagga/FRR parser on IPv6 BGP sessions (#122)
 [lint] Fix some syntax errors or warnings (#127)
  Update README.md: Add lgtm badges (#128)
  [Multi-asic]: Support multi-asic platform (#126)
  Simplify test code (#132)
  [Multi-asic]: Namespace support for LLDP and Sensor tables (#131)
  Fix undefined variable and warning message (#134)
  Fix SNMP AgentX socket connection timeout when using
  Namespace.get_all() (#140)
  [Namespace] Fix interfaces counters in InterfacesMIB RFC 2863 (#141)
   Fix LGTM reported alert of PR#141 (#142)

* [bgpcfgd] - Fix a key error during delete (sonic-net#4946)

Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
Co-authored-by: gracelicd <39251567+gracelicd@users.noreply.github.com>
Co-authored-by: arlakshm <55814491+arlakshm@users.noreply.github.com>
Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Co-authored-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Co-authored-by: Ying Xie <yxieca@users.noreply.github.com>
Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>
Co-authored-by: Prince Sunny <prince.sunny@microsoft.com>
qiluo-msft pushed a commit to qiluo-msft/sonic-buildimage that referenced this pull request Jul 12, 2020
…t#4825)

* Loopback IP changes for multi ASIC devices
multi ASIC will have 2 Loopback Interfaces

- Loopback0 has globally unique IP address, which is advertised by the multi ASIC device to its peers.
This way all the external devices will see this device as a single device.
- Loopback4096 is assigned an IP address which has a scope is within the device. Each ASIC has a different ip address for Loopback4096. This ip address will be used as Router-Id by the bgp instance on multi ASIC devices.

This PR implements this change for multi ASIC devices

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
qiluo-msft pushed a commit to qiluo-msft/sonic-buildimage that referenced this pull request Jul 12, 2020
qiluo-msft pushed a commit to qiluo-msft/sonic-buildimage that referenced this pull request Jul 12, 2020
Resubmitting the changes for (sonic-net#4825) with fixes for sonic-bgpcdgd test failures
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
bbinxie added a commit to SONIC-DEV/sonic-buildimage that referenced this pull request Jul 22, 2020
* [brcmsai]: Updated BRCM SAI Debina package to 3.7.5.1-2 (sonic-net#4916)

Fix for Copp Rules not having Policer Rate-Limit applied.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* [nephos]: upgrade Nephos SAI version to c749df (sonic-net#4814)

Verified with Nephos nps8365 based platform Accton AS7116-54x.

* "[config]: Multi ASIC loopback changes (sonic-net#4895)

Resubmitting the changes for (sonic-net#4825) with fixes for sonic-bgpcdgd test failures
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>

* [mellanox]: Support warm reboot on MSN4700 (sonic-net#4910)

* [Submodule Update] Sonic-platform-common
[sfp_base] Update return value documentation of channel-specific methods
(#98)
[SfpBase] Fix key name typo in docstring (#99)
[sfp] Tweak key names of some transceiver info fields (#97)
[sfputil] Make SfpUtilHelper.get_physical_to_logical noexcept as
in SfpUtilBase (#96)

* [Submodule update] sonic-platform-daemons
[xcvrd] Update key names in 'get_media_settings_value()' (#63)
[xcvrd] Tweak some transceiver info key names (#62)

* [Submodule update] sonic-utilities
[sfpshow][mock_state_db] Tweak key names of some transceiver info fields
(sonic-net#958)
[config] Fix syntax error (sonic-net#966)
[config] Fix indentation level in _get_disabled_services_list() (sonic-net#965)

* [Submodule Update] sonic-swss
[aclorch] Use IPv6 Next Header internally for protocol number on MLNX
platform (sonic-net#1343)

* [Submodule Update]
 Add support for attribute capability query in lua script (sonic-net#362)

* Cherry-pick was not clean. Fixing it.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* [telemetry] Call sonic-cfggen Once (sonic-net#4901)

sonic-cfggen call is slow and this is taking place in the SONiC
boot up process. The change uses templates to assemble all required
vars into single template file. With this change, telemetry now calls
once into sonic-cfggen.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

* [mgmt docker] move pycryptodome installation to the end of the docker building (sonic-net#4917)

* [mgmt docker] move pycryptodome installation to the end of the docker building

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* pin down the version to current: 3.9.8

* comment

* Add support for bcmsh and bcmcmd utlitites in multi ASIC devices (sonic-net#4926)

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
This PR has changes to support accessing the bcmsh and bcmcmd utilities on multi ASIC devices
Changes done
- move the link of /var/run/sswsyncd from docker-syncd-brcm.mk to docker_image_ctl.j2
- update the bcmsh and bcmcmd scripts to take -n [ASIC_ID] as an argument on multi ASIC platforms

* [caclmgrd] Improve code reuse (sonic-net#4931)

Improve code reuse in `generate_block_ip2me_traffic_iptables_commands()` function.

* [Submodule Update] sonic-utilities
Intf table migration for APP_DB entries during warmboot (sonic-net#980)
[Multi NPU] Time Improvements to the config reload/load_minigraph
commands  (sonic-net#917)

* [Submodule Update] sonic-py-swssdk
 [MultiDB]: use python class composition to avoid confusion in base
class (#74)

* [Submodule update] sonic-snmpagent. Movent to 201911 Branch with with
following PR's :
Implement cbgpPeer2State in CiscoBgp4MIB (#119)
Fix index nodes in LLDP tables whose access right is not-accessible.
(#112)
 Fix quagga/FRR parser on IPv6 BGP sessions (#122)
 [lint] Fix some syntax errors or warnings (#127)
  Update README.md: Add lgtm badges (#128)
  [Multi-asic]: Support multi-asic platform (#126)
  Simplify test code (#132)
  [Multi-asic]: Namespace support for LLDP and Sensor tables (#131)
  Fix undefined variable and warning message (#134)
  Fix SNMP AgentX socket connection timeout when using
  Namespace.get_all() (#140)
  [Namespace] Fix interfaces counters in InterfacesMIB RFC 2863 (#141)
   Fix LGTM reported alert of PR#141 (#142)

* [bgpcfgd] - Fix a key error during delete (sonic-net#4946)

* [Submodule Update] sonic-utilities
 Fix the None Type Exception when Interface Table does not exist (cold
boot) as part of db migration (sonic-net#986)

* Fix the below frr start.sh jija2 exception in 201911 image syslog: (sonic-net#4958)

File "/usr/local/bin/sonic-cfggen", line 380, in <module>
     main()
   File "/usr/local/bin/sonic-cfggen", line 354, in main
     print(template.render(data))
   File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 1090, in render
     self.environment.handle_exception()
   File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 832, in handle_exception
     reraise(*rewrite_traceback_stack(source=source))
   File "<template>", line 1, in top-level template code
   File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 471, in getattr
     return getattr(obj, attribute)
 jinja2.exceptions.UndefinedError: 'WARM_RESTART' is undefined

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* [Submodule update] sonic-snmpagent
 [201911] Fix interface counters in RFC1213 (#144)

* [docker-ptf] Add support for spytest to ptf container (sonic-net#4410)

- Install apt and pip dependencies
- Define traffic generator service

Signed-off-by: Danny Allen <daall@microsoft.com>

* [arista] update Arista drivers submodules (sonic-net#4967)

Merge most of the changes that recently made it to master.
This will be the last such merge operation and future commits will only cherry-pick fixes and targeted features.

Major fixes and features,
- reboot cause enhancement with more hardware reboot cause reporting
- fix reboot cause parsing issue with 201811 release
- fix get_change_event logic
- fix error message on missing sysfs entry by our plugins
- final piece of the platform refactors for fan and sensor reporting through the platform API

* [201911][devices] Update SFP keys to align with new standard (sonic-net#4976)

Align SFP key names with new standard defined in sonic-net/sonic-platform-common#97

- hardwarerev -> hardware_rev
- serialnum -> serial
- manufacturename -> manufacturer
- modelname -> model
- Connector -> connector

* [201911][sonic-telemetry] Update submodule (sonic-net#4987)

Point submodule to new 201911 branch of sonic-telemetry and update pointer to the current HEAD of the 201911 branch

* src/sonic-telemetry aaa9188...01b5365 (1):
  > [testdata] Update SFP keys to align with new standard (#39)

* [201911][sudoers] Add `sonic_installer list` to read-only commands (sonic-net#4997)

`sonic_installer list` is a read-only command. Specify it as such in the sudoers file.

This will also ensure the new `show boot` command, which calls `sudo sonic_installer list` under the hood doesn't fail due to permissions.

* [caclmgrd] Filter DHCP packets based on dest port only (sonic-net#4995)

Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
Co-authored-by: gracelicd <39251567+gracelicd@users.noreply.github.com>
Co-authored-by: arlakshm <55814491+arlakshm@users.noreply.github.com>
Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Co-authored-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
Co-authored-by: Ying Xie <yxieca@users.noreply.github.com>
Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>
Co-authored-by: Prince Sunny <prince.sunny@microsoft.com>
Co-authored-by: Danny Allen <daall@microsoft.com>
Co-authored-by: Samuel Angebault <staphylo@arista.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants