[FRR]: Update frr to latest 7.2.1-s3 #4294

Merged 4 commits on Apr 1, 2020

pavel-shirshov
Contributor

I suggest running extra tests to avoid the situation we had with #4145 and #4170: the tests passed, but #4145 was later reverted.

- What I did

  • Updated FRR to the latest 7.2.1 from master.
  • Updated patches accordingly

- How I did it
I fetched the changes from the FRR master repository.

- How to verify it
Build an image and run tests on the image.

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@daall
Contributor

daall commented Mar 20, 2020

Still seeing a zebra crash on the virtual switch (VS) with this change:

Mar 20 18:15:16.482422 951bd2e723da INFO #supervisord: zebra core_handler: showing active allocations in memory group libfrr
Mar 20 18:15:16.482593 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Buffer                        :      1 *         24
Mar 20 18:15:16.482605 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Host config                   :      2 * (variably sized)
Mar 20 18:15:16.482614 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Command Tokens                :   3861 *         72
Mar 20 18:15:16.482624 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Command Token Text            :   2825 * (variably sized)
Mar 20 18:15:16.482633 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Command Token Help            :   2825 * (variably sized)
Mar 20 18:15:16.482641 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Command Argument Name         :    888 * (variably sized)
Mar 20 18:15:16.482651 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  RCU thread                    :      2 *        128
Mar 20 18:15:16.482675 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  FRR POSIX Thread              :      4 * (variably sized)
Mar 20 18:15:16.482688 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  POSIX sync primitives         :      4 * (variably sized)
Mar 20 18:15:16.482699 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Graph                         :     26 *          8
Mar 20 18:15:16.482709 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Graph Node                    :   4519 *         32
Mar 20 18:15:16.482720 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Hash                          :    282 * (variably sized)
Mar 20 18:15:16.482730 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Hash Bucket                   :    826 *         32
Mar 20 18:15:16.482740 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Hash Index                    :    141 * (variably sized)
Mar 20 18:15:16.482751 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Hook entry                    :     21 *         48
Mar 20 18:15:16.482771 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Interface                     :     68 *        264
Mar 20 18:15:16.482789 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Connected                     :      2 *         48
Mar 20 18:15:16.482800 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Link List                     :    356 *         40
Mar 20 18:15:16.482811 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Link Node                     :    166 *         24
Mar 20 18:15:16.482821 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Logging                       :      1 *         72
Mar 20 18:15:16.482831 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Module loading name           :      1 *          4
Mar 20 18:15:16.482842 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Nexthop                       :      5 *        112
Mar 20 18:15:16.482852 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  NetNS Context                 :      2 * (variably sized)
Mar 20 18:15:16.482861 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  NetNS Name                    :      1 *         18
Mar 20 18:15:16.482881 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Northbound Node               :      5 *        392
Mar 20 18:15:16.482954 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Northbound Configuration      :      2 *         72
Mar 20 18:15:16.482965 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Prefix                        :      2 *         48
Mar 20 18:15:16.482974 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Privilege information         :      3 * (variably sized)
Mar 20 18:15:16.482984 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Stream                        :      4 * (variably sized)
Mar 20 18:15:16.482994 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Stream FIFO                   :      2 *         64
Mar 20 18:15:16.483005 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Route table                   :     79 *         56
Mar 20 18:15:16.483014 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Route node                    :    149 * (variably sized)
Mar 20 18:15:16.483030 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Thread                        :     20 *        168
Mar 20 18:15:16.483051 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Thread master                 :     11 * (variably sized)
Mar 20 18:15:16.483062 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Thread Poll Info              :      6 *    8388608
Mar 20 18:15:16.483073 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Thread stats                  :     20 *         72
Mar 20 18:15:16.483083 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Typed-hash bucket             :      7 * (variably sized)
Mar 20 18:15:16.483092 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Typed-heap array              :      1 * (variably sized)
Mar 20 18:15:16.483102 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Vector                        :   9095 *         16
Mar 20 18:15:16.483112 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Vector index                  :   9095 * (variably sized)
Mar 20 18:15:16.483121 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  VRF                           :      1 *        200
Mar 20 18:15:16.483137 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  VRF bit-map                   :      1 *          8
Mar 20 18:15:16.483157 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Work queue item               :      1 *         24
Mar 20 18:15:16.483226 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Work queue name string        :      1 *         22
Mar 20 18:15:16.483237 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  YANG module                   :      1 *         48
Mar 20 18:15:16.483246 951bd2e723da INFO #supervisord: zebra core_handler: showing active allocations in memory group Label Manager
Mar 20 18:15:16.483256 951bd2e723da INFO #supervisord: zebra core_handler: showing active allocations in memory group zebra
Mar 20 18:15:16.483267 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Zebra Interface Information   :     68 *        344
Mar 20 18:15:16.483277 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Route Entry                   :      5 *         88
Mar 20 18:15:16.483288 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  RIB destination               :      8 *         88
Mar 20 18:15:16.483339 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Zebra DPlane Provider         :      1 *        232
Mar 20 18:15:16.483351 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  Zebra Name Space              :      5 * (variably sized)
Mar 20 18:15:16.483360 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  RIB table info                :      4 *         16
Mar 20 18:15:16.483441 951bd2e723da INFO #supervisord: zebra core_handler: memstats:  ZEBRA VRF                     :      1 *       4616
Mar 20 18:15:16.483456 951bd2e723da INFO #supervisord: zebra core_handler: showing active allocations in memory group Table Manager
Mar 20 18:15:24.046562 951bd2e723da INFO #supervisord 2020-03-20 18:15:16,623 INFO exited: zebra (terminated by SIGABRT (core dumped); not expected)

This happens right after we try to delete a VRF in the test: https://github.com/Azure/sonic-swss/blob/29dc62c0840913992541bde83de1bed70b24a63e/tests/test_interface.py#L461

@prsunny there's an email thread about a similar issue to this, right?
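
The memstats lines in a dump like this share a fixed shape (memory type, allocation count, then either a per-item size or "(variably sized)"), so they can be tallied mechanically when comparing crashes. The sketch below is a hypothetical helper, not part of FRR or SONiC; both regular expressions are inferred only from the lines in this log, and the function names are made up for illustration.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Both patterns are inferred from the zebra core_handler lines in this log only.
GROUP_RE = re.compile(r"showing active allocations in memory group (?P<group>.+)$")
MEMSTATS_RE = re.compile(
    r"memstats:\s+(?P<name>.+?)\s*:\s*(?P<count>\d+)\s*\*\s*"
    r"(?P<size>\d+|\(variably sized\))"
)

# (memory group, memory type) -> (allocation count, item size or None)
Stats = Dict[Tuple[Optional[str], str], Tuple[int, Optional[int]]]


def parse_memstats(lines: Iterable[str]) -> Stats:
    """Map (memory group, memory type) to (count, item size or None if variably sized)."""
    stats: Stats = {}
    group: Optional[str] = None
    for raw in lines:
        line = raw.rstrip()
        g = GROUP_RE.search(line)
        if g:
            # Subsequent memstats lines belong to this memory group.
            group = g.group("group")
            continue
        m = MEMSTATS_RE.search(line)
        if not m:
            continue  # not a memstats line (e.g. the supervisord exit message)
        size = m.group("size")
        item_size = None if size.startswith("(") else int(size)
        stats[(group, m.group("name"))] = (int(m.group("count")), item_size)
    return stats


def fixed_size_bytes(stats: Stats) -> int:
    """Sum count * size over fixed-size entries; variably sized entries are skipped."""
    return sum(count * size for count, size in stats.values() if size is not None)
```

Keying by memory group as well as type avoids collisions if two groups ever report the same type name. Applied to the full dump above, the largest fixed-size entry by count times size would be Thread Poll Info (6 * 8388608 bytes); that is only arithmetic on the logged numbers, not a diagnosis of the crash.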

@prsunny
Contributor

prsunny commented Mar 20, 2020

Yes, unfortunately I see a similar crash on the 201911 image as well.

@pavel-shirshov
Contributor Author

Hi Danny and Prince,

Thank you for your review. Can you please share where I can find such crashes? How can I reproduce them?

Thanks

@pavel-shirshov
Contributor Author

Closing for now until I find a solution.

@daall
Contributor

daall commented Mar 20, 2020

Sure, Pavel! Can you try following these directions to run the VS tests locally: https://github.com/Azure/sonic-swss/tree/master/tests

I'm not sure which 201911 image Prince was using, but I downloaded the docker-sonic-vs.gz from the PR build here: https://sonic-jenkins.westus2.cloudapp.azure.com/job/vs/job/buildimage-vs-all-pr/2726/

Then I ran pytest -s -vv --keeptb test_interface.py. The --keeptb flag should keep the Docker container that was used to run the tests, so you can look at all the logs and core files on the device. Let me know if you need any help getting set up!
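
For scripting repeated runs, the command from this comment can be wrapped in a small helper. This is a hypothetical sketch, not part of the sonic-swss test suite: the function names are made up, and --keeptb is not a standard pytest flag; per the comment it is provided by the swss test setup and keeps the test container around. Environment setup (such as loading the docker-sonic-vs image) follows the linked README and is not covered here.

```python
import subprocess
from typing import List, Sequence


def vs_test_command(test_file: str = "test_interface.py",
                    keep_testbed: bool = True,
                    extra_args: Sequence[str] = ()) -> List[str]:
    """Build the pytest invocation described in the comment (hypothetical helper)."""
    cmd = ["pytest", "-s", "-vv"]  # -s: show test output, -vv: extra verbose
    if keep_testbed:
        cmd.append("--keeptb")  # swss-specific option: keep the test container
    cmd.extend(extra_args)
    cmd.append(test_file)
    return cmd


def run_vs_tests(tests_dir: str, **kwargs) -> int:
    """Run the tests from `tests_dir` (e.g. the tests/ directory of a sonic-swss checkout)."""
    return subprocess.run(vs_test_command(**kwargs), cwd=tests_dir, check=False).returncode
```

Returning the exit code rather than raising keeps the kept container available for inspection when a test fails.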

@pavel-shirshov
Contributor Author

Thank you, Danny, for your help. I'm checking that. Why we wouldn't run this test during our PR CI?

@daall
Contributor

daall commented Mar 23, 2020

Why we wouldn't run this test during our PR CI?

I brought this up with @lguohan last week and I think we're trying to implement a few more measures to stabilize these tests before adding them to buildimage. They're running in swss and utilities at the moment and we still occasionally hit issues that block PRs in those repos for extended periods of time. We're trying to avoid that for buildimage since the volume of PRs is quite a bit higher.

(Admittedly, buildimage PRs are one of those issues that we're hitting occasionally, so we're trying to get these tests incorporated into this repo sooner rather than later.)

@@ -1,6 +1,4 @@
0001-Add-support-of-bgp-tcp-DSCP-value.patch
0002-Reduce-severity-of-Vty-connected-from-message.patch
0003-Use-vrf_id-for-vrf-not-tabled_id.patch
Contributor

This patch is required. Otherwise fpmsyncd will break at https://github.com/Azure/sonic-swss/blob/master/fpmsyncd/routesync.cpp#L46

Contributor Author

@prsunny Yes, I understand. I need to remove it to build an image and run the tests.

Contributor

@prsunny prsunny left a comment

See comment.

@pavel-shirshov
Contributor Author

retest default please

@pavel-shirshov
Contributor Author

retest mellanox please

@pavel-shirshov
Contributor Author

retest default please

@pavel-shirshov
Contributor Author

retest default please

@pavel-shirshov
Contributor Author

retest broadcom please

@lguohan
Collaborator

lguohan commented Mar 29, 2020

Is this PR safe to merge now?

abdosi pushed a commit that referenced this pull request May 20, 2020
- Updated to latest frr 7.2.1 from the master.
- Updated patches accordingly
bbinxie added a commit to SONIC-DEV/sonic-buildimage that referenced this pull request May 22, 2020
* [201911][devices] skip_fancontrol for wedge 100 barefoot platforms (sonic-net#4528)

* [device] DellEMC s5232f  50G hwsku support (sonic-net#4525)

* [device] DellEmc S5232 support for new hwsku C8D48
8 100G ports and 48 50G ports

* 10G ports update for S5232 hwsku-C8D48

Signed-off-by: Srideep Devireddy <srideep_devireddy@dell.com>

* DellEMC S6000 updated sensors.conf (sonic-net#4568)

Change PSU MAX temperature to 80 degrees
Change tmp75 sensors default temperature value from 25/50 to 70/80 degrees.

* [sonic-slave-stretch]: install same version for docker-ce and docker-ce-cli

Different versions can cause compatibility issues between the server and client.

Signed-off-by: Guohan Lu <lguohan@gmail.com>

* [baseimage]: install same version for docker-ce and docker-ce-cli

Signed-off-by: Guohan Lu <lguohan@gmail.com>

* [FRR]: Update frr to latest 7.2.1-s3 (sonic-net#4294)

- Updated to latest frr 7.2.1 from the master.
- Updated patches accordingly

* [sonic-buildimage] updated minigraph for ACL Table data and ACL Interface Binding for Multi-NPU platforms (sonic-net#4491)

* [sonic-buildimage] updated minigraph for ACL Table data and ACL Interface
binding update for multi-npu platforms based on subrole "Frontend" or
"Backend". For a backend npu no ACL table is associated. For a frontend npu
only front-panel interfaces are associated.

Updated with test case and fix typo in sample minigraph for npu

Address Review comments

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fixed the logic as per review comment. Interface filter logic
only applies to Everflow/Mirror tables.

* Address Review Comments.

* Changes for LLDP docker to support multi-npu platforms (sonic-net#4530)

* Changes for LLDP for Multi NPU Platforms:-
a) Enable LLDP for Host namespace for Management Port
b) Make sure Management IP is available in per asic namespace
   needed for LLDP Chassis configuration
c) Make sure chassis mac-address is correct in per asic namespace
d) Do not run lldp on eth0 of per asic namespace and avoid chassis
   configuration for same
e) Use Linux hostname instead of the one from Device Metadata for lldp chassis
   configuration since in multi-npu platforms device metadata hostname
   will be different

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comment with following changes:
a) Use the Device Metadata hostname even in the per-namespace container.
   updated minigraph parsing for same to have hostname as system
   hostname and add new key for asic name

b) Minigraph changes to have MGMT_INTERFACE Key in per asic/namespace
   config also as needed for LLDP for setting chassis management IP.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments

* Moved utility functions for multi-npu platforms from sonic-utilities to sonic_device_util.py (sonic-net#4559)

* Moved utility functions for multi-npu platforms from
sonic-utilities config/main.py to here so that they can be used
by any module

* Fix the issue with test run during compilation with acl-uploader
PR#908 of sonic-utilities.

* Fix get_num_npu as it was returning a string and not an int

* Address Review Comments

* Address Review Comments

* Fix for issue where the image is compiled with flag ENABLE_DHCP_GRAPH_SERVICE (sonic-net#4573)

and then loaded and rebooted: even if there was an existing
config_db.json, we would look for the DHCP service. We should disable
update_graph in such cases. This behaviour is similar to what we have in
the 201811 image.

* Change to enable redistribute connected on Frontend asics instead of backend asics (sonic-net#4588)

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>

* [DellEMC] S6000 Disable Low power mode by default (sonic-net#4592)

* [BFN] Updated Barefoot SDK to 2020-05-07 (sonic-net#4566)

Signed-off-by: Andriy Kokhan <akokhan@barefootnetworks.com>

* [minigraph] Add tags for egress mirror tables (sonic-net#4526)

Signed-off-by: Danny Allen <daall@microsoft.com>

* [Submodule update] sonic-utilities with PRs
[201911][show] Fix abbreviations for 'show ip bgp ...' commands (sonic-net#909)
Changes to support acl-loader and mirror-session config commands for
multi-npu platforms. (sonic-net#908)
Changes to commands config reload/load-minigraph (sonic-net#919)
Stop/Start restapi server upon config reload (sonic-net#911)
[config] Add 'interface transceiver' subgroup with 'lpmode' and
'reset' subcommands (sonic-net#904)

* [minigraph] Support FECDisabled in minigraph parser (sonic-net#4556) (sonic-net#4624)

Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>

* [ntp] enable/disable NTP long jump according to reboot type (sonic-net#4577)

* [ntp] enable/disable NTP long jump according to reboot type

- Enable NTP long jump after cold reboot.
- Disable NTP long jump after warm/fast reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* fix typo

* further refactoring

* use sonic-db-cli instead

* [arista]: remove the soc property disabling sram scan (sonic-net#4623)

* Changes to support config-setup service for multi-npu (sonic-net#4609)

* Changes to support config-setup service for multi-npu
platforms. For multi-npu, config initialization and ZTP are
not supported as of now. It supports creating
config db from minigraph or using the config db from the previous
file system

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments.

* Address Review comments

* Address Review Comments of using python-based config load_minigraph/
config save/config reload from shell scripts so that we don't duplicate
code. Also, while running from shell, we will skip the stop/start of services
done by those commands.

* Updated to use python command so no code duplication.

* [config]: Fix the device type and internal bgp session status for multi NPU platforms (sonic-net#4600)

* The following changes for multi-npu platforms are done
- Set the type in device_metadata for asic configuration to be same as host
- Set the admin-status of internal bgp sessions as up
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>

* Adding new BGP peer groups PEER_V4_INT and PEER_V6_INT.  (sonic-net#4620)

* Adding new BGP peer groups PEER_V4_INT and PEER_V6_INT. The internal BGP sessions
will be added to these peer groups, while the external BGP sessions will be added
to the existing PEER_V4 and PEER_V6 peer groups.

* Check for "ASIC" keyword in the hostname to identify the internal neighbors.

* [submodule update] sonic-swss with PR
 [vnet] Fix IP2ME route creation logic for BITMAP VNET interface (sonic-net#1284)

* [submodule update] sonic-util
 Revert "[config] Add 'interface transceiver' subgroup with 'lpmode' and
 'reset' subcommands (sonic-net#904)"
  Multi-asic changes for config bgp commands and utilities. (sonic-net#910)

* [submodule update] sonic-rest API's
PR#39  Setup module versioning
Add support for get all Vlans (#37)

* Update golang version from 1.11.5 to 1.14.2 (sonic-net#4520)

Co-authored-by: Myron Sosyak <49795530+msosyak@users.noreply.github.com>
Co-authored-by: Srideep <srideep_devireddy@dell.com>
Co-authored-by: paavaanan <paavaanan_t_n@dell.com>
Co-authored-by: Guohan Lu <lguohan@gmail.com>
Co-authored-by: pavel-shirshov <pavelsh@microsoft.com>
Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
Co-authored-by: arlakshm <55814491+arlakshm@users.noreply.github.com>
Co-authored-by: Santhosh Kumar T <53558409+santhosh-kt@users.noreply.github.com>
Co-authored-by: Andriy Kokhan <43479230+akokhan@users.noreply.github.com>
Co-authored-by: Danny Allen <daall@microsoft.com>
Co-authored-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
Co-authored-by: Ying Xie <yxieca@users.noreply.github.com>
Co-authored-by: Samuel Angebault <staphylo@arista.com>
Co-authored-by: judyjoseph <53951155+judyjoseph@users.noreply.github.com>
@tylerlinp
Contributor

@pavel-shirshov @lguohan @prsunny In the 201911 branch, #4294 was applied on top of #4145, so patch 0005 is missing, and even with frr-7.2.1-s3 the VRF deletion problem is not completely fixed. In master, #4294 is based on #4170 (which reverts #4145), so it is OK there. I think frr-7.2.1-s1 and frr-7.2.1-s3 are both fine with patch 0005, but frr-7.2.1-s2 introduced another bug.
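
One way to catch a gap like the missing patch 0005 before a release is to diff the release branch's patch series file against master's. The sketch below is a hypothetical helper, not existing SONiC tooling. It assumes the series file lists one patch file name per line, as in the series excerpt in the review comment above; skipping '#' lines as comments is an additional assumption.

```python
from typing import List


def parse_series(text: str) -> List[str]:
    """Return patch file names in order, skipping blank lines and '#' comments."""
    names = []
    for line in text.splitlines():
        name = line.strip()
        if name and not name.startswith("#"):
            names.append(name)
    return names


def missing_patches(reference: str, candidate: str) -> List[str]:
    """Patches listed in `reference` (e.g. master's series) but absent from `candidate`."""
    present = set(parse_series(candidate))
    return [p for p in parse_series(reference) if p not in present]
```

Passing master's series text as `reference` and 201911's as `candidate` would list what 201911 lacks. The comparison uses exact file names, so a patch that was renamed or renumbered between branches shows up as missing and still needs a manual look.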

@lguohan
Collaborator

lguohan commented May 23, 2020

@tylerlinp, what is your suggestion? So we need patch 0005 in the 201911 release? Should I cherry-pick commit 7c65f8c into the 201911 release?

@pavel-shirshov , what is your opinion?

@tylerlinp
Contributor

@lguohan Yes, we need patch 0005. Cherry-picking 7c65f8c into the 201911 release is probably the best way.

@pavel-shirshov
Contributor Author

@lguohan
As we discussed before, it's better to have the same FRR in 201911 as we now have in master.

@lguohan
Collaborator

lguohan commented May 27, 2020

OK, I'm marking 7c65f8c for inclusion in the 201911 release.

6 participants