Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WarmRestart class changes #600

Merged
merged 6 commits into from
Sep 25, 2018

Conversation

rodnymolina
Copy link
Contributor

The current WarmStart class assumes that restarting applications will invoke checkWarmStart() method during initialization. Currently, this method serves two main purposes: 1) a static-constructor and 2) checks if WarmRestart feature has been enabled for a given application.

In this PR i'm proposing to split these different logics in two separated methods. The main goal here is to allow applications such as fpmSyncd to call checkWarmStart whenever it please (i.e. fpmSyncd restart event or zebra<->fpmSyncd tcp re-establishment). There could be other applications in the future interested in this differentiated approach.

Other than that i'm making small adjustments in the code to enhance readibility.

Signed-off-by: Rodny Molina rmolina@linkedin.com

Copy link
Contributor

@jipanyang jipanyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In current bool WarmStart::checkWarmStart()
"Check warm start flag at the very begining of application, do it once for each process."
One of the key functions is to record the warm restart count, that is why it should be called once only for each application process.

Once the process is up, "WarmStart::isWarmStart()" could be used to check whether warm start has been enabled.

are you trying to address the scenario that fpmsyncd is cold started, then enable warm start to deal with state restore for zebra<->fpmSyncd tcp re-establishment? Probably that specific case should be handled separately.

warmrestart/warm_restart.cpp Show resolved Hide resolved
@rodnymolina
Copy link
Contributor Author

Jipan, by having checkWarmStart() being called only once, we are limiting the scenarios in which Warm-Restart logic can be utilized. In scenarios such as zebra<->fpmSyncd tcp re-establishment, we cannot rely on isWarmStart() method.

Btw, fpmSyncd may not be the only app interested in this behavior, as every other north-app has an fpmSyncd equivalent.

We can dicuss this offline.

Copy link
Contributor

@jipanyang jipanyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the original definition of restart_count needs to be changed to reflect the extra scope.

    restart_count   = 1*10DIGIT                               ; a number between 0 and 2147483647,
                                                              ; count of warm start times.

Other than fpmsyncd, I don't see other applications have the need to resync the state while being up.
Would it be simpler to just restart fpmsyncd in case of tcp connection lost? So we could focus on one scenario in stead of multiple different scenarios with the same goal of state restore/reconcile.

rodnymolina pushed a commit to rodnymolina/sonic-swss that referenced this pull request Aug 29, 2018
[ dependencies with PR/600: sonic-net#600 ]

Please refer to the corresponding design document for more details: sonic-net/SONiC#239

The following manual UT plan has been successfully executed. UT automation pending.

Physical topology formed by various BGP peers connecting to the DUT. Both non-ecmp and ecmp prefixes are learned/tested.

Testcase Suite 1: Making use of “pkill -9 bgpd && pkill -9 zebra” as trigger.
=============

IPv4 prefixes
==========

    * Restart zebra/bgpd:
            - Verify that all prefixes are properly stale-marked and that no
    change is pushed to AppDB during reconciliation.
            - Result: PASSED

    * Restart zebra/bgpd and add one new non-ecmp IPv4 prefix.
            - To produce a route-state-inconsistency, add prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that new-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and withdraw one non-ecmp IPv4 prefix.
            - To produce a route-state-inconsistency, remove prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that deleted-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and add one new path to an IPv4 ecmp-prefix.
            - To produce a route-state-inconsistency, add prefix-path in
    adjacent node before bgp sessions are re-established.
            - Verify that new prefix-path msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and withdraw one ecmp-path from an existing ecmp IPv4
      prefix.
              - To produce a route-state-inconsistency, remove prefix-path in
      adjacent node before bgp sessions are re-established.
              - Verify that deleted-prefix-path msg is NOT pushed down to AppDB
      till reconciliation takes place.
              - Eventually, during reconciliation, this path will be eliminated
      as it’s not being refreshed by remote-peer.
              - Result: PASSED

IPv6 prefixes
==========

    * Restart zebra/bgpd:
            - Verify that all prefixes are properly stale-marked and that no
    change is pushed to AppDB during reconciliation.
            - Result: PASSED

    * Restart zebra/bgpd and add one new non-ecmp IPv6 prefix.
            - To produce a route-state-inconsistency, add prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that new-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and withdraw one non-ecmp IPv6 prefix.
            - To produce a route-state-inconsistency, remove prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that deleted-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and add one new path to an IPv6 ecmp-prefix.
            - To produce a route-state-inconsistency, add prefix-path in
    adjacent node before bgp sessions are re-established.
            - Verify that new prefix-path msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and withdraw one ecmp-path from an existing ecmp IPv6
      prefix.
              - To produce a route-state-inconsistency, remove prefix-path in
      adjacent node before bgp sessions are re-established.
              - Verify that deleted-prefix-path msg is NOT pushed down to AppDB
      till reconciliation takes place.
              - Eventually, during reconciliation, this path will be eliminated
      as it’s not being refreshed by remote-peer.
              - Result: PASSED

Testcase Suite 2: Making use of sudo service bgp restart” as trigger.
=============

IPv4 prefixes
==========

    * Restart bgp service/docker:
            - Verify that all prefixes are properly stale-marked and that no
    change is pushed to AppDB during reconciliation.
            - Result: PASSED

    * Restart bgp service/docker and add one new non-ecmp IPv4 prefix.
            - To produce a route-state-inconsistency, add prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that new-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and withdraw one non-ecmp IPv4 prefix.
            - To produce a route-state-inconsistency, remove prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that deleted-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and add one new path to an IPv4 ecmp-prefix.
            - To produce a route-state-inconsistency, add prefix-path in
    adjacent node before bgp sessions are re-established.
            - Verify that new prefix-path msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and withdraw one ecmp-path from an existing
      ecmp IPv4 prefix.
              - To produce a route-state-inconsistency, remove prefix-path in
      adjacent node before bgp sessions are re-established.
              - Verify that deleted-prefix-path msg is NOT pushed down to AppDB
      till reconciliation takes place.
              - Eventually, during reconciliation, this path will be eliminated
      as it’s not being refreshed by remote-peer.
              - Result: PASSED

IPv6 prefixes
==========

    * Restart bgp service/docker:
            - Verify that all prefixes are properly stale-marked and that no
    change is pushed to AppDB during reconciliation.
            - Result: PASSED

    * Restart bgp service/docker and add one new non-ecmp IPv6 prefix.
            - To produce a route-state-inconsistency, add prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that new-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and withdraw one non-ecmp IPv6 prefix.
            - To produce a route-state-inconsistency, remove prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that deleted-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and add one new path to an IPv6 ecmp-prefix.
            - To produce a route-state-inconsistency, add prefix-path in
    adjacent node before bgp sessions are re-established.
            - Verify that new prefix-path msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and withdraw one ecmp-path from an existing
      ecmp IPv6 prefix.
              - To produce a route-state-inconsistency, remove prefix-path in
      adjacent node before bgp sessions are re-established.
              - Verify that deleted-prefix-path msg is NOT pushed down to AppDB
      till reconciliation takes place.
              - Eventually, during reconciliation, this path will be eliminated
      as it’s not being refreshed by remote-peer.
              - Result: PASSED

Signed-off-by: Rodny Molina <rmolina@linkedin.com>
@lguohan
Copy link
Contributor

lguohan commented Aug 30, 2018

test failure

@lguohan
Copy link
Contributor

lguohan commented Aug 30, 2018

what is the scenario for zebra <-> fpmsyncd re-establish? they are in the same docker, should be restarted as the same time.

@rodnymolina
Copy link
Contributor Author

rodnymolina commented Aug 31, 2018

@lguohan i'm adding certain level of flexibility in my routing warm-reboot code. This logic is dealing with all these scenarios without needing to add any extra complexity: 1) routing-stack crash, or 2) fpmsyncd crash, or 3) docker restart. If we have a process-manager within the docker container (supervisord or watchfrr) that detects the crash and re-instantiate the process, then there's no need to restart the entire bgp-service from the host side.

@rodnymolina rodnymolina force-pushed the warmstart_class_changes branch 3 times, most recently from c30f2f1 to f9be17a Compare September 6, 2018 01:41
Copy link
Contributor

@jipanyang jipanyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@rodnymolina rodnymolina force-pushed the warmstart_class_changes branch from 686484b to dc7b2a8 Compare September 11, 2018 21:15
@rodnymolina
Copy link
Contributor Author

The failing test-case (VXLAN) is not related to any of my changes.

@lguohan
Copy link
Contributor

lguohan commented Sep 16, 2018

retest this please

Rodny Molina added 5 commits September 24, 2018 11:51
The current WarmStart class assumes that restarting applications will invoke checkWarmStart() method during initialization. Currently, this method serves two main purposes: 1) a static-constructor and 2) checks if WarmRestart feature has been enabled for a given application.

In this PR i'm proposing to split these different logics in two separated methods. The main goal here is to allow applications such as fpmSyncd to call checkWarmStart whenever it please (i.e. fpmSyncd restart or zebra<->fpmSyncd tcp re-establishment). There could be other applications in the future interested in this differentiated approch.

Other than that i'm making small adjustments in the code to enhance readibility.

Signed-off-by: Rodny Molina <rmolina@linkedin.com>
Signed-off-by: Rodny Molina <rmolina@linkedin.com>
Signed-off-by: Rodny Molina <rmolina@linkedin.com>
Signed-off-by: Rodny Molina <rmolina@linkedin.com>
…tate.

Signed-off-by: Rodny Molina <rmolina@linkedin.com>
@rodnymolina rodnymolina force-pushed the warmstart_class_changes branch from dc7b2a8 to 088ecc8 Compare September 24, 2018 19:18
Signed-off-by: Rodny Molina <rmolina@linkedin.com>
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
…uration ports" (sonic-net#600)

Signed-off-by: Vasant Patil <vapatil@linkedin.com>
oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants