Skip to content

Commit

Permalink
Routing-stack warm-reboot feature. (sonic-net#602)
Browse files Browse the repository at this point in the history
* Routing-stack warm-reboot feature.

Please refer to the corresponding design document for more details: sonic-net/SONiC#239

The following manual UT plan has been successfully executed. UT automation pending.

Physical topology formed by various BGP peers connecting to the DUT. Both non-ecmp and ecmp prefixes are learned/tested.

Testcase Suite 1: Making use of “pkill -9 bgpd && pkill -9 zebra” as trigger.
=============

IPv4 prefixes
==========

    * Restart zebra/bgpd:
            - Verify that all prefixes are properly stale-marked and that no
    change is pushed to AppDB during reconciliation.
            - Result: PASSED

    * Restart zebra/bgpd and add one new non-ecmp IPv4 prefix.
            - To produce a route-state-inconsistency, add prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that new-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and withdraw one non-ecmp IPv4 prefix.
            - To produce a route-state-inconsistency, remove prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that deleted-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and add one new path to an IPv4 ecmp-prefix.
            - To produce a route-state-inconsistency, add prefix-path in
    adjacent node before bgp sessions are re-established.
            - Verify that new prefix-path msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and withdraw one ecmp-path from an existing ecmp IPv4
      prefix.
              - To produce a route-state-inconsistency, remove prefix-path in
      adjacent node before bgp sessions are re-established.
              - Verify that deleted-prefix-path msg is NOT pushed down to AppDB
      till reconciliation takes place.
              - Eventually, during reconciliation, this path will be eliminated
      as it’s not being refreshed by remote-peer.
              - Result: PASSED

IPv6 prefixes
==========

    * Restart zebra/bgpd:
            - Verify that all prefixes are properly stale-marked and that no
    change is pushed to AppDB during reconciliation.
            - Result: PASSED

    * Restart zebra/bgpd and add one new non-ecmp IPv6 prefix.
            - To produce a route-state-inconsistency, add prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that new-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and withdraw one non-ecmp IPv6 prefix.
            - To produce a route-state-inconsistency, remove prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that deleted-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and add one new path to an IPv6 ecmp-prefix.
            - To produce a route-state-inconsistency, add prefix-path in
    adjacent node before bgp sessions are re-established.
            - Verify that new prefix-path msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart zebra/bgpd and withdraw one ecmp-path from an existing ecmp IPv6
      prefix.
              - To produce a route-state-inconsistency, remove prefix-path in
      adjacent node before bgp sessions are re-established.
              - Verify that deleted-prefix-path msg is NOT pushed down to AppDB
      till reconciliation takes place.
              - Eventually, during reconciliation, this path will be eliminated
      as it’s not being refreshed by remote-peer.
              - Result: PASSED

Testcase Suite 2: Making use of sudo service bgp restart” as trigger.
=============

IPv4 prefixes
==========

    * Restart bgp service/docker:
            - Verify that all prefixes are properly stale-marked and that no
    change is pushed to AppDB during reconciliation.
            - Result: PASSED

    * Restart bgp service/docker and add one new non-ecmp IPv4 prefix.
            - To produce a route-state-inconsistency, add prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that new-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and withdraw one non-ecmp IPv4 prefix.
            - To produce a route-state-inconsistency, remove prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that deleted-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and add one new path to an IPv4 ecmp-prefix.
            - To produce a route-state-inconsistency, add prefix-path in
    adjacent node before bgp sessions are re-established.
            - Verify that new prefix-path msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and withdraw one ecmp-path from an existing
      ecmp IPv4 prefix.
              - To produce a route-state-inconsistency, remove prefix-path in
      adjacent node before bgp sessions are re-established.
              - Verify that deleted-prefix-path msg is NOT pushed down to AppDB
      till reconciliation takes place.
              - Eventually, during reconciliation, this path will be eliminated
      as it’s not being refreshed by remote-peer.
              - Result: PASSED

IPv6 prefixes
==========

    * Restart bgp service/docker:
            - Verify that all prefixes are properly stale-marked and that no
    change is pushed to AppDB during reconciliation.
            - Result: PASSED

    * Restart bgp service/docker and add one new non-ecmp IPv6 prefix.
            - To produce a route-state-inconsistency, add prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that new-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and withdraw one non-ecmp IPv6 prefix.
            - To produce a route-state-inconsistency, remove prefix in adjacent
    node before bgp sessions are re-established.
            - Verify that deleted-prefix msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and add one new path to an IPv6 ecmp-prefix.
            - To produce a route-state-inconsistency, add prefix-path in
    adjacent node before bgp sessions are re-established.
            - Verify that new prefix-path msg is NOT pushed down to AppDB till
    reconciliation takes place.
            - Result: PASSED

    * Restart bgp service/docker and withdraw one ecmp-path from an existing
      ecmp IPv6 prefix.
              - To produce a route-state-inconsistency, remove prefix-path in
      adjacent node before bgp sessions are re-established.
              - Verify that deleted-prefix-path msg is NOT pushed down to AppDB
      till reconciliation takes place.
              - Eventually, during reconciliation, this path will be eliminated
      as it’s not being refreshed by remote-peer.
              - Result: PASSED

RB=
G=lnos-reviewers
R=pchaudhary,pmao,rmolina,samaity,sfardeen,zxu
A=

Signed-off-by: Rodny Molina <rmolina@linkedin.com>

* Renaming 'restoration' code and making minor adjustments to fpmsyncd

Signed-off-by: Rodny Molina <rmolina@linkedin.com>

* Eliminating 'state' associated to field-value-tuples

Code has been refactored to reduce the complexity associated to carrying 'state' for every received FV-tuple. Obviously, there's no free-lunch here, this is only possible at the expense of more memory utilization: we are now using two different buffers to store 'old' and 'new' state. Yet, this is a relatively-small price to pay for a much simpler implementation.

Signed-off-by: Rodny Molina <rmolina@linkedin.com>

* Adding UTs to cover routing-warm-reboot logic.

Signed-off-by: Rodny Molina <rmolina@linkedin.com>

* Fixing an issue with warm-reboot UTs that prevented an existing test-case from passing

As part of these changes i'm also modifying the ip-addresses of my UT setup to
avoid clashes with existing/previous testcases.

Signed-off-by: Rodny Molina <rmolina@linkedin.com>

* Making some small adjustments

* Addressing review comments for my UT code

* Addressing more review comments.

* Refactoring UTs to rely on pubsub notifications instead of logs
  • Loading branch information
rodnymolina authored and lguohan committed Nov 10, 2018
1 parent 9fbcb60 commit f380685
Show file tree
Hide file tree
Showing 8 changed files with 1,290 additions and 24 deletions.
5 changes: 2 additions & 3 deletions fpmsyncd/Makefile.am
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
INCLUDES = -I $(top_srcdir) -I $(FPM_PATH)
INCLUDES = -I $(top_srcdir) -I $(top_srcdir)/warmrestart -I $(FPM_PATH)

bin_PROGRAMS = fpmsyncd

Expand All @@ -8,9 +8,8 @@ else
DBGFLAGS = -g
endif

fpmsyncd_SOURCES = fpmsyncd.cpp fpmlink.cpp routesync.cpp
fpmsyncd_SOURCES = fpmsyncd.cpp fpmlink.cpp routesync.cpp $(top_srcdir)/warmrestart/warmRestartHelper.cpp $(top_srcdir)/warmrestart/warmRestartHelper.h

fpmsyncd_CFLAGS = $(DBGFLAGS) $(AM_CFLAGS) $(CFLAGS_COMMON)
fpmsyncd_CPPFLAGS = $(DBGFLAGS) $(AM_CFLAGS) $(CFLAGS_COMMON)
fpmsyncd_LDADD = -lnl-3 -lnl-route-3 -lswsscommon

71 changes: 66 additions & 5 deletions fpmsyncd/fpmsyncd.cpp
Original file line number Diff line number Diff line change
@@ -1,13 +1,24 @@
#include <iostream>
#include "logger.h"
#include "select.h"
#include "selectabletimer.h"
#include "netdispatcher.h"
#include "warmRestartHelper.h"
#include "fpmsyncd/fpmlink.h"
#include "fpmsyncd/routesync.h"


using namespace std;
using namespace swss;


/*
* Default warm-restart timer interval for routing-stack app. To be used only if
* no explicit value has been defined in configuration.
*/
const uint32_t DEFAULT_ROUTING_RESTART_INTERVAL = 120;


int main(int argc, char **argv)
{
swss::Logger::linkToDbNative("fpmsyncd");
Expand All @@ -18,25 +29,75 @@ int main(int argc, char **argv)
NetDispatcher::getInstance().registerMessageHandler(RTM_NEWROUTE, &sync);
NetDispatcher::getInstance().registerMessageHandler(RTM_DELROUTE, &sync);

while (1)
while (true)
{
try
{
FpmLink fpm;
Select s;
SelectableTimer warmStartTimer(timespec{0, 0});

cout << "Waiting for connection..." << endl;
/*
* Pipeline should be flushed right away to deal with state pending
* from previous try/catch iterations.
*/
pipeline.flush();

cout << "Waiting for fpm-client connection..." << endl;
fpm.accept();
cout << "Connected!" << endl;

s.addSelectable(&fpm);

/* If warm-restart feature is enabled, execute 'restoration' logic */
bool warmStartEnabled = sync.m_warmStartHelper.checkAndStart();
if (warmStartEnabled)
{
/* Obtain warm-restart timer defined for routing application */
uint32_t warmRestartIval = sync.m_warmStartHelper.getRestartTimer();
if (!warmRestartIval)
{
warmStartTimer.setInterval(timespec{DEFAULT_ROUTING_RESTART_INTERVAL, 0});
}
else
{
warmStartTimer.setInterval(timespec{warmRestartIval, 0});
}

/* Execute restoration instruction and kick off warm-restart timer */
if (sync.m_warmStartHelper.runRestoration())
{
warmStartTimer.start();
s.addSelectable(&warmStartTimer);
}
}

while (true)
{
Selectable *temps;
/* Reading FPM messages forever (and calling "readData" to read them) */

/* Reading FPM messages forever (and calling "readMe" to read them) */
s.select(&temps);
pipeline.flush();
SWSS_LOG_DEBUG("Pipeline flushed");

/*
* Upon expiration of the warm-restart timer, proceed to run the
* reconciliation process and remove warm-restart timer from
* select() loop.
*/
if (warmStartEnabled && temps == &warmStartTimer)
{
SWSS_LOG_NOTICE("Warm-Restart timer expired.");
sync.m_warmStartHelper.reconcile();
s.removeSelectable(&warmStartTimer);

pipeline.flush();
SWSS_LOG_DEBUG("Pipeline flushed");
}
else if (!warmStartEnabled || sync.m_warmStartHelper.isReconciled())
{
pipeline.flush();
SWSS_LOG_DEBUG("Pipeline flushed");
}
}
}
catch (FpmLink::FpmConnectionClosedException &e)
Expand Down
53 changes: 48 additions & 5 deletions fpmsyncd/routesync.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,8 @@ using namespace std;
using namespace swss;

RouteSync::RouteSync(RedisPipeline *pipeline) :
m_routeTable(pipeline, APP_ROUTE_TABLE_NAME, true)
m_routeTable(pipeline, APP_ROUTE_TABLE_NAME, true),
m_warmStartHelper(pipeline, &m_routeTable, APP_ROUTE_TABLE_NAME, "bgp", "bgp")
{
m_nl_sock = nl_socket_alloc();
nl_connect(m_nl_sock, NETLINK_ROUTE);
Expand All @@ -38,10 +39,31 @@ void RouteSync::onMsg(int nlmsg_type, struct nl_object *obj)
return;
}

/*
* Upon arrival of a delete msg we could either push the change right away,
* or we could opt to defer it if we are going through a warm-reboot cycle.
*/
bool warmRestartInProgress = m_warmStartHelper.inProgress();

if (nlmsg_type == RTM_DELROUTE)
{
m_routeTable.del(destipprefix);
return;
if (!warmRestartInProgress)
{
m_routeTable.del(destipprefix);
return;
}
else
{
SWSS_LOG_INFO("Warm-Restart mode: Receiving delete msg: %s\n",
destipprefix);

vector<FieldValueTuple> fvVector;
const KeyOpFieldsValuesTuple kfv = std::make_tuple(destipprefix,
DEL_COMMAND,
fvVector);
m_warmStartHelper.insertRefreshMap(kfv);
return;
}
}
else if (nlmsg_type != RTM_NEWROUTE)
{
Expand Down Expand Up @@ -118,8 +140,29 @@ void RouteSync::onMsg(int nlmsg_type, struct nl_object *obj)
vector<FieldValueTuple> fvVector;
FieldValueTuple nh("nexthop", nexthops);
FieldValueTuple idx("ifname", ifnames);

fvVector.push_back(nh);
fvVector.push_back(idx);
m_routeTable.set(destipprefix, fvVector);
SWSS_LOG_DEBUG("RoutTable set: %s %s %s\n", destipprefix, nexthops.c_str(), ifnames.c_str());

if (!warmRestartInProgress)
{
m_routeTable.set(destipprefix, fvVector);
SWSS_LOG_DEBUG("RouteTable set msg: %s %s %s\n",
destipprefix, nexthops.c_str(), ifnames.c_str());
}

/*
* During routing-stack restarting scenarios route-updates will be temporarily
* put on hold by warm-reboot logic.
*/
else
{
SWSS_LOG_INFO("Warm-Restart mode: RouteTable set msg: %s %s %s\n",
destipprefix, nexthops.c_str(), ifnames.c_str());

const KeyOpFieldsValuesTuple kfv = std::make_tuple(destipprefix,
SET_COMMAND,
fvVector);
m_warmStartHelper.insertRefreshMap(kfv);
}
}
10 changes: 7 additions & 3 deletions fpmsyncd/routesync.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@
#include "dbconnector.h"
#include "producerstatetable.h"
#include "netmsg.h"
#include "warmRestartHelper.h"


namespace swss {

Expand All @@ -16,10 +18,12 @@ class RouteSync : public NetMsg

virtual void onMsg(int nlmsg_type, struct nl_object *obj);

WarmStartHelper m_warmStartHelper;

private:
ProducerStateTable m_routeTable;
struct nl_cache *m_link_cache;
struct nl_sock *m_nl_sock;
ProducerStateTable m_routeTable;
struct nl_cache *m_link_cache;
struct nl_sock *m_nl_sock;
};

}
Expand Down
54 changes: 47 additions & 7 deletions tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -290,6 +290,26 @@ def stop_swss(self):
cmd += "supervisorctl stop {}; ".format(pname)
self.runcmd(['sh', '-c', cmd])

def start_zebra(dvs):
dvs.runcmd(['sh', '-c', 'supervisorctl start zebra'])

# Let's give zebra a chance to connect to FPM.
time.sleep(5)

def stop_zebra(dvs):
dvs.runcmd(['sh', '-c', 'pkill -x zebra'])
time.sleep(1)

def start_fpmsyncd(dvs):
dvs.runcmd(['sh', '-c', 'supervisorctl start fpmsyncd'])

# Let's give fpmsyncd a chance to connect to Zebra.
time.sleep(5)

def stop_fpmsyncd(dvs):
dvs.runcmd(['sh', '-c', 'pkill -x fpmsyncd'])
time.sleep(1)

def init_asicdb_validator(self):
self.asicdb = AsicDbValidator(self)

Expand Down Expand Up @@ -334,13 +354,18 @@ def get_logs(self, modname=None):
raise RuntimeError("Failed to unpack the archive.")
os.system("chmod a+r -R log")

def add_log_marker(self):
def add_log_marker(self, file=None):
marker = "=== start marker {} ===".format(datetime.now().isoformat())
self.ctn.exec_run("logger {}".format(marker))

if file:
self.runcmd(['sh', '-c', "echo \"{}\" >> {}".format(marker, file)])
else:
self.ctn.exec_run("logger {}".format(marker))

return marker

def SubscribeAppDbObject(self, objpfx):
r = redis.Redis(unix_socket_path=self.redis_sock, db=swsscommon.APP_DB)
r = redis.Redis(unix_socket_path=self.redis_sock, db=swsscommon.APPL_DB)
pubsub = r.pubsub()
pubsub.psubscribe("__keyspace@0__:%s*" % objpfx)
return pubsub
Expand Down Expand Up @@ -375,26 +400,40 @@ def CountSubscribedObjects(self, pubsub, ignore=None, timeout=10):
return (nadd, ndel)

def GetSubscribedAppDbObjects(self, pubsub, ignore=None, timeout=10):
r = redis.Redis(unix_socket_path=self.redis_sock, db=swsscommon.APP_DB)
r = redis.Redis(unix_socket_path=self.redis_sock, db=swsscommon.APPL_DB)

addobjs = []
delobjs = []
idle = 0
prev_key = None

while True and idle < timeout:
message = pubsub.get_message()
if message:
print message
key = message['channel'].split(':', 1)[1]
# In producer/consumer_state_table scenarios, every entry will
# show up twice for every push/pop operation, so skip the second
# one to avoid double counting.
if key != None and key == prev_key:
continue
# Skip instructions with meaningless keys. To be extended in the
# future to other undesired keys.
if key == "ROUTE_TABLE_KEY_SET" or key == "ROUTE_TABLE_DEL_SET":
continue
if ignore:
fds = message['channel'].split(':')
if fds[2] in ignore:
continue

if message['data'] == 'hset':
(_, k) = key.split(':', 1)
value=r.hgetall(key)
addobjs.append({'key':k, 'vals':value})
addobjs.append({'key':json.dumps(k), 'vals':json.dumps(value)})
prev_key = key
elif message['data'] == 'del':
delobjs.append(key)
(_, k) = key.split(':', 1)
delobjs.append({'key':json.dumps(k)})
idle = 0
else:
time.sleep(1)
Expand Down Expand Up @@ -424,7 +463,8 @@ def GetSubscribedAsicDbObjects(self, pubsub, ignore=None, timeout=10):
(_, t, k) = key.split(':', 2)
addobjs.append({'type':t, 'key':k, 'vals':value})
elif message['data'] == 'del':
delobjs.append(key)
(_, t, k) = key.split(':', 2)
delobjs.append({'key':k})
idle = 0
else:
time.sleep(1)
Expand Down
Loading

0 comments on commit f380685

Please sign in to comment.