Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TSA] Reliable TSA: Addressing pizza box issues #19217

Merged
merged 5 commits into from
Jun 19, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 18 additions & 4 deletions dockers/docker-fpm-frr/base_image_files/TS
Original file line number Diff line number Diff line change
Expand Up @@ -4,17 +4,31 @@
[ -f /etc/sonic/sonic-environment ] && . /etc/sonic/sonic-environment

PLATFORM=${PLATFORM:-`sonic-cfggen -H -v DEVICE_METADATA.localhost.platform`}
switch_type=`sonic-db-cli CONFIG_DB hget 'DEVICE_METADATA|localhost' 'switch_type'`
TSA_CHASSIS_STATE=false

if [[ $switch_type == 'voq' ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please change this to include chassis-packet or change the check to look for type == "SpineRouter"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fountzou : Please address this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for comment @tjchadaga @abdosi . Fixed -- kindly see #19527

TSA_CHASSIS_STATE="$(sonic-db-cli CHASSIS_APP_DB HGET "BGP_DEVICE_GLOBAL|STATE" tsa_enabled)"
fi

if [[ $1 == "TSA" ]]; then
TSA_STATE_UPDATE='{"BGP_DEVICE_GLOBAL":{"STATE":{"tsa_enabled": "true"}}}'
log_msg='System Mode: Normal -> Maintenance'
if [[ $TSA_CHASSIS_STATE == true ]]; then
log_msg='System Mode: Maintenance -> Maintenance'
else
log_msg='System Mode: Normal -> Maintenance'
fi
err_msg='System is already in Maintenance'
desired_tsa_state=true
desired_tsa_state=true
elif [[ $1 == "TSB" ]]; then
TSA_STATE_UPDATE='{"BGP_DEVICE_GLOBAL":{"STATE":{"tsa_enabled": "false"}}}'
log_msg='System Mode: Maintenance -> Normal'
if [[ $TSA_CHASSIS_STATE == true ]]; then
log_msg='System Mode: Maintenance -> Maintenance'
else
log_msg='System Mode: Maintenance -> Normal'
fi
err_msg='System is already in Normal mode'
desired_tsa_state=false
desired_tsa_state=false
fi

# Parse the device specific asic conf file, if it exists
Expand Down
13 changes: 12 additions & 1 deletion dockers/docker-fpm-frr/base_image_files/TSA
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,18 @@ if [ "$EUID" -ne 0 ] ; then
fi

if [ -f /etc/sonic/chassisdb.conf ]; then
rexec all -c "sudo TSA chassis"
CHASSIS_TSA_STATE_UPDATE="CHASSIS_APP_DB HMSET "BGP_DEVICE_GLOBAL\|STATE" tsa_enabled "true""
CONFIG_DB_TSA_STATE_UPDATE='{"BGP_DEVICE_GLOBAL":{"STATE":{"tsa_enabled": "true"}}}'
current_tsa_state="$(sonic-cfggen -d -v BGP_DEVICE_GLOBAL.STATE.tsa_enabled)"
if [[ $current_tsa_state == true ]]; then
echo "Chassis is already in Maintenance"
logger -t TSA -p user.info "Chassis is already in Maintenance"
else
sonic-db-cli $CHASSIS_TSA_STATE_UPDATE
sonic-cfggen -a "$CONFIG_DB_TSA_STATE_UPDATE" -w
echo "Chassis Mode: Normal -> Maintenance"
logger -t TSA -p user.info "Chassis Mode: Normal -> Maintenance"
fi
echo "Please execute \"rexec all -c 'sudo config save -y'\" to preserve System mode in Maintenance after reboot\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be changed to just 'sudo config save' instead of rexec all now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fountzou : Please address this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for comment @tjchadaga @abdosi . Fixed -- kindly see #19527

or config reload on all linecards"
exit 0
Expand Down
16 changes: 13 additions & 3 deletions dockers/docker-fpm-frr/base_image_files/TSB
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,21 @@ if [ "$EUID" -ne 0 ] ; then
exit 1
fi

# If run on supervisor of chassis, trigger remote execution of TSB on all linecards
if [ -f /etc/sonic/chassisdb.conf ]; then
rexec all -c "sudo TSB chassis"
CHASSIS_TSA_STATE_UPDATE="CHASSIS_APP_DB HMSET "BGP_DEVICE_GLOBAL\|STATE" tsa_enabled "false""
CONFIG_DB_TSA_STATE_UPDATE='{"BGP_DEVICE_GLOBAL":{"STATE":{"tsa_enabled": "false"}}}'
current_tsa_state="$(sonic-cfggen -d -v BGP_DEVICE_GLOBAL.STATE.tsa_enabled)"
if [[ $current_tsa_state == false ]]; then
echo "Chassis is already in Normal mode"
logger -t TSB -p user.info "Chassis is already in Normal mode"
else
sonic-db-cli $CHASSIS_TSA_STATE_UPDATE
sonic-cfggen -a "$CONFIG_DB_TSA_STATE_UPDATE" -w
echo "Chassis Mode: Maintenance -> Normal"
logger -t TSB -p user.info "Chassis Mode: Maintenance -> Normal"
fi
echo "Please execute \"rexec all -c 'sudo config save -y'\" to preserve System mode in Normal state after reboot\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fountzou : Please address this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for comment @tjchadaga @abdosi . Fixed -- kindly see #19527

or config reload on all linecards"
or config reload on all linecards"
exit 0
fi

Expand Down
8 changes: 8 additions & 0 deletions files/build_templates/docker_image_ctl.j2
Original file line number Diff line number Diff line change
Expand Up @@ -269,6 +269,14 @@ function postStartAction()
$SONIC_DB_CLI CONFIG_DB SET "CONFIG_DB_INITIALIZED" "1"
fi

# In SUP, enforce CHASSIS_APP_DB.tsa_enabled to be in sync with BGP_DEVICE_GLOBAL.STATE.tsa_enabled
if [[ -z "$DEV" ]] && [[ -f /etc/sonic/chassisdb.conf ]]; then
tsa_cfg="$($SONIC_DB_CLI CONFIG_DB HGET "BGP_DEVICE_GLOBAL|STATE" "tsa_enabled")"
if [[ -n "$tsa_cfg" ]]; then
docker exec -i ${DOCKERNAME} $SONIC_DB_CLI CHASSIS_APP_DB HMSET "BGP_DEVICE_GLOBAL|STATE" tsa_enabled ${tsa_cfg}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please capture return code of operation and log message in case of failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed your suggestion. Thanks!

fi
fi

# Add redis UDS to the redis group and give read/write access to the group
REDIS_SOCK="/var/run/redis${DEV}/redis.sock"
else
Expand Down
6 changes: 6 additions & 0 deletions files/image_config/config-setup/config-setup
Original file line number Diff line number Diff line change
Expand Up @@ -330,6 +330,12 @@ do_db_migration()
/usr/local/bin/db_migrator.py -o migrate
fi
sonic-db-cli CONFIG_DB SET "CONFIG_DB_INITIALIZED" "1"

#Enforce CHASSIS_APP_DB.tsa_enabled to be in sync with BGP_DEVICE_GLOBAL.STATE.tsa_enabled
if [[ -f /etc/sonic/chassisdb.conf ]]; then
tsa_cfg="$(sonic-db-cli CONFIG_DB HGET "BGP_DEVICE_GLOBAL|STATE" "tsa_enabled")"
sonic-db-cli CHASSIS_APP_DB HMSET "BGP_DEVICE_GLOBAL|STATE" tsa_enabled ${tsa_cfg}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please capture return code of operation and log message in case of failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks for your comment

fi
}

# Perform configuration migration from backup copy.
Expand Down
6 changes: 6 additions & 0 deletions src/sonic-bgpcfgd/bgpcfgd/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import traceback

from swsscommon import swsscommon
from sonic_py_common import device_info

from .config import ConfigMgr
from .directory import Directory
Expand All @@ -20,6 +21,7 @@
from .managers_static_rt import StaticRouteMgr
from .managers_rm import RouteMapMgr
from .managers_device_global import DeviceGlobalCfgMgr
from .managers_chassis_app_db import ChassisAppDbMgr
from .static_rt_timer import StaticRouteTimer
from .runner import Runner, signal_handler
from .template import TemplateFabric
Expand Down Expand Up @@ -74,6 +76,10 @@ def do_work():
# Device Global Manager
DeviceGlobalCfgMgr(common_objs, "CONFIG_DB", swsscommon.CFG_BGP_DEVICE_GLOBAL_TABLE_NAME),
]

if device_info.get_platform_info().get('switch_type') == "voq":
Copy link
Contributor

@abdosi abdosi Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need not limit to voq chassis but to any chassis.

Please replace the API to this one: https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-py-common/sonic_py_common/device_info.py#L575

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comment and for pointing out the correct API. Done

managers.append(ChassisAppDbMgr(common_objs, "CHASSIS_APP_DB", "BGP_DEVICE_GLOBAL"))

runner = Runner(common_objs['cfg_mgr'])
for mgr in managers:
runner.add_manager(mgr)
Expand Down
50 changes: 50 additions & 0 deletions src/sonic-bgpcfgd/bgpcfgd/managers_chassis_app_db.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
from .manager import Manager
from .managers_device_global import DeviceGlobalCfgMgr
from .log import log_err, log_debug, log_notice
import re
from swsscommon import swsscommon

class ChassisAppDbMgr(Manager):
"""This class responds to change in tsa_enabled state of the supervisor"""

def __init__(self, common_objs, db, table):
"""
Initialize the object
:param common_objs: common object dictionary
:param db: name of the db
:param table: name of the table in the db
"""
self.lc_tsa = ""
self.directory = common_objs['directory']
self.dev_cfg_mgr = DeviceGlobalCfgMgr(common_objs, "CONFIG_DB", swsscommon.CFG_BGP_DEVICE_GLOBAL_TABLE_NAME)
self.directory.subscribe([("CONFIG_DB", swsscommon.CFG_BGP_DEVICE_GLOBAL_TABLE_NAME, "tsa_enabled"),], self.on_lc_tsa_status_change)
super(ChassisAppDbMgr, self).__init__(
common_objs,
[],
db,
table,
)

def on_lc_tsa_status_change(self):
if self.directory.path_exist("CONFIG_DB", swsscommon.CFG_BGP_DEVICE_GLOBAL_TABLE_NAME, "tsa_enabled"):
self.lc_tsa = self.directory.get_slot("CONFIG_DB", swsscommon.CFG_BGP_DEVICE_GLOBAL_TABLE_NAME)["tsa_enabled"]
log_debug("ChassisAppDbMgr:: LC TSA update handler status %s" % self.lc_tsa)

def set_handler(self, key, data):
log_debug("ChassisAppDbMgr:: set handler")

if not data:
log_err("ChassisAppDbMgr:: data is None")
return False

if "tsa_enabled" in data:
if self.lc_tsa == "false":
self.dev_cfg_mgr.cfg_mgr.commit()
self.dev_cfg_mgr.cfg_mgr.update()
self.dev_cfg_mgr.isolate_unisolate_device(data["tsa_enabled"])
return True
return False

def del_handler(self, key):
log_debug("ChassisAppDbMgr:: del handler")
return True
27 changes: 23 additions & 4 deletions src/sonic-bgpcfgd/bgpcfgd/managers_device_global.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ def __init__(self, common_objs, db, table):
:param table: name of the table in the db
"""
self.switch_type = ""
self.chassis_tsa = ""
self.directory = common_objs['directory']
self.cfg_mgr = common_objs['cfg_mgr']
self.constants = common_objs['constants']
Expand Down Expand Up @@ -97,11 +98,16 @@ def configure_tsa(self, data=None):
if "tsa_enabled" in data:
state = data["tsa_enabled"]

if self.is_update_required("tsa_enabled", state):
self.chassis_tsa = self.get_chassis_tsa_status()
requires_update = self.is_update_required("tsa_enabled", state)

if state in ["true", "false"] and self.directory.path_exist(self.db_name, self.table_name, "tsa_enabled"):
self.directory.put(self.db_name, self.table_name, "tsa_enabled", state)

if requires_update and self.chassis_tsa == "false":
self.cfg_mgr.commit()
self.cfg_mgr.update()
if self.isolate_unisolate_device(state):
self.directory.put(self.db_name, self.table_name, "tsa_enabled", state)
self.isolate_unisolate_device(state)
else:
log_notice("DeviceGlobalCfgMgr:: TSA configuration is up-to-date")

Expand Down Expand Up @@ -167,7 +173,9 @@ def check_state_and_get_tsa_routemaps(self, cfg):
cmd = ""
if self.directory.path_exist("CONFIG_DB", swsscommon.CFG_BGP_DEVICE_GLOBAL_TABLE_NAME, "tsa_enabled"):
tsa_status = self.directory.get_slot("CONFIG_DB", swsscommon.CFG_BGP_DEVICE_GLOBAL_TABLE_NAME)["tsa_enabled"]
if tsa_status == "true":
chassis_tsa = self.get_chassis_tsa_status()

if tsa_status == "true" or chassis_tsa == "true":
cmds = cfg.replace("#012", "\n").split("\n")
log_notice("DeviceGlobalCfgMgr:: Device is isolated. Applying TSA route-maps")
cmd = self.get_ts_routemaps(cmds, self.tsa_template)
Expand Down Expand Up @@ -228,6 +236,17 @@ def __extract_out_route_map_names(self, cmds):
route_map_names.add(result.group(1))
return route_map_names

def get_chassis_tsa_status(self):
chassis_tsa_status = "false"
try:
ch = swsscommon.SonicV2Connector(use_unix_socket_path=False)
ch.connect(ch.CHASSIS_APP_DB, False)
chassis_tsa_status = ch.get(ch.CHASSIS_APP_DB, "BGP_DEVICE_GLOBAL|STATE", 'tsa_enabled')
except Exception:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please log the message on exception. Helpful for debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!


return chassis_tsa_status

def downstream_isolate_unisolate(self, idf_isolation_state):
""" API to apply IDF configuration """

Expand Down
5 changes: 4 additions & 1 deletion src/sonic-bgpcfgd/bgpcfgd/runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,10 @@ def add_manager(self, manager):
table_name = manager.get_table_name()
db = swsscommon.SonicDBConfig.getDbId(db_name)
if db not in self.db_connectors:
self.db_connectors[db] = swsscommon.DBConnector(db_name, 0)
if db_name == "CHASSIS_APP_DB":
self.db_connectors[db] = swsscommon.DBConnector(db_name, 0, True, '')
else:
self.db_connectors[db] = swsscommon.DBConnector(db_name, 0)

if table_name not in self.callbacks[db]:
conn = self.db_connectors[db]
Expand Down
142 changes: 142 additions & 0 deletions src/sonic-bgpcfgd/tests/test_chassis_app_db.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
from unittest.mock import MagicMock, patch

import os
from bgpcfgd.directory import Directory
from bgpcfgd.template import TemplateFabric
from . import swsscommon_test
from .util import load_constants
import bgpcfgd.managers_chassis_app_db
import bgpcfgd.managers_device_global
from swsscommon import swsscommon
from copy import deepcopy

TEMPLATE_PATH = os.path.abspath('../../dockers/docker-fpm-frr/frr')
BASE_PATH = os.path.abspath('../sonic-bgpcfgd/tests/data/general/peer-group.conf/')
INTERNAL_BASE_PATH = os.path.abspath('../sonic-bgpcfgd/tests/data/internal/peer-group.conf/')
global_constants = {
"bgp": {
"traffic_shift_community" :"12345:12345",
"internal_community_match_tag" : "1001"
}
}

def constructor(check_internal=False):
cfg_mgr = MagicMock()
def get_text():
text = []
for line in cfg_mgr.changes.split('\n'):
if line.lstrip().startswith('!'):
continue
text.append(line)
text += [" "]
return text
def update():
if check_internal:
cfg_mgr.changes = get_string_from_file("/result_chasiss_packet.conf", INTERNAL_BASE_PATH)
else:
cfg_mgr.changes = get_string_from_file("/result_all.conf")
def push(cfg):
cfg_mgr.changes += cfg + "\n"
def get_config():
return cfg_mgr.changes
cfg_mgr.get_text = get_text
cfg_mgr.update = update
cfg_mgr.push = push
cfg_mgr.get_config = get_config

constants = deepcopy(global_constants)
common_objs = {
'directory': Directory(),
'cfg_mgr': cfg_mgr,
'tf': TemplateFabric(TEMPLATE_PATH),
'constants': constants
}
mgr = bgpcfgd.managers_chassis_app_db.ChassisAppDbMgr(common_objs, "CHASSIS_APP_DB", "BGP_DEVICE_GLOBAL")
cfg_mgr.update()
return mgr


@patch('bgpcfgd.managers_device_global.log_debug')
def test_isolate_device(mocked_log_info):
m = constructor()

m.lc_tsa = "false"
res = m.set_handler("STATE", {"tsa_enabled": "true"})
assert res, "Expect True return value for set_handler"
mocked_log_info.assert_called_with("DeviceGlobalCfgMgr::Done")
assert m.cfg_mgr.get_config() == get_string_from_file("/result_all_isolate.conf")

curr_cfg = m.cfg_mgr.get_config()
m.lc_tsa = "true"
res = m.set_handler("STATE", {"tsa_enabled": "true"})
assert res, "Expect True return value for set_handler"
assert m.cfg_mgr.get_config() == curr_cfg

@patch('bgpcfgd.managers_device_global.log_debug')
def test_isolate_device_internal_session(mocked_log_info):
m = constructor(check_internal=True)

m.lc_tsa = "false"
res = m.set_handler("STATE", {"tsa_enabled": "true"})
assert res, "Expect True return value for set_handler"
mocked_log_info.assert_called_with("DeviceGlobalCfgMgr::Done")
assert m.cfg_mgr.get_config() == get_string_from_file("/result_chassis_packet_isolate.conf", INTERNAL_BASE_PATH)

curr_cfg = m.cfg_mgr.get_config()
m.lc_tsa = "true"
res = m.set_handler("STATE", {"tsa_enabled": "true"})
assert res, "Expect True return value for set_handler"
assert m.cfg_mgr.get_config() == curr_cfg


@patch('bgpcfgd.managers_device_global.log_debug')
def test_unisolate_device(mocked_log_info):
m = constructor()

m.lc_tsa = "false"
res = m.set_handler("STATE", {"tsa_enabled": "false"})
assert res, "Expect True return value for set_handler"
mocked_log_info.assert_called_with("DeviceGlobalCfgMgr::Done")
assert m.cfg_mgr.get_config() == get_string_from_file("/result_all_unisolate.conf")

curr_cfg = m.cfg_mgr.get_config()
m.lc_tsa = "true"
res = m.set_handler("STATE", {"tsa_enabled": "false"})
assert res, "Expect True return value for set_handler"
assert m.cfg_mgr.get_config() == curr_cfg

@patch('bgpcfgd.managers_device_global.log_debug')
def test_unisolate_device_internal_session(mocked_log_info):
m = constructor(check_internal=True)

m.lc_tsa = "false"
res = m.set_handler("STATE", {"tsa_enabled": "false"})
assert res, "Expect True return value for set_handler"
mocked_log_info.assert_called_with("DeviceGlobalCfgMgr::Done")
assert m.cfg_mgr.get_config() == get_string_from_file("/result_chassis_packet_unisolate.conf", INTERNAL_BASE_PATH)

curr_cfg = m.cfg_mgr.get_config()
m.lc_tsa = "true"
res = m.set_handler("STATE", {"tsa_enabled": "false"})
assert res, "Expect True return value for set_handler"
assert m.cfg_mgr.get_config() == curr_cfg


def get_string_from_file(filename, base_path=BASE_PATH):
fp = open(base_path + filename, "r")
cfg = fp.read()
fp.close()

return cfg

@patch('bgpcfgd.managers_chassis_app_db.log_err')
def test_set_handler_failure_case(mocked_log_info):
m = constructor()
res = m.set_handler("STATE", {})
assert res == False, "Expect False return value for invalid data passed to set_handler"
mocked_log_info.assert_called_with("ChassisAppDbMgr:: data is None")

def test_del_handler():
m = constructor()
res = m.del_handler("STATE")
assert res, "Expect True return value for del_handler"
Loading
Loading