Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot #15685
This PR needs to be merged after #15684
/Azp run
No commit pushedDate could be found for PR 15685 in repo sonic-net/sonic-buildimage
@@ -248,28 +249,18 @@ function postStartAction()
$SONIC_CFGGEN -j /etc/sonic/config_db$DEV.json --write-to-db
fi
fi

if [[ "$BOOT_TYPE" == "fast" ]]; then
Note for reviewer:
This code section is being removed because it is never reached: the check for fast reboot enters the preceding parent `if` block, not this one.
Moreover, with the new fast-boot changes, there is a dedicated fastboot flag that is cleared by the finalizer, so the timed expiry mechanism is not needed.
Just saw this. This code is actually crucial when upgrading from older versions that predate the fast-reboot enhancements.
For example, when upgrading from a 202012 image, if more than 180 seconds have elapsed, no indication of fast-reboot is found in STATE-DB, and db-migrator fails to set the new fast-reboot notation.
You are right, thanks for reporting this. The PR had already been merged when you added the comment, and I did not check it.
Fixed here: #16733
fi

if [ -e /tmp/pending_config_migration ]; then
if [ -e /tmp/pending_config_migration ] || [ -e /tmp/pending_config_initialization ]; then
Note for reviewer:
`pending_config_migration` is used for SONiC-to-SONiC upgrades.
`pending_config_initialization` is used for SONiC conversions and first-boot cases.
Both of these flags are set by the rc.local script.
The flags are unset (files removed) in config-setup.service.
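For reviewers who want to try the combined check in isolation, here is a minimal sketch. The function name `is_first_boot_of_image` and the `flag_dir` parameter are hypothetical and exist only so the check can run against a scratch directory; the script in the diff tests the two files under /tmp directly.

```shell
# Sketch only: is_first_boot_of_image and flag_dir are hypothetical names.
# The script in the diff checks the two flag files under /tmp directly.
is_first_boot_of_image() {
    local flag_dir="${1:-/tmp}"
    [ -e "$flag_dir/pending_config_migration" ] ||
        [ -e "$flag_dir/pending_config_initialization" ]
}

# Exercise the check in a scratch directory instead of /tmp.
demo_dir=$(mktemp -d)
is_first_boot_of_image "$demo_dir" && echo "first boot" || echo "not first boot"
touch "$demo_dir/pending_config_initialization"
is_first_boot_of_image "$demo_dir" && echo "first boot" || echo "not first boot"
rm -rf "$demo_dir"
```

Either flag file alone is enough to take the first-boot branch, which is the point of the `||` added in this hunk.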
else
# this is not a first time boot to a new image. Datbase container starts w/ old pre-existing config
if [[ -x /usr/local/bin/db_migrator.py ]]; then
Note for reviewer:
I don't think migration is needed for a same-image reboot.
For the case where the device reboots to a different image, but this is not the first boot of that image:
Cold boot: Migration is not needed, as the config_db file in the target image already has the correct version.
Warm/fast boot: Migration is needed, since the old config dump will have the old version. But do we want to support this path? Ideally we expect upgrades to always be first-time boots. If this expectation does not hold, I can enable the call here for the warmboot case.
/easycla
…g for warmboot (#16225)
Cherry pick of #15685
MSFT ADO: 24274591
#### Why I did it
Two changes:
### 1 Fix a day1 issue, where the check to wait until `CONFIG_DB_INITIALIZED` is set is incorrect.
There are multiple places where the same incorrect logic is used.
The current logic (`until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];`) always passes, irrespective of the result of the GET operation.
```
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
```
Fix this logic by checking that the value of the flag is "1".
```
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here
```
This gap in logic was highlighted when another fix was merged: #14933
The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.
### 2 Set and unset CONFIG_DB_INITIALIZED for warm-reboot case
Currently, during warm shutdown, the value of `CONFIG_DB_INITIALIZED` is stored in the redis db backup. It is restored when the dump is loaded during warm-recovery.
So the value of `CONFIG_DB_INITIALIZED` does not depend on config db's state; it remains what it was before reboot.
Fix this by setting `CONFIG_DB_INITIALIZED` to 0 when the DB is loaded, and setting it to 1 after db_migrator is done.
- Why I did it
The recent change sonic-net#15685 (comment) removed the db migration for non-first reboots. This is problematic for many deployments that don't rely on ZTP and push a custom config_db.json.
Port to older branches after sonic-net#15685 is ported back.
- How I did it
Re-introduce the logic to run the db_migrator on non-first boots.
- How to verify it
Verified reboot and warm-reboot cases.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
…mboot (sonic-net#15685)
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot
… missing FAST_REBOOT system flag (#16669)
### Why I did it
Fast reboot is failing on 202012 after PR #15685 was cherry-picked to 202012 as part of #16225.
The master branch change is good, but the cherry pick to 202012 is bad. The change was needed on master because the code removed there was not effective (it was unreachable) and not required (fast-reboot on master uses the warm-reboot infra of db dump and reconciliation). However, this code was still being used in 202012, and should not have been removed. The DB flag needs to be set to allow services to do fast recovery.
In the latest 202012 images, fast reboot fails as syncd does a cold restart:
Good case on 202012 (before PR 16225)
```
Sep 14 13:25:55.435266 str3-s6100-acs-6 NOTICE syncd#syncd: :- Syncd: command line: EnableDiagShell=YES EnableTempView=YES DisableExitSleep=NO EnableUnittests=NO EnableConsistencyCheck=NO EnableSyncMode=YES RedisCommunicationMode=redis_async EnableSaiBulkSuport=NO StartType=fast <---------------------- ProfileMapFile=/etc/sai.d/sai.profile GlobalContext=0 ContextConfig= BreakConfig=/tmp/break_before_make_objects
```
Bad case on 202012 (after PR 16225)
```
Sep 22 22:00:19.619381 str-s6100-acs-2 NOTICE syncd#syncd: :- Syncd: command line: EnableDiagShell=YES EnableTempView=YES DisableExitSleep=NO EnableUnittests=NO EnableConsistencyCheck=NO EnableSyncMode=YES RedisCommunicationMode=redis_async EnableSaiBulkSuport=NO StartType=cold <---------------------- ProfileMapFile=/etc/sai.d/sai.profile GlobalContext=0 ContextConfig= BreakConfig=/tmp/break_before_make_objects
```
##### Work item tracking
- Microsoft ADO **(number only)**: 25227065
#### How I did it
Set the system flag for fast reboot during the boot-up path.
#### How to verify it
The change restores the state as it was before PR 16225, and fast-reboot worked before 16225.
Tested locally with the change by replacing database.sh on the device.
…t fast-reboot from older images (#16733)
Why I did it
Fix: #16699
Fast reboot is failing from old OS versions (e.g., 201911 image) to latest (e.g., master branch) after PR #15685.
The system-wide flag for FAST_REBOOT is still required when the base OS version does not support the new fast-reboot reconciliation logic (no db dump).
MSFT ADO: 24274591
Why I did it
Two changes:
1 Fix a day1 issue, where the check to wait until `CONFIG_DB_INITIALIZED` is set is incorrect.
There are multiple places where the same incorrect logic is used.
The current logic (`until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];`) always passes, irrespective of the result of the GET operation.
Fix this logic by checking that the value of the flag is "1".
This gap in logic was highlighted when another fix was merged: #14933
The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.
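The reason the old check passes is that bash's `[[ string ]]` is true for any non-empty string, so a printed `0` satisfies it just as a printed `1` does. A minimal sketch that reproduces this without a device follows; `get_flag` and `FLAG_VALUE` are hypothetical stand-ins for the `sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"` call and its output, and `old_check` / `new_check` are illustrative names only.

```shell
#!/usr/bin/env bash
# Sketch only: get_flag stands in for
#   sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
# FLAG_VALUE is a hypothetical variable used just for this demo.
get_flag() { printf '%s\n' "$FLAG_VALUE"; }

# Old check: true for any non-empty output, including "0".
old_check() { [[ $(get_flag) ]]; }

# Fixed check: true only when the flag's numeric value is 1.
new_check() { [[ $(get_flag) -eq 1 ]]; }

for FLAG_VALUE in 0 1; do
    old_check && old=pass || old=wait
    new_check && new=pass || new=wait
    echo "flag=$FLAG_VALUE old=$old new=$new"
done
```

With the flag at 0, the old check reports `pass` while the fixed check reports `wait`, which matches the behavior seen on the device.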
2 Set and unset CONFIG_DB_INITIALIZED for warm-reboot case
Currently, during warm shutdown, the value of `CONFIG_DB_INITIALIZED` is stored in the redis db backup. It is restored when the dump is loaded during warm-recovery.
So the value of `CONFIG_DB_INITIALIZED` does not depend on config db's state; it remains what it was before reboot.
Fix this by setting `CONFIG_DB_INITIALIZED` to 0 when the DB is loaded, and setting it to 1 after db_migrator is done.
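The intended ordering can be sketched as below. This is not the code in database.sh: every helper here (`set_init_flag`, `get_init_flag`, `load_dump`, `run_migrator`, `warm_recovery`) is a hypothetical stand-in, and a shell variable plays the role of the `CONFIG_DB_INITIALIZED` key so the sketch runs without redis.

```shell
# Sketch only. All helpers are hypothetical stand-ins, not SONiC code:
# a shell variable plays the role of the CONFIG_DB_INITIALIZED key in redis.
INIT_FLAG=""
set_init_flag() { INIT_FLAG="$1"; }
get_init_flag() { printf '%s\n' "$INIT_FLAG"; }
load_dump()     { set_init_flag 1; }   # the backup restores the pre-reboot value
run_migrator()  { :; }                 # placeholder for db_migrator

warm_recovery() {
    load_dump
    set_init_flag 0                    # DB is loaded but not yet migrated
    echo "after load: $(get_init_flag)"
    run_migrator
    set_init_flag 1                    # migration finished; waiters may proceed
    echo "after migration: $(get_init_flag)"
}

warm_recovery
```

Resetting the flag right after the dump is loaded is what keeps the stale pre-reboot value from releasing anything that waits on the flag before migration has finished.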