Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot #15685

Merged
merged 9 commits into from
Aug 4, 2023

Conversation

vaibhavhd
Copy link
Contributor

@vaibhavhd vaibhavhd commented Jun 30, 2023

MSFT ADO: 24274591

Why I did it

Two changes:

1 Fix a day1 issue, where check to wait until CONFIG_DB_INITIALIZED is incorrect.

There are multiple places where same incorrect logic is used.

Current logic (until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];) will always result in pass, irrespective of the result of GET operation.

root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 

root@str2-7060cx-32s-29:~# 
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"                                             
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 

Fix this logic by checking for value of flag to be "1".

root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here

This gap in logic was highlighted when another fix was merged: #14933
The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.

2 Set and unset CONFIG_DB_INITIALIZED for warm-reboot case

Currently, during warm shutdown CONFIG_DB_INITIALIZED's value is stored in redis db backup. This is restored back when the dump is loaded during warm-recovery.
So the value of CONFIG_DB_INITIALIZED does not depend on config db's state, however it remain what it was before reboot.

Fix this by setting CONFIG_DB_INITIALIZED to 0 as when the DB is loaded, and set it to 1 after db_migrator is done.

Work item tracking
  • Microsoft ADO (number only):

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@vaibhavhd
Copy link
Contributor Author

This PR needs to be merged after #15684

yxieca
yxieca previously approved these changes Jul 1, 2023
@vaibhavhd
Copy link
Contributor Author

/Azp run

@azure-pipelines
Copy link

No commit pushedDate could be found for PR 15685 in repo sonic-net/sonic-buildimage

@@ -248,28 +249,18 @@ function postStartAction()
$SONIC_CFGGEN -j /etc/sonic/config_db$DEV.json --write-to-db
fi
fi

if [[ "$BOOT_TYPE" == "fast" ]]; then
Copy link
Contributor Author

@vaibhavhd vaibhavhd Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewer:
This code section is being removed as it will never get called - the check for fast reboot will enter the prev parent if block and not this.

Moreoever, w/ new changes in fast-boot, we have a dedicated flag for fastboot that gets cleared as part of finalizer. The timed expiry mechanism is not needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw this, this is actually crucial for upgrading from older versions prior fast-reboot enhancements.
For example, upgrading from 202012 images, in case it exceeded 180 seconds no indication of fast-reboot is found in STATE-DB and db-migrator will fail to set fast-reboot new notation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Thanks for reporting this. The PR was merged when you added comment, and I did not check it.
Fixed this here: #16733

fi

if [ -e /tmp/pending_config_migration ]; then
if [ -e /tmp/pending_config_migration ] || [ -e /tmp/pending_config_initialization ]; then
Copy link
Contributor Author

@vaibhavhd vaibhavhd Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewer:

pending_config_migration is used for sonic to sonic upgrades.
pending_config_initialization is used for sonic conversions / first boot cases.

Both of these flags are set by rc.local script.
The flags are unset (files removed) in config-setup.service.

else
# this is not a first time boot to a new image. Datbase container starts w/ old pre-existing config
if [[ -x /usr/local/bin/db_migrator.py ]]; then
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviwer:

I don't think migration is needed for same-image reboot.

For case when device reboots to a different image, but this is not first-time boot to this image:
Cold boot: Migration is not needed for cold reboot, as config_db file in target image already has correct version.

Warm/fast boot: Migration is needed, and old config dump will have old version. But, do we want to support this path, as ideally we expect that the upgrades are always first-time-boot. If this expectation is not true then I can enable the call here for warmboot case.

@yxieca
Copy link
Contributor

yxieca commented Jul 19, 2023

/easycla

@yxieca yxieca merged commit e127701 into sonic-net:master Aug 4, 2023
qiluo-msft pushed a commit that referenced this pull request Aug 24, 2023
…g for warmboot (#16225)

Cherry pick of #15685

MSFT ADO: 24274591

#### Why I did it

Two changes:
### 1  Fix a day1 issue, where check to wait until `CONFIG_DB_INITIALIZED` is incorrect.
There are multiple places where same incorrect logic is used.

Current logic (`until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];`) will always result in pass, irrespective of the result of GET operation.
```
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 

root@str2-7060cx-32s-29:~# 
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"                                             
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 
```

Fix this logic by checking for value of flag to be "1".
```
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here
```

This gap in logic was highlighted when another fix was merged: #14933
The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.

### 2 Set and unset CONFIG_DB_INITIALIZED for warm-reboot case

Currently, during warm shutdown `CONFIG_DB_INITIALIZED`'s value is stored in redis db backup. This is restored back when the dump is loaded during warm-recovery.
So the value of `CONFIG_DB_INITIALIZED` does not depend on config db's state, however it remain what it was before reboot.

Fix this by setting `CONFIG_DB_INITIALIZED` to 0 as when the DB is loaded, and set it to 1 after db_migrator is done.
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Aug 28, 2023
- Why I did it
The recent change sonic-net#15685 (comment) removed the db migration for non first reboots.
This is problematic for many deployments which doesn't rely on ZTP and push a custom config_db.json
Port to older branches after sonic-net#15685 is ported back

- How I did it
Re-introduce the logic to run the db_migrator on non-first boots

- How to verify it
Verified reboot and warm-reboot cases

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Aug 28, 2023
- Why I did it
The recent change sonic-net#15685 (comment) removed the db migration for non first reboots.
This is problematic for many deployments which doesn't rely on ZTP and push a custom config_db.json
Port to older branches after sonic-net#15685 is ported back

- How I did it
Re-introduce the logic to run the db_migrator on non-first boots

- How to verify it
Verified reboot and warm-reboot cases

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
mssonicbld pushed a commit that referenced this pull request Aug 31, 2023
- Why I did it
The recent change #15685 (comment) removed the db migration for non first reboots.
This is problematic for many deployments which doesn't rely on ZTP and push a custom config_db.json
Port to older branches after #15685 is ported back

- How I did it
Re-introduce the logic to run the db_migrator on non-first boots

- How to verify it
Verified reboot and warm-reboot cases

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Sep 6, 2023
- Why I did it
The recent change sonic-net#15685 (comment) removed the db migration for non first reboots.
This is problematic for many deployments which doesn't rely on ZTP and push a custom config_db.json
Port to older branches after sonic-net#15685 is ported back

- How I did it
Re-introduce the logic to run the db_migrator on non-first boots

- How to verify it
Verified reboot and warm-reboot cases

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
mssonicbld pushed a commit that referenced this pull request Sep 7, 2023
- Why I did it
The recent change #15685 (comment) removed the db migration for non first reboots.
This is problematic for many deployments which doesn't rely on ZTP and push a custom config_db.json
Port to older branches after #15685 is ported back

- How I did it
Re-introduce the logic to run the db_migrator on non-first boots

- How to verify it
Verified reboot and warm-reboot cases

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Sep 12, 2023
- Why I did it
The recent change sonic-net#15685 (comment) removed the db migration for non first reboots.
This is problematic for many deployments which doesn't rely on ZTP and push a custom config_db.json
Port to older branches after sonic-net#15685 is ported back

- How I did it
Re-introduce the logic to run the db_migrator on non-first boots

- How to verify it
Verified reboot and warm-reboot cases

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
sonic-otn pushed a commit to sonic-otn/sonic-buildimage that referenced this pull request Sep 20, 2023
…mboot (sonic-net#15685)

* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot
sonic-otn pushed a commit to sonic-otn/sonic-buildimage that referenced this pull request Sep 20, 2023
- Why I did it
The recent change sonic-net#15685 (comment) removed the db migration for non first reboots.
This is problematic for many deployments which doesn't rely on ZTP and push a custom config_db.json
Port to older branches after sonic-net#15685 is ported back

- How I did it
Re-introduce the logic to run the db_migrator on non-first boots

- How to verify it
Verified reboot and warm-reboot cases

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
qiluo-msft pushed a commit that referenced this pull request Sep 25, 2023
… missing FAST_REBOOT system flag (#16669)

### Why I did it

Fast reboot is failing on 202012 after PR #15685 was cherrypicked to 202012 as part of #16225

The master branch change is good, but the cherry pick to 202012 is bad.
Change was needed on master as the code added here was not effective (as it was unreachable) and not required (as fast-reboot on master uses warm-reboot infra of db dump and reconc).

However, this code was still being used in 202012, and should not have been removed. The DB flag needs to be set to allow services do fast recovery. In the latest 202012 images, fast reboot fails as syncd does cold restart:

Good case on 202012 (before PR 16225)
```
Sep 14 13:25:55.435266 str3-s6100-acs-6 NOTICE syncd#syncd: :- Syncd: command line:  
	EnableDiagShell=YES EnableTempView=YES DisableExitSleep=NO EnableUnittests=NO EnableConsistencyCheck=NO 
	EnableSyncMode=YES RedisCommunicationMode=redis_async EnableSaiBulkSuport=NO 
	StartType=fast               <----------------------
	ProfileMapFile=/etc/sai.d/sai.profile GlobalContext=0 ContextConfig= BreakConfig=/tmp/break_before_make_objects
```

Bad case on 202012 (after PR 16225)
```
Sep 22 22:00:19.619381 str-s6100-acs-2 NOTICE syncd#syncd: :- Syncd: command line:  
	EnableDiagShell=YES EnableTempView=YES DisableExitSleep=NO EnableUnittests=NO EnableConsistencyCheck=NO 
	EnableSyncMode=YES RedisCommunicationMode=redis_async EnableSaiBulkSuport=NO 
	StartType=cold               <----------------------
	ProfileMapFile=/etc/sai.d/sai.profile GlobalContext=0 ContextConfig= BreakConfig=/tmp/break_before_make_objects
```
##### Work item tracking
- Microsoft ADO **(number only)**: 25227065

#### How I did it

Set system flag for fast reboot during boot up path

#### How to verify it

Change restores the state as it was before PR 16225, and fast-reboot worked before 16225

Tested locally w/ the change by replacing database.sh on the device.
yxieca pushed a commit that referenced this pull request Sep 28, 2023
…t fast-reboot from older images (#16733)

Why I did it
Fix: #16699

Fast reboot is failing from old OS versions (eg., 201911 image) to latest (eg., master branch) after PR #15685

The system wide flag for FAST_REBOOT is still required when the base OS version does not support the new fast-reboot reconciliation logic (no db dump)
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Sep 28, 2023
…t fast-reboot from older images (sonic-net#16733)

Why I did it
Fix: sonic-net#16699

Fast reboot is failing from old OS versions (eg., 201911 image) to latest (eg., master branch) after PR sonic-net#15685

The system wide flag for FAST_REBOOT is still required when the base OS version does not support the new fast-reboot reconciliation logic (no db dump)
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Sep 28, 2023
…t fast-reboot from older images (sonic-net#16733)

Why I did it
Fix: sonic-net#16699

Fast reboot is failing from old OS versions (eg., 201911 image) to latest (eg., master branch) after PR sonic-net#15685

The system wide flag for FAST_REBOOT is still required when the base OS version does not support the new fast-reboot reconciliation logic (no db dump)
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Sep 28, 2023
…t fast-reboot from older images (sonic-net#16733)

Why I did it
Fix: sonic-net#16699

Fast reboot is failing from old OS versions (eg., 201911 image) to latest (eg., master branch) after PR sonic-net#15685

The system wide flag for FAST_REBOOT is still required when the base OS version does not support the new fast-reboot reconciliation logic (no db dump)
mssonicbld pushed a commit that referenced this pull request Oct 20, 2023
…t fast-reboot from older images (#16733)

Why I did it
Fix: #16699

Fast reboot is failing from old OS versions (eg., 201911 image) to latest (eg., master branch) after PR #15685

The system wide flag for FAST_REBOOT is still required when the base OS version does not support the new fast-reboot reconciliation logic (no db dump)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants