Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for fast/cold-boot: call db_migrator only after old config is loaded #14933

Merged
merged 1 commit into from
May 30, 2023

Conversation

vaibhavhd
Copy link
Contributor

@vaibhavhd vaibhavhd commented May 3, 2023

Why I did it

Fix the issue where db_migrator is called before DB is loaded w/ config. This leads to db_migrator:

  1. Not finding anything, and resumes to incorrectly migrate every missing config
    This is not expected. migration should happen after the old config is loaded and only new schema changes need migration.
  2. Since DB does not have anything when migrator is called, db_migrator fails when some APIs return None.

The reason for incorrect call is that:

  1. database service starts db_migrator as part of startup sequence.
  2. config-setup service loads data from old-config/minigraph. However, since it has Requires=database.service.
  3. Hence, config-setup starts only when database service is started. And database service is started when db_migrator is completed.

Fixed by:

  1. Check if this is first time boot by checking pending_config_migration flag.
  2. If pending_config_migration is enabled, then do not call db_migrator as part of database service startup.
  3. Let database service start which triggers config-setup service to start.
  4. Now call db_migrator after when config-setup service loads old-config/minigraph

Error that's being fixed:

May  2 20:35:04 sonic database.sh[649]: Creating new database container
May  2 20:35:04 sonic database.sh[663]: 99e8edba01ed0c7581f0d61dd2fa78374fa4f23e636a957004dd03a6f68eea86
May  2 20:35:04 sonic root: Starting database service...
May  2 20:35:06 sonic database.sh[690]: database
May  2 20:35:10 sonic database.sh[926]: True

May  2 20:35:10 sonic database.sh[928]:   File "/usr/local/bin/db_migrator.py", line 714, in common_migration_ops
May  2 20:35:10 sonic database.sh[928]:   File "/usr/local/bin/db_migrator.py", line 741, in migrate
May  2 20:35:10 sonic database.sh[928]:   File "/usr/local/bin/db_migrator.py", line 782, in main
May  2 20:35:10 sonic database.sh[928]: Traceback (most recent call last):
May  2 20:35:10 sonic database.sh[928]: TypeError: argument of type 'NoneType' is not iterable
May  2 20:35:10 sonic database.sh[928]: argument of type 'NoneType' is not iterable
May  2 20:35:10 sonic database.sh[928]: optional arguments:
May  2 20:35:10 sonic database.sh[928]: usage: db_migrator.py [-h] [-o operation migrate, set_version, get_version]
May  2 20:35:10 sonic db_migrator: :- operator(): DB '{APPL_DB}' is empty with pattern 'COPP_TABLE:*'!
May  2 20:35:10 sonic db_migrator: :- operator(): DB '{APPL_DB}' is empty with pattern 'INTF_TABLE:*'!
May  2 20:35:10 sonic db_migrator: :- operator(): Key 'BUFFER_MAX_PARAM_TABLE|global' field 'mmu_size' unavailable in database 'STATE_DB'
May  2 20:35:10 sonic db_migrator: :- operator(): Key 'WARM_RESTART_ENABLE_TABLE|system' field 'enable' unavailable in database 'STATE_DB'
May  2 20:35:10 sonic db_migrator: Caught exception: argument of type 'NoneType' is not iterable

May  2 20:35:11 sonic config-setup[935]: Copying SONiC configuration minigraph.xml ...
May  2 20:35:11 sonic config-setup[935]: Reloading minigraph...
May  2 20:35:11 sonic config-setup[935]: Use minigraph.xml from old system...

May  2 20:35:11 sonic root: Started database service...
Work item tracking
  • Microsoft ADO (number only): 15829809

How I did it

How to verify it

ested on physical platform. Rebooted from 202012 to master image (new install):

db_migrator now gets delayed during database service bring up: Delaying db_migrator until config migration is over
db_migrator now gets called when config-setup service loads DB. The issue of migrating on empty DB is fixed and is evident as the previous error of hwsku being None is not seen anymore:

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@vaibhavhd vaibhavhd requested a review from lguohan as a code owner May 3, 2023 18:59
@vaibhavhd
Copy link
Contributor Author

Tests for this PR are in progress. I am waiting for the PR tests to generate a build that I can use to test.

@vaibhavhd
Copy link
Contributor Author

vaibhavhd commented May 4, 2023

Tested on physical platform. Rebooted from 202012 to master image (new install):

db_migrator now gets delayed during database service bring up: Delaying db_migrator until config migration is over
db_migrator now gets called when config-setup service loads DB. The issue of migrating on empty DB is fixed and is evident as the previous error of hwsku being None is not seen anymore:

2023-05-04T19:40:26.631064+00:00 sonic rc.local[551]: + touch /tmp/pending_config_migration
2023-05-04T19:40:26.706940+00:00 sonic rc.local[551]: + touch /tmp/notify_firstboot_to_platform
2023-05-04T19:40:26.798950+00:00 sonic rc.local[551]: + [ ! -d /host/reboot-cause/platform ]
2023-05-04T19:40:26.874898+00:00 sonic rc.local[551]: + [ -d /host/image-master-14933.265920-9259dbc31/platform/x86_64-arista_7260cx3_64 ]
2023-05-04T19:40:27.002953+00:00 sonic rc.local[551]: + sync
2023-05-04T19:40:27.050958+00:00 sonic rc.local[551]: + [ -n  ]
2023-05-04T19:40:27.094925+00:00 sonic rc.local[551]: + mkdir -p /var/platform
2023-05-04T19:40:27.158963+00:00 sonic rc.local[551]: + [ -f /etc/default/kdump-tools ]
2023-05-04T19:40:27.331047+00:00 sonic rc.local[551]: + sed -i -e s/__PLATFORM__/x86_64-arista_7260cx3_64/g /etc/default/kdump-tools
2023-05-04T19:40:27.450950+00:00 sonic rc.local[551]: + firsttime_exit
2023-05-04T19:40:27.507073+00:00 sonic rc.local[551]: + rm -rf /host/image-master-14933.265920-9259dbc31/platform/firsttime
2023-05-04T19:40:27.615085+00:00 sonic rc.local[551]: + exit 0

2023-05-04T19:40:27.731227+00:00 sonic root: Starting database service...

2023-05-04T19:40:34.065364+00:00 sonic database.sh[789]: Delaying db_migrator until config migration is over   <-----------------------------

2023-05-04T19:40:34.073062+00:00 sonic root: Started database service...

2023-05-04T19:40:34.413134+00:00 sonic config-setup[1343]: Copying SONiC configuration minigraph.xml ...
2023-05-04T19:40:34.428186+00:00 sonic config-setup[1343]: Use minigraph.xml from old system...
2023-05-04T19:40:34.428247+00:00 sonic config-setup[1343]: Reloading minigraph...
2023-05-04T19:40:37.698652+00:00 sonic config-setup[1350]: Please note setting loaded from minigraph will be lost after system reboot. To preserve setting, run `config save`.
2023-05-04T19:40:38.108348+00:00 sonic config-setup[1371]: Running command: /usr/local/bin/sonic-cfggen -d --print-data > /etc/sonic/config_db.json

2023-05-04T19:40:38.681059+00:00 sonic db_migrator: Asic Type: broadcom, Hwsku: Arista-7260CX3-D108C8  <--------------------------------

# Perform DB schema migration after loading backup config from previous image
do_db_migration()
{
if [[ -x /usr/local/bin/db_migrator.py ]]; then
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judyjoseph can you please review this part of change from multi DB namespace point of view.

With this change db_migrator will get called always. Earlier you ignored migration as part of #4477

Why would db_migrator be skipped for multi db?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vaibhavhd the db_migrator should run fine for multi-asic config _db's too. I see the VERSIONS table with DATABASE version in multi-asic config _db's, hence it should work as it works for single asic config_db.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Judy! @yxieca we should be good to go ahead with this change. Can you please re-review.

@yxieca yxieca merged commit 02b1783 into sonic-net:master May 30, 2023
qiluo-msft pushed a commit that referenced this pull request Jun 2, 2023
…ded (#14933)

Why I did it
Fix the issue where db_migrator is called before DB is loaded w/ config. This leads to db_migrator:

Not finding anything, and resumes to incorrectly migrate every missing config
This is not expected. migration should happen after the old config is loaded and only new schema changes need migration.
Since DB does not have anything when migrator is called, db_migrator fails when some APIs return None.
The reason for incorrect call is that:

database service starts db_migrator as part of startup sequence.
config-setup service loads data from old-config/minigraph. However, since it has Requires=database.service.
Hence, config-setup starts only when database service is started. And database service is started when db_migrator is completed.
Fixed by:

Check if this is first time boot by checking pending_config_migration flag.
If pending_config_migration is enabled, then do not call db_migrator as part of database service startup.
Let database service start which triggers config-setup service to start.
Now call db_migrator after when config-setup service loads old-config/minigraph
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Jun 2, 2023
…ded (sonic-net#14933)

Why I did it
Fix the issue where db_migrator is called before DB is loaded w/ config. This leads to db_migrator:

Not finding anything, and resumes to incorrectly migrate every missing config
This is not expected. migration should happen after the old config is loaded and only new schema changes need migration.
Since DB does not have anything when migrator is called, db_migrator fails when some APIs return None.
The reason for incorrect call is that:

database service starts db_migrator as part of startup sequence.
config-setup service loads data from old-config/minigraph. However, since it has Requires=database.service.
Hence, config-setup starts only when database service is started. And database service is started when db_migrator is completed.
Fixed by:

Check if this is first time boot by checking pending_config_migration flag.
If pending_config_migration is enabled, then do not call db_migrator as part of database service startup.
Let database service start which triggers config-setup service to start.
Now call db_migrator after when config-setup service loads old-config/minigraph
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Jun 2, 2023
…ded (sonic-net#14933)

Why I did it
Fix the issue where db_migrator is called before DB is loaded w/ config. This leads to db_migrator:

Not finding anything, and resumes to incorrectly migrate every missing config
This is not expected. migration should happen after the old config is loaded and only new schema changes need migration.
Since DB does not have anything when migrator is called, db_migrator fails when some APIs return None.
The reason for incorrect call is that:

database service starts db_migrator as part of startup sequence.
config-setup service loads data from old-config/minigraph. However, since it has Requires=database.service.
Hence, config-setup starts only when database service is started. And database service is started when db_migrator is completed.
Fixed by:

Check if this is first time boot by checking pending_config_migration flag.
If pending_config_migration is enabled, then do not call db_migrator as part of database service startup.
Let database service start which triggers config-setup service to start.
Now call db_migrator after when config-setup service loads old-config/minigraph
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202205: #15316

@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202211: #15317

vaibhavhd added a commit to vaibhavhd/sonic-buildimage that referenced this pull request Jun 30, 2023
yxieca pushed a commit that referenced this pull request Jul 7, 2023
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Jul 7, 2023
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Jul 7, 2023
mssonicbld added a commit that referenced this pull request Jul 7, 2023
mssonicbld added a commit that referenced this pull request Jul 7, 2023
vaibhavhd added a commit to sonic-net/sonic-utilities that referenced this pull request Jul 24, 2023
…sku is None (#2896)

Cherry pick of #2821

MSFT ADO: 17972494

Fix errors in db migration when hwsku is not detected.
This PR is adds a better-error-handling fix for the issue that is fixed by: sonic-net/sonic-buildimage#14933

May  2 20:35:04 sonic database.sh[649]: Creating new database container
May  2 20:35:04 sonic database.sh[663]: 99e8edba01ed0c7581f0d61dd2fa78374fa4f23e636a957004dd03a6f68eea86
May  2 20:35:04 sonic root: Starting database service...
May  2 20:35:06 sonic database.sh[690]: database
May  2 20:35:10 sonic database.sh[926]: True

May  2 20:35:10 sonic database.sh[928]:   File "/usr/local/bin/db_migrator.py", line 714, in common_migration_ops
May  2 20:35:10 sonic database.sh[928]:   File "/usr/local/bin/db_migrator.py", line 741, in migrate
May  2 20:35:10 sonic database.sh[928]:   File "/usr/local/bin/db_migrator.py", line 782, in main
May  2 20:35:10 sonic database.sh[928]: Traceback (most recent call last):
May  2 20:35:10 sonic database.sh[928]: TypeError: argument of type 'NoneType' is not iterable
May  2 20:35:10 sonic database.sh[928]: argument of type 'NoneType' is not iterable
May  2 20:35:10 sonic database.sh[928]: optional arguments:
May  2 20:35:10 sonic database.sh[928]: usage: db_migrator.py [-h] [-o operation migrate, set_version, get_version]
May  2 20:35:10 sonic db_migrator: :- operator(): DB '{APPL_DB}' is empty with pattern 'COPP_TABLE:*'!
May  2 20:35:10 sonic db_migrator: :- operator(): DB '{APPL_DB}' is empty with pattern 'INTF_TABLE:*'!
May  2 20:35:10 sonic db_migrator: :- operator(): Key 'BUFFER_MAX_PARAM_TABLE|global' field 'mmu_size' unavailable in database 'STATE_DB'
May  2 20:35:10 sonic db_migrator: :- operator(): Key 'WARM_RESTART_ENABLE_TABLE|system' field 'enable' unavailable in database 'STATE_DB'
May  2 20:35:10 sonic db_migrator: Caught exception: argument of type 'NoneType' is not iterable

May  2 20:35:11 sonic config-setup[935]: Copying SONiC configuration minigraph.xml ...
May  2 20:35:11 sonic config-setup[935]: Reloading minigraph...
May  2 20:35:11 sonic config-setup[935]: Use minigraph.xml from old system...

May  2 20:35:11 sonic root: Started database service...
How I did it
Convert hwsku's type to str before checking substring.
Add error logs when hwsku and asic type information is not obtained.

How to verify it
Tested on a physical device
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Aug 14, 2023
mssonicbld pushed a commit that referenced this pull request Aug 14, 2023
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Aug 21, 2023
mssonicbld added a commit that referenced this pull request Aug 21, 2023
StormLiangMS pushed a commit that referenced this pull request Aug 24, 2023
…mboot (#15685) (#16217)

Cherypick of #15685

MSFT ADO: 24274591

Why I did it
Two changes:

1 Fix a day1 issue, where check to wait until CONFIG_DB_INITIALIZED is incorrect.
There are multiple places where same incorrect logic is used.

Current logic (until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];) will always result in pass, irrespective of the result of GET operation.

root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 

root@str2-7060cx-32s-29:~# 
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"                                             
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 
Fix this logic by checking for value of flag to be "1".

root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here
This gap in logic was highlighted when another fix was merged: #14933
The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.

2 Set and unset CONFIG_DB_INITIALIZED for warm-reboot case
Currently, during warm shutdown CONFIG_DB_INITIALIZED's value is stored in redis db backup. This is restored back when the dump is loaded during warm-recovery.
So the value of CONFIG_DB_INITIALIZED does not depend on config db's state, however it remain what it was before reboot.

Fix this by setting CONFIG_DB_INITIALIZED to 0 as when the DB is loaded, and set it to 1 after db_migrator is done.

Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
qiluo-msft pushed a commit that referenced this pull request Aug 24, 2023
…g for warmboot (#16225)

Cherry pick of #15685

MSFT ADO: 24274591

#### Why I did it

Two changes:
### 1  Fix a day1 issue, where check to wait until `CONFIG_DB_INITIALIZED` is incorrect.
There are multiple places where same incorrect logic is used.

Current logic (`until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];`) will always result in pass, irrespective of the result of GET operation.
```
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 

root@str2-7060cx-32s-29:~# 
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"                                             
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~# 
```

Fix this logic by checking for value of flag to be "1".
```
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here
```

This gap in logic was highlighted when another fix was merged: #14933
The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.

### 2 Set and unset CONFIG_DB_INITIALIZED for warm-reboot case

Currently, during warm shutdown `CONFIG_DB_INITIALIZED`'s value is stored in redis db backup. This is restored back when the dump is loaded during warm-recovery.
So the value of `CONFIG_DB_INITIALIZED` does not depend on config db's state, however it remain what it was before reboot.

Fix this by setting `CONFIG_DB_INITIALIZED` to 0 as when the DB is loaded, and set it to 1 after db_migrator is done.
sonic-otn pushed a commit to sonic-otn/sonic-buildimage that referenced this pull request Sep 20, 2023
…ded (sonic-net#14933)

Why I did it
Fix the issue where db_migrator is called before DB is loaded w/ config. This leads to db_migrator:

Not finding anything, and resumes to incorrectly migrate every missing config
This is not expected. migration should happen after the old config is loaded and only new schema changes need migration.
Since DB does not have anything when migrator is called, db_migrator fails when some APIs return None.
The reason for incorrect call is that:

database service starts db_migrator as part of startup sequence.
config-setup service loads data from old-config/minigraph. However, since it has Requires=database.service.
Hence, config-setup starts only when database service is started. And database service is started when db_migrator is completed.
Fixed by:

Check if this is first time boot by checking pending_config_migration flag.
If pending_config_migration is enabled, then do not call db_migrator as part of database service startup.
Let database service start which triggers config-setup service to start.
Now call db_migrator after when config-setup service loads old-config/minigraph
sonic-otn pushed a commit to sonic-otn/sonic-buildimage that referenced this pull request Sep 20, 2023
…g is loaded (sonic-net#14933)" (sonic-net#15464)

This reverts commit 02b1783.

Reverts sonic-net#14933

The earlier commit caused a race condition that particularly broke cross branch warm upgrade.

Issue happens when db_migrator is still migrating the DB and finalizer is checking DB for list of components to reconcile.

If migration is not complete, finalizer get an empty list to wait for. Due to this, finalizer concludes warmboot (deletes system wide warmboot flag) and cause all the services to do cold restart.

ADO: 24274591
sonic-otn pushed a commit to sonic-otn/sonic-buildimage that referenced this pull request Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants