[KVM][warm reboot] syncd crash when getting virtual router ID #6509
Comments
We need the redis monitor logs from this crash to confirm whether the DB was cleared during that time. |
Try using "redis-cli monitor". |
It seems that |
If you ran this in the background from system start, we could at least confirm which commands are clearing the DB. |
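As a rough illustration of the suggestion above, the captured monitor output could be scanned for flush commands afterwards. This is only a sketch: it assumes the output of "redis-cli monitor" was redirected to a file, that lines follow the usual `timestamp [db client] "CMD" ...` layout, and that the DB is cleared with FLUSHDB or FLUSHALL (it could also be cleared with DEL or a script, which this does not catch). The names `MONITOR_LINE` and `find_flushes` are hypothetical helpers, not part of SONiC or Redis.

```python
import re
from typing import Iterable, List, Tuple

# One line of `redis-cli monitor` output looks like:
#   1339518083.107412 [0 127.0.0.1:60866] "keys" "*"
MONITOR_LINE = re.compile(
    r'^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<client>[^\]]+)\] "(?P<cmd>[^"]+)"'
)

# Assumption: only these commands are treated as "clearing the DB".
FLUSH_COMMANDS = {"FLUSHDB", "FLUSHALL"}


def find_flushes(lines: Iterable[str]) -> List[Tuple[float, int, str, str]]:
    """Return (timestamp, db index, client, command) for every flush seen."""
    hits = []
    for line in lines:
        match = MONITOR_LINE.match(line.strip())
        if not match:
            continue  # skip "OK" and anything that is not a command line
        command = match.group("cmd").upper()
        if command in FLUSH_COMMANDS:
            hits.append(
                (
                    float(match.group("ts")),
                    int(match.group("db")),
                    match.group("client"),
                    command,
                )
            )
    return hits
```

Calling `find_flushes(open("monitor.log"))` on such a capture (the file name is a placeholder) would list each flush with its timestamp and DB index, which can then be compared against the syncd crash time in syslog.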
Do you happen to have the full syslog from that event, including the lines before the beginning of what you pasted? There could be some hints from syncd during startup in the syslog. |
|
OK, I found something like this in syslog (take a look at the timestamps):
root: Flushing APP, ASIC, COUNTER, CONFIG, and partial STATE databases
This is the issue: ASIC DB is cleared. Now we need to figure out why this is happening. So this operation is actually a COLD BOOT, not a warm boot as you suggested, since both swss.sh and syncd are started in cold boot mode.
If ASIC DB were empty at cold boot start and no switch were present, syncd would create a new switch in TEMP_VIEW. Depending on what operation we want in cold boot, the DB should probably be cleared before syncd starts. |
It seems that either
|
I think I understand the cause of this issue. Another issue, #6383, makes the warm reboot finalizer wait 5 minutes for a component that is not enabled. As a result, the finalizer clears the warmboot flag in state_db only about 5 minutes after the switch boots up. If a second warm reboot request arrives right before the flag is cleared, the flag may change while things are going down, and some services may get the wrong shutdown type. To address this, I think we need to check whether the warm reboot finalizer is still in progress when a warm reboot is requested. |
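The proposed guard could look roughly like the sketch below. It is not the actual fix. It assumes the finalizer runs as a systemd unit called `warmboot-finalizer.service` (an assumed name) and that the unit does not stay "active" after it finishes. The helpers `unit_state`, `finalizer_in_progress` and `may_request_warm_reboot` are hypothetical.

```python
import subprocess
from typing import Callable

# Assumption: the warm reboot finalizer is a systemd unit with this name.
FINALIZER_UNIT = "warmboot-finalizer.service"


def unit_state(unit: str) -> str:
    """Return the state printed by `systemctl is-active <unit>`."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit just means the unit is not active
    )
    return result.stdout.strip()


def finalizer_in_progress(
    get_state: Callable[[str], str] = unit_state,
    unit: str = FINALIZER_UNIT,
) -> bool:
    """Treat a running or starting finalizer unit as still in progress."""
    return get_state(unit) in {"active", "activating"}


def may_request_warm_reboot(
    get_state: Callable[[str], str] = unit_state,
) -> bool:
    """Allow a new warm reboot only once the previous finalizer is done."""
    return not finalizer_in_progress(get_state)
```

The state lookup is passed in as a callable so the check can be exercised without systemd. A warm reboot script would refuse to proceed, with an explanatory message, while `may_request_warm_reboot()` returns False.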
We can sync up with Ying on that too. |
Description
Steps to reproduce the issue:
Describe the results you received:
Syncd fails when getting the default virtual router ID in warm boot.
Recording
Syslog
Describe the results you expected:
Crash-free warm reboot.
Additional information you deem important (e.g. issue happens only occasionally):