Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Fix] docker_image_ctl.j2: remove the logic that recreates database containers on VS platform #21089

Merged
merged 3 commits into from
Dec 23, 2024

Conversation

BYGX-wcr
Copy link
Contributor

@BYGX-wcr BYGX-wcr commented Dec 7, 2024

Why I did it

To support the emulation of VS chassis, we need to remove the existing critical service containers before transforming the HwSKU of the VS device. A previous PR #18512 introduced a change to the docker_image_ctl.j2 that forces VS images to recreate database containers every time the OS is cold started while the behaviors of other containers(swss/bgp/teamd/syncd) remained unchanged. As a consequence, when the VS device is rebooted without proper human intervention, the database containers will be recreated while the other services will reuse existing containers. That can cause the swss/bgp/syncd containers to become invalid if the database containers get recreated with a different container ID, because swss/bgp/syncd containers are configured to use the database containers as the underlying networking stack.

By further investigation, we have found that it is not necessary to recreate the database containers in /usr/bin/database.sh to perform HwSKU transformation. So, we should remove this logic.

Work item tracking
  • Microsoft ADO (number only): 30454307

How I did it

Remove the code that removes existing database containers in /usr/bin/database.sh

How to verify it

  • Build VS image and install it on a KVM sonic-vs
  • Reboot the sonic-vs multiple times and check if all the containers are started successfully every time.

Which release branch to backport (provide reason below if selected)

The original change was introduced into 202205 branch so we need to backport this fix.

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305
  • 202405

Tested branch (Please provide the tested image version)

  • 20220532.72

Description for the changelog

docker_image_ctl.j2: change to not remove existing database containers on VS platform

@BYGX-wcr BYGX-wcr requested a review from lguohan as a code owner December 7, 2024 19:00
@BYGX-wcr BYGX-wcr requested review from deepak-singhal0408 and abdosi and removed request for lguohan December 7, 2024 19:00
@liushilongbuaa
Copy link
Contributor

/azpw ms_conflict

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BYGX-wcr BYGX-wcr changed the title [Bug Fix] docker_image_ctl.j2: change to recreate every docker container when the asic type is vs [Bug Fix] docker_image_ctl.j2: remove the logic that recreates database containers on VS platform Dec 20, 2024
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BYGX-wcr BYGX-wcr requested review from arlakshm and rlhui December 20, 2024 00:08
Copy link
Contributor

@arlakshm arlakshm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@deepak-singhal0408
Copy link
Contributor

@BYGX-wcr Have you tried single asic to multi-asic transition?

@BYGX-wcr
Copy link
Contributor Author

@BYGX-wcr Have you tried single asic to multi-asic transition?

Yes

@deepak-singhal0408
Copy link
Contributor

@BYGX-wcr , as discussed, please update the description, on whats needed for multi-asic transition (remove database container before reboot).? Also make sure the change works on our internal emulation. thanks

@BYGX-wcr
Copy link
Contributor Author

@deepak-singhal0408 , I have updated the description of the PR.

@rlhui rlhui merged commit 6058cfc into sonic-net:master Dec 23, 2024
22 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Dec 24, 2024
…se containers on VS platform (sonic-net#21089)

Why I did it
To support the emulation of VS chassis, we need to remove the existing critical service containers before transforming the HwSKU of the VS device. A previous PR sonic-net#18512 introduced a change to the docker_image_ctl.j2 that forces VS images to recreate database containers every time the OS is cold started while the behaviors of other containers(swss/bgp/teamd/syncd) remained unchanged. As a consequence, when the VS device is rebooted without proper human intervention, the database containers will be recreated while the other services will reuse existing containers. That can cause the swss/bgp/syncd containers to become invalid if the database containers get recreated with a different container ID, because swss/bgp/syncd containers are configured to use the database containers as the underlying networking stack.

By further investigation, we have found that it is not necessary to recreate the database containers in /usr/bin/database.sh to perform HwSKU transformation. So, we should remove this logic.
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202405: #21276

mssonicbld pushed a commit that referenced this pull request Dec 25, 2024
…se containers on VS platform (#21089)

Why I did it
To support the emulation of VS chassis, we need to remove the existing critical service containers before transforming the HwSKU of the VS device. A previous PR #18512 introduced a change to the docker_image_ctl.j2 that forces VS images to recreate database containers every time the OS is cold started while the behaviors of other containers(swss/bgp/teamd/syncd) remained unchanged. As a consequence, when the VS device is rebooted without proper human intervention, the database containers will be recreated while the other services will reuse existing containers. That can cause the swss/bgp/syncd containers to become invalid if the database containers get recreated with a different container ID, because swss/bgp/syncd containers are configured to use the database containers as the underlying networking stack.

By further investigation, we have found that it is not necessary to recreate the database containers in /usr/bin/database.sh to perform HwSKU transformation. So, we should remove this logic.
github-actions bot pushed a commit to bradh352/sonic-buildimage that referenced this pull request Jan 2, 2025
…se containers on VS platform (sonic-net#21089)

Why I did it
To support the emulation of VS chassis, we need to remove the existing critical service containers before transforming the HwSKU of the VS device. A previous PR sonic-net#18512 introduced a change to the docker_image_ctl.j2 that forces VS images to recreate database containers every time the OS is cold started while the behaviors of other containers(swss/bgp/teamd/syncd) remained unchanged. As a consequence, when the VS device is rebooted without proper human intervention, the database containers will be recreated while the other services will reuse existing containers. That can cause the swss/bgp/syncd containers to become invalid if the database containers get recreated with a different container ID, because swss/bgp/syncd containers are configured to use the database containers as the underlying networking stack.

By further investigation, we have found that it is not necessary to recreate the database containers in /usr/bin/database.sh to perform HwSKU transformation. So, we should remove this logic.
github-actions bot pushed a commit to bradh352/sonic-buildimage that referenced this pull request Jan 2, 2025
…se containers on VS platform (sonic-net#21089)

Why I did it
To support the emulation of VS chassis, we need to remove the existing critical service containers before transforming the HwSKU of the VS device. A previous PR sonic-net#18512 introduced a change to the docker_image_ctl.j2 that forces VS images to recreate database containers every time the OS is cold started while the behaviors of other containers(swss/bgp/teamd/syncd) remained unchanged. As a consequence, when the VS device is rebooted without proper human intervention, the database containers will be recreated while the other services will reuse existing containers. That can cause the swss/bgp/syncd containers to become invalid if the database containers get recreated with a different container ID, because swss/bgp/syncd containers are configured to use the database containers as the underlying networking stack.

By further investigation, we have found that it is not necessary to recreate the database containers in /usr/bin/database.sh to perform HwSKU transformation. So, we should remove this logic.
github-actions bot pushed a commit to bradh352/sonic-buildimage that referenced this pull request Jan 2, 2025
…se containers on VS platform (sonic-net#21089)

Why I did it
To support the emulation of VS chassis, we need to remove the existing critical service containers before transforming the HwSKU of the VS device. A previous PR sonic-net#18512 introduced a change to the docker_image_ctl.j2 that forces VS images to recreate database containers every time the OS is cold started while the behaviors of other containers(swss/bgp/teamd/syncd) remained unchanged. As a consequence, when the VS device is rebooted without proper human intervention, the database containers will be recreated while the other services will reuse existing containers. That can cause the swss/bgp/syncd containers to become invalid if the database containers get recreated with a different container ID, because swss/bgp/syncd containers are configured to use the database containers as the underlying networking stack.

By further investigation, we have found that it is not necessary to recreate the database containers in /usr/bin/database.sh to perform HwSKU transformation. So, we should remove this logic.
@BYGX-wcr BYGX-wcr deleted the fix_docker_image_ctl branch January 17, 2025 20:05
VladimirKuk pushed a commit to Marvell-switching/sonic-buildimage that referenced this pull request Jan 21, 2025
…se containers on VS platform (sonic-net#21089)

Why I did it
To support the emulation of VS chassis, we need to remove the existing critical service containers before transforming the HwSKU of the VS device. A previous PR sonic-net#18512 introduced a change to the docker_image_ctl.j2 that forces VS images to recreate database containers every time the OS is cold started while the behaviors of other containers(swss/bgp/teamd/syncd) remained unchanged. As a consequence, when the VS device is rebooted without proper human intervention, the database containers will be recreated while the other services will reuse existing containers. That can cause the swss/bgp/syncd containers to become invalid if the database containers get recreated with a different container ID, because swss/bgp/syncd containers are configured to use the database containers as the underlying networking stack.

By further investigation, we have found that it is not necessary to recreate the database containers in /usr/bin/database.sh to perform HwSKU transformation. So, we should remove this logic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants