
Warm reboot: restore the database docker with content saved #2216

Merged: 8 commits merged Nov 2, 2018


qiluo-msft
Collaborator

@qiluo-msft qiluo-msft commented Oct 31, 2018

Restore the database docker with the content saved by the 'warm-reboot' command. If anything fails, the database service fails immediately.

Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
Co-Authored-By: qiluo-msft <qiluo-msft@users.noreply.github.com>
```shell
if [[ "$REBOOT_TYPE" == "warm" && -d /host/warmboot ]]; then
WARM_DIR=/host/warmboot
function redisLoadAndDelete()
{
```
Contributor

@yxieca yxieca Oct 31, 2018


This function also needs to take the database ID as a parameter. #Resolved

```shell
function redisLoadAndDelete()
{
FILENAME="$1"
test -e $FILENAME && redis-load -s /var/run/redis/redis.sock -e EMPTY $FILENAME && rm $FILENAME
```
Contributor

@yxieca yxieca Oct 31, 2018


A few issues from testing:

  • rm always fails in this function; you need to use "sudo rm" for it to work.
  • "-s /var/run/redis/redis.sock" causes the import to always fail. Removing this option works better.
  • The import fails randomly. I am still looking for a way to make it work reliably; this service is crucial, so it has to be reliable.
  • I don't think you should use the '&&' chain. We want to remove these files regardless of whether the import succeeded, right? I don't think we should retry warm boot if any failure was encountered. #Resolved
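Taken together with the DB-ID request above, these review points suggest a helper along the following lines. This is a minimal sketch, not the code that was merged: it takes the Redis DB ID as its first argument, drops the socket option, and removes the dump file whether or not the load succeeded. `load_json_into_db` is a hypothetical wrapper, and passing the DB number with `-d` is an assumption to confirm against the installed `redis-load`; `$SUDO` mirrors the variable used later in the diff.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the real loader.
# ASSUMPTION: redis-load selects the target database with -d;
# confirm with `redis-load --help` on the image.
load_json_into_db()
{
    local db="$1" file="$2"
    redis-load -d "$db" "$file"
}

# redisLoadAndDelete DB_ID FILE
# Loads FILE into Redis database DB_ID, then removes FILE even if the load
# failed, so a stale dump is not replayed on a later boot.
# Returns the loader's exit status (0 when there is no file to load).
redisLoadAndDelete()
{
    local db="$1" file="$2" rc=0

    # Nothing to restore for this DB if the dump file is absent.
    [[ -e "$file" ]] || return 0

    # No '&&' chain: record the load status but always fall through to rm.
    load_json_into_db "$db" "$file" || rc=$?

    # $SUDO is empty when already running as root;
    # rm -f does not fail if the file is already gone.
    ${SUDO:-} rm -f "$file"
    return "$rc"
}
```

A caller that prefers the fail-fast behavior can still write `redisLoadAndDelete 6 "$WARM_DIR/state_db.json" || exit 1`.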

Contributor

@yxieca yxieca Oct 31, 2018


Just a thought:

Maybe we should catch these DB restore failures and, in case of failure, clear the database and continue with a regular boot? #Resolved
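The alternative floated here, falling back to a regular (cold) boot rather than failing the database service, might be sketched as follows. It shows control flow only and is not what the PR implements: the restore step is passed in as a command, and `flush_all_dbs` is a hypothetical helper built on `redis-cli flushall`; whether `redis-cli` reaches the right server with its default connection settings on a given device is an assumption.

```shell
#!/usr/bin/env bash

# Hypothetical helper: clear every Redis database so no partially restored
# state survives. ASSUMPTION: redis-cli's default connection reaches the
# database server; on a socket-only setup, add `-s <socket path>`.
flush_all_dbs()
{
    redis-cli flushall >/dev/null
}

# warmRestoreOrFallback BOOT_TYPE RESTORE_CMD [ARGS...]
# For a warm boot, runs RESTORE_CMD. Prints the effective boot type:
# "warm" if the restore succeeded, "cold" if it failed and the DBs were
# flushed. Returns non-zero only if flushing itself fails.
warmRestoreOrFallback()
{
    local boot_type="$1"
    shift
    if [[ "$boot_type" == "warm" ]] && ! "$@"; then
        echo "warm restore failed; flushing DBs, continuing as cold boot" >&2
        flush_all_dbs || return 1
        boot_type="cold"
    fi
    echo "$boot_type"
}
```

For example, `BOOT_TYPE=$(warmRestoreOrFallback "$BOOT_TYPE" my_restore_steps)`, where `my_restore_steps` is a hypothetical function that must not write to stdout, since its output would be captured along with the boot type.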

Collaborator Author


  • rm fixed.
  • redis-load fixed. If you hit any more failure cases, let me know.
  • I can't agree to making it retry blindly. I made it exit immediately; if there is an error in the normal case, we should fix it.

In reply to: 229778573

Collaborator Author


I made it exit immediately; if there is an error in the normal case, we should fix it.


In reply to: 229780996

Contributor


My concern is that if the database service fails in production, the device will be in a failed state while the ASIC is still forwarding. I am not sure this is better than coming up with a cold start and suffering a short I/O disruption.

yxieca and others added 2 commits October 31, 2018 16:57
Co-Authored-By: qiluo-msft <qiluo-msft@users.noreply.github.com>
Co-Authored-By: qiluo-msft <qiluo-msft@users.noreply.github.com>
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
@lguohan
Collaborator

lguohan commented Nov 1, 2018

@qiluo-msft, can you provide a description for your commit? #Resolved

```shell
echo $1 | python -c "import sys, json, os; mnts = [x for x in json.load(sys.stdin)[0]['Mounts'] if x['Destination'] == '/usr/share/sonic/hwsku']; print '' if len(mnts) == 0 else os.path.basename(mnts[0]['Source'])" 2>/dev/null
}

function getRebootType()
```
Collaborator

@lguohan lguohan Nov 1, 2018


Suggest renaming to getBootType. #Resolved

```shell
}

function postStartAction()
{
REBOOT_TYPE=`getRebootType`
```
Collaborator

@lguohan lguohan Nov 1, 2018


Suggest renaming to BOOT_TYPE. #Resolved

```shell
$SUDO rm $FILENAME || exit 12
}
# Load applDB from /host/warm-reboot/appl_db.json
redisLoadAndDelete $WARM_DIR/appl_db.json
```
Collaborator

@lguohan lguohan Nov 1, 2018


Where is the DB argument? #Resolved

Collaborator

@lguohan lguohan left a comment


See the inline comments.

Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
@qiluo-msft qiluo-msft changed the title from "Warm reboot: database docker" to "Warm reboot: restore the database docker with content saved" Nov 1, 2018
Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
@lguohan lguohan merged commit 8b67424 into sonic-net:master Nov 2, 2018
@stcheng stcheng deleted the qiluo/warmdb branch November 2, 2018 16:25
```shell
# Load stateDB from /host/warm-reboot/state_db.json
redisLoadAndDelete 6 $WARM_DIR/state_db.json
# Load asicDB from /host/warm-reboot/asic_db.json
redisLoadAndDelete 1 $WARM_DIR/asic_db.json
```
Contributor


Another thought: I think we should check that all the files exist before proceeding with the restoration. If any file is missing, something is wrong. We should restore all or nothing. Do you agree?
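The existence check proposed above could be sketched like this: confirm every dump file is present before loading any of them, and load nothing if one is missing. The file names and the STATE_DB (6) and ASIC_DB (1) IDs come from the diff; APPL_DB is 0 in SONiC's standard numbering. `load_json_into_db` is a hypothetical wrapper, and its use of `-d` to select the DB is an assumption. Note that a load failing partway through would still leave partial state; this only guards the missing-file case.

```shell
#!/usr/bin/env bash

# Hypothetical loader wrapper.
# ASSUMPTION: redis-load takes the DB number via -d.
load_json_into_db()
{
    redis-load -d "$1" "$2"
}

# restoreWarmDbsIfComplete WARM_DIR
# Returns 1 (loading nothing) if any dump file is missing,
# 2 if a load fails, 0 if every file was loaded.
# Requires bash 4+ for associative arrays.
restoreWarmDbsIfComplete()
{
    local dir="$1" name
    # Dump file name -> Redis database ID (APPL_DB=0, ASIC_DB=1, STATE_DB=6).
    local -A db_of=([appl_db.json]=0 [asic_db.json]=1 [state_db.json]=6)

    # Pass 1: check that every file exists before touching Redis.
    for name in "${!db_of[@]}"; do
        if [[ ! -e "$dir/$name" ]]; then
            echo "missing $dir/$name; skipping warm restore" >&2
            return 1
        fi
    done

    # Pass 2: load each file; stop at the first failure.
    for name in "${!db_of[@]}"; do
        load_json_into_db "${db_of[$name]}" "$dir/$name" || return 2
    done
}
```

Keeping the two passes separate means a missing file is detected before any database has been modified.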

Collaborator Author


The current implementation treats this case as a service start failure. We can refine it later with more robust recovery.

saiarcot895 added a commit to saiarcot895/sonic-buildimage that referenced this pull request Apr 5, 2022
This submodule update brings in the following changes:

```
50d5be2 Make changes to support compiling on Bullseye with GCC 10 (sonic-net#2216)
0870cf5 [mirrororch]: Implement HW resources availability validation for SPAN/ERSPAN (sonic-net#2187)
f4ec565 [vlanmgrd] fix use-after-free memory issue (sonic-net#2211)
c2de7fc [QosOrch] The notifications cannot be drained in QosOrch in case the first one needs to retry (sonic-net#2206)
5575935 [neighsyncd] increase neighsyncd timeout (sonic-net#2209)
0f06910 [PBH] Implement Edit Flows (sonic-net#2169)
6241bbf Remove redundant and problematic code to skip "pool" field in buffer profile handling (sonic-net#2197)
a55343c [azp]: Set diff coverage threshhold to 80% (sonic-net#2188)
390cae1 [portsorch]: Prevent LAG member configuration when port has active ACL binding (sonic-net#2165)
c1d47e6 [VNET]Fixing nexthop group delete during route change (sonic-net#2198)
8941cc0 [BFD]Registering BFD state change callback during session creation (sonic-net#2202)
680c539 [vxlan] Remove tunnel map objects on VNET tunnel removal (sonic-net#2150)
20dde0c Fix for handling broadcom DNX ASIC to have ipv4 and ipv6 ACL rules in separate tables. (sonic-net#2178)
5b7c949 [FdbOrch] SAI_FDB_EVENT_MOVE generates update with empty update.entry.port_name (sonic-net#2200)
7350d49 [Vxlanmgr] vnet netdev cleanup during config reload fix (sonic-net#2191)
2bef62b Validate LAG has members before mirror session create (sonic-net#2130)
1e4d4ce [VS test] Increase VS test time, skip dpb flaky test (sonic-net#2195)
6eda965 [vstest]Migrating vs tests from using click commands to direct DB access (sonic-net#2179)
```

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
theasianpianist added a commit to theasianpianist/sonic-buildimage that referenced this pull request Apr 6, 2022
50d5be2 (HEAD, origin/master, origin/HEAD) Make changes to support compiling on Bullseye with GCC 10 (sonic-net#2216)
0870cf5 [mirrororch]: Implement HW resources availability validation for SPAN/ERSPAN (sonic-net#2187)
f4ec565 [vlanmgrd] fix use-after-free memory issue (sonic-net#2211)
c2de7fc [QosOrch] The notifications cannot be drained in QosOrch in case the first one needs to retry (sonic-net#2206)
5575935 [neighsyncd] increase neighsyncd timeout (sonic-net#2209)
0f06910 (master) [PBH] Implement Edit Flows (sonic-net#2169)
6241bbf Remove redundant and problematic code to skip "pool" field in buffer profile handling (sonic-net#2197)
a55343c [azp]: Set diff coverage threshhold to 80% (sonic-net#2188)
390cae1 [portsorch]: Prevent LAG member configuration when port has active ACL binding (sonic-net#2165)
c1d47e6 [VNET]Fixing nexthop group delete during route change (sonic-net#2198)
8941cc0 [BFD]Registering BFD state change callback during session creation (sonic-net#2202)
680c539 [vxlan] Remove tunnel map objects on VNET tunnel removal (sonic-net#2150)
20dde0c Fix for handling broadcom DNX ASIC to have ipv4 and ipv6 ACL rules in separate tables. (sonic-net#2178)
5b7c949 [FdbOrch] SAI_FDB_EVENT_MOVE generates update with empty update.entry.port_name (sonic-net#2200)
7350d49 [Vxlanmgr] vnet netdev cleanup during config reload fix (sonic-net#2191)
2bef62b Validate LAG has members before mirror session create (sonic-net#2130)
1e4d4ce [VS test] Increase VS test time, skip dpb flaky test (sonic-net#2195)
6eda965 [vstest]Migrating vs tests from using click commands to direct DB access (sonic-net#2179)

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
liat-grozovik pushed a commit that referenced this pull request Apr 7, 2022
In order to include the following commit:
0f06910 [PBH] Implement Edit Flows (sonic-net/sonic-swss#2169)

sonic-swss

50d5be2 Make changes to support compiling on Bullseye with GCC 10 (#2216)
0870cf5 [mirrororch]: Implement HW resources availability validation for SPAN/ERSPAN (#2187)
f4ec565 [vlanmgrd] fix use-after-free memory issue (#2211)
c2de7fc [QosOrch] The notifications cannot be drained in QosOrch in case the first one needs to retry (#2206)
5575935 [neighsyncd] increase neighsyncd timeout (#2209)
0f06910 [PBH] Implement Edit Flows (#2169)
6241bbf Remove redundant and problematic code to skip "pool" field in buffer profile handling (#2197)
a55343c [azp]: Set diff coverage threshhold to 80% (#2188)
390cae1 [portsorch]: Prevent LAG member configuration when port has active ACL binding (#2165)
c1d47e6 [VNET]Fixing nexthop group delete during route change (#2198)
8941cc0 [BFD]Registering BFD state change callback during session creation (#2202)
680c539 [vxlan] Remove tunnel map objects on VNET tunnel removal (#2150)
20dde0c Fix for handling broadcom DNX ASIC to have ipv4 and ipv6 ACL rules in separate tables. (#2178)
5b7c949 [FdbOrch] SAI_FDB_EVENT_MOVE generates update with empty update.entry.port_name (#2200)
7350d49 [Vxlanmgr] vnet netdev cleanup during config reload fix (#2191)
2bef62b Validate LAG has members before mirror session create (#2130)
1e4d4ce [VS test] Increase VS test time, skip dpb flaky test (#2195)
6eda965 [vstest]Migrating vs tests from using click commands to direct DB access (#2179)

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Ndancejic pushed a commit to Ndancejic/sonic-buildimage that referenced this pull request May 3, 2022
…2216)

Types of changes done:
* Add missing includes in header files and .cpp files
* Don't use parentheses when doing list initialization in constructors
* Make sure variables are initialized before first use

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
liushilongbuaa pushed a commit to liushilongbuaa/sonic-buildimage that referenced this pull request Jun 20, 2022
Related work items: #49, #58, #107, sonic-net#247, sonic-net#249, sonic-net#277, sonic-net#593, sonic-net#597, sonic-net#1035, sonic-net#2130, sonic-net#2150, sonic-net#2165, sonic-net#2169, sonic-net#2178, sonic-net#2179, sonic-net#2187, sonic-net#2188, sonic-net#2191, sonic-net#2195, sonic-net#2197, sonic-net#2198, sonic-net#2200, sonic-net#2202, sonic-net#2206, sonic-net#2209, sonic-net#2211, sonic-net#2216, sonic-net#7909, sonic-net#8927, sonic-net#9681, sonic-net#9733, sonic-net#9746, sonic-net#9850, sonic-net#9967, sonic-net#10104, sonic-net#10152, sonic-net#10168, sonic-net#10228, sonic-net#10266, sonic-net#10288, sonic-net#10294, sonic-net#10313, sonic-net#10394, sonic-net#10403, sonic-net#10404, sonic-net#10421, sonic-net#10431, sonic-net#10437, sonic-net#10445, sonic-net#10457, sonic-net#10458, sonic-net#10465, sonic-net#10467, sonic-net#10469, sonic-net#10470, sonic-net#10474, sonic-net#10477, sonic-net#10478, sonic-net#10482, sonic-net#10485, sonic-net#10488, sonic-net#10489, sonic-net#10492, sonic-net#10494, sonic-net#10498, sonic-net#10501, sonic-net#10509, sonic-net#10512, sonic-net#10514, sonic-net#10516, sonic-net#10517, sonic-net#10523, sonic-net#10525, sonic-net#10531, sonic-net#10532, sonic-net#10538, sonic-net#10555, sonic-net#10557, sonic-net#10559, sonic-net#10561, sonic-net#10565, sonic-net#10572, sonic-net#10574, sonic-net#10576, sonic-net#10578, sonic-net#10581, sonic-net#10585, sonic-net#10587, sonic-net#10599, sonic-net#10607, sonic-net#10611, sonic-net#10616, sonic-net#10618, sonic-net#10619, sonic-net#10623, sonic-net#10624, sonic-net#10633, sonic-net#10646, sonic-net#10655, sonic-net#10660, sonic-net#10664, sonic-net#10680, sonic-net#10683
4 participants