-
Notifications
You must be signed in to change notification settings - Fork 95
Make configuration DB location survive ESXi reboot #1401
Conversation
esx_service/cli/local_sh.py
Outdated
# This is what we use to identify the our content for DB links.. | ||
CONFIG_DB_TAG = "# -- vSphere Docker Volume Service configuration --" | ||
|
||
# This is the content tempate for db links. '{}' will be replaced by datastore name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: template
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have an e2e test to verify this scenario. Can we file a issue to track the task of adding an e2e test?
esx_service/cli/local_sh.py
Outdated
""" | ||
Support for adding/removing information about config db link from /etc/rc.local.d/local.sh | ||
Any config stuff we need and configure in /etc/... will be removed on ESX reboot. | ||
Anything we need to persist between the reboots, needs to be confugured here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: confugured => configured
esx_service/cli/local_sh.py
Outdated
# We need to insert the content before it. | ||
END_OF_SCRIPT = "exit 0" | ||
|
||
# This is what we use to identify the our content for DB links.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "the our" => "the"
esx_service/cli/local_sh.py
Outdated
datastore={} | ||
|
||
slink=/etc/vmware/vmdkops/auth-db | ||
shared_dbn=/vmfs/volumes/$datastore/dockvols/vmdkops_config.db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shared_dbn => shared_db?
esx_service/cli/local_sh.py
Outdated
|
||
# open file and scan it. | ||
# | ||
# if we reached "exit 0" add the text section, add the rest of the file and be done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
- Let's keep the tense consistent: reached => reach (since we are using "find" below)
- What does "add the rest of the file and be done" mean?
# requested content just in case. | ||
if add: | ||
sys.stdout.write(content) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove one empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want 2 newline between functions
esx_service/cli/local_sh_test.py
Outdated
import shutil | ||
|
||
class TestLocalShInfo(unittest.TestCase): | ||
""" Basic test for saving cofig DB link uising local.sh. The test checks saving |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: cofig => config; uising => using; fie => file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Darn, my spellchecker was off. Thanks for the catch. Fixed
esx_service/cli/local_sh_test.py
Outdated
|
||
# Basic test: add new content. Replace it. Remove it. | ||
# Compare with original content - should be the same. | ||
# Also, on neach step check some pattern in the current file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: neach => each?
else: | ||
print("local.sh update/remove test - All good") | ||
os.remove(name) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove one empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Sam, will do (will fix all nits and file an issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
couple of nits and a question about inplace update of the file.
esx_service/cli/vmdkops_admin.py
Outdated
@@ -1343,16 +1349,19 @@ def config_rm(args): | |||
print("Removed link {}".format(link_path)) | |||
except Exception as ex: | |||
print(" Failed to remove {}: {}".format(link_path, ex)) | |||
print("Updating /etc/rc.local.d/local.sh") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"/etc/rc.local.d/local.sh" should this be a const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, good catch. will fix
# First tag - add the content (if needed) and skip till the next tag | ||
if add: | ||
sys.stdout.write(content) | ||
skip_to_tag = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirming the code flow about how is this piece working when there is already one section with vdvs tag (datastore 1) in the file and then this function is called to add another section with vdvs tag (datastore 2). The new section would be written and old section would be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's exactly the behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can the ESX service instead use a symbolic link in /usr/lib/vmware/vmdkops instead of a link in /etc/vmware? Given the comments in local.sh regarding its usage.
- If needing to still use a link from /etc/vmware can the VIB install a script under /usr/lib/vmware/vmdkops/bin which handles restoring the linkage to the DB and insert an invocation of the script in local.sh - only one line addition to local.sh
- Having a script to handle the linkage is easier, any changes to handling links to the DB can be done to the script vs. in local_sh.py.
it can but the result will be the same. VIB is loaded in ramdisk and changes to ramdisk are discarded on reboot
I did not catch it - can you elaborate ? Keep in mind that all locations we install via VIB are just a tar unpacked into ramdisk, and ramdisk changes are discarded on reboot. There is no system persistency without state.tgz backup (there are plenty of related KBs) |
c43cd9a
to
b30ff67
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
please do not merge yet, I need to investigate a potential issue... |
@msterin curious as to what is the potential issue? |
never mind, it was VSAN-related issue - delay in starting vsantraced on rebooot. The vDVS config worked fine. |
@@ -1323,7 +1331,17 @@ def config_rm(args): | |||
info = auth.get_info() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the current change, but the info variable seems not being used.
esx_service/cli/vmdkops_admin.py
Outdated
os.remove(auth_data.AUTH_DB_PATH) | ||
except: | ||
pass | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this return statement be inside the above "except auth_data.DbAccessError as ex:" block? The current indention seems incorrect as it will return in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, there is an indent error and it should be returning only for "except auth_data.DbAccessError as ex:"
. Thanks for the catch
@msterin Please triage CI failure. This has been open for a while. Let's try to get this in asap. |
7fca0f3
to
6732c52
Compare
@pdhamdhere - looks test-related. The branch was a bit behind - just rebased and resubmitted. |
test fails in docker volume create, for some reason driver name does not seem to have been passed as it says the failure is unrelated to the fix but concerning. @shuklanirdesh82 - any comment ? Also @shuklanirdesh82 - there is still garbage in the logs (after
|
@msterin @shuklanirdesh82 |
We have seen such failure locally once, I asked @pshahzeb to raise an issue to triage further (#1401 (comment)). The failure is unrelated to your PR.
Yeah it was fixed. I will check once again what is causing this again. Thanks @msterin for pointing out. |
After triaging further came to know that it is a regression introduced by this PR. E2E test failure is tricky and not helping to find out root cause easily. The failure observed due to this PR is raising a false impression about I had to checkout the branch locally to generate the VIB and following is my analysis. conclusion: steps to reproduce:
|
thanks for the analysis. |
The step is there but not performing any validation. (https://github.com/vmware/docker-volume-vsphere/blob/master/tests/e2e/basic_test.go#L197)
I have just created a new issue. (#1450) |
Thanks for analysis @shuklanirdesh82 .
Since the cleanup didn't complete, VM2 is still a part of different vmgroup (not default)
This is what I think is happening. |
On reboot all changes in /etc are lost, unless they are backed up. ESXi has a standard backup for /etc/rc.local.d/local.sh - this file is backed up and run on boot ,so to persist symlink to shared DB we need to insert the link crestion to local.sh. And when we drop the link, we need to clean up the local.sh This is exactly what this change is doing
a46ba38
to
bc15906
Compare
bc15906
to
db1db20
Compare
db1db20
to
a244904
Compare
Fixes #1347
On reboot all changes in /etc are lost, unless they are backed up.
ESXi has a standard backup for /etc/rc.local.d/local.sh - this file is
backed up and run on boot ,so to persist symlink to shared DB we need to
insert the link creation to local.sh. And when we drop the link, we need
to clean up the local.sh
This is exactly what this change is doing.
Tested - unit test attached. Admin_cli tested manually:
Add:
remove :