[techsupport] Handle minor fixes of TS Lock and update auto-TS #2114
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@qiluo-msft Can you please help review?
scripts/coredump_gen_handler.py
```python
        return ""
    elif rc == EXT_RETRY:
        if num_retry <= MAX_RETRY_LIMIT:
            print(num_retry)
```
Should there be a gap of a few seconds before the next retry?
Not required. EXT_RETRY is unlikely, and the retry should be quick in order to grab the lock, so I don't think a gap is needed.
I'll remove the print statement, though.
```python
EXT_LOCKFAIL = 2
EXT_RETRY = 4
EXT_SUCCESS = 0
MAX_RETRY_LIMIT = 2
```
Is MAX_RETRY_LIMIT configurable?
EXT_RETRY occurring more than once for a single process is even less likely, so MAX_RETRY_LIMIT does not need to be configurable.
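The bound discussed in this thread can be checked with a little arithmetic. The sketch below is a hypothetical model of only the recursion shown in the diff (re-invoke with num_retry+1 while num_retry <= MAX_RETRY_LIMIT), assuming the worst case where every invocation returns EXT_RETRY. The name count_invocations is illustrative, and the initial num_retry the handler passes in is not visible in this excerpt, so two starting values are shown.

```python
# Hypothetical model of the retry recursion in invoke_ts_cmd: it counts how
# many times "show techsupport" would run if every call returned EXT_RETRY.
MAX_RETRY_LIMIT = 2


def count_invocations(num_retry):
    invocations = 1  # this call runs the command once
    if num_retry <= MAX_RETRY_LIMIT:
        # Mirrors: return self.invoke_ts_cmd(since_cfg, num_retry+1)
        invocations += count_invocations(num_retry + 1)
    # Otherwise the real code logs "MAX_RETRY_LIMIT ... exceeded" and stops.
    return invocations


# The starting value of num_retry is an assumption; both options are shown.
print(count_invocations(0))  # 4
print(count_invocations(1))  # 3
```

If the first call passes num_retry=0, the command can run up to four times (one attempt plus three retries) before the limit message is logged; starting at 1 gives at most three runs.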
@qiluo-msft Can you please help review this PR?
```python
            return self.invoke_ts_cmd(since_cfg, num_retry+1)
        else:
            syslog.syslog(syslog.LOG_ERR, "MAX_RETRY_LIMIT for show techsupport invocation exceeded, stderr: {}".format(stderr))
    elif rc != EXT_SUCCESS:
        syslog.syslog(syslog.LOG_ERR, "show techsupport failed with exit code {}, stderr: {}".format(rc, stderr))
```
Yeah, I'll add it.
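Putting the reviewed fragments together, the exit-code handling could look roughly like the sketch below. It is not the handler's actual code: the EXT_LOCKFAIL branch is inferred from the return "" that precedes elif rc == EXT_RETRY, the run(since_cfg) callable and the last-line parsing on success are assumptions made so the logic can be exercised without a switch, and the standard logging module stands in for syslog.

```python
import logging

# Exit-code values copied from the diff in scripts/coredump_gen_handler.py.
EXT_SUCCESS = 0
EXT_LOCKFAIL = 2
EXT_RETRY = 4
MAX_RETRY_LIMIT = 2

log = logging.getLogger("auto_ts_sketch")


def invoke_ts_cmd(run, since_cfg, num_retry=0):
    """Hypothetical stand-in for the handler's invoke_ts_cmd.

    run(since_cfg) is an assumed callable returning (rc, stdout, stderr)
    for a single "show techsupport" invocation.
    """
    rc, stdout, stderr = run(since_cfg)
    if rc == EXT_LOCKFAIL:
        # Another techsupport run holds the lock: give up (assumed branch).
        return ""
    elif rc == EXT_RETRY:
        if num_retry <= MAX_RETRY_LIMIT:
            # Retry immediately; the review agreed no delay is needed.
            return invoke_ts_cmd(run, since_cfg, num_retry + 1)
        log.error("MAX_RETRY_LIMIT for show techsupport invocation exceeded, stderr: %s", stderr)
        return ""
    elif rc != EXT_SUCCESS:
        log.error("show techsupport failed with exit code %s, stderr: %s", rc, stderr)
        return ""
    # Success: assume the dump path is the last line of stdout.
    lines = stdout.strip().splitlines()
    return lines[-1] if lines else ""


if __name__ == "__main__":
    # Fake runner: the first call hits EXT_RETRY, the second succeeds.
    replies = iter([
        (EXT_RETRY, "", ""),
        (EXT_SUCCESS, "removed directory '/tmp/techsupport-lock'\n"
                      "/var/dump/sonic_dump_r-bulldog-03_20220324_195553.tar.gz\n", ""),
    ])
    # "2 days ago" is a hypothetical since value.
    print(invoke_ts_cmd(lambda since: next(replies), "2 days ago"))
```

Injecting run keeps the retry and lock-failure paths testable with canned exit codes instead of a real switch.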
scripts/generate_dump
```
@@ -63,6 +63,10 @@ handle_exit()
    ECODE=$?
    echo "Removing lock. Exit: $ECODE" >&2
    $RM $V -rf ${LOCKDIR}
    # Echo the filename as the last statement if the generation succeeds
```
Isn't it already supported? I am seeing:
```
mkdir: created directory '/var/dump/sonic_dump_vlab-01_20220331_204737/log'
sonic_dump_vlab-01_20220331_204737/log/techsupport_time_info.Gg18UE4AVH
removed '/var/dump/sonic_dump_vlab-01_20220331_204737/log/techsupport_time_info.Gg18UE4AVH'
removed directory '/var/dump/sonic_dump_vlab-01_20220331_204737/core'
removed directory '/var/dump/sonic_dump_vlab-01_20220331_204737/log'
removed directory '/var/dump/sonic_dump_vlab-01_20220331_204737'
/var/dump/sonic_dump_vlab-01_20220331_204737.tar: 5.5% -- replaced with /var/dump/sonic_dump_vlab-01_20220331_204737.tar.gz
/var/dump/sonic_dump_vlab-01_20220331_204737.tar.gz
```
Not after the lock code was added. handle_exit is the last trap that runs before the script exits, and it prints a few statements of its own, which pushes the dump filename off the last line. That is the issue.
LGTM
@ganglyu, could you please review the recent changes made in response to the feedback?
1. Print the techsupport dump name as the last statement, since some automation processes may depend on parsing the last line to infer the dump path.
   Previously:
   ```
   handle_exit
   Removing lock. Exit: 0
   removed '/tmp/techsupport-lock/PID'
   removed directory '/tmp/techsupport-lock'
   ```
   Updated:
   ```
   handle_exit
   Removing lock. Exit: 0
   removed '/tmp/techsupport-lock/PID'
   removed directory '/tmp/techsupport-lock'
   /var/dump/sonic_dump_r-bulldog-03_20220324_195553.tar.gz
   ```
2. Don't acquire the lock when running in NOOP mode.
3. Set the set -v option just before running main so that it doesn't print the generate_dump code to stdout.
4. Update the auto-techsupport script to handle the EXT_RETRY and EXT_LOCKFAIL exit codes returned by the show techsupport command.
5. Fix a minor error in the since argument for auto-techsupport.

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
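The first change matters because a consumer that reads the last line of output sees a lock-cleanup message in the previous output but the archive path in the updated one. Below is a minimal Python sketch of such a consumer, run against the two sample outputs above. dump_path_from_output is a hypothetical helper, and its /var/dump/ prefix and .tar.gz suffix checks are assumptions based on the paths shown in this PR.

```python
# Sample outputs copied from the commit message above.
PREVIOUS = """handle_exit
Removing lock. Exit: 0
removed '/tmp/techsupport-lock/PID'
removed directory '/tmp/techsupport-lock'
"""

UPDATED = PREVIOUS + "/var/dump/sonic_dump_r-bulldog-03_20220324_195553.tar.gz\n"


def dump_path_from_output(output):
    """Return the dump path if the last non-empty line looks like one, else None.

    The prefix and suffix checks are assumptions about the archive name.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if lines and lines[-1].startswith("/var/dump/") and lines[-1].endswith(".tar.gz"):
        return lines[-1]
    return None


print(dump_path_from_output(PREVIOUS))  # None
print(dump_path_from_output(UPDATED))   # /var/dump/sonic_dump_r-bulldog-03_20220324_195553.tar.gz
```

Validating the line, rather than trusting it blindly, lets such a consumer fail cleanly if cleanup messages ever end up last again.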
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
What I did
Don't acquire the lock when running in NOOP mode
Set the set -v option just before running main so that it doesn't print the generate_dump code to stdout
Update the auto-techsupport script to handle the EXT_RETRY and EXT_LOCKFAIL exit codes returned by the show techsupport command
Fix a minor error in the since argument for auto-techsupport
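The NOOP change can be sketched in Python as a directory-based lock, mirroring the LOCKDIR that handle_exit removes with rm -rf. This is a hypothetical analog, not the generate_dump implementation: acquire_ts_lock and release_ts_lock are illustrative names, the PID file name is taken from the sample log (where "PID" may be a placeholder), and the lock is skipped entirely when noop is true.

```python
import os
import shutil
import tempfile


def acquire_ts_lock(lockdir, noop=False):
    """Hypothetical analog of the generate_dump lock.

    Returns None when the lock is skipped (NOOP mode), True when this process
    now holds the lock, and False when another run already holds it.
    """
    if noop:
        # NOOP mode only prints the commands it would run, so no lock is taken.
        return None
    try:
        # mkdir is atomic: it raises FileExistsError if the directory exists.
        os.mkdir(lockdir)
    except FileExistsError:
        return False
    # Record the owner's PID; the file name "PID" is assumed from the log.
    with open(os.path.join(lockdir, "PID"), "w") as f:
        f.write(str(os.getpid()))
    return True


def release_ts_lock(lockdir):
    # Counterpart of "$RM $V -rf ${LOCKDIR}" in handle_exit.
    shutil.rmtree(lockdir, ignore_errors=True)


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "techsupport-lock")
    print(acquire_ts_lock(lock, noop=True))  # None: nothing created
    print(acquire_ts_lock(lock))             # True: lock acquired
    print(acquire_ts_lock(lock))             # False: already held
    release_ts_lock(lock)
```

Using mkdir as the test-and-set step avoids a race between checking for the lock and creating it, and skipping it in NOOP mode means a dry run can never block, or be blocked by, a real collection.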
How I did it
How to verify it
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)