Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[techsupport] Handle minor fixes of TS Lock and update auto-TS #2114

Merged
merged 7 commits into from
Apr 3, 2022

Conversation

vivekrnv
Copy link
Contributor

@vivekrnv vivekrnv commented Mar 25, 2022

Signed-off-by: Vivek Reddy Karri vkarri@nvidia.com

What I did

  1. Print the last statement as the techsupport dump name, as some automation processes might depend of parsing the last line to infer the dump path.
Previously:
handle_exit
Removing lock. Exit: 0
removed '/tmp/techsupport-lock/PID'
removed directory '/tmp/techsupport-lock'

Updated:
handle_exit
Removing lock. Exit: 0
removed '/tmp/techsupport-lock/PID'
removed directory '/tmp/techsupport-lock'
/var/dump/sonic_dump_r-bulldog-03_20220324_195553.tar.gz
  1. Don't acquire the lock when running in NOOP mode

  2. Set the set -v option just before running main so that it won't print the generate_dump code to stdout

  3. Update the auto-techsupport script to handle EXT_RETRY and EXT_LOCKFAIL exit codes returned by show techsupport command.

  4. Update the minor error in since argument for auto-techsupport

How I did it

How to verify it

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

vivekrnv and others added 2 commits March 24, 2022 19:51
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
dgsudharsan
dgsudharsan previously approved these changes Mar 25, 2022
@dgsudharsan
Copy link
Collaborator

@qiluo-msft Can you please help to review?

@vivekrnv vivekrnv changed the title [techsupport] Handle Minor Issues because of TS Locking [techsupport] Handle minor fixes of TS Lock and update auto-TS Mar 25, 2022
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
return ""
elif rc == EXT_RETRY:
if num_retry <= MAX_RETRY_LIMIT:
print(num_retry)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a gap of few seconds before next retry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required, EXT_RETRY happening is less likely and response should be quick in order to grab the lock. so i don't think a gap is required.
I'll remove the print statement though

EXT_LOCKFAIL = 2
EXT_RETRY = 4
EXT_SUCCESS = 0
MAX_RETRY_LIMIT = 2
Copy link
Collaborator

@dgsudharsan dgsudharsan Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is MAX_RETRY_LIMIT configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXT_RETRY happening more than one time for a single process is even more unlikely and thus a MAX_RETRY_LIMIT need not be configurable.

dgsudharsan
dgsudharsan previously approved these changes Mar 31, 2022
@dgsudharsan
Copy link
Collaborator

@qiluo-msft Can you please help to review this PR?

@qiluo-msft qiluo-msft requested a review from ganglyu March 31, 2022 20:45
return self.invoke_ts_cmd(since_cfg, num_retry+1)
else:
syslog.syslog(syslog.LOG_ERR, "MAX_RETRY_LIMIT for show techsupport invocation exceeded, stderr: {}".format(stderr))
elif rc != EXT_SUCCESS:
syslog.syslog(syslog.LOG_ERR, "show techsupport failed with exit code {}, stderr: {}".format(rc, stderr))
Copy link
Contributor

@qiluo-msft qiluo-msft Mar 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syslog

Do you want to exit immediately after the error logging? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i'll add it

@@ -63,6 +63,10 @@ handle_exit()
ECODE=$?
echo "Removing lock. Exit: $ECODE" >&2
$RM $V -rf ${LOCKDIR}
# Echo the filename as the last statement if the generation suceeds
Copy link
Contributor

@qiluo-msft qiluo-msft Mar 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suceeds

typo #Closed

@@ -63,6 +63,10 @@ handle_exit()
ECODE=$?
echo "Removing lock. Exit: $ECODE" >&2
$RM $V -rf ${LOCKDIR}
# Echo the filename as the last statement if the generation suceeds
Copy link
Contributor

@qiluo-msft qiluo-msft Mar 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Echo

Isn't it already supported? I am seeing

mkdir: created directory '/var/dump/sonic_dump_vlab-01_20220331_204737/log'
sonic_dump_vlab-01_20220331_204737/log/techsupport_time_info.Gg18UE4AVH
removed '/var/dump/sonic_dump_vlab-01_20220331_204737/log/techsupport_time_info.Gg18UE4AVH'
removed directory '/var/dump/sonic_dump_vlab-01_20220331_204737/core'
removed directory '/var/dump/sonic_dump_vlab-01_20220331_204737/log'
removed directory '/var/dump/sonic_dump_vlab-01_20220331_204737'
/var/dump/sonic_dump_vlab-01_20220331_204737.tar:         5.5% -- replaced with /var/dump/sonic_dump_vlab-01_20220331_204737.tar.gz
/var/dump/sonic_dump_vlab-01_20220331_204737.tar.gz
``` #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the addition of lock code, no. handle_exit is the last trap that runs before exiting, and it does print a few statements and thus the issue

ganglyu
ganglyu previously approved these changes Apr 1, 2022
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@vivekrnv vivekrnv dismissed stale reviews from ganglyu and dgsudharsan via 230c8df April 1, 2022 02:18
Copy link
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liat-grozovik
Copy link
Collaborator

@ganglyu could yuo please review recent changes following feedback provided?

@liat-grozovik liat-grozovik merged commit f70dc27 into sonic-net:master Apr 3, 2022
judyjoseph pushed a commit that referenced this pull request Apr 4, 2022
1. Print the last statement as the techsupport dump name, as some automation processes might depend of parsing the last line to infer the dump path.

Previously:
  handle_exit
  Removing lock. Exit: 0
  removed '/tmp/techsupport-lock/PID'
  removed directory '/tmp/techsupport-lock'

Updated:
  handle_exit
  Removing lock. Exit: 0
  removed '/tmp/techsupport-lock/PID'
  removed directory '/tmp/techsupport-lock'
  /var/dump/sonic_dump_r-bulldog-03_20220324_195553.tar.gz

2. Don't acquire the lock when running in NOOP mode
3. Set the set -v option just before running main so that it won't print the generate_dump code to stdout
4. Update the auto-techsupport script to handle EXT_RETRY and EXT_LOCKFAIL exit codes returned by show techsupport command.
5. Update the minor error in since argument for auto-techsupport

Signed-off-by: Vivek Keddy Karri <vkarri@nvidia.com>
stepanblyschak pushed a commit to stepanblyschak/sonic-utilities that referenced this pull request Apr 18, 2022
includes:

320591a [DualToR] Handle race condition between tunnel_decap and mux orchestrator (sonic-net#2114)
5027a8f Handling Invalid CRM configuration gracefully (sonic-net#2109)
0b120fa [ci]: use native arm64 and armhf pool (sonic-net#2013)
394e88a Don't handle buffer pool watermark during warm reboot reconciling (sonic-net#1987)
9008a01 patch for issue sonic-net#1971 - enable Rx Drop handling for cisco-8000 (sonic-net#2041)
2723ee3 create debug_shell_enable config to enable debug shell (sonic-net#2060)
d7be0b9 [request parser] Add unit tests for request parser for multiple values (sonic-net#1766)
@vivekrnv vivekrnv deleted the ts_error branch June 10, 2022 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants