[rsyslog] Disable rsyslog rate limit in pre_test and do a recover in post_test #2378

Merged

Conversation

@bingwang-ms bingwang-ms commented Oct 21, 2020

Description of PR

Summary:
This is a workaround for sonic-net/sonic-buildimage#5667
Syslog messages are sometimes rate-limited for no apparent reason. The rate-limiting threshold is set at 20k messages in 5 minutes; however, rsyslog sometimes starts rate-limiting messages from orchagent when fewer than 20k messages have been logged in the past 5 minutes.
Some ERROR or WARNING logs are suppressed once rsyslog begins rate-limiting, so errors might go undetected. This PR adds a workaround for this issue.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

What is the motivation for this PR?

This PR disables the rsyslog rate limit so that all logs are recorded in syslog.

How did you do it?

  1. Add a new case test_disable_rsyslog_rate_limit in test_pretest.py that updates the rsyslog configuration to disable the rate limit and then reloads the rsyslogd service in each container;
  2. Add a new case test_recover_rsyslog_rate_limit in test_posttest.py that restores the rate limit after all tests finish.
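
The disable/recover pair above can be sketched as a text transformation on the rsyslog configuration. This is only an illustration, not the code in test_pretest.py or test_posttest.py: it assumes the rate limit is configured through the legacy directives `$SystemLogRateLimitInterval` and `$SystemLogRateLimitBurst` (the description does not name the settings), and the helper names `disable_rate_limit` and `recover_rate_limit` are hypothetical. In rsyslog, an interval of 0 turns rate limiting off. Reloading rsyslogd in each container is not shown.

```python
# Assumption: the rate limit is set via these legacy rsyslog directives.
RATE_LIMIT_DIRECTIVES = (
    "$SystemLogRateLimitInterval",
    "$SystemLogRateLimitBurst",
)


def disable_rate_limit(conf_text):
    """Return (new_text, original_values) with each rate-limit directive set to 0."""
    original = {}
    out = []
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in RATE_LIMIT_DIRECTIVES:
            original[parts[0]] = parts[1]  # remember the value for recovery
            out.append("{} 0".format(parts[0]))
        else:
            out.append(line)
    return "\n".join(out) + "\n", original


def recover_rate_limit(conf_text, original):
    """Put back the values saved by disable_rate_limit."""
    out = []
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in original:
            out.append("{} {}".format(parts[0], original[parts[0]]))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical config fragment: 20000 messages per 300 seconds.
sample = (
    "$ModLoad imuxsock\n"
    "$SystemLogRateLimitInterval 300\n"
    "$SystemLogRateLimitBurst 20000\n"
)
disabled, saved = disable_rate_limit(sample)
restored = recover_rate_limit(disabled, saved)
```

Saving the original values instead of hard-coding them keeps the recovery step correct even if an image ships different limits.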

How did you verify/test it?

Verified on dx010 with a sample script that generates logs, run in one of the containers:

import syslog

# Emit 99,999 INFO messages, well above the 20k-per-5-minutes threshold.
for index in range(1, 100000):
    msg = "Test message at INFO priority index = {}".format(index)
    syslog.syslog(syslog.LOG_INFO, msg)

Before disabling the rate limit, rate-limiting kicks in after message 20000:

Oct 21 05:23:40.128727 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 19995
Oct 21 05:23:40.128766 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 19996
Oct 21 05:23:40.128766 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 19997
Oct 21 05:23:40.128791 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 19998
Oct 21 05:23:40.128791 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 19999
Oct 21 05:23:40.128829 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 20000
Oct 21 05:23:40.128829 str-dx010-acs-4 INFO swss#rsyslogd: imuxsock[pid: 2210, name: python] from <str-dx010-acs-4:run.py>: begin to drop messages due to rate-limiting

After disabling the rate limit, all logs are recorded in syslog:

Oct 21 05:20:08.373985 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99994
Oct 21 05:20:08.374006 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99995
Oct 21 05:20:08.374023 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99996
Oct 21 05:20:08.374023 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99997
Oct 21 05:20:08.374065 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99998
Oct 21 05:20:08.374065 str-dx010-acs-4 INFO swss#run.py: Test message at INFO priority index = 99999

After recovery, rate-limiting takes effect again.
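
The three checks above (before, after, recovered) could also be automated by scanning syslog for the rsyslogd drop notice. The sketch below is an assumption about how such a check might look, not part of this PR; the function name `find_rate_limit_events` is hypothetical, and the marker string is taken from the log excerpt above.

```python
# Marker text copied from the rsyslogd line in the "before" log excerpt.
RATE_LIMIT_MARKER = "begin to drop messages due to rate-limiting"


def find_rate_limit_events(lines):
    """Return the syslog lines in which rsyslogd reports dropped messages."""
    return [
        line.rstrip("\n")
        for line in lines
        if "rsyslogd" in line and RATE_LIMIT_MARKER in line
    ]


log_excerpt = [
    "Oct 21 05:23:40.128829 str-dx010-acs-4 INFO swss#run.py: "
    "Test message at INFO priority index = 20000\n",
    "Oct 21 05:23:40.128829 str-dx010-acs-4 INFO swss#rsyslogd: "
    "imuxsock[pid: 2210, name: python] from <str-dx010-acs-4:run.py>: "
    "begin to drop messages due to rate-limiting\n",
]
events = find_rate_limit_events(log_excerpt)
```

An empty result after the pre-test step, and a non-empty one after recovery when the sample script is rerun, would match the behavior described above.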

Any platform specific information?

No.

Supported testbed topology if it's a new test case?

No.

Documentation

No.

Signed-off-by: bingwang <bingwang@microsoft.com>
@theasianpianist

LGTM

Should we revert the fix in sonic-buildimage after this gets merged?

Signed-off-by: bingwang <bingwang@microsoft.com>
@bingwang-ms

Updated. Thanks @daall

@bingwang-ms bingwang-ms merged commit 1b5539d into sonic-net:master Oct 22, 2020
@bingwang-ms

LGTM

Should we revert the fix in sonic-buildimage after this gets merged?

I don't think so. This is only a workaround for testing. The issue still needs to be addressed for the production environment.
