Support log file size and number limits #879

Merged
jianjuns merged 1 commit into antrea-io:master from log-limit on Jul 2, 2020

jianjuns
Contributor

@jianjuns jianjuns commented Jun 27, 2020

klog does not enforce the max log file size limit when the log file is not
specified by the --log_file argument. This commit sets klog.MaxSize to
the --log_file_max_size argument value when logging to file is enabled
and --log_file is not provided.
The commit also adds a new argument --log_file_max_num to define the
max number of log files to be kept. The file number limit is enforced by
Antrea code that periodically checks the INFO log files and deletes the
oldest files to keep at most the specified max number of log files.

Fixes: #788
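A minimal sketch of the size-limit decision described above, using only the standard library. It assumes that "logging to file is enabled" corresponds to --logtostderr=false, and that --log_file_max_size is given in MB while klog.MaxSize holds bytes. `maxSizeBytes` is a hypothetical helper, not the function added by this PR.

```go
package main

import "fmt"

// bytesPerMB converts the MB value of --log_file_max_size into bytes,
// the unit that klog.MaxSize is expressed in.
const bytesPerMB uint64 = 1024 * 1024

// maxSizeBytes (hypothetical) decides whether to override the size limit and,
// if so, which value to use. Assumption: "logging to file is enabled" is
// modeled as --logtostderr=false. When --log_file is set, klog already
// enforces --log_file_max_size itself, so nothing is overridden.
func maxSizeBytes(logToStderr bool, logFile string, maxSizeMB uint64) (uint64, bool) {
	if logToStderr || logFile != "" {
		return 0, false
	}
	return maxSizeMB * bytesPerMB, true
}

func main() {
	// Mirrors the unit test arguments: --logtostderr=false --log_file_max_size=1.
	if size, ok := maxSizeBytes(false, "", 1); ok {
		fmt.Println("klog.MaxSize would be set to", size, "bytes")
	}
}
```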

@antrea-bot
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

These commands can only be run by members of the vmware-tanzu organization.

@jianjuns
Contributor Author

@antoninbas @tnqn: let me know whether you are fine with the approach.
I still plan to add some tests.

@jianjuns jianjuns force-pushed the log-limit branch 5 times, most recently from 3a1f12c to a155dcd June 29, 2020 20:56
antoninbas
antoninbas previously approved these changes Jun 29, 2020
Contributor

@antoninbas antoninbas left a comment

The approach looks fine to me. I feel like we should be able to write a unit test for this?


// InitLogFileLimits initializes log file maximum size and maximum number limits based on the
// command line flags.
func InitLogFileLimits(fs *pflag.FlagSet) {
Contributor

I think we should let this function return an error, I don't see any reason not to, and it is itself calling functions that may return errors (even if unlikely).

Contributor Author

Yes, we could return errors. I just feel there is no big harm in letting the Agent continue to run despite such errors. These errors could only happen if the klog implementation changes.

pkg/log/log_file.go (3 outdated, resolved review threads)
}

// StartLogFileNumberMonitor starts monitoring the log files to make sure the
// number of INFO log files does not exceed the maximum limit, when the log file
Contributor

I think we should include all severity levels, not just INFO. You never know, the WARNING / ERROR files may grow large as well, and I don't see a downside to monitoring their size as well. I don't think we need to be concerned about missing important errors.

Contributor Author

So you mean, for example, <10 INFO files, <10 ERROR files, and <10 WARNING files?

Contributor Author

Changed to limit all severity levels.

pkg/log/log_file.go (outdated, resolved)
@jianjuns
Contributor Author

> the approach looks fine to me. I feel like we should be able to write a unit test for this?

Great to know. I was thinking of adding integration tests, but could consider unit tests too.

@antoninbas
Contributor

You may be able to use this for unit tests (in-memory filesystem): https://github.com/spf13/afero. It's already an Antrea dependency.

@jianjuns
Contributor Author

> You may be able to use this for unit tests (in-memory filesystem): https://github.com/spf13/afero. It's already an Antrea dependency.

How can I make klog log to an in-memory FS?

@antoninbas
Contributor

> > You may be able to use this for unit tests (in-memory filesystem): https://github.com/spf13/afero. It's already an Antrea dependency.
>
> How can I let klog log to in-memory FS?

You're right, it wouldn't be possible if we want to use klog to generate the test logs, which is probably the right thing to do.
Still, do you think this warrants an integration test? I imagine you will create a temporary directory with ioutil.TempDir and use that as the logs directory, which IMO is ok to do in unit tests. What do you think?

@jianjuns jianjuns changed the title WIP: Support log file size and number limits Support log file size and number limits Jul 2, 2020
@jianjuns
Contributor Author

jianjuns commented Jul 2, 2020

@antoninbas @tnqn: I tested the PR and added unit tests. Could you review it? I hope to get it into 0.8.0 for the DECC request.

Member

@tnqn tnqn left a comment

LGTM, two minor comments on cleanup style.

if err != nil {
klog.Errorf("Failed to open log directory %s: %v", logDir, err)
return
}
Member

Contributor Author

Yes. I want to close the file earlier. I changed the code to cover the Readdir failure.

assert.Equal(t, 2, infoLogFileNum, "info log file number after checking")
assert.Equal(t, 2, warningLogFileNum, "warning log file number after checking")

restoreFlagDefaultValues()
Member

Not sure whether there are other exits in the function, but isn't defer restoreFlagDefaultValues() safer and more idiomatic Go?

Contributor Author

Changed to use defer.
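A small illustration of why `defer` is the safer choice: the restore runs on every return path, including early ones. The package-level variable and the body of `restoreFlagDefaultValues` are hypothetical stand-ins for the real flag handling in the test.

```go
package main

import (
	"errors"
	"fmt"
)

// logFileMaxNum is a hypothetical stand-in for a flag value a test overrides.
var logFileMaxNum uint

// restoreFlagDefaultValues is a hypothetical version of the helper named in
// the review: it resets the overridden value to its default.
func restoreFlagDefaultValues() { logFileMaxNum = 0 }

// runTest overrides the flag and defers the restore immediately, so the
// default comes back on every exit path, including the early return below.
func runTest(failEarly bool) error {
	logFileMaxNum = 2
	defer restoreFlagDefaultValues()
	if failEarly {
		return errors.New("setup failed")
	}
	fmt.Println("running with log_file_max_num =", logFileMaxNum)
	return nil
}

func main() {
	err := runTest(true)
	fmt.Println("early return:", err, "| restored value:", logFileMaxNum)
	err = runTest(false)
	fmt.Println("normal return:", err, "| restored value:", logFileMaxNum)
}
```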

tnqn
tnqn previously approved these changes Jul 2, 2020
Member

@tnqn tnqn left a comment

LGTM

antoninbas
antoninbas previously approved these changes Jul 2, 2020
Contributor

@antoninbas antoninbas left a comment

LGTM. I think the commit message / PR description still mention INFO files only.

} else if strings.Contains(file.Name(), ".log.WARNING.") {
warningLogFiles = append(warningLogFiles, file)
} else if strings.Contains(file.Name(), ".log.ERROR.") {
} else {
Contributor

It would be good to have a comment here stating that the test is purposely not generating any error logs, or to remove the empty if statement altogether.

Contributor Author

Added comments.

}
defer os.RemoveAll(testLogDir)

args := []string{"--logtostderr=false", "--log_dir=" + testLogDir, "--log_file_max_size=1", "--log_file_max_num=2"}
Contributor

Do you think you can use a local const for the max num here and use that in your asserts below?

Contributor Author

Sure.

logs.InitLogs()
defer restoreFlagDefaultValues()

// Should generate 3 log files (100K * 30 / 1M).
Contributor

Maybe use consts for 100K, 1M and the number of iterations.

Then we can add an assert like this to document our intent:

require.Greater(t, logSize*numIters/logFileMaxSize, logFileMaxNum, "test should generate enough logs to exceed --log_file_max_num")

Contributor Author

The problem is that it is hard to accurately control the log message size/number, due to the log file/line headers and the klog flush timing (I am not very sure about the latter, but I did see the logging interval affect the number of log files created). So I feel such an assert does not help much.

Contributor

Sounds good. You probably want to include that information in the comment though.

Contributor Author

Ok. Added some comments to explain this.

@jianjuns jianjuns dismissed stale reviews from antoninbas and tnqn via 6511bf6 July 2, 2020 18:24
@jianjuns jianjuns force-pushed the log-limit branch 3 times, most recently from 03b3538 to e63b9b4 July 2, 2020 18:45
antoninbas
antoninbas previously approved these changes Jul 2, 2020
Contributor

@antoninbas antoninbas left a comment

LGTM

@jianjuns jianjuns force-pushed the log-limit branch 4 times, most recently from 6755a23 to c205232 July 2, 2020 21:46
@jianjuns
Contributor Author

jianjuns commented Jul 2, 2020

/test-all

// See the License for the specific language governing permissions and
// limitations under the License.

// Package main under directory cmd parses and validates user input,
Contributor

Incorrect package description.

Contributor Author

Thanks. Fixed.

if err != nil {
return
}
if maxSize > 1024*1024 {
Contributor

Could this be a constant?

Contributor Author

Defined a constant.

return
}
if maxSize > 1024*1024 {
klog.Errorf("The specified log file max size %d is too big, ignored", maxSize)
Contributor

Maybe quantify "too big" with the maximum value that we allow?

Contributor Author

Ok.

pkg/log/log_file_test.go (outdated, resolved)
klog does not enforce the max log file size limit when the log file is not
specified by the --log_file argument. This commit sets klog.MaxSize to
the --log_file_max_size argument value when logging to file is enabled
and --log_file is not provided.
The commit also adds a new argument --log_file_max_num to define the
max number (per severity level) of log files to be kept. The file number
limit is enforced by Antrea code that periodically checks the log files
and deletes the oldest files to keep at most the specified max number of
log files.

Fixes: antrea-io#788
@jianjuns
Contributor Author

jianjuns commented Jul 2, 2020

/test-all

// Check log file number every 10 mins.
logFileCheckInterval = time.Minute * 10
// Allowed maximum value for the maximum file size limit.
maxMaxSizeMB = 1024 * 100
Contributor

This was 1024*1024 previously. Did you decide to reduce it?

Contributor Author

Yes. I do not really know what the appropriate number is. The klog default value is 1.8G.
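A sketch of the range check with the limit quantified in the message, as requested earlier in the review. It assumes the `maxMaxSizeMB` value shown in the snippet above (1024 * 100 MB, i.e. 100 GB); `validateMaxSizeMB` is a hypothetical helper, and its message is modeled on the original `klog.Errorf` call.

```go
package main

import "fmt"

// maxMaxSizeMB mirrors the constant shown in the review snippet: the largest
// accepted --log_file_max_size, in MB (1024 * 100 MB = 100 GB).
const maxMaxSizeMB = 1024 * 100

// validateMaxSizeMB (hypothetical) rejects values above maxMaxSizeMB and
// states the allowed maximum in the error, so "too big" is quantified.
func validateMaxSizeMB(maxSizeMB uint64) error {
	if maxSizeMB > maxMaxSizeMB {
		return fmt.Errorf("the specified log file max size %d MB is too big (the maximum allowed is %d MB), ignored", maxSizeMB, maxMaxSizeMB)
	}
	return nil
}

func main() {
	for _, v := range []uint64{1800, 200000} {
		fmt.Println(v, validateMaxSizeMB(v))
	}
}
```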

@jianjuns jianjuns merged commit cfd6440 into antrea-io:master Jul 2, 2020
@jianjuns jianjuns deleted the log-limit branch August 25, 2020 21:54
GraysonWu pushed a commit to GraysonWu/antrea that referenced this pull request Sep 22, 2020
Development

Successfully merging this pull request may close these issues.

Limit the size of Antrea log files
6 participants