Validate RLIMIT_NOFILE against limits.http_max_conns_per_client #7434

Merged

pierresouchay
Contributor

I spent some time today on my local Mac to figure out why Consul 1.6.3+ was not accepting limits.http_max_conns_per_client.

This adds an explicit check on the file descriptor limit so that the setting has a chance of working. It is no guarantee: if many clients reach the agent, it may consume even more file descriptors.

Many users struggle with RLIMIT_NOFILE, so a clear message would help them figure out what to fix.

Example of message (reload or start):

2020-03-11T16:38:37.062+0100 [ERROR] agent: Error starting agent: error="system allows a max of 512 file descriptors, but limits.http_max_conns_per_client: 8192 needs at least 8212"
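
A minimal sketch of the kind of check described above, not the code merged in this PR. It assumes a Unix system (syscall.Getrlimit is not available on Windows) and an overhead of 20 descriptors, which is inferred only from the example message (8192 needs at least 8212). The names checkMaxConns, softNoFileLimit and reservedFDs are hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// reservedFDs is an assumption inferred from the example message
// ("8192 needs at least 8212"); the real constant may differ.
const reservedFDs = 20

// checkMaxConns returns an error when the file descriptor limit is too
// small to serve maxConns connections plus the reserved overhead.
func checkMaxConns(limit, maxConns uint64) error {
	needed := maxConns + reservedFDs
	if limit < needed {
		return fmt.Errorf("system allows a max of %d file descriptors, but limits.http_max_conns_per_client: %d needs at least %d",
			limit, maxConns, needed)
	}
	return nil
}

// softNoFileLimit reads the soft RLIMIT_NOFILE of the current process.
// Unix only: this does not compile on Windows.
func softNoFileLimit() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	// The field type differs between platforms, hence the conversion.
	return uint64(rl.Cur), nil
}

func main() {
	limit, err := softNoFileLimit()
	if err != nil {
		fmt.Println("could not read RLIMIT_NOFILE:", err)
		return
	}
	if err := checkMaxConns(limit, 8192); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("file descriptor limit is sufficient:", limit)
}
```

Keeping the comparison in a pure function separate from the system call makes the arithmetic testable without depending on the limits of the machine running the test.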

Contributor

@dnephin dnephin left a comment

Thank you for the PR!

Would it be possible to do this validation earlier in the flow? It seems like this check is validating a configuration option. Could it be validated as part of the config validation?

The lint timeout in CI should be fixed by #7496. It seems that the default timeout is 1 minute and sometimes it takes a bit longer.

@pierresouchay pierresouchay force-pushed the check_for_max_file_descriptors branch 2 times, most recently from a4e3f6e to 27b4f78 Compare March 25, 2020 17:02
@pierresouchay
Contributor Author

@dnephin Done: it now fails with consul validate.

@pierresouchay
Contributor Author

@dnephin Done, with a fix in the unit test (since validation now fails). The last commit also tries to fix another unstable unit test.

Message:

consul validate test.json
Config validation failed: system allows a max of 256 file descriptors, but limits.http_max_conns_per_client: 65535 needs at least 65555

Member

@hanshasselberg hanshasselberg left a comment

LGTM, except this one little comment about a change that doesn't belong in here.

api/agent_test.go (outdated)
@hanshasselberg hanshasselberg self-assigned this Apr 1, 2020
@hanshasselberg hanshasselberg added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Apr 1, 2020
@pierresouchay pierresouchay force-pushed the check_for_max_file_descriptors branch 2 times, most recently from db25c61 to e66bfbf Compare April 1, 2020 10:50
@pierresouchay
Contributor Author

Unstable test for go-tests:
TestForwardSignals/signal-terminated: util_test.go:285: expected to read line "signal: terminated" but got "signal: urgent I/O condition"

Probably due to a VM issue; restarting the test.

@ghost ghost removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Apr 1, 2020
@pierresouchay pierresouchay force-pushed the check_for_max_file_descriptors branch from e66bfbf to 9da0843 Compare April 1, 2020 11:03
Member

@hanshasselberg hanshasselberg left a comment

LGTM!

Contributor

@dnephin dnephin left a comment

Thank you for making those changes! I think I did not explain myself well in the previous comment. I was hoping we could do this validation in a single place as part of config validation.

I've left some comments below that I think will allow us to isolate this validation to a single package. Please let me know if I've missed something, or if that won't work for some reason.

agent/agent.go (outdated)
agent/agent.go (outdated)
lib/agent_limits.go (outdated)
@pierresouchay
Contributor Author

@dnephin Fixed your concerns in the last commit

Contributor

@dnephin dnephin left a comment

Thank you again for making those changes! This change is looking good!

I think my last comment about unexporting the helpers may have been missed. Added more explanation inline, and one more suggestion about the test.

agent/agent_test.go (outdated)
# This value is more than max on Windows as well
http_max_conns_per_client = 16777217
}`
_, _, validationError := TestConfigWithErr(testutil.Logger(t), config.Source{Name: t.Name(), Format: "hcl", Data: hcl})
Contributor

@dnephin dnephin Apr 1, 2020

It can be difficult to understand what is being tested when test helpers hide the function-under-test. TestConfig as a helper for testing other code makes a lot of sense, but in this case since we are testing validate I think we want to be calling Builder.Validate or Builder.BuildAndValidate directly, not through a helper.


// CheckLimitsFromMaxConnsPerClient check that value provided might be OK
// return an error if values are not compatible
func CheckLimitsFromMaxConnsPerClient(maxConnsPerClient int) error {
Contributor

This can be unexported (checkLimitsFromMaxConnsPerClient) now that it is called from inside the same package.


// TestConfig returns a unique default configuration for testing an
// agent.
func TestConfig(logger hclog.Logger, sources ...config.Source) *config.RuntimeConfig {
Contributor

@dnephin dnephin Apr 1, 2020

I think with the change suggested above (to call Builder.BuildAndValidate directly from the test), this new function won't be necessary.

lib/limits.go (outdated)
@pierresouchay pierresouchay force-pushed the check_for_max_file_descriptors branch from b196a28 to 5869ee6 Compare April 1, 2020 17:31
@pierresouchay pierresouchay force-pushed the check_for_max_file_descriptors branch from 5869ee6 to f6b83a8 Compare April 1, 2020 17:32
@pierresouchay
Contributor Author

@dnephin DONE

@hanshasselberg
Member

Hi Pierre, after a little internal discussion, what do you think about putting all the code into one place without exporting any of it? It could all go in config, without touching lib at all.

@pierresouchay
Contributor Author

@i0rek @dnephin Done: moved limits*.go to the config package and unexported getrlimit().

Member

@hanshasselberg hanshasselberg left a comment

Found another tiny thing...

agent/config/agent_limits_test.go (outdated)
Contributor

@dnephin dnephin left a comment

Looks great! Thank you!

Just one question I wasn't sure about

@@ -0,0 +1,45 @@
// +build !consulent
Contributor

Was this by mistake? Why do we want to exclude this test on consulent?

@pierresouchay
Contributor Author

@i0rek @dnephin Oops, yes. I copied it from another test by mistake. Fixed in the last commit.

@pierresouchay pierresouchay force-pushed the check_for_max_file_descriptors branch from 4bf6d49 to 1bd0553 Compare April 1, 2020 21:53
Member

@hanshasselberg hanshasselberg left a comment

👍

@hanshasselberg hanshasselberg merged commit be1c5c4 into hashicorp:master Apr 2, 2020