-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor abort on OOM changes #1506
Conversation
include/seastar/core/memory.hh
Outdated
* occurs if abort is globablly enabled on this shard and there | ||
* are no disable_abort_... objets currently alive on this shard. | ||
*/ | ||
void enable_abort_on_allocation_failure(bool enabled = true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a boolean parameter with default so the default behavior is unchanged.
seastar_logger.warn("Seastar compiled with default allocator, will not abort on bad_alloc"); | ||
void enable_abort_on_allocation_failure(bool enabled) { | ||
if (enabled) { | ||
seastar_logger.warn("Seastar compiled with default allocator, will not abort on bad_alloc"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only warn if we are trying to turn enable abort on OOM since the other way is "OK", I suppose.
} | ||
|
||
bool is_abort_on_allocation_failure() { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is inside ifdef DEFAULT_ALLOCATOR block.
include/seastar/core/memory.hh
Outdated
* occurs if abort is globablly enabled on this shard and there | ||
* are no disable_abort_... objets currently alive on this shard. | ||
*/ | ||
void enable_abort_on_allocation_failure(bool enabled = true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the call allows setting the parameter both ways, it's better to name it set_abort_on_allocation_failure()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xemul - I agree and considered this, but this would be a breaking change to the API right?
I also considered just adding another method disable_abort_on_allocation_failure
, though I find this API a bit worse, it doesn't have the naming problem.
I'll go whatever way you think is best: maybe adding the new set_...
method and leaving this one as is (just forwarding to the set_
one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xemul - I agree and considered this, but this would be a breaking change to the API right?
Ah, I wasn't clear enough, sorry :) Let me try to clarify my point.
This comment and next one should have been together -- the old API call should not get removed, but it should be deprecated. Then we wait for ~1 year, then remove it completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification @xemul, it makes sense. I'll wait for the resolution of the discussion on whether we need this at all with @raphaelsc before proceeding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xemul - I've updated this PR to add a new set_
method as you suggest and deprecate the existing one-way API.
why not piggyback on existing class disable_abort_on_alloc_failure_temporarily? seastar allows user to enable and disable abort_on_alloc_failure at runtime. I don't see an actual need to add a third choice. you can do the following at the application (not even compile tested):
you can even wrap it in a seastar service, do whatever you want. interfaces should only be added / changed when really needed (meaning seastar application cannot achieve a target without it), otherwise we end up bloating the seastar interface. |
Global variable storing the temporary disabling RAII guard is awesome :D |
Because this is a different behavior: it controls the shard-local temporary disablement behavior. I want to control the global enablement, which even if I were willing to try to emulate this by changing the thread-local value on every shard still isn't the same (consider e.g., alien threads).
I really need this, and prefer that this change (or an equivalent one) is upstream as well. Avoiding adding another method is why I went with the boolean argument and default value (though I agree with Pavel that it makes the name "wrong"). |
That is indeed not the best example to come up with :-) |
include/seastar/core/memory.hh
Outdated
* default (system) allocator is being used. | ||
*/ | ||
bool is_abort_on_allocation_failure(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please add a few words on the commit message about the motivation for this one?
I am trying to understand why an application needs it. If it needs to enable, it will enable regardless of the previous value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @raphaelsc. I've updated the commit message to the following:
Allow application to query the abort on OOM state
Add an is_abort_on_allocation_failure method which returns
the global state of this reactor option.
This is useful for applications that which to query this state in order
to expose it for debugging or administration purposes. Applications
could try to track this themselves, but this is unreliable since the
reactor itself can change this value based on the command line flags.
Let me know if you think it is sufficient.
Thanks for the explanation. I was just making sure that we're not doing changes needlessly. Please address Pavel's suggestions. |
cf064fd
to
0289f47
Compare
include/seastar/core/memory.hh
Outdated
* is enabled, or false otherwise. Always returns false if the | ||
* default (system) allocator is being used. | ||
*/ | ||
bool is_abort_on_allocation_failure(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this fwd declaration belong to next patch instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, moved in next push.
include/seastar/core/memory.hh
Outdated
@@ -139,8 +139,35 @@ static constexpr size_t huge_page_size = | |||
void configure(std::vector<resource::memory> m, bool mbind, | |||
std::optional<std::string> hugetlbfs_path = {}); | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't help asking for making this awesome comment into a doxygen-generated documentation (and the one in front of is_abort_on_allocation_failure
below too).
To check it you can use ninja -C build/dev docs
and then check the build/dev/doc/html/namespaceseastar_1_1memory.html
file. Likely it won't render at the first try, because this line sits inside \cond internal
block, so its worth moving it outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely it won't render at the first try, because this line sits inside \cond internal block
Right, doing this.
Question: is whether an API internal or not defined by the this \cond internal
annotation or whether it appears in an *::internal
namespace? I would assume the latter, though if the former I guess this would change this API from internal to not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The *::internal
namespace doesn't affect documentation generation AFAIK. Also, the *_abort_on_allocation_failure()
is already out of any C++'s internal
namespace, it only comes under the doxygen's radar, so my request is just to pull it out of there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xemul - done and checked with ninja docs
.
My question was just whether having a method inside the \cond internal
block had any bearing on whether it was considered an "internal" API for the purposes of stability guarantees, etc. I guess the answer is "no, only the C++ namespace matters".
0289f47
to
d17780b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good.
There is a global "abort on OOM" setting in the seastar, and we allow enabling it via a public method in seastar::memory. There was no corresponding method to disable it, but such a method is useful in order to give applications full control over this setting. Introduce a new method which takes a boolean parameter in order to allow both enabling and disabling the abort behavior. The old call is deprecated and simply falls the new method with enabled=true.
Add an is_abort_on_allocation_failure method which returns the global state of this reactor option. This is useful for applications that which to query this state in order to expose it for debugging or administration purposes. Applications could try to track this themselves, but this is unreliable since the reactor itself can change this value based on the command line flags.
A basic test which flips the abort on OOM switch up and down and which checks that the is_enabled function returns the correct value.
Replace enable_abort_on_allocation_failure with set_abort_on_allocation_failure(true) as the former is deprecated.
To include details of the set_abort_on_allocation_failure API.
d17780b
to
3077ab3
Compare
Last pushes enable doxygen generation for the new methods and also switch from |
@raphaelsc - thanks for the review, much appreciated! @xemul - thanks for the review so far: let me know if you were looking for anything else here. |
At Redpanda we are moving to enable abort-on-OOM by default, and as part of this it is useful to be able to disable as well as enable this setting (current behavior is that it starts out disabled and the reactor does a one-time enable based on the command line flag
--abort-on-seastar-bad-alloc
.This series implements the ability to disable it and also to get its status.
A small test is included.