-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return 400 error for GetUserPrivileges call with API keys #89333
Return 400 error for GetUserPrivileges call with API keys #89333
Conversation
The GetUserPrivileges API returns a 500 error when it is called with an API key that has assigned role descriptors. This is because the underlying LimitedRole class that represents the API key's effective privileges does not support building a simple view of the privileges. This PR changes the code to return 400 error instead of 500 along with a better error message that suggests the GetApiKey API as an alternative. Relates: elastic#89058
Pinging @elastic/es-security (Team:Security) |
Hi @ywangd, I've created a changelog YAML for you. |
"Cannot retrieve privileges for API keys with assigned role descriptors. " | ||
+ "Please use the Get API key information API https://ela.st/es-api-get-api-key", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakelandis I am not entirely clear about the current recommendation of adding HTTP links into error messages (or more general HTTP responses). My previous understanding is that we don't want to do it because the links may break. But I have noticed that links are being used today, e.g. in the health API. Could you please adivse whether it is OK to use such a short url here? If not, what would be our choices? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on this but I'm generally in favor of links in the error/response messages. It's helpful and potentially helps us cut down on lengthy failure messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both deprecation's and new health API use shortened URL links embedded in the code. The primary reason is that they will be viewed inside Kibana. I am a fan of using URL links in error messages (especially shortened links) but I don't think we have widely adopted them quite yet. While the shortening service itself is solid, some of the tooling and management of the shortened URLs are lacking. For example: #78273 and creating/updating these is a hodgepodge of custom scripts and manual efforts. We have also had some conversations around version specific links, but settled on just pointing to current version and we can figure that part out later.
I don't think we have an official stance, but IMO if it is only additive information to the error message I don't see any harm. If the docs move, they generally leave behind a re-direct or page to manually redirect and having the link is better than not. I imagine existing usages and increased usage will force a better eco-system around managing these. cc @masseyke (since he worked with these recently)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the context, Jake. The link you shared is super helpful! I like the proposal it puts forward. It's good to know that we are thinking ways to make things more robust.
For this particular usage here, my take is that it should be fine. Once the broader framework is ready, we just need to remember migrating it to the new system. The https://ela.st
part is pretty easy to search upon. So I am not worried at all.
try { | ||
authorizationService.retrieveUserPrivileges(subject, securityContext.getAuthorizationInfoFromContext(), listener); | ||
} catch (UnsupportedOperationException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to use try-catch
instead of pro-actively checking the Role class because:
- Whether LimitedRole can throw exception on building privileges response is an implementation detail. The LimitedRole class is in a separate module (
core
) and in theory does not even need to be exposed to the security module. In fact, security module currently has no direct reference to it. - Checking the Role class does not really work with Custom Authorization engine by checking because it is theoretically possible for a custom engine to implement the
retrieveUserPrivileges
call in a different way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only reservation here is that we are leaking this implementation detail all the way up to the transport layer in an unstructured way: we are assuming that any UnsupportedOperationException
thrown by retrieveUserPrivileges
for an API key is because our implementation of retrieveUserPrivileges
doesn't support this. UnsupportedOperationException
is generic though so this could lead to unexpected behavior/incorrect error messages for custom implementations (or if we throw UnsupportedOperationException
somewhere else throughout retrieveUserPrivileges
).
I'm wondering if we should specialize the exception thrown in LimitedRole
to a sublass of UnsupportedOperationException
. This way we still leak the implementation but in a more robust, structured way.
Alternatively, we could throw the 400 API-key specific exception inside RBACEngine
which would keep it closer to the root cause and not force our interpretation on exceptions thrown by custom implementations.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's awkward. The problem is that not every API key should fail. Only those with assigned role descriptors will fail. Underlying those API keys have their roles represented as LimitedRole
. But I don't want to expose LimiteRole
to the security module because it should just rely on the Role
interface. I pondered on this and didn't find a clear better solution:
- Throwing specific subclass of
UnsupportedOperationException
(or just a specific exception) is slightly better. But it still leaks implementation. I also didn't want to add another class just for this error message. My thought was that the exception + checking authentication being an API key (authentication.isApiKey()
inside the catch block) is sufficient to rule out accidental unexpected behaviour in future. - By "throwing 400 API key specific exception inside RBACEngine", I assume you mean actively checking whether the
Role
is aLimitedRole
and throw error accordingly? This would requireRBACEngine
be aware ofLimitedRole
which I rather dislike as stated above. - If you mean use try-catch for
UnsupportedOperationException
in RBACEngine, this means putting the error message inside RBACEngine. I initially dislike this because I felt this error message, especially with the HTTP url should be a handle level thing. But I now noticed that similar messages are prepared at other service classes (mostly implementations of the newHealthIndicatorService
). So maybe this is not too bad.
So overall, I think option 3 is my current preference. More concretely, catching UnsupportedOperationException
in RBACEngine#getUserPrivileges
and retrow it as a IllegalArgumentException
with proper error message. Using IllegalArgumentException
avoids having to set an explicit 400 status code which is something I also didn't like at service layer. Please let me know if this works for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, I meant option 3 👍
I think that building a message that feels "handler-layer" inside RBACEngine#getUserPrivileges
is a worthwhile trade-off here: we get to handle the issue close to the root cause and don't artificially impose limitations/interpretation on other implementations of AuthorizationEngine
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question around where and how we throw, before I review the rest of the PR.
Question inline, in response to your comment here.
try { | ||
authorizationService.retrieveUserPrivileges(subject, securityContext.getAuthorizationInfoFromContext(), listener); | ||
} catch (UnsupportedOperationException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only reservation here is that we are leaking this implementation detail all the way up to the transport layer in an unstructured way: we are assuming that any UnsupportedOperationException
thrown by retrieveUserPrivileges
for an API key is because our implementation of retrieveUserPrivileges
doesn't support this. UnsupportedOperationException
is generic though so this could lead to unexpected behavior/incorrect error messages for custom implementations (or if we throw UnsupportedOperationException
somewhere else throughout retrieveUserPrivileges
).
I'm wondering if we should specialize the exception thrown in LimitedRole
to a sublass of UnsupportedOperationException
. This way we still leak the implementation but in a more robust, structured way.
Alternatively, we could throw the 400 API-key specific exception inside RBACEngine
which would keep it closer to the root cause and not force our interpretation on exceptions thrown by custom implementations.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
when(role.application()).thenReturn(ApplicationPermission.NONE); | ||
when(role.runAs()).thenReturn(RunAsPermission.NONE); | ||
|
||
final UnsupportedOperationException unsupportedOperationException = mock(UnsupportedOperationException.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: would just instantiate the exception here instead of mocking since we're not relying on any mocked behavior.
try { | ||
getUserPrivilegesResponse = buildUserPrivilegesResponseObject(role); | ||
} catch (UnsupportedOperationException e) { | ||
throw new IllegalArgumentException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: to stay consistent with the other exception above, should we pass this to onFailure
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to be consistent with the existing behaviour, i.e. buildUserPrivilegesResponseObject
just throws exceptions. But consistent with the caller method is also good. I'll make the change
Test failure is being fixed by https://github.com/elastic/elasticsearch/pull/89360/files#r946240748 |
@elasticmachine update branch |
@elasticmachine update branch |
* upstream/main: (265 commits) Disable openid connect tests due to missing fixture (elastic#89478) Add periodic job for single processor node testing Updates to changelog processing after docs redesign (elastic#89463) Better support for multi cluster for run task (elastic#89442) Mute failing tests (elastic#89465) [ML] Performance improvements related to ECS Grok pattern usage (elastic#89424) Add source fallback support for date and date_nanos mapped types (elastic#89440) Reuse Info in lifecycle step (elastic#89419) feature: support metrics for multi value fields (elastic#88818) Upgrade OpenTelemetry API and remove workaround (elastic#89438) Remove LegacyClusterTaskResultActionListener (elastic#89459) Add YAML spec docs about matching errors (elastic#89370) Remove redundant cluster upgrade tests for auth tokens (elastic#89417) Return 400 error for GetUserPrivileges call with API keys (elastic#89333) User Profile - Detailed errors in hasPrivileges response (elastic#89224) Rollover min_* conditions docs and highlight (elastic#89434) REST tests for percentiles_bucket agg (elastic#88029) REST tests for cumulative pipeline aggs (elastic#88966) Clean-up file watcher keys. (elastic#89429) fix a typo in Security.java (elastic#89248) ... # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java
The GetUserPrivileges API returns a 500 error when it is called with an
API key that has assigned role descriptors. This is because the
underlying LimitedRole class that represents the API key's effective
privileges does not support building a simple view of the privileges.
This PR changes the code to return 400 error instead of 500 along with a
better error message that suggests the GetApiKey API as an alternative.
Relates: #89058