Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return 400 error for GetUserPrivileges call with API keys #89333

Merged

Conversation

ywangd
Copy link
Member

@ywangd ywangd commented Aug 15, 2022

The GetUserPrivileges API returns a 500 error when it is called with an
API key that has assigned role descriptors. This is because the
underlying LimitedRole class that represents the API key's effective
privileges does not support building a simple view of the privileges.

This PR changes the code to return 400 error instead of 500 along with a
better error message that suggests the GetApiKey API as an alternative.

Relates: #89058

The GetUserPrivileges API returns a 500 error when it is called with an
API key that has assigned role descriptors. This is because the
underlying LimitedRole class that represents the API key's effective
privileges does not support building a simple view of the privileges.

This PR changes the code to return 400 error instead of 500 along with a
better error message that suggests the GetApiKey API as an alternative.

Relates: elastic#89058
@ywangd ywangd added >enhancement :Security/Security Security issues without another label v8.5.0 labels Aug 15, 2022
@ywangd ywangd requested a review from n1v0lg August 15, 2022 05:19
@elasticsearchmachine elasticsearchmachine added the Team:Security Meta label for security team label Aug 15, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@elasticsearchmachine
Copy link
Collaborator

Hi @ywangd, I've created a changelog YAML for you.

Comment on lines 60 to 61
"Cannot retrieve privileges for API keys with assigned role descriptors. "
+ "Please use the Get API key information API https://ela.st/es-api-get-api-key",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakelandis I am not entirely clear about the current recommendation of adding HTTP links into error messages (or more general HTTP responses). My previous understanding is that we don't want to do it because the links may break. But I have noticed that links are being used today, e.g. in the health API. Could you please adivse whether it is OK to use such a short url here? If not, what would be our choices? Thanks!

Copy link
Contributor

@n1v0lg n1v0lg Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this but I'm generally in favor of links in the error/response messages. It's helpful and potentially helps us cut down on lengthy failure messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both deprecation's and new health API use shortened URL links embedded in the code. The primary reason is that they will be viewed inside Kibana. I am a fan of using URL links in error messages (especially shortened links) but I don't think we have widely adopted them quite yet. While the shortening service itself is solid, some of the tooling and management of the shortened URLs are lacking. For example: #78273 and creating/updating these is a hodgepodge of custom scripts and manual efforts. We have also had some conversations around version specific links, but settled on just pointing to current version and we can figure that part out later.

I don't think we have an official stance, but IMO if it is only additive information to the error message I don't see any harm. If the docs move, they generally leave behind a re-direct or page to manually redirect and having the link is better than not. I imagine existing usages and increased usage will force a better eco-system around managing these. cc @masseyke (since he worked with these recently)

Copy link
Member Author

@ywangd ywangd Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context, Jake. The link you shared is super helpful! I like the proposal it puts forward. It's good to know that we are thinking ways to make things more robust.

For this particular usage here, my take is that it should be fine. Once the broader framework is ready, we just need to remember migrating it to the new system. The https://ela.st part is pretty easy to search upon. So I am not worried at all.

Comment on lines 55 to 57
try {
authorizationService.retrieveUserPrivileges(subject, securityContext.getAuthorizationInfoFromContext(), listener);
} catch (UnsupportedOperationException e) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to use try-catch instead of pro-actively checking the Role class because:

  1. Whether LimitedRole can throw exception on building privileges response is an implementation detail. The LimitedRole class is in a separate module (core) and in theory does not even need to be exposed to the security module. In fact, security module currently has no direct reference to it.
  2. Checking the Role class does not really work with Custom Authorization engine by checking because it is theoretically possible for a custom engine to implement the retrieveUserPrivileges call in a different way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only reservation here is that we are leaking this implementation detail all the way up to the transport layer in an unstructured way: we are assuming that any UnsupportedOperationException thrown by retrieveUserPrivileges for an API key is because our implementation of retrieveUserPrivileges doesn't support this. UnsupportedOperationException is generic though so this could lead to unexpected behavior/incorrect error messages for custom implementations (or if we throw UnsupportedOperationException somewhere else throughout retrieveUserPrivileges).

I'm wondering if we should specialize the exception thrown in LimitedRole to a sublass of UnsupportedOperationException. This way we still leak the implementation but in a more robust, structured way.

Alternatively, we could throw the 400 API-key specific exception inside RBACEngine which would keep it closer to the root cause and not force our interpretation on exceptions thrown by custom implementations.

WDYT?

Copy link
Member Author

@ywangd ywangd Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's awkward. The problem is that not every API key should fail. Only those with assigned role descriptors will fail. Underlying those API keys have their roles represented as LimitedRole. But I don't want to expose LimiteRole to the security module because it should just rely on the Role interface. I pondered on this and didn't find a clear better solution:

  1. Throwing specific subclass of UnsupportedOperationException (or just a specific exception) is slightly better. But it still leaks implementation. I also didn't want to add another class just for this error message. My thought was that the exception + checking authentication being an API key (authentication.isApiKey() inside the catch block) is sufficient to rule out accidental unexpected behaviour in future.
  2. By "throwing 400 API key specific exception inside RBACEngine", I assume you mean actively checking whether the Role is a LimitedRole and throw error accordingly? This would require RBACEngine be aware of LimitedRole which I rather dislike as stated above.
  3. If you mean use try-catch for UnsupportedOperationException in RBACEngine, this means putting the error message inside RBACEngine. I initially dislike this because I felt this error message, especially with the HTTP url should be a handle level thing. But I now noticed that similar messages are prepared at other service classes (mostly implementations of the new HealthIndicatorService). So maybe this is not too bad.

So overall, I think option 3 is my current preference. More concretely, catching UnsupportedOperationException in RBACEngine#getUserPrivileges and retrow it as a IllegalArgumentException with proper error message. Using IllegalArgumentException avoids having to set an explicit 400 status code which is something I also didn't like at service layer. Please let me know if this works for you.

Copy link
Contributor

@n1v0lg n1v0lg Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, I meant option 3 👍

I think that building a message that feels "handler-layer" inside RBACEngine#getUserPrivileges is a worthwhile trade-off here: we get to handle the issue close to the root cause and don't artificially impose limitations/interpretation on other implementations of AuthorizationEngine.

Copy link
Contributor

@n1v0lg n1v0lg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question around where and how we throw, before I review the rest of the PR.

Question inline, in response to your comment here.

Comment on lines 55 to 57
try {
authorizationService.retrieveUserPrivileges(subject, securityContext.getAuthorizationInfoFromContext(), listener);
} catch (UnsupportedOperationException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only reservation here is that we are leaking this implementation detail all the way up to the transport layer in an unstructured way: we are assuming that any UnsupportedOperationException thrown by retrieveUserPrivileges for an API key is because our implementation of retrieveUserPrivileges doesn't support this. UnsupportedOperationException is generic though so this could lead to unexpected behavior/incorrect error messages for custom implementations (or if we throw UnsupportedOperationException somewhere else throughout retrieveUserPrivileges).

I'm wondering if we should specialize the exception thrown in LimitedRole to a sublass of UnsupportedOperationException. This way we still leak the implementation but in a more robust, structured way.

Alternatively, we could throw the 400 API-key specific exception inside RBACEngine which would keep it closer to the root cause and not force our interpretation on exceptions thrown by custom implementations.

WDYT?

@ywangd ywangd requested a review from n1v0lg August 15, 2022 09:37
Copy link
Contributor

@n1v0lg n1v0lg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

when(role.application()).thenReturn(ApplicationPermission.NONE);
when(role.runAs()).thenReturn(RunAsPermission.NONE);

final UnsupportedOperationException unsupportedOperationException = mock(UnsupportedOperationException.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: would just instantiate the exception here instead of mocking since we're not relying on any mocked behavior.

try {
getUserPrivilegesResponse = buildUserPrivilegesResponseObject(role);
} catch (UnsupportedOperationException e) {
throw new IllegalArgumentException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: to stay consistent with the other exception above, should we pass this to onFailure instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to be consistent with the existing behaviour, i.e. buildUserPrivilegesResponseObject just throws exceptions. But consistent with the caller method is also good. I'll make the change

@ywangd
Copy link
Member Author

ywangd commented Aug 16, 2022

@ywangd
Copy link
Member Author

ywangd commented Aug 16, 2022

@elasticmachine update branch

@ywangd
Copy link
Member Author

ywangd commented Aug 18, 2022

@elasticmachine update branch

@ywangd ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Aug 18, 2022
@elasticsearchmachine elasticsearchmachine merged commit 3bb13e2 into elastic:main Aug 18, 2022
@ywangd ywangd deleted the error-400-for-api-get-privileges branch August 18, 2022 03:05
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Aug 19, 2022
* upstream/main: (265 commits)
  Disable openid connect tests due to missing fixture (elastic#89478)
  Add periodic job for single processor node testing
  Updates to changelog processing after docs redesign (elastic#89463)
  Better support for multi cluster for run task (elastic#89442)
  Mute failing tests (elastic#89465)
  [ML] Performance improvements related to ECS Grok pattern usage (elastic#89424)
  Add source fallback support for date and date_nanos mapped types (elastic#89440)
  Reuse Info in lifecycle step (elastic#89419)
  feature: support metrics for multi value fields (elastic#88818)
  Upgrade OpenTelemetry API and remove workaround (elastic#89438)
  Remove LegacyClusterTaskResultActionListener (elastic#89459)
  Add YAML spec docs about matching errors (elastic#89370)
  Remove redundant cluster upgrade tests for auth tokens (elastic#89417)
  Return 400 error for GetUserPrivileges call with API keys (elastic#89333)
  User Profile - Detailed errors in hasPrivileges response (elastic#89224)
  Rollover min_* conditions docs and highlight (elastic#89434)
  REST tests for percentiles_bucket agg (elastic#88029)
  REST tests for cumulative pipeline aggs (elastic#88966)
  Clean-up file watcher keys. (elastic#89429)
  fix a typo in Security.java (elastic#89248)
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement :Security/Security Security issues without another label Team:Security Meta label for security team v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants