Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bulkGet saved objects across spaces #109967

Merged
merged 12 commits into from
Aug 26, 2021

Conversation

jportner
Copy link
Contributor

@jportner jportner commented Aug 24, 2021

Summary

Resolves #109197.

These unit tests were not using mocks correctly, so they passed when
they should not have:
 * '#find returns empty result if user is unauthorized in any space'
 * '#openPointInTimeForType throws error if if user is unauthorized in any space'

The previous refactor fixed the mocks, which caused these two tests to
start failing. In this commit, I have fixed the code so the tests pass
as written.
@jportner jportner added v8.0.0 release_note:skip Skip the PR/issue when compiling release notes v7.16.0 labels Aug 24, 2021
@jportner jportner changed the title bulkGet across spaces bulkGet saved objects across spaces Aug 24, 2021
@pgayvallet
Copy link
Contributor

Did not look at implementation in depth for now, but API looks good

1. Removed previous changes to pass unit tests, changed the tests
   instead. The tests were meant to be asserting for the "403 Forbidden"
   error.
2. Updated bulkGet to replace `'*'` with an empty namespaces array when
   a user is not authorized to access any spaces. This is primarily so
   that the API is consistent (the Security plugin will still check
   that they are authorized to access the type in the active space and
   return a more specific 403 error if not).
At the repository level, if a user tries to bulkGet an isolated object
in multiple spaces (explicitly defined, or '*') they will get a 400 Bad
Request error. However, in the Spaces SOC wrapper we are deconstructing
the '*' identifier and replacing it with all available spaces. This can
cause a problem when the user has access to only one space, the API
response would be inconsistent (the repository would treat this as a
perfectly valid request and attempt to fetch the object.)
So, now in the Spaces SOC wrapper we modify the results based on what
the request, leaving any 403 errors from the Security SOC wrapper
intact.
Copy link
Contributor Author

@jportner jportner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Author's notes for reviewers.

Note: I looked at the EncryptedSavedObjects SOC wrapper and it doesn't need any changes, since it just takes the returned objects and decrypts the attributes accordingly.

const allTypes = normalTypes.concat(hiddenType);
const allTypes = [...normalTypes, ...crossNamespace, ...hiddenType];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is unrelated to the rest of this PR, but as I was looking at these test cases I noticed that the crossNamespace test cases were not included in the allTypes superset. That meant they were getting skipped for the superuser.

return availableSpacesPromise;
};

const expectedResults = await Promise.all(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ugly but necessary to keep the API responses consistent for various user- and space-combinations. See the individual commit messages (5e5e00c, f224966) and integration tests for more info.

Comment on lines -386 to +463
const namespaces = await this.getSearchableSpaces(options.namespaces);
let namespaces: string[];
try {
namespaces = await this.getSearchableSpaces(options.namespaces);
} catch (err) {
if (Boom.isBoom(err) && err.output.payload.statusCode === 403) {
// throw bad request since the user is unauthorized in any space
throw SavedObjectsErrorHelpers.createBadRequestError();
}
throw err;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit test for this (and for find) were not working correctly (see e6588b7). I fixed the tests and added this bit to match the behavior to the unit tests.

@jportner jportner marked this pull request as ready for review August 25, 2021 20:13
@jportner jportner requested review from a team as code owners August 25, 2021 20:13
@jportner
Copy link
Contributor Author

@elasticmachine merge upstream

@jportner
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Contributor

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. A few remarks and questions

Comment on lines +50 to +51
{ ...CASES.MULTI_NAMESPACE_ISOLATED_ONLY_SPACE_1, namespaces: [SPACE_1_ID] }, // second try searches for it in a single other space, which is valid
{ ...CASES.MULTI_NAMESPACE_DEFAULT_AND_SPACE_1, namespaces: [SPACE_2_ID], ...fail404() },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just wondering: our bulkGet security suite is only performing tests where it bulkGets a single object, right 😅 ?

Copy link
Contributor Author

@jportner jportner Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the bulk* SO test suites behave as follows:

  • For non-forbidden cases (successes and object-specific errors), batch them into a single request
  • For forbidden cases (where the entire request fails with a 403), test each case individually

Note this is not automated in any way but it is controlled by use of the singleRequest option in each test definition.

return true;
}

const namespacesToCheck = new Set(namespaces);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I don't think there's any need to convert to a set here? Can't we just work with the initial array?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, had the same thought. The only corner case I can imagine where it'd be beneficial to have a Set is if we have large existingNamespaces and namespaces lists that have intersection only at the end of these lists (and bulkGet is large enough to have a significant compound effect), but it probably doesn't make sense to cater for that.

Having a set here will not completely protect us from a malicious actor anyway, they can just do something like this...., for every ID:

const namespaces = [...Array.from({ length: 1000000 }).map(() => Math.random().toString()), 'valid-ns']

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of the Set here is to reduce the time complexity, this takes us down from O(n^2) to O(n). This is because line 179 below loops through existingNamespaces and checks to see if any existing space is present in namespacesToCheck.

// documents with the correct namespace prefix. We may revisit this in the future.
const doc1 = createRawDoc(SINGLE_NAMESPACE_TYPE, { namespace: 'some-space' }); // the namespace field is ignored
const doc2 = createRawDoc(SINGLE_NAMESPACE_TYPE, { namespaces: ['some-space'] }); // the namespaces field is ignored
expect(rawDocExistsInNamespaces(registry, doc1, [])).toBe(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a remark: We already talked about it in a similar function from a previous PR, but I still don't like the way single-NS docs are short-circuiting these functions, as it could lead to errors if a developer decides to use it elsewhere than in a part of the code where we're already assured that the fetched single-NS objects are of the correct requested space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll one-up you, I don't like single-NS docs at all 😄

Comment on lines 295 to 301
await this.legacyEnsureAuthorized(
this.getUniqueObjectTypes(objects),
'bulk_get',
options.namespace,
namespaces,
{
args,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL: I thought we were doing per-object security checks and returning per-object unauthorized errors (as we do for 404)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! If the user is unauthorized for part of the request, the entire request fails 😄

Comment on lines +238 to +242
let availableSpacesPromise: Promise<string[]> | undefined;
const getAvailableSpaces = async () => {
if (!availableSpacesPromise) {
availableSpacesPromise = this.getSearchableSpaces([ALL_SPACES_ID]).catch((err) => {
if (Boom.isBoom(err) && err.output.payload.statusCode === 403) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just

const getAvailableSpaces = async () => { ... }
const availableSpaces = await getAvailableSpaces();

instead of using Promise.all around the expectedResults ? current implementation seems more complex than necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or something like this to block on this only when necessary:

const availableSpaces = objects.some((object) => object.namespaces?.includes(ALL_SPACES_ID))
  ? await this.getSearchableSpaces([ALL_SPACES_ID]).catch((err) => {
      // the user doesn't have access to any spaces
      if (Boom.isBoom(err) && err.output.payload.statusCode === 403) {
        return [];
      }

      throw err;
    })
  : [];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the idea here was not to fetch all available spaces if we didn't need to, since that is a potentially expensive operation.

We could go through @azasypkin's suggestion but it effectively means we loop through all objects' namespaces fields twice in the worst case scenario. I'm on the fence about changing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could go through @azasypkin's suggestion but it effectively means we loop through all objects' namespaces fields twice in the worst case scenario.

My assumption was that the time it takes to iterate through objects in request is negligible comparing to the request execution time, but it's just not-validated assumption, feel free to pick whatever approach you feel is better!

Comment on lines 281 to 290
if (isLeft(expectedResult) && actualResult?.error?.statusCode !== 403) {
const { type, id } = expectedResult.value;
return ({
type,
id,
error: SavedObjectsErrorHelpers.createBadRequestError(
'"namespaces" can only specify a single space when used with space-isolated types'
).output.payload,
} as unknown) as SavedObject<T>;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud here, but I wonder if we would be able to perform this check before the underlying call to client.bulkGet?. I think we have all the necessary informations without performing the call?

In that case, we could potentially only bulkGet the objects that are not in that situation, and aggregate with the ones that are matching this case? May be premature optimization though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original implementation that I wrote used that approach -- any object that we predicted should result in a bad request, it skipped for the base client bulkGet.

The problem is that we really need to factor in behavior of both the Secure SOC wrapper and the SOR. If the user is unauthorized to get a specific type, then the request would never reach the SOR level validation. So that original implementation caused even more irregular API responses (and made the integration tests fail spectacularly).

However: I realized as I wrote this comment that the && actualResult?.error?.statusCode !== 403 bit is not necessary; if the user is unauthorized to fetch one specific object, the whole operation fails, so that code would never be reached anyway. So I'll take that bit out 😄

Copy link
Member

@azasypkin azasypkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM 👍

Comment on lines +1002 to +1003
const getNamespaceId = (namespaces?: string[]) =>
namespaces !== undefined ? SavedObjectsUtils.namespaceStringToId(namespaces[0]) : namespace;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: no need to change anything, just wanted to note that if the reader (okay, that was me) doesn't know the internal implementation of generateRawId by heart they may be confused by the SavedObjectsUtils.namespaceStringToId(namespaces[0]) for shareable objects as it gives impression that we search only in the very first namespace 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I had to check the implementation too to be sure 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment here to clarify.

return true;
}

const namespacesToCheck = new Set(namespaces);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, had the same thought. The only corner case I can imagine where it'd be beneficial to have a Set is if we have large existingNamespaces and namespaces lists that have intersection only at the end of these lists (and bulkGet is large enough to have a significant compound effect), but it probably doesn't make sense to cater for that.

Having a set here will not completely protect us from a malicious actor anyway, they can just do something like this...., for every ID:

const namespaces = [...Array.from({ length: 1000000 }).map(() => Math.random().toString()), 'valid-ns']

Comment on lines +238 to +242
let availableSpacesPromise: Promise<string[]> | undefined;
const getAvailableSpaces = async () => {
if (!availableSpacesPromise) {
availableSpacesPromise = this.getSearchableSpaces([ALL_SPACES_ID]).catch((err) => {
if (Boom.isBoom(err) && err.output.payload.statusCode === 403) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or something like this to block on this only when necessary:

const availableSpaces = objects.some((object) => object.namespaces?.includes(ALL_SPACES_ID))
  ? await this.getSearchableSpaces([ALL_SPACES_ID]).catch((err) => {
      // the user doesn't have access to any spaces
      if (Boom.isBoom(err) && err.output.payload.statusCode === 403) {
        return [];
      }

      throw err;
    })
  : [];

@jportner jportner added the auto-backport Deprecated - use backport:version if exact versions are needed label Aug 26, 2021
@jportner jportner enabled auto-merge (squash) August 26, 2021 14:11
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

API count

id before after diff
core 2246 2247 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@jportner jportner merged commit 695280b into elastic:master Aug 26, 2021
@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed release_note:skip Skip the PR/issue when compiling release notes v7.16.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add multi-namespace capabilities to SOR.bulkGet
4 participants