Add support for cursors and limits on LookupResources API calls #1296
Conversation
Force-pushed from a9f14f3 to 37a17dd
first batch of comments, I'm still reviewing 😅
Please note that I'm going commit by commit, so some of my questions may not apply in subsequent commits
internal/graph/reachableresources.go
Outdated
	})
}
ci := newCursorInformation(req.OptionalCursor)
return withSingletonInCursor(ci, "same-type",
introduce constant SECTION_SAME_TYPE
As the section names are only used within the code once, I don't know if I see great value in const-ing them
I thought they could be documented at the top and help the reader understand the different sections involved in a reachability graph call
// outgoingCursorSections are the sections to be added to the outgoing *partial* cursor.
// It is the responsibility of the *caller* to append together the incoming cursors to form
// the final cursor.
outgoingCursorSections []string
I'm not sure if these strings will be allocated over and over during reachability dispatch or if the Go compiler will be clever enough to intern them, but I'm thinking we could provide nicer type checking and reduce allocations and runtime memory at request time if we create a Section enum type.
The problem is that sections can be "arbitrary" in their nesting: how would we predefine a type?
Correct me if I'm wrong, but this slice essentially encodes a "header value" and "N subsequent values" for each section. Why is that something we cannot abstract into a type to avoid using strings? It seemed like all sections had predefined names, but the nesting was indeed arbitrary. Am I missing something?
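For illustration, a minimal sketch of the kind of typed section names being suggested; the type and constant names here are hypothetical, not what the PR uses:

// cursorSection is a hypothetical named string type for cursor section
// identifiers, giving a bit of compile-time checking over raw string literals.
type cursorSection string

// Example section names, mirroring strings that appear in this PR; shown only
// to illustrate the suggestion.
const (
	sectionSameType  cursorSection = "same-type"
	sectionQueryRels cursorSection = "query-rels"
)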
isFirstIteration := true
for index, item := range items {
	if index < afterIndex {
I'm having a hard time understanding why cursorInformation has the structure it has. I'm probably missing something since this PR is massive and I'm going commit by commit, but I keep asking myself why sections and values are modeled with a slice. I recall you mentioning maps wouldn't work because we don't have a guarantee of the order, which is very important for the cursor, but why couldn't we create a struct field for each section and give the cursor a more type-safe shape? This kind of index contortion and jumping around is what concerns me: if we get anything wrong, we could be skipping elements during the iteration.
Because of the arbitrary nesting of cursor sections. As an example, entrypoint #0 might contain a data access underneath it, while entrypoint #1 contains a direct match return, and entrypoint #2 follows a TTU that leads to yet more entrypoints. The chain of sections is different for each path of code being followed.
so you say you can't model it as something like this?
type section struct {
	name     string
	values   []string
	children []section
}
	}
	return a
}
var queryLimit uint64 = 100
I'm not sure if this is something to address in this PR or in a follow-up, but wouldn't this hardcoded limit mean that we could potentially retrieve more elements than the LookupResources API asked for? Say you call with limit 10: you want 10 elements at a time, but the ReverseQueryRelationships call is retrieving a larger page (100), more than needed.
Correct, but it's necessary for efficiency reasons.
How's that more efficient? Can you elaborate?
Because you're not returning these results; you're returning the results computed from them. As an example, say you ask for 10 resources... we might have to follow 50 resources here to compute those 10, so it's better to look them up in bulk.
That's a fair point. I guess this is another of those places where we would fare better by keeping in-memory stats of, say, the p99 size of relationship pages that yielded a positive result; not much we can do about the negative results. Should I open an issue?
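As a rough sketch of the trade-off being discussed above, under the assumption of hypothetical helpers (fetchPage standing in for a paged ReverseQueryRelationships call and verify standing in for the downstream filtering): fetching candidates in pages larger than the caller's limit amortizes datastore round trips, because many candidates are discarded before they count toward the limit.

// collectVerified keeps fetching candidate relationships in pages of pageSize
// until `limit` verified results are found. The page size is deliberately
// larger than the caller's limit because verification discards candidates.
func collectVerified(limit, pageSize int, fetchPage func(offset, count int) []string, verify func(string) bool) []string {
	results := make([]string, 0, limit)
	for offset := 0; len(results) < limit; offset += pageSize {
		page := fetchPage(offset, pageSize)
		if len(page) == 0 {
			break // no more candidates to consider
		}
		for _, candidate := range page {
			// A page often yields far fewer verified results than its size,
			// which is why small pages would mean many more round trips.
			if verify(candidate) {
				results = append(results, candidate)
				if len(results) == limit {
					break
				}
			}
		}
	}
	return results
}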
internal/graph/reachableresources.go
Outdated
@@ -360,21 +362,28 @@ func (crr *ConcurrentReachableResources) redispatchOrReport(
	if foundResourceType.Namespace == parentRequest.ResourceRelation.Namespace &&
		foundResourceType.Relation == parentRequest.ResourceRelation.Relation {
		return parentStream.Publish(&v1.DispatchReachableResourcesResponse{
A bit tangential, but while reviewing the semantics of optimized entrypoints, I think I found that we don't have any test for this scenario right here, which, if I understood correctly, is where reachability is invoked over the last entry point along the current path.
This is possibly a dumb question: something I noted was that the optimized entrypoints might sometimes return empty even though there are entry points via the full entry point methods, (test) case in point here. Can you help me understand why it is ok not to dispatch the "non-optimized" reachability graph when the optimized entrypoints are empty?
Optimized entrypoints will never return empty if there is at least one valid entrypoint in the unoptimized version. Optimized entrypoints are different because they only follow a single child of an intersection, rather than all of them: if we find a resource on one path of the intersection, it must be on all other paths for it to be valid.
I suspected that was the case but was confused by a test case where the optimized entrypoints return empty.
rsm := newResourcesSubjectMap(resourceType)
var lastTpl *core.RelationTuple
for tpl := it.Next(); tpl != nil; tpl = it.Next() {
	if it.Err() != nil {
if err := it.Err(); err != nil {
	return nil, err
}
Why this vs just calling it.Err again?
it's idiomatic go?
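For context, a sketch of the loop above with the combined form, using a simplified, hypothetical iterator interface rather than the real datastore iterator:

// tupleIterator is a simplified stand-in for the datastore relationship iterator.
type tupleIterator interface {
	Next() *string
	Err() error
}

// drain walks the iterator, checking the error state once per step with the
// idiomatic `if err := it.Err(); err != nil` form discussed above.
func drain(it tupleIterator) error {
	for tpl := it.Next(); tpl != nil; tpl = it.Next() {
		if err := it.Err(); err != nil {
			return err
		}
		_ = tpl // process the tuple here
	}
	// A nil result from Next can also indicate a terminal error.
	return it.Err()
}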
@@ -225,7 +236,7 @@ func (crr *ConcurrentReachableResources) chunkedRedispatch(
	resourceType *core.RelationReference,
	handler func(ctx context.Context, ci cursorInformation, resources dispatchableResourcesSubjectMap) error,
) error {
-	return withQueryInCursor(ci, "query-rels",
+	return withDatastoreCursorInCursor(ci, "query-rels",
Sorry if I'm repeating myself on some concepts, but I'm going commit by commit, so some assumptions do not hold from one commit to another.
Even though this commit is introducing the concept of limits, we are deliberately not using it in withDatastoreCursorInCursor. I think that's probably ok because we might need to redispatch anyway, which gives you a subset, but I wonder if, based on the limit, we could use a heuristic to fetch a proportional number of relationships from ReverseQueryRelationships. If your limit is, say, 1K (since we cannot do more than that because of proto validation), we could issue up to 10 DB round trips because we hardcode the query limit as 100.
}

// -1 means that the handler has been completed.
return next(ci.mustWithOutgoingSection(name, "-1"))
Let's extract that -1 and turn it into a constant with its semantics documented.
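A minimal sketch of what that could look like; the constant name is invented here, not necessarily what landed:

// completedSectionValue is the sentinel cursor value recorded for a section
// once its handler has fully completed, so a resumed call can skip it.
const completedSectionValue = "-1"

// e.g. return next(ci.mustWithOutgoingSection(name, completedSectionValue))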
	Namespace: prometheusNamespace,
	Subsystem: prometheusSubsystem,
-	Name:      "lookup_total",
+	Name:      "lookup_resources_total",
The rename of the Prometheus metrics is going to break anybody out there who has created Grafana dashboards for SpiceDB... including us 😅
Given this is a pretty big change and we'd need to carefully monitor how it affects our production systems during the rollout, can we keep the rename for everything except Prometheus, and do the Prometheus rename in a follow-up, in an isolated deployment?
I think it's going to cause problems for folks out there monitoring SpiceDB. Is there perhaps a way to do aliasing?
Hrmph.... maybe we should alias it?
I couldn't find any reference to aliasing in Prometheus, and duplicating all these metrics can easily increase the cost of the observability stack in large enough deployments. I'm not sure, honestly; the most conservative option would be to leave this one as it is, with everything else properly renamed. Or we can decide not to honor the old metric names, which can be painful for the community.
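Prometheus has no built-in metric aliasing, but one way to keep old dashboards working during a deprecation window, at the cost of the duplicated series mentioned above, would be to register both names and increment them together. A hedged sketch, not what this PR does; the package, namespace, and subsystem values are placeholders:

package dispatch

import "github.com/prometheus/client_golang/prometheus"

var (
	// Old name, kept temporarily so existing dashboards keep working.
	lookupTotalDeprecated = prometheus.NewCounter(prometheus.CounterOpts{
		Namespace: "spicedb",
		Subsystem: "dispatch",
		Name:      "lookup_total",
		Help:      "Deprecated: use lookup_resources_total instead.",
	})
	// New, properly named metric.
	lookupResourcesTotal = prometheus.NewCounter(prometheus.CounterOpts{
		Namespace: "spicedb",
		Subsystem: "dispatch",
		Name:      "lookup_resources_total",
		Help:      "Number of LookupResources dispatches.",
	})
)

func init() {
	prometheus.MustRegister(lookupTotalDeprecated, lookupResourcesTotal)
}

// recordLookup increments both counters while the old name is still published.
func recordLookup() {
	lookupTotalDeprecated.Inc()
	lookupResourcesTotal.Inc()
}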
Force-pushed from db9bd85 to a563da0
pkg/cursor/cursor_test.go
Outdated
func TestDecode(t *testing.T) {
	for _, testCase := range []struct {
		format string
nit: the field name. It took me a few more seconds than I would have liked to realize this field holds the test case name.
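A small sketch of the shape the nit is asking for, with an explicit name field used as the subtest name; illustrative only, not the actual test in pkg/cursor:

import "testing"

func TestDecode(t *testing.T) {
	for _, tc := range []struct {
		name    string // surfaced by t.Run in failure output
		input   string
		wantErr bool
	}{
		{name: "empty cursor", input: "", wantErr: true},
		{name: "well-formed cursor", input: "c29tZS1jdXJzb3I", wantErr: false},
	} {
		tc := tc
		t.Run(tc.name, func(t *testing.T) {
			// The real test would call the decode function here; this sketch
			// only shows the naming pattern.
			_ = tc.input
		})
	}
}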
Force-pushed from a563da0 to 246ca8b
Updated
LGTM, thanks for your patience with such a long review! ✨
internal/services/v1/permissions.go
Outdated
errG.Go(func() error {
	return namespace.CheckNamespaceAndRelation(
		checksCtx,
		req.Subject.Object.ObjectType,
		normalizeSubjectRelation(req.Subject),
		true,
		req.ResourceObjectType,
		req.Permission,
		false,
		ds,
	)
})
errG.Go(func() error {
	return namespace.CheckNamespaceAndRelation(
		ctx,
		req.ResourceObjectType,
		req.Permission,
		false,
		req.Subject.Object.ObjectType,
		stringz.DefaultEmpty(req.Subject.OptionalRelation, tuple.Ellipsis),
		true,
		ds,
	)
})
opened #1333
@@ -36,6 +33,10 @@ type ValidatedLookupResourcesRequest struct {
	Revision datastore.Revision
}

// reachableResourcesLimit is a limit set on the reachable resources calls to ensure caching
// stores smaller chunks.
const reachableResourcesLimit = 1000
Are you implying by omission that caching is not a problem and that differently sized RR responses are reusable regardless?
Actually, no: we're not forwarding the limit because of the filtering aspect. A request for 100 resources from LR might need 1000 from reachable resources, because 900 are checked and not returned. This is, to be fair, a corner case, and in most cases the limits should be fairly evenly matched, but it is possible, so we run a (small) risk of having to make extra dispatch requests to reachable resources in that scenario. Thoughts?
I think this is once again one of those situations where the system should adapt dynamically by keeping stats around and issuing the limit that yields the fewest dispatch requests. For the time being this is fine, but let's open follow-up issues because it will be forgotten in the ReachableResources ocean 🙏🏻
default:
	return spiceerrors.MustBugf("unknown check result status for reachable resources")
[]string{reachableResource.ResourceId},
Let's open a follow up issue please 🙏🏻
if errors.Is(err, context.Canceled) {
	return nil
}
ah, got it 👍🏻
so you say you can't model it as something like this?
Not all sections have values. We could define a
That would work, but it wouldn't add a great deal to the type safety. @vroldanbet Let me know if you'd like me to do ^
Let's leave it as it is 👍🏻
Force-pushed from 246ca8b to 0509cda
Also adds a test to ensure cursor hashes are stable
Force-pushed from e7aff85 to 776745b
This change adds support for (optional) cursors and limits on LookupResources API calls, changing them to be fully streaming, which should significantly reduce or even remove the pause when loading very large sets of resources.
This is accomplished by changing ReachableResources into a non-parallelized, ordered (internally defined) and cursored API and having LookupResources invoke it in a streaming manner, passing the cursor down into it.
This PR also includes a number of small tech-debt cleanups, including renaming Lookup -> LookupResources everywhere.
Fixes #43
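To make the new knobs concrete, here is a hedged sketch of paging through LookupResources with the authzed-go client; the field names (OptionalLimit, OptionalCursor, AfterResultCursor) reflect my understanding of the v1 API added alongside this work, so verify them against the proto definitions before relying on them.

package example

import (
	"context"
	"fmt"
	"io"

	v1 "github.com/authzed/authzed-go/proto/authzed/api/v1"
	"github.com/authzed/authzed-go/v1"
)

// lookupAllResources pages through LookupResources using a limit and a cursor,
// resuming from the last received cursor until a short page signals the end.
func lookupAllResources(ctx context.Context, client *authzed.Client) error {
	const pageSize = 100
	var cursor *v1.Cursor
	for {
		stream, err := client.LookupResources(ctx, &v1.LookupResourcesRequest{
			ResourceObjectType: "document",
			Permission:         "view",
			Subject: &v1.SubjectReference{
				Object: &v1.ObjectReference{ObjectType: "user", ObjectId: "tom"},
			},
			OptionalLimit:  pageSize,
			OptionalCursor: cursor,
		})
		if err != nil {
			return err
		}

		received := 0
		for {
			resp, err := stream.Recv()
			if err == io.EOF {
				break
			}
			if err != nil {
				return err
			}
			fmt.Println(resp.ResourceObjectId)
			cursor = resp.AfterResultCursor // resume point for the next page
			received++
		}
		if received < pageSize {
			return nil // a short page means there is nothing left
		}
	}
}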