Allow ListEntitiesAsync to only return entities with non-null state #1421

bachuv · 2020-08-01T00:08:34Z

Resolves #1364.

These changes add an IncludeDeleted flag to EntityQuery that allows customers to choose whether the ListEntitiesAsync API should return deleted (i.e. state is null) entities or not. If the IncludeDeleted flag is false, then only entities with a non-null state are returned.

cgillum

In addition to the feedback below, let's add some test cases.

cgillum · 2020-08-03T16:21:49Z

src/WebJobs.Extensions.DurableTask/ContextImplementations/DurableClient.cs

+                var statefulEntities = entityResult.Entities.Where(e => e.State != null);
+                var statefulEntityResult = ConvertToEntityQueryResult(statefulEntities, entityResult.ContinuationToken);
+
+                while (statefulEntities.ToList().Count < query.PageSize && !string.IsNullOrEmpty(entityResult.ContinuationToken))


There's no benefit to doing a .ToList() here since you're not actually doing anything with the list you're creating. It would have been better to just say statefulEntities.Count().

cgillum · 2020-08-03T16:24:59Z

src/WebJobs.Extensions.DurableTask/ContextImplementations/DurableClient.cs

+                    condition = new OrchestrationStatusQueryCondition(query);
+                    result = await ((IDurableClient)this).ListInstancesAsync(condition, cancellationToken);
+                    entityResult = new EntityQueryResult(result);
+                    statefulEntities = entityResult.Entities.Where(e => e.State != null);


Since you're overwriting statefulEntities here, aren't you also overwriting any of the previously fetched results? For implementing this task, I would have expected you to have a List<DurableEntityStatus> statefulEntities outside the loop and then call statefulEntities.AddRange(...) inside of every loop iteration until the size of the list reaches the query.PageSize.
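A minimal sketch of the accumulation pattern suggested above. `Entity`, `Page`, and the `fetchPage` callback are hypothetical stand-ins for `DurableEntityStatus`, `EntityQueryResult`, and the call to `ListInstancesAsync`; they are not the extension's real types. The sketch assumes C# 9 or later.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the real result types.
public record Entity(string Id, string? State);
public record Page(IReadOnlyList<Entity> Entities, string? ContinuationToken);

public static class AccumulatingQuery
{
    // fetchPage is an assumed callback: (pageSize, continuationToken) -> next raw page.
    public static (List<Entity> Entities, string? ContinuationToken) ListStateful(
        Func<int, string?, Page> fetchPage, int pageSize)
    {
        // The list lives outside the loop so earlier results are kept.
        var stateful = new List<Entity>();
        string? token = null;
        do
        {
            Page page = fetchPage(pageSize, token);
            // Append to what previous iterations found instead of overwriting it.
            stateful.AddRange(page.Entities.Where(e => e.State != null));
            token = page.ContinuationToken;
        }
        while (stateful.Count < pageSize && !string.IsNullOrEmpty(token));

        return (stateful, token);
    }
}
```

Because every iteration still requests a full `pageSize`, the result can exceed the requested size: with a page size of 10 and one deleted entity among the first ten, a second full batch brings the total to 19.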

cgillum · 2020-08-03T16:28:55Z

src/WebJobs.Extensions.DurableTask/ContextImplementations/DurableClient.cs

+
+                    statefulEntityResult = ConvertToEntityQueryResult(statefulEntities, entityResult.ContinuationToken);
+                }
+                return statefulEntityResult;


It looks like the final count might actually be larger than query.PageSize. For example, if I query for 10 entities and get back a list of 10 with one of them deleted, I'll query for another 10 and potentially return 19 entities. I think this behavior is probably fine, but we should update the XML documentation for PageSize to indicate that we may actually return more than the requested number of entities under certain conditions.

ConnorMcMahon

I agree with Chris's feedback.

To speak to this approach in general, it seems we have 4 options:

1. Overshoot the page count by up to PageCount-1.
2. Query for the page count and filter out all results, treating page count as a maximum value.
3. Keep shrinking your pagecount as we get more valid results, until we fill up the page count.
4. Wire up something into DurableTask.AzureStorage that allows us to filter queries based on state being non-null.

Of these options, number 4 seems the most correct, but it is also the most work by a fairly substantial margin. If we don't want to do that work, then options 1 or 2 seem to be the easiest. 1 is a bit harder to document, and 2 provides less value, as it is essentially something the customer could do with a one line LINQ query at that point.

ConnorMcMahon · 2020-08-03T16:48:35Z

src/WebJobs.Extensions.DurableTask/ContextImplementations/DurableClient.cs

+
+                    statefulEntityResult = ConvertToEntityQueryResult(statefulEntities, entityResult.ContinuationToken);
+                }
+                return statefulEntityResult;


If we do want page size to be exact, we could change the page size in the query condition between loop iterations, setting it to the difference between the requested page size and the number of entities collected so far. This way we would never return more than the page size in non-null entities.

This comes with its own set of downsides, mainly the potentially very large number of network calls it would take if you are at 99 of 100 non-null entities and then encounter a batch of many null entities in a row.
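To make the trade-off concrete, here is a rough sketch of the shrinking-page-size idea with a counter for round trips. `Entity`, `Page`, and `fetchPage` are hypothetical names used only for illustration, and the sketch assumes C# 9 or later.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the real result types.
public record Entity(string Id, string? State);
public record Page(IReadOnlyList<Entity> Entities, string? ContinuationToken);

public static class ShrinkingQuery
{
    // fetchPage is an assumed callback: (pageSize, continuationToken) -> next raw page.
    public static (List<Entity> Entities, string? ContinuationToken, int Calls) ListExact(
        Func<int, string?, Page> fetchPage, int pageSize)
    {
        var stateful = new List<Entity>();
        string? token = null;
        int calls = 0;
        do
        {
            // Only ask for as many entities as are still missing.
            int remaining = pageSize - stateful.Count;
            Page page = fetchPage(remaining, token);
            calls++;
            stateful.AddRange(page.Entities.Where(e => e.State != null));
            token = page.ContinuationToken;
        }
        while (stateful.Count < pageSize && !string.IsNullOrEmpty(token));

        return (stateful, token, calls);
    }
}
```

The result never exceeds `pageSize`, but the degenerate case is easy to hit. With a page size of 100 and a store holding 99 live entities, then 50 deleted ones, then one live entity, this takes 51 calls, because every call after the first requests a single entity.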

cgillum · 2020-08-03T18:33:31Z

Just to comment on @ConnorMcMahon's comments:

The problem with option (4) is that I don't think you can write a single table storage query that will give you only non-deleted entities in a single go. You therefore have to loop no matter where you apply the change unless we add additional columns that specifically track deleted status. I don't know if the ListEntitiesAsync API is important enough to warrant that much work, vs. doing something cheap for now and revamping how we do entity tracking later as part of a larger DTFx + entities integration project (which I think will be inevitable at some point).
Regarding page count, I specifically didn't recommend shrinking it because that could put you into a degenerate situation where you shrink page count down to 1 and then end up doing tons of subsequent queries because you just happened to run into a batch of deleted entities. Keeping page count unchanged seemed like a reasonable way to avoid this problem, IMO. Of course, it has the downside of returning lots of extra results. Whether this is problematic depends on how customers are using the results of this query.
Filtering out results (if I understand correctly) is dangerous because of how the results are tied to the continuation token. For example, if the user asks for 100 entities and we end up fetching a second batch of 100 because 5 of the first 100 were deleted, you would filter-out/discard as many as 95 entities from the second batch. Subsequent queries with the returned continuation token would then grab entities from 200 onward, and the user would never see those 95 intermediate entities that were filtered out in the previous call.

Given the imperfections of all our options, do you think the compromise I'm proposing is acceptable or do you think we should just go all the way and make DTFx more specifically aware of entities so that we don't have to do any weird hacks at any level of the stack?
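The continuation-token hazard described in the third point can be reproduced with a small in-memory simulation. This is only a sketch: `Entity` is a hypothetical type, an integer offset plays the role of the continuation token, and C# 9 or later is assumed.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an entity status; a null State means "deleted".
public record Entity(string Id, string? State);

public static class TruncationHazard
{
    // Returns the ids a caller sees when each call tops up to pageSize live
    // entities, truncates the surplus, and hands back the store's token as is.
    public static List<string> IdsSeenByCaller(IReadOnlyList<Entity> store, int pageSize)
    {
        var seen = new List<string>();
        int? offset = 0; // plays the role of the continuation token
        while (offset != null)
        {
            var batch = new List<Entity>();
            do
            {
                var raw = store.Skip(offset.Value).Take(pageSize).ToList();
                batch.AddRange(raw.Where(e => e.State != null));
                int next = offset.Value + raw.Count;
                offset = next < store.Count ? next : (int?)null;
            }
            while (batch.Count < pageSize && offset != null);

            // Truncation: anything beyond pageSize is dropped, yet the token
            // already points past the dropped entities.
            seen.AddRange(batch.Take(pageSize).Select(e => e.Id));
        }
        return seen;
    }
}
```

With 300 entities of which the first 5 are deleted and a page size of 100, the caller sees 200 ids even though 295 entities are live; the 95 entities discarded from the second batch are never returned.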

ConnorMcMahon · 2020-08-04T16:07:57Z

@cgillum

I was rather unclear with my explanation of approach 2.

What I meant to convey is this: if a request has a page size of 100 and 30 of the results are null, the "page" we actually return contains just the 70 non-null values, and we leave the continuation token as is so the customer can resume where our page left off. This has the advantage of being easily documented, as page size becomes an upper limit for the size of the page rather than the actual count. However, I find this approach has limited value, as the customer can do this in post-processing with a simple .Where(entity => entity.State != null) check themselves, though it would potentially save some network bandwidth depending on how the customer is using ListEntities.

In general, I agree that approach 3's degenerate case rules it out, and approach 4 would be better to tackle down the line. I am fine with either the current algorithm or approach 2 like I described above.
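For comparison, approach 2 as clarified above could look roughly like this: one raw query, a filter, and the continuation token passed through untouched. `Entity`, `Page`, and `fetchPage` are hypothetical names, and C# 9 or later is assumed.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the real result types.
public record Entity(string Id, string? State);
public record Page(IReadOnlyList<Entity> Entities, string? ContinuationToken);

public static class MaxPageQuery
{
    // Treats pageSize as an upper bound: a single raw query, then a filter.
    public static Page ListAtMost(
        Func<int, string?, Page> fetchPage, int pageSize, string? token)
    {
        Page raw = fetchPage(pageSize, token);
        var live = raw.Entities.Where(e => e.State != null).ToList();
        // The token is returned unchanged, so the caller resumes exactly
        // where the raw page ended and no entity is skipped.
        return new Page(live, raw.ContinuationToken);
    }
}
```

A request for 100 where 30 of the raw results are deleted returns 70 entities plus a token for the next page.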

bachuv · 2020-08-04T23:09:48Z

I agree with Connor's approach 2 to use the page size as a maximum value because of the simplicity to document and for customers to follow. Using this approach would also ensure that the final entity count wouldn't exceed the page size (exceeding the page size could confuse customers).

I can go ahead and make changes to incorporate approach 2 if there are no other worries. @cgillum @ConnorMcMahon let me know if there is anything else I should consider.

cgillum · 2020-08-05T00:32:56Z

I think I understand what approach (2) is doing now. My biggest concern is that customers can get easily confused by this behavior. Consider other paging APIs that exist today, like Azure Storage APIs. Normally if you query for 100 items and only get back 70, you implicitly assume there are no more than 70 items to be found. That assumption would be wrong in our case. Even more confusing is if you query for 100 items and get back zero. The fact that you may need to manually query several times before you see the first entity could be really unexpected. We can document this behavior, but I still worry that customers may not understand or accept it. I actually worry this might create a worse experience than what we have today.
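The caller-side consequence is that a short or empty page cannot be read as "no more results"; only a null continuation token can. A sketch of a drain loop written under that assumption, with hypothetical `Entity`, `Page`, and `listPage` names (C# 9 or later):

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the real result types.
public record Entity(string Id, string? State);
public record Page(IReadOnlyList<Entity> Entities, string? ContinuationToken);

public static class Paging
{
    // listPage is an assumed callback: continuationToken -> next page.
    public static List<Entity> DrainAll(Func<string?, Page> listPage)
    {
        var all = new List<Entity>();
        string? token = null;
        do
        {
            Page page = listPage(token);
            all.AddRange(page.Entities);    // may be short, or even empty
            token = page.ContinuationToken; // the only reliable end-of-data signal
        }
        while (token != null);
        return all;
    }
}
```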

Maybe there's a simpler solution we should be considering. One thing that seems odd to me is that a "deleted" entity remains in a Running state. If we changed it to a non-running state (for example, Completed) then we could easily and accurately filter out deleted entities by internally querying for entity orchestrations that are in the "Running" state. This is a bigger change, but it might not actually require any changes to the DTFx layer.

Thoughts? Adding @sebastianburckhardt

withinboredom · 2020-08-15T13:51:22Z

I wonder if we could use a combination of < & > and {"exists":true to locate entities that exist? It seems pretty brittle though.

sebastianburckhardt · 2020-08-17T17:38:19Z

One relatively simple thing that could perhaps solve this is to modify the instance query so it checks the CustomStatus.

From what I understand, the condition

instanceState.CustomState.StartsWith("{\"entityExists\":true")

would correctly filter the non-deleted entities, and we can encode this as an Azure Table query.

(This is similar to what @withinboredom suggested but CustomStatus is part of the instances table, so it does not need to query the history table).
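A rough sketch of how such a prefix condition might be expressed. To my knowledge the Table service filter syntax offers comparison operators but no StartsWith function, so a prefix match is usually written as a range (`ge` the prefix, `lt` the prefix with its last character incremented), which is presumably the "< & >" combination mentioned earlier. The column name `CustomStatus`, the helper names, and ordinal string ordering are assumptions here, not the code that was merged.

```csharp
#nullable enable
using System;

public static class EntityExistsFilter
{
    // Prefix taken from the condition quoted above.
    public const string Prefix = "{\"entityExists\":true";

    // In-memory form of the check.
    public static bool Exists(string? customStatus) =>
        customStatus != null && customStatus.StartsWith(Prefix, StringComparison.Ordinal);

    // Builds a range filter equivalent to "column starts with prefix".
    // Assumes the last character of prefix is not char.MaxValue.
    public static string ToRangeFilter(string column, string prefix)
    {
        if (string.IsNullOrEmpty(prefix))
        {
            throw new ArgumentException("prefix must be non-empty", nameof(prefix));
        }

        char last = prefix[prefix.Length - 1];
        string upper = prefix.Substring(0, prefix.Length - 1) + (char)(last + 1);
        return $"{column} ge '{Quote(prefix)}' and {column} lt '{Quote(upper)}'";
    }

    // Single quotes inside a filter string literal are doubled.
    private static string Quote(string value) => value.Replace("'", "''");
}
```

For example, `ToRangeFilter("CustomStatus", "ab")` yields `CustomStatus ge 'ab' and CustomStatus lt 'ac'`.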

ConnorMcMahon · 2021-07-06T23:13:40Z

We should revisit this PR with Sebastian's suggestion.

sebastianburckhardt · 2021-11-04T14:10:44Z

I think I have a relatively simple solution. Currently testing and I will push a commit once I am satisfied with it. Some thoughts:

There is no need to always fill pages to the maximum. This is not even true on Azure Storage, which may return less than the maximum page size even if there are more records to return (https://docs.microsoft.com/en-us/rest/api/storageservices/query-timeout-and-pagination). So we do not need to worry about returning portions that are smaller than the max page size.
We can use the CustomStatus to determine if an entity exists, without having to fetch the (possibly voluminous) input.
I would suggest we change the default for "IncludeDeletedEntities" to false. This is a breaking change but the current default of returning deleted entities is a recurring source of confusion (e.g. IDurableEntityClient.CleanEntityStorageAsync() doesn't delete the entity from storage #1990).

cgillum · 2021-11-04T22:31:09Z

src/WebJobs.Extensions.DurableTask/ContextImplementations/DurableClient.cs

+                entityResult = new EntityQueryResult(result, query.IncludeDeleted);
+                condition.ContinuationToken = entityResult.ContinuationToken;
+            }
+            while (entityResult.ContinuationToken != null && !entityResult.Entities.Any());


I'm trying to think about the case when entityResult.ContinuationToken != null is true and !entityResult.Entities.Any() is false. I think that means there are more records to read but we're going to break early anyways because we found at least one entity. Is that the intended design? If so, why wouldn't we keep reading the remaining records to see if we can find more entities? It seems to me that users will never get more than one page of results back.

withinboredom · 2021-11-05

Random 2¢. I think as an application developer, I wouldn't want it churning through pages of results to return me a single full page of results. That could really hurt performance outside of my control. If it only does one page (even if it's only one result), then I can timebox it (return as many results as possible in 100ms) or whatever to return results applicable to my application.

cgillum · 2021-11-05

Thanks for chiming in, @withinboredom. Now that you mention this, I realize that I misunderstood what's happening here. I forgot that both EntityQueryResult and EntityQuery have continuation token fields, meaning that it's perfectly fine to return a single page of results because the caller can continue to ask for more pages if they want. This should have been obvious by looking more closely at the previous implementation, but with this in mind, I think this solution is perfectly reasonable (pages may contain just one entity in the worst case, but you can get more).

Timeboxing (i.e. limiting the amount of time we spend churning through pages for the first entity) is another interesting feature, but I think that could be done as a separate improvement in a new PR (it might require interface changes, though).

sebastianburckhardt · 2021-11-05

Those are good points.

As written right now, the code could take arbitrary time to return the first result (for example, if lots of entities were deleted and AzureStorage keeps returning those first). In such a situation I should return an empty result and a continuation token to keep the client request from timing out.
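A sketch of the loop with the time bound discussed in this thread: keep reading raw pages until at least one live entity turns up, the store runs out, or a time limit passes, in which case an empty page and a continuation token are returned. `Entity`, `Page`, `fetchPage`, and the `timeLimit` parameter are hypothetical; this is not the code that was merged. C# 9 or later is assumed.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-ins for the real result types.
public record Entity(string Id, string? State);
public record Page(IReadOnlyList<Entity> Entities, string? ContinuationToken);

public static class BoundedQuery
{
    // fetchPage is an assumed callback: continuationToken -> next raw page.
    public static Page ListOnePage(
        Func<string?, Page> fetchPage, string? token, TimeSpan timeLimit)
    {
        var stopwatch = Stopwatch.StartNew();
        Page page;
        do
        {
            Page raw = fetchPage(token);
            var live = raw.Entities.Where(e => e.State != null).ToList();
            token = raw.ContinuationToken;
            page = new Page(live, token);
        }
        // Stop once there is something to return, nothing left to read,
        // or the time budget is spent.
        while (token != null && page.Entities.Count == 0 && stopwatch.Elapsed <= timeLimit);

        return page;
    }
}
```

A caller that gets back an empty page with a non-null token simply calls again with that token.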

cgillum · 2021-11-04T22:32:14Z

src/WebJobs.Extensions.DurableTask/EntityQueryResult.cs

+            if (includeDeleted)
+            {
+                this.Entities = orchestrationResult.DurableOrchestrationState
+                    .Select(status => new DurableEntityStatus(status));


Do you need to add a .ToList() call on the result here, like you do for the else case below?

sebastianburckhardt · 2021-11-05

Yes, I think I should add that.
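The likely reason this matters is deferred execution: without .ToList(), a Select is re-run each time the sequence is enumerated (for example once by .Any() and again by the caller). A small self-contained illustration, with names invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Materialize
{
    // Counts how often the projection runs when each sequence is enumerated twice.
    public static (int Lazy, int Eager) CountProjections()
    {
        int lazyCalls = 0;
        int eagerCalls = 0;
        int[] source = { 1, 2, 3 };

        // Deferred: the lambda runs again on every enumeration.
        IEnumerable<int> lazy = source.Select(x => { lazyCalls++; return x * 2; });

        // Materialized: the lambda runs once per element, here and now.
        List<int> eager = source.Select(x => { eagerCalls++; return x * 2; }).ToList();

        _ = lazy.Sum();
        _ = lazy.Sum();
        _ = eager.Sum();
        _ = eager.Sum();

        return (lazyCalls, eagerCalls);
    }
}
```

Here the deferred projection runs six times and the materialized one three times.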

sebastianburckhardt · 2021-11-05T14:13:35Z

I suppose the final question to address here is whether it is o.k. to change the default for "IncludeDeletedEntities".

Right now it is true, which avoids a breaking change, but confuses users. I suppose I would prefer to set it to false. @cgillum, what do you think?

cgillum · 2021-11-05T17:04:54Z

@sebastianburckhardt I'm okay with making a breaking behavior change here by changing the default. The current behavior feels like a bug. We should make a point to call this out in our release notes, though.

cgillum

The time limit part is an interesting add, but I can't think of any practical problems with this approach. I guess this gives us the ability to offer some form of a reasonable SLA for this method. :)

LGTM!

sebastianburckhardt · 2021-11-09T00:21:49Z

I will merge with latest dev and update the release notes after #2002 is merged.

sebastianburckhardt · 2022-01-13T00:43:27Z

I think we meant to merge this long ago.

Looks like we still need to call this out in release notes.

Add to release notes

bachuv added 2 commits July 30, 2020 15:36

initial commit

f8d3e21

added support for edge cases

60654f1

bachuv requested review from cgillum and ConnorMcMahon August 1, 2020 00:09

cgillum requested changes Aug 3, 2020

ConnorMcMahon reviewed Aug 3, 2020

cgillum added the blocked label Aug 17, 2020

sebastianburckhardt mentioned this pull request Aug 18, 2020

Durable entity with null state persistence #1418

Open

sebastianburckhardt removed the blocked label Nov 4, 2021

revised version that simplifies paging requirements and uses CustomStatus to avoid always fetching input

0f80070

sebastianburckhardt requested a review from cgillum November 4, 2021 22:12

cgillum reviewed Nov 4, 2021

address PR feedback.

33d5e38

cgillum approved these changes Nov 5, 2021

sebastianburckhardt added 3 commits November 8, 2021 16:24

Merge branch 'dev' into vabachu/listentities

75cfd59

# Conflicts: # src/WebJobs.Extensions.DurableTask/ContextImplementations/DurableClient.cs

Merge branch 'dev' into vabachu/listentities

f09d2fa

updated release_notes.md

9ec5863

sebastianburckhardt added this to the v2.6.1 milestone Jan 13, 2022

sebastianburckhardt added 5 commits January 14, 2022 11:06

Merge branch 'dev' into vabachu/listentities

c159e96

fix tests

5fb044d

fix tests again, hopefully correctly this time

ab60a05

fix entity queries in tests so they make proper use of continuation tokens

c2251c4

fix another test that broke due to last commit

590052b

sebastianburckhardt merged commit 4c754ed into dev Jan 15, 2022

cgillum deleted the vabachu/listentities branch January 15, 2022 01:13
