Allow ListEntitiesAsync to only return entities with non-null state #1421

Merged: 12 commits merged on Jan 15, 2022

@bachuv (Collaborator) commented Aug 1, 2020

Resolves #1364.

These changes add an IncludeDeleted flag to EntityQuery that lets customers choose whether the ListEntitiesAsync API should return deleted entities (i.e. entities whose state is null). If IncludeDeleted is false, only entities with a non-null state are returned.

@bachuv requested review from cgillum and ConnorMcMahon on August 1, 2020 00:09

@cgillum (Member) left a comment

In addition to the feedback below, let's add some test cases.

```csharp
var statefulEntities = entityResult.Entities.Where(e => e.State != null);
var statefulEntityResult = ConvertToEntityQueryResult(statefulEntities, entityResult.ContinuationToken);

while (statefulEntities.ToList().Count < query.PageSize && !string.IsNullOrEmpty(entityResult.ContinuationToken))
```

Member

There's no benefit to doing a .ToList() here since you're not actually doing anything with the list you're creating. It would have been better to just say statefulEntities.Count().

```csharp
condition = new OrchestrationStatusQueryCondition(query);
result = await ((IDurableClient)this).ListInstancesAsync(condition, cancellationToken);
entityResult = new EntityQueryResult(result);
statefulEntities = entityResult.Entities.Where(e => e.State != null);
```

Member

Since you're overwriting statefulEntities here, aren't you also discarding the previously fetched results? For this task, I would have expected a List<DurableEntityStatus> statefulEntities declared outside the loop, with statefulEntities.AddRange(...) called in every loop iteration until the size of the list reaches query.PageSize.
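A minimal sketch of the accumulation pattern suggested above, under stated assumptions: the pages dictionary and the CollectStateful function are hypothetical in-memory stand-ins for ListInstancesAsync and the continuation-token loop (entity states are plain strings, with null marking a deleted entity), not the extension's actual code. The point it shows is that the result list is declared once, outside the loop, and each iteration appends to it instead of replacing it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ListInstancesAsync: each continuation token maps to one page of
// entity states (null = deleted) and the token of the following page (null = no more pages).
var pages = new Dictionary<string, (List<string?> States, string? Next)>
{
    ["p0"] = (new List<string?> { "a", null, "b" }, "p1"),
    ["p1"] = (new List<string?> { null, null, "c" }, "p2"),
    ["p2"] = (new List<string?> { "d", "e", "f" }, null),
};

List<string> CollectStateful(string startToken, int pageSize, out string? continuation)
{
    // Declared once, outside the loop, so results from earlier pages are kept.
    var statefulEntities = new List<string>();
    string? token = startToken;
    do
    {
        var (states, next) = pages[token];
        // Append (rather than overwrite) this page's non-deleted entities.
        statefulEntities.AddRange(states.Where(s => s != null).Select(s => s!));
        token = next;
    }
    while (statefulEntities.Count < pageSize && token != null);

    continuation = token;
    return statefulEntities;
}

var result = CollectStateful("p0", 3, out var nextToken);
Console.WriteLine($"{string.Join(",", result)} next={nextToken}");
```

With a page size of 3, the first page contributes a and b, the second contributes c, and the loop stops with p2 as the continuation token, so the program prints a,b,c next=p2.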


```csharp
    statefulEntityResult = ConvertToEntityQueryResult(statefulEntities, entityResult.ContinuationToken);
}
return statefulEntityResult;
```

Member

It looks like the final count might actually be larger than query.PageSize. For example, if I query for 10 entities and get back a list of 10 with one of them deleted, we'll query for another 10 and potentially return 19 entities. I think this behavior is probably fine, but we should update the XML documentation for PageSize to indicate that we may return more than the requested number of entities under certain conditions.
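The overshoot can be worked through directly. In the worst case the loop has collected PageSize - 1 entities when it issues its last query, and that query can add a full PageSize of non-deleted entities, so the upper bound is 2 * PageSize - 1. The sketch below is a hypothetical simulation of the 10-entity example (plain strings, null for a deleted entity), not the extension's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simulation of the example above: PageSize = 10, the first page contains one
// deleted entity (null state), and the second page contains no deleted entities.
const int pageSize = 10;
List<string?> firstPage = Enumerable.Range(0, pageSize).Select(i => i == 3 ? null : $"e{i}").ToList();
List<string?> secondPage = Enumerable.Range(pageSize, pageSize).Select(i => (string?)$"e{i}").ToList();

var collected = new List<string>();
foreach (var page in new[] { firstPage, secondPage })
{
    // Mirrors the loop condition: fetch again only while the page is not yet full...
    if (collected.Count >= pageSize) break;
    // ...but every fetched page is appended in full.
    collected.AddRange(page.Where(s => s != null).Select(s => s!));
}

// Worst case: PageSize - 1 entities before the last fetch, plus up to PageSize from it.
Console.WriteLine($"returned {collected.Count}, upper bound {2 * pageSize - 1}");
```

This prints returned 19, upper bound 19: the example in the comment is exactly the worst case.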

Contributor

If we do want the page size to be exact, we could set the page size of the query condition before each iteration of the loop to the difference between the requested page size and the number of entities we already have. This way we would never return more than PageSize non-null entities.

This comes with its own set of downsides, mainly the potentially very large number of network calls it would take if we are at 99 of 100 non-null entities and then encounter a long run of null entities.
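To illustrate the degenerate case, here is a hypothetical simulation of the shrinking-page-size approach, where each request asks only for PageSize minus the number of live entities collected so far. The store layout (99 live entities, then 50 deleted, then 10 live) and the FetchShrinking function are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical store: 99 live entities, then a run of 50 deleted ones (null state), then
// 10 more live ones, returned by the "server" in this order.
var store = Enumerable.Repeat<string?>("live", 99)
    .Concat(Enumerable.Repeat<string?>(null, 50))
    .Concat(Enumerable.Repeat<string?>("live", 10))
    .ToList();

// Shrinking strategy: each request asks only for the number of live entities still missing.
(int Collected, int Calls) FetchShrinking(int pageSize)
{
    int position = 0, collected = 0, calls = 0;
    while (collected < pageSize && position < store.Count)
    {
        int requested = pageSize - collected;
        var batch = store.Skip(position).Take(requested).ToList();
        calls++;                                   // one network round trip
        position += batch.Count;
        collected += batch.Count(s => s != null);  // deleted entities do not count
    }
    return (collected, calls);
}

var (liveCount, callCount) = FetchShrinking(100);
Console.WriteLine($"collected {liveCount} entities in {callCount} calls");
```

This prints collected 100 entities in 51 calls: the first call returns 99 live entities and one deleted one, and every later call requests a single entity, so each of the remaining 49 deleted entities costs its own round trip before the 100th live entity arrives.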

@ConnorMcMahon (Contributor) left a comment

I agree with Chris's feedback.

To speak to this approach in general, it seems we have 4 options:

  1. Overshoot the page size by up to PageSize - 1.
  2. Query for the page size and filter out the deleted (null-state) results, treating the page size as a maximum value.
  3. Keep shrinking the requested page size as we get more valid results, until we fill up the page.
  4. Wire up something in DurableTask.AzureStorage that allows us to filter queries based on state being non-null.

Of these options, number 4 seems the most correct, but it is also the most work by a fairly substantial margin. If we don't want to do that work, then options 1 or 2 seem to be the easiest. Option 1 is a bit harder to document, and option 2 provides less value, as it is essentially something the customer could do with a one-line LINQ query at that point.


@cgillum (Member) commented Aug 3, 2020

Just to comment on @ConnorMcMahon's comments:

  • The problem with option (4) is that I don't think you can write a single table storage query that will give you only non-deleted entities in a single go. You therefore have to loop no matter where you apply the change unless we add additional columns that specifically track deleted status. I don't know if the ListEntitiesAsync API is important enough to warrant that much work, vs. doing something cheap for now and revamping how we do entity tracking later as part of a larger DTFx + entities integration project (which I think will be inevitable at some point).

  • Regarding page count, I specifically didn't recommend shrinking it because that could put you into a degenerate situation where you shrink page count down to 1 and then end up doing tons of subsequent queries because you just happened to run into a batch of deleted entities. Keeping page count unchanged seemed like a reasonable way to avoid this problem, IMO. Of course, it has the downside of returning lots of extra results. Whether this is problematic depends on how customers are using the results of this query.

  • Filtering out results (if I understand correctly) is dangerous because of how the results are tied to the continuation token. For example, if the user asks for 100 entities and we end up fetching a second batch of 100 because 5 of the first 100 were deleted, you would filter out/discard as many as 95 entities from the second batch. Subsequent queries with the returned continuation token would then grab entities from 200 onward, and the user would never see those 95 intermediate entities that were filtered out in the previous call.

Given the imperfections of all our options, do you think the compromise I'm proposing is acceptable or do you think we should just go all the way and make DTFx more specifically aware of entities so that we don't have to do any weird hacks at any level of the stack?
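The continuation-token problem from the third bullet can be reproduced with a small simulation. The in-memory store, Fetch, and FilterToPageSize are hypothetical stand-ins (the continuation token is just an index into the store), with entity10 through entity14 deleted. Truncating the accumulated results to PageSize after the token has already advanced past the second batch silently drops entities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical store of 300 entities; entity10..entity14 are deleted (null state).
// The continuation token is simply the index of the next entity to read.
var store = Enumerable.Range(0, 300)
    .Select(i => i >= 10 && i < 15 ? null : $"entity{i}")
    .ToList();

(List<string?> Items, int Next) Fetch(int token, int size)
{
    var items = store.Skip(token).Take(size).ToList();
    return (items, token + items.Count);
}

// "Filter to exactly PageSize": fetch full batches until enough live entities are found,
// then truncate, returning the token that follows the last batch read.
(List<string> Page, int Token) FilterToPageSize(int token, int pageSize)
{
    var live = new List<string>();
    while (live.Count < pageSize && token < store.Count)
    {
        var (items, next) = Fetch(token, pageSize);
        live.AddRange(items.Where(s => s != null).Select(s => s!));
        token = next;
    }
    return (live.Take(pageSize).ToList(), token);
}

var (page, nextToken) = FilterToPageSize(0, 100);

// Live entities before the returned token that never made it into the page.
int skipped = Enumerable.Range(0, nextToken)
    .Count(i => store[i] != null && !page.Contains(store[i]!));

Console.WriteLine($"returned {page.Count}, next token {nextToken}, silently skipped {skipped}");
```

This prints returned 100, next token 200, silently skipped 95. The skipped entities (entity105 through entity199) sit before the returned token, so no later query that resumes from that token will ever return them.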

@ConnorMcMahon (Contributor)

@cgillum

I was rather unclear with my explanation of approach 2.

What I meant was that if a request has a page size of 100 and 30 of the fetched entities are null, the "page" we actually return would contain just the 70 non-null values, and the continuation token would be left as is so the customer could resume where our page left off. This has the advantage of being easy to document, as the page size becomes an upper limit on the size of the page rather than the actual count. However, I find this approach has limited value, as customers can do this in post-processing themselves with a simple .Where(entity => entity.State != null) check, though it could save some network bandwidth depending on how the customer uses ListEntities.

In general, I agree that approach 3's degenerate case rules it out, and approach 4 would be better to tackle down the line. I am fine with either the current algorithm or approach 2 like I described above.
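A minimal sketch of approach (2) as clarified above: read exactly one storage page, drop entities whose state is null, and return the storage continuation token unchanged. QueryOnePage and the integer token are hypothetical stand-ins for the real query and continuation token; the made-up store has 30 deleted entities among its first 100.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical store of 250 entities; every third entity among the first 90 is deleted
// (null state), so the first 100 entities contain 30 deleted ones.
var store = Enumerable.Range(0, 250)
    .Select(i => i < 90 && i % 3 == 0 ? null : $"entity{i}")
    .ToList();

// Approach (2): read exactly one storage page, drop deleted entities, and return the
// storage continuation token unchanged. PageSize is an upper bound, not an exact count.
(List<string> Page, int? Token) QueryOnePage(int cursor, int pageSize)
{
    var batch = store.Skip(cursor).Take(pageSize).ToList();
    int next = cursor + batch.Count;
    var live = batch.Where(s => s != null).Select(s => s!).ToList();
    return (live, next < store.Count ? next : (int?)null);
}

var (firstPage, firstToken) = QueryOnePage(0, 100);
Console.WriteLine($"first page: {firstPage.Count} entities, token {firstToken}");

// A caller that wants everything keeps paging until the token runs out; nothing is skipped.
var all = new List<string>(firstPage);
int? token = firstToken;
while (token is int position)
{
    var (page, nextToken) = QueryOnePage(position, 100);
    all.AddRange(page);
    token = nextToken;
}
Console.WriteLine($"total: {all.Count} of {store.Count(s => s != null)} live entities");
```

This prints first page: 70 entities, token 100 and then total: 220 of 220 live entities. Because the token never advances past entities that were not returned, paging to the end yields every live entity exactly once. The cost is that individual pages may be short or even empty.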

@bachuv (Collaborator, Author) commented Aug 4, 2020

I agree with Connor's approach 2 of using the page size as a maximum value, because it is simple to document and easy for customers to follow. This approach would also ensure that the final entity count never exceeds the page size (exceeding the page size could confuse customers).

I can go ahead and make changes to incorporate approach 2 if there are no other worries. @cgillum @ConnorMcMahon let me know if there is anything else I should consider.

@cgillum (Member) commented Aug 5, 2020

I think I understand what approach (2) is doing now. My biggest concern is that customers can get easily confused by this behavior. Consider other paging APIs that exist today, like Azure Storage APIs. Normally if you query for 100 items and only get back 70, you implicitly assume there are no more than 70 items to be found. That assumption would be wrong in our case. Even more confusing is if you query for 100 items and get back zero. The fact that you may need to manually query several times before you see the first entity could be really unexpected. We can document this behavior, but I still worry that customers may not understand or accept it. I actually worry this might create a worse experience than what we have today.

Maybe there's a simpler solution we should be considering. One thing that seems odd to me is that a "deleted" entity remains in a Running state. If we changed it to a non-running state (for example, Completed) then we could easily and accurately filter out deleted entities by internally querying for entity orchestrations that are in the "Running" state. This is a bigger change, but it might not actually require any changes to the DTFx layer.

Thoughts? Adding @sebastianburckhardt

@withinboredom (Contributor)

I wonder if we could use a combination of < and > comparisons with {"exists":true to locate entities that exist? It seems pretty brittle, though.

@sebastianburckhardt (Collaborator) commented Aug 17, 2020

One relatively simple thing that could perhaps solve this is to modify the instance query so it checks the CustomStatus.

From what I understand, the condition

```csharp
instanceState.CustomState.StartsWith("{\"entityExists\":true")
```

would correctly filter the non-deleted entities, and we can encode this as an Azure Table query.

(This is similar to what @withinboredom suggested but CustomStatus is part of the instances table, so it does not need to query the history table).
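Azure Table queries have comparison operators but no StartsWith, so a prefix condition like the one above is typically expressed as a range: the property must be greater than or equal to the prefix and less than the prefix with its last character incremented. The sketch below only builds the OData filter string. The column name CustomStatus is an assumption about the instances table schema, and PrefixFilter is a hypothetical helper, not part of DurableTask.AzureStorage.

```csharp
using System;

// Hypothetical helper: express "property starts with prefix" as an OData range filter,
// since Azure Table queries support comparison operators but not StartsWith.
string PrefixFilter(string property, string prefix)
{
    if (prefix.Length == 0)
        throw new ArgumentException("Prefix must not be empty.", nameof(prefix));

    // Upper bound: the prefix with its last character incremented ('e' becomes 'f').
    char last = prefix[prefix.Length - 1];
    string upperBound = prefix.Substring(0, prefix.Length - 1) + (char)(last + 1);

    // OData string literals are single-quoted; an embedded single quote is doubled.
    string Quote(string s) => "'" + s.Replace("'", "''") + "'";

    return $"{property} ge {Quote(prefix)} and {property} lt {Quote(upperBound)}";
}

// Assumption: the custom status is stored in a column named CustomStatus.
string filter = PrefixFilter("CustomStatus", "{\"entityExists\":true");
Console.WriteLine(filter);
// prints: CustomStatus ge '{"entityExists":true' and CustomStatus lt '{"entityExists":truf'
```

This assumes the service compares strings lexicographically, character by character, and it does not handle a prefix whose last character is char.MaxValue.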

@ConnorMcMahon (Contributor)

We should revisit this PR with Sebastian's suggestion.

@sebastianburckhardt (Collaborator) commented Nov 4, 2021

I think I have a relatively simple solution. Currently testing and I will push a commit once I am satisfied with it. Some thoughts:

```csharp
    entityResult = new EntityQueryResult(result, query.IncludeDeleted);
    condition.ContinuationToken = entityResult.ContinuationToken;
}
while (entityResult.ContinuationToken != null && !entityResult.Entities.Any());
```

Member

I'm trying to think about the case when entityResult.ContinuationToken != null is true and !entityResult.Entities.Any() is false. I think that means there are more records to read, but we're going to break early anyway because we found at least one entity. Is that the intended design? If so, why wouldn't we keep reading the remaining records to see if we can find more entities? It seems to me that users will never get more than one page of results back.

Contributor

Random 2¢. I think as an application developer, I wouldn't want it churning through pages of results to return me a single full page of results. That could really hurt performance outside of my control. If it only does one page (even if it's only one result), then I can timebox it (return as many results as possible in 100ms) or whatever to return results applicable to my application.

Member

Thanks for chiming in, @withinboredom. Now that you mention this, I realize that I misunderstood what's happening here. I forgot that both EntityQueryResult and EntityQuery have continuation token fields, meaning that it's perfectly fine to return a single page of results because the caller can continue to ask for more pages if they want. This should have been obvious by looking more closely at the previous implementation, but with this in mind, I think this solution is perfectly reasonable (pages may contain just one entity in the worst case, but you can get more).

Timeboxing - i.e. limiting the amount of time we spend churning through pages for the first entity, is another interesting feature, but I think that could be done as a separate improvement in a new PR (it might require interface changes, though).
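Because EntityQuery and EntityQueryResult both carry a continuation token, a caller that wants a fuller page can keep asking on its own terms, including the kind of timeboxing @withinboredom describes. The sketch below is hypothetical: FetchPage stands in for a ListEntitiesAsync call (each server page holds at least one entity), and Gather applies a caller-side policy of "at least N entities, or until the time budget runs out".

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical server: each call returns one page (at least one entity) plus a continuation
// token for the next call, or null when there are no more pages.
var serverPages = new List<List<string>>
{
    new List<string> { "e0" },
    new List<string> { "e1", "e2", "e3" },
    new List<string> { "e4", "e5" },
};

(List<string> Entities, int? Token) FetchPage(int? continuation)
{
    int index = continuation ?? 0;
    int? next = index + 1 < serverPages.Count ? index + 1 : (int?)null;
    return (serverPages[index], next);
}

// Caller-side policy: gather at least `wanted` entities, but stop early when the pages run
// out or the time budget is spent. The returned token lets a later call resume.
(List<string> Entities, int? Token) Gather(int? start, int wanted, TimeSpan budget)
{
    var stopwatch = Stopwatch.StartNew();
    var collected = new List<string>();
    int? token = start;
    do
    {
        var (entities, next) = FetchPage(token);
        collected.AddRange(entities);
        token = next;
    }
    while (collected.Count < wanted && token != null && stopwatch.Elapsed < budget);
    return (collected, token);
}

var (some, resumeAt) = Gather(null, 3, TimeSpan.FromSeconds(1));
Console.WriteLine($"{string.Join(",", some)} next={resumeAt}");
```

This prints e0,e1,e2,e3 next=2. Calling Gather(resumeAt, ...) again picks up the remaining page, so no entity is lost between calls.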

Collaborator

Those are good points.

As written right now, the code could take arbitrary time to return the first result (for example, if lots of entities were deleted and AzureStorage keeps returning those first). In such a situation I should return an empty result and a continuation token to keep the client request from timing out.
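One way to bound the time spent, sketched under stated assumptions: the do-while loop mirrors the condition in the fragment above (keep reading while the page is empty and a continuation token remains) and adds a Stopwatch-based time limit, so an empty page plus a continuation token is returned once the limit is reached. The store, batch size, and ListEntities function here are hypothetical; the actual extension code may structure this differently.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical store: 500 deleted entities (null state) followed by 20 live ones.
var store = Enumerable.Repeat<string?>(null, 500)
    .Concat(Enumerable.Range(0, 20).Select(i => (string?)$"entity{i}"))
    .ToList();

(List<string> Entities, int? Token) ListEntities(int cursor, int pageSize, TimeSpan timeLimit)
{
    var stopwatch = Stopwatch.StartNew();
    List<string> entities;
    int? token = cursor;
    do
    {
        // One storage round trip: read a batch and drop deleted entities.
        var batch = store.Skip(token.Value).Take(pageSize).ToList();
        int next = token.Value + batch.Count;
        token = next < store.Count ? next : (int?)null;
        entities = batch.Where(s => s != null).Select(s => s!).ToList();
    }
    // Keep reading only while the page is empty, more data remains, and time is left.
    // When the time limit is hit, an empty page plus a continuation token is returned.
    while (entities.Count == 0 && token != null && stopwatch.Elapsed < timeLimit);

    return (entities, token);
}

// With a generous limit, the loop reads past the deleted entities to the first live ones.
var (page, resumeToken) = ListEntities(0, 10, TimeSpan.FromSeconds(5));
Console.WriteLine($"{page.Count} entities, token {resumeToken}");

// With a zero limit, only one round trip is made; the caller gets an empty page and a token.
var (empty, resume) = ListEntities(0, 10, TimeSpan.Zero);
Console.WriteLine($"{empty.Count} entities, token {resume}");
```

The first call prints 10 entities, token 510 after skipping 50 empty batches; the second prints 0 entities, token 10, and the caller can resume from that token.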

```csharp
if (includeDeleted)
{
    this.Entities = orchestrationResult.DurableOrchestrationState
        .Select(status => new DurableEntityStatus(status));
```

Member

Do you need to add a .ToList() call here, like you do in the else case below?

Collaborator

Yes, I think I should add that.
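Why the ToList() matters: Select is deferred, so without it the projection (here, constructing a DurableEntityStatus for each state) runs again every time Entities is enumerated. A small stand-alone illustration, where Project is a hypothetical stand-in for the DurableEntityStatus constructor that counts its invocations:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for orchestrationResult.DurableOrchestrationState: entity states, null = deleted.
var states = new List<string?> { "a", null, "b" };
int projections = 0;

// Stand-in for `new DurableEntityStatus(status)`: counts how often the projection runs.
string Project(string? state) { projections++; return state ?? "<deleted>"; }

int Enumerate(IEnumerable<string> items) { int n = 0; foreach (var item in items) n++; return n; }

// Deferred: Select only describes the projection; every enumeration runs it again.
IEnumerable<string> lazy = states.Select(Project);
Enumerate(lazy);
Enumerate(lazy);
int lazyProjections = projections;

// Materialized: ToList() runs the projection once; later enumerations reuse the list.
projections = 0;
IEnumerable<string> materialized = states.Select(Project).ToList();
Enumerate(materialized);
Enumerate(materialized);
int listProjections = projections;

Console.WriteLine($"deferred: {lazyProjections} projections, ToList: {listProjections}");
```

This prints deferred: 6 projections, ToList: 3. Materializing also means every enumeration of Entities sees the same objects rather than freshly constructed ones.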

@sebastianburckhardt (Collaborator)

I suppose the final question to address here is whether it is OK to change the default for IncludeDeleted.

Right now it is true, which avoids a breaking change, but confuses users. I suppose I would prefer to set it to false. @cgillum, what do you think?

@cgillum (Member) commented Nov 5, 2021

@sebastianburckhardt I'm okay with making a breaking behavior change here by changing the default. The current behavior feels like a bug. We should make a point to call this out in our release notes, though.

@cgillum (Member) left a comment

The time limit is an interesting addition, and I can't think of any practical problems with this approach. I guess this gives us the ability to offer some form of reasonable SLA for this method. :)

LGTM!

@sebastianburckhardt (Collaborator)

I will merge with latest dev and update the release notes after #2002 is merged.

@sebastianburckhardt (Collaborator)

I think we meant to merge this long ago.

Looks like we still need to call this out in release notes.

  • Add to release notes

@sebastianburckhardt added this to the v2.6.1 milestone on Jan 13, 2022
@sebastianburckhardt merged commit 4c754ed into dev on Jan 15, 2022
@cgillum deleted the vabachu/listentities branch on January 15, 2022 01:13
Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close "ListEntitiesAsync should not return deleted entities"
5 participants