Allow ListEntitiesAsync to only return entities with non-null state #1421
Conversation
In addition to the feedback below, let's add some test cases.
var statefulEntities = entityResult.Entities.Where(e => e.State != null);
var statefulEntityResult = ConvertToEntityQueryResult(statefulEntities, entityResult.ContinuationToken);

while (statefulEntities.ToList().Count < query.PageSize && !string.IsNullOrEmpty(entityResult.ContinuationToken))
There's no benefit to doing a `.ToList()` here since you're not actually doing anything with the list you're creating. It would have been better to just say `statefulEntities.Count()`.
condition = new OrchestrationStatusQueryCondition(query);
result = await ((IDurableClient)this).ListInstancesAsync(condition, cancellationToken);
entityResult = new EntityQueryResult(result);
statefulEntities = entityResult.Entities.Where(e => e.State != null);
Since you're overwriting `statefulEntities` here, aren't you also overwriting any of the previously fetched results? For implementing this task, I would have expected you to have a `List<DurableEntityStatus> statefulEntities` outside the loop and then call `statefulEntities.AddRange(...)` inside of every loop iteration until the size of the list reaches `query.PageSize`.
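A sketch of that accumulation pattern, reusing the names from the diff above (`ConvertToEntityQueryResult` and the `ListInstancesAsync` call are the PR's own helpers; this is illustrative, not the final implementation):

```csharp
// Sketch only: accumulate non-null-state entities across pages until the
// list reaches query.PageSize, as suggested in the review comment above.
var statefulEntities = new List<DurableEntityStatus>();
var condition = new OrchestrationStatusQueryCondition(query);
EntityQueryResult entityResult;

do
{
    var result = await ((IDurableClient)this).ListInstancesAsync(condition, cancellationToken);
    entityResult = new EntityQueryResult(result);
    statefulEntities.AddRange(entityResult.Entities.Where(e => e.State != null));
    condition.ContinuationToken = entityResult.ContinuationToken;
}
while (statefulEntities.Count < query.PageSize
    && !string.IsNullOrEmpty(entityResult.ContinuationToken));

return ConvertToEntityQueryResult(statefulEntities, entityResult.ContinuationToken);
```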
statefulEntityResult = ConvertToEntityQueryResult(statefulEntities, entityResult.ContinuationToken);
}
return statefulEntityResult;
It looks like the final count might actually be larger than `query.PageSize`. For example, if I query for 10 entities and get back a list of 10 with one of them deleted, I'll query for another 10 and potentially return 19 entities. I think this behavior is probably fine, but we should update the XML documentation for `PageSize` to indicate that we may actually return more than the requested number of entities under certain conditions.
If we do want page size to be exact, we could modify the query condition between each iteration of the for-loop to be equal to the difference between our current number of entities and our page size. This way we would ensure that we never go above page size in non-null entities.
This comes with its own set of downsides, mainly in the potentially very large number of network calls it would take if you are at 99 of 100 non-null entities and then you encounter a batch of many null entities in a row.
I agree with Chris's feedback.
To speak to this approach in general, it seems we have 4 options:
- Overshoot the page count by up to PageCount-1.
- Query for the page count and filter out the null results, treating the page count as a maximum value.
- Keep shrinking the page count as we get more valid results, until we fill the page.
- Wire up something into DurableTask.AzureStorage that allows us to filter queries based on state being non-null.
Of these options, number 4 seems the most correct, but it is also the most work by a fairly substantial margin. If we don't want to do that work, then options 1 or 2 seem to be the easiest. 1 is a bit harder to document, and 2 provides less value, as it is essentially something the customer could do with a one line LINQ query at that point.
Just to comment on @ConnorMcMahon's comments: given the imperfections of all our options, do you think the compromise I'm proposing is acceptable, or do you think we should just go all the way and make DTFx more specifically aware of entities so that we don't have to do any weird hacks at any level of the stack?
I was rather unclear with my explanation of approach 2. What I meant to convey was along these lines: if a request has a page size of 100 but 30 of the results are null, the "page" we actually return would contain just the 70 non-null values, leaving the continuation token as-is so the customer could resume where our page left off. This has the advantage of being easily documented, as page size becomes an upper limit for the size of the page rather than the actual count. However, I find this approach has limited value, as the customer can do the same filtering in post-processing with a simple LINQ query.

In general, I agree that approach 3's degenerate case rules it out, and approach 4 would be better to tackle down the line. I am fine with either the current algorithm or approach 2 as described above.
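For comparison, the caller-side post-processing that approach 2 would effectively bake in is a one-liner (a sketch, assuming a `result` of type `EntityQueryResult` already in hand):

```csharp
// Filtering out "deleted" (null-state) entities client-side is trivial,
// which is why approach 2 adds relatively little value on its own.
IEnumerable<DurableEntityStatus> nonDeleted = result.Entities.Where(e => e.State != null);
```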
I agree with Connor's approach 2 of using the page size as a maximum value because it is simple to document and simple for customers to follow. Using this approach would also ensure that the final entity count never exceeds the page size (exceeding the page size could confuse customers). I can go ahead and make changes to incorporate approach 2 if there are no other concerns. @cgillum @ConnorMcMahon let me know if there is anything else I should consider.
I think I understand what approach (2) is doing now. My biggest concern is that customers can get easily confused by this behavior. Consider other paging APIs that exist today, like the Azure Storage APIs. Normally if you query for 100 items and only get back 70, you implicitly assume there are no more than 70 items to be found. That assumption would be wrong in our case. Even more confusing is if you query for 100 items and get back zero. The fact that you may need to manually query several times before you see the first entity could be really unexpected. We can document this behavior, but I still worry that customers may not understand or accept it. I actually worry this might create a worse experience than what we have today.

Maybe there's a simpler solution we should be considering. One thing that seems odd to me is that a "deleted" entity remains in a Running state. If we changed it to a non-running state (for example, Completed), then we could easily and accurately filter out deleted entities by internally querying for entity orchestrations that are in the "Running" state. This is a bigger change, but it might not actually require any changes to the DTFx layer. Thoughts?

Adding @sebastianburckhardt
I wonder if we could use a combination of
One relatively simple thing that could perhaps solve this is to modify the instance query so it checks the CustomStatus. From what I understand, the condition `instanceState.CustomState.StartsWith("{\"entityExists\":true")` would correctly filter the non-deleted entities, and we can encode this as an Azure Table query. (This is similar to what @withinboredom suggested, but CustomStatus is part of the instances table, so it does not need to query the history table.)
We should revisit this PR with Sebastian's suggestion. |
I think I have a relatively simple solution. Currently testing and I will push a commit once I am satisfied with it. Some thoughts:
…atus to avoid always fetching input
entityResult = new EntityQueryResult(result, query.IncludeDeleted);
condition.ContinuationToken = entityResult.ContinuationToken;
}
while (entityResult.ContinuationToken != null && !entityResult.Entities.Any());
I'm trying to think about the case when `entityResult.ContinuationToken != null` is true and `!entityResult.Entities.Any()` is false. I think that means there are more records to read, but we're going to break early anyways because we found at least one entity. Is that the intended design? If so, why wouldn't we keep reading the remaining records to see if we can find more entities? It seems to me that users will never get more than one page of results back.
Random 2¢. I think as an application developer, I wouldn't want it churning through pages of results to return me a single full page of results. That could really hurt performance outside of my control. If it only does one page (even if it's only one result), then I can timebox it (return as many results as possible in 100ms) or whatever to return results applicable to my application.
Thanks for chiming in, @withinboredom. Now that you mention this, I realize that I misunderstood what's happening here. I forgot that both `EntityQueryResult` and `EntityQuery` have continuation token fields, meaning that it's perfectly fine to return a single page of results because the caller can continue to ask for more pages if they want. This should have been obvious by looking more closely at the previous implementation, but with this in mind, I think this solution is perfectly reasonable (pages may contain just one entity in the worst case, but you can get more).

Timeboxing, i.e. limiting the amount of time we spend churning through pages for the first entity, is another interesting feature, but I think that could be done as a separate improvement in a new PR (it might require interface changes, though).
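The caller-driven paging described here might look roughly like this (a sketch; `client` and the exact `EntityQuery`/`ListEntitiesAsync` member shapes follow the names used in this PR and may differ from the shipped API):

```csharp
// Sketch: drain all pages by feeding each ContinuationToken back into the query.
// A page may contain as little as one entity; the token tells us when to stop.
var query = new EntityQuery { PageSize = 100 };
var allEntities = new List<DurableEntityStatus>();

do
{
    EntityQueryResult page = await client.ListEntitiesAsync(query, cancellationToken);
    allEntities.AddRange(page.Entities);
    query.ContinuationToken = page.ContinuationToken;
}
while (!string.IsNullOrEmpty(query.ContinuationToken));
```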
Those are good points.
As written right now, the code could take arbitrary time to return the first result (for example, if lots of entities were deleted and AzureStorage keeps returning those first). In such a situation I should return an empty result and a continuation token to keep the client request from timing out.
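One way to sketch that safeguard on top of the loop from the diff above (the time budget value is illustrative, not a decided number):

```csharp
// Sketch: stop churning through empty pages after a fixed time budget and
// return an empty page plus a continuation token instead of risking a timeout.
var budget = TimeSpan.FromSeconds(5); // hypothetical limit
var stopwatch = Stopwatch.StartNew();
EntityQueryResult entityResult;

do
{
    var result = await ((IDurableClient)this).ListInstancesAsync(condition, cancellationToken);
    entityResult = new EntityQueryResult(result, query.IncludeDeleted);
    condition.ContinuationToken = entityResult.ContinuationToken;
}
while (entityResult.ContinuationToken != null
    && !entityResult.Entities.Any()
    && stopwatch.Elapsed < budget); // give up early; caller resumes via the token
```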
if (includeDeleted)
{
    this.Entities = orchestrationResult.DurableOrchestrationState
        .Select(status => new DurableEntityStatus(status));
Do you need to add a `.ToList()` call on the result here, like you do for the `else` case below?
Yes, I think I should add that.
I suppose the final question to address here is whether it is okay to change the default for "IncludeDeletedEntities". Right now it is
@sebastianburckhardt I'm okay with making a breaking behavior change here by changing the default. The current behavior feels like a bug. We should make a point to call this out in our release notes, though.
The time limit part is an interesting add, but I can't think of any practical problems with this approach. I guess this gives us the ability to offer some form of a reasonable SLA for this method. :)
LGTM!
I will merge with latest dev and update the release notes after #2002 is merged.
# Conflicts: # src/WebJobs.Extensions.DurableTask/ContextImplementations/DurableClient.cs
I think we meant to merge this long ago. Looks like we still need to call this out in release notes.
Resolves #1364.

These changes add an `IncludeDeleted` flag to `EntityQuery` that allows customers to choose whether the `ListEntitiesAsync` API should return deleted (i.e. state is null) entities or not. If the `IncludeDeleted` flag is false, then only entities with a non-null state are returned.
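From a caller's perspective, the new flag would be used roughly like this (a sketch based on the description above; exact member names are assumed from this PR and may differ):

```csharp
var query = new EntityQuery
{
    PageSize = 100,
    IncludeDeleted = false, // only entities with non-null state come back
};

EntityQueryResult result = await client.ListEntitiesAsync(query, CancellationToken.None);
foreach (DurableEntityStatus entity in result.Entities)
{
    Console.WriteLine($"{entity.EntityId}: {entity.State}");
}
```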