Support for CountAsync() #589

Closed
wahyuen opened this issue Jul 24, 2019 · 13 comments
Labels
bug Something isn't working LINQ


@wahyuen
Contributor

wahyuen commented Jul 24, 2019

Is your feature request related to a problem? Please describe.
In version 2.x of the SDK, we could count the results of an IQueryable asynchronously. In the current version of the SDK, it seems this can only be done synchronously.

Describe the solution you'd like
Support the CountAsync() LINQ syntax

Describe alternatives you've considered
None, there doesn't appear to be support for this currently.

@j82w
Contributor

j82w commented Jul 24, 2019

In v3 you can do this with the ToFeedIterator method:

//V3 Asynchronous query execution with LINQ query generation
FeedIterator<ToDoActivity> setIterator = this.Container.GetItemLinqQueryable<ToDoActivity>()
    .Where(item => (item.taskNum < 100))
    .ToFeedIterator();

int totalResultCount = 0;
while (setIterator.HasMoreResults)
{
    FeedResponse<ToDoActivity> queryResponse = await setIterator.ReadNextAsync();
    totalResultCount += queryResponse.Count();
}

@j82w
Contributor

j82w commented Jul 24, 2019

That solution is not as clean as CountAsync().

Possible solutions:

  1. Add CountAsync() as an extension method on IQueryable.
  2. Add ToListAsync() to FeedIterator: (await ToFeedIterator().ToListAsync()).Count
  3. Add a static method to FeedIterator that performs the async count: FeedIterator.CountAsync(IQueryable)
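Option 2 could look roughly like the sketch below. It assumes a hypothetical IPagedIterator<T> interface that only mirrors the HasMoreResults/ReadNextAsync shape of FeedIterator<T>; it is not an SDK type, and this ToListAsync is illustrative rather than an existing Cosmos API. It still downloads and deserializes every item, so it tidies the syntax without reducing cost.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the HasMoreResults/ReadNextAsync shape of FeedIterator<T>.
// NOT the SDK type; it exists so the drain logic can be shown and tested in isolation.
public interface IPagedIterator<T>
{
    bool HasMoreResults { get; }
    Task<IReadOnlyList<T>> ReadNextAsync(CancellationToken cancellationToken = default);
}

public static class PagedIteratorExtensions
{
    // Sketch of option 2: drain every page into a single list.
    // Every item is still transferred and materialized on the client.
    public static async Task<List<T>> ToListAsync<T>(
        this IPagedIterator<T> iterator,
        CancellationToken cancellationToken = default)
    {
        var results = new List<T>();
        while (iterator.HasMoreResults)
        {
            IReadOnlyList<T> page = await iterator.ReadNextAsync(cancellationToken);
            results.AddRange(page);
        }
        return results;
    }
}

// In-memory fake that serves pre-built pages, handy for unit tests.
public sealed class InMemoryPagedIterator<T> : IPagedIterator<T>
{
    private readonly Queue<IReadOnlyList<T>> pages;

    public InMemoryPagedIterator(IEnumerable<IReadOnlyList<T>> pages) =>
        this.pages = new Queue<IReadOnlyList<T>>(pages);

    public bool HasMoreResults => pages.Count > 0;

    public Task<IReadOnlyList<T>> ReadNextAsync(CancellationToken cancellationToken = default) =>
        Task.FromResult(pages.Dequeue());
}
```

With a helper of this shape, (await iterator.ToListAsync()).Count gives the same total as the while loop in the previous comment.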

@wahyuen
Contributor Author

wahyuen commented Jul 25, 2019

@j82w Thanks for the response. We had a quick test with your above suggestion and had some interesting observations to share.

Test Scenarios:

  • DocumentClient with synchronous Count call
  • DocumentClient with asynchronous CountAsync call
  • CosmosClient with synchronous Count call
  • CosmosClient with the asynchronous iterator pattern suggested above

Package versions: Microsoft.Azure.DocumentDB.Core 2.5.1, Microsoft.Azure.Cosmos 3.0.0

Test Code

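// Stopwatch used for all timings below (System.Diagnostics)
var watch = System.Diagnostics.Stopwatch.StartNew();

var feedOptions = new FeedOptions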
{
	MaxBufferedItemCount = -1,
	MaxItemCount = -1,
	MaxDegreeOfParallelism = -1,
	EnableCrossPartitionQuery = true
};

var documentClientSyncCount = documentClient.CreateDocumentQuery<Image>(uri, feedOptions)
	.Count(x => x.Investigations.Contains(investigationKey));

Console.WriteLine($"DocumentClient.Count: {documentClientSyncCount} - {watch.Elapsed.TotalMilliseconds:0}ms");

watch.Restart();

var documentClientAsyncCount = await documentClient.CreateDocumentQuery<Image>(uri, feedOptions)
	.Where(x => x.Investigations.Contains(investigationKey)).CountAsync();

Console.WriteLine($"DocumentClient.CountAsync: {documentClientAsyncCount} - {watch.Elapsed.TotalMilliseconds:0}ms");

watch.Restart();

var queryRequestOptions = new QueryRequestOptions
{
	MaxBufferedItemCount = -1,
	MaxConcurrency = -1,
	MaxItemCount = -1
};

var cosmosSyncCount = imagesContainer.GetItemLinqQueryable<Image>(true, queryRequestOptions).Count(x => x.Investigations.Contains(investigationKey));
Console.WriteLine($"CosmosClient.Count: {cosmosSyncCount} - {watch.Elapsed.TotalMilliseconds:0}ms");

watch.Restart();

var cosmosAsyncCount = 0;
var countIterator = imagesContainer.GetItemLinqQueryable<Image>(false, queryRequestOptions).Where(x => x.Investigations.Contains(investigationKey))
	.ToFeedIterator();

while (countIterator.HasMoreResults)
{
	var response = await countIterator.ReadNextAsync();

	cosmosAsyncCount += response.Resource.Count();
}

Console.WriteLine($"CosmosClient.CountAsync: {cosmosAsyncCount} - {watch.Elapsed.TotalMilliseconds:0}ms");

Results

DocumentClient.Count: 1763 - 2035ms
DocumentClient.CountAsync: 1763 - 2779ms
CosmosClient.Count: 1763 - 955ms
CosmosClient.CountAsync: 1763 - 8112ms

Observations

  • Surprisingly, even DocumentClient.CountAsync() seems to be slower than the sync version.
  • The iterator pattern suggested above performs significantly worse than the sync version.
  • The current implementation of Count on CosmosClient seems to be the fastest of all so far!

We ran into this issue while porting our current 2.x SDK code to the new 3.x SDK. We have a large number of cases where we perform counts against a number of collections; we often fire these off using CountAsync and then simply await Task.WhenAll(tasks) until they all return.
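A minimal sketch of the fan-out pattern described here, assuming each count is exposed as an async delegate (the delegates and collection names are hypothetical placeholders for whatever async count call is available): start every count first, then await them together with Task.WhenAll.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CountFanOut
{
    // Starts every count at once, then awaits them together with Task.WhenAll.
    // Each delegate is a hypothetical async count, e.g. one per collection.
    public static async Task<IReadOnlyDictionary<string, int>> CountAllAsync(
        IReadOnlyDictionary<string, Func<CancellationToken, Task<int>>> counters,
        CancellationToken cancellationToken = default)
    {
        // ToList() forces every delegate to be invoked now, so the counts run concurrently.
        var pending = counters
            .Select(kv => (Name: kv.Key, CountTask: kv.Value(cancellationToken)))
            .ToList();

        await Task.WhenAll(pending.Select(p => p.CountTask));

        // Every task has completed at this point, so .Result does not block.
        return pending.ToDictionary(p => p.Name, p => p.CountTask.Result);
    }
}
```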

If possible, we'd love to see your proposed CountAsync extension method, as we believe it is the most idiomatic syntax for developers migrating from 2.x. If the SDK could offer the performance of the current Count implementation asynchronously, we would be very happy campers :)

@j82w
Contributor

j82w commented Jul 25, 2019

@wahyuen is your only goal to get a count, or do you eventually need to use the items?

CosmosClient.Count isn't returning the objects; it only returns a count.
CosmosClient.CountAsync (the iterator loop) returns the objects, which requires allocating more memory and parsing the responses.

Can you try doing this so the queries will be the same for both CosmosClient.Count and CosmosClient.CountAsync?

@j82w
Contributor

j82w commented Jul 25, 2019

Regarding the extension method, we would really like to avoid it. Extension methods can't be mocked, and because they show up on every IQueryable rather than just the Cosmos ones, they can also cause confusion.
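One unit-test-friendly alternative, sketched under the assumption that callers can depend on an abstraction rather than on IQueryable directly: put CountAsync on an interface so tests can substitute a fake. IItemCounter<T>, InMemoryItemCounter<T> and TagReport are hypothetical names, not SDK types; a real implementation would wrap the Cosmos container query and is omitted here.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical abstraction: because CountAsync is an interface member rather than
// a static extension method, a unit test can substitute a fake implementation.
public interface IItemCounter<T>
{
    Task<int> CountAsync(Expression<Func<T, bool>> predicate, CancellationToken cancellationToken = default);
}

// Test double that evaluates the predicate against in-memory data.
public sealed class InMemoryItemCounter<T> : IItemCounter<T>
{
    private readonly IQueryable<T> items;

    public InMemoryItemCounter(params T[] items) => this.items = items.AsQueryable();

    public Task<int> CountAsync(Expression<Func<T, bool>> predicate, CancellationToken cancellationToken = default) =>
        Task.FromResult(items.Count(predicate));
}

// Code under test depends only on the interface, not on a concrete Cosmos type.
public sealed class TagReport
{
    private readonly IItemCounter<string> counter;

    public TagReport(IItemCounter<string> counter) => this.counter = counter;

    public Task<int> CountTaggedAsync(string tag) =>
        counter.CountAsync(item => item.Contains(tag));
}
```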

@wahyuen
Contributor Author

wahyuen commented Jul 26, 2019

@j82w in the context of the queries where we use counts, we only need the number itself and don't interact with or iterate over the items. Given that, anything that lets the database server return the count directly, without transferring the items, would be ideal here.

In the current version, I don't think we can rewrite the query in the form you described above. Count returns an int, so I don't think we can then apply the ToFeedIterator() extension method, since that operates on an IQueryable.

Slightly outside the scope of this ticket, but perhaps related: in the 2.x SDK we relied quite a lot on the extension methods in DocumentQueryable, which exposed calls such as CountAsync, MaxAsync and AverageAsync, among others. I suspect others migrating from 2.x to 3.x will be looking for equivalent functionality. If it comes in a different form, that's probably still OK, as long as we have a path forward :)

@j82w
Contributor

j82w commented Jul 26, 2019

Sorry, I should have tested that before posting. I forgot count executes the query.

Thanks for bringing this gap to my attention. We will come up with a plan to unblock these scenarios. We might end up just adding the extension methods, but I would really like to find a more unit-test-friendly path.

@mark-manticore

Hitting the exact same issue upgrading from v2 to v3. We were previously using the CountAsync() method after applying a "where" predicate:

_documentClient.CreateDocumentQuery<T>(_documentCollectionUri, options).Where(predicate).CountAsync();

Iterating over the documents and summing the count up seems less than ideal (and RU costly) when the final count is the only thing needed. The predicate condition spans multiple partitions so a stored procedure would also not be viable.

If I could convert a LINQ query to a SQL query string, I could replace the leading "select *" with "select value count(1)" for a query that contains only the where clause. That seems like a hacky workaround, though.

@mark-manticore

mark-manticore commented Jul 31, 2019

With the newly released 3.1 SDK I'm now able to work around this issue with the following code.

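// Needs the 3.1 SDK (ToQueryDefinition); this string.Replace overload needs .NET Core 2.0+.
var requestOptions = new QueryRequestOptions();
requestOptions.EnableScanInQuery = true;

// Generate SQL for the Where() predicate, e.g. "SELECT VALUE root FROM root WHERE ...".
var queryDefinition = _container.GetItemLinqQueryable<T>().Where(predicate).ToQueryDefinition();
// Swap the projection so the server returns a single count instead of the documents.
var queryText = queryDefinition.QueryText.Replace("VALUE root", "VALUE COUNT(1)", StringComparison.OrdinalIgnoreCase);
var queryIterator = _container.GetItemQueryIterator<int>(queryText, null, requestOptions);
var response = await queryIterator.ReadNextAsync();
return response.Resource.FirstOrDefault();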
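The Replace call in this workaround rewrites every occurrence of "VALUE root", including any that happen to appear inside a string literal in the predicate. A slightly stricter sketch rewrites only the leading projection. It assumes the generated text always starts with "SELECT VALUE root " (an assumption drawn from this workaround, not documented SDK behavior), and CountQueryRewriter is a hypothetical helper name.

```csharp
using System;

public static class CountQueryRewriter
{
    // Assumed shape of LINQ-generated SQL for a plain Where(...) query.
    private const string Projection = "SELECT VALUE root ";

    // Rewrites only the leading projection, leaving the rest of the text untouched.
    public static string ToCountQuery(string linqQueryText)
    {
        if (linqQueryText is null) throw new ArgumentNullException(nameof(linqQueryText));

        string trimmed = linqQueryText.TrimStart();
        if (!trimmed.StartsWith(Projection, StringComparison.OrdinalIgnoreCase))
        {
            // Fail loudly rather than silently sending an un-rewritten query.
            throw new ArgumentException($"Unexpected query shape: {linqQueryText}", nameof(linqQueryText));
        }

        return "SELECT VALUE COUNT(1) " + trimmed.Substring(Projection.Length);
    }
}
```

The result can then be passed to GetItemQueryIterator<int> exactly as in the workaround.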

@rasmuschristensen

I need the same thing, just to check whether something exists, so I created this extension. Something similar could be made to support count:

https://gist.github.com/rasmuschristensen/81d4e4694b4993765c3ad36d8f45fd36

@simplynaveen20
Member

Fixed in #729

@petro2050

@simplynaveen20 @j82w Regarding #729, are the async methods still slower than the sync methods? Have you worked on performance improvements?

@j82w
Contributor

j82w commented Apr 22, 2020

@petro2050 can you create a new issue with how you are doing the call and the perf you are seeing?
