-
Notifications
You must be signed in to change notification settings - Fork 494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Combination of Distinct and Offset does not work in v3.14.0 #1979
Comments
I was not able to reproduce the issue. Please provide a sample program that reproduces the issue. Something that inserts the required data set and runs the query with your exact query request options. |
Thanks for looking into this. Here is my Program.cs file: using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;
namespace CosmosSample
{
public class Program
{
private static readonly Random Random = new();
private static async Task Main()
{
// Create client, database and container.
var client = new CosmosClient(
"https://localhost:8081",
"C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==",
new CosmosClientOptions
{
ConnectionMode = ConnectionMode.Direct,
AllowBulkExecution = true
});
var databaseResponse = await client.CreateDatabaseIfNotExistsAsync("SampleDb");
var containerResponse =
await databaseResponse.Database.CreateContainerIfNotExistsAsync("SampleContainer", "/Metadata/BatchId",
20000);
var container = containerResponse.Container;
// Create Data.
var concurrentTasks = new List<Task>();
var batchIds = Enumerable.Range(0, 500).Select(_ => Guid.NewGuid()).ToArray();
for (int i = 0; i < 10; i++)
{
for (int y = 0; y < 500; y++)
{
concurrentTasks.Add(Task.Run(async () =>
{
var cosmosDto = new CosmosDto
{
Id = Guid.NewGuid(),
Metadata = new CosmosDto.MetadataDto
{
BatchId = batchIds[Random.Next(0, batchIds.Length)]
}
};
var response = await container.CreateItemAsync(cosmosDto);
Debug.Assert(response.StatusCode == HttpStatusCode.Created);
}));
}
await Task.WhenAll(concurrentTasks);
concurrentTasks.Clear();
}
var countIterator = container.GetItemQueryIterator<int>("SELECT VALUE COUNT(1) FROM c");
var countResponse = await countIterator.ReadNextAsync();
Console.WriteLine("Total number of records in DB: " + countResponse.Resource.First());
// Query Data.
int count = 0;
var result = new List<CosmosDto>();
var query = new QueryDefinition(
"SELECT DISTINCT c.Metadata FROM SampleContainer c ORDER BY c._ts DESC OFFSET 0 LIMIT 15");
Console.WriteLine("Query: " + query.QueryText);
var iterator = container.GetItemQueryIterator<CosmosDto>(query);
while (iterator.HasMoreResults)
{
Console.Write("Querying Data... ");
count++;
var moreResults = await iterator.ReadNextAsync();
result.AddRange(moreResults.Resource);
Console.WriteLine(
$"Retrieved {moreResults.Resource.Count()} records. Cost: {moreResults.RequestCharge} RUs.");
}
Console.WriteLine($"Number of ReadNextAsync calls: {count}");
}
}
public class CosmosDto
{
[JsonProperty(PropertyName = "id")] public Guid Id { get; set; }
public MetadataDto Metadata { get; set; }
public class MetadataDto
{
public Guid BatchId { get; set; }
}
}
} And here is my csproj file: <Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net5.0</TargetFramework>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.Azure.Cosmos" Version="3.15.1" />
</ItemGroup>
</Project>
|
With this sample program I noticed that the more data I generate the longer SDK is going to keep looping and retrieving 0 records which is why it seems like an infinite loop in our environments that have millions of records. I tried 3.13.0 again and that SDK made only 1 request. |
@bchong95 were you able to reproduce the issue with my sample program? |
Can you try increasing the page size to see if that speeds up the execution? |
@bchong95 I just tried the latest SDK version (3.17.0-preview) that was supposed to fix this issue but I'm still seeing unexpected behavior (although this version of the SDK makes less calls to the database to retrieve the same data). Let me know if you are unable to reproduce the issue. |
@bchong95 @neildsh Any comments on @bernarden findings on 3.17.0-preview? |
Hi @bernarden, this is expected behavior based on the interface: This returns |
Hi @neildsh,
|
Hi @bernarden , This is expected. Sometimes query results will be split over multiple pages. Each page's results is generated by a separate query execution. When query results cannot be returned in one single execution, Azure Cosmos DB will automatically split results into multiple pages. Each of these pages would have a separate RU charge, even if no records are returned. Here are some examples for how to iterate through multiple pages of query results: https://docs.microsoft.com/en-us/azure/cosmos-db/sql-query-pagination#handling-multiple-pages-of-results You should loop through all the query pages to get the results. The total RU charge for the query is the sum of the RU charges for each individual page. |
Fine. Let's call them features. |
One thing I want to recommend is rewriting the query to |
In general the empty pages are to satisfy your sort order on _ts. The client needs to grab one page of records from each partition and then sort the results client side. Until then it's forced to pass back empty pages to give you a preemption point for the query. On the 5th page we finally have one page from each of the 4 partitions and we can serve the ORDER BY condition and that's when we return you the 5 documents. You can look at the CosmosDiagnostics to get a better sense of this. |
Closing due to in-activity, pease feel free to re-open. |
Describe the bug
After upgrading Cosmos DB SDK to the latest version, one of my queries stopped returning the data and now also causing an infinite loop. Removing
Distinct
or theOFFSET 0 LIMIT 15
part from the query makes SDK return the data again. I also tried versions [3.6.0 - 3.13.0] and they worked fine.To Reproduce
Expected behavior
ReadNextAsync()
should return the requested data back.HasMoreResults
should returnfalse
when SDK fails internally for whatever reason.Actual behavior
ReadNextAsync()
returnsOK
status butResource
value is an empty array.HasMoreResults
always staystrue
.I validated the query through Data Explorer in Azure and that returns all the data correctly.
Environment summary
SDK Version: 3.14.0/3.15.0-preview
OS Version: Windows
The text was updated successfully, but these errors were encountered: