Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combination of Distinct and Offset does not work in v3.14.0 #1979

Closed
bernarden opened this issue Nov 3, 2020 · 16 comments · Fixed by #2168
Closed

Combination of Distinct and Offset does not work in v3.14.0 #1979

bernarden opened this issue Nov 3, 2020 · 16 comments · Fixed by #2168
Assignees
Labels
bug Something isn't working Hotfix A hotfix is required for the issue needs-investigation QUERY

Comments

@bernarden
Copy link

Describe the bug
After upgrading Cosmos DB SDK to the latest version, one of my queries stopped returning the data and now also causing an infinite loop. Removing Distinct or the OFFSET 0 LIMIT 15 part from the query makes SDK return the data again. I also tried versions [3.6.0 - 3.13.0] and they worked fine.

To Reproduce

var result = new List<CosmosDto>();
var query = new QueryDefinition("SELECT distinct c.Version, c.MetaData FROM Collection c ORDER BY c._ts DESC OFFSET 0 LIMIT 15");
var iterator = container.GetItemQueryIterator<CosmosDto>(query, requestOptions: requestOptions);
while (iterator.HasMoreResults)
{
    var moreResults = await iterator.ReadNextAsync();
    result.AddRange(moreResults.Resource);
}

Expected behavior
ReadNextAsync() should return the requested data back.
HasMoreResults should return false when SDK fails internally for whatever reason.

Actual behavior
ReadNextAsync() returns OK status but Resource value is an empty array.
HasMoreResults always stays true.
I validated the query through Data Explorer in Azure and that returns all the data correctly.

Environment summary
SDK Version: 3.14.0/3.15.0-preview
OS Version: Windows

@j82w j82w added the QUERY label Nov 3, 2020
@j82w j82w added bug Something isn't working needs-investigation labels Nov 3, 2020
@bchong95
Copy link
Contributor

bchong95 commented Nov 3, 2020

I was not able to reproduce the issue. Please provide a sample program that reproduces the issue. Something that inserts the required data set and runs the query with your exact query request options.

@bernarden
Copy link
Author

bernarden commented Nov 4, 2020

Thanks for looking into this. Here is my Program.cs file:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;

namespace CosmosSample
{
    public class Program
    {
        private static readonly Random Random = new();

        private static async Task Main()
        {
            // Create client, database and container.
            var client = new CosmosClient(
                "https://localhost:8081",
                "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==",
                new CosmosClientOptions
                {
                    ConnectionMode = ConnectionMode.Direct,
                    AllowBulkExecution = true
                });
            var databaseResponse = await client.CreateDatabaseIfNotExistsAsync("SampleDb");
            var containerResponse =
                await databaseResponse.Database.CreateContainerIfNotExistsAsync("SampleContainer", "/Metadata/BatchId",
                    20000);
            var container = containerResponse.Container;

            // Create Data.
            var concurrentTasks = new List<Task>();
            var batchIds = Enumerable.Range(0, 500).Select(_ => Guid.NewGuid()).ToArray();
            for (int i = 0; i < 10; i++)
            {
                for (int y = 0; y < 500; y++)
                {
                    concurrentTasks.Add(Task.Run(async () =>
                    {
                        var cosmosDto = new CosmosDto
                        {
                            Id = Guid.NewGuid(),
                            Metadata = new CosmosDto.MetadataDto
                            {
                                BatchId = batchIds[Random.Next(0, batchIds.Length)]
                            }
                        };
                        var response = await container.CreateItemAsync(cosmosDto);
                        Debug.Assert(response.StatusCode == HttpStatusCode.Created);
                    }));
                }
            
                await Task.WhenAll(concurrentTasks);
                concurrentTasks.Clear();
            }

            var countIterator = container.GetItemQueryIterator<int>("SELECT VALUE COUNT(1) FROM c");
            var countResponse = await countIterator.ReadNextAsync();
            Console.WriteLine("Total number of records in DB: " + countResponse.Resource.First());

            // Query Data.
            int count = 0;
            var result = new List<CosmosDto>();
            var query = new QueryDefinition(
                "SELECT DISTINCT c.Metadata FROM SampleContainer c ORDER BY c._ts DESC OFFSET 0 LIMIT 15");
            Console.WriteLine("Query: " + query.QueryText);
            var iterator = container.GetItemQueryIterator<CosmosDto>(query);
            while (iterator.HasMoreResults)
            {
                Console.Write("Querying Data... ");
                count++;
                var moreResults = await iterator.ReadNextAsync();
                result.AddRange(moreResults.Resource);
                Console.WriteLine(
                    $"Retrieved {moreResults.Resource.Count()} records. Cost: {moreResults.RequestCharge} RUs.");
            }

            Console.WriteLine($"Number of ReadNextAsync calls: {count}");
        }
    }

    public class CosmosDto
    {
        [JsonProperty(PropertyName = "id")] public Guid Id { get; set; }

        public MetadataDto Metadata { get; set; }

        public class MetadataDto
        {
            public Guid BatchId { get; set; }
        }
    }
}

And here is my csproj file:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.Azure.Cosmos" Version="3.15.1" />
  </ItemGroup>

</Project>

@bernarden
Copy link
Author

With this sample program I noticed that the more data I generate the longer SDK is going to keep looping and retrieving 0 records which is why it seems like an infinite loop in our environments that have millions of records. I tried 3.13.0 again and that SDK made only 1 request.

@bernarden
Copy link
Author

bernarden commented Nov 10, 2020

@bchong95 were you able to reproduce the issue with my sample program?

@bchong95
Copy link
Contributor

Can you try increasing the page size to see if that speeds up the execution?

@bernarden
Copy link
Author

bernarden commented Dec 21, 2020

I updated the sample program to target cosmos sdk 3.15.1 and net5.0.
Here are the runs with page sizes 5, 50 and 500.
cosmos-query-with-different-page-sizes

@kirankumarkolli kirankumarkolli added the Hotfix A hotfix is required for the issue label Jan 6, 2021
@bernarden
Copy link
Author

@bchong95 I just tried the latest SDK version (3.17.0-preview) that was supposed to fix this issue but I'm still seeing unexpected behavior (although this version of the SDK makes less calls to the database to retrieve the same data). Let me know if you are unable to reproduce the issue.

image

@ealsur
Copy link
Member

ealsur commented Feb 22, 2021

@bchong95 @neildsh Any comments on @bernarden findings on 3.17.0-preview?

@neildsh
Copy link
Contributor

neildsh commented Feb 24, 2021

Hi @bernarden, this is expected behavior based on the interface:
public override async ValueTask<bool> MoveNextAsync(ITrace trace)

This returns true if there are any results and false otherwise. So you cannot return false on the last page, because then the last page would not be read. This results in one extra call to MoveNextAsync that returns immediately.

@bernarden
Copy link
Author

Hi @neildsh,
These are the issues that I see with the image above:

  1. Why does SDK make the first 4 calls that return no data back but still cost 74.16 RUs.
  2. The query attempt that actually returns the data back costs 0 RUs.
  3. I haven't looked at the internals of this SDK but my expectation is that the iterator should know that it already returned the required number of documents based on the limit specified. This should result in just one call to iterator.ReadNextAsync() in this example instead of 6..

@bernarden
Copy link
Author

@bchong95 @ealsur @neildsh @sboshra Why was this issue closed?

@timsander1
Copy link
Contributor

Hi @bernarden ,

This is expected. Sometimes query results will be split over multiple pages. Each page's results is generated by a separate query execution. When query results cannot be returned in one single execution, Azure Cosmos DB will automatically split results into multiple pages. Each of these pages would have a separate RU charge, even if no records are returned.

Here are some examples for how to iterate through multiple pages of query results: https://docs.microsoft.com/en-us/azure/cosmos-db/sql-query-pagination#handling-multiple-pages-of-results

You should loop through all the query pages to get the results. The total RU charge for the query is the sum of the RU charges for each individual page.

@bernarden
Copy link
Author

Fine. Let's call them features.

@bchong95
Copy link
Contributor

bchong95 commented Mar 3, 2021

One thing I want to recommend is rewriting the query to SELECT DISTINCT VALUE c.Metadata FROM c ORDER BY c.Metadata OFFSET 0 LIMIT 5. You might see better results, since we can stream results out.

@bchong95
Copy link
Contributor

bchong95 commented Mar 3, 2021

In general the empty pages are to satisfy your sort order on _ts. The client needs to grab one page of records from each partition and then sort the results client side. Until then it's forced to pass back empty pages to give you a preemption point for the query. On the 5th page we finally have one page from each of the 4 partitions and we can serve the ORDER BY condition and that's when we return you the 5 documents. You can look at the CosmosDiagnostics to get a better sense of this.

@ghost
Copy link

ghost commented Dec 15, 2021

Closing due to in-activity, pease feel free to re-open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Hotfix A hotfix is required for the issue needs-investigation QUERY
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants