Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linq to DocumentDb, where clause on child #58

Closed
wis3guy opened this issue Sep 23, 2015 · 5 comments
Closed

Linq to DocumentDb, where clause on child #58

wis3guy opened this issue Sep 23, 2015 · 5 comments

Comments

@wis3guy
Copy link

wis3guy commented Sep 23, 2015

In a project i'm currently working on, we have come to realise that we should not use DocumentDb collections as if they are the equivalent of a table in f.ex SQL Server. As a result, we are now persisting all of the entities, belonging to a single tenant in a single collection.

We already have lots of linq queries in our codebase which assume that each document type (aggregate root) is persisted in a dedicated collection. In an attempt to make the transition painless, i set out to refactor our data access object, so that its api continues to reason about aggregate roots, and deal with the single collection vs dedicated collections in it's implementation.

My approach is to wrap an aggregate root in an Resource<T> object, which derives from Resource and exposes a Model property as well as a Type property. I thought i would then be able to expose an IQueryable<T> to consuming code based on the following code:

        return _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
            .Where(x => x.Type == typeof(TModel).Name)
            .Select(x => x.Model);

Initial testing showed that this worked as planned and i confidently committed my changes. When doing functional testing however, we found that some queried models had all of their properties set to their default values (ie. null, 0, false, etc).

I can reproduce the problem with the following code:

        var wrong = _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
            .Where(x => x.Type == typeof(TModel).Name)
            .Select(x => x.Model)
            .Where(x => !x.IsDeleted)
            .ToArray();

        var correct = _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
            .Where(x => x.Type == typeof(TModel).Name)
            .Where(x => !x.Model.IsDeleted)
            .Select(x => x.Model)
            .ToArray();

The results of the above queries are not the same!!

  • Both queries return the same number of TModel instances.
  • Only the instances returned by the second example have their properties populated.

In order for my refactoring to be successful, i need wrong to be ... right :) Falling back to SQL is not an option as we value type safety of linq. Changing our approach to expose the Resource<T> objects would touch lots of code, as it requires all *.Property references to be substituted by *.Model.Property references.

It seems an issue with the linq provider that is part of the DocumentDb client.

We use Microsoft.Azure.DocumentDb version 1.4.1

@ghost
Copy link

ghost commented Sep 24, 2015

Could you please do a .ToString() on the wrong & correct before calling ToArray().
i.e. please give the results of the following

var correct = _client.CreateDocumentQuery<Resource>(_collection.DocumentsLink)
.Where(x => x.Type == typeof(TModel).Name)
.Where(x => !x.Model.IsDeleted)
.Select(x => x.Model)
.ToString()

and

var wrong = _client.CreateDocumentQuery<Resource>(_collection.DocumentsLink)
.Where(x => x.Type == typeof(TModel).Name)
.Select(x => x.Model)
.Where(x => !x.IsDeleted)
.ToString()

I suspect you will find that "wrong" is not doing the x.IsDeleted check server-side, but client-side.
the ":correct" is the correct way of building this query.

will need to look in to why the "wrong" doesn't populate the fields of the returned objects though, it should.

are you able to share some sample documents (even if you email or share with ryan craw at microsoft dot com)

@wis3guy
Copy link
Author

wis3guy commented Sep 24, 2015

@ryancrawcour Thank you for your response.

I suspect you will find that "wrong" is not doing the x.IsDeleted check server-side, but client-side.
the ":correct" is the correct way of building this query.

I don't want to sound cocky, but i disagree to the last part of your statement. Imho, if you expose an IQueryable<T> it should allow for both methods to be used. It is a matter of taste which you prefer, neither is wrong or right. Isn't the whole idea of deferred execution that the consuming code can trigger the execution by doing the first enumeration? Therefore i would go even further and say that the linq provider may not 'decide on its own' whether things are handled on the server or client. It should all be executed on the server, or yield a NotSupportedException. Otherwise, one could unknowingly end up with compiling, functioning code, that has bad performance because under the hood the client downloads large data sets only to apply predicates client-side.

Below is the result of doing a ToString() on each of the queryables:

  • correct: {"query":"SELECT VALUE root.Model FROM root WHERE ((root.Type = \"DocumentType\") AND (NOT root.Model.IsDeleted)) "}
  • wrong: {"query":"SELECT * FROM root WHERE ((root.Type = \"DocumentType\") AND (NOT root.Model.IsDeleted)) "}

This looks promising! The actual queries are almost the same and everything is done on the server.

For context, below is the type definition of DocumentType and its base type:

public class DocumentType : AggregateRoot
{
    public string Name { get; set; }
    public bool IsDeleted { get; set; }
}

public abstract class AggregateRoot
{
    public string ResourceId { get; set; }

    public bool IsTransient()
    {
        return string.IsNullOrEmpty(ResourceId);
    }
}

Also, per your request, assume these sample documents:

{
  "Model": {
    "Name": "Passport",
    "IsDeleted": false,
    "ResourceId": "Bm0EALL6QQUEAAAAAAAAAA=="
  },
  "Type": "DocumentType",
  "id": "9cd6edc3-3989-4900-9c36-8d568fd60a38",
  "_rid": "Bm0EALL6QQUEAAAAAAAAAA==",
  "_self": "dbs/Bm0EAA==/colls/Bm0EALL6QQU=/docs/Bm0EALL6QQUEAAAAAAAAAA==/",
  "_ts": 1442907766,
  "_etag": "\"00002900-0000-0000-0000-560106760000\"",
  "_attachments": "attachments/"
}

{
  "Model": {
    "Name": "Obsolete",
    "IsDeleted": true,
    "ResourceId": "Bm0EALL6QQUDAAAAAAAAAA=="
  },
  "Type": "DocumentType",
  "id": "78e5d2dc-7482-4461-a341-88c971197735",
  "_rid": "Bm0EALL6QQUDAAAAAAAAAA==",
  "_self": "dbs/Bm0EAA==/colls/Bm0EALL6QQU=/docs/Bm0EALL6QQUDAAAAAAAAAA==/",
  "_ts": 1442907766,
  "_etag": "\"00002700-0000-0000-0000-560106760000\"",
  "_attachments": "attachments/"
}

{
  "Model": {
    "Name": "Driver's license",
    "IsDeleted": false,
    "ResourceId": "Bm0EALL6QQUFAAAAAAAAAA=="
  },
  "Type": "DocumentType",
  "id": "8ca7221d-9d65-4576-a7f7-286c079ddc7d",
  "_rid": "Bm0EALL6QQUFAAAAAAAAAA==",
  "_self": "dbs/Bm0EAA==/colls/Bm0EALL6QQU=/docs/Bm0EALL6QQUFAAAAAAAAAA==/",
  "_ts": 1442907766,
  "_etag": "\"00002b00-0000-0000-0000-560106760000\"",
  "_attachments": "attachments/"
}

Hope this helps.

Any idea when the actual source of the linq provider will be accessible on github?

@ghost
Copy link

ghost commented Sep 24, 2015

So there's a subtle difference between the two queries which I need to investigate. The one is selecting only Model, where the other is selecting *.

@ghost
Copy link

ghost commented Nov 23, 2015

fixed in v.1.5.1

@ghost ghost closed this as completed Nov 23, 2015
@wis3guy
Copy link
Author

wis3guy commented Nov 27, 2015

@ryancrawcour I'm sorry, but the latest release did not entirely fix our problem. Please assume issue #81 as entered by my colleague @mmsommer.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant