Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for filtering on value predicate facets #4034

Closed
robsws opened this issue Sep 20, 2019 · 12 comments · Fixed by #4217
Closed

Add support for filtering on value predicate facets #4034

robsws opened this issue Sep 20, 2019 · 12 comments · Fixed by #4217
Assignees
Labels
area/facets Issues related to face handling, querying, etc. area/querylang/filter Related to the filter directive. area/querylang Issues related to the query language specification and implementation. kind/enhancement Something could be better. priority/P2 Somehow important but would not block a release. status/accepted We accept to investigate/work on it.
Milestone

Comments

@robsws
Copy link

robsws commented Sep 20, 2019

I want to be able to filter value predicates on their facets as well as uid predicates. For example, I want to be able to execute something like the following query on the toy graph specified in the facets documentation:

curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d '
{
  data(func: eq(name, "Alice")) {
     name
     mobile
     car @facets(gt(since, "2019-01-01"))
  }
}' | python -m json.tool | less

Running that query right now provides the following error response:

{
  "errors": [
    {
      "message": ": Facet filtering is not supported on values.",
      "extensions": {
        "code": "ErrorInvalidRequest"
      }
    }
  ],
  "data": null
}
@MichelDiz MichelDiz added area/facets Issues related to face handling, querying, etc. area/querylang Issues related to the query language specification and implementation. kind/enhancement Something could be better. labels Sep 20, 2019
@MichelDiz
Copy link
Contributor

That is something we already discussed internally.

This has some logical barriers. We could not tell if the filter would have to be applied to the node itself or the intrinsic value to the predicate.

I believe that if the filter should be applied to the node itself, we should allow using facet filters in Root rather than in the query body.

@robsws are you mutating via RDF or JSON? What is your use case? Filter the node with the car facet or show the car value facet only if the facet value filter is true?

links just for internal context

https://dgraphlabs.slack.com/archives/G9SRXDGN5/p1546911576227800
https://dgraphlabs.slack.com/archives/G9SRXDGN5/p1546913902244600

@campoy campoy added area/querylang/filter Related to the filter directive. priority/P2 Somehow important but would not block a release. status/accepted We accept to investigate/work on it. labels Sep 20, 2019
@campoy campoy added this to the Dgraph v1.2 milestone Sep 20, 2019
@campoy
Copy link
Contributor

campoy commented Sep 20, 2019

I think we should definitely implement this.

Since a value predicate can hold facets, we should be able to filter by them as in other predicates.
Could you explain a bit more in detail which concerns do you have regarding the language side of things, @MichelDiz ?

@MichelDiz
Copy link
Contributor

MichelDiz commented Sep 21, 2019

@campoy it is linked to the order of concepts in the syntax. Facets were created for the relationship between edges, the ability to create facets in values and allow functions at this level to affect the query itself, it slightly changes its concept a priori.

I don't know what the whole design implications of this are. If any of the approaches below would affect other use cases.

Although Facets are not first-class citizen I think at the root would make more sense depending on the context. But perhaps in the body, it is the most simple and straightforward.

If we will support this, we must determine whether facets value filtering in this context will be executed
e.g. in the Root of the query (Just a rough example)

{
  data(func: eq(name, "Alice")) @filter(car@facets(gt(since, "2019-01-01"))) {
     name
     mobile
     car 
  }
}

or in the query body.

{
  data(func: eq(name, "Alice")) {
     name
     mobile
     car @facets(gt(since, "2019-01-01"))
  }
}

If we choose in the query body, what happens if we have 3 Alices in the DB and only one has a car? (assuming the "since" value of one is 2006)

The intention is:

1 - Display the three Alices and not show the value "since"(or even "car")? because Alice has the car since 2006 and not 2019 - as it is in the query - And you are determining that facet filter directly on the predicate/value. So, do you want to apply this filtering to the value or the query as a whole?

2 - Do not display any Alice, because all other Alices do not have the "since" facet or a value greater than 2019? If you wanna apply this to the query as a whole.

Now, assuming the query is with the value @facets (gt (since, "2005-01-01"))

1 - Would we display only one Alice? because all the others do not have the "since" facet.

In short term is

Do we apply this to the value or the query? If we apply to value, I cannot understand the immediate usefulness of it. It would be like "skip" this value for now, but show me the result anyway.

@robsws
Copy link
Author

robsws commented Sep 23, 2019

For my use case, I would like to be able to filter multiples of a particular value edge down to one or more based on the value of the facet. For example, maybe if Alice had several cars like this:

    _:alice <name> "Alice" .
    _:alice <mobile> "040123456" (since=2006-01-02T15:04:05) .
    _:alice <car> "MA0123" (since=2006-02-02T13:01:09, first=true) .
    _:alice <car> "XY0123" (since=2008-02-02T13:01:09, first=false) .
    _:alice <car> "GU0123" (since=2019-02-02T13:01:09, first=false) .

I want to be able to fetch the car (or cars) she had where since > "2019-01-01" with my query above and ignore the other entries.

I am concerned that if the extra clause appears at the root, then we won't be able to do facet filtering on deeper levels of the query like we can with uid predicates. Consider an example where people can change the registration plates for their car, then we might want to do something like this:

    _:alice <name> "Alice" .
    _:alice <mobile> "040123456" (since=2006-01-02T15:04:05) .
    _:alice <car> _:car .
    _:car <registration> "AA1111"  (since=2018-01-01T13:01:09) .
    _:car <registration> "BB1111"  (since=2019-01-01T13:01:09) .
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d '
{
  data(func: eq(name, "Alice")) {
     name
     mobile
     car {
         registration @facets(gt(since, "2019-01-01"))
     }
  }
}' | python -m json.tool | less

With regards to how to deal with cases where the facet isn't specified, this is inconsequential to my use case since the facet will exist on all edges of that type, but I understand that it's something that needs to be decided upon.

@campoy
Copy link
Contributor

campoy commented Sep 23, 2019

I'd say we should unify how we filter by facets and do the same for predicates pointing to other nodes as well as predicates pointing to values.

If car was pointing to a node with field since then you'd write:

{
  data(func: eq(name, "Alice")) {
     name
     mobile
     car @filter(gt(since, "2019-01-01"))
  }
}

The same way, if you have a predicate that points to a string value and you want to filter by a face attached to it, you should be able to write:

{
  data(func: eq(name, "Alice")) {
     name
     mobile
     car @facets(gt(since, "2019-01-01"))
  }
}

Let's see with a sample dataset created by this mutation:

{
  set {
    _:a <name> "Alice" .
    _:a <dgraph.type> "Person" .
    _:b <name> "Bob" .
    _:b <dgraph.type> "Person" .
    _:c <name> "Carla" .
    _:c <dgraph.type> "Person" .
    
    _:b <car> _:bc .
    _:bc <name> "Bugatti" .
    _:bc <since> "2015-01-01"^^<xs:dateTime> .
    
    _:c <car> _:cc .
    _:cc <name> "Corolla" .
    _:cc <since> "2019-02-01"^^<xs:dateTime> .    
  }
}

And schema:

<car>: [uid] .
<name>: string .
<since>: datetime @index(day) .

Then we can query who owns a car since after 2019-01-01 with:

{
  p(func: type(Person)) {
    name
    car @filter(gt(since, "2019-01-01")) {
      name
    }
  }
}

And we obtain:

{
  "data": {
    "p": [
      {
        "name": "Alice"
      },
      {
        "name": "Bob"
      },
      {
        "name": "Carla",
        "car": [
          {
            "name": "Corolla"
          }
        ]
      }
    ]
  },
  "extensions": {...}
}

In the same way, if we held the since info in a facet for car, which still points to a car UID:

{
  set {
    _:a <name> "Alice" .
    _:a <dgraph.type> "Person" .
    _:b <name> "Bob" .
    _:b <dgraph.type> "Person" .
    _:c <name> "Carla" .
    _:c <dgraph.type> "Person" .
    
    _:b <car> _:bc (since="2015-01-01") .
    _:bc <name> "Bugatti" .
    
    _:c <car> _:cc (since="2019-02-01") .
    _:cc <name> "Corolla" .
  }
}

We can query in a very similar way by using filters on facets:

{
  p(func: type(Person)) {
    name
    car @facets(gt(since, "2019-01-01")) {
      name
    }
  }
}

Finally, since the car node only holds a name now, we can make it so car points to a string instead of a UID.

{
  set {
    _:a <name> "Alice" .
    _:a <dgraph.type> "Person" .
    _:b <name> "Bob" .
    _:b <dgraph.type> "Person" .
    _:c <name> "Carla" .
    _:c <dgraph.type> "Person" .
    
    _:b <car> "Bugatti" (since="2015-01-01") .
    _:c <car> "Corolla" (since="2019-02-01") .
  }
}

I would now expect this query, which is still very similar to the previous ones, to work:

{
  p(func: type(Person)) {
    name
    car @facets(gt(since, "2019-01-01"))
  }
}

Currently, it fails with Message: : Facet filtering is not supported on values.

@MichelDiz
Copy link
Contributor

MichelDiz commented Sep 23, 2019

@robsws there is a small issue with what you want.

    _:alice <car> "MA0123" (since=2006-02-02T13:01:09, first=true) .
    _:alice <car> "XY0123" (since=2008-02-02T13:01:09, first=false) .
    _:alice <car> "GU0123" (since=2019-02-02T13:01:09, first=false) .

[Lists] can't hold facets for each value of the list. That is, it is not possible to do what you mentioned in your last comment tho. And adding facet support to lists would be another feature beyond that.

So instead of that, is better to do:

{
  set {
    _:alice <name> "Alice" .
    _:alice <mobile> "040123456" (since=2006-01-02T15:04:05) .
      
    _:alice <cars> _:MA0123 (since=2006-02-02T13:01:09, first=true) .
    _:alice <cars> _:XY0123 (since=2008-02-02T13:01:09, first=false) .
    _:alice <cars> _:GU0123 (since=2019-02-02T13:01:09, first=false) .
      
    _:MA0123 <model> "MA0123" .
    _:XY0123 <model> "XY0123" .
    _:GU0123 <model> "GU0123" .
  }
}

Query

{
  data(func: eq(name, "Alice")) {
     uid
     name
     mobile
     cars @facets @facets(gt(since, "2019-01-01"))
  }
}

Result

{
  "data": {
    "data": [
      {
        "uid": "0x50954",
        "name": "Alice",
        "mobile": "040123456",
        "cars": [
          {
            "cars|first": false,
            "cars|since": "2019-02-02T13:01:09Z"
          }
        ]
      }
    ]
  }
}

@campoy
Copy link
Contributor

campoy commented Sep 23, 2019

If I understand correctly, @robsws is not saying that a given facet is applied to each value in a list predicate, but rather in his application all predicates will be created with their own facets.

This looks like a pretty straight forward addition at this point, would you agree with this @MichelDiz

@MichelDiz
Copy link
Contributor

MichelDiz commented Sep 23, 2019

Not sure if I get you right. In his last example, the sample will be inevitably a list. The only way to avoid creating a list is by giving a predicate for each car. e.g:

# If you use the same predicate it has to be a list, or it will be overwritten.
# or you have to have a pred for each car as bellow.
    _:alice <car1> "MA0123" (since=2006-02-02T13:01:09, first=true) .
    _:alice <car2> "XY0123" (since=2008-02-02T13:01:09, first=false) .
    _:alice <car3> "GU0123" (since=2019-02-02T13:01:09, first=false) .

That's the only way to have a value facet for now. But, I don't think this is ideal. This can eventually lead to the creation of numerous predicates, which goes beyond control.

Despite this point, functions in facet values ​​are in my view welcome. But it is necessary to understand the implications.

it is necessary to emphasize that nothing below works, this is part of a discussion.

Another detail with this sample:

    _:alice <car> _:car .
    _:car1 <registration> "AA1111"  (since=2017-03-02T13:01:09Z) .
    _:car1 <model> "MA0123" .
    _:car2 <registration> "AA222"  (since=2018-01-01T13:01:09Z) .
    _:car2 <model> "XY0123" .
    _:car3 <registration> "BB1111"  (since=2019-01-01T13:01:09Z) .
    _:car3 <model> "GU0123" .

This syntax is okay for this specific case of a nested block.

{
  data(func: eq(name, "Alice")) {
     name
     mobile
     car {
         registration @facets(gt(since, "2019-01-01"))
         model
     }
  }
}

But the result is:

{
  "data": {
    "data": [
      {
        "name": "Alice",
        "mobile": "040123456",
        "car": [
          {
            "registration": "BB1111",
            "model": "GU0123",
            "registration|since": "2019-01-01T13:01:09Z"
          }
        ]
      }
    ]
  }
}

OR ?

{
  "data": {
    "data": [
      {
        "name": "Alice",
        "mobile": "040123456",
        "car": [
          {
            "registration": "AA1111",
            "model": "MA0123",
            "registration|since": "2017-03-02T13:01:09Z"
          },
          {
            "registration": "AA222",
            "model": "XY0123",
            "registration|since": "2018-01-01T13:01:09Z"
          },
          {
            "model": "GU0123"
          }
        ]
      }
    ]
  }
}

If the response is the first result, I think it would be better to have this value facet in root/edge, like bellow. As I mentioned from my first comment.

It would look like this if in the root query/edge.

this would be similar to https://docs.dgraph.io/query-language/#uid-in

{
  data(func: eq(name, "Alice")) {
     name
     mobile
     car @facets @facets(in "registration" gt(since, "2019-01-01")){
         registration
         model
     }
  }
}
{
  "data": {
    "data": [
      {
        "name": "Alice",
        "mobile": "040123456",
        "car": [
          {
            "registration": "BB1111",
            "model": "GU0123",
            "registration|since": "2019-02-02T13:01:09Z"
          }
        ]
      }
    ]
  }
}

@robsws
Copy link
Author

robsws commented Sep 24, 2019

Hi, thanks for continuing to look into this. I have a few comments based on the discussion above:

[Lists] can't hold facets for each value of the list. That is, it is not possible to do what you mentioned in your last comment tho. And adding facet support to lists would be another feature beyond that.

If I understand correctly, @robsws is not saying that a given facet is applied to each value in a list predicate, but rather in his application all predicates will be created with their own facets.

I think this is correct yes - each individual predicate in the list has it's own facets, rather than the list itself having facets. As if each predicate were individual, different predicates rather than a single list predicate. Hopefully this means it should be possible to implement?

It's important for us to be able to filter several value edges out of the same type by their facet, as essentially our use case will involve switching between different "versions" of a predicate based on the timestamp in the facet. Modifying the example a little to include start and end times:

    _:alice <car> "MA0123" (start=2006-02-02T13:01:09, finish=2008-02-02T13:01:09, first=true) .
    _:alice <car> "XY0123" (start=2008-02-02T13:01:09, finish=2019-02-02T13:01:09, first=false) .
    _:alice <car> "GU0123" (start=2019-02-02T13:01:09, finish=2020-02-12T13:09:01, first=false) .

By using a timestamp variable in our query, we can determine what car Alice had on a particular date, say 2018-12-01, by doing something like this:

data(func: eq(name, "Alice")) {
     name
     mobile
     car @facets(gt(finish, "2018-12-01") and lt(start, "2018-12-01"))
  }

If it is impossible to do value facet filtering like I've demonstrated in the example above, then the functionality I don't think is much good for my use case unfortunately.


If the response is the first result, I think it would be better to have this value facet in root/edge, like bellow. As I mentioned from my first comment.

It would look like this if in the root query/edge.

this would be similar to https://docs.dgraph.io/query-language/#uid-in

> {
>   data(func: eq(name, "Alice")) {
>      name
>      mobile
>      car @facets @facets(in "registration" gt(since, "2019-01-01")){
>          registration
>          model
>      }
>   }
> }

Thanks for clarifying this - I thought that you were suggesting that the syntax appear in the ultimate root of the query rather than just the subquery in which it should apply.

I'm intrigued by that "lookahead" syntax actually (i.e. filter a uid predicate based on the facets of a value predicate on the other side of the uid predicate). If maybe Alice owned multiple cars at the same time, and I wanted to see what cars she owned at a particular time that have registrations beginning with "AA", would it be possible to do something like this?:

{
  data(func: eq(name, "Alice")) {
     name
     mobile
     car @filter(regexp(registration, "/^AA.*/")) @facets(in "registration" gt(since, "2019-01-01")){
         registration
         model
     }
  }
}

@MichelDiz
Copy link
Contributor

MichelDiz commented Sep 24, 2019

@robsws About the last question, yeah, it definitely would be possible.

About having facets on individual values of a list. I'm not sure, at some time ago a user asked about this functionality, and the conclusion among the core team was that it would not be possible. I don't know about today, but this is another feature. It would not be part of this issue here.

@campoy
Copy link
Contributor

campoy commented Oct 23, 2019

Hey @MichelDiz,

I totally missed your comment above, sorry about that.

The result to the query you propose should be:

{
  "data": {
    "data": [
      {
        "name": "Alice",
        "mobile": "040123456",
        "car": [
          {
            "registration": "AA1111",
            "model": "MA0123",
            "registration|since": "2017-03-02T13:01:09Z"
          },
          {
            "registration": "AA222",
            "model": "XY0123",
            "registration|since": "2018-01-01T13:01:09Z"
          },
          {
            "model": "GU0123"
          }
        ]
      }
    ]
  }
}

If you wanted to also remove the nodes above you would use something else.
I might propose to use @cascade in this case, but in any case, that's an orthogonal concern to this issue.

At this point I'd say we should simply implement this fix as is and in the same way we filter for facets on predicates of type uid or [uid].

If I'm not mistaken @ashish-goswami is already working on this.

@MichelDiz
Copy link
Contributor

Okay, that behave makes sense. But thinking a little more. I think one thing still missing. Are we gonna skip only the value facet or the predicate/value and value facet?

The example I gave it does not return the registration predicate. So, it is skipping both.

the other alternative:

          {
            "registration": "AA1111",
            "model": "MA0123",
            "registration|since": "2017-03-02T13:01:09Z"
          },
          {
            "registration": "AA222",
            "model": "XY0123",
            "registration|since": "2018-01-01T13:01:09Z"
          },
          {
            "registration": "BB1111",
            "model": "GU0123"
          }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/facets Issues related to face handling, querying, etc. area/querylang/filter Related to the filter directive. area/querylang Issues related to the query language specification and implementation. kind/enhancement Something could be better. priority/P2 Somehow important but would not block a release. status/accepted We accept to investigate/work on it.
Development

Successfully merging a pull request may close this issue.

6 participants