Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow filtering on non-indexed predicates #4305

Closed
mboxall opened this issue Nov 21, 2019 · 8 comments · Fixed by #4531
Closed

Allow filtering on non-indexed predicates #4305

mboxall opened this issue Nov 21, 2019 · 8 comments · Fixed by #4531
Assignees
Labels
kind/enhancement Something could be better. status/accepted We accept to investigate/work on it.
Milestone

Comments

@mboxall
Copy link

mboxall commented Nov 21, 2019

Currently functions can only be applied to predicates that have been indexed.

I'm requesting the ability filter on non-indexed predicates too.

In some cases it could be more efficient to iterate over an intermediate result set and apply the filter function than it would be to use an index.

For example, this query returns a count of 252 countries with 3.6 ms of processing:

{
    countries(func: type(Country)) {
        count(uid)
    }
}

Adding a filter on sys_start_time (indexed as hour) returns the same result but requires 358 ms of processing:

{
    countries(func: type(Country)) @filter(lt(sys_start_time, "2019-11-21")) {
        count(uid)
    }
}	

Our database has 145 million nodes, and the distribution of sys_start_time values is uneven, with potentially millions of nodes sharing the same value.

For this example it would be advantageous not to index sys_start_time and to simply scan through a short list of 252 countries, eliminating any results that fail the filter function.

@MichelDiz
Copy link
Contributor

Actually you can do it https://docs.dgraph.io/tips/#search-on-non-indexed-predicates

e.g:

{
  var(func: type(Country)) {
    p as sys_start_time
  }
  query(func: lt(val(p), "2019-11-21")) {
      count(uid)
  }
}

@manishrjain manishrjain self-assigned this Nov 22, 2019
@robsws
Copy link

robsws commented Nov 22, 2019

Thanks @MichelDiz for highlighting that. Would this approach still work if we were to have several expanded edges in our query as well? Something along the lines of this:

{
    countries(func: type(Country)) @filter(lt(sys_start_time, "2019-11-21")) {
        hasCity @filter(lt(sys_start_time, "2019-11-21")) {
            ~livesIn @filter(lt(sys_start_time, "2019-11-21")) {
                Name
            }
        }
    }
}

I'm concerned even if that is possible that it might be a bit unwieldy.

@manishrjain
Copy link
Contributor

I think it should still work. @MichelDiz can confirm.

@manishrjain manishrjain added the kind/question Something requiring a response. label Dec 2, 2019
@manishrjain manishrjain assigned MichelDiz and unassigned manishrjain Dec 2, 2019
@MichelDiz
Copy link
Contributor

MichelDiz commented Dec 3, 2019

yeah, still works. I did this small test of concept

{
   A as var (func: type(Country)) {
# OR #  var(func: has(hasCity)) @filter(NOT has(~hasCity)) {
      p as sys_start_time
      hasCity {
          p2 as sys_start_time
          ~livesIn {
              p3 as sys_start_time
          }
      }
 }
    countries(func: uid(A)) @filter(lt(val(p), "2019-11-21")) {
      uid
      sys_start_time
      hasCity @filter(lt(val(p2), "2019-11-21")) {
          sys_start_time
          uid
          Name
          ~livesIn @filter(lt(val(p3), "2019-11-21")) {
                    uid
                    sys_start_time
                    Name
             }
        }
    }
}
{
  "data": {
    "countries": [
      {
        "uid": "0x111b5",
        "sys_start_time": "2019-10-21",
        "hasCity": [
          {
            "sys_start_time": "2019-10-21",
            "uid": "0x111b3",
            "Name": "Some City",
            "~livesIn": [
              {
                "uid": "0x111b4",
                "sys_start_time": "2019-10-21",
                "Name": "Red Light"
              }
            ]
          }
        ]
      }
    ]
  }
}

Mutation

{
	"set": [{
			"dgraph.type": "Country",
			"Name": "USA",
			"sys_start_time": "2019-10-21",
			"hasCity": [{
				"uid": "_:SomeCity",
				"Name": "Some City",
				"sys_start_time": "2019-10-21"
			}]
		},
		{
			"dgraph.type": "District",
			"Name": "Red Light",
			"sys_start_time": "2019-10-21",
			"livesIn": {
				"uid": "_:SomeCity"
			}
		}
	]
}

@manishrjain
Copy link
Contributor

If @MichelDiz 's solution works, can we close this issue?

@mboxall
Copy link
Author

mboxall commented Jan 3, 2020

@MichelDiz 's work-around does work, but we would prefer a more general solution. Our application re-writes the user query to automatically add point-in-time @filter clauses to all nodes involved in the query. Automatically applying the proposed work-around to any possible user query presents a challenge and would add more complexity to our application than ideal.

@manishrjain
Copy link
Contributor

While these functions (lt, gt, etc.) won't work at root without an index (because that entails scanning through the entire predicate), we can make them work for filters. No promises, it would entirely depend upon how much complexity we add to the database. But, we'll definitely give it a shot.

@sleto-it
Copy link
Contributor

Hi all,
This ticket was fixed by PR #4531, which has now been merged

The fix will be delivered in v1.2

Please comment if more help is needed, and we will reopen

Many thanks,

@sleto-it sleto-it added this to the Dgraph v1.2 milestone Jan 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement Something could be better. status/accepted We accept to investigate/work on it.
Development

Successfully merging a pull request may close this issue.

6 participants