Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SQL/Match Question] Get node count between + edge data in a single query - "optimize" ? #8915

Closed
matanshukry opened this issue Jun 19, 2019 · 21 comments

Comments

@matanshukry
Copy link
Contributor

OrientDB Version: 3.0.20

Java Version: 1.8.0_191

OS: Windows 10

Assuming I have this model:

Person (userId String)
Friend (from person to person) (status: 1=requested,2=approved)

I need to create a query that will find all Person that will match 'John' (Lucene full text), and will return (a) the friend data between them and person 'Bar', and (b) The amount of friends that person and 'Bar' has in common.

The current query I have is:

SELECT (MATCH {class:Person, where:(SEARCH_CLASS("John") = true)}.bothE("Friend"){as:friend, where:(status=2)}.bothV("Person"){where:(userId="bulu")} RETURN friend:{*}) as friend, (MATCH {class:Person, where:(SEARCH_CLASS("John") = true)}.bothE("Friend"){where:(status=2),as:fs1}.bothV("Person"){as:friend, where:($matched != $currentMatch)}.bothE("Friend"){where:(status=2 AND $matched.fs1 != $currentMatch)}.bothV("Person"){where:(userId="bar" AND $matched.friend != $currentMatch)} RETURN COUNT(friend) as cnt) as count

Which technically is working, but:
(1) It looks like a really bad query, performance-wise. I have to repeat few things in each sub-query (such as SEARCH_CLASS). Anyway to avoid that?
(2) The result contains 2 columns, 'friend' and 'count', but in each of them I have an array, and in the count I even have a nested json with 'cnt'. Is there a way to avoid all that, so that in the first column/cell I will simply have the json data of 'friend', and in the second I will simply have an integer - the count?

@matanshukry matanshukry changed the title [Match Question] Get node count between + edge data in a single query - "optimize" ? [SQL/Match Question] Get node count between + edge data in a single query - "optimize" ? Jun 19, 2019
@luigidellaquila
Copy link
Member

Hi @matanshukry

You can try the following:

SELECT friend:{*}, nFriends FROM (
  MATCH 
   {class:Person, where:(SEARCH_CLASS("John") = true), as:p1}
   .bothE("Friend"){as:friend, where:(status=2)}
   .bothV("Person"){where:(userId="bulu")} 
   .bothE(){as:otherFriend}.bothV(Friend){where:($matched.p1 = $currentMatch)}
  RETURN friend, count(otherFriend) as nFriends
  GROUP BY friend
)

I didn't test it, so there is a chance that it has typos, but I hope you got the idea

Thanks

Luigi

@luigidellaquila
Copy link
Member

sorry, here's the right query

SELECT friend:{*}, nFriends FROM (
  MATCH 
   {class:Person, where:(SEARCH_CLASS("John") = true), as:p1}
   .bothE("Friend"){as:friend, where:(status=2)}
   .bothV("Person"){where:(userId="bulu")} 
   .both("Friend"){as:otherFriend}.both("Friend"){where:($matched.p1 = $currentMatch)}
  RETURN friend, count(otherFriend) as nFriends
  GROUP BY friend
)

Thanks

Luigi

@matanshukry
Copy link
Contributor Author

@luigidellaquila Awesome! I really liked the way you did it, great!

One question though: why do we need the SELECT and the GROUP BY? the match it self seems to return the same data (with very little edit of a field in the RETURN)

MATCH 
	{class:Person, where:(SEARCH_CLASS("John") = true), as:p1}
	.bothE("Friend"){as:friend, where:(status=2)}
	.bothV("Person"){where:(userId="bulu")} 
	.both("Friend"){as:otherFriend}.both("Friend"){where:($matched.p1 = $currentMatch)}
RETURN friend:{*}, count(otherFriend) as nFriends

@luigidellaquila
Copy link
Member

Yes, it should work (just add a GROUP BY).
My only doubt is if the grouping on the expanded "friend" will execute correctly (I have to double-check it)

Thanks

LUigi

@matanshukry
Copy link
Contributor Author

matanshukry commented Jun 20, 2019

@luigidellaquila - that's the thing I don't understand - why do we need to add a GROUP BY? the match I wrote is meant to be used alone, without a select at all. There should only be one friend anyway, due to how the match was written.. no?

Also, I noticed after we find the target person, {where:(userId="bulu")}, we keep looking for Friend->person (as otherFriend), and again (to match current match). Shouldn't both these connections have status=2 on them too though? hence each .both("Friend") should actually be .bothE("Friend"){where:(status=2)}.bothV("User") ?

@luigidellaquila
Copy link
Member

The GROUP BY is needed just because you have an aggregation function (count()).

About the status=2, yes, if that's a requirement, then your solution is correct

Thanks

Luigi

@matanshukry
Copy link
Contributor Author

@luigidellaquila How come it works without the GROUP BY then? I ran the Match only and it returned the correct result, without the select/group by..

Regarding status=2: the value 2 is coming as a parameter actually. Is there anyway to avoid using it 3 times? (in each where)

@luigidellaquila
Copy link
Member

Hi @matanshukry

The execution planner is smart enough to add the GROUP BY automatically in some cases, but I always suggest to set it explicitly to make the query more clear.

You can pass named parameters to the query as :paramName (eg. via Java API) as follows:

String query = "";
query += "MATCH ";
query += "	{class:Person, where:(SEARCH_CLASS("John") = true), as:p1}";
query += "	.bothE("Friend"){as:friend, where:(status = :status)}";
query += "	.bothV("Person"){where:(userId="bulu")} ";
query += "	.bothE("Friend"){where:(status = :status)}.bothV(){as:otherFriend}";
query += "	.bothE("Friend"){where:(status = :status)}.bothV(){where:($matched.p1 = $currentMatch)}";
query += "RETURN friend:{*}, count(otherFriend) as nFriends";

Map<String, Object> params = new HashMap<>();
params.put("status", 2);

OResultSet resultSet = db.query(query, params);

Thanks

Luigi

@matanshukry
Copy link
Contributor Author

@luigidellaquila Awesome, that solves all my problems and questions and then some! Thanks :)

@matanshukry
Copy link
Contributor Author

@luigidellaquila Spoke a bit too soon;
When there are no mutual friends (nFriends == 0), I'm not getting any results. Even in such a case, I still need to get the 'friend'. How do I do that?

@matanshukry matanshukry reopened this Jun 20, 2019
@luigidellaquila
Copy link
Member

Hi @matanshukry

You can add an optional:true to the items you expect do not exist:

query += "	.bothE("Friend"){where:(status = :status), optional:true}.bothV(){as:otherFriend, optional:true}";
query += "	.bothE("Friend"){where:(status = :status), optional:true}.bothV(){where:($matched.p1 = $currentMatch)}";
q

Thanks

Luigi

@luigidellaquila
Copy link
Member

luigidellaquila commented Jun 20, 2019

well, actually you can write it as follows, that is much better:


MATCH 
	{class:Person, where:(SEARCH_CLASS("John") = true), as:p1}
	.bothE("Friend"){as:friend, where:(status = :status)}
	.bothV("Person"){where:(userId="bulu")} 
	.bothE("Friend"){where:(status = :status), optional:true}.bothV(){as:otherFriend, optional:true}
	.bothE("Friend"){where:(status = :status), optional:true}.bothV(){as:p1}
RETURN friend:{*}, count(otherFriend) as nFriends

(see the {as:p1} instead of {where:($matched.p1 = $currentMatch)})

Thanks

Luigi

@matanshukry
Copy link
Contributor Author

@luigidellaquila
Optional works, great!

The {as:p1} instead of where doesn't though; I'm getting an exception with no explicit indication on where:

Error: com.orientechnologies.orient.core.exception.OStorageException: Error on executing command: match...
	DB name="mydb"

Error: java.lang.UnsupportedOperationException

p.s.
"match..." included entire query I made, but I removed it. There were no indications on it or anything though.

Either way it works, so I'm closing it.
If {as: <used_variable>} is supposed to work as a filter and you would like me to open an issue on it - let me know.

@matanshukry
Copy link
Contributor Author

matanshukry commented Jun 21, 2019

@luigidellaquila I'm now looking for something a bit different, but still the same subject. I'll just reopen this then. Let me know if you prefer and I'll open a new issue.

This is my current query:

MATCH {class:Person, where:(userId="matan"), as:person}.bothE("Friend"){as:friend, where:(status=2)}.bothV("Person"){where:(userId="bulu"),as:p1}.bothE("Friend"){where:(status=2), optional: true}.bothV("Person"){as:mutualFriend, optional: true, where:($matched.person != $currentMatch AND $matched.p1 != $currentMatch)}.bothE("Friend"){where:(status=2), optional: true, as:f2}.bothV("Person"){where:($matched.person = $currentMatch), optional: true, as: backTarget} RETURN person:{*}, friend:{*}, COUNT(backTarget) as friendsCount GROUP BY person

The general query requirements are:

  1. Find person A (parameter, in here "matan")
  2. Find friend connection between A and B (requester, parameter, in here "bulu")
  3. Find their mutual friends (list of their ids) - Maximum 20, since there could be a lot.

But/points from previous query:

  1. I had to add optional in .bothV() too. At the beginning it was simply returning empty records, as if all the chain needed to be optional. Now it throws an exception (UnsupportedOperationException like above).
  2. I'm counting backTarget rather than mutualFriend since the latter is optional, so I'm getting multiple records with the same mutualFriend, but the records I want are only the ones "reaching to the end" (where backTarget is not empty)
  3. Currently, it will only return the mutual friends (3) if there is any connection between A and B (2). It should return the mutual friends either way though; 2 and 3 should be somewhat independent.

Method 2

I thought about doing this using 2 sub-queries, but not sure how to 'expand' them outside (expand doesn't work):

SELECT (MATCH {class:Person, where:(userId="matan"), as:person}
.bothE("Friend"){as:friend}
.bothV("Person"){where:(userId="bulu")}
RETURN person:{*},friend:{*})
,(MATCH {class:Person, where:(userId="matan"), as:person}
.bothE("Friend"){as:f1,where:(status=2)}
.bothV("Person"){as:mutualFriend,where:($currentMatch != $matched.person)}
.bothE("Friend"){as:f2,where:(status=2)}
.bothV("Person"){as:backTarget,where:(userId="bulu" AND $currentMatch != $matched.mutualFriend)}
RETURN mutualFriend.userId)

The query will return something like this:

+----+----------+--------------------------------------------+
|#   |($$$SUBQUE|($$$SUBQUERY$$_0)                                                                                                             |
+----+----------+--------------------------------------------+
|0   |[{mutua...|[{person:{userId:matan,in_Friendship:[1],...|
+----+----------+--------------------------------------------+

@matanshukry matanshukry reopened this Jun 21, 2019
@matanshukry
Copy link
Contributor Author

@luigidellaquila / others - Athoughts? is that simply not implemented yet?

I'm willing to help if needed/possible..

@luigidellaquila
Copy link
Member

Hi @matanshukry

I'd like to do a couple of tests with the queries, any chance to have a sample DB?

Thanks

Luigi

@matanshukry
Copy link
Contributor Author

@luigidellaquila Actually no; I just don't have one. I'm creating a few records at a time and playing with them. I made this up for you though, hopefully it will help:

create class Person extends V
create property Person.name STRING

create class Friend extends E
create property Friend.createdAt long
create property Friend.status integer

CREATE VERTEX Person SET name="matan"
CREATE VERTEX Person SET name="koko"
CREATE VERTEX Person SET name="bulu"
CREATE VERTEX Person SET name="xoxo"
CREATE VERTEX Person SET name="nala"
CREATE VERTEX Person SET name="pumba"

CREATE EDGE Friend FROM (SELECT FROM Person WHERE name="matan") TO (SELECT FROM Person WHERE name="koko") SET createdAt=3184,status=1
CREATE EDGE Friend FROM (SELECT FROM Person WHERE name="matan") TO (SELECT FROM Person WHERE name="bulu") SET createdAt=3184,status=2
CREATE EDGE Friend FROM (SELECT FROM Person WHERE name="matan") TO (SELECT FROM Person WHERE name="xoxo") SET createdAt=3184,status=2
CREATE EDGE Friend FROM (SELECT FROM Person WHERE name="bulu") TO (SELECT FROM Person WHERE name="koko") SET createdAt=3184,status=2
CREATE EDGE Friend FROM (SELECT FROM Person WHERE name="nala") TO (SELECT FROM Person WHERE name="koko") SET createdAt=3184,status=2
CREATE EDGE Friend FROM (SELECT FROM Person WHERE name="pumba") TO (SELECT FROM Person WHERE name="xoxo") SET createdAt=3184,status=2
CREATE EDGE Friend FROM (SELECT FROM Person WHERE name="nala") TO (SELECT FROM Person WHERE name="xoxo") SET createdAt=3184,status=2
CREATE EDGE Friend FROM (SELECT FROM Person WHERE name="nala") TO (SELECT FROM Person WHERE name="pumba") SET createdAt=3184,status=1
CREATE EDGE Friend FROM (SELECT FROM Person WHERE name="pumba") TO (SELECT FROM Person WHERE name="koko") SET createdAt=3184,status=1

I'm really waiting for this issue,, so please keep me informed, either here or though email.

@luigidellaquila
Copy link
Member

Hi @matanshukry

I think I found the reason for the UnsupportedOperationException, I fixed it and I'm running the tests.

With the fix, the following works fine:

MATCH
   	{class:Person, as:a, where:(name = 'matan')}.bothE(){where:(status = 2)}.bothV(){as:b, where:(name = 'bulu')},
	{as:a}.bothE(){where:(status = 2)}.bothV(){as:f, where:($matched.a != $currentMatch AND $matched.b != $currentMatch)}.bothE(){where:(status = 2)}.bothV(){as:b}
RETURN  a:{*} as a, b:{*} as b, list(f) as f group by a, b

But it only returns results that have mutual friends. Unfortunately, you cannot use optional in this case, otherwise you will obtain results that are friends of one party, but not of the other (optional links in fact), but you can work around this with the following query:

select a, b, list(f) FROM (
	SELECT expand(unionAll($a, $b))
	LET 
	$a = (MATCH
   		{class:Person, as:a, where:(name = 'matan')}.bothE(){where:(status = 2)}.bothV(){as:b, where:(name = 'bulu')},
		{as:a}.bothE(){where:(status = 2)}.bothV(){as:f, where:($matched.a != $currentMatch AND $matched.b != $currentMatch)}.bothE(){where:(status = 2)}.bothV(){as:b}
		RETURN  a:{*} as a, b:{*} as b, list(f) as f group by a, b),
	$b = (MATCH
	   	{class:Person, as:a, where:(name = 'matan')}.bothE(){where:(status = 2)}.bothV(){as:b, where:(name = 'bulu')}
		RETURN  a:{*} as a, b:{*} as b, [] as f)
) GROUP BY a, b

I'll keep you updated as soon as the fix is on the snapshot

Thanks

Luigi

@luigidellaquila
Copy link
Member

Ok, I pushed the fix, please let me know if you need further help

Thanks

Luigi

@matanshukry
Copy link
Contributor Author

@luigidellaquila Awesome, that does work as I've wanted!

I have another question regarding the issue, also regarding performance:
I want to query both a friend list (maximum 20) and the total number of all the friends:

  1. Is it possible? I can put LIMIT on the match in $a, but then I won't have the count. If I don't put the LIMIT in the $a= clause, can I put it after somehow?
  2. How is the performance for counting such edges? On a regular field we can simply put an index and the count should be a really fast query, but we're talking about connections here. I'm guessing the count for such a use case would be an iteration over all of the elements of the match; is that correct? Is there anyway to increase the performance for such a query (e.g. an index) ?

@matanshukry
Copy link
Contributor Author

@luigidellaquila Also, I'm having some troubles tweaking the query a little;
Specifically, in the $a you wrote, it is required to have a connection between 'matan' and 'bulu'; that is not true however. I changed it so that:

  1. $a will now simply calculate mutual friends, and doesn't even have the friendship. So it will have the friend entity and the mutualFriends list, but no friendship.
  2. $b Calculate the friend and the friendship
  3. I added an extra $c to return the friend, since if there are no mutual friends nor there is any friend connection, I still need the friend entity.
    The problem now lies in merging them all together; I'm ending up with multiple lines.

$a: 'friend' and 'mutualFriends' (if mutualFriends = 0, friend is null)
$b: 'friend' and 'friendship' (if friendship doesn't exist, friend is null)
$c: 'friend'

How do I merge them though, so I'll get it all in a single line?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants