Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested traversal (subquery) bug (2.1.7 -> 3.0 migration) #8683

Closed
azakkerman opened this issue Nov 29, 2018 · 4 comments
Closed

Nested traversal (subquery) bug (2.1.7 -> 3.0 migration) #8683

azakkerman opened this issue Nov 29, 2018 · 4 comments

Comments

@azakkerman
Copy link

OrientDB Version: 3.0.11

Java Version: 1.8

OS: Linux

Expected behavior

We expected outer traversal based on the subquery to return one additional node (this is indeed the behavior we observed on 2.1.7)

Actual behavior

Outer traversal does not find the extra node

Workaround

'Shield' the inner traversal with nested SELECT by id.

Minimal code to reproduce the bug

import com.orientechnologies.orient.core.sql.executor.OResult;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.tinkerpop.gremlin.orientdb.OrientGraph;
import org.apache.tinkerpop.gremlin.orientdb.OrientGraphFactory;
import org.apache.tinkerpop.gremlin.orientdb.executor.OGremlinResult;
import org.apache.tinkerpop.gremlin.orientdb.executor.OGremlinResultSet;
import org.junit.Assert;
import org.junit.Test;

/**
 * This class is used to illustrate traversal bug we discovered with OrientDB 2.x upgrade to 3.0.11
 * If and when the bug is fixed, this test will start failing, allowing us to revert the workaround
 */
public class OrientDBBugReportTest {

    private static List<String> runTraversalQuery(OrientGraph graph, String query, Object... args) {
        OGremlinResultSet result = graph.executeSql(query, args);
        System.out.println("Trying query: \n" + query);
        List<String> resultNodes =
                result.stream()
                        .map(OGremlinResult::getRawResult)
                        .map(OResult::toJSON)
                        .collect(Collectors.toList());
        System.out.println("Result: \n" + String.join("\n", resultNodes));
        return resultNodes;
    }

    @Test
    public void traversalBugIsNotFixed() {
        OrientGraphFactory factory = new OrientGraphFactory("memory:test");
        try (OrientGraph graph = factory.getNoTx()) {
            // Create a new vertex class, called Node
            graph.executeSql("CREATE CLASS Node EXTENDS V");
            // Node has two properties, a integer id and a string tag
            graph.executeSql("CREATE PROPERTY Node.id INTEGER");
            graph.executeSql("CREATE PROPERTY Node.tag STRING");
            // Create an index by id field
            graph.executeSql("CREATE INDEX node_id_index IF NOT EXISTS ON Node (id) UNIQUE");

            // Create a new edge class, called Link
            graph.executeSql("CREATE CLASS Link EXTENDS E");

            graph.executeSql(
                    "INSERT INTO Node (id, tag) VALUES (1, 'inner'), (2, 'inner'), (3, 'outer')");

            // Link Node 1 -> Node 2
            graph.executeSql(
                    "CREATE EDGE Link "
                            + "FROM (SELECT FROM Node.node_id_index WHERE id=1)"
                            + " TO (SELECT FROM Node.node_id_index WHERE id=2)");
            // Link Node 2 -> Node 3
            graph.executeSql(
                    "CREATE EDGE Link "
                            + "FROM (SELECT FROM Node.node_id_index WHERE id=2)"
                            + " TO (SELECT FROM Node.node_id_index WHERE id=3)");

            String innerQuery =
                    "TRAVERSE OUT('Link') FROM (SELECT FROM Node.node_id_index WHERE id = ?) WHILE tag = ?";

            // Traverse all Nodes with "inner" tag value starting with 1
            List<String> innerNodes = runTraversalQuery(graph, innerQuery, 1, "inner");
            Assert.assertEquals("We have exactly two inner nodes", 2, innerNodes.size());

            String outerQueryOriginal =
                    "TRAVERSE OUT('Link') FROM ("
                            + innerQuery
                            + ") WHILE $depth <= 1 STRATEGY BREADTH_FIRST";
            List<String> outerQueryOriginalNodes =
                    runTraversalQuery(graph, outerQueryOriginal, 1, "inner");
            Assert.assertEquals(
                    "We should see all 3 nodes here but due to OrientDB bug we see only 2",
                    2,
                    outerQueryOriginalNodes.size());

            String outerQueryWorkaround =
                    "TRAVERSE OUT('Link') FROM ("
                            + "SELECT FROM Node.node_id_index WHERE id IN ("
                            + "SELECT id FROM ("
                            + innerQuery
                            + "))"
                            + ") WHILE $depth <= 1 STRATEGY BREADTH_FIRST";
            List<String> outerQueryWorkaroundNodes =
                    runTraversalQuery(graph, outerQueryWorkaround, 1, "inner");
            Assert.assertEquals(
                    "We should see all 3 nodes here and we see all 3",
                    3,
                    outerQueryWorkaroundNodes.size());
        }
    }
}
@luigidellaquila
Copy link
Member

Hi @azakkerman

I'm not sure I got the point here (I didn't run the test yet, to be honest), you are executing this query

            String outerQueryOriginal =
                    "TRAVERSE OUT('Link') FROM ("
                            + innerQuery
                            + ") WHILE $depth <= 1 STRATEGY BREADTH_FIRST";

and you say that you expect three nodes in the result, but this is not the expected result: the third node has $depth = 2, so it will be discarded.

Am I missing something?

Thanks

Luigi

@azakkerman
Copy link
Author

azakkerman commented Nov 29, 2018

Hi, Luigi,

  • The inner query is supposed to return 2 nodes (node 1 and node 2), which it does (note, that the inner traversal query has no WHILE clause, the WHILE $depth <=1 clause is on the outer traversal.
  • The results of the inner query are given as FROM parameters to the outer query.
  • It is the outer traversal query that has the WHILE clause, so it should be able to find node 3, because it is 1 edge away from node 2.

And thank you for looking into it.

Anatoly.

@luigidellaquila
Copy link
Member

Hi @azakkerman

Ok, I got the point and I think I know where the problem is. Historically, the TRAVERSE statement does not re-traverse already traversed records, so the second time it reaches the node 2 it discards it, so it does not proceed to traverse node 3.

A possible solution is to replace the TRAVERSE with a MATCH with while condition (that does not discard duplicates), but you have to do some refactoring of the queries

Thanks

Luigi

@azakkerman
Copy link
Author

azakkerman commented Nov 30, 2018

Hi, @luigidellaquila,

These semantics don’t seem to be stable. It would imply that depending on the order in which the outer traversal runs on the FROM nodes, it will produce different results. That seems to violate basic principles of SQL. If outer traversal has a WHILE clause, it should not be pruning potential nodes the first time a node is visited and doesn't match the WHILE clause, since there are possible other paths to that node that do match the WHILE clause.

The workaround also seems to illustrate that if the nodes returned from the inner traversal are simply re-selected and fed into the outer traversal, then the outer traversal works as expected, not pruning the search prematurely?
Thank you,
Anatoly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants