Fulltext search #477

rnewman · 2017-06-12T21:33:05Z

This is currently a rebase and cleanup of #402. It now reflects changes I made while revving ground, so it builds and passes tests.

I did a fair bit of rebasing to take the opportunity to split up where_fn.rs; that refactoring commit contains no other changes, unless I screwed something up.

This isn't quite ready yet: I replaced a bunch of bail! with unimplemented!, because I want users to be able to unify inputs with fulltext-bound variables, and otherwise have fulltext clauses behave consistently with pattern clauses.

This commit lifts some logic out of the scalar ground handler to apply elsewhere. When a new value binding is encountered for a variable to which column bindings have already been established, we do two things:

- We apply a new constraint to the primary column. This ensures that the behavior for ground-first and ground-second is equivalent.
- We eliminate any existing column type extraction: it won't be necessary now that a constant value and constant type are known.
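A minimal sketch of the two steps described above, using simplified stand-in types. `Clauses`, `Column`, `Equals`, and `Value` are hypothetical names for illustration and are not the real `ConjoiningClauses` machinery; the sketch only assumes that the "primary column" is the first column bound to the variable.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; these are not Mentat's real types.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Long(i64),
    Double(f64),
}

#[derive(Clone, Debug, PartialEq)]
struct Column(String); // e.g. "datoms00.v"

#[derive(Debug, PartialEq)]
struct Equals(Column, Value); // a `column = constant` constraint

#[derive(Default)]
struct Clauses {
    column_bindings: HashMap<String, Vec<Column>>,
    extracted_types: HashMap<String, Column>,
    value_bindings: HashMap<String, Value>,
    wheres: Vec<Equals>,
}

impl Clauses {
    fn bind_value(&mut self, var: &str, value: Value) {
        // Step 1: if columns were already bound to this variable, constrain
        // the primary (assumed: first) column to the constant, so that
        // binding the value before or after the pattern gives the same result.
        if let Some(primary) = self.column_bindings.get(var).and_then(|cols| cols.first()) {
            self.wheres.push(Equals(primary.clone(), value.clone()));
        }

        // Step 2: the constant's type is known, so any pending run-time
        // type extraction for this variable is no longer needed.
        self.extracted_types.remove(var);

        self.value_bindings.insert(var.to_string(), value);
    }
}
```

With a column already bound to `?x`, calling `bind_value("?x", Value::Long(5))` leaves one `Equals` constraint on that column and no type extraction for `?x`.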


rnewman · 2017-06-14T01:06:33Z

I think this is ready for review, @ncalexan. I kept my work separate; see the commits after "implement fulltext".

Significant changes:

- I fixed constant floats.
- I allowed implicit placeholders in bindings.
- I reused a bunch of existing pattern code.
- I made bound places unify.

Note that in this version the score is 0 and a Long. The projector needs to be fixed.

ncalexan

Great work!

ncalexan · 2017-06-15T00:08:45Z

query-algebrizer/src/clauses/fulltext.rs

+        let a = a.ok_or(ErrorKind::InvalidArgument(where_fn.operator.clone(), "attribute".into(), 1))?;
+        let attribute = schema.attribute_for_entid(a).cloned().ok_or(ErrorKind::InvalidArgument(where_fn.operator.clone(), "attribute".into(), 1))?;
+
+        let fulltext_values = DatomsTable::FulltextValues;


nit: is there value to these copies? I doubt it.

ncalexan · 2017-06-15T00:13:39Z

query-algebrizer/src/clauses/fulltext.rs

+            }
+            self.constrain_var_to_type(var.clone(), ValueType::Double);
+
+            // Right now we don't support binding a column to a constant value.  Therefore, we


You have now done the work for ground to produce exactly the constant value, all the way out of the projector. Can you translate this to a (ground ?score 0.0) rather than injecting a computed table into the mix?

This is removed in a later commit: we no longer use a computed table for score.

ncalexan · 2017-06-15T00:25:40Z

query-algebrizer/src/clauses/mod.rs

@@ -338,7 +338,25 @@ impl ConjoiningClauses {
 impl ConjoiningClauses {
    /// Be careful with this. It'll overwrite existing bindings.
    pub fn bind_value(&mut self, var: &Variable, value: TypedValue) {
-        self.constrain_var_to_type(var.clone(), value.value_type());
+        let vt = value.value_type();


nit: vt doesn't have value here.

ncalexan · 2017-06-15T00:27:55Z

query-algebrizer/src/clauses/mod.rs

+
+        // Are we also trying to figure out the type of the value when the query runs?
+        // If so, constrain that!
+        if let Some(table) = self.extracted_types.get(&var)


Just a style point: I find the .map(... clone()) odd -- I'd generally say if let Some(ref table) ... table.clone().

ncalexan · 2017-06-15T00:33:12Z

sql/src/lib.rs

@@ -160,7 +160,14 @@ impl QueryBuilder for SQLiteQueryBuilder {
            &Ref(entid) => self.push_sql(entid.to_string().as_str()),
            &Boolean(v) => self.push_sql(if v { "1" } else { "0" }),
            &Long(v) => self.push_sql(v.to_string().as_str()),
-            &Double(OrderedFloat(v)) => self.push_sql(v.to_string().as_str()),
+            &Double(OrderedFloat(v)) => {
+                // Rust's floats print without a trailing '.' in some cases.


TIL. Is there a potential loss of resolution? That is, I'm wondering if a.bcdefEx and abcdef.0 might not represent different IEEE floats? I believe the resolution for very large and very small numbers is different, and "naive" parsing changes very large to very small (with high resolution) floats, but I have really not thought this through.

Would rust-lang/rust#30967 (comment) ({:.1}) address the loss of resolution?

I did some experimentation in a Rust playground for this.

Using {:.1} doesn't behave as you'd expect:

https://is.gd/8QaoD5

fn main() {
    let i: f64 = 123.0f64;          // Integer.
    let v: f64 = 9999.00123f64;
    let x: f64 = 9.99900123e3f64;
    let z: f64 = 1.000123f64;
    println!(":: {:e} vs {:.1} vs {}", i, i, i);
    println!(":: {:e} vs {:.1} vs {}", x, x, x);
    println!(":: {:e} vs {:.1} vs {}", v, v, v);
    println!(":: {:e} vs {:.1} vs {}", z, z, z);

    let s = "9.99900123e3";
    let parsed = s.parse::<f64>().unwrap();
    println!("Parsed to {:e}, {:.1}, {}", parsed, parsed, parsed);
    println!("Same: {}, {}", parsed == v, parsed == x);
}

=>

:: 1.23e2 vs 123.0 vs 123
:: 9.99900123e3 vs 9999.0 vs 9999.00123
:: 9.99900123e3 vs 9999.0 vs 9999.00123
:: 1.000123e0 vs 1.0 vs 1.000123
Parsed to 9.99900123e3, 9999.0, 9999.00123
Same: true, true

As you can see, the only solution that round-trips, never prints as an integer, and always preserves precision is {:e}.
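A small sketch of that conclusion as a helper. `float_literal` and `round_trips` are hypothetical names for illustration, not functions in the patch, and the sketch only considers finite values.

```rust
/// Format an f64 in exponent notation, so the text always contains an
/// exponent marker and never reads as a bare integer.
/// Hypothetical helper for illustration; not the code in this PR.
fn float_literal(v: f64) -> String {
    format!("{:e}", v)
}

/// Check that the formatted text parses back to exactly the same value.
/// Only meaningful for finite inputs (NaN never compares equal to itself).
fn round_trips(v: f64) -> bool {
    float_literal(v)
        .parse::<f64>()
        .map(|parsed| parsed == v)
        .unwrap_or(false)
}
```

For example, `float_literal(123.0)` yields `1.23e2`, whereas plain `{}` formatting of the same value yields `123`.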

ncalexan · 2017-06-15T00:37:37Z

query-algebrizer/src/clauses/fulltext.rs

-            self.bind_column_to_var(schema, alias.clone(), VariableColumn::Variable(var.clone()), var.clone());
-
-            self.from.push(SourceAlias(table, alias.clone()));
+            self.bind_value(var, TypedValue::Double(0.0.into()));


Ah, very nice. I never found the expression for this idea while working on these patches.

ncalexan · 2017-06-15T00:41:06Z

query-translator/tests/translate.rs

+                    :in ?entity
+                    :where [(fulltext $ :foo/bar "hello") [[?entity ?val _ _]]]]"#;
+    let SQLQuery { sql, args } = translate(&schema, query);
+    assert_eq!(sql, "SELECT DISTINCT `fulltext_values00`.text AS `?val` \


Is this test correct? This doesn't seem to be any different from the query without :in ?entity. Can you spell out what's changed here, or compare the two (with and without the :in) in the test?

The later version of the test expands this into three or four tests: q_once is the bit that validates whether you supplied enough inputs.

ncalexan · 2017-06-15T00:43:19Z

query-algebrizer/tests/fulltext.rs

+fn test_apply_fulltext() {
+    let schema = prepopulated_schema();
+
+    // If you use a non-FTS attribute, we will short-circuit.


Not a coding error, like you had for a missing attribute? I think I prefer to favor the "static consumer" with great error messages.

I'm trying to draw a line between detecting coding errors and allowing queries to adapt to different schema. Most queries should not produce an error when run: they're valid Datalog, they just don't match the data in the store.

I imagine a future in which we provide a logging or reporting channel that includes things like "this query is known to return no results because attribute A isn't present in the schema".
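One way to picture that line, as a sketch only: a missing argument is treated as a coding error, while a well-formed query that can't match the schema is marked as known-empty with a reason that a reporting channel could surface later. Every name here (`Outcome`, `InvalidArgument`, `check_fulltext_attribute`) is hypothetical and not taken from the patch.

```rust
// Hypothetical types for illustration; not Mentat's actual API.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The attribute is usable; carry on building the query.
    Proceed(String),
    /// Valid Datalog that can't match this store; keep the reason for reporting.
    KnownEmpty(String),
}

#[derive(Debug, PartialEq)]
struct InvalidArgument(&'static str);

fn check_fulltext_attribute(
    attribute: Option<&str>,
    fulltext_attributes: &[&str],
) -> Result<Outcome, InvalidArgument> {
    // No attribute argument at all: the caller wrote a malformed clause.
    let a = attribute.ok_or(InvalidArgument("attribute"))?;

    if fulltext_attributes.iter().any(|f| *f == a) {
        Ok(Outcome::Proceed(a.to_string()))
    } else {
        // Not an error: the query simply returns no results on this schema.
        Ok(Outcome::KnownEmpty(format!("{} is not a fulltext attribute", a)))
    }
}
```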

ncalexan · 2017-06-15T00:46:17Z

tests/query.rs

+        [:db/add "v" :foo/fts "I've come to talk with you again"]
+    ]"#).unwrap().tempids.get("v").cloned().expect("v was mapped");
+
+    let r = conn.q_once(&mut c,


A helper that turns QueryResults into EDN wouldn't go amiss here...

ncalexan · 2017-06-15T00:47:45Z

tests/query.rs

+        _ => panic!("Expected query to fail."),
+    }
+
+    // If it's bound, and the right type, it'll work!


* You can't use fulltext search on a non-fulltext attribute.
* Allow for implicit placeholder bindings in fulltext.

rnewman · 2017-06-15T17:34:45Z

dd39f6d

* You can't use fulltext search on a non-fulltext attribute.
* Allow for implicit placeholder bindings in fulltext.

Richard Newman added 3 commits June 12, 2017 14:25

Pre: implement IntoIterator for ValueTypeSet.

85724f2

Add a test that late inputs aren't allowed in ground.

02f2954

Refactor arg conversion and ground into separate files.

cbf9176

rnewman self-assigned this Jun 12, 2017

rnewman added the in progress label Jun 12, 2017

Richard Newman added 3 commits June 12, 2017 17:47

Implement MATCHES throughout SQL machinery.

20ff0fc

Implement fulltext.

887060d

rnewman force-pushed the rnewman/fts-mentat branch 2 times, most recently from 614bb83 to 637a426 Compare June 14, 2017 00:41

Pre: ensure that constant floats end up as floats in SQL, never integers.

5c687c8

rnewman force-pushed the rnewman/fts-mentat branch from 637a426 to 21cd3fb Compare June 14, 2017 01:04

rnewman changed the title from "WIP: fulltext" to "Fulltext search" Jun 14, 2017

rnewman requested a review from ncalexan June 14, 2017 01:06

rnewman mentioned this pull request Jun 14, 2017

[query] Support variable fulltext searches #479

Closed

Richard Newman added 7 commits June 14, 2017 15:51

Pre: move Either to mentat_core::util.

b639810

Work on fulltext.

8c2bc27

You can't use fulltext search on a non-fulltext attribute.

814141f

Allow for implicit placeholder bindings in fulltext.

4cf8f0f

Add fulltext algebrizing tests.

2f37bb4

Add an end-to-end test for fulltext.

6458ba1

Note that in this version the score is 0 and a Long. The projector needs to be fixed.

Support variable fulltext searches. (#479)

1fa5a26

rnewman force-pushed the rnewman/fts-mentat branch from 21cd3fb to 1fa5a26 Compare June 14, 2017 22:51

ncalexan approved these changes Jun 15, 2017

rnewman added a commit that referenced this pull request Jun 15, 2017

Implement fulltext. (#477) r=nalexander

3f264e9

* You can't use fulltext search on a non-fulltext attribute.
* Allow for implicit placeholder bindings in fulltext.

rnewman added a commit that referenced this pull request Jun 15, 2017

Implement fulltext. (#477) r=nalexander

dd39f6d

rnewman closed this Jun 15, 2017

rnewman deleted the rnewman/fts-mentat branch June 15, 2017 17:34

This was referenced Jun 15, 2017

Implement querying fulltext values. #402

Closed

[query] Algebrize fulltext function #307

Closed

RDR8 pushed a commit to RDR8/mentat that referenced this pull request Mar 12, 2018

Implement fulltext. (mozilla#477) r=nalexander

d304263

* You can't use fulltext search on a non-fulltext attribute.
* Allow for implicit placeholder bindings in fulltext.
