Skip to content
This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

Fulltext search #477

Closed
wants to merge 14 commits into from
Closed

Fulltext search #477

wants to merge 14 commits into from

Conversation

rnewman
Copy link
Collaborator

@rnewman rnewman commented Jun 12, 2017

This is currently a rebase and cleanup of #402. It now reflects changes I made while revving ground, so it builds and passes tests.

I did a fair bit of rebasing to take the opportunity to split up where_fn.rs; that refactoring commit contains no other changes, unless I screwed something up.

This isn't quite ready yet: I replaced a bunch of bail! with unimplemented!, because I want users to be able to unify inputs with fulltext-bound variables, and otherwise have fulltext clauses behave consistently with pattern clauses.

@rnewman rnewman self-assigned this Jun 12, 2017
Richard Newman added 3 commits June 12, 2017 17:47
This commit lifts some logic out of the scalar ground handler to apply
elsewhere.

When a new value binding is encountered for a variable to which column
bindings have already been established, we do two things:

- We apply a new constraint to the primary column. This ensures that the
  behavior for ground-first and ground-second is equivalent.
- We eliminate any existing column type extraction: it won't be
  necessary now that a constant value and constant type are known.
@rnewman rnewman force-pushed the rnewman/fts-mentat branch 2 times, most recently from 614bb83 to 637a426 Compare June 14, 2017 00:41
@rnewman rnewman changed the title WIP: fulltext Fulltext search Jun 14, 2017
@rnewman
Copy link
Collaborator Author

rnewman commented Jun 14, 2017

I think this is ready for review, @ncalexan. I kept my work separate; see the commits after "implement fulltext".

Significant changes:

  • I fixed constant floats.
  • I allowed implicit placeholders in bindings.
  • I reused a bunch of existing pattern code.
  • I made bound places unify.

Copy link
Member

@ncalexan ncalexan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

let a = a.ok_or(ErrorKind::InvalidArgument(where_fn.operator.clone(), "attribute".into(), 1))?;
let attribute = schema.attribute_for_entid(a).cloned().ok_or(ErrorKind::InvalidArgument(where_fn.operator.clone(), "attribute".into(), 1))?;

let fulltext_values = DatomsTable::FulltextValues;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is there value to these copies? I doubt it.

}
self.constrain_var_to_type(var.clone(), ValueType::Double);

// Right now we don't support binding a column to a constant value. Therefore, we
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have now done the work for ground to produce exactly the constant value, all the way out of the projector. Can you translate this to a (ground ?score 0.0) rather than injecting a computed table into the mix?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is removed in a later commit: we no longer use a computed table for score.

@@ -338,7 +338,25 @@ impl ConjoiningClauses {
impl ConjoiningClauses {
/// Be careful with this. It'll overwrite existing bindings.
pub fn bind_value(&mut self, var: &Variable, value: TypedValue) {
self.constrain_var_to_type(var.clone(), value.value_type());
let vt = value.value_type();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: vt doesn't have value here.


// Are we also trying to figure out the type of the value when the query runs?
// If so, constrain that!
if let Some(table) = self.extracted_types.get(&var)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a style point: I find the .map(... clone()) odd -- I'd generally say if let Some(ref table) ... table.clone().

@@ -160,7 +160,14 @@ impl QueryBuilder for SQLiteQueryBuilder {
&Ref(entid) => self.push_sql(entid.to_string().as_str()),
&Boolean(v) => self.push_sql(if v { "1" } else { "0" }),
&Long(v) => self.push_sql(v.to_string().as_str()),
&Double(OrderedFloat(v)) => self.push_sql(v.to_string().as_str()),
&Double(OrderedFloat(v)) => {
// Rust's floats print without a trailing '.' in some cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL. Is there a potential loss of resolution? That is, I'm wondering if a.bcdefEx and abcdef.0 might not represent different IEEE floats? I believe the resolution for very large and very small numbers is different, and "naive" parsing changes very large to very small (with high resolution) floats, but I have really not thought this through.

Would rust-lang/rust#30967 (comment) ({:.1}) address the loss of resolution?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some experimentation in a Rust playground for this.

Using {:.1} doesn't behave as you'd expect:

https://is.gd/8QaoD5

fn main() {
    let i: f64 = 123.0f64;                     // Integer.
    let v: f64 = 9999.00123f64;
    let x: f64 = 9.99900123e3f64;
    let z: f64 = 1.000123f64;
    println!(":: {:e} vs {:.1} vs {}", i, i, i);
    println!(":: {:e} vs {:.1} vs {}", x, x, x);
    println!(":: {:e} vs {:.1} vs {}", v, v, v);
    println!(":: {:e} vs {:.1} vs {}", z, z, z);
        
    let s = "9.99900123e3";
    let parsed = s.parse::<f64>().unwrap();
    println!("Parsed to {:e}, {:.1}, {}", parsed, parsed, parsed);
    println!("Same: {}, {}", parsed == v, parsed == x);
}

=>

:: 1.23e2 vs 123.0 vs 123
:: 9.99900123e3 vs 9999.0 vs 9999.00123
:: 9.99900123e3 vs 9999.0 vs 9999.00123
:: 1.000123e0 vs 1.0 vs 1.000123
Parsed to 9.99900123e3, 9999.0, 9999.00123
Same: true, true

As you can see, the only solution that round-trips, never prints as an integer, and always preserves precision is {:e}.

self.bind_column_to_var(schema, alias.clone(), VariableColumn::Variable(var.clone()), var.clone());

self.from.push(SourceAlias(table, alias.clone()));
self.bind_value(var, TypedValue::Double(0.0.into()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, very nice. I never found the expression for this idea while working on these patches.

:in ?entity
:where [(fulltext $ :foo/bar "hello") [[?entity ?val _ _]]]]"#;
let SQLQuery { sql, args } = translate(&schema, query);
assert_eq!(sql, "SELECT DISTINCT `fulltext_values00`.text AS `?val` \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test correct? This doesn't seem to be any different from the query without :in ?entity. Can you spell out what's changed here, or compare the two (with and without the :in) in the test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The later version of the test expands this into three or four tests: q_once is the bit that validates whether you supplied enough inputs.

fn test_apply_fulltext() {
let schema = prepopulated_schema();

// If you use a non-FTS attribute, we will short-circuit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a coding error, like you had for a missing attribute? I think I prefer to favor the static consume" with great error messages.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to draw a line between detecting coding errors and allowing queries to adapt to different schema. Most queries should not produce an error when run: they're valid Datalog, they just don't match the data in the store.

I imagine a future in which we provide a logging or reporting channel that includes things like "this query is known to return no results because attribute A isn't present in the schema".

[:db/add "v" :foo/fts "I've come to talk with you again"]
]"#).unwrap().tempids.get("v").cloned().expect("v was mapped");

let r = conn.q_once(&mut c,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A helper that turns QueryResults into EDN wouldn't go amiss here...

_ => panic!("Expected query to fail."),
}

// If it's bound, and the right type, it'll work!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nifty!

rnewman added a commit that referenced this pull request Jun 15, 2017
* You can't use fulltext search on a non-fulltext attribute.
* Allow for implicit placeholder bindings in fulltext.
rnewman added a commit that referenced this pull request Jun 15, 2017
@rnewman
Copy link
Collaborator Author

rnewman commented Jun 15, 2017

dd39f6d

@rnewman rnewman closed this Jun 15, 2017
@rnewman rnewman deleted the rnewman/fts-mentat branch June 15, 2017 17:34
RDR8 pushed a commit to RDR8/mentat that referenced this pull request Mar 12, 2018
* You can't use fulltext search on a non-fulltext attribute.
* Allow for implicit placeholder bindings in fulltext.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants