Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support lenient parser #114

Merged
merged 2 commits into from
Oct 3, 2023
Merged

Conversation

GodTamIt
Copy link
Contributor

Note: this is a strawman for supporting lenient parser in the bindings. Of course, this can't be landed until there is a new release of the tantivy crate with the lenient parser included.

Closes #109


// TODO(https://github.com/PyO3/pyo3/issues/1190): Expose this to bindings once trait <-> ABC is
// supported in PyO3.
pub(crate) trait QueryParserError {
Copy link
Contributor Author

@GodTamIt GodTamIt Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a base class because it seems base classes make it tough to do things like IntoPy and there isn't support for upcasting.

See: PyO3/pyo3#1836 and PyO3/pyo3#787

src/searcher.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated
chrono = "0.4.23"
tantivy = "0.20.1"
tantivy = { git = "https://github.com/quickwit-oss/tantivy.git" }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because there hasn't been a release with lenient parser support in tantivy yet.

@@ -14,8 +14,9 @@ crate-type = ["cdylib"]
pyo3-build-config = "0.19.1"

[dependencies]
base64 = "0.21"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary to access the base64 error that tantivy holds. This dependency won't change much otherwise, since it's also being used by tantivy anyways.

@GodTamIt GodTamIt marked this pull request as ready for review September 29, 2023 03:46
@GodTamIt
Copy link
Contributor Author

Ready for review now

r? @cjrh @adamreichold

@cjrh
Copy link
Collaborator

cjrh commented Sep 29, 2023

Hoping to get around to it this weekend.

Cargo.lock Outdated
@@ -15,9 +15,9 @@ dependencies = [

[[package]]
name = "aho-corasick"
version = "1.0.2"
version = "1.1.1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many transitive dependencies are also getting updated in this PR. It makes me a little nervous but I am ok to continue with fixing this up. In future though, it would be better if we can restrict the dep changes to only what is required by the other changes in the same PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've undone these changes now.

src/index.rs Outdated
Comment on lines 420 to 452
pub fn parse_query_lenient(
&self,
query: &str,
default_field_names: Option<Vec<String>>,
) -> PyResult<(Query, Vec<PyObject>)> {
let mut default_fields = vec![];
let schema = self.index.schema();

if let Some(default_field_names_vec) = default_field_names {
for default_field_name in &default_field_names_vec {
if let Ok(field) = schema.get_field(default_field_name) {
let field_entry = schema.get_field_entry(field);
if !field_entry.is_indexed() {
return Err(exceptions::PyValueError::new_err(
format!(
"Field `{default_field_name}` is not set as indexed in the schema."
),
));
}
default_fields.push(field);
} else {
return Err(exceptions::PyValueError::new_err(format!(
"Field `{default_field_name}` is not defined in the schema."
)));
}
}
} else {
for (field, field_entry) in self.index.schema().fields() {
if field_entry.is_indexed() {
default_fields.push(field);
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of mainly curiosity, I looked at removing the mutability of default_fields by returning the vec from the if-expression:

    pub fn parse_query_lenient(
        &self,
        query: &str,
        default_field_names: Option<Vec<String>>,
    ) -> PyResult<(Query, Vec<PyObject>)> {
        let schema = self.index.schema();

        let default_fields = if let Some(default_field_names_vec) =
            default_field_names
        {
            default_field_names_vec
                .iter()
                .map(|field_name| {
                    schema
                        .get_field(field_name)
                        .map_err(|_err| {
                            exceptions::PyValueError::new_err(format!(
                                "Field `{field_name}` is not defined in the schema."
                            ))
                        })
                        .and_then(|field| {
                            schema.get_field_entry(field).is_indexed().then_some(field).ok_or(
                                exceptions::PyValueError::new_err(
                                    format!(
                                        "Field `{field_name}` is not set as indexed in the schema."
                                    ),
                                ))
                        })
                }).collect::<Result<Vec<_>, _>>()?
        } else {
            self.index
                .schema()
                .fields()
                .filter_map(|(f, fe)| fe.is_indexed().then_some(f))
                .collect::<Vec<_>>()
        };

I don't find it super compelling to make this change though. What do you think? I'm going to approve the PR regardless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not. I have a preference for functional-styling here.

@wallies
Copy link
Collaborator

wallies commented Oct 2, 2023

@GodTamIt you should be good to merge this and ill cut a new release, as the lenient parser was in tantivy 0.21

@cjrh cjrh merged commit 2040463 into quickwit-oss:master Oct 3, 2023
10 checks passed
st1020 pushed a commit to HXSecurity/tantivy-py that referenced this pull request Oct 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support lenient parser in bindings
3 participants