Implement non-searchable fields in Lyra #171

Closed
micheleriva opened this issue Nov 6, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@micheleriva
Member

Is your feature request related to a problem? Please describe.
This is an RFC for a new Lyra feature, following issue #168.

Describe the solution you'd like

TL;DR: if a property is not part of the schema definition, don't create an index for it but keep it in the Lyra.docs hashmap.

The Proposal

We would like to implement non-searchable fields in Lyra.
Right now, we only support three indexable types: string, number, and boolean. As of now (v0.2.8 at the time of writing), Lyra creates an index for the string type only, leaving the number and boolean indexes to be implemented in future iterations.

My proposal is to make Lyra semi-schemaless, meaning that we only need to define a schema for indexable types.
Let's see the following example. We have the following shape:

[
  {
     "quote": "Patience is the key to joy",
     "author": "Rumi",
     "tags": ["inspirational", "philosophical", "spiritual"],
  }
  ...
]

Lyra does not currently support string[] types, so we will need to join() the array into a single string, such that we'll end up with the following schema definition:

import { create } from '@lyrasearch/lyra'

create({
  schema: {
    quote: 'string',
    author: 'string',
    tags: 'string'
  }
})

This could be fine as Lyra is capable of tokenizing comma-separated strings, but what if we don't want to index the tags property?

As of now, we're forced to index the tags property anyway, which forces Lyra to create a new tree that duplicates a lot of useless data.
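
For reference, the join() workaround described above could look like the following minimal sketch. The QuoteDoc and FlatQuoteDoc types and the flattenTags helper are hypothetical names used only for illustration; they are not part of Lyra.

```typescript
// Hypothetical document shape, taken from the example data above.
interface QuoteDoc {
  quote: string
  author: string
  tags: string[]
}

// Shape that matches a schema where every property is declared as 'string'.
interface FlatQuoteDoc {
  quote: string
  author: string
  tags: string
}

// Join the tags array into one comma-separated string before insertion.
function flattenTags(doc: QuoteDoc): FlatQuoteDoc {
  return { ...doc, tags: doc.tags.join(',') }
}

const flat = flattenTags({
  quote: 'Patience is the key to joy',
  author: 'Rumi',
  tags: ['inspirational', 'philosophical', 'spiritual']
})
```

The flattened document can then be inserted against the all-string schema, at the cost of indexing tags whether we want it or not.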

The proposal consists of defining the schema for indexable properties only, such that given the following data shape:

[
  {
     "quote": "Patience is the key to joy",
     "author": "Rumi",
     "tags": ["inspirational", "philosophical", "spiritual"],
  }
  ...
]

We can write down the following schema definition:

import { create } from '@lyrasearch/lyra'

create({
  schema: {
    quote: 'string',
    author: 'string',
-    tags: 'string'
  }
})

Lyra will keep the entire document in the Lyra.docs hashmap, but won't tokenize, stem, or create a tree for the tags property, which will be unsearchable.

It will only be possible to search through the properties defined in the Lyra schema definition.
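
To make the intended behavior concrete, here is a rough, self-contained sketch of the idea. It is not Lyra's implementation: TinyStore, SearchableSchema and StoredDoc are hypothetical names, and a plain token-to-ids Map stands in for Lyra's tree and for its tokenizer and stemmer.

```typescript
// Assumption: only 'string' properties are indexable in this sketch.
type SearchableSchema = Record<string, 'string'>
type StoredDoc = Record<string, unknown>

class TinyStore {
  // Full documents, including properties that are not in the schema.
  private docs = new Map<string, StoredDoc>()
  // One token -> document ids index per schema property.
  private index = new Map<string, Map<string, Set<string>>>()
  private nextId = 0

  constructor(schema: SearchableSchema) {
    for (const prop of Object.keys(schema)) this.index.set(prop, new Map())
  }

  insert(doc: StoredDoc): string {
    const id = String(this.nextId++)
    this.docs.set(id, doc) // keep the entire document
    this.index.forEach((tokens, prop) => {
      const value = doc[prop]
      if (typeof value !== 'string') return
      // Naive tokenization: lowercase and split on non-word characters.
      for (const token of value.toLowerCase().split(/\W+/)) {
        if (token === '') continue
        if (!tokens.has(token)) tokens.set(token, new Set())
        tokens.get(token)!.add(id)
      }
    })
    return id
  }

  search(term: string): StoredDoc[] {
    const ids = new Set<string>()
    const needle = term.toLowerCase()
    this.index.forEach((tokens) => {
      tokens.get(needle)?.forEach((id) => ids.add(id))
    })
    return Array.from(ids, (id) => this.docs.get(id)!)
  }
}

const store = new TinyStore({ quote: 'string', author: 'string' })
store.insert({
  quote: 'Patience is the key to joy',
  author: 'Rumi',
  tags: ['inspirational', 'philosophical', 'spiritual']
})
```

In this sketch, searching for a word from quote or author returns the whole document with its tags attached, while searching for a tag value finds nothing because tags was never indexed.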

As a first iteration, index definitions are immutable and it won't be possible to add a new schema property after Lyra's initialization.

Open to comments and discussions, will be part of Lyra v0.3.0 🙂

@ShogunPanda
Contributor

I'm totally fine with this proposal.

Also because I think that's what people were probably already expecting when defining the schema.

Good job buddy!

@mateonunez
Collaborator

Hi @micheleriva, I agree with the semi-schema approach for Lyra. Sometimes it is very difficult to predict the Lyra schema, and unsupported fields can generate many errors. To avoid this I created a simple plugin (lyra-schema-resolver).

Currently, when using the insert method, the schema must match the document being inserted; otherwise an error is thrown.
How could Lyra avoid this? By allowing document insertion without worrying about the schema? I don't think this is the best way: the risk is ending up with many documents that have nothing to do with the searchable schema.

A solution for it could be to include more properties in the schema generation, something like this:

import { create } from '@lyrasearch/lyra'

create({
  schema: {
    quote: 'string',
    author: 'string',
    tags: {
      type: 'string[]',
      searchable: false // implicit or explicit
    }
  }
})

In this way, Lyra can still check if the document matches the schema and more properties can be added for future proposals, but it does not fit into the schemaless vision.
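
A minimal sketch of how such a schema could be checked is below. Everything in it is hypothetical (FieldDef, normalize, validate, searchableProps), and it assumes that the shorthand form and an omitted searchable flag both mean "searchable"; the opposite default would work the same way.

```typescript
type FieldType = 'string' | 'number' | 'boolean' | 'string[]'
type FieldDef = FieldType | { type: FieldType; searchable?: boolean }
type ExplicitSchema = Record<string, FieldDef>

// Expand the shorthand form. Assumption: fields are searchable by default.
function normalize(def: FieldDef): { type: FieldType; searchable: boolean } {
  return typeof def === 'string'
    ? { type: def, searchable: true }
    : { type: def.type, searchable: def.searchable ?? true }
}

function matchesType(value: unknown, type: FieldType): boolean {
  if (type === 'string[]') {
    return Array.isArray(value) && value.every((v) => typeof v === 'string')
  }
  return typeof value === type
}

// Return a list of mismatches between a document and the schema.
function validate(schema: ExplicitSchema, doc: Record<string, unknown>): string[] {
  const errors: string[] = []
  for (const key of Object.keys(schema)) {
    const { type } = normalize(schema[key])
    if (!matchesType(doc[key], type)) errors.push(`"${key}" should be of type ${type}`)
  }
  for (const key of Object.keys(doc)) {
    if (!(key in schema)) errors.push(`"${key}" is not declared in the schema`)
  }
  return errors
}

// Only these properties would get an index.
function searchableProps(schema: ExplicitSchema): string[] {
  return Object.keys(schema).filter((key) => normalize(schema[key]).searchable)
}

const explicitSchema: ExplicitSchema = {
  quote: 'string',
  author: 'string',
  tags: { type: 'string[]', searchable: false }
}
```

With this shape, validation still covers every property, while indexing is limited to whatever searchableProps returns.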

@angeloanan

Heya, I hope that I'm not late to the discussion here as 0.3.x has been shipped.

Is there a way to have first-class TypeScript type support for non-searchable fields? I feel like the problem described at #171 (comment) will be quite annoying in a TypeScript project.

Simply put: inserting a record with non-searchable fields will throw a TypeScript error.
Lyra's insert function expects a ResolveSchema of Lyra's database. This makes TypeScript complain when inserting data that is not in the schema (essentially, non-searchable fields). Escape hatches include using @ts-expect-error, but this feels hacky.

Is it possible to do some TypeScript type-golfing to support these without escape hatches?
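
One possible direction, sketched under assumptions: intersect the resolved schema type with an open record, so that declared properties stay strictly typed while extra properties are accepted. The names below (SchemaDef, Resolve, Searchable, Insertable, insertDoc) are made up for this sketch and are not Lyra's actual types.

```typescript
type PropertyType = 'string' | 'number' | 'boolean'
type SchemaDef = Record<string, PropertyType>

// Map a schema type name to the matching TypeScript type.
type Resolve<T extends PropertyType> =
  T extends 'string' ? string : T extends 'number' ? number : boolean

// Strictly typed view of the properties declared in the schema.
type Searchable<S extends SchemaDef> = { [K in keyof S]: Resolve<S[K]> }

// Declared properties stay typed; any other property is allowed as unknown.
type Insertable<S extends SchemaDef> = Searchable<S> & Record<string, unknown>

// Stand-in for an insert function: it only stores the document.
function insertDoc<S extends SchemaDef>(
  schema: S,
  target: Insertable<S>[],
  doc: Insertable<S>
): void {
  target.push(doc)
}

const quoteSchema = { quote: 'string', author: 'string' } as const
const inserted: Insertable<typeof quoteSchema>[] = []

// Accepted without an escape hatch even though tags is not in the schema.
insertDoc(quoteSchema, inserted, {
  quote: 'Patience is the key to joy',
  author: 'Rumi',
  tags: ['inspirational', 'philosophical', 'spiritual']
})
```

The trade-off is that extra properties come back typed as unknown, so reading tags later needs a narrowing check or a separate document type.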
