Implement non-searchable fields in Lyra #171

micheleriva · 2022-11-06T19:50:30Z

Is your feature request related to a problem? Please describe.
This is an RFC for a new Lyra feature, following issue #168.

Describe the solution you'd like

TL;DR: if a property is not part of the schema definition, don't create an index for it but keep it in the Lyra.docs hashmap.

The Proposal

We would like to implement non-searchable fields in Lyra.
Right now, we only support three indexable types: string, number, and boolean. As of now (v0.2.8 at the time of writing), Lyra creates an index for the string type only, leaving the number and boolean indexes to be implemented in future iterations.

My proposal is to make Lyra semi-schemaless, meaning that we only need to define a schema for the properties we want to index.
Consider the following example, where our documents have this shape:

[
  {
     "quote": "Patience is the key to joy",
     "author": "Rumi",
     "tags": ["inspirational", "philosophical", "spiritual"]
  },
  ...
]

Lyra does not currently support string[] types, so we need to join() the array into a single string, which leads to the following schema definition:

import { create } from '@lyrasearch/lyra'

create({
  schema: {
    quote: 'string',
    author: 'string',
    tags: 'string'
  }
})
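A minimal sketch of that join() step, assuming the quotes arrive as plain objects with a `tags` array. The `RawQuote` and `IndexableQuote` interfaces and the `toIndexable` helper are hypothetical names used only for illustration, and the `', '` separator is an arbitrary choice.

```typescript
// Hypothetical shape of the incoming data: tags is an array of strings.
interface RawQuote {
  quote: string
  author: string
  tags: string[]
}

// Hypothetical shape matching the schema above: every property is a plain string.
interface IndexableQuote {
  quote: string
  author: string
  tags: string
}

// Flatten the tags array into one comma-separated string so it fits the 'string' schema type.
function toIndexable(raw: RawQuote): IndexableQuote {
  return {
    quote: raw.quote,
    author: raw.author,
    tags: raw.tags.join(', ')
  }
}

const doc = toIndexable({
  quote: 'Patience is the key to joy',
  author: 'Rumi',
  tags: ['inspirational', 'philosophical', 'spiritual']
})

console.log(doc.tags) // inspirational, philosophical, spiritual
```

The flattened object matches the three-string schema, at the cost of losing the array structure of `tags`.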

This could be fine, since Lyra is capable of tokenizing comma-separated strings, but what if we don't want to index the tags property at all?

Currently, we're forced to index the tags property anyway, so Lyra creates an additional tree that duplicates a lot of data we never need to search.

The proposal consists of defining a schema for the indexable properties only, so that, given the following data shape:

[
  {
     "quote": "Patience is the key to joy",
     "author": "Rumi",
     "tags": ["inspirational", "philosophical", "spiritual"]
  },
  ...
]

We can write down the following schema definition:

import { create } from '@lyrasearch/lyra'

create({
  schema: {
    quote: 'string',
    author: 'string',
    // tags: 'string'  <- removed: tags is stored, but not indexed
  }
})

Lyra will keep the entire document in the Lyra.docs hashmap, but won't tokenize, stem, or build a tree for the tags property, which will therefore be unsearchable.

It will only be possible to search through the properties defined in the Lyra schema definition.

As a first iteration, index definitions will be immutable: it won't be possible to add a new schema property after Lyra's initialization.
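To make the proposed behaviour concrete, here is a toy in-memory store; it is a sketch, not Lyra's actual internals. It keeps every document whole in a `docs` map but tokenizes and indexes only the string properties listed in the schema, and it freezes the schema to mirror the "immutable first iteration" rule. The `createStore`, `insertDoc`, and `searchProperty` names, the naive tokenizer (no stemming), and the plain `Map` of token sets standing in for Lyra's tree are all simplifying assumptions.

```typescript
type SchemaType = 'string' | 'number' | 'boolean'

interface Store {
  // Frozen copy of the schema: no properties can be added after creation.
  schema: Readonly<Record<string, SchemaType>>
  // Every inserted document is kept here in full, including non-schema properties.
  docs: Map<string, Record<string, unknown>>
  // One inverted index per string schema property: token -> IDs of matching documents.
  indexes: Map<string, Map<string, Set<string>>>
  nextId: number
}

function createStore(schema: Record<string, SchemaType>): Store {
  const indexes = new Map<string, Map<string, Set<string>>>()
  for (const prop of Object.keys(schema)) {
    // Like v0.2.8, only string properties get an index.
    if (schema[prop] === 'string') indexes.set(prop, new Map())
  }
  return { schema: Object.freeze({ ...schema }), docs: new Map(), indexes, nextId: 0 }
}

// Naive tokenizer: lowercase, then split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(token => token.length > 0)
}

function insertDoc(store: Store, doc: Record<string, unknown>): string {
  const id = String(store.nextId++)
  store.docs.set(id, doc) // the whole document is stored, schema or not
  store.indexes.forEach((index, prop) => {
    const value = doc[prop]
    if (typeof value !== 'string') return
    for (const token of tokenize(value)) {
      const ids = index.get(token) ?? new Set<string>()
      ids.add(id)
      index.set(token, ids)
    }
  })
  return id
}

// Only indexed properties are searchable; any other property yields no hits.
function searchProperty(store: Store, prop: string, term: string): Record<string, unknown>[] {
  const ids = store.indexes.get(prop)?.get(term.toLowerCase())
  return ids ? Array.from(ids, id => store.docs.get(id)!) : []
}

const store = createStore({ quote: 'string', author: 'string' })
insertDoc(store, {
  quote: 'Patience is the key to joy',
  author: 'Rumi',
  tags: ['inspirational', 'philosophical', 'spiritual']
})

console.log(searchProperty(store, 'author', 'Rumi').length) // 1
console.log(searchProperty(store, 'tags', 'spiritual').length) // 0: stored, but not indexed
console.log(searchProperty(store, 'quote', 'joy')[0].tags) // the full tags array is still there
```

A search on `tags` finds nothing, yet every hit from an indexed property still carries the complete document, `tags` included.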

Open to comments and discussions, will be part of Lyra v0.3.0 🙂


ShogunPanda · 2022-11-07T09:40:38Z

I'm totally fine with this proposal.

Also because I think that's what people were probably already expecting when defining the schema.

Good job buddy!

mateonunez · 2022-11-08T08:09:39Z

Hi @micheleriva, I agree with the semi-schemaless approach for Lyra. Sometimes it is very difficult to predict the Lyra schema, and unsupported fields can cause many errors. To avoid this, I created a simple plugin (lyra-schema-resolver).

Currently, the insert method requires the document being inserted to match the schema; otherwise, an error is thrown.
How could Lyra avoid this? By allowing documents to be inserted without checking them against the schema? I don't think that's the best way: the risk is ending up with many documents that have nothing to do with the searchable schema.
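A rough sketch of the lenient option under discussion: check only the properties declared in the schema and let any other property through. `validateDoc` is a hypothetical helper, not Lyra's actual insert check, and treating a missing schema property as an error is an assumption.

```typescript
type SchemaType = 'string' | 'number' | 'boolean'

// Checks only the properties declared in the schema; extra properties are ignored.
// Returns a list of problems instead of throwing, so the caller decides what to do.
function validateDoc(schema: Record<string, SchemaType>, doc: Record<string, unknown>): string[] {
  const errors: string[] = []
  for (const prop of Object.keys(schema)) {
    const expected = schema[prop]
    const actual = typeof doc[prop] // 'undefined' when the property is missing
    if (actual !== expected) {
      errors.push(`"${prop}": expected ${expected}, got ${actual}`)
    }
  }
  return errors
}

const schema: Record<string, SchemaType> = { quote: 'string', author: 'string' }

// Extra "tags" property: accepted, because it is not part of the schema.
console.log(validateDoc(schema, { quote: 'Patience is the key to joy', author: 'Rumi', tags: ['spiritual'] }))
// Wrong type for a schema property: reported.
console.log(validateDoc(schema, { quote: 42, author: 'Rumi' }))
```

Requiring every schema property to be present keeps some of the guarantee described above, while still accepting documents that carry additional fields.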

One solution could be to include more properties in the schema definition, something like this:

import { create } from '@lyrasearch/lyra'

create({
  schema: {
    quote: 'string',
    author: 'string',
    tags: {
      type: 'string[]',
      searchable: false // implicit or explicit
    }
  }
})

This way, Lyra can still check whether the document matches the schema, and more properties can be added in future proposals; however, it does not fit the schemaless vision.
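A sketch of how such schema entries could be normalized, assuming an entry is either a bare type string or an object with `type` and an optional `searchable` flag that defaults to `true` (the "implicit" case). `'string[]'` appears here only as a declared type; the `SchemaEntry`, `normalizeEntry`, and `searchableProperties` names are hypothetical.

```typescript
type FieldType = 'string' | 'number' | 'boolean' | 'string[]'

// A schema entry is either a bare type or an object with extra options.
type SchemaEntry = FieldType | { type: FieldType; searchable?: boolean }

interface NormalizedEntry {
  type: FieldType
  searchable: boolean
}

// Expand the shorthand form and apply the default: searchable unless stated otherwise.
function normalizeEntry(entry: SchemaEntry): NormalizedEntry {
  if (typeof entry === 'string') return { type: entry, searchable: true }
  return { type: entry.type, searchable: entry.searchable ?? true }
}

// The properties an index would be built for; the others would only be type-checked.
function searchableProperties(schema: Record<string, SchemaEntry>): string[] {
  return Object.keys(schema).filter(prop => normalizeEntry(schema[prop]).searchable)
}

const schema: Record<string, SchemaEntry> = {
  quote: 'string',
  author: 'string',
  tags: { type: 'string[]', searchable: false }
}

console.log(searchableProperties(schema)) // [ 'quote', 'author' ]
```

Normalizing up front keeps the shorthand `'string'` form working while leaving room for more per-property options later.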

angeloanan · 2023-01-05T17:17:56Z

Heya, I hope I'm not too late to the discussion here, now that 0.3.x has shipped.

Is there a way to have first-class TypeScript type support for non-searchable fields? I feel like the problem described in #171 (comment) will be quite annoying in a TypeScript project.

In short: inserting a record with non-searchable fields causes a TypeScript error.
Lyra's insert function expects a ResolveSchema of the Lyra database, so TypeScript complains when inserting data that is not in the schema (essentially, the non-searchable fields). Escape hatches include using @ts-expect-error, but this feels hacky.

Is it possible to do some TypeScript type-golfing to support this without escape hatches?
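One possible direction, sketched with a hand-rolled `ResolvedSchema` type rather than Lyra's own `ResolveSchema` (whose exact definition is not reproduced here): intersect the resolved schema with `Record<string, unknown>`, so schema properties keep their precise types while extra, non-searchable properties are accepted without `@ts-expect-error`. `createTypedInserter` is a hypothetical stand-in for `insert`, not part of Lyra's API.

```typescript
type SchemaType = 'string' | 'number' | 'boolean'

// Map each schema type name to the TypeScript type it describes.
type TypeOf<T> = T extends 'string' ? string : T extends 'number' ? number : T extends 'boolean' ? boolean : never

// Hypothetical resolver: { quote: 'string' } becomes { quote: string }.
type ResolvedSchema<S> = { [K in keyof S]: TypeOf<S[K]> }

// Schema properties keep their precise types; any other property is accepted as unknown.
type InsertableDoc<S> = ResolvedSchema<S> & Record<string, unknown>

// Hypothetical helper: fixing S from the schema first means the document never drives inference.
function createTypedInserter<S extends Record<string, SchemaType>>(_schema: S) {
  return (doc: InsertableDoc<S>): InsertableDoc<S> => doc
}

const insertQuote = createTypedInserter({ quote: 'string', author: 'string' } as const)

// Compiles: tags is an extra, non-searchable property, while quote and author must be strings.
const inserted = insertQuote({
  quote: 'Patience is the key to joy',
  author: 'Rumi',
  tags: ['inspirational', 'philosophical', 'spiritual']
})

// Would be rejected by the compiler, because author must be a string:
// insertQuote({ quote: 'Patience is the key to joy', author: 42 })
```

The trade-off is that extra properties are typed as `unknown`, so reading them back requires narrowing, while the schema-defined ones stay strictly checked.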

micheleriva mentioned this issue Nov 6, 2022

Non-searchable types for Schema #168

Closed

LBRDan mentioned this issue Nov 12, 2022

Initial non searchable fields support on missing schema/doc key match #174

Merged

micheleriva added the enhancement New feature or request label Nov 13, 2022

LBRDan mentioned this issue Nov 14, 2022

Cannot read properties of undefined #180

Closed

micheleriva closed this as completed Dec 4, 2022

mateonunez mentioned this issue Jan 25, 2023

Support schemaless #260

Closed
