Document Search (Field-Search)

Whereas the simple Index can only consume id-content pairs, the Document-Index is able to process more complex data structures like JSON. Technically, a Document-Index is a layer on top of several default indexes. You can create multiple independent Document-Indexes in parallel, and each of them can optionally use the Worker or Persistent model.

FlexSearch Documents also contain these features:

Document Store including Enrichment
Multi-Field-Search
Multi-Tag-Search
Resolver (Chain Complex Queries)
Result Highlighting
Export/Import
Worker
Persistent

Document Options

Document options basically inherit from Index Options, so you can apply most of those options in the top scope of the config (for all fields), per field, or both.

Option	Values	Description	Default
`document`	Document Descriptor	Includes any specific information about how the document data should be indexed	(mandatory)
`worker`	Boolean String	Enables the worker-distributed model. Read more in: Worker Index	`false`

Document Search Options

Document search options basically inherit from Index Search Options, so you can apply most of those options in the top scope of the config (for all fields), per field, or both.

Option	Values	Description	Default
`index` `field`	String Array<String> Array<SearchOptions>	Sets the document fields which should be searched. When no field is set, all fields will be searched. Custom options per field are also supported.
`tag`	Object<field:tag>	Restricts the search to documents carrying the given tag(s), passed as pairs of tag field and tag value.
`enrich`	Boolean	Enrich IDs from the results with the corresponding documents.	`false`
`highlight`	Highlighting Options String	Highlight query matches in the result (for Document Indexes only)	`false`
`merge`	Boolean	Merge multiple fields in resultset into one and group results per ID	`false`
`pluck`	String	Pick and apply search to just one field and return a flat result representation	`false`

The Document Descriptor

When creating a Document-Index you need to define a document descriptor in the option document. This descriptor includes all specific information about how the document data should be indexed.

Option	Values	Description	Default
`id`	String		`"id"`
`index`	String Array<String> Array<FieldOptions>
`tag`	String Array<String> Array<FieldOptions>
`store`	Boolean String Array<String> Array<FieldOptions>		`false`

Field Options

You can use all standard Index Options within field options.

Option	Values	Description	Default
`field`	String	The field name (colon-separated syntax)	(mandatory)
`filter`	Function
`custom`	Function

Assuming our document has a simple data structure like this:

{ 
    "id": 0, 
    "content": "some text"
}

An appropriate Document Descriptor always has to define at least 2 things:

the property id describes the location of the document ID within a document item
the property index (or tag) containing one or multiple fields from the document, which should be indexed for searching

// create a document index
const index = new Document({
    document: {
        id: "id",
        index: "content"
    }
});

// add documents to the index
index.add({ 
    id: 0, 
    content: "some text"
});

As briefly explained above, the field id describes where the ID or unique key lives inside your documents. When not passed it will always take the field id from the top level scope of your data.

The property index takes all fields you would like to have indexed. When just selecting one field, then you can pass a string.

The next example will add 2 fields title and content to the index:

var docs = [{
    id: 0,
    title: "Title A",
    content: "Body A"
},{
    id: 1,
    title: "Title B",
    content: "Body B"
}];

const index = new Document({
    id: "id",
    index: ["title", "content"]
});

Add both fields to the document descriptor and pass individual Index-Options for each field:

const index = new Document({
    id: "id",
    index: [{
        field: "title",
        tokenize: "forward",
        encoder: Charset.LatinAdvanced,
        resolution: 9
    },{
        field:  "content",
        tokenize: "forward",
        encoder: Charset.LatinAdvanced,
        resolution: 3
    }]
});

Field options inherit from top-level options when passed, e.g.:

const index = new Document({
    tokenize: "forward",
    encoder: Charset.LatinAdvanced,
    resolution: 9,
    document: {
        id: "id",
        index:[{
            field: "title"
        },{
            field: "content",
            resolution: 3
        }]
    }
});
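
The inheritance shown above can be pictured as a shallow merge in which per-field values override the top-level ones. The following is only an illustrative sketch in plain JavaScript: effectiveFieldOptions is a hypothetical helper, not part of FlexSearch, and the library's internal resolution may differ. The encoder is replaced by a placeholder string to keep the snippet self-contained.

```javascript
// Hypothetical helper (not a FlexSearch API): illustrates option inheritance.
function effectiveFieldOptions(topLevel, fieldOptions) {
    // take every top-level option except the document descriptor itself
    const { document: _descriptor, ...inherited } = topLevel;
    // per-field values win over inherited top-level values
    return { ...inherited, ...fieldOptions };
}

const config = {
    tokenize: "forward",
    encoder: "LatinAdvanced", // placeholder standing in for Charset.LatinAdvanced
    resolution: 9,
    document: {
        id: "id",
        index: [{ field: "title" }, { field: "content", resolution: 3 }]
    }
};

// one merged options object per field
const effective = config.document.index.map(
    field => effectiveFieldOptions(config, field)
);
```

Under these assumptions the field title keeps the inherited resolution of 9, while content overrides it with 3 and still inherits tokenize and encoder.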

Assigning an Encoder instance to the top-level configuration shares that encoder across all fields. Avoid this when the fields do not hold the same type of content (e.g. one field contains terms, another contains numeric IDs).

Nested Data Fields (Complex Objects)

Assume the document array looks more complex (has nested branches etc.), e.g.:

{
  "record": {
    "id": 0,
    "title": "some title",
    "content": {
      "header": "some text",
      "footer": "some text"
    }
  }
}

Then use the colon separated notation root:child:child as a name for each field defining the hierarchy which corresponds to the document:

const index = new Document({
    document: {
        id: "record:id",
        index: [
            "record:title",
            "record:content:header",
            "record:content:footer"
        ]
    }
});
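
To make the colon notation more tangible, the sketch below shows one way such a field name could be resolved against a document. It is an illustration only: resolvePath is a hypothetical helper introduced here, not FlexSearch's internal implementation, and it does not handle arrays along the path.

```javascript
// Hypothetical helper: walks a document along a colon-separated field name.
function resolvePath(doc, path) {
    let current = doc;
    for (const key of path.split(":")) {
        // stop early when a branch does not exist
        if (current === null || current === undefined) return undefined;
        current = current[key];
    }
    return current;
}

const doc = {
    record: {
        id: 0,
        title: "some title",
        content: { header: "some text", footer: "some text" }
    }
};

const id = resolvePath(doc, "record:id");                 // 0
const header = resolvePath(doc, "record:content:header"); // "some text"
const missing = resolvePath(doc, "record:author:name");   // undefined
```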

Tip

Only add fields you want to query against. Do not add fields to the index which you just need in the result. For this purpose you can store documents independently of the index (read below).

To query against one or multiple specific fields you have to pass the exact key of the field you have defined in the document descriptor as a field name (with colon syntax):

index.search(query, {
    field: [
        "record:title",
        "record:content:header",
        "record:content:footer"
    ]
});

Same as:

index.search(query, [
    "record:title",
    "record:content:header",
    "record:content:footer"
]);

Using field-specific options:

index.search("some query", [{
    field: "record:title",
    limit: 100,
    suggest: true
},{
    field: "record:content:header",
    limit: 100,
    suggest: false
}]);

You can also perform a search through the same field with different queries:

index.search([{
    field: "record:title",
    query: "some query",
    limit: 100,
    suggest: true
},{
    field: "record:title",
    query: "some other query",
    limit: 100,
    suggest: true
}]);

Complex Documents

You need to follow 2 rules for your documents:

The document cannot start with an Array at the root. This will introduce sequential data and isn't supported yet. See below for a workaround for such data.

[ // <-- not allowed as document start!
  {
    "id": 0,
    "title": "title"
  }
]

The document ID can't be nested inside an Array. This will introduce sequential data and isn't supported yet. See below for a workaround for such data.

{
  "records": [ // <-- not allowed when ID or tag lives inside!
    {
      "id": 0,
      "title": "title"
    }
  ]
}

Here is an example of a supported complex document:

{
  "meta": {
    "tag": "cat",
    "id": 0
  },
  "contents": [
    {
      "body": {
        "title": "some title",
        "footer": "some text"
      },
      "keywords": ["some", "key", "words"]
    },
    {
      "body": {
        "title": "some title",
        "footer": "some text"
      },
      "keywords": ["some", "key", "words"]
    }
  ]
}

The corresponding document descriptor (when all fields should be indexed) looks like:

const index = new Document({
    document: {
        id: "meta:id",
        index: [
            "contents:body:title",
            "contents:body:footer"
        ],
        tag: [
            "meta:tag",
            "contents:keywords"
        ]
    }
});

Remember when searching you have to use the same colon-separated-string as a key from your field definition.

index.search(query, { 
    index: "contents:body:title"
});

Not Supported Documents (Sequential Data)

This example breaks both rules described above:

[ // <-- not allowed as document start!
  {
    "tag": "cat",
    "records": [ // <-- not allowed when ID or tag lives inside!
      {
        "id": 0,
        "body": {
          "title": "some title",
          "footer": "some text"
        },
        "keywords": ["some", "key", "words"]
      },
      {
        "id": 1,
        "body": {
          "title": "some title",
          "footer": "some text"
        },
        "keywords": ["some", "key", "words"]
      }
    ]
  }
]

You need to unroll your data within a simple loop before adding to the index.

A workaround to such a data structure from above could look like:

const index = new Document({
    document: {
        id: "id",
        index: [
            "body:title",
            "body:footer"
        ],
        tag: [
            "tag",
            "keywords"
        ]
    }
});

function add(sequential_data){

    for(let x = 0, item; x < sequential_data.length; x++){

        item = sequential_data[x];

        for(let y = 0, record; y < item.records.length; y++){
            record = item.records[y];
            // append tag to each record
            record.tag = item.tag;
            // add to index
            index.add(record);
        }
    }  
}

// now just use add() helper method as usual:
add([{
    // sequential structured data
    // take the data example above
}]);
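
The helper above modifies each record in place and adds it straight to the index. As an alternative, the unrolling step can be kept separate from indexing. The sketch below uses a hypothetical unroll function (a name introduced here for illustration) that returns new flat records without touching the input.

```javascript
// Hypothetical helper: flattens sequential data into standalone documents.
function unroll(sequentialData) {
    return sequentialData.flatMap(item =>
        // copy each record and attach the tag of its parent item
        item.records.map(record => ({ ...record, tag: item.tag }))
    );
}

const sequential = [{
    tag: "cat",
    records: [
        { id: 0, body: { title: "some title", footer: "some text" }, keywords: ["some", "key", "words"] },
        { id: 1, body: { title: "some title", footer: "some text" }, keywords: ["some", "key", "words"] }
    ]
}];

const flat = unroll(sequential);
// flat holds two documents, each with its own id and the tag "cat"
```

Each entry of flat matches the descriptor from the workaround above and could then be passed to index.add one by one.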

Add/Update/Remove Documents

Add a document to the index:

index.add({
    id: 0,
    title: "Foo",
    content: "Bar"
});

Update index:

index.update({
    id: 0,
    title: "Foo",
    content: "Foobar"
});

Remove a document and all its contents from an index, by ID:

index.remove(id);

Or by the document data:

index.remove(doc);

Field-Search

Search through all fields:

index.search(query);

Search through a specific field:

index.search(query, { index: "title" });

Search through a given set of fields:

index.search(query, { index: ["title", "content"] });

Pass custom options and/or queries to each field:

index.search([{
    field: "content",
    query: "some query",
    limit: 100,
    suggest: true
},{
    field: "content",
    query: "some other query",
    limit: 100,
    suggest: true
}]);

Limit & Offset

By default, every query is limited to 100 entries. Unbounded queries lead to issues. To adjust the size, set the limit as a search option.

You can set the limit and the offset for each query:

index.search(query, { limit: 20, offset: 100 });

You cannot pre-count the size of the result set; this is a limitation of FlexSearch by design. When you really need a count of all the results you can page through, assign a sufficiently high limit, fetch all results and apply your paging offset manually (this also works on the server side). FlexSearch is fast enough that this isn't an issue.
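
The manual paging described above could look like the following sketch. It assumes a flat array of IDs for a single field, fetched with a high enough limit; paginate and its parameters are hypothetical names introduced here, and the sample array only stands in for a real search result.

```javascript
// Hypothetical helper: pages through a complete, flat result array.
function paginate(results, page, pageSize) {
    const offset = page * pageSize; // page is zero-based
    return {
        total: results.length, // the overall count, known once all results are loaded
        items: results.slice(offset, offset + pageSize)
    };
}

// stand-in for a result array fetched with a high enough limit
const allIds = [10, 11, 12, 13, 14, 15, 16];

const secondPage = paginate(allIds, 1, 3);
// secondPage.total is 7, secondPage.items is [13, 14, 15]
```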

See all available field-search options

The Result Set

Schema of the default result-set:

fields[] => { field, result[] => id }

Schema of an enriched result-set:

fields[] => { field, result[] => { id, doc }}

The top-level scope of the result set is an array of the fields to which the query was applied. Each of these fields has a record (object) with 2 properties, field and result. The result is either an array of IDs or is enriched with the stored document data (when the index was created with store: true).

A default non-enriched result set looks like:

[{
    field: "title",
    result: [0, 1, 2]
},{
    field: "content",
    result: [3, 4, 5]
}]

An enriched result set looks like:

[{
    field: "title",
    result: [
        { id: 0, doc: { /* document */ }},
        { id: 1, doc: { /* document */ }},
        { id: 2, doc: { /* document */ }}
    ]
},{
    field: "content",
    result: [
        { id: 3, doc: { /* document */ }},
        { id: 4, doc: { /* document */ }},
        { id: 5, doc: { /* document */ }}
    ]
}]
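
Since a document that matches in several fields is listed under each of them, code consuming the default result set often needs to flatten it first. A minimal sketch, assuming sample data shaped like the non-enriched result set above; collectIds is a hypothetical helper, not a FlexSearch function.

```javascript
// Hypothetical helper: collects the unique IDs across all fields.
function collectIds(resultSet) {
    const seen = new Set();
    for (const { result } of resultSet) {
        for (const id of result) seen.add(id);
    }
    // a Set keeps insertion order, so the first occurrence decides the position
    return [...seen];
}

const sample = [
    { field: "title", result: [0, 1, 2] },
    { field: "content", result: [2, 3] }
];

const ids = collectIds(sample);
// ids is [0, 1, 2, 3]
```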

Merge Document Results

Schema of the merged result-set:

result[] => { id, doc, field[] }

By passing the search option merge: true all fields of the result set will be merged (grouped by ID):

[{
    id: 1001,
    doc: {/* stored document */},
    field: ["fieldname-1", "fieldname-2"]
},{
    id: 1002,
    doc: {/* stored document */},
    field: ["fieldname-3"]
}]
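
To relate the merged schema to the field-based one, the sketch below groups an enriched result set by ID. It only illustrates the shape of the output: mergeById is a hypothetical helper and not how FlexSearch implements merge: true internally, and the sample documents are made up.

```javascript
// Hypothetical helper: groups an enriched, field-based result set by ID.
function mergeById(resultSet) {
    const byId = new Map();
    for (const { field, result } of resultSet) {
        for (const { id, doc } of result) {
            // create the merged entry on first sight of an ID
            if (!byId.has(id)) byId.set(id, { id, doc, field: [] });
            // remember every field in which this ID matched
            byId.get(id).field.push(field);
        }
    }
    return [...byId.values()];
}

const enriched = [
    { field: "title", result: [{ id: 1001, doc: { title: "A" } }] },
    { field: "content", result: [
        { id: 1001, doc: { title: "A" } },
        { id: 1002, doc: { title: "B" } }
    ]}
];

const merged = mergeById(enriched);
// merged[0] is { id: 1001, doc: { title: "A" }, field: ["title", "content"] }
// merged[1] is { id: 1002, doc: { title: "B" }, field: ["content"] }
```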

Pluck Single Fields

When using pluck instead of field you can explicitly select just one field and get back a flat representation:

index.search(query, { 
    pluck: "title",
    enrich: true
});

[
    { id: 0, doc: { /* document */ }},
    { id: 1, doc: { /* document */ }},
    { id: 2, doc: { /* document */ }}
]

Document Store

Only a document index can have a store. You can use a document index instead of a flat index to get this functionality, even when only storing ID-content pairs.

You can define independently which fields should be indexed and which fields should be stored. This way you can index fields which should not be included in the search result.

Do not use a store when: 1. an array of IDs as the result is good enough, or 2. you already have the contents/documents stored elsewhere (outside the index).

When the store attribute is set, you have to explicitly include all fields which should be stored (it acts like a whitelist).

When the store attribute is not set, the original document is stored as a fallback.

This will add the whole original content to the store:

const index = new Document({
    document: { 
        index: "content",
        store: true
    }
});

index.add({ id: 0, content: "some text" });

Access documents from internal store

You can get indexed documents from the store:

var data = index.get(1);

You can update/change store contents directly without changing the index by:

index.set(1, data);

To update both the store and the index, just use index.update, index.add or index.append.

When you perform a query, whether on a document index or a flat index, you will always get back an array of IDs.

Optionally you can enrich the query results automatically with stored contents by:

index.search(query, { enrich: true });

Your results now look like:

[{
    id: 0,
    doc: { /* content from store */ }
},{
    id: 1,
    doc: { /* content from store */ }
}]

Configure Document Store (Recommended)

When storing documents, you can configure independently what should be indexed and what should be stored. This can reduce the required index space significantly. Indexed fields do not need to be included in the stored data (the ID does not need to be kept in the store either). It is recommended to add only those fields to the store which you need in the final result for further processing.

A short example of configuring a document store:

const index = new Document({
    document: { 
        index: "content",
        store: ["author", "email"] 
    }
});

index.add({
    id: 0,
    author: "Jon Doe",
    email: "john@mail.com",
    content: "Some content for the index ..."
});

You can query through the contents and will get back the stored values instead:

index.search("some content", { enrich: true });

Your results now look like:

[{
    field: "content",
    result: [{
        id: 0,
        doc: {
            author: "Jon Doe",
            email: "john@mail.com",
        }
    }]
}]

Both fields "author" and "email" are not indexed, whereas the indexed field "content" is not included in the stored data.
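
The whitelist behaviour can be pictured as picking the listed fields from each document before it is stored. The sketch below is an illustration under that assumption: pickStored is a hypothetical helper, not a FlexSearch function, and it only handles top-level field names.

```javascript
// Hypothetical helper: keeps only the whitelisted top-level fields.
function pickStored(doc, storeFields) {
    const stored = {};
    for (const field of storeFields) {
        // skip whitelisted fields the document does not contain
        if (field in doc) stored[field] = doc[field];
    }
    return stored;
}

const storedDoc = pickStored({
    id: 0,
    author: "Jon Doe",
    email: "john@mail.com",
    content: "Some content for the index ..."
}, ["author", "email"]);
// storedDoc is { author: "Jon Doe", email: "john@mail.com" }
```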

Filter Fields (Index / Tags / Datastore)

You can pass a function to the field option property filter. This function just has to return true if the document should be indexed.

const index = new Document({
    document: {
        id: "id",
        index: [{
            // custom field:
            field: "somefield",
            filter: function(data){
                // return false to filter out
                // return anything else to keep
                return true;
            }
        }],
        tag: [{
            field: "city",
            filter: function(data){
                // return false to filter out
                // return anything else to keep
                return true;
            }
        }],
        store: [{
            field: "anotherfield",
            filter: function(data){
                // return false to filter out
                // return anything else to keep
                return true;
            }
        }]
    }
});
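
A concrete filter might look like the sketch below. It assumes that the function receives the document, as the custom field examples in this document do, and it relies on a draft property that is made up for this illustration. The function could be assigned to filter in any of the field definitions above; here it is applied with Array.prototype.filter only to show which documents would pass.

```javascript
// Hypothetical filter: skips documents flagged as drafts.
// The "draft" property is an assumption for this example.
function skipDrafts(data) {
    // false filters the document out, true keeps it
    return data.draft !== true;
}

const docs = [
    { id: 0, somefield: "published text" },
    { id: 1, somefield: "unfinished text", draft: true }
];

// IDs of the documents the filter would let through
const indexable = docs.filter(skipDrafts).map(doc => doc.id);
// indexable is [0]
```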

Custom Fields (Index / Tags / Datastore)

You can pass a function to the field option property custom to either:

change and/or extend the original input string
create a new "virtual" field which is not included in document data

Dataset example:

{
    "id": 10001,
    "firstname": "John",
    "lastname": "Doe",
    "city": "Berlin",
    "street": "Alexanderplatz",
    "number": "1a",
    "postal": "10178"
}

You can apply custom fields derived from document data or from any external data:

const index = new Document({
    document: {
        id: "id",
        index: [{
            // custom field:
            field: "fullname",
            custom: function(data){
                // return custom string
                return data.firstname + " " + 
                       data.lastname;
            }
        },{
            // custom field:
            field: "location",
            custom: function(data){
                return data.street + " " +
                       data.number + ", " +
                       data.postal + " " +
                       data.city;
            }
        }],
        tag: [{
            // existing field
            field: "city"
        },{
            // custom field:
            field: "category",
            custom: function(data){
                let tags = [];
                // push one or multiple tags
                // ....
                return tags;
            }
        }],
        store: [{
            field: "anotherfield",
            custom: function(data){
                // return a falsy value to filter out
                // return anything else as to keep in store
                return data;
            }
        }]
    }
});

Custom functions can also act as a filter: return false to filter the field out for that document.

Perform a query against the custom field as usual:

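// search the custom field "location"
const resultByLocation = index.search({
    query: "10178 Berlin Alexanderplatz",
    field: "location"
});

// search all fields, limited to documents tagged with the city "Berlin"
const resultByTag = index.search({
    query: "john doe",
    tag: { "city": "Berlin" }
});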
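
To see which strings the custom fields defined above would hand over for indexing, the same functions can be run on their own against the dataset example. This is a plain JavaScript sketch without the library; buildFullname and buildLocation are names chosen here for the extracted functions.

```javascript
// The custom field functions from the descriptor, extracted as standalone functions.
const buildFullname = data => data.firstname + " " + data.lastname;
const buildLocation = data =>
    data.street + " " + data.number + ", " + data.postal + " " + data.city;

// the dataset example from above
const person = {
    id: 10001,
    firstname: "John",
    lastname: "Doe",
    city: "Berlin",
    street: "Alexanderplatz",
    number: "1a",
    postal: "10178"
};

const indexedFullname = buildFullname(person); // "John Doe"
const indexedLocation = buildLocation(person); // "Alexanderplatz 1a, 10178 Berlin"
```

Testing custom functions in isolation like this can help to verify the virtual field contents before building the index.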

Best Practices: TypeScript

When using TypeScript, you can type your document data when creating a Document-Index. This will provide enhanced type checks of your syntax.

Create a schema according to your document data, e.g.:

type doctype = {
    id: number,
    title: string,
    description: string,
    tags: string[]
};

Create the document index by assigning the type doctype:

const document = new Document<doctype>({
    id: "id",
    store: true,
    index: [{
        field: "title"
    },{
        field: "description"
    }],
    tag: "tags"
});

Best Practices: Merge Documents

Read here
