Three configuration presets are available for scaling this tool to larger collections. They are designed primarily for InfiSearch's main intended use case of supporting static site search.
Each preset primarily makes a tradeoff between the document collection size it can support and the number of rounds of network requests (`RTT`).
The default preset is `small`, which generates a monolithic index and field store, much like other client side indexing tools.
Specify the `preset` key in your configuration file to change this.
{
"preset": "small" | "medium" | "large"
}
`small`, `medium`, and `large` correspond to 0, 1, and 2 rounds of network requests respectively, as described in the table below.
Preset | Description |
---|---|
`small` | Generates a monolithic index and field store. Identical to most other client side indexing tools. |
`medium` | Generates an almost-monolithic index but sharded field store. Only required field stores are retrieved for generating result previews. |
`large` | Generates both a sharded index and field store. Only index files that are required for the query are retrieved. Keeps stop words. This is the preset used in the demo here! |
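
The tradeoffs in the table can be restated as a small lookup. This is only an illustrative sketch: the `PresetTraits` type, its field names, and `describePreset` are made up for this example and are not part of InfiSearch's API. The values simply mirror the table and the round counts stated above.

```typescript
// Illustrative only: none of these names exist in InfiSearch itself.
type Preset = "small" | "medium" | "large";

interface PresetTraits {
  // Rounds of network requests (RTT) associated with the preset.
  networkRounds: 0 | 1 | 2;
  // Whether the index is split into shards that are fetched as needed.
  shardedIndex: boolean;
  // Whether the field store is split into shards that are fetched as needed.
  shardedFieldStore: boolean;
}

const PRESET_TRAITS: Record<Preset, PresetTraits> = {
  small: { networkRounds: 0, shardedIndex: false, shardedFieldStore: false },
  // The medium index is "almost-monolithic", so it is treated as unsharded here.
  medium: { networkRounds: 1, shardedIndex: false, shardedFieldStore: true },
  large: { networkRounds: 2, shardedIndex: true, shardedFieldStore: true },
};

// Builds a one-line, human-readable summary of a preset.
function describePreset(preset: Preset): string {
  const traits = PRESET_TRAITS[preset];
  const index = traits.shardedIndex ? "sharded" : "monolithic";
  const store = traits.shardedFieldStore ? "sharded" : "monolithic";
  return `${preset}: ${index} index, ${store} field store, ${traits.networkRounds} round(s)`;
}

console.log(describePreset("medium"));
```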
In summary, scaling this tool to larger collections doesn't come for free: it requires fragmenting the index and/or field stores and retrieving only what's needed. This means extra network requests, but to a reasonable degree.
This tool should be able to handle collections of `800MB` (not counting things like HTML tags) with the full set of features enabled in the `large` preset.
There are a few other options especially worth highlighting that can help reduce the index size (and hence support larger collections) or modify caching strategies.
- In addition to upfront caching of index files with the `pl_cache_threshold` indexing parameter, InfiSearch also persistently caches any index shard that was requested before but fell short of the `pl_cache_threshold`.
- The option to ignore stop words is mostly useful only with the `small` / `medium` presets, which generate a monolithic index. Ignoring stop words in this case can reduce the overall index size, if you are willing to forgo the benefits of keeping them.
- Positions take up a considerable (~3/4) portion of the index size, but they produce useful information for proximity ranking and enable phrase queries.
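
The two caching behaviours in the first point can be pictured with a toy model. This is a rough sketch under stated assumptions, not InfiSearch's implementation: the `ShardCache` class, the `Shard` shape, and the idea that a shard is cached upfront when its size is at least the threshold are all assumptions made for illustration. An in-memory `Map` stands in for whatever persistent storage the library actually uses.

```typescript
// Hypothetical model of the caching rules; not InfiSearch's real code.
interface Shard {
  name: string;
  size: number; // assumed size measure, compared against the threshold
}

class ShardCache {
  private cached = new Map<string, string>();
  private plCacheThreshold: number;
  private fetchShard: (name: string) => string;
  fetchCount = 0;

  constructor(plCacheThreshold: number, fetchShard: (name: string) => string) {
    this.plCacheThreshold = plCacheThreshold;
    this.fetchShard = fetchShard;
  }

  // Upfront caching: load every shard that reaches the threshold (assumed rule).
  warmUp(shards: Shard[]): void {
    for (const shard of shards) {
      if (shard.size >= this.plCacheThreshold) {
        this.load(shard.name);
      }
    }
  }

  // On demand: a shard below the threshold is fetched once, then kept.
  get(name: string): string {
    return this.cached.get(name) ?? this.load(name);
  }

  private load(name: string): string {
    this.fetchCount += 1;
    const data = this.fetchShard(name);
    this.cached.set(name, data);
    return data;
  }
}

const cache = new ShardCache(100, (name) => `data:${name}`);
cache.warmUp([{ name: "big", size: 250 }, { name: "tiny", size: 10 }]);
cache.get("tiny"); // first request: fetched, then kept
cache.get("tiny"); // second request: served from the cache
console.log(cache.fetchCount);
```

In this model, only the first request for a shard below the threshold costs a fetch; later requests for it are served from the cache.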
Presets modify only the following properties:
- Search Configuration: `cacheAllFieldStores`
- Indexing Configuration: `num_docs_per_store`, `pl_limit`, `pl_cache_threshold`
Any of these values specified in the configuration file overrides the corresponding value from the preset.
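
The override rule amounts to a simple merge in which explicitly configured values win. The sketch below is a hypothetical illustration of that rule only: `resolveIndexingConfig` is not an InfiSearch function, the three properties are assumed to be numeric, and the numbers are placeholders rather than the real preset defaults.

```typescript
// Hypothetical sketch of the override rule; not InfiSearch's actual code.
interface PresetManagedIndexingConfig {
  num_docs_per_store: number;
  pl_limit: number;
  pl_cache_threshold: number;
}

// Placeholder numbers for illustration; these are NOT the real preset defaults.
const examplePresetDefaults: PresetManagedIndexingConfig = {
  num_docs_per_store: 100,
  pl_limit: 1000,
  pl_cache_threshold: 500,
};

// Merges values from the configuration file over the preset's values.
function resolveIndexingConfig(
  presetDefaults: PresetManagedIndexingConfig,
  userValues: Partial<PresetManagedIndexingConfig>,
): PresetManagedIndexingConfig {
  // Later spreads win, so any key the user set replaces the preset's value.
  return { ...presetDefaults, ...userValues };
}

// Only pl_limit is set by the user; the other two keep the preset's values.
const resolved = resolveIndexingConfig(examplePresetDefaults, { pl_limit: 2000 });
console.log(resolved);
```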