{
  "indexing_config": {
    "with_positions": true
  }
}
This option controls whether positions are stored. Features that require positional information, such as phrase queries, will not work if it is disabled. Turning it off for very large collections (roughly > 1GB) can improve the tool's scalability, at the cost of such features.
{
  "indexing_config": {
    "num_threads": max(min(physical cores, logical cores) - 1, 1)
  }
}
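The num_threads value shown above is an expression rather than literal JSON. As a minimal sketch, it can be read as the function below. The function name is hypothetical, and the core counts are passed in as plain integers, since how the indexer detects them is not covered here.

```python
def default_num_threads(physical_cores: int, logical_cores: int) -> int:
    """Evaluate the documented expression: max(min(physical, logical) - 1, 1)."""
    # Leave one core free, but never go below a single thread.
    return max(min(physical_cores, logical_cores) - 1, 1)


print(default_num_threads(4, 8))  # 3
print(default_num_threads(1, 2))  # 1
```

On a machine with 4 physical and 8 logical cores this gives 3 threads; on a single-core machine it still gives 1.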
InfiSearch regards each file as a single document by default. You can index multiple files into one document using the reserved field _add_files. This is useful if you need to override or add data but can't modify the source document easily.
Overrides should be provided with JSON, CSV, or HTML files, as TXT and PDF files have no reliable way of supplying the _add_files field. For CSV files, you will need to map the CSV data to the _add_files field manually. For JSON and HTML files, this is done automatically.
Suppose you have the following files:
folder
|-- main.html
|-- overrides.json
To index main.html and override its link, you would have:
overrides.json
{
  "link": "https://infi-search.com",
  "_add_files": "./main.html"
}
Indexer Configuration
{
  "indexing_config": {
    "exclude": ["main.html"]
  }
}
This excludes main.html from being indexed directly; it is indexed through overrides.json instead.
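If you have many files to override, the two JSON documents above can be generated rather than written by hand. The following is only a sketch: the helper names are hypothetical, and it simply builds the same two structures shown above without calling any InfiSearch API.

```python
import json


def build_override(target, link):
    """Build an override document pointing at `target` via the reserved _add_files field."""
    return {"link": link, "_add_files": target}


def build_indexer_config(excluded):
    """Build an indexer configuration that excludes the given files from direct indexing."""
    return {"indexing_config": {"exclude": list(excluded)}}


override = build_override("./main.html", "https://infi-search.com")
config = build_indexer_config(["main.html"])

# Print both documents; in practice you would write them to files instead.
print(json.dumps(override, indent=2))
print(json.dumps(config, indent=2))
```

Writing the first document to overrides.json and the second to your indexer configuration file reproduces the example above.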
⚠️ This section serves as a reference; prefer the preconfigured scaling presets if possible.
Field Configuration
{
  "fields_config": {
    "cache_all_field_stores": true,
    "num_docs_per_store": 100000000
  },
  "indexing_config": {
    "pl_limit": 4294967295,
    "pl_cache_threshold": 0,
    "num_pls_per_dir": 1000
  }
}
When cache_all_field_stores is enabled, all fields specified with storage=[{ "type": "text" }] are cached up front.
This is the same option as the one under search functionality options, and has lower priority.
The num_docs_per_store parameter controls how many documents' texts to store in one JSON file. Batching multiple documents together increases file size but can lead to fewer files and better browser caching.
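To get a feel for the trade-off, you can estimate how many store files a collection would produce. This is a rough sketch that assumes documents are simply batched in fixed-size groups; the function name is hypothetical, and the indexer's actual layout may differ.

```python
def estimated_store_files(num_docs, num_docs_per_store):
    """Estimate the number of JSON store files, assuming fixed-size batches."""
    if num_docs_per_store < 1:
        raise ValueError("num_docs_per_store must be at least 1")
    # Ceiling division: a partially filled last batch still needs its own file.
    return (num_docs + num_docs_per_store - 1) // num_docs_per_store


print(estimated_store_files(25_000, 1_000))        # 25
print(estimated_store_files(25_000, 100_000_000))  # 1
```

Under this assumption, the value of 100000000 used above would place a 25,000-document collection in a single file.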
The pl_limit parameter is a threshold (in bytes) at which to "cut" index chunks (pl stands for postings list). Increasing it produces fewer but larger chunks, which take longer to retrieve.
With pl_cache_threshold, index chunks that exceed this size (in bytes) are cached by the search library on initialisation. Use it to tune InfiSearch for response time (over scalability) in typical use cases.
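The caching rule can be pictured with a small sketch. It assumes the threshold in question is pl_cache_threshold and that "exceed" means strictly greater than; the function name and the chunk sizes are hypothetical, chosen only for illustration.

```python
def chunks_cached_on_init(chunk_sizes_bytes, pl_cache_threshold):
    """Return the sizes of the chunks that exceed the threshold (strictly greater)."""
    return [size for size in chunk_sizes_bytes if size > pl_cache_threshold]


sizes = [512, 4096, 65536]  # hypothetical chunk sizes in bytes

print(chunks_cached_on_init(sizes, 0))     # [512, 4096, 65536]
print(chunks_cached_on_init(sizes, 4096))  # [65536]
```

Under these assumptions, a threshold of 0 caches every non-empty chunk up front, while a larger threshold leaves the smaller chunks to be fetched on demand.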