Skip to content

Latest commit

 

History

History
124 lines (78 loc) · 4.5 KB

README.md

File metadata and controls

124 lines (78 loc) · 4.5 KB

Tantivy

Tantivy is a Elixir library that wraps the tantivy full text search library using an Erlang port.

Installation

If available in Hex, the package can be installed by adding tantivy to your list of dependencies in mix.exs:

def deps do
  [
    {:tantivy, "~> 0.1.0"}
  ]
end

A GenServer also needs to be started as part of your supervision tree:

  def start(_type, _args) do
    children = [
	  ...
      # start the full text index server
      {Tantivy, name: MyApp.MyIndex, command: my_index_command()},
	  ...
    ]
...

  defp my_index_command do
    config = Application.fetch_env!(:my_app, MyApp.MyIndex)
    dir = Keyword.fetch!(config, :dir)
    command = Keyword.get(config, :command) || "tantivy"
    "#{command} port -i #{dir}"
  end

And in your config.exs, to configure the index dir:

# full text search
config :my_app, MyApp.MyIndex, dir: "#{System.get_env("HOME")}/#{Mix.env()}-index/"

It is supposed to be used together with a forked version of tantivy-cli. Please make sure use the erlang-port branch.

Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/tantivy.

Usage

You must define a document schema before hand, as outlined in the tantivy-cli documentation here:. Right now it is required to have a unsigned interger field called id. If you use a relational database such as PostgreSQL, the auto-increment primary key will be a perfect fit. Please note that this limitation is from this wrapper, not tantivy itself.

As shown in the previous example, the Elixir part will launch a GenServer that communicate with a external process via a port. All requests will be forwarded to the external tantivy process. Although the GenServer serialize requests; the wire protocol is designed with a split command/completion style so requests will be executed in parallel on the other sie of the port, and completed out of order. Therefore, multiple searching operations can be outstanding, maximizing the throughput.

Adding a document

A document is a map conforming to the previously defined schema. You can add documents one by one or pass a list:

alias MyApp.MyIndex

Tantivy.add(MyIndex, %{id: 1, title: title})
Tantivy.add(MyIndex, [doc0, doc1])

The above function is fully async, and will return :ok immediately. Right now there is no way to capture failure.

Deleting a document

You can only delete document one at a time, with the passed id.

alias MyApp.MyIndex

Tantivy.remove(MyIndex, 1)

This function will delete all documents with the same id. Again, delete is fully async.

Update a document

Update is fused delete then add:

alias MyApp.MyIndex

Tantivy.remove(MyIndex, 1, %{id: 1, title: title})

Again, update is fully async.

Search

This is where all the fun begin:

alias MyApp.MyIndex

list = Tantivy.search(MyIndex, query) # default to at most 100 results
list = Tantivy.search(MyIndex, query, 200)

query is a query string as defined by tantivy. The query syntax is very simple: A query like Joe Biden means any document that contains Joe or Biden. To seach for Joe and Biden, you have to use +Joe +Biden. Or you can search for "Joe Biden", which means any documents with Joe and Biden in consequtive positions. For more detail, please consult tantivy documentation

Search will return a list of documents as seen by tantivy. It is recommended not to store anything in tantivy except id, and use other storage such as a database. IF you only store id, the returning document list will be something like:

[%{id: [1]}, ...]

Please keep in mind that each field will be associated with a list of values. Tantivy allows multiple value per field and will return a list regardless what you put in.

Credits

Huge thanks to the wonderful Tantivy full text search engine library. This wrapper only exposes the minimal amount of functionality that I need for myself. It does not do justice to the underneath Rust library. If you need something else, feel free to send my PRs.