lancedb · eddyxu · Aug 2, 2024 · Aug 2, 2024
diff --git a/python/python/lance/dataset.py b/python/python/lance/dataset.py
@@ -1196,10 +1196,9 @@ def create_scalar_index(
  dataset = lance.dataset("/tmp/images.lance")
  my_table = dataset.scanner(filter="my_col != 7").to_table()
 
- Scalar indices can also speed up scans containing a vector search and a
- prefilter:
+ Vector search with pre-filters can also benefit from scalar indices. For example:
 
- .. code-block::python
+ .. code-block:: python
 
  import lance
 
@@ -1214,44 +1213,25 @@ def create_scalar_index(
  prefilter=True
  )
 
- Scalar indices can only speed up scans for basic filters using
- equality, comparison, range (e.g. ``my_col BETWEEN 0 AND 100``), and set
- membership (e.g. `my_col IN (0, 1, 2)`)
-
- Scalar indices can be used if the filter contains multiple indexed columns and
- the filter criteria are AND'd or OR'd together
- (e.g. ``my_col < 0 AND other_col> 100``)
-
- Scalar indices may be used if the filter contains non-indexed columns but,
- depending on the structure of the filter, they may not be usable. For example,
- if the column ``not_indexed`` does not have a scalar index then the filter
- ``my_col = 0 OR not_indexed = 1`` will not be able to use any scalar index on
- ``my_col``.
-
- To determine if a scan is making use of a scalar index you can use
- ``explain_plan`` to look at the query plan that lance has created. Queries
- that use scalar indices will either have a ``ScalarIndexQuery`` relation or a
- ``MaterializeIndex`` operator.
-
- There are 4 types of scalar indices available today. The most common
- type is ``BTREE``. This index is inspired by the btree data structure
- although only the first few layers of the btree are cached in memory. It iwll
- perform well on columns with a large number of unique values and few rows per
- value.
-
- The other common index type is ``BITMAP``. This index stores a bitmap for each
- unique value in the column. This index is useful for columns with a small
- number of unique values and many rows per value.
 
- The ``LABEL_LIST`` index type is a special index that is used to index list
- columns whose values have small cardinality. For example, a column that
- contains lists of tags (e.g. ``["tag1", "tag2", "tag3"]``) can be indexed
- with a ``LABEL_LIST`` index. This index can only speedup queries with
- ``array_has_any`` or ``array_has_all`` filters.
-
- The ``INVERTED`` index type is used to index document columns. This index
- can conduct full-text searches. For example, a column that contains any word
- of query string "hello world". The results will be ranked by BM25.
+ There are 4 types of scalar indices available today.
+
+ * ``BTREE``. The most common type. This index is inspired
+ by the btree data structure, although only the first few layers of the
+ btree are cached in memory. It performs well on columns
+ with a large number of unique values and few rows per
+ value.
+ * ``BITMAP``. This index stores a bitmap for each unique value in the column.
+ This index is useful for columns with a small number of unique values and
+ many rows per value.
+ * ``LABEL_LIST``. A special index that is used to index list
+ columns whose values have small cardinality. For example, a column that
+ contains lists of tags (e.g. ``["tag1", "tag2", "tag3"]``) can be indexed
+ with a ``LABEL_LIST`` index. This index can only speed up queries with
+ ``array_has_any`` or ``array_has_all`` filters.
+ * ``INVERTED``. Used to index document columns. This index can conduct
+ full-text searches, for example finding rows that contain any word of the
+ query string "hello world". The results will be ranked by BM25.
 
  Note that the ``LANCE_BYPASS_SPILLING`` environment variable can be used to
  bypass spilling to disk. Setting this to true can avoid memory exhaustion
@@ -1285,6 +1265,26 @@ def create_scalar_index(
  "category",
  "BTREE",
  )
+
+ Scalar indices can only speed up scans for basic filters using
+ equality, comparison, range (e.g. ``my_col BETWEEN 0 AND 100``), and set
+ membership (e.g. ``my_col IN (0, 1, 2)``).
+
+ Scalar indices can be used if the filter contains multiple indexed columns and
+ the filter criteria are AND'd or OR'd together
+ (e.g. ``my_col < 0 AND other_col > 100``).
+
+ Scalar indices may be used if the filter contains non-indexed columns but,
+ depending on the structure of the filter, they may not be usable. For example,
+ if the column ``not_indexed`` does not have a scalar index then the filter
+ ``my_col = 0 OR not_indexed = 1`` will not be able to use any scalar index on
+ ``my_col``.
+
+ To determine if a scan is making use of a scalar index you can use
+ ``explain_plan`` to look at the query plan that Lance has created. Queries
+ that use scalar indices will either have a ``ScalarIndexQuery`` relation or a
+ ``MaterializeIndex`` operator.
+
  """
  if isinstance(column, str):
  column = [column]
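
Note on the filter rules moved in the last hunk: the AND/OR behaviour described in the docstring can be illustrated with a small toy model. This is only a sketch of the reasoning, not Lance's planner. The `Cmp`, `And`, `Or` types and the `can_use_scalar_index` helper are hypothetical names invented for this illustration. The docstring only states the OR case with a non-indexed column; treating an AND with one indexed side as usable is an assumption made here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Cmp:
    """Leaf predicate on a single column, e.g. ``my_col = 0`` (hypothetical)."""
    column: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Cmp, And, Or]


def can_use_scalar_index(expr: Expr, indexed: FrozenSet[str]) -> bool:
    """Toy rule: can a scalar index narrow the rows this filter must scan?"""
    if isinstance(expr, Cmp):
        # A leaf can use an index only if its column has one.
        return expr.column in indexed
    left = can_use_scalar_index(expr.left, indexed)
    right = can_use_scalar_index(expr.right, indexed)
    if isinstance(expr, And):
        # Assumption: one indexed side is enough, because the index narrows
        # the candidate rows and the other predicate is re-checked on them.
        return left or right
    # OR: rows matching a non-indexed branch could be anywhere in the data,
    # so every branch must be answerable from an index.
    return left and right


indexed = frozenset({"my_col", "other_col"})

# ``my_col < 0 AND other_col > 100``: both columns indexed.
both_indexed = And(Cmp("my_col"), Cmp("other_col"))
# ``my_col = 0 OR not_indexed = 1``: the docstring's negative example.
or_with_unindexed = Or(Cmp("my_col"), Cmp("not_indexed"))
# ``my_col = 0 AND not_indexed = 1``: usable under the assumption above.
and_with_unindexed = And(Cmp("my_col"), Cmp("not_indexed"))

print(can_use_scalar_index(both_indexed, indexed))        # True
print(can_use_scalar_index(or_with_unindexed, indexed))   # False
print(can_use_scalar_index(and_with_unindexed, indexed))  # True
```

For a real dataset, the docstring's own advice applies: check the output of ``explain_plan`` for a ``ScalarIndexQuery`` relation or a ``MaterializeIndex`` operator rather than relying on a model like this one.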