From b88d6baead15e7e23611e6a4fdaffc727454f58b Mon Sep 17 00:00:00 2001 From: Robert Fratto Date: Mon, 21 Oct 2024 08:47:25 -0400 Subject: [PATCH 01/10] docs(get-started): remove references to Bloom Compactor hash ring The bloom compactor component has been removed in favor of the bloom planner/builder/gateway, none of which use hash rings. --- docs/sources/get-started/hash-rings.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/docs/sources/get-started/hash-rings.md b/docs/sources/get-started/hash-rings.md index 8bb024f4085f..a4a242015ee2 100644 --- a/docs/sources/get-started/hash-rings.md +++ b/docs/sources/get-started/hash-rings.md @@ -31,7 +31,6 @@ These components need to be connected into a hash ring: - query schedulers - compactors - rulers -- bloom compactors (Experimental) These components can optionally be connected into a hash ring: - index gateway @@ -104,13 +103,3 @@ The ruler ring is used to determine which rulers evaluate which rule groups. ## About the index gateway ring The index gateway ring is used to determine which gateway is responsible for which tenant's indexes when queried by rulers or queriers. - -## About the Bloom Compactor ring -{{% admonition type="warning" %}} -This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided. -{{% /admonition %}} - -The Bloom Compactor ring is used to determine which subset of compactors own a given tenant, -and which series fingerprint ranges each compactor owns. -The ring is also used to determine which compactor owns retention. -Retention will be applied by the compactor owning the smallest token in the ring. 
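The ring semantics this patch removes from the docs follow a common pattern: ownership of a key is decided by token position on a hash ring, and retention is applied by the instance owning the smallest token. A minimal sketch of that model, with hypothetical instance names; Loki's real rings (from dskit) register many random tokens per instance and track instance health:

```python
import bisect
import hashlib

def token(name: str) -> int:
    # Map a string to a position on a 32-bit ring. Illustrative only:
    # real rings register many random tokens per instance.
    return int(hashlib.sha256(name.encode()).hexdigest(), 16) % (2**32)

class Ring:
    def __init__(self, instances):
        # Sorted (token, instance) pairs form the ring.
        self.tokens = sorted((token(i), i) for i in instances)

    def owner(self, key: str) -> str:
        # A key is owned by the instance holding the first token at or
        # after the key's position, wrapping around the ring.
        pos = token(key)
        idx = bisect.bisect_left([t for t, _ in self.tokens], pos)
        return self.tokens[idx % len(self.tokens)][1]

    def retention_owner(self) -> str:
        # As the removed section described: retention is applied by the
        # instance owning the smallest token in the ring.
        return self.tokens[0][1]

ring = Ring(["compactor-0", "compactor-1", "compactor-2"])
tenant_owner = ring.owner("tenant-a")  # deterministic for a fixed ring
retention = ring.retention_owner()
```

Because ownership depends only on token order, adding or removing an instance moves ownership of only the key ranges adjacent to its tokens, which is why rings are used for sharding work across replicas.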
From e5350fde075949be3676a6c114fb187358d0a1b6 Mon Sep 17 00:00:00 2001 From: Robert Fratto Date: Mon, 21 Oct 2024 08:49:56 -0400 Subject: [PATCH 02/10] docs(get-started): correct list of components started by backend Replace references to bloom compactor with the new bloom planner and bloom builder components. --- docs/sources/get-started/quick-start.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/sources/get-started/quick-start.md b/docs/sources/get-started/quick-start.md index cfe0c5e6ed87..69910fa0c315 100644 --- a/docs/sources/get-started/quick-start.md +++ b/docs/sources/get-started/quick-start.md @@ -37,7 +37,7 @@ The Docker Compose configuration runs the following components, each in its own - **Gateway** (nginx) which receives requests and redirects them to the appropriate container based on the request's URL. - **Loki read component**: which runs a Query Frontend and a Querier. - **Loki write component**: which runs a Distributor and an Ingester. -- **Loki backend component**: which runs an Index Gateway, Compactor, Ruler, Bloom Compactor (experimental), and Bloom Gateway (experimental). +- **Loki backend component**: which runs an Index Gateway, Compactor, Ruler, Bloom Planner (experimental), Bloom Builder (experimental), and Bloom Gateway (experimental). - **Minio**: which Loki uses to store its index and chunks. - **Grafana**: which provides visualization of the log lines captured within Loki. @@ -141,9 +141,9 @@ This quickstart assumes you are running Linux. - You can access the Grafana Alloy UI at [http://localhost:12345](http://localhost:12345). 6. (Optional) You can check all the containers are running by running the following command: - + ```bash - docker ps -a + docker ps -a ``` @@ -321,7 +321,7 @@ Within the entrypoint section, the Loki data source is configured with the follo - `URL: http://gateway:3100` (URL of the Loki data source. 
Loki uses an nginx gateway to direct traffic to the appropriate component) - `jsonData.httpHeaderName1: "X-Scope-OrgID"` (header name for the organization ID) - `secureJsonData.httpHeaderValue1: "tenant1"` (header value for the organization ID) - + It is important to note when Loki is configured in any other mode other than monolithic deployment, you are required to pass a tenant ID in the header. Without this, queries will return an authorization error. @@ -344,4 +344,4 @@ It's a self-contained environment for learning about Mimir, Loki, Tempo, and Gra The project includes detailed explanations of each component and annotated configurations for a single-instance deployment. You can also push the data from the environment to [Grafana Cloud](https://grafana.com/cloud/). - \ No newline at end of file + From 0117f2baacaee5f7c6080ecbfdc838905e4b5806 Mon Sep 17 00:00:00 2001 From: Robert Fratto Date: Mon, 21 Oct 2024 09:14:28 -0400 Subject: [PATCH 03/10] docs(operations): update Query Acceleration with Blooms topic Update the Query Acceleration with Blooms topic to reference structured metadata blooms over the removed line blooms. Documentation on how to configure bloom components has been temporarily removed while the architecture is still under active changes. --- .../operations/query-acceleration-blooms.md | 246 ++---------------- 1 file changed, 28 insertions(+), 218 deletions(-) diff --git a/docs/sources/operations/query-acceleration-blooms.md b/docs/sources/operations/query-acceleration-blooms.md index 2fec5f292270..59222129fe90 100644 --- a/docs/sources/operations/query-acceleration-blooms.md +++ b/docs/sources/operations/query-acceleration-blooms.md @@ -1,8 +1,8 @@ --- -title: Query Acceleration with Blooms (Experimental) -menuTitle: Query Acceleration with Blooms -description: Describes how to enable and configure query acceleration with blooms. 
-weight: +title: Query Acceleration with Blooms (Experimental) +menuTitle: Query Acceleration with Blooms +description: Describes how to enable and configure query acceleration with blooms. +weight: keywords: - blooms - query acceleration @@ -10,235 +10,45 @@ keywords: # Query Acceleration with Blooms (Experimental) {{% admonition type="warning" %}} -This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided. +This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided. {{% /admonition %}} -Loki 3.0 leverages [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) to speed up queries by reducing the -amount of data Loki needs to load from the store and iterate through. Loki is often used to run “needle in a haystack” -queries; these are queries where a large number of log lines are searched, but only a few log lines match the [filtering -expressions]({{< relref "../query/log_queries#line-filter-expression" >}}) of the query. -Some common use cases are needing to find a specific text pattern in a message, or all logs tied to a specific customer ID. +Loki leverages [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) to speed up queries by reducing the amount of data Loki needs to load from the store and iterate through. +Loki is often used to run "needle in a haystack" queries; these are queries where a large number of log lines are searched, but only a few log lines match the query. +Some common use cases are needing to find all logs tied to a specific trace ID or customer ID. 
An example of such queries would be looking for a trace ID on a whole cluster for the past 24 hours: ```logql -{cluster="prod"} |= "traceID=3c0e3dcd33e7" +{cluster="prod"} | traceID="3c0e3dcd33e7" ``` -Loki would download all the chunks for all the streams matching `{cluster=”prod”}` for the last 24 hours and iterate -through each log line in the chunks checking if the string `traceID=3c0e3dcd33e7` is present. +Without accelerated filtering, Loki downloads all the chunks for all the streams matching `{cluster="prod"}` for the last 24 hours and iterates through each log line in the chunks, checking if the [structured metadata][] key `traceID` with value `3c0e3dcd33e7` is present. -With accelerated filtering, Loki is able to skip most of the chunks and only process the ones where we have a -statistical confidence that the string might be present. -The underlying blooms are built by the [Bloom Builder](#bloom-planner-and-builder) component -and served by the new [Bloom Gateway](#bloom-gateway) component. +With accelerated filtering, Loki is able to skip most of the chunks and only process the ones where we have a statistical confidence that the structured metadata pair might be present. -## Enable Query Acceleration with Blooms -{{< admonition type="warning" >}} -Building and querying bloom filters are by design not supported in single binary deployment. -It can be used with Single Scalable deployment (SSD), but it is recommended to -run bloom components only in fully distributed microservice mode. -The reason is that bloom filters also come with a relatively high cost for both building -and querying the bloom filters that only pays off at large scale deployments. 
-{{< /admonition >}}
+## Adding data to blooms

-To start building and using blooms you need to:
-- Deploy the [Bloom Planner and Builder](#bloom-planner-and-builder) components (as [microservices][microservices] or via the [SSD][ssd] `backend` target) and enable the components in the [Bloom Build config][bloom-build-cfg].
-- Deploy the [Bloom Gateway](#bloom-gateway) component (as a [microservice][microservices] or via the [SSD][ssd] `backend` target) and enable the component in the [Bloom Gateway config][bloom-gateway-cfg].
-- Enable blooms building and filtering for each tenant individually, or for all of them by default.
+To make data available for query acceleration, send [structured metadata][] to Loki. Loki builds blooms from all structured metadata keys and values.

-```yaml
-# Configuration block for the bloom creation.
-bloom_build:
-  enabled: true
-  planner:
-    planning_interval: 6h
-  builder:
-    planner_address: bloom-planner..svc.cluster.local.:9095
+## Querying blooms

-# Configuration block for bloom filtering.
-bloom_gateway:
-  enabled: true
-  client:
-    addresses: dnssrvnoa+_bloom-gateway-grpc._tcp.bloom-gateway-headless..svc.cluster.local
+Loki will check blooms for any [label filter expression][] that satisfies _all_ of the following criteria:

-# Enable blooms creation and filtering for all tenants by default
-# or do it on a per-tenant basis.
-limits_config:
-  bloom_creation_enabled: true
-  bloom_split_series_keyspace_by: 1024
-  bloom_gateway_enable_filtering: true
-```
-
-For more configuration options refer to the [Bloom Gateway][bloom-gateway-cfg], [Bloom Build][bloom-build-cfg] and
-[per tenant-limits][tenant-limits] configuration docs.
-We strongly recommend reading the whole documentation for this experimental feature before using it. 
- -## Bloom Planner and Builder -Building bloom filters from the chunks in the object storage is done by two components: the Bloom Planner and the Bloom -Builder, where the planner creates tasks for bloom building, and sends the tasks to the builders to process and -upload the resulting blocks. -Bloom filters are grouped in bloom blocks spanning multiple streams (also known as series) and chunks from a given day. -To learn more about how blocks and metadata files are organized, refer to the -[Building and querying blooms](#building-and-querying-blooms) section below. - -The Bloom Planner runs as a single instance and calculates the gaps in fingerprint ranges for a certain time period for -a tenant for which bloom filters need to be built. It dispatches these tasks to the available builders. -The planner also applies the [blooms retention](#retention). - -The Bloom Builder is a stateless horizontally scalable component and can be scaled independently of the planner to fulfill -the processing demand of the created tasks. - -You can find all the configuration options for these components in the [Configure section for the Bloom Builder][bloom-build-cfg]. -Refer to the [Enable Query Acceleration with Blooms](#enable-query-acceleration-with-blooms) section below for -a configuration snippet enabling this feature. - -### Retention -The Bloom Planner applies bloom block retention on object storage. Retention is disabled by default. -When enabled, retention is applied to all tenants. The retention for each tenant is the longest of its [configured][tenant-limits] -general retention (`retention_period`) and the streams retention (`retention_stream`). - -For example, in the following example, tenant A has a bloom retention of 30 days, and tenant B a bloom retention of 40 days. 
- -```yaml -overrides: - "A": - retention_period: 30d - "B": - retention_period: 30d - retention_stream: - - selector: '{namespace="prod"}' - priority: 1 - period: 40d -``` - -### Sizing and configuration -The single planner instance runs the planning phase for bloom blocks for each tenant in the given interval -and puts the created tasks to an internal task queue. -Builders process tasks sequentially by pulling them from the queue. The amount of builder replicas required to complete -all pending tasks before the next planning iteration depends on the value of `-bloom-build.planner.bloom_split_series_keyspace_by`, -the amount of tenants, and the log volume of the streams. - -The maximum block size is configured per tenant via `-bloom-build.max-block-size`. -The actual block size might exceed this limit given that we append streams blooms to the block until the -block is larger than the configured maximum size. Blocks are created in memory and as soon as they are written to the -object store they are freed. Chunks and TSDB files are downloaded from the object store to the file system. -We estimate that builders are able to process 4MB worth of data per second per core. - -## Bloom Gateway -Bloom Gateways handle chunks filtering requests from the [index gateway](https://grafana.com/docs/loki//get-started/components/#index-gateway). -The service takes a list of chunks and a filtering expression and matches them against the blooms, -filtering out those chunks not matching the given filter expression. - -This component is horizontally scalable and every instance only owns a subset of the stream -fingerprint range for which it performs the filtering. -The sharding of the data is performed on the client side using DNS discovery of the server instances -and the [jumphash](https://arxiv.org/abs/1406.2294) algorithm for consistent hashing -and even distribution of the stream fingerprints across Bloom Gateway instances. 
-You can find all the configuration options for this component in the Configure section for the [Bloom Gateways][bloom-gateway-cfg].
-Refer to the [Enable Query Acceleration with Blooms](#enable-query-acceleration-with-blooms) section below for a configuration snippet enabling this feature.
-
-### Sizing and configuration
-Bloom Gateways use their local file system as a Least Recently Used (LRU) cache for blooms that are
-downloaded from object storage. The size of the blooms depend on the ingest volume and the log content cardinality,
-as well as on build settings of the blooms, namely n-gram length, skip-factor, and false-positive-rate.
-With default settings, bloom filters make up roughly 3% of the chunk data.
-
-Example calculation for storage requirements of blooms for a single tenant.
-```
-100 MB/s ingest rate ~> 8.6 TB/day chunks ~> 260 GB/day blooms
-```
+* The label filter expression uses **string equality**, such as `| key="value"`.
+* The label filter expression is querying for structured metadata and not a stream label.
+* The label filter expression is placed before any [parser expression][], [labels format expression][], [drop labels expression][], or [keep labels expression][].

-Since reading blooms depends heavily on disk IOPS, Bloom Gateways should make use of multiple,
-locally attached SSD disks (NVMe) to increase i/o throughput.
-Multiple directories on different disk mounts can be specified using the `-bloom.shipper.working-directory` [setting][storage-config-cfg]
-when using a comma separated list of mount points, for example:
-```
--bloom.shipper.working-directory="/mnt/data0,/mnt/data1,/mnt/data2,/mnt/data3"
-```
+To take full advantage of blooms, ensure that filtering structured metadata is done before any parse expression:

-Bloom Gateways need to deal with relatively large files: the bloom filter blocks. 
-Even though the binary format of the bloom blocks allows for reading them into memory in smaller pages, -the memory consumption depends on the amount of pages that are concurrently loaded into memory for processing. -The product of three settings control the maximum amount of bloom data in memory at any given -time: `-bloom-gateway.worker-concurrency`, `-bloom-gateway.block-query-concurrency`, and `-bloom.max-query-page-size`. - -Example, assuming 4 CPU cores: -``` --bloom-gateway.worker-concurrency=4 // 1x NUM_CORES --bloom-gateway.block-query-concurrency=8 // 2x NUM_CORES --bloom.max-query-page-size=64MiB - -4 x 8 x 64MiB = 2048MiB +```logql +{cluster="prod"} | logfmt | json | detected_level="error" # NOT ACCELERATED: structured metadata filter is after a parse stage +{cluster="prod"} | detected_level="error" | logfmt | json # ACCELERATED: structured metadata filter is before any parse stage ``` -Here, the memory requirement for block processing is 2GiB. -To get the minimum requirements for the Bloom Gateways, you need to double the value. - -## Building and querying blooms -Bloom filters are built per stream and aggregated together into block files. -Streams are assigned to blocks by their fingerprint, following the same ordering scheme as Loki’s TSDB and sharding calculation. -This gives a data locality benefit when querying as streams in the same shard are likely to be in the same block. - -In addition to blocks, builders maintain a list of metadata files containing references to bloom blocks and the -TSDB index files they were built from. Gateways and the planner use these metadata files to discover existing blocks. - -Every `-bloom-build.planner.interval`, the planner will load the latest TSDB files for all tenants for -which bloom building is enabled, and compares the TSDB files with the latest bloom metadata files. 
-If there are new TSDB files or any of them have changed, the planner will create a task for the streams and chunks -referenced by the TSDB file. - -The builder pulls a task from the planner's queue and processes the containing streams and chunks. -For a given stream, the builder will iterate through all the log lines inside its new chunks and build a bloom for the -stream. In case of changes for a previously processed TSDB file, builders will try to reuse blooms from existing blocks -instead of building new ones from scratch. -The builder computes [n-grams](https://en.wikipedia.org/wiki/N-gram#:~:text=An%20n%2Dgram%20is%20a,pairs%20extracted%20from%20a%20genome.) -for each log line of each chunk of a stream and appends both the hash of each n-gram and the hash of each n-gram plus -the chunk identifier to the bloom. The former allows gateways to skip whole streams while the latter is for skipping -individual chunks. - -For example, given a log line `abcdef` in the chunk `c6dj8g`, we compute its n-grams: `abc`, `bcd`, `cde`, `def`. -And append to the stream bloom the following hashes: `hash("abc")`, `hash("abc" + "c6dj8g")` ... `hash("def")`, `hash("def" + "c6dj8g")`. - -By adding n-grams to blooms instead of whole log lines, we can perform partial matches. -For the example above, a filter expression `|= "bcd"` would match against the bloom. -The filter `|= "bcde` would also match the bloom since we decompose the filter into n-grams: -`bcd`, `cde` which both are present in the bloom. - -N-grams sizes are configurable. The longer the n-gram is, the fewer tokens we need to append to the blooms, -but the longer filtering expressions need to be able to check them against blooms. -For the example above, where the n-gram length is 3, we need filtering expressions that have at least 3 characters. 
- -### Queries for which blooms are used -Loki will check blooms for any log filtering expression within a query that satisfies the following criteria: -- The filtering expression contains at least as many characters as the n-gram length used to build the blooms. - - For example, if the n-grams length is 5, the filter `|= "foo"` will not take advantage of blooms but `|= "foobar"` would. -- If the filter is a regex, we use blooms only if we can simplify the regex to a set of simple matchers. - - For example, `|~ "(error|warn)"` would be simplified into `|= "error" or "warn"` thus would make use of blooms, - whereas `|~ "f.*oo"` would not be simplifiable. -- The filtering expression is a match (`|=`) or regex match (`|~`) filter. We don’t use blooms for not equal (`!=`) or not regex (`!~`) expressions. - - For example, `|= "level=error"` would use blooms but `!= "level=error"` would not. -- The filtering expression is placed before a [line format expression](https://grafana.com/docs/loki//query/log_queries/#line-format-expression). - - For example, with `|= "level=error" | logfmt | line_format "ERROR {{.err}}" |= "traceID=3ksn8d4jj3"`, - the first filter (`|= "level=error"`) will benefit from blooms but the second one (`|= "traceID=3ksn8d4jj3"`) will not. - -## Query sharding -Query acceleration does not just happen while processing chunks, but also happens from the query planning phase where -the query frontend applies [query sharding](https://lokidex.com/posts/tsdb/#sharding). -Loki 3.0 introduces a new [per-tenant configuration][tenant-limits] flag `tsdb_sharding_strategy` which defaults to computing -shards as in previous versions of Loki by using the index stats to come up with the closest power of two that would -optimistically divide the data to process in shards of roughly the same size. Unfortunately, -the amount of data each stream has is often unbalanced with the rest, -therefore, some shards end up processing more data than others. 
- -Query acceleration introduces a new sharding strategy: `bounded`, which uses blooms to reduce the chunks to be -processed right away during the planning phase in the query frontend, -as well as evenly distributes the amount of chunks each sharded query will need to process. - -[tenant-limits]: https://grafana.com/docs/loki//configure/#limits_config -[bloom-gateway-cfg]: https://grafana.com/docs/loki//configure/#bloom_gateway -[bloom-build-cfg]: https://grafana.com/docs/loki//configure/#bloom_build -[storage-config-cfg]: https://grafana.com/docs/loki//configure/#storage_config -[microservices]: https://grafana.com/docs/loki//get-started/deployment-modes/#microservices-mode -[ssd]: https://grafana.com/docs/loki//get-started/deployment-modes/#simple-scalable +[structured metadata]: {{< relref "../get-started/labels/structured-metadata" >}} +[label filter expression]: {{< relref "../query/log_queries/_index.md#label-filter-expression" >}} +[parser expression]: {{< relref "../query/log_queries/_index.md#parser-expression" >}} +[labels format expression]: {{< relref "../query/log_queries/_index.md#labels-format-expression" >}} +[drop labels expression]: {{< relref "../query/log_queries/_index.md#drop-labels-expression" >}} +[keep labels expression]: {{< relref "../query/log_queries/_index.md#keep-labels-expression" >}} From 4561c6fdb5419c8fad2b000241d8031d0302cb3f Mon Sep 17 00:00:00 2001 From: Robert Fratto Date: Tue, 22 Oct 2024 10:08:43 -0400 Subject: [PATCH 04/10] docs: bring back old content for how to operate bloom components --- .../operations/query-acceleration-blooms.md | 176 +++++++++++++++++- 1 file changed, 173 insertions(+), 3 deletions(-) diff --git a/docs/sources/operations/query-acceleration-blooms.md b/docs/sources/operations/query-acceleration-blooms.md index 59222129fe90..3c8a8f79911b 100644 --- a/docs/sources/operations/query-acceleration-blooms.md +++ b/docs/sources/operations/query-acceleration-blooms.md @@ -9,8 +9,9 @@ keywords: --- # Query 
Acceleration with Blooms (Experimental)
+
{{% admonition type="warning" %}}
-This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided. 
+This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided.
{{% /admonition %}}

Loki leverages [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) to speed up queries by reducing the amount of data Loki needs to load from the store and iterate through.
Loki is often used to run "needle in a haystack" queries; these are queries where a large number of log lines are searched, but only a few log lines match the query.
Some common use cases are needing to find all logs tied to a specific trace ID or customer ID.
@@ -27,11 +28,13 @@ Without accelerated filtering, Loki downloads all the chunks for all the streams
With accelerated filtering, Loki is able to skip most of the chunks and only process the ones where we have a statistical confidence that the structured metadata pair might be present.

-## Adding data to blooms
+## Using query acceleration
+
+### Add data to blooms

To make data available for query acceleration, send [structured metadata][] to Loki. Loki builds blooms from all structured metadata keys and values.

-## Querying blooms
+### Query blooms

Loki will check blooms for any [label filter expression][] that satisfies _all_ of the following criteria:
@@ -46,9 +49,176 @@ To take full advantage of blooms, ensure that filtering structured metadata is d
{cluster="prod"} | detected_level="error" | logfmt | json # ACCELERATED: structured metadata filter is before any parse stage
```

+## Operating blooms
+
+### Enable Query Acceleration with Blooms
+
+{{< admonition type="warning" >}}
+Building and querying bloom filters are by design not supported in single binary deployment.
+They can be used with the Simple Scalable deployment (SSD), but it is recommended to run bloom components only in fully distributed microservice mode.
+The reason is that bloom filters also come with a relatively high cost for both building and querying the bloom filters that only pays off at large scale deployments. 
+{{< /admonition >}} + +To start building and using blooms you need to: + +- Deploy the [Bloom Planner and Builder](#bloom-planner-and-builder) components (as [microservices][microservices] or via the [SSD][ssd] `backend` target) and enable the components in the [Bloom Build config][bloom-build-cfg]. +- Deploy the [Bloom Gateway](#bloom-gateway) component (as a [microservice][microservices] or via the [SSD][ssd] `backend` target) and enable the component in the [Bloom Gateway config][bloom-gateway-cfg]. +- Enable blooms building and filtering for each tenant individually, or for all of them by default. + +```yaml +# Configuration block for the bloom creation. +bloom_build: + enabled: true + planner: + planning_interval: 6h + builder: + planner_address: bloom-planner..svc.cluster.local.:9095 + +# Configuration block for bloom filtering. +bloom_gateway: + enabled: true + client: + addresses: dnssrvnoa+_bloom-gateway-grpc._tcp.bloom-gateway-headless..svc.cluster.local + +# Enable blooms creation and filtering for all tenants by default +# or do it on a per-tenant basis. +limits_config: + bloom_creation_enabled: true + bloom_split_series_keyspace_by: 1024 + bloom_gateway_enable_filtering: true +``` + +For more configuration options refer to the [Bloom Gateway][bloom-gateway-cfg], [Bloom Build][bloom-build-cfg] and [per tenant-limits][tenant-limits] configuration docs. +We strongly recommend reading the whole documentation for this experimental feature before using it. + +### Bloom Planner and Builder + +Building bloom filters from the chunks in the object storage is done by two components: the Bloom Planner and the Bloom +Builder, where the planner creates tasks for bloom building, and sends the tasks to the builders to process and upload the resulting blocks. +Bloom filters are grouped in bloom blocks spanning multiple streams (also known as series) and chunks from a given day. 
+To learn more about how blocks and metadata files are organized, refer to the [Building blooms](#building-blooms) section below.
+
+The Bloom Planner runs as a single instance and calculates the gaps in fingerprint ranges for a certain time period for a tenant for which bloom filters need to be built.
+It dispatches these tasks to the available builders. The planner also applies the [blooms retention](#retention).
+
+{{< admonition type="warning" >}}
+Do not run more than one instance of the Bloom Planner.
+{{< /admonition >}}
+
+The Bloom Builder is a stateless horizontally scalable component and can be scaled independently of the planner to fulfill the processing demand of the created tasks.
+
+You can find all the configuration options for these components in the [Configure section for the Bloom Builder][bloom-build-cfg].
+Refer to the [Enable Query Acceleration with Blooms](#enable-query-acceleration-with-blooms) section above for a configuration snippet enabling this feature.
+
+#### Retention
+
+The Bloom Planner applies bloom block retention on object storage. Retention is disabled by default.
+When enabled, retention is applied to all tenants. The retention for each tenant is the longest of its [configured][tenant-limits] general retention (`retention_period`) and the streams retention (`retention_stream`).
+
+In the following example, tenant A has a bloom retention of 30 days, and tenant B a bloom retention of 40 days for the `{namespace="prod"}` stream.
+
+```yaml
+overrides:
+  "A":
+    retention_period: 30d
+  "B":
+    retention_period: 30d
+    retention_stream:
+      - selector: '{namespace="prod"}'
+        priority: 1
+        period: 40d
+```
+
+#### Sizing and configuration
+
+The single planner instance runs the planning phase for bloom blocks for each tenant in the given interval and puts the created tasks onto an internal task queue.
+Builders process tasks sequentially by pulling them from the queue. 
The number of builder replicas required to complete all pending tasks before the next planning iteration depends on the value of `-bloom-build.planner.bloom_split_series_keyspace_by`, the number of tenants, and the log volume of the streams.
+
+The maximum block size is configured per tenant via `-bloom-build.max-block-size`.
+The actual block size might exceed this limit given that we append stream blooms to the block until the block is larger than the configured maximum size.
+Blocks are created in memory and as soon as they are written to the object store they are freed. Chunks and TSDB files are downloaded from the object store to the file system.
+We estimate that builders are able to process 4MB worth of data per second per core.
+
+### Bloom Gateway
+
+Bloom Gateways handle chunk filtering requests from the [index gateway](https://grafana.com/docs/loki//get-started/components/#index-gateway).
+The service takes a list of chunks and a filtering expression and matches them against the blooms, filtering out those chunks not matching the given filter expression.
+
+This component is horizontally scalable and every instance only owns a subset of the stream fingerprint range for which it performs the filtering.
+The sharding of the data is performed on the client side using DNS discovery of the server instances and the [jumphash](https://arxiv.org/abs/1406.2294) algorithm for consistent hashing and even distribution of the stream fingerprints across Bloom Gateway instances.
+
+You can find all the configuration options for this component in the Configure section for the [Bloom Gateways][bloom-gateway-cfg].
+Refer to the [Enable Query Acceleration with Blooms](#enable-query-acceleration-with-blooms) section above for a configuration snippet enabling this feature.
+
+#### Sizing and configuration
+
+Bloom Gateways use their local file system as a Least Recently Used (LRU) cache for blooms that are downloaded from object storage. 
+The size of the blooms depends on the ingest volume and the number of unique structured metadata key-value pairs, as well as on build settings of the blooms, namely the false-positive-rate.
+With default settings, bloom filters make up <1% of the raw structured metadata size.
+
+Since reading blooms depends heavily on disk IOPS, Bloom Gateways should make use of multiple, locally attached SSD disks (NVMe) to increase I/O throughput.
+Multiple directories on different disk mounts can be specified using the `-bloom.shipper.working-directory` [setting][storage-config-cfg] when using a comma-separated list of mount points, for example:
+
+```
+-bloom.shipper.working-directory="/mnt/data0,/mnt/data1,/mnt/data2,/mnt/data3"
+```
+
+Bloom Gateways need to deal with relatively large files: the bloom filter blocks.
+Even though the binary format of the bloom blocks allows for reading them into memory in smaller pages, the memory consumption depends on the number of pages that are concurrently loaded into memory for processing.
+The product of three settings controls the maximum amount of bloom data in memory at any given time: `-bloom-gateway.worker-concurrency`, `-bloom-gateway.block-query-concurrency`, and `-bloom.max-query-page-size`.
+
+For example, assuming 4 CPU cores:
+
+```
+-bloom-gateway.worker-concurrency=4      // 1x NUM_CORES
+-bloom-gateway.block-query-concurrency=8 // 2x NUM_CORES
+-bloom.max-query-page-size=64MiB
+
+4 x 8 x 64MiB = 2048MiB
+```
+
+Here, the memory requirement for block processing is 2GiB.
+To get the minimum requirements for the Bloom Gateways, you need to double the value.
+
+### Building blooms
+
+Bloom filters are built per stream and aggregated together into block files.
+Streams are assigned to blocks by their fingerprint, following the same ordering scheme as Loki’s TSDB and sharding calculation.
+This gives a data locality benefit when querying as streams in the same shard are likely to be in the same block. 
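The fingerprint-ordered packing described above, together with the size cap noted earlier (`-bloom-build.max-block-size`, which a block may slightly exceed), can be sketched as follows. This is an illustrative simplification with made-up sizes, not Loki's actual builder code:

```python
def assign_streams_to_blocks(streams, max_block_size):
    """Pack fingerprint-ordered streams into blocks.

    `streams` is a list of (fingerprint, bloom_size_bytes) pairs. Mirrors the
    documented behavior: streams are appended in fingerprint order, and a
    block is cut only after it grows past the configured maximum, so the last
    stream appended may push a block over the limit.
    """
    blocks, current, current_size = [], [], 0
    for fingerprint, size in sorted(streams):
        current.append(fingerprint)
        current_size += size
        if current_size > max_block_size:
            blocks.append(current)
            current, current_size = [], 0
    if current:
        blocks.append(current)
    return blocks

# Streams with 40-byte blooms and a 100-byte block cap: the first block is
# cut once it exceeds the cap, the remainder lands in a second block.
blocks = assign_streams_to_blocks([(3, 40), (1, 40), (2, 40), (4, 40)], 100)
# → [[1, 2, 3], [4]]
```

Sorting by fingerprint before packing is what produces the data locality described above: streams that share a query shard have adjacent fingerprints and therefore tend to land in the same block.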
+ +In addition to blocks, builders maintain a list of metadata files containing references to bloom blocks and the +TSDB index files they were built from. Gateways and the planner use these metadata files to discover existing blocks. + +Every `-bloom-build.planner.interval`, the planner loads the latest TSDB files for all tenants for which bloom building is enabled, and compares them with the latest bloom metadata files. +If there are new TSDB files or any of them have changed, the planner will create a task for the streams and chunks referenced by the TSDB file. + +The builder pulls a task from the planner's queue and processes the streams and chunks it contains. +For a given stream, the builder will iterate through all the log lines inside its new chunks and build a bloom for the stream. +If a previously processed TSDB file has changed, builders will try to reuse blooms from existing blocks instead of building new ones from scratch. +The builder extracts structured metadata from each log line of each chunk of a stream and appends the hash of each key, value, and key-value pair to the bloom, followed by the hashes combined with the chunk identifier. +The first set of hashes allows gateways to skip whole streams, while the latter is for skipping individual chunks. + +For example, given structured metadata `foo=bar` in the chunk `c6dj8g`, we append to the stream bloom the following hashes: `hash("foo")`, `hash("bar")`, `hash("foo=bar")`, `hash("c6dj8g" + "foo")` ... `hash("c6dj8g" + "foo=bar")`. + +### Query sharding + +Query acceleration does not just happen while processing chunks, but also happens from the query planning phase where the query frontend applies [query sharding](https://lokidex.com/posts/tsdb/#sharding).
+Loki 3.0 introduces a new [per-tenant configuration][tenant-limits] flag `tsdb_sharding_strategy` which defaults to computing shards as in previous versions of Loki by using the index stats to come up with the closest power of two that would optimistically divide the data to process in shards of roughly the same size. +Unfortunately, the amount of data each stream has is often unbalanced with the rest; therefore, some shards end up processing more data than others. + +Query acceleration introduces a new sharding strategy: `bounded`, which uses blooms to reduce the chunks to be processed right away during the planning phase in the query frontend, as well as evenly distributes the amount of chunks each sharded query will need to process. + [structured metadata]: {{< relref "../get-started/labels/structured-metadata" >}} [label filter expression]: {{< relref "../query/log_queries/_index.md#label-filter-expression" >}} [parser expression]: {{< relref "../query/log_queries/_index.md#parser-expression" >}} [labels format expression]: {{< relref "../query/log_queries/_index.md#labels-format-expression" >}} [drop labels expression]: {{< relref "../query/log_queries/_index.md#drop-labels-expression" >}} [keep labels expression]: {{< relref "../query/log_queries/_index.md#keep-labels-expression" >}} +[tenant-limits]: https://grafana.com/docs/loki//configure/#limits_config +[bloom-gateway-cfg]: https://grafana.com/docs/loki//configure/#bloom_gateway +[bloom-build-cfg]: https://grafana.com/docs/loki//configure/#bloom_build +[storage-config-cfg]: https://grafana.com/docs/loki//configure/#storage_config +[microservices]: https://grafana.com/docs/loki//get-started/deployment-modes/#microservices-mode +[ssd]: https://grafana.com/docs/loki//get-started/deployment-modes/#simple-scalable + + From 9b974dc4dac6c189cad22dfff7180463d66825cd Mon Sep 17 00:00:00 2001 From: Robert Fratto Date: Tue, 22 Oct 2024 13:43:31 -0400 Subject: [PATCH 05/10] Update
docs/sources/operations/query-acceleration-blooms.md Co-authored-by: J Stickler --- docs/sources/operations/query-acceleration-blooms.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sources/operations/query-acceleration-blooms.md b/docs/sources/operations/query-acceleration-blooms.md index 3c8a8f79911b..d5e9817c157b 100644 --- a/docs/sources/operations/query-acceleration-blooms.md +++ b/docs/sources/operations/query-acceleration-blooms.md @@ -32,7 +32,7 @@ With accelerated filtering, Loki is able to skip most of the chunks and only pro ### Add data to blooms -To make data available for query acceleration, send [structured metadata][] to Loki. Loki builds blooms from all strucutred metadata keys and values. +To make data available for query acceleration, send [structured metadata][] to Loki. Loki builds blooms from all structured metadata keys and values. ### Query blooms From f8492f7403743545a1a7a9e118659d6ff53d6e1c Mon Sep 17 00:00:00 2001 From: Robert Fratto Date: Tue, 22 Oct 2024 13:46:36 -0400 Subject: [PATCH 06/10] docs: remove relref usages --- docs/sources/operations/query-acceleration-blooms.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/sources/operations/query-acceleration-blooms.md b/docs/sources/operations/query-acceleration-blooms.md index d5e9817c157b..af1fe5391852 100644 --- a/docs/sources/operations/query-acceleration-blooms.md +++ b/docs/sources/operations/query-acceleration-blooms.md @@ -208,12 +208,12 @@ Unfortunately, the amount of data each stream has is often unbalanced with the r Query acceleration introduces a new sharding strategy: `bounded`, which uses blooms to reduce the chunks to be processed right away during the planning phase in the query frontend, as well as evenly distributes the amount of chunks each sharded query will need to process. 
-[structured metadata]: {{< relref "../get-started/labels/structured-metadata" >}} -[label filter expression]: {{< relref "../query/log_queries/_index.md#label-filter-expression" >}} -[parser expression]: {{< relref "../query/log_queries/_index.md#parser-expression" >}} -[labels format expression]: {{< relref "../query/log_queries/_index.md#labels-format-expression" >}} -[drop labels expression]: {{< relref "../query/log_queries/_index.md#drop-labels-expression" >}} -[keep labels expression]: {{< relref "../query/log_queries/_index.md#keep-labels-expression" >}} +[structured metadata]: https://grafana.com/docs/loki//get-started/labels/structured-metadata +[label filter expression]: https://grafana.com/docs/loki//query/log_queries/#label-filter-expression +[parser expression]: https://grafana.com/docs/loki//query/log_queries/#parser-expression +[labels format expression]: https://grafana.com/docs/loki//query/log_queries/#labels-format-expression +[drop labels expression]: https://grafana.com/docs/loki//query/log_queries/#drop-labels-expression +[keep labels expression]: https://grafana.com/docs/loki//query/log_queries/#keep-labels-expression [tenant-limits]: https://grafana.com/docs/loki//configure/#limits_config [bloom-gateway-cfg]: https://grafana.com/docs/loki//configure/#bloom_gateway [bloom-build-cfg]: https://grafana.com/docs/loki//configure/#bloom_build From 97cb0003358e35a434178ec1b750653152ed9a43 Mon Sep 17 00:00:00 2001 From: Robert Fratto Date: Thu, 24 Oct 2024 11:13:30 -0400 Subject: [PATCH 07/10] docs: separate usage of bloom filters and management of bloom filters --- ...cceleration-blooms.md => bloom-filters.md} | 57 ++++++------------- docs/sources/query/query_accceleration.md | 45 +++++++++++++++ 2 files changed, 61 insertions(+), 41 deletions(-) rename docs/sources/operations/{query-acceleration-blooms.md => bloom-filters.md} (86%) create mode 100644 docs/sources/query/query_accceleration.md diff --git 
a/docs/sources/operations/query-acceleration-blooms.md b/docs/sources/operations/bloom-filters.md similarity index 86% rename from docs/sources/operations/query-acceleration-blooms.md rename to docs/sources/operations/bloom-filters.md index af1fe5391852..c1067a8346f4 100644 --- a/docs/sources/operations/query-acceleration-blooms.md +++ b/docs/sources/operations/bloom-filters.md @@ -1,14 +1,16 @@ --- -title: Query Acceleration with Blooms (Experimental) -menuTitle: Query Acceleration with Blooms -description: Describes how to enable and configure query acceleration with blooms. +title: Bloom filters (Experimental) +menuTitle: Bloom filters +description: Describes how to enable and configure query acceleration with bloom filters. weight: keywords: - blooms - query acceleration +aliases: + - ./query-acceleration-blooms --- -# Query Acceleration with Blooms (Experimental) +# Bloom filters (Experimental) {{% admonition type="warning" %}} This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided. @@ -28,30 +30,9 @@ Without accelerated filtering, Loki downloads all the chunks for all the streams With accelerated filtering, Loki is able to skip most of the chunks and only process the ones where we have a statistical confidence that the structured metadata pair might be present. -## Using query acceleration +To learn how to write queries to use bloom filters, refer to [Query acceleration][]. -### Add data to blooms - -To make data available for query acceleration, send [structured metadata][] to Loki. Loki builds blooms from all structured metadata keys and values. - -### Query blooms - -Loki will check blooms for any [label filter expression][] that satisfies _all_ of the following criteria: - -* The label filter expression using **string equality**, such as `| key="value"`. -* The label filter expression is querying for structured metadata and not a stream label. 
-* The label filter expression is placed before any [parser expression][], [labels format expression][], [drop labels expression][], or [keep labels expression][]. - -To take full advantage of blooms, ensure that filtering structured metadata is done before any parse expression: - -```logql -{cluster="prod"} | logfmt | json | detected_level="error" # NOT ACCELERATED: structured metadata filter is after a parse stage -{cluster="prod"} | detected_level="error" | logfmt | json # ACCELERATED: structured metadata filter is before any parse stage -``` - -## Operating blooms - -### Enable Query Acceleration with Blooms +## Enable bloom filters {{< admonition type="warning" >}} Building and querying bloom filters are by design not supported in single binary deployment. @@ -91,7 +72,7 @@ limits_config: For more configuration options refer to the [Bloom Gateway][bloom-gateway-cfg], [Bloom Build][bloom-build-cfg] and [per tenant-limits][tenant-limits] configuration docs. We strongly recommend reading the whole documentation for this experimental feature before using it. -### Bloom Planner and Builder +## Bloom Planner and Builder Building bloom filters from the chunks in the object storage is done by two components: the Bloom Planner and the Bloom Builder, where the planner creates tasks for bloom building, and sends the tasks to the builders to process and upload the resulting blocks. @@ -110,7 +91,7 @@ The Bloom Builder is a stateless horizontally scalable component and can be scal You can find all the configuration options for these components in the [Configure section for the Bloom Builder][bloom-build-cfg]. Refer to the [Enable Query Acceleration with Blooms](#enable-query-acceleration-with-blooms) section below for a configuration snippet enabling this feature. -#### Retention +### Retention The Bloom Planner applies bloom block retention on object storage. Retention is disabled by default. When enabled, retention is applied to all tenants. 
The retention for each tenant is the longest of its [configured][tenant-limits] general retention (`retention_period`) and the streams retention (`retention_stream`). @@ -129,7 +110,7 @@ overrides: period: 40d ``` -#### Sizing and configuration +### Sizing and configuration The single planner instance runs the planning phase for bloom blocks for each tenant in the given interval and puts the created tasks to an internal task queue. Builders process tasks sequentially by pulling them from the queue. The amount of builder replicas required to complete all pending tasks before the next planning iteration depends on the value of `-bloom-build.planner.bloom_split_series_keyspace_by`, the amount of tenants, and the log volume of the streams. @@ -139,7 +120,7 @@ The actual block size might exceed this limit given that we append streams bloom Blocks are created in memory and as soon as they are written to the object store they are freed. Chunks and TSDB files are downloaded from the object store to the file system. We estimate that builders are able to process 4MB worth of data per second per core. -### Bloom Gateway +## Bloom Gateway Bloom Gateways handle chunks filtering requests from the [index gateway](https://grafana.com/docs/loki//get-started/components/#index-gateway). The service takes a list of chunks and a filtering expression and matches them against the blooms, filtering out those chunks not matching the given filter expression. @@ -150,7 +131,7 @@ The sharding of the data is performed on the client side using DNS discovery of You can find all the configuration options for this component in the Configure section for the [Bloom Gateways][bloom-gateway-cfg]. Refer to the [Enable Query Acceleration with Blooms](#enable-query-acceleration-with-blooms) section below for a configuration snippet enabling this feature. 
-#### Sizing and configuration +### Sizing and configuration Bloom Gateways use their local file system as a Least Recently Used (LRU) cache for blooms that are downloaded from object storage. The size of the blooms depend on the ingest volume and number of unique structured metadata key-value pairs, as well as on build settings of the blooms, namely false-positive-rate. @@ -180,7 +161,7 @@ Example, assuming 4 CPU cores: Here, the memory requirement for block processing is 2GiB. To get the minimum requirements for the Bloom Gateways, you need to double the value. -### Building blooms +## Building blooms Bloom filters are built per stream and aggregated together into block files. Streams are assigned to blocks by their fingerprint, following the same ordering scheme as Loki’s TSDB and sharding calculation. @@ -200,7 +181,7 @@ The first set of hashes allows gateways to skip whole streams, while the latter For example, given structured metadata `foo=bar` in the chunk `c6dj8g`, we append to the stream bloom the following hashes: `hash("foo")`, `hash("bar")`, `hash("foo=bar")`, `hash("c6dj8g" + "foo")` ... `hash("c6dj8g" + "foo=bar")`. -### Query sharding +## Query sharding Query acceleration does not just happen while processing chunks, but also happens from the query planning phase where the query frontend applies [query sharding](https://lokidex.com/posts/tsdb/#sharding). Loki 3.0 introduces a new [per-tenant configuration][tenant-limits] flag `tsdb_sharding_strategy` which defaults to computing shards as in previous versions of Loki by using the index stats to come up with the closest power of two that would optimistically divide the data to process in shards of roughly the same size. 
@@ -208,17 +189,11 @@ Unfortunately, the amount of data each stream has is often unbalanced with the r Query acceleration introduces a new sharding strategy: `bounded`, which uses blooms to reduce the chunks to be processed right away during the planning phase in the query frontend, as well as evenly distributes the amount of chunks each sharded query will need to process. +[Query acceleration]: https://grafana.com/docs/loki//query/query-acceleration [structured metadata]: https://grafana.com/docs/loki//get-started/labels/structured-metadata -[label filter expression]: https://grafana.com/docs/loki//query/log_queries/#label-filter-expression -[parser expression]: https://grafana.com/docs/loki//query/log_queries/#parser-expression -[labels format expression]: https://grafana.com/docs/loki//query/log_queries/#labels-format-expression -[drop labels expression]: https://grafana.com/docs/loki//query/log_queries/#drop-labels-expression -[keep labels expression]: https://grafana.com/docs/loki//query/log_queries/#keep-labels-expression [tenant-limits]: https://grafana.com/docs/loki//configure/#limits_config [bloom-gateway-cfg]: https://grafana.com/docs/loki//configure/#bloom_gateway [bloom-build-cfg]: https://grafana.com/docs/loki//configure/#bloom_build [storage-config-cfg]: https://grafana.com/docs/loki//configure/#storage_config [microservices]: https://grafana.com/docs/loki//get-started/deployment-modes/#microservices-mode [ssd]: https://grafana.com/docs/loki//get-started/deployment-modes/#simple-scalable - - diff --git a/docs/sources/query/query_accceleration.md b/docs/sources/query/query_accceleration.md new file mode 100644 index 000000000000..604099eeb0a6 --- /dev/null +++ b/docs/sources/query/query_accceleration.md @@ -0,0 +1,45 @@ +--- +title: Query acceleration +menuTitle: Query acceleration +description: Provides instructions on how to write LogQL queries to benefit from query acceleration. 
+weight: 900 +keywords: + - blooms + - query acceleration +--- + +# Query acceleration (Experimental) + +{{% admonition type="warning" %}} +Query acceleration using blooms is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided. +{{% /admonition %}} + +If [bloom filters][] are enabled, you can write LogQL queries using [structured metadata][] to benefit from query acceleration. + +## Prerequisites + +* [Bloom filters][bloom filters] must be enabled. +* Logs must include [structured metadata][]. + +## Query blooms + +Queries will be accelerated for any [label filter expression][] that satisfies _all_ of the following criteria: + +* The label filter expression uses **string equality**, such as `| key="value"`. +* The label filter expression is querying for structured metadata and not a stream label. +* The label filter expression is placed before any [parser expression][], [labels format expression][], [drop labels expression][], or [keep labels expression][].
+ +To take full advantage of query acceleration with blooms, ensure that filtering structured metadata is done before any parse expression: + +```logql +{cluster="prod"} | logfmt | json | detected_level="error" # NOT ACCELERATED: structured metadata filter is after a parse stage +{cluster="prod"} | detected_level="error" | logfmt | json # ACCELERATED: structured metadata filter is before any parse stage +``` + +[bloom filters]: https://grafana.com/docs/loki//operations/bloom-filters/ +[structured metadata]: https://grafana.com/docs/loki//get-started/labels/structured-metadata +[label filter expression]: https://grafana.com/docs/loki//query/log_queries/#label-filter-expression +[parser expression]: https://grafana.com/docs/loki//query/log_queries/#parser-expression +[labels format expression]: https://grafana.com/docs/loki//query/log_queries/#labels-format-expression +[drop labels expression]: https://grafana.com/docs/loki//query/log_queries/#drop-labels-expression +[keep labels expression]: https://grafana.com/docs/loki//query/log_queries/#keep-labels-expression From 3ad4e0dff4b5951af63e4a97e1b69bee9d226ca4 Mon Sep 17 00:00:00 2001 From: Robert Fratto Date: Thu, 24 Oct 2024 12:08:46 -0400 Subject: [PATCH 08/10] Apply suggestions from code review Co-authored-by: J Stickler --- docs/sources/operations/bloom-filters.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/sources/operations/bloom-filters.md b/docs/sources/operations/bloom-filters.md index c1067a8346f4..beb7f57cffe0 100644 --- a/docs/sources/operations/bloom-filters.md +++ b/docs/sources/operations/bloom-filters.md @@ -18,7 +18,7 @@ This feature is an [experimental feature](/docs/release-life-cycle/). Engineerin Loki leverages [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) to speed up queries by reducing the amount of data Loki needs to load from the store and iterate through. 
Loki is often used to run "needle in a haystack" queries; these are queries where a large number of log lines are searched, but only a few log lines match the query. -Some common use cases are needing to find all logs tied to a specific trace ID or customer ID. +Some common use cases are searching all logs tied to a specific trace ID or customer ID. An example of such queries would be looking for a trace ID on a whole cluster for the past 24 hours: @@ -113,7 +113,7 @@ overrides: ### Sizing and configuration The single planner instance runs the planning phase for bloom blocks for each tenant in the given interval and puts the created tasks to an internal task queue. -Builders process tasks sequentially by pulling them from the queue. The amount of builder replicas required to complete all pending tasks before the next planning iteration depends on the value of `-bloom-build.planner.bloom_split_series_keyspace_by`, the amount of tenants, and the log volume of the streams. +Builders process tasks sequentially by pulling them from the queue. The amount of builder replicas required to complete all pending tasks before the next planning iteration depends on the value of `-bloom-build.planner.bloom_split_series_keyspace_by`, the number of tenants, and the log volume of the streams. The maximum block size is configured per tenant via `-bloom-build.max-block-size`. The actual block size might exceed this limit given that we append streams blooms to the block until the block is larger than the configured maximum size. @@ -145,7 +145,7 @@ Multiple directories on different disk mounts can be specified using the `-bloom ``` Bloom Gateways need to deal with relatively large files: the bloom filter blocks. -Even though the binary format of the bloom blocks allows for reading them into memory in smaller pages, the memory consumption depends on the amount of pages that are concurrently loaded into memory for processing. 
+Even though the binary format of the bloom blocks allows for reading them into memory in smaller pages, the memory consumption depends on the number of pages that are concurrently loaded into memory for processing. The product of three settings control the maximum amount of bloom data in memory at any given time: `-bloom-gateway.worker-concurrency`, `-bloom-gateway.block-query-concurrency`, and `-bloom.max-query-page-size`. Example, assuming 4 CPU cores: From 83d79291b006d1e717f00cd7a988c4ef0ee5ec94 Mon Sep 17 00:00:00 2001 From: Robert Fratto Date: Thu, 24 Oct 2024 12:33:42 -0400 Subject: [PATCH 09/10] docs: fix stale anchor --- docs/sources/operations/bloom-filters.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sources/operations/bloom-filters.md b/docs/sources/operations/bloom-filters.md index beb7f57cffe0..63b0c4ecfaa6 100644 --- a/docs/sources/operations/bloom-filters.md +++ b/docs/sources/operations/bloom-filters.md @@ -89,7 +89,7 @@ Do not run more than one instance of the Bloom Planner. The Bloom Builder is a stateless horizontally scalable component and can be scaled independently of the planner to fulfill the processing demand of the created tasks. You can find all the configuration options for these components in the [Configure section for the Bloom Builder][bloom-build-cfg]. -Refer to the [Enable Query Acceleration with Blooms](#enable-query-acceleration-with-blooms) section below for a configuration snippet enabling this feature. +Refer to the [Enable bloom filters](#enable-bloom-filters) section above for a configuration snippet enabling this feature. 
### Retention @@ -129,7 +129,7 @@ This component is horizontally scalable and every instance only owns a subset of The sharding of the data is performed on the client side using DNS discovery of the server instances and the [jumphash](https://arxiv.org/abs/1406.2294) algorithm for consistent hashing and even distribution of the stream fingerprints across Bloom Gateway instances. You can find all the configuration options for this component in the Configure section for the [Bloom Gateways][bloom-gateway-cfg]. -Refer to the [Enable Query Acceleration with Blooms](#enable-query-acceleration-with-blooms) section below for a configuration snippet enabling this feature. +Refer to the [Enable bloom filters](#enable-bloom-filters) section above for a configuration snippet enabling this feature. ### Sizing and configuration From cd6d201e0f6183d5e8734b98636dba41dd7ce48d Mon Sep 17 00:00:00 2001 From: Robert Fratto Date: Thu, 24 Oct 2024 13:39:14 -0400 Subject: [PATCH 10/10] docs: fix title --- docs/sources/query/query_accceleration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sources/query/query_accceleration.md b/docs/sources/query/query_accceleration.md index 604099eeb0a6..ab377b828243 100644 --- a/docs/sources/query/query_accceleration.md +++ b/docs/sources/query/query_accceleration.md @@ -1,5 +1,5 @@ --- -title: Query acceleration +title: Query acceleration (Experimental) menuTitle: Query acceleration description: Provides instructions on how to write LogQL queries to benefit from query acceleration. weight: 900