_recovery_source sometimes remains after merge #82595

Closed
jtibshirani opened this issue Jan 14, 2022 · 10 comments

@jtibshirani
Contributor

If _source is disabled or filtered in the mappings, we add a _recovery_source field to support shard recoveries and CCR. Once it's no longer needed, future merges drop the _recovery_source field to reclaim space.
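For context, this happens with any mapping that disables or filters _source; a minimal sketch of such a mapping (illustrative index and field names, not the actual rally track mapping):

PUT /my-test-index
{
  "mappings": {
    "_source": {"enabled": false},
    "properties": {
      "vector": {"type": "dense_vector", "dims": 1024}
    }
  }
}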

In certain cases, it appears that _recovery_source can stick around even after a merge. I noticed this issue through the dense vector rally track. This command indexes 100,000 documents with _source disabled, then force merges to 1 segment:

esrally race --track=dense_vector --challenge=index-and-search --track-params="ingest_percentage:10" --on-error abort

At the end, the shard was larger than expected:

195M	data/indices/gPefBjHjTCCxU_EnbSuGrQ/0/index

Using the disk usage API, we see this is due to recovery source:

   "_recovery_source" : {
        "total" : "149.9mb",
        "total_in_bytes" : 157209753,
        ....
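That breakdown comes from a disk usage request along these lines (index name replaced with a placeholder):

POST /<INDEX>/_disk_usage?run_expensive_tasks=true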

There are no replicas, so the force merge should have removed recovery source. I can reproduce this with both 1 and 2 shards. I haven't found a small-scale reproduction yet.

@jtibshirani added the >bug and :Distributed Indexing/Engine labels on Jan 14, 2022
@elasticmachine added the Team:Distributed (Obsolete) label on Jan 14, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@ruslaniv

ruslaniv commented Nov 30, 2022

I'm seeing the exact same behavior when trying to exclude a dense_vector field from being stored in _source:

"mappings": {
        "_source": {"excludes": ["title_vector"]},
        "properties": {
            "title_vector": {
                "type": "dense_vector",
                "dims": 1024,
                "index": true,
                "similarity": "dot_product"
            },        
}

Upon inspecting the index, and based on the size of the _recovery_source field and the number of documents I indexed, it looks like the field stores the vectors as plain floats without any compression.
FYI, one 1024-dim vector stored as plain floats takes approximately 21-22 KB, whereas the same vector compressed and optimized by ES takes about 4 KB. That's quite a difference!
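A rough back-of-the-envelope consistent with those numbers, assuming _recovery_source keeps the original JSON text of each document:

dense_vector (binary):         1024 dims × 4 bytes         ≈ 4 KB per vector
_recovery_source (JSON text):  1024 dims × ~21 characters  ≈ 21 KB per vector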

@ruslaniv

ruslaniv commented Dec 6, 2022

@elasticmachine is there any progress on this issue?
Right now this problem is causing our index to grow to 250 GB instead of the estimated 50 GB, and the index no longer fits in available RAM, which severely degrades search performance.

@DaveCTurner
Contributor

DaveCTurner commented Dec 6, 2022

The _recovery_source field should be removed at merge time from all docs that are in the latest safe commit and are not retained for recovery by any retention lease. There is no real coordination between merges and retention lease movements, so it's possible for documents that include _recovery_source to end up in a large-ish segment that doesn't see another merge for a long time. But there shouldn't normally be very many documents like that, at least not unless some retention leases lag a long way behind the max seq_no for some reason.

You can get information about the retention leases and sequence numbers with the following command:

GET /<INDEX>/_stats?level=shards&filter_path=indices.*.shards.*.retention_leases,indices.*.shards.*.seq_no,indices.*.shards.*.commit

Can you share the output of that command here?

Edit to add: could you also share the full breakdown of disk usage for your index:

POST /<INDEX>/_disk_usage?run_expensive_tasks=true

@ruslaniv

ruslaniv commented Dec 7, 2022

David, thank you for your help!

Here is the info on the retention leases on the index in question

Output
{
  "indices": {
    "proposals.proposals.vector_20221119": {
      "shards": {
        "0": [
          {
            "commit": {
              "id": "FOt/dGPF1B6NxwFV1EWNlw==",
              "generation": 682,
              "user_data": {
                "local_checkpoint": "14491789",
                "es_version": "8.4.1",
                "min_retained_seq_no": "14484118",
                "max_seq_no": "14491789",
                "history_uuid": "8ZXxEfCfRVibbeCI0-hD2Q",
                "max_unsafe_auto_id_timestamp": "-1",
                "translog_uuid": "FyoeUjQ1T8yBf-fZQ8gqsQ"
              },
              "num_docs": 3939193
            },
            "seq_no": {
              "max_seq_no": 14491789,
              "local_checkpoint": 14491789,
              "global_checkpoint": 14491789
            },
            "retention_leases": {
              "primary_term": 1,
              "version": 78196,
              "leases": [
                {
                  "id": "peer_recovery/vGiOGPHoQnSNmndhy2Np1A",
                  "retaining_seq_no": 14491790,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/23uocumgThOiaraPXP_JNA",
                  "retaining_seq_no": 14491790,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                }
              ]
            }
          },
          {
            "commit": {
              "id": "0tJKDmvaz43/5YosYvy9nQ==",
              "generation": 678,
              "user_data": {
                "local_checkpoint": "14491789",
                "es_version": "8.4.1",
                "min_retained_seq_no": "14487455",
                "max_seq_no": "14491789",
                "history_uuid": "8ZXxEfCfRVibbeCI0-hD2Q",
                "max_unsafe_auto_id_timestamp": "-1",
                "translog_uuid": "YU3iTTrPSumU5E7tgGwPlw"
              },
              "num_docs": 3939193
            },
            "seq_no": {
              "max_seq_no": 14491789,
              "local_checkpoint": 14491789,
              "global_checkpoint": 14491789
            },
            "retention_leases": {
              "primary_term": 1,
              "version": 78196,
              "leases": [
                {
                  "id": "peer_recovery/vGiOGPHoQnSNmndhy2Np1A",
                  "retaining_seq_no": 14491790,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/23uocumgThOiaraPXP_JNA",
                  "retaining_seq_no": 14491790,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                }
              ]
            }
          }
        ],
        "1": [
          {
            "commit": {
              "id": "wlSNJOgD2Jm4Ms4eMN8n1w==",
              "generation": 682,
              "user_data": {
                "local_checkpoint": "14526770",
                "min_retained_seq_no": "14521481",
                "es_version": "8.4.1",
                "max_seq_no": "14526770",
                "translog_uuid": "7txDNt4ITMarfIm-NbPGZw",
                "max_unsafe_auto_id_timestamp": "-1",
                "history_uuid": "E-gKvtUtSTS0Ff5ABOv0lQ"
              },
              "num_docs": 3941107
            },
            "seq_no": {
              "max_seq_no": 14526770,
              "local_checkpoint": 14526770,
              "global_checkpoint": 14526770
            },
            "retention_leases": {
              "primary_term": 2,
              "version": 78123,
              "leases": [
                {
                  "id": "peer_recovery/3fQTuJNpQOeCg9k2zQI6Rg",
                  "retaining_seq_no": 14526771,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/vGiOGPHoQnSNmndhy2Np1A",
                  "retaining_seq_no": 14526771,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                }
              ]
            }
          },
          {
            "commit": {
              "id": "0tJKDmvaz43/5YosYvy9ng==",
              "generation": 683,
              "user_data": {
                "local_checkpoint": "14526770",
                "es_version": "8.4.1",
                "min_retained_seq_no": "14524098",
                "max_seq_no": "14526770",
                "history_uuid": "E-gKvtUtSTS0Ff5ABOv0lQ",
                "max_unsafe_auto_id_timestamp": "-1",
                "translog_uuid": "1rZPK20OQ66Jk7rHtjAoAg"
              },
              "num_docs": 3941107
            },
            "seq_no": {
              "max_seq_no": 14526770,
              "local_checkpoint": 14526770,
              "global_checkpoint": 14526770
            },
            "retention_leases": {
              "primary_term": 2,
              "version": 78123,
              "leases": [
                {
                  "id": "peer_recovery/3fQTuJNpQOeCg9k2zQI6Rg",
                  "retaining_seq_no": 14526771,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/vGiOGPHoQnSNmndhy2Np1A",
                  "retaining_seq_no": 14526771,
                  "timestamp": 1670374822751,
                  "source": "peer recovery"
                }
              ]
            }
          }
        ],
        "2": [
          {
            "commit": {
              "id": "wlSNJOgD2Jm4Ms4eMN8nrQ==",
              "generation": 678,
              "user_data": {
                "local_checkpoint": "14375247",
                "min_retained_seq_no": "14370437",
                "es_version": "8.4.1",
                "max_seq_no": "14375247",
                "translog_uuid": "wmeJxF9lSOaUJ0hDigqE1g",
                "max_unsafe_auto_id_timestamp": "-1",
                "history_uuid": "UCaiMhAXR4-zJHNO54tC4A"
              },
              "num_docs": 3940119
            },
            "seq_no": {
              "max_seq_no": 14375247,
              "local_checkpoint": 14375247,
              "global_checkpoint": 14375247
            },
            "retention_leases": {
              "primary_term": 3,
              "version": 78244,
              "leases": [
                {
                  "id": "peer_recovery/3fQTuJNpQOeCg9k2zQI6Rg",
                  "retaining_seq_no": 14375248,
                  "timestamp": 1670374825724,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/23uocumgThOiaraPXP_JNA",
                  "retaining_seq_no": 14375248,
                  "timestamp": 1670374825724,
                  "source": "peer recovery"
                }
              ]
            }
          },
          {
            "commit": {
              "id": "FOt/dGPF1B6NxwFV1EWNmA==",
              "generation": 678,
              "user_data": {
                "local_checkpoint": "14375247",
                "es_version": "8.4.1",
                "min_retained_seq_no": "14374549",
                "max_seq_no": "14375247",
                "history_uuid": "UCaiMhAXR4-zJHNO54tC4A",
                "max_unsafe_auto_id_timestamp": "-1",
                "translog_uuid": "hefqLf7iQXaY0ryQuLUAGQ"
              },
              "num_docs": 3940119
            },
            "seq_no": {
              "max_seq_no": 14375247,
              "local_checkpoint": 14375247,
              "global_checkpoint": 14375247
            },
            "retention_leases": {
              "primary_term": 3,
              "version": 78244,
              "leases": [
                {
                  "id": "peer_recovery/3fQTuJNpQOeCg9k2zQI6Rg",
                  "retaining_seq_no": 14375248,
                  "timestamp": 1670374825724,
                  "source": "peer recovery"
                },
                {
                  "id": "peer_recovery/23uocumgThOiaraPXP_JNA",
                  "retaining_seq_no": 14375248,
                  "timestamp": 1670374825724,
                  "source": "peer recovery"
                }
              ]
            }
          }
        ]
      }
    }
  }
}

Unfortunately the disk analysis command would not complete due to a 504 Gateway Time-out error after about 60,000 milliseconds.

What I can do is create another index with exactly the same mapping, index say 10,000 documents, and then run the disk analysis. I was able to run this command on smaller indices; that's actually how I found out about the _recovery_source field.

@nik9000
Member

nik9000 commented Dec 7, 2022

Unfortunately the disk analysis command would not complete due to a 504 Gateway Time-out error after about 60,000 milliseconds.


Darn proxy.

I was able to run this command on smaller indices; that's actually how I found out about the _recovery_source field.

Watch out - using smaller indices with _disk_usage can show _recovery_source where, in a bigger index, it would already have been merged away. At least, that's been my experience with smaller indices - mostly because I can build them so quickly that the merge process doesn't clean the field up until the merge after replication.

@DaveCTurner
Contributor

We haven't seen anything to suggest that there's a problem with the logic that removes the _recovery_source field on merge once it's safe to do so. Instead, it seems that this issue comes about because we don't currently schedule merges (or even just the rewrite of individual segments) to clean this field up once it becomes unnecessary. Since that's more of a low-level Lucene-interaction question related to the merge scheduling logic, I'm going to relabel this for the attention of the search team.
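In the meantime, a manual force merge issued after the retention leases have advanced past those documents should give the merge logic a chance to drop the field. This is only a sketch, and it may be a no-op if the shard already consists of a single fully-merged segment:

POST /<INDEX>/_forcemerge?max_num_segments=1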

@DaveCTurner added the :Search/Search label and removed the :Distributed Indexing/Engine and Team:Distributed (Obsolete) labels on Apr 16, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine added the Team:Search label on Apr 16, 2024
@javanna added the :StorageEngine/Logs label and removed the :Search/Search and Team:Search labels on Apr 22, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@martijnvg
Member

Closing in favor of #116726
