Distributor requires ingester configuration to access ring state #5800

Open
jakubgs opened this issue Mar 5, 2024 · 6 comments

@jakubgs
Contributor

jakubgs commented Mar 5, 2024

Describe the bug
While trying to move away from a multi kvstore configuration, as suggested by @friedrichg, I discovered bizarre behavior. I was using both ETCD and Consul, with Consul as secondary, mistakenly thinking this would let me easily switch to Consul in case of an ETCD outage, but apparently that is not so.

When I removed the multi kvstore configuration and reduced it to use just ETCD, my distributors could not see any of the ingesters in the ring. First they appeared as Unhealthy, and when the Forget button was pressed they simply disappeared. This confused me quite a bit, but on a hunch I added an ingester configuration section to the config for my distributor nodes:

ingester:
  lifecycler:
    ring:
      ...

And after that the distributor started recognizing the ingesters that were using ETCD as their primary kv store.

This shows that the distributor service actually requires ingester configuration to interact with ingesters.
However, the documentation for ingester_config states:

[image: screenshot of the ingester_config description from the documentation]

This is clearly wrong, because it ALSO configures things for the distributor. I only discovered this through pure instinct.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex 1.16.0
  2. Configure the distributor node, using only the distributor config section, to use a kv store other than the default.
  3. Notice that it continues to read the ingester ring state using the ingester configuration.
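
The steps above can be sketched as a distributor config along the following lines. This is only an illustration under assumptions: the etcd endpoint is a placeholder, and the `distributor.ring.kvstore` layout follows the Cortex configuration reference rather than the reporter's actual files.

```yaml
# Hypothetical distributor-only config that should reproduce the problem.
# The endpoint below is a placeholder, not a value taken from this issue.
distributor:
  ring:
    kvstore:
      store: etcd
      etcd:
        endpoints:
          - etcd.example.internal:2379

# There is no `ingester:` section here, so the ingester ring client keeps
# using its default kv store settings instead of etcd.
```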

Expected behavior
Sane one.

@jakubgs
Contributor Author

jakubgs commented Mar 5, 2024

And it appears this section is also required for the querier service to discover ingesters.

What other services also require it? This should be documented and made clear.

@yeya24
Contributor

yeya24 commented Mar 5, 2024

This is the configuration used by all components that use the Ingester Ring, which includes the Ingester itself, Distributor, Querier and Ruler.
Yeah, I think we can do a better job of clarifying things in our docs. @danielblando @alanprot WDYT?
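
As a rough sketch of what this means in practice, the same ingester ring block would appear in the config of every component that reads the ring. The `kvstore` keys follow the Cortex configuration reference, and the endpoint is a placeholder, not a value from this issue.

```yaml
# Shared ingester ring settings, repeated in the config of every component
# that reads the ring: ingester, distributor, querier and ruler.
# The endpoint is a placeholder for illustration only.
ingester:
  lifecycler:
    ring:
      kvstore:
        store: etcd
        etcd:
          endpoints:
            - etcd.example.internal:2379
```

Keeping this block identical across components means they should all look at the same ring state in the same kv store.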

@jakubgs
Contributor Author

jakubgs commented Mar 5, 2024

@friedrichg
Member

Historically this was just a flag (and still is)

-ring.store

The flag name has nothing to do with the ingester, so it is at least more neutral.
This then became ingester_config in the configuration file.

There are more similar instances of this in the configuration file that are hard for new users to understand. Cortex strives for backward compatibility, and in the pursuit of backward compatibility we leave behind the configuration experience for new users. Sorry for that. Thanks for sharing your experience @jakubgs

I don't think we can fix that quickly without creating Cortex 2.0 and a lot of churn for existing users. What is probably better for everyone is to give new users more examples of working configurations, so they can get started quickly and tweak things as needed. My suggestion to improve this in helm: cortexproject/cortex-helm-chart#473

In your case, you are deploying this on virtual machines, so I don't think helm applies to you. Do you want to contribute some of the configuration work you have done? I think some other users would benefit from it 😄

@jakubgs
Contributor Author

jakubgs commented Mar 7, 2024

I understand the need for backwards compatibility, and that's fair, but the docs could be clearer about which config section is used or required by which service.

Our infra repo for metrics infrastructure is private, but in theory I could extract just the Cortex role out and make it public.
Though it would be nice to also preserve the history of commits if I do that, and that would involve a bit more work. I'll think about how I could do that.

@jakubgs
Contributor Author

jakubgs commented Mar 10, 2024

I have extracted our Cortex Ansible repository to a separate public repo: https://github.com/status-im/infra-role-cortex

I did it using this tool, though I had to do some history cleanup: https://github.com/newren/git-filter-repo

Not sure how useful this will be, but maybe it can help someone.
