Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding doc explaining the importance of groups for compactor #1555

Merged
merged 1 commit into from
Sep 25, 2019
Merged

Adding doc explaining the importance of groups for compactor #1555

merged 1 commit into from
Sep 25, 2019

Conversation

Leulz
Copy link
Contributor

@Leulz Leulz commented Sep 22, 2019

Changes

Added a Groups section in compact.md explaining the importance for the external_labels to be both unique and persistent across different Prometheis.

Motivation

I have recently had to deal with quite a large data loss incident.

I believe that it happened because we had a compactor running while the external_labels of the Prometheis that produced blocks for it had as replica a value that had a hash appended to it.

That hash changed when a Prometheus version restarted for some reason, since it was a hash generated on startup. No more blocks from the group generated by the now-dead replica arrived, and so the old blocks never became big enough for them to be downsampled, and eventually they were deleted because of the retention policy we had.

So I think it's quite important to explain how groups are used by the compactor. I apologize if my explanation is incorrect in some way, I tried to read the code for the compactor, but I am not very familiar with Go.

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Please add the DCO.

Signed-off-by: Leo Meira Vital <leo.vital@nubank.com.br>
@Leulz
Copy link
Contributor Author

Leulz commented Sep 23, 2019

@GiedriusS Done, thanks!

@brancz brancz merged commit 3fe208f into thanos-io:master Sep 25, 2019
ivan-kiselev pushed a commit to ivan-kiselev/thanos that referenced this pull request Sep 26, 2019
…io#1555)

Signed-off-by: Leo Meira Vital <leo.vital@nubank.com.br>
ivan-kiselev pushed a commit to ivan-kiselev/thanos that referenced this pull request Sep 26, 2019
…io#1555)

Signed-off-by: Leo Meira Vital <leo.vital@nubank.com.br>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
brancz pushed a commit that referenced this pull request Sep 26, 2019
* Some updates to compact docs

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* some formatting

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Update docs/components/compact.md

accept PR suggestions

Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add metalmatze to list of maintainers (#1547)

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve comments

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve last comment

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* receive: Add liveness and readiness probe (#1537)

* Add prober to receive

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entries

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update README

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Wait hashring to be ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* downsample: Add liveness and readiness probe (#1540)

* Add readiness and liveness probes for downsampler

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entry

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Set ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Clean CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Document the dnssrvnoa option (#1551)

Signed-off-by: Antonio Santos <antonio@santosvelasco.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* feat store: added readiness and livenes prober (#1460)

Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add Hotstar to adopters. (#1553)

It's the largest streaming service in India that does cricket and GoT
for India. They have insane scale and are using Thanos to scale their
Prometheus.

Spoke to them offline about adding the logo and will get a signoff here
too.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix hotstar logo in the adoptor's list (#1558)

Signed-off-by: Karthik Vijayaraju <karthik@hotstar.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix typos, including 'fomrat' -> 'format' in tracing.config-file help text. (#1552)

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Compactor: Fix for #844 - Ignore object if it is the current directory (#1544)

* Ignore object if it is the current directory

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>

* Add full-stop

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Adding doc explaining the importance of groups for compactor (#1555)

Signed-off-by: Leo Meira Vital <leo.vital@nubank.com.br>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add blank line for list (#1566)

The format of these files is wrong in the web.

Signed-off-by: dongwenjuan <dong.wenjuan@zte.com.cn>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Refactor compactor constants, fix bucket column (#1561)

* compact: unify different time constants

Use downsample.* constants where possible. Move the downsampling time
ranges into constants and use them as well.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* bucket: refactor column calculation into compact

Fix the column's name and name it UNTIL-DOWN because that is what it
actually shows - time until the next downsampling.

Move out the calculation into a separate function into the compact
package. Ideally we could use the retention policies in this calculation
as well but the `bucket` subcommand knows nothing about them :-(

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* compact: fix issues with naming

Reorder the constants and fix mistakes.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* remove duplicate

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
GiedriusS pushed a commit that referenced this pull request Oct 28, 2019
Signed-off-by: Leo Meira Vital <leo.vital@nubank.com.br>
Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
GiedriusS pushed a commit that referenced this pull request Oct 28, 2019
* Some updates to compact docs

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* some formatting

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Update docs/components/compact.md

accept PR suggestions

Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add metalmatze to list of maintainers (#1547)

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve comments

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* resolve last comment

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* receive: Add liveness and readiness probe (#1537)

* Add prober to receive

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entries

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update README

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Wait hashring to be ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* downsample: Add liveness and readiness probe (#1540)

* Add readiness and liveness probes for downsampler

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add changelog entry

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove default

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Set ready

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Clean CHANGELOG

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Document the dnssrvnoa option (#1551)

Signed-off-by: Antonio Santos <antonio@santosvelasco.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* feat store: added readiness and livenes prober (#1460)

Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add Hotstar to adopters. (#1553)

It's the largest streaming service in India that does cricket and GoT
for India. They have insane scale and are using Thanos to scale their
Prometheus.

Spoke to them offline about adding the logo and will get a signoff here
too.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix hotstar logo in the adoptor's list (#1558)

Signed-off-by: Karthik Vijayaraju <karthik@hotstar.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Fix typos, including 'fomrat' -> 'format' in tracing.config-file help text. (#1552)

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Compactor: Fix for #844 - Ignore object if it is the current directory (#1544)

* Ignore object if it is the current directory

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>

* Add full-stop

Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Adding doc explaining the importance of groups for compactor (#1555)

Signed-off-by: Leo Meira Vital <leo.vital@nubank.com.br>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Add blank line for list (#1566)

The format of these files is wrong in the web.

Signed-off-by: dongwenjuan <dong.wenjuan@zte.com.cn>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* Refactor compactor constants, fix bucket column (#1561)

* compact: unify different time constants

Use downsample.* constants where possible. Move the downsampling time
ranges into constants and use them as well.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* bucket: refactor column calculation into compact

Fix the column's name and name it UNTIL-DOWN because that is what it
actually shows - time until the next downsampling.

Move out the calculation into a separate function into the compact
package. Ideally we could use the retention policies in this calculation
as well but the `bucket` subcommand knows nothing about them :-(

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* compact: fix issues with naming

Reorder the constants and fix mistakes.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>

* remove duplicate

Signed-off-by: Ivan Kiselev <kiselev_ivan@pm.me>
Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
@Leulz Leulz deleted the compactor-doc-groups branch February 26, 2020 02:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants