e2e: Fixed WaitSumMetrics to fail on non existing metric #2256
Some context thanks to @kakkoyun: thanos-io/thanos#2256 |
Thanks @bwplotka! I see the use case but I'm dubious about the fix. As you can see some integration tests fail, and the problem is that some metrics are not immediately exported right after registration. Think about metrics with labels: the first series is exported once a value is tracked for the first time, which may occur after the first time |
This is not true... it's tracked for the first time when you do This means:
Additionally, think about this:
Anyway, in my free time, I can look at some of the failing cases to see how I would fix them with my "own tips"... Let me know if that makes sense. |
d339d57
to
57b0f7d
Compare
I pushed some changes to rework the metrics used in integration tests. We have a bunch of global metrics in Cortex which I'm progressively getting rid of. I will work on the table manager metrics too, but in a separate PR.
The metrics used in integration tests are the following:
Fixed:
- cortex_alertmanager_configs
- cortex_dynamo_sync_tables_seconds > introduced cortex_table_manager_sync_success_timestamp_seconds
The label is the tenant, which can't be predicted, but I used a trick in integration tests to check another related metric first:
- cortex_ingester_memory_series_created_total
- cortex_ingester_memory_series_removed_total
Already OK:
- cortex_ring_tokens_total
- cortex_ingester_shipper_uploads_total
- cortex_ingester_memory_series
- cortex_querier_bucket_store_blocks_loaded
- memberlist_client_cluster_members_count
- cortex_querier_blocks_index_cache_items
- cortex_querier_blocks_index_cache_items_added_total
- cortex_querier_blocks_index_cache_hits_total
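To illustrate why labeled metrics needed attention: in Prometheus client libraries a labeled vector exports no series until a child exists for some label combination. The sketch below is a toy model of that behavior using only the standard library; `counterVec`, `With`, and `Expose` are hypothetical names, the `user` label name is an assumption, and none of this is the actual Cortex or client_golang code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// counterVec is a toy stand-in for a labeled counter. It is NOT the
// Prometheus client API, only a model of "no child, no exported series".
type counterVec struct {
	name   string
	values map[string]float64 // keyed by the (assumed) tenant label value
}

func newCounterVec(name string) *counterVec {
	return &counterVec{name: name, values: map[string]float64{}}
}

// With creates the series for the tenant at 0 if it does not exist yet.
func (c *counterVec) With(tenant string) {
	if _, ok := c.values[tenant]; !ok {
		c.values[tenant] = 0
	}
}

// Add increments the tenant's series, creating it on first use.
func (c *counterVec) Add(tenant string, v float64) {
	c.values[tenant] += v
}

// Expose renders all existing series in a simplified text format,
// sorted by tenant so the output is deterministic.
func (c *counterVec) Expose() string {
	tenants := make([]string, 0, len(c.values))
	for t := range c.values {
		tenants = append(tenants, t)
	}
	sort.Strings(tenants)

	var b strings.Builder
	for _, t := range tenants {
		fmt.Fprintf(&b, "%s{user=%q} %g\n", c.name, t, c.values[t])
	}
	return b.String()
}

func main() {
	created := newCounterVec("cortex_ingester_memory_series_created_total")
	fmt.Printf("before first use: %q\n", created.Expose())

	// Touching the label value once makes the series visible at 0.
	created.With("user-1")
	fmt.Printf("after first use:  %q\n", created.Expose())
}
```

With a strict wait helper, a test that scrapes before the first `With` or `Add` call would see no series at all, which is why checking a related, already-exported metric first can help.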
@@ -1,5 +1,3 @@
// +build integration
The meaning we gave to this tag was "a test requiring Docker". Does it cause you any trouble keeping it?
Well, my point is that it's not an integration test so technically this tag is wrong.
What if we change this tag to `requires_docker`? (:
Did that (:
I don't care too much how it's called, but feel like this is change just for the change. What if we have integration tests that don't use docker in the future? README.md is not updated.
Could you also update `integration/README.md` accordingly?
The idea behind using a tag here was to avoid running the tests in the integration dir when doing `go test ./...` from the main Cortex package. I simply used a tag name that matched the directory name.
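For readers less familiar with build tags, here is a minimal sketch of how such a constraint gates a file. It uses the standard `go/build/constraint` package to evaluate a `// +build` line against a set of `-tags`; the `enabled` helper is a hypothetical name, and the sketch ignores the implicit tags (GOOS, GOARCH, and so on) that the real toolchain also considers.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// enabled reports whether a file carrying the given constraint line would
// be selected when only the given tags are set. Implicit toolchain tags
// such as GOOS and GOARCH are deliberately left out of this sketch.
func enabled(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	const line = "// +build requires_docker"

	// Plain `go test ./...`: no custom tags, so the file is skipped.
	plain, err := enabled(line)
	fmt.Println(plain, err)

	// `go test -tags 'netgo requires_docker' ./...`: the file is included.
	tagged, err := enabled(line, "netgo", "requires_docker")
	fmt.Println(tagged, err)
}
```

The tag name itself carries no meaning for the toolchain, so `integration` and `requires_docker` behave identically; the choice only affects how clearly the intent reads.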
> but feel like this is change just for the change

Just seen this and... what? What do you mean? Why would I take on more work to improve something in Cortex for no reason? @pstibrany Sorry, I don't get it.
I just don't feel that renaming the tag was necessary, but ... whatever works. I don't care too much about the name as long as there is a tag in the first place.
LGTM pending on Marco's comment about the integration tag.
Thank you @pracucci! And sorry for the extra work for you. But I think it will actually help Cortex as well to avoid the init metric problem ❤️ cc @kakkoyun as we saved the world again with our talk 💪 😄 |
@bwplotka We're fulfilling our mission to make the world a better place one PR at a time 😁 |
Makefile
Outdated
@@ -135,7 +135,7 @@ shell:
 	bash

 configs-integration-test:
-	/bin/bash -c "go test -v -tags 'netgo integration' -timeout 30s ./pkg/configs/... ./pkg/ruler/..."
+	/bin/bash -c "go test -v -tags 'netgo requires_docker' -timeout 30s ./pkg/configs/... ./pkg/ruler/..."
This is a different set of integration tests. You should actually roll back this change and replace the `-tags=integration` in `.circleci/config.yml`.
ups
@@ -1,5 +1,3 @@
// +build integration
Could you also update `integration/README.md` accordingly?
Also, we've just merged another integration test. Could you rebase |
dc2d6bd
to
f570616
Compare
This is kind of tricky as logically sum of non-existing is 0, but also it's super easy to make a mistake and put wrong metric and sneakly introduce bug... so I think being strict makes sense here? WDYT?

Alternative is to extend `isExpected func(sums ...float64) bool` to something like `isExpected func(exists bool, sums ...float64) bool` but I think the proposed simplification makes sense here. (:

Also I think e2e base unit test should be run all the time not only on integration build, remove the tag from those in main e2e core package.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
f570616
to
55f7276
Compare
PTAL (: |
55f7276
to
5e2b851
Compare
Done, sorry for missed comment. |
Flakiness? |
No, missed comment :) |
Damn, I did not read the comment properly for like the 3rd time 🤦‍♂️ |
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
5e2b851
to
16a9a44
Compare
LGTM! Once CI passes we can merge it.
Thanks Bartek, this is useful. |
Thanks ❤️ |
…ct/cortex#2256)

* e2e: Fixed WaitSumMetrics to fail on non existing metric

This is kind of tricky as logically sum of non-existing is 0, but also it's super easy to make a mistake and put wrong metric and sneakly introduce bug... so I think being strict makes sense here? WDYT?

Alternative is to extend `isExpected func(sums ...float64) bool` to something like `isExpected func(exists bool, sums ...float64) bool` but I think the proposed simplification makes sense here. (:

Also I think e2e base unit test should be run all the time not only on integration build, remove the tag from those in main e2e core package.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed init of the metrics used in integration tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Moved build tag to `required_docker` to be explicit.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

Co-authored-by: Marco Pracucci <marco@pracucci.com>
Hello Busy Cortexianians 👋
This is kind of tricky, as logically the sum of a non-existing metric is 0, but it's also super easy to make a mistake, put in the wrong metric, and sneakily introduce a bug... so I think being strict makes sense here? WDYT?

An alternative is to extend `isExpected func(sums ...float64) bool` to something like `isExpected func(exists bool, sums ...float64) bool`, but I think the proposed simplification makes sense here. (:

Also, I think the e2e base unit tests should run all the time, not only on the integration build, so this PR removes the tag from those in the main e2e core package.
cc @pstibrany @pracucci
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>