Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spans): Add indexes for tag columns #5871

Merged
merged 3 commits into from
May 9, 2024

Conversation

phacops
Copy link
Contributor

@phacops phacops commented May 7, 2024

Searching through the spans dataset is key for the Trace Explorer. We'd like to add indexes to speed search queries on user tags.

We didn't add an index for sentry_tags.key since they are mostly the same for all tags (these are specific tags set by Relay for Sentry use).

@phacops phacops requested a review from a team as a code owner May 7, 2024 15:11
@phacops phacops requested a review from Zylphrex May 7, 2024 15:12
Copy link

github-actions bot commented May 7, 2024

This PR has a migration; here is the generated SQL

-- start migrations

-- forward migration spans : 0013_spans_add_indexes_for_tag_columns
Local op: ALTER TABLE spans_local ADD INDEX IF NOT EXISTS bf_tags_key tags.key TYPE bloom_filter(0.0) GRANULARITY 1;
Local op: ALTER TABLE spans_local ADD INDEX IF NOT EXISTS bf_tags_hash_map _tags_hash_map TYPE bloom_filter(0.0) GRANULARITY 1;
Local op: ALTER TABLE spans_local ADD INDEX IF NOT EXISTS bf_sentry_tags_hash_map _sentry_tags_hash_map TYPE bloom_filter(0.0) GRANULARITY 1;
-- end forward migration spans : 0013_spans_add_indexes_for_tag_columns




-- backward migration spans : 0013_spans_add_indexes_for_tag_columns
Local op: ALTER TABLE spans_local DROP INDEX IF EXISTS bf_tags_key;
Local op: ALTER TABLE spans_local DROP INDEX IF EXISTS bf_tags_hash_map;
Local op: ALTER TABLE spans_local DROP INDEX IF EXISTS bf_sentry_tags_hash_map;
-- end backward migration spans : 0013_spans_add_indexes_for_tag_columns

Copy link

codecov bot commented May 7, 2024

Test Failures Detected: Due to failing tests, we cannot provide coverage reports at this time.

❌ Failed Test Results:

Completed 1 tests with 1 failed, 0 passed and 0 skipped.

View the full list of failed tests
Test Description Failure message
Testsuite:
pytest

Test name:
tests.test_api.TestApi::test_count

Envs:
- default
No failure message available

storage_set = StorageSetKey.SPANS
indexes = [
("bf_tags_key", "tags.key", "bloom_filter(0.0)", 1),
("bf_tags_hash_map", "_tags_hash_map", "bloom_filter(0.0)", 1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure the tags hash map indexes don't already exist? Why would the columns be there if the index wasn't added?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CREATE TABLE default.spans_local (
  `project_id` UInt64,
  `transaction_id` Nullable(UUID) CODEC(ZSTD(1)),
  `transaction_op` Nullable(String) CODEC(ZSTD(1)),
  `trace_id` UUID CODEC(ZSTD(1)),
  `span_id` UInt64 CODEC(ZSTD(1)),
  `profile_id` Nullable(UUID) CODEC(ZSTD(1)),
  `parent_span_id` Nullable(UInt64) CODEC(ZSTD(1)),
  `segment_id` UInt64 CODEC(ZSTD(1)),
  `is_segment` UInt8 CODEC(DoubleDelta, ZSTD(1)),
  `segment_name` String DEFAULT '' CODEC(ZSTD(1)),
  `start_timestamp` DateTime CODEC(DoubleDelta, ZSTD(1)),
  `start_ms` UInt16 CODEC(Delta(2), ZSTD(1)),
  `end_timestamp` DateTime CODEC(DoubleDelta, ZSTD(1)),
  `end_ms` UInt16 CODEC(Delta(2), ZSTD(1)),
  `duration` UInt32 CODEC(Delta(4), ZSTD(1)),
  `exclusive_time` Float64 CODEC(ZSTD(1)),
  `op` LowCardinality(String) CODEC(ZSTD(1)),
  `group` UInt64 CODEC(Delta(8), ZSTD(1)),
  `group_raw` UInt64 CODEC(Delta(8), ZSTD(1)),
  `span_status` UInt8 CODEC(T64, Delta(1), ZSTD(1)),
  `span_kind` LowCardinality(String),
  `description` String CODEC(ZSTD(1)),
  `status` Nullable(UInt32),
  `module` LowCardinality(String) CODEC(ZSTD(1)),
  `action` LowCardinality(Nullable(String)),
  `domain` Nullable(String),
  `platform` LowCardinality(Nullable(String)) CODEC(ZSTD(1)),
  `user` Nullable(String) CODEC(ZSTD(1)),
  `tags.key` Array(String) CODEC(ZSTD(1)),
  `tags.value` Array(String) CODEC(ZSTD(1)),
  `measurements.key` Array(LowCardinality(String)) CODEC(ZSTD(1)),
  `measurements.value` Array(Float64) CODEC(ZSTD(1)),
  `metrics_summary` String DEFAULT '' CODEC(ZSTD(1)),
  `partition` UInt16 CODEC(T64, Delta(2), ZSTD(1)),
  `offset` UInt64 CODEC(T64, Delta(8), ZSTD(1)),
  `retention_days` UInt16 CODEC(DoubleDelta, ZSTD(1)),
  `deleted` UInt8 CODEC(DoubleDelta, ZSTD(1)),
  `_tags_hash_map` Array(UInt64) MATERIALIZED arrayMap((k, v) -> cityHash64(concat(replaceRegexpAll(k, '(\\=|\\\\)', '\\\\\\1'), '=', v)), tags.key, tags.value) CODEC(ZSTD(1)),
  `sentry_tags.key` Array(String) CODEC(ZSTD(1)),
  `sentry_tags.value` Array(String) CODEC(ZSTD(1)),
  `_sentry_tags_hash_map` Array(UInt64) MATERIALIZED arrayMap((k, v) -> cityHash64(concat(replaceRegexpAll(toString(k), '(\\=|\\\\)', '\\\\\\1'), '=', toString(v))), sentry_tags.key, sentry_tags.value) CODEC(ZSTD(1)),
  `_measurements_hash_map` Array(UInt64) MATERIALIZED arrayMap(k -> cityHash64(replaceRegexpAll(toString(k), '(\\=|\\\\)', '\\\\\\1')), measurements.key),
  INDEX bf_span_id span_id TYPE bloom_filter GRANULARITY 1,
  INDEX bf_trace_id trace_id TYPE bloom_filter(0.) GRANULARITY 1,
  INDEX bf_segment_name segment_name TYPE bloom_filter(0.) GRANULARITY 1
) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/spans/{shard}/default/spans_local', '{replica}', deleted)
PARTITION BY (retention_days, toMonday(end_timestamp))
ORDER BY (project_id, group, end_timestamp, cityHash64(span_id))
SAMPLE BY cityHash64(span_id) TTL end_timestamp + toIntervalDay(retention_days)
SETTINGS index_granularity = 8192

I took this from the system queries page on the admin tool and as far as I can tell, there's no index on these yet.

Copy link
Contributor Author

@phacops phacops May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think they exist.

Here is a result of

SELECT create_table_query FROM system.tables WHERE name = 'spans_local';
CREATE TABLE default.spans_local (
  `project_id` UInt64,
  `transaction_id` Nullable(UUID) CODEC(ZSTD(1)),
  `transaction_op` Nullable(String) CODEC(ZSTD(1)),
  `trace_id` UUID CODEC(ZSTD(1)),
  `span_id` UInt64 CODEC(ZSTD(1)),
  `profile_id` Nullable(UUID) CODEC(ZSTD(1)),
  `parent_span_id` Nullable(UInt64) CODEC(ZSTD(1)),
  `segment_id` UInt64 CODEC(ZSTD(1)),
  `is_segment` UInt8 CODEC(DoubleDelta, ZSTD(1)),
  `segment_name` String DEFAULT '' CODEC(ZSTD(1)),
  `start_timestamp` DateTime CODEC(DoubleDelta, ZSTD(1)),
  `start_ms` UInt16 CODEC(Delta(2), ZSTD(1)),
  `end_timestamp` DateTime CODEC(DoubleDelta, ZSTD(1)),
  `end_ms` UInt16 CODEC(Delta(2), ZSTD(1)),
  `duration` UInt32 CODEC(Delta(4), ZSTD(1)),
  `exclusive_time` Float64 CODEC(ZSTD(1)),
  `op` LowCardinality(String) CODEC(ZSTD(1)),
  `group` UInt64 CODEC(Delta(8), ZSTD(1)),
  `group_raw` UInt64 CODEC(Delta(8), ZSTD(1)),
  `span_status` UInt8 CODEC(T64, Delta(1), ZSTD(1)),
  `span_kind` LowCardinality(String),
  `description` String CODEC(ZSTD(1)),
  `status` Nullable(UInt32),
  `module` LowCardinality(String) CODEC(ZSTD(1)),
  `action` LowCardinality(Nullable(String)),
  `domain` Nullable(String),
  `platform` LowCardinality(Nullable(String)) CODEC(ZSTD(1)),
  `user` Nullable(String) CODEC(ZSTD(1)),
  `tags.key` Array(String) CODEC(ZSTD(1)),
  `tags.value` Array(String) CODEC(ZSTD(1)),
  `measurements.key` Array(LowCardinality(String)) CODEC(ZSTD(1)),
  `measurements.value` Array(Float64) CODEC(ZSTD(1)),
  `metrics_summary` String DEFAULT '' CODEC(ZSTD(1)),
  `partition` UInt16 CODEC(T64, Delta(2), ZSTD(1)),
  `offset` UInt64 CODEC(T64, Delta(8), ZSTD(1)),
  `retention_days` UInt16 CODEC(DoubleDelta, ZSTD(1)),
  `deleted` UInt8 CODEC(DoubleDelta, ZSTD(1)),
  `_tags_hash_map` Array(UInt64) MATERIALIZED arrayMap((k, v) -> cityHash64(concat(replaceRegexpAll(k, '(\\=|\\\\)', '\\\\\\1'), '=', v)), tags.key, tags.value) CODEC(ZSTD(1)),
  `sentry_tags.key` Array(String) CODEC(ZSTD(1)), `sentry_tags.value` Array(String) CODEC(ZSTD(1)),
  `_sentry_tags_hash_map` Array(UInt64) MATERIALIZED arrayMap((k, v) -> cityHash64(concat(replaceRegexpAll(toString(k), '(\\=|\\\\)', '\\\\\\1'), '=', toString(v))), sentry_tags.key, sentry_tags.value) CODEC(ZSTD(1)),
  `_measurements_hash_map` Array(UInt64) MATERIALIZED arrayMap(k -> cityHash64(replaceRegexpAll(toString(k), '(\\=|\\\\)', '\\\\\\1')), measurements.key),
  INDEX bf_span_id span_id TYPE bloom_filter GRANULARITY 1,
  INDEX bf_trace_id trace_id TYPE bloom_filter(0.) GRANULARITY 1,
  INDEX bf_segment_name segment_name TYPE bloom_filter(0.) GRANULARITY 1
) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/spans/{shard}/default/spans_local', '{replica}', deleted)
PARTITION BY (retention_days, toMonday(end_timestamp))
ORDER BY (project_id, group, end_timestamp, cityHash64(span_id))
SAMPLE BY cityHash64(span_id)
TTL end_timestamp + toIntervalDay(retention_days)
SETTINGS index_granularity = 8192;

@phacops phacops requested a review from evanh May 7, 2024 17:31
@phacops phacops merged commit fd3b817 into master May 9, 2024
30 checks passed
@phacops phacops deleted the pierre/spans-add-indices-to-tags-description branch May 9, 2024 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants