
feat(uptime): add initial table migration #6690

Closed

Conversation

JoshFerge (Member)

@JoshFerge JoshFerge requested a review from a team as a code owner December 18, 2024 21:53
@JoshFerge JoshFerge requested review from a team and phacops December 18, 2024 21:53

github-actions bot commented Dec 18, 2024

This PR has a migration; here is the generated SQL for ./snuba/migrations/groups.py:

-- start migrations

-- forward migration uptime_monitor_checks : 0001_uptime_monitor_checks
Local op: CREATE TABLE IF NOT EXISTS uptime_monitor_checks_local (organization_id UInt64, project_id UInt64, environment LowCardinality(Nullable(String)), uptime_subscription_id UUID, uptime_check_id UUID, scheduled_check_time DateTime64(3), timestamp DateTime64(3), duration_ms UInt64, region_slug LowCardinality(String), check_status LowCardinality(String), check_status_reason LowCardinality(Nullable(String)), http_status_code UInt16, trace_id UUID, retention_days UInt16) ENGINE ReplicatedReplacingMergeTree('/clickhouse/tables/uptime_monitor_checks/{shard}/default/uptime_monitor_checks_local', '{replica}') PRIMARY KEY (organization_id, project_id, toDateTime(timestamp), uptime_check_id, trace_id) ORDER BY (organization_id, project_id, toDateTime(timestamp), uptime_check_id, trace_id) PARTITION BY (retention_days, toMonday(timestamp)) TTL toDateTime(timestamp) + toIntervalDay(retention_days) SETTINGS index_granularity=8192;
Distributed op: CREATE TABLE IF NOT EXISTS uptime_monitor_checks_dist (organization_id UInt64, project_id UInt64, environment LowCardinality(Nullable(String)), uptime_subscription_id UUID, uptime_check_id UUID, scheduled_check_time DateTime64(3), timestamp DateTime64(3), duration_ms UInt64, region_slug LowCardinality(String), check_status LowCardinality(String), check_status_reason LowCardinality(Nullable(String)), http_status_code UInt16, trace_id UUID, retention_days UInt16) ENGINE Distributed(`cluster_one_sh`, default, uptime_monitor_checks_local, cityHash64(reinterpretAsUInt128(trace_id)));
-- end forward migration uptime_monitor_checks : 0001_uptime_monitor_checks

-- backward migration uptime_monitor_checks : 0001_uptime_monitor_checks
Distributed op: DROP TABLE IF EXISTS uptime_monitor_checks_dist;
Local op: DROP TABLE IF EXISTS uptime_monitor_checks_local;
-- end backward migration uptime_monitor_checks : 0001_uptime_monitor_checks
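For context on how the bot-generated SQL above is produced, here is a rough reconstruction of what the migration file might look like, assembled from the column and engine snippets quoted in this review and aligned with the generated SQL (so it uses region_slug, duration_ms and UUID columns rather than the earlier draft names). The import paths, the storage-set key, and the exact helper signatures (operations.CreateTable / DropTable, OperationTarget, table_engines.Distributed) are assumptions, not text from the PR.

```python
# Sketch only: names marked ASSUMED are not taken from the PR.
from typing import Sequence

from snuba.clusters.storage_sets import StorageSetKey  # ASSUMED import path
from snuba.migrations import migration, operations, table_engines
from snuba.migrations.columns import MigrationModifiers as Modifiers
from snuba.utils.schemas import UUID, Column, DateTime, String, UInt

columns: Sequence[Column[Modifiers]] = [
    Column("organization_id", UInt(64)),
    Column("project_id", UInt(64)),
    Column("environment", String(Modifiers(nullable=True, low_cardinality=True))),
    Column("uptime_subscription_id", UUID()),
    Column("uptime_check_id", UUID()),
    Column("scheduled_check_time", DateTime()),
    Column("timestamp", DateTime()),  # generated SQL shows DateTime64(3); the diff snippets use DateTime()
    Column("duration_ms", UInt(64)),
    Column("region_slug", String(Modifiers(low_cardinality=True))),
    Column("check_status", String(Modifiers(low_cardinality=True))),
    Column("check_status_reason", String(Modifiers(nullable=True, low_cardinality=True))),
    Column("http_status_code", UInt(16)),
    Column("trace_id", UUID()),
    Column("retention_days", UInt(16)),
]

SORT_KEY = "(organization_id, project_id, toDateTime(timestamp), uptime_check_id, trace_id)"


class Migration(migration.ClickhouseNodeMigration):
    blocking = False

    def forwards_ops(self) -> Sequence[operations.SqlOperation]:
        return [
            operations.CreateTable(
                storage_set=StorageSetKey.UPTIME_MONITOR_CHECKS,  # ASSUMED key name
                table_name="uptime_monitor_checks_local",
                columns=columns,
                engine=table_engines.ReplacingMergeTree(
                    storage_set=StorageSetKey.UPTIME_MONITOR_CHECKS,
                    primary_key=SORT_KEY,
                    order_by=SORT_KEY,
                    partition_by="(retention_days, toMonday(timestamp))",
                    ttl="toDateTime(timestamp) + toIntervalDay(retention_days)",
                ),
                target=operations.OperationTarget.LOCAL,
            ),
            operations.CreateTable(
                storage_set=StorageSetKey.UPTIME_MONITOR_CHECKS,
                table_name="uptime_monitor_checks_dist",
                columns=columns,
                engine=table_engines.Distributed(
                    local_table_name="uptime_monitor_checks_local",
                    sharding_key="cityHash64(reinterpretAsUInt128(trace_id))",
                ),
                target=operations.OperationTarget.DISTRIBUTED,
            ),
        ]

    def backwards_ops(self) -> Sequence[operations.SqlOperation]:
        # Mirrors the backward migration SQL: drop the distributed table first, then the local one.
        return [
            operations.DropTable(
                storage_set=StorageSetKey.UPTIME_MONITOR_CHECKS,
                table_name="uptime_monitor_checks_dist",
                target=operations.OperationTarget.DISTRIBUTED,
            ),
            operations.DropTable(
                storage_set=StorageSetKey.UPTIME_MONITOR_CHECKS,
                table_name="uptime_monitor_checks_local",
                target=operations.OperationTarget.LOCAL,
            ),
        ]
```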

Column("timestamp", DateTime()),
Column("_sort_timestamp", DateTime()),
Column("duration", UInt(64)),
Column("region_id", UInt(16, Modifiers(nullable=True))),
Member Author


TODO: make non-nullable / a low cardinality string
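For example, a non-nullable low-cardinality variant, following the Modifiers usage already shown on the environment column (a sketch; the generated SQL's region_slug column is the target shape):

```python
Column("region_slug", String(Modifiers(low_cardinality=True))),
```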

Column("scheduled_check_time", DateTime()),
Column("timestamp", DateTime()),
Column("_sort_timestamp", DateTime()),
Column("duration", UInt(64)),
Contributor


Indicate the unit in the field name.
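For example (a sketch), putting the unit in the column name, as the generated SQL above already does for duration_ms:

```python
Column("duration_ms", UInt(64)),
```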

Column("retention_days", UInt(16)),
]

indices: Sequence[AddIndicesData] = [
Contributor


I wouldn't start by adding indices already. If you don't want to put the trace_id in the sort key, fine, but these indices might not be necessary.

Member Author


I've added the trace_id to the sort key / primary key.

@JoshFerge JoshFerge requested a review from phacops December 19, 2024 00:15
Column("organization_id", UInt(64)),
Column("project_id", UInt(64)),
Column("environment", String(Modifiers(nullable=True, low_cardinality=True))),
Column("uptime_subscription_id", UInt(64)),
Member Author


NOTE: this should be a UUID.
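For example (a sketch), using the same UUID column type that uptime_check_id and trace_id use in the generated SQL:

```python
Column("uptime_subscription_id", UUID()),
```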

@JoshFerge JoshFerge force-pushed the jferg-uptime-monitor-migration branch from 45b3eb7 to 3b62660 on December 19, 2024 20:15
columns=columns,
engine=table_engines.ReplacingMergeTree(
primary_key="(organization_id, project_id, toDateTime(timestamp), uptime_check_id, trace_id)",
order_by="(organization_id, project_id, toDateTime(timestamp), uptime_check_id, trace_id)",
Contributor


Is there only 1 check per trace? You might need something more specific here, otherwise the ReplacingMergeTree will merge 2 rows with the same sort key.

The engine differs from MergeTree in that it removes duplicate entries with the same sorting key value (ORDER BY table section, not PRIMARY KEY).

https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree

If you're not going to update values or if you don't care about duplicates, you could just use a MergeTree.
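If duplicates were acceptable, the engine choice could be a plain MergeTree with the same sort key; a sketch, assuming table_engines.MergeTree mirrors the ReplacingMergeTree call shown in this PR:

```python
# MergeTree keeps duplicate rows with the same sort key instead of replacing them
engine=table_engines.MergeTree(
    order_by="(organization_id, project_id, toDateTime(timestamp), uptime_check_id, trace_id)",
),
```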

Member Author


Only one check per trace. Duplicates would not be ideal.

@JoshFerge JoshFerge force-pushed the jferg-uptime-monitor-migration branch from 3b62660 to 1ecfaa9 on December 20, 2024 01:19

codecov bot commented Dec 20, 2024

❌ 1 Tests Failed:

Tests completed: 2619 | Failed: 1 | Passed: 2618 | Skipped: 5
View the top 1 failed tests by shortest run time
tests.utils.streams.test_topics::test_valid_topics
Stack Traces | 0.124s run time
Traceback (most recent call last):
  File ".../local/lib/python3.11....../site-packages/sentry_kafka_schemas/sentry_kafka_schemas.py", line 80, in get_topic
    with open(topic_path) as f:
         ^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '.../local/lib/python3.11.../sentry_kafka_schemas/topics/snuba-dead-letter-uptime-monitor-checks.yaml'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../utils/streams/test_topics.py", line 10, in test_valid_topics
    sentry_kafka_schemas.get_topic(
  File ".../local/lib/python3.11....../site-packages/sentry_kafka_schemas/sentry_kafka_schemas.py", line 83, in get_topic
    raise SchemaNotFound
sentry_kafka_schemas.sentry_kafka_schemas.SchemaNotFound
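The failure boils down to the topic lookup below (a sketch; the topic name is taken from the missing YAML file in the traceback). Presumably the topic has to be registered in the sentry-kafka-schemas package before this table migration can ship:

```python
import sentry_kafka_schemas

# Raises SchemaNotFound until topics/snuba-dead-letter-uptime-monitor-checks.yaml
# exists in the installed sentry_kafka_schemas package.
sentry_kafka_schemas.get_topic("snuba-dead-letter-uptime-monitor-checks")
```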


@JoshFerge
Member Author

Closing, as the migration needs to happen first.
