
feat: replicate task runs to clickhouse to power dashboard improvements #2035

Merged
merged 33 commits into from
May 12, 2025
Commits
ba7c8a1
WIP clickhouse package with test containers setup
ericallam Apr 25, 2025
a49930a
More clickhouse client setup now with otel and real tests, and the v1…
ericallam Apr 28, 2025
5cd2d20
Add some additional columns to raw_run_events_v1
ericallam Apr 28, 2025
9a1fd64
WIP runs dashboard service
ericallam Apr 28, 2025
bae987d
Create a new run engine event bus event for the runs dashboard to hoo…
ericallam Apr 28, 2025
3349897
Track run events in the run engine
ericallam Apr 28, 2025
4a6e9c2
make sure engine v1 runs get synced to CH
ericallam Apr 29, 2025
cc6695c
Update the attemptNumber of v3 task runs
ericallam Apr 29, 2025
04b2039
Restructure the run events to be more sparse
ericallam Apr 30, 2025
56aeaf0
emit more stuff
ericallam May 1, 2025
a8533e0
Setup replication package
ericallam May 1, 2025
1c00d98
scaffold the replication package
ericallam May 1, 2025
1c14125
replication wip
ericallam May 2, 2025
cde1bfb
resolve conflicts
ericallam May 3, 2025
f3dc43b
more replication stuff
ericallam May 6, 2025
fa4185b
Add ability to drop the replication slot completely on teardown
ericallam May 6, 2025
ee13ebb
Use the new single replacingmergetree task events table for replication
ericallam May 6, 2025
c30a014
get it working
ericallam May 7, 2025
2131b66
insert payloads into their own table only on insert and then join
ericallam May 8, 2025
f3e9041
prepare for using clickhouse cloud and now running ch migrations duri…
ericallam May 8, 2025
da0565e
Handover WIP and tests
ericallam May 9, 2025
fc6b69b
Testing the replication service
ericallam May 9, 2025
651d51a
Remove the runs dashboard stuff that we aren't using anymore
ericallam May 9, 2025
b8dc32d
Added a test for large payloads
ericallam May 9, 2025
ae14fa2
hacky typecheck fix
ericallam May 9, 2025
1d7c2ad
Fix new internal package typecheck issues and start adding telemetry …
ericallam May 9, 2025
955bc25
tracing over spans, some other improvements
ericallam May 11, 2025
3c50bfe
Improvements to the runs replication service, now ready for testing
ericallam May 12, 2025
14a4183
Some fixes and cleanups
ericallam May 12, 2025
024c30a
Don't need this code anymore
ericallam May 12, 2025
aaf65cc
move transaction types into the runs replication service
ericallam May 12, 2025
beabd26
only send spans where there are transaction events
ericallam May 12, 2025
5466599
A couple of suggested tweaks
ericallam May 12, 2025
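Commit ee13ebb moves replication onto a single ReplacingMergeTree task events table. As a rough illustration of the deduplication that engine performs, the TypeScript sketch below collapses rows that share a sorting key down to the one with the highest version, which is what a ReplacingMergeTree merge with a version column eventually does. The `TaskRunRow` shape and the `runId`/`version` column names are hypothetical and are not taken from the table schema in this PR.

```typescript
// Hypothetical row shape: `runId` stands in for the sorting key and
// `version` for the ReplacingMergeTree version column.
interface TaskRunRow {
  runId: string;
  version: number;
  status: string;
}

// Simulates the end state of a ReplacingMergeTree(version) merge: among rows
// sharing a sorting key, only the row with the highest version survives.
function collapseLikeReplacingMergeTree(rows: TaskRunRow[]): TaskRunRow[] {
  const latest = new Map<string, TaskRunRow>();
  for (const row of rows) {
    const current = latest.get(row.runId);
    // On equal versions ClickHouse keeps the most recently inserted row;
    // iterating in insertion order and using >= mirrors that rule.
    if (current === undefined || row.version >= current.version) {
      latest.set(row.runId, row);
    }
  }
  return [...latest.values()];
}
```

ClickHouse only collapses duplicates during background merges (or at query time with `FINAL`), so un-merged duplicates can still be visible to readers; the sketch models only the fully merged result.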
41 changes: 41 additions & 0 deletions apps/webapp/app/env.server.ts
@@ -725,6 +725,47 @@ const EnvironmentSchema = z.object({
   // BetterStack
   BETTERSTACK_API_KEY: z.string().optional(),
   BETTERSTACK_STATUS_PAGE_ID: z.string().optional(),
+
+  RUN_REPLICATION_REDIS_HOST: z
+    .string()
+    .optional()
+    .transform((v) => v ?? process.env.REDIS_HOST),
+  RUN_REPLICATION_REDIS_READER_HOST: z
+    .string()
+    .optional()
+    .transform((v) => v ?? process.env.REDIS_READER_HOST),
+  RUN_REPLICATION_REDIS_READER_PORT: z.coerce
+    .number()
+    .optional()
+    .transform(
+      (v) =>
+        v ?? (process.env.REDIS_READER_PORT ? parseInt(process.env.REDIS_READER_PORT) : undefined)
+    ),
+  RUN_REPLICATION_REDIS_PORT: z.coerce
+    .number()
+    .optional()
+    .transform((v) => v ?? (process.env.REDIS_PORT ? parseInt(process.env.REDIS_PORT) : undefined)),
+  RUN_REPLICATION_REDIS_USERNAME: z
+    .string()
+    .optional()
+    .transform((v) => v ?? process.env.REDIS_USERNAME),
+  RUN_REPLICATION_REDIS_PASSWORD: z
+    .string()
+    .optional()
+    .transform((v) => v ?? process.env.REDIS_PASSWORD),
+  RUN_REPLICATION_REDIS_TLS_DISABLED: z.string().default(process.env.REDIS_TLS_DISABLED ?? "false"),
+
+  RUN_REPLICATION_CLICKHOUSE_URL: z.string().optional(),
+  RUN_REPLICATION_ENABLED: z.string().default("0"),
+  RUN_REPLICATION_SLOT_NAME: z.string().default("task_runs_to_clickhouse_v1"),
+  RUN_REPLICATION_PUBLICATION_NAME: z.string().default("task_runs_to_clickhouse_v1_publication"),
+  RUN_REPLICATION_MAX_FLUSH_CONCURRENCY: z.coerce.number().int().default(100),
+  RUN_REPLICATION_FLUSH_INTERVAL_MS: z.coerce.number().int().default(1000),
+  RUN_REPLICATION_FLUSH_BATCH_SIZE: z.coerce.number().int().default(100),
+  RUN_REPLICATION_LEADER_LOCK_TIMEOUT_MS: z.coerce.number().int().default(30_000),
+  RUN_REPLICATION_LEADER_LOCK_EXTEND_INTERVAL_MS: z.coerce.number().int().default(10_000),
+  RUN_REPLICATION_ACK_INTERVAL_SECONDS: z.coerce.number().int().default(10),
+  RUN_REPLICATION_LOG_LEVEL: z.enum(["log", "error", "warn", "info", "debug"]).default("info"),
 });
 
 export type Environment = z.infer<typeof EnvironmentSchema>;
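The Redis settings added to `EnvironmentSchema` all follow one pattern: a `RUN_REPLICATION_REDIS_*` variable takes precedence, and the shared `REDIS_*` variable is the fallback. The sketch below reproduces that resolution order as a plain function over an environment map, so it can be exercised without zod. The `resolveReplicationRedis` name, the option shape, and the conversion of the TLS flag to a boolean are illustrative assumptions, not code from this PR.

```typescript
// Hypothetical stand-in for process.env.
type Env = Record<string, string | undefined>;

// Illustrative option shape; not a type defined in this PR.
interface ReplicationRedisOptions {
  host: string | undefined;
  port: number | undefined;
  readerHost: string | undefined;
  readerPort: number | undefined;
  username: string | undefined;
  password: string | undefined;
  tlsDisabled: boolean;
}

// Mirrors the schema's precedence: RUN_REPLICATION_<KEY> wins when set,
// otherwise the shared <KEY> variable is used.
function resolveReplicationRedis(env: Env): ReplicationRedisOptions {
  const pick = (key: string): string | undefined => env[`RUN_REPLICATION_${key}`] ?? env[key];
  const toPort = (value: string | undefined): number | undefined =>
    value !== undefined ? parseInt(value, 10) : undefined;

  return {
    host: pick("REDIS_HOST"),
    port: toPort(pick("REDIS_PORT")),
    readerHost: pick("REDIS_READER_HOST"),
    readerPort: toPort(pick("REDIS_READER_PORT")),
    username: pick("REDIS_USERNAME"),
    password: pick("REDIS_PASSWORD"),
    // The schema keeps this flag as a string that defaults to REDIS_TLS_DISABLED,
    // then "false"; treating only "true" as true is an assumption of this sketch.
    tlsDisabled: (pick("REDIS_TLS_DISABLED") ?? "false") === "true",
  };
}
```

One difference worth noting: in the schema, `.default(process.env.REDIS_TLS_DISABLED ?? "false")` reads the shared variable once, when the schema object is constructed, whereas this function reads it on every call.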
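The flush settings (`RUN_REPLICATION_FLUSH_BATCH_SIZE`, default 100, and `RUN_REPLICATION_FLUSH_INTERVAL_MS`, default 1000) suggest a size-or-time batching policy for writes to ClickHouse. The sketch below assumes that reading: buffer items, flush as soon as the batch is full, and otherwise flush once the interval has elapsed since the first buffered item. `SizeOrIntervalBatcher` is hypothetical, and it does not model `RUN_REPLICATION_MAX_FLUSH_CONCURRENCY`, which presumably bounds how many flushes may run at once.

```typescript
// Hypothetical size-or-interval batcher: one plausible reading of the
// RUN_REPLICATION_FLUSH_* settings, not the replication service's code.
class SizeOrIntervalBatcher<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private readonly batchSize: number;
  private readonly flushIntervalMs: number;
  private readonly onFlush: (batch: T[]) => void;

  constructor(batchSize: number, flushIntervalMs: number, onFlush: (batch: T[]) => void) {
    this.batchSize = batchSize;
    this.flushIntervalMs = flushIntervalMs;
    this.onFlush = onFlush;
  }

  add(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.batchSize) {
      // Size limit reached: flush immediately.
      this.flush();
    } else if (this.timer === undefined) {
      // First item of a new batch: start the interval clock.
      this.timer = setTimeout(() => this.flush(), this.flushIntervalMs);
    }
  }

  flush(): void {
    // Cancel any pending interval flush, since this batch is handled now.
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.buffer.length === 0) return;
    // Swap the buffer out before calling onFlush so items added during the
    // callback start a fresh batch.
    const batch = this.buffer;
    this.buffer = [];
    this.onFlush(batch);
  }
}
```

Constructing it with the parsed `env.RUN_REPLICATION_FLUSH_BATCH_SIZE` and `env.RUN_REPLICATION_FLUSH_INTERVAL_MS` values would cap both batch size and staleness; a production version would also need an async `onFlush`, error handling, and a concurrency limit.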
4 changes: 3 additions & 1 deletion apps/webapp/app/metrics.server.ts
@@ -4,7 +4,9 @@ import { env } from "./env.server";
 
 export const metricsRegister = singleton("metricsRegister", initializeMetricsRegister);
 
-function initializeMetricsRegister() {
+export type MetricsRegister = Registry<OpenMetricsContentType>;
+
+function initializeMetricsRegister(): MetricsRegister {
   const registry = new Registry<OpenMetricsContentType>();
 
   register.setDefaultLabels({