feat: add new mv for logs page chart #2723
Conversation
Thank you for following the naming conventions for pull request titles! 🙏
From Oz ‣ Andreas Thomas I didn't wanna push everything into a single PR. So let's merge this first then move on to UI.
From Andreas Thomas ‣ Just got back from the gym
From Oz ‣ take your time np |
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (8)
internal/clickhouse/src/logs-timeseries.test.ts (3)
71-72: Simplify startTime assignment by removing unnecessary Date object creation

The use of new Date() followed by .getTime() is redundant since Date.now() already provides a timestamp in milliseconds. You can subtract the interval directly from Date.now().

Apply this diff to simplify the code:

- startTime: new Date(Date.now() - 24 * 60 * 60 * 1000).getTime(), // 24 hours ago
+ startTime: Date.now() - 24 * 60 * 60 * 1000, // 24 hours ago
81-82: Simplify startTime assignment by removing unnecessary Date object creation

Same as above, for the hourly timeseries aggregation.

Apply this diff:

- startTime: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000).getTime(), // 7 days ago
+ startTime: Date.now() - 7 * 24 * 60 * 60 * 1000, // 7 days ago
91-92: Simplify startTime assignment by removing unnecessary Date object creation

Same as above, for the daily timeseries aggregation.

Apply this diff:

- startTime: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000).getTime(), // 30 days ago
+ startTime: Date.now() - 30 * 24 * 60 * 60 * 1000, // 30 days ago

internal/clickhouse/schema/045_create_api_requests_per_minute_mv_v1.sql (1)
13-19: Consider adding a PARTITION BY clause for better performance

The current implementation groups by multiple columns but doesn't specify partitioning. For time-series data, partitioning by time ranges (e.g., by day) could significantly improve query performance and data management.

  GROUP BY workspace_id, path, response_status, host, method,
-   time;
+   time
+ PARTITION BY toYYYYMM(time);

internal/clickhouse/schema/046_create_api_requests_per_day_v1.sql (1)
2-12: Consider adding column constraints and TTL

The table definition could benefit from:
- NOT NULL constraints where appropriate
- TTL (Time To Live) policy for automatic data cleanup
- Default values for mandatory fields

  CREATE TABLE metrics.api_requests_per_day_v1 (
-     time DateTime,
-     workspace_id String,
+     time DateTime NOT NULL,
+     workspace_id String NOT NULL,
      path String,
      response_status Int,
      host String,
      method LowCardinality(String),
      count Int64
- ) ENGINE = SummingMergeTree()
+ ) ENGINE = SummingMergeTree()
+ TTL time + INTERVAL 90 DAY DELETE

internal/clickhouse/schema/042_create_api_requests_per_hour_v1.sql (2)
2-21: Consider adding partitioning and TTL for data lifecycle management.

The table schema looks good, but consider these enhancements for production readiness (a combined sketch follows the list):
- Add partitioning by time for efficient data management: PARTITION BY toYYYYMM(time)
- Define a TTL policy for automatic data cleanup: TTL time + INTERVAL 3 MONTH DELETE
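To make the two suggestions concrete, here is a minimal sketch of the hourly table with both applied. The column list is reconstructed from the per-day table shown earlier in this review, and the 3-month retention window is the reviewer's example rather than a project requirement:

```sql
-- Sketch only: the hourly table with the suggested monthly partitioning and TTL added.
-- Column names/types are taken from the per-day table diff above; verify against 042.
CREATE TABLE metrics.api_requests_per_hour_v1 (
    time            DateTime,
    workspace_id    String,
    path            String,
    response_status Int,
    host            String,
    method          LowCardinality(String),
    count           Int64
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(time)
ORDER BY (workspace_id, time, host, path, response_status, method)
TTL time + INTERVAL 3 MONTH DELETE;
```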
8-10: Consider adding a CHECK constraint for HTTP methods.

Since the method column expects specific HTTP methods, add a constraint to ensure data integrity:

  method LowCardinality(String),
  CONSTRAINT valid_method CHECK method IN ('GET', 'POST', 'PUT', 'DELETE', 'PATCH', 'HEAD', 'OPTIONS')

internal/clickhouse/src/index.ts (1)
5-10: LGTM! Consider grouping related imports

The imports are logically organized, though consider grouping all logs-related functions together for better maintainability.

-import {
-  getDailyLogsTimeseries,
-  getHourlyLogsTimeseries,
-  getLogs,
-  getMinutelyLogsTimeseries,
-} from "./logs";
+import {
+  getLogs,
+  // Timeseries functions
+  getDailyLogsTimeseries,
+  getHourlyLogsTimeseries,
+  getMinutelyLogsTimeseries,
+} from "./logs";
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (14)
.github/actions/install/action.yaml (1 hunks)
apps/engineering/content/docs/contributing/testing.mdx (1 hunks)
internal/clickhouse/Dockerfile (1 hunks)
internal/clickhouse/package.json (1 hunks)
internal/clickhouse/schema/042_create_api_requests_per_hour_v1.sql (1 hunks)
internal/clickhouse/schema/043_create_api_requests_per_hour_mv_v1.sql (1 hunks)
internal/clickhouse/schema/044_create_api_requests_per_minute_v1.sql (1 hunks)
internal/clickhouse/schema/045_create_api_requests_per_minute_mv_v1.sql (1 hunks)
internal/clickhouse/schema/046_create_api_requests_per_day_v1.sql (1 hunks)
internal/clickhouse/schema/047_create_api_requests_per_day_mv_v1.sql (1 hunks)
internal/clickhouse/src/index.ts (2 hunks)
internal/clickhouse/src/logs-timeseries.test.ts (1 hunks)
internal/clickhouse/src/logs.ts (2 hunks)
internal/clickhouse/src/testutil.ts (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- internal/clickhouse/Dockerfile
🔇 Additional comments (9)
internal/clickhouse/src/logs.ts (1)
97-218: Well-structured implementation of timeseries queries
The code for the timeseries queries and associated parameter schemas is clean, efficient, and follows best practices. The use of parameterized queries enhances security, and the structured approach improves maintainability.
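For readers less familiar with the pattern the review is praising: ClickHouse supports typed query parameters of the form {name: Type}, which keeps caller-supplied values out of the SQL text. The snippet below is a minimal illustrative sketch with hypothetical parameter names, not the actual query in logs.ts:

```sql
-- Sketch: a parameterized hourly rollup query; parameter names are illustrative.
SELECT
    toStartOfHour(fromUnixTimestamp64Milli(time)) AS x,
    count(*) AS total
FROM metrics.raw_api_requests_v1
WHERE workspace_id = {workspaceId: String}
  AND time >= {startTime: Int64}
  AND time <= {endTime: Int64}
GROUP BY x
ORDER BY x ASC;
```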
internal/clickhouse/schema/047_create_api_requests_per_day_mv_v1.sql (1)
13-19: Confirm the use of alias time in GROUP BY clause

Ensure that using the alias time in the GROUP BY clause works as intended in ClickHouse. In some SQL dialects, the full expression must be used instead of the alias. Verifying this will prevent potential aggregation issues.
Run the following script to check if the materialized view aggregates data correctly:
✅ Verification successful
Using alias time in GROUP BY is correct for ClickHouse

The materialized view definition shows that time is properly defined as an alias using AS in the SELECT clause: toStartOfDay(fromUnixTimestamp64Milli(time)) AS time. ClickHouse explicitly supports using column aliases in GROUP BY clauses, which is one of its distinguishing features from some other SQL dialects. The usage here is correct and will work as intended.
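A minimal query illustrating the behavior described above; the table name is taken from this PR's raw requests table, and the query is only a sketch:

```sql
-- Sketch: the alias defined in SELECT is referenced directly in GROUP BY,
-- which ClickHouse resolves to the aliased expression.
SELECT
    toStartOfDay(fromUnixTimestamp64Milli(time)) AS time,
    count(*) AS count
FROM metrics.raw_api_requests_v1
GROUP BY time;
```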
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the grouping in the materialized view
# Test: Query the materialized view and check for expected outputs
echo "SELECT time, count FROM metrics.api_requests_per_day_v1 LIMIT 10;" | clickhouse-client
Length of output: 202
Script:
#!/bin/bash
# Let's examine the full materialized view definition and any related table definitions
# to understand the context of the GROUP BY clause
# Check the materialized view definition
fd "047_create_api_requests_per_day_mv_v1.sql" | xargs cat
# Look for related table definitions that might show column definitions
fd -e sql | grep -l "api_requests" | xargs cat
Length of output: 648
internal/clickhouse/schema/043_create_api_requests_per_hour_mv_v1.sql (1)
13-19: Confirm the use of alias time in GROUP BY clause

As with the daily materialized view, ensure that using the alias time in the GROUP BY clause functions correctly in ClickHouse to avoid any grouping inconsistencies.
Run the following script to check if the materialized view aggregates data correctly:
✅ Verification successful
Using time alias in GROUP BY is correct and consistent

After examining the materialized view definition and related tables, the usage of time in the GROUP BY clause is correct. The time column is properly defined using toStartOfHour(fromUnixTimestamp64Milli(time)) in the SELECT clause before being used in GROUP BY. This pattern is consistently used across all time-based materialized views (minute, hour, day) in the codebase, and ClickHouse supports using column aliases in GROUP BY clauses.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the grouping in the materialized view
# Test: Query the materialized view and check for expected outputs
echo "SELECT time, count FROM metrics.api_requests_per_hour_v1 LIMIT 10;" | clickhouse-client
Length of output: 203
Script:
#!/bin/bash
# Let's examine the materialized view definition and its source table schema
fd "043_create_api_requests_per_hour_mv_v1.sql" | xargs cat
# Also check for any related table definitions
fd -e sql | grep -i "api_requests" | xargs cat
Length of output: 4345
internal/clickhouse/schema/045_create_api_requests_per_minute_mv_v1.sql (1)
9-10: Verify time zone handling in timestamp conversion

The fromUnixTimestamp64Milli conversion followed by toStartOfMinute could be affected by server timezone settings. Consider explicitly specifying the timezone to ensure consistent results across different environments.
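A sketch of the kind of explicit-timezone variant the reviewer is suggesting; the choice of 'UTC' here is illustrative, not a project decision:

```sql
-- Sketch: pin the millisecond-timestamp conversion to a timezone instead of
-- relying on the server default, so minute buckets are stable across hosts.
SELECT
    toStartOfMinute(fromUnixTimestamp64Milli(time, 'UTC')) AS time,
    count(*) AS count
FROM metrics.raw_api_requests_v1
GROUP BY time;
```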
internal/clickhouse/package.json (1)
17-17: Verify compatibility with Node.js version requirements
Execa v9.3.0 requires Node.js 16.0.0 or newer. Please ensure this aligns with the project's Node.js version requirements.
internal/clickhouse/schema/046_create_api_requests_per_day_v1.sql (1)
13-21: Verify consistency of column types with other metrics tables
Please ensure that the data types and ordering key structure are consistent with related tables (hourly and minutely metrics).
✅ Verification successful
Column types and ordering structure are consistent across all metrics tables
The verification shows that all three tables (per_minute_v1, per_hour_v1, and per_day_v1) have identical:
- Column names and data types
- Ordering key structure (workspace_id, time, host, path, response_status, method)
- Table engine (SummingMergeTree)
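One way to spot-check this kind of consistency against a running instance is to compare the rollup tables' columns via system.columns; a minimal sketch, assuming the tables live in the metrics database as in this PR:

```sql
-- Sketch: list column names and types side by side for the three rollup tables.
SELECT table, name, type
FROM system.columns
WHERE database = 'metrics'
  AND table IN ('api_requests_per_minute_v1', 'api_requests_per_hour_v1', 'api_requests_per_day_v1')
ORDER BY name, table;
```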
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check column types across all metrics tables
rg -A 20 "CREATE TABLE metrics\.api_requests" --type sql
Length of output: 5618
apps/engineering/content/docs/contributing/testing.mdx (1)
Line range hint 1-1: Verify consistency with related materialized views.
These tables will be populated by materialized views mentioned in the AI summary. Please ensure:
- The schema matches the source tables
- The aggregation logic in MVs aligns with these target tables
✅ Verification successful
The previous output showed the materialized view definitions but we need to verify the source and target table schemas to ensure consistency. This will help us confirm the schema alignment between the source tables and materialized views.
Schema consistency verified across materialized views and tables
The materialized views are correctly configured with matching schemas (a representative sketch of the hourly MV follows the list):
- Source table raw_api_requests_v1 contains all raw fields
- Target tables (per_hour, per_minute, per_day) have identical schemas with aggregated fields
- MVs correctly transform the data with appropriate time-based grouping using toStartOfHour, toStartOfMinute, and toStartOfDay
- All tables maintain consistent field types (e.g., LowCardinality(String) for method)
- Aggregation logic in MVs properly counts requests while preserving all dimension fields
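For reference, a representative sketch of the hourly materialized view implied by the findings above; the authoritative definition is in 043_create_api_requests_per_hour_mv_v1.sql, and this reconstruction is only illustrative:

```sql
-- Sketch: rolls raw requests up into hourly buckets in the target SummingMergeTree table.
CREATE MATERIALIZED VIEW metrics.api_requests_per_hour_mv_v1
TO metrics.api_requests_per_hour_v1
AS
SELECT
    workspace_id,
    path,
    response_status,
    host,
    method,
    count(*) AS count,
    toStartOfHour(fromUnixTimestamp64Milli(time)) AS time
FROM metrics.raw_api_requests_v1
GROUP BY
    workspace_id,
    path,
    response_status,
    host,
    method,
    time;
```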
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for related materialized view definitions
rg -A 20 "CREATE MATERIALIZED VIEW.*api_requests" internal/clickhouse/schema/
# Verify source table schemas match these target tables
rg -A 20 "CREATE TABLE.*api_requests.*source" internal/clickhouse/schema/
Length of output: 5943
Script:
#!/bin/bash
# Let's check for the target tables and raw source table definitions
rg -A 20 "CREATE TABLE.*api_requests" internal/clickhouse/schema/
# Also check for any other related schema files
fd ".*api_requests.*\.sql" internal/clickhouse/schema/
Length of output: 8114
internal/clickhouse/src/testutil.ts (1)
48-50: Verify localhost accessibility in CI environment

The DSN now uses localhost instead of the container name. While this works locally, ensure it's reliable in CI environments where networking setup might differ.
✅ Verification successful
Using localhost in the DSN is correct and reliable

The DSN construction using localhost is correct and reliable because:
- The container is started using testcontainers, which handles port mapping automatically
- The mapped port is obtained using container.getMappedPort(9000), ensuring the correct local port is used
- This approach is standard practice with testcontainers, as it manages container networking and port bindings appropriately for both local and CI environments
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if this file is used in CI tests
rg -l "ClickHouseContainer" .github/workflows/
Length of output: 46
Script:
#!/bin/bash
# Let's check where ClickHouseContainer is used and how tests are executed
rg -l "ClickHouseContainer" --type ts
# Check CI workflow files for test execution patterns
rg "npm.*test|yarn.*test|pnpm.*test" .github/workflows/
Length of output: 922
Script:
#!/bin/bash
# Let's check the test configuration to understand the test environment setup
cat vitest.integration.ts
# Check if there's any Docker Compose configuration that might handle networking
fd -e yml -e yaml docker-compose
Length of output: 147
Script:
#!/bin/bash
# Let's check the Docker Compose configuration for network setup
cat deployment/docker-compose.yaml
# Let's also check how the tests are actually using this container
rg -A 5 -B 5 "new ClickHouseContainer\(\)" internal/clickhouse/src/
Length of output: 4417
Script:
#!/bin/bash
# Let's check the test setup in testutil.ts to understand the container initialization
cat internal/clickhouse/src/testutil.ts
# Also check if there are any test configuration files in the clickhouse directory
fd -e json -e js -e ts config internal/clickhouse/
Length of output: 1852
internal/clickhouse/src/index.ts (1)
102-106: LGTM! Clean implementation of timeseries API
The implementation follows the established pattern and provides a clear interface for accessing logs timeseries data at different granularities.
✅ Verification successful
Implementation verified and follows best practices
The timeseries implementation is well-structured and robust:
- Uses a factory function createTimeseriesQuerier to generate consistent queriers for different time intervals
- Properly handles data validation using zod schemas
- Implements clean SQL queries that aggregate logs data with success/warning/error counts
- Maintains consistent interface across different time granularities (minute/hour/day)
- Uses type-safe interval configurations through the INTERVALS constant
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify the implementation of the new timeseries functions
rg -A 10 "export (async )?function (getMinutelyLogsTimeseries|getHourlyLogsTimeseries|getDailyLogsTimeseries)"
Length of output: 112
Script:
#!/bin/bash
# Let's try searching without the 'export' keyword and with a broader pattern
rg -A 10 "(getMinutelyLogsTimeseries|getHourlyLogsTimeseries|getDailyLogsTimeseries)"
# Also search for timeseries-related function declarations using ast-grep
ast-grep --pattern 'function $_(querier) {
$$$
}'
Length of output: 2026
Script:
#!/bin/bash
# Let's check the implementation of createTimeseriesQuerier since it's used to create these functions
rg -A 15 "createTimeseriesQuerier"
# Also check for the INTERVALS constant definition
rg "INTERVALS.*=.*{" -A 5
Length of output: 1519
Script:
#!/bin/bash
# Let's fix the INTERVALS search and also look for the query creation
rg "INTERVALS" -A 5
# Check the createTimeseriesQuery implementation
rg "createTimeseriesQuery" -A 10
# Look for the schema definition
rg "logsTimeseriesDataPoint" -A 5
Length of output: 3356
Only nitpick is that in the sql files the column types are not indented evenly, but that's ok
x Int,
longerValue Int
vs
x Int
longerValue Int
From Oz ‣ Andreas Thomas there is an interesting coderabbit suggestion, could you take a look at it?
From Andreas Thomas ‣ done and ignored
From Oz ‣ okay let's merge and lemme move on to chart |
What does this PR do?
This PR adds hourly, minutely and daily MVs for logs. We'll extend this in the future to support more granular filters.
Fixes # (issue)
If there is not an issue for this, please create one first. This is used for tracking purposes and also helps us understand why this PR exists.
Type of change
How should this be tested?
Checklist
Required
pnpm build
pnpm fmt
console.logs
git pull origin main
Appreciated
Summary by CodeRabbit
New Features
goose tool in the GitHub Actions workflow.
Bug Fixes
Tests
Chores