Added materialized view and duplicated v2 Tinybird endpoints#25719
Added materialized view and duplicated v2 Tinybird endpoints#25719
Conversation
…re robust polling approach
WalkthroughAdds a new Tinybird datasource declaration Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| SCHEMA > | ||
| `site_uuid` LowCardinality(String), | ||
| `session_id` String, | ||
| `pageviews` AggregateFunction(count, UInt64), | ||
| `first_pageview` AggregateFunction(min, DateTime), | ||
| `last_pageview` AggregateFunction(max, DateTime), | ||
| `source` AggregateFunction(argMin, String, DateTime), | ||
| `device` AggregateFunction(argMin, String, DateTime), | ||
| `utm_source` AggregateFunction(argMin, String, DateTime), | ||
| `utm_medium` AggregateFunction(argMin, String, DateTime), | ||
| `utm_campaign` AggregateFunction(argMin, String, DateTime), | ||
| `utm_term` AggregateFunction(argMin, String, DateTime), | ||
| `utm_content` AggregateFunction(argMin, String, DateTime) | ||
|
|
||
| ENGINE "AggregatingMergeTree" | ||
| ENGINE_SORTING_KEY "site_uuid, session_id" |
There was a problem hiding this comment.
This file is new, so there is no diff to compare against the previous version of it.
| @@ -0,0 +1,162 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/endpoints/api_kpis.pipe ghost/core/core/server/data/tinybird/endpoints/api_kpis_v2.pipe
| @@ -0,0 +1,28 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/endpoints/api_top_devices.pipe ghost/core/core/server/data/tinybird/endpoints/api_top_devices_v2.pipe
| @@ -0,0 +1,31 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/endpoints/api_top_locations.pipe ghost/core/core/server/data/tinybird/endpoints/api_top_locations_v2.pipe
| @@ -0,0 +1,39 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/endpoints/api_top_pages.pipe ghost/core/core/server/data/tinybird/endpoints/api_top_pages_v2.pipe
| @@ -0,0 +1,28 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/endpoints/api_top_sources.pipe ghost/core/core/server/data/tinybird/endpoints/api_top_sources_v2.pipe
| @@ -0,0 +1,30 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/endpoints/api_top_utm_campaigns.pipe ghost/core/core/server/data/tinybird/endpoints/api_top_utm_campaigns_v2.pipe
| @@ -0,0 +1,30 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/endpoints/api_top_utm_contents.pipe ghost/core/core/server/data/tinybird/endpoints/api_top_utm_contents_v2.pipe
| @@ -0,0 +1,30 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/endpoints/api_top_utm_mediums.pipe ghost/core/core/server/data/tinybird/endpoints/api_top_utm_mediums_v2.pipe
| @@ -0,0 +1,30 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/endpoints/api_top_utm_sources.pipe ghost/core/core/server/data/tinybird/endpoints/api_top_utm_sources_v2.pipe
| @@ -0,0 +1,30 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/endpoints/api_top_utm_terms.pipe ghost/core/core/server/data/tinybird/endpoints/api_top_utm_terms_v2.pipe
| @@ -0,0 +1,85 @@ | |||
| TOKEN "axis" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/pipes/filtered_sessions.pipe ghost/core/core/server/data/tinybird/pipes/filtered_sessions_v2.pipe
| @@ -0,0 +1,22 @@ | |||
| TOKEN "axis" READ | |||
There was a problem hiding this comment.
diff -u ghost/core/core/server/data/tinybird/pipes/mv_session_data.pipe ghost/core/core/server/data/tinybird/pipes/mv_session_data_v2.pipe
|
Working on a clean way to toggle the frontend to use the v2 version of the endpoints so we can test out the new pipeline on the frontend as well. We'll need this to help validate the v2 implementation against the existing live implementation. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
ghost/core/core/server/data/tinybird/endpoints/api_top_utm_campaigns_v2.pipe (1)
4-13: Consider adding date filter to session_data node for query optimization.The
session_datanode filters only bysite_uuid, while date filtering is deferred to the join withfiltered_sessions_v2. For large datasets, adding the date range filter directly to this node could reduce the scan size on_mv_session_data_v2before the join.That said, if Tinybird's query optimizer handles this well via predicate pushdown, the current approach is acceptable.
ghost/core/core/server/data/tinybird/endpoints/api_kpis_v2.pipe (1)
106-106: Redundantdistinctinsideuniq()function.The
uniq()function already counts distinct values, so wrappingdistinctaround the column is redundant. This won't cause incorrect results but adds unnecessary noise.- uniq(distinct s.session_id) as visits, + uniq(s.session_id) as visits,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
ghost/core/core/server/data/tinybird/datasources/_mv_session_data_v2.datasource(1 hunks)ghost/core/core/server/data/tinybird/endpoints/README.md(2 hunks)ghost/core/core/server/data/tinybird/endpoints/api_kpis_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/endpoints/api_top_devices_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/endpoints/api_top_locations_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/endpoints/api_top_pages_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/endpoints/api_top_sources_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/endpoints/api_top_utm_campaigns_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/endpoints/api_top_utm_contents_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/endpoints/api_top_utm_mediums_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/endpoints/api_top_utm_sources_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/endpoints/api_top_utm_terms_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/pipes/filtered_sessions_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/pipes/mv_session_data_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/tests/api_kpis_v2.yaml(1 hunks)ghost/core/core/server/data/tinybird/tests/api_top_devices_v2.yaml(1 hunks)ghost/core/core/server/data/tinybird/tests/api_top_locations_v2.yaml(1 hunks)ghost/core/core/server/data/tinybird/tests/api_top_pages_v2.yaml(1 hunks)ghost/core/core/server/data/tinybird/tests/api_top_sources_v2.yaml(1 hunks)ghost/core/core/server/data/tinybird/tests/api_top_utm_campaigns_v2.yaml(1 hunks)ghost/core/core/server/data/tinybird/tests/api_top_utm_contents_v2.yaml(1 hunks)ghost/core/core/server/data/tinybird/tests/api_top_utm_mediums_v2.yaml(1 hunks)ghost/core/core/server/data/tinybird/tests/api_top_utm_sources_v2.yaml(1 hunks)ghost/core/core/server/data/tinybird/tests/api_top_utm_terms_v2.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: TryGhost/Ghost PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T14:28:50.351Z
Learning: Analytics using Tinybird should reference scripts in `ghost/core/core/server/data/tinybird/scripts/` and datafiles in `ghost/core/core/server/data/tinybird/`
📚 Learning: 2025-11-25T14:28:50.351Z
Learnt from: CR
Repo: TryGhost/Ghost PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T14:28:50.351Z
Learning: Analytics using Tinybird should reference scripts in `ghost/core/core/server/data/tinybird/scripts/` and datafiles in `ghost/core/core/server/data/tinybird/`
Applied to files:
ghost/core/core/server/data/tinybird/endpoints/api_top_utm_campaigns_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_sources_v2.pipeghost/core/core/server/data/tinybird/tests/api_top_pages_v2.yamlghost/core/core/server/data/tinybird/pipes/filtered_sessions_v2.pipeghost/core/core/server/data/tinybird/tests/api_top_locations_v2.yamlghost/core/core/server/data/tinybird/endpoints/api_top_devices_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_utm_contents_v2.pipeghost/core/core/server/data/tinybird/pipes/mv_session_data_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_utm_terms_v2.pipeghost/core/core/server/data/tinybird/datasources/_mv_session_data_v2.datasourceghost/core/core/server/data/tinybird/endpoints/api_top_locations_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_utm_mediums_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_utm_sources_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_kpis_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_pages_v2.pipe
📚 Learning: 2025-12-11T18:43:07.356Z
Learnt from: cmraible
Repo: TryGhost/Ghost PR: 25705
File: ghost/core/core/server/data/tinybird/pipes/mv_session_data.pipe:5-5
Timestamp: 2025-12-11T18:43:07.356Z
Learning: In Tinybird pipe files (.pipe), a standalone '%' character at the start of a SQL block enables Tinybird templating (e.g., {{ String(site_uuid, ...) }}). This is valid syntax and should not be flagged as an error. Apply this guidance to all .pipe files in the repository; reviewers should not treat these templated blocks as syntax errors and should ensure tooling/linting recognizes the templating syntax.
Applied to files:
ghost/core/core/server/data/tinybird/endpoints/api_top_utm_campaigns_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_sources_v2.pipeghost/core/core/server/data/tinybird/pipes/filtered_sessions_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_devices_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_utm_contents_v2.pipeghost/core/core/server/data/tinybird/pipes/mv_session_data_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_utm_terms_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_locations_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_utm_mediums_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_utm_sources_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_kpis_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_top_pages_v2.pipe
📚 Learning: 2025-03-13T09:00:20.205Z
Learnt from: mike182uk
Repo: TryGhost/Ghost PR: 22471
File: apps/admin-x-activitypub/src/utils/pending-activity.ts:13-71
Timestamp: 2025-03-13T09:00:20.205Z
Learning: The pending activity utilities in the Ghost ActivityPub module are covered by tests in the file `apps/admin-x-activitypub/test/unit/utils/pending-activity.ts`.
Applied to files:
ghost/core/core/server/data/tinybird/tests/api_top_pages_v2.yamlghost/core/core/server/data/tinybird/tests/api_top_utm_contents_v2.yamlghost/core/core/server/data/tinybird/tests/api_top_sources_v2.yaml
📚 Learning: 2025-11-24T17:29:43.865Z
Learnt from: CR
Repo: TryGhost/Ghost PR: 0
File: e2e/AGENTS.md:0-0
Timestamp: 2025-11-24T17:29:43.865Z
Learning: Applies to e2e/**/*.test.ts : Test suite names should follow the format 'Ghost Admin - Feature' or 'Ghost Public - Feature'
Applied to files:
ghost/core/core/server/data/tinybird/tests/api_top_utm_contents_v2.yamlghost/core/core/server/data/tinybird/tests/api_top_utm_campaigns_v2.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: [Optional] E2E Tests (React 3/4)
- GitHub Check: [Optional] E2E Tests (React 2/4)
🔇 Additional comments (37)
ghost/core/core/server/data/tinybird/tests/api_top_utm_sources_v2.yaml (1)
1-88: Test coverage looks comprehensive.The test suite covers a good range of filter scenarios including individual filters (location, pathname, post_uuid, source, member_status, timezone) and combined filters. This provides good confidence that the v2 endpoint maintains behavioral parity with v1.
ghost/core/core/server/data/tinybird/tests/api_top_locations_v2.yaml (1)
1-86: Good test coverage with device filter inclusion.The test suite comprehensively covers filter scenarios. The device filter test case (lines 78-86) is a valuable addition that verifies bot session exclusion behavior.
ghost/core/core/server/data/tinybird/endpoints/api_top_utm_campaigns_v2.pipe (1)
15-28: Join and aggregation logic looks correct.The inner join on
session_idwithfiltered_sessions_v2correctly restricts results to matching sessions. The empty string filter, grouping, ordering, and pagination are all properly implemented.ghost/core/core/server/data/tinybird/tests/api_top_utm_campaigns_v2.yaml (1)
1-79: Test coverage is consistent with other UTM endpoint tests.The test suite covers the expected filter scenarios and maintains consistency with the v1 test patterns. Campaign data provides realistic test values for validation.
ghost/core/core/server/data/tinybird/endpoints/api_top_utm_contents_v2.pipe (1)
1-30: Implementation follows the consistent v2 endpoint pattern.The structure mirrors the other v2 UTM endpoints correctly, with appropriate column substitution (
utm_content). The same optional optimization regarding date filtering in thesession_datanode applies here as noted forapi_top_utm_campaigns_v2.pipe.ghost/core/core/server/data/tinybird/endpoints/api_top_devices_v2.pipe (1)
1-28: LGTM! Clean implementation of the v2 endpoint using the materialized view.The endpoint correctly uses
argMinMerge(device)to read from the_mv_session_data_v2AggregatingMergeTree data source, which will compute aggregations at ingest time rather than query time. The delegation of filtering tofiltered_sessions_v2follows the architectural pattern described in the README.ghost/core/core/server/data/tinybird/tests/api_top_devices_v2.yaml (1)
1-62: Comprehensive test coverage for the top devices endpoint.The test suite covers all major filtering scenarios including device types, location, member status, pathname, source, and combinations thereof. The expected results are clear and validate the endpoint's filtering behavior.
ghost/core/core/server/data/tinybird/tests/api_top_utm_mediums_v2.yaml (1)
1-79: Well-structured test fixture for UTM medium analytics.The test suite provides comprehensive coverage of filtering scenarios and validates UTM medium aggregation across different dimensions (location, pathname, source, member status, timezone). The expected results align with the endpoint's aggregation logic.
ghost/core/core/server/data/tinybird/endpoints/README.md (2)
10-10: Clear documentation of the materialized view architecture.The update accurately describes the use of
argMinStatefor writing session-level attributes to the_mv_session_dataAggregatingMergeTree table, which is the correct pattern for ClickHouse aggregating tables.
42-42: Correct documentation of the merge combinator pattern.The explanation of using
argMinMergeto read aggregated attributes from_mv_session_datais accurate and aligns with ClickHouse's AggregatingMergeTree semantics. This clarifies how the two-stage filtering retrieves first-hit attributes.ghost/core/core/server/data/tinybird/tests/api_top_utm_contents_v2.yaml (1)
1-81: Comprehensive test coverage for UTM content analytics.The test suite validates UTM content aggregation across multiple filtering dimensions with clear expected results. The fixture follows the same comprehensive pattern as other v2 endpoint tests.
ghost/core/core/server/data/tinybird/tests/api_top_utm_terms_v2.yaml (1)
1-82: Well-structured test fixture for UTM term analytics.The test suite provides comprehensive coverage of filtering scenarios and validates UTM term aggregation. The fixture maintains consistency with other v2 endpoint test patterns.
ghost/core/core/server/data/tinybird/endpoints/api_top_locations_v2.pipe (1)
4-29: LGTM! Inline filtering is appropriate for this hit-level query.This endpoint correctly queries
_mv_hits(hit-level data) and applies inline filtering for hit-level attributes (member_status, location, pathname, post_uuid). This pattern differs from session-level aggregates likeapi_top_devices_v2and aligns with the filtering architecture documented in the README—hit-level filters can appear in bothfiltered_sessions.pipeand endpoint queries.The member_status logic that includes 'comped' when 'paid' is selected (lines 17-22) ensures proper member status filtering.
ghost/core/core/server/data/tinybird/endpoints/api_top_utm_terms_v2.pipe (1)
1-30: LGTM! Clean implementation with proper empty value handling.The endpoint correctly uses
argMinMerge(utm_term)to read from the_mv_session_data_v2materialized view. The empty string filter on line 25 (utm_term != '') is a good practice that prevents cluttering results with sessions that don't have UTM term values.ghost/core/core/server/data/tinybird/endpoints/api_top_sources_v2.pipe (2)
4-13: LGTM!The session_data node correctly retrieves session-level source data from the materialized view using
argMinMergeto extract the earliest source per session.
15-26: LGTM!The aggregation logic correctly counts visits per source and includes empty sources (representing direct traffic), which aligns with the test expectations.
ghost/core/core/server/data/tinybird/tests/api_kpis_v2.yaml (1)
1-267: LGTM!The test fixtures provide comprehensive coverage of the KPI endpoint with various filter combinations and expected results. The single-day test cases correctly expand to hourly granularity.
ghost/core/core/server/data/tinybird/pipes/mv_session_data_v2.pipe (2)
3-19: LGTM!The materialized view correctly uses ClickHouse state functions (
countState,minState,maxState,argMinState) to pre-aggregate session-level metrics at ingest time. The state functions match the corresponding merge functions used in the endpoint queries.
21-22: LGTM!Correctly declares the pipe as TYPE materialized and binds it to the _mv_session_data_v2 datasource.
ghost/core/core/server/data/tinybird/tests/api_top_pages_v2.yaml (1)
1-135: LGTM!The test fixtures provide comprehensive coverage of the top pages endpoint with various filter combinations and pagination scenarios.
ghost/core/core/server/data/tinybird/tests/api_top_sources_v2.yaml (1)
1-131: LGTM!The test fixtures provide comprehensive coverage of the top sources endpoint, including validation that empty sources (direct traffic) are correctly included in results.
ghost/core/core/server/data/tinybird/endpoints/api_top_utm_sources_v2.pipe (2)
4-13: LGTM!The session_data node correctly retrieves session-level utm_source data using
argMinMerge.
15-28: LGTM!The aggregation correctly filters out empty utm_source values (line 25), as empty UTM parameters indicate no UTM tracking was used, which is the appropriate behavior for UTM-specific endpoints.
ghost/core/core/server/data/tinybird/endpoints/api_top_pages_v2.pipe (4)
4-11: LGTM!The query correctly normalizes 'undefined' post_uuid values to empty strings (line 9) and uses
uniqExact(session_id)to accurately count unique visits per page.
17-24: LGTM!The member_status filter correctly includes 'comped' members when filtering for 'paid' members (line 21), which aligns with business logic treating comped subscriptions similarly to paid subscriptions for analytics.
28-34: LGTM!The post_type filter correctly handles two categories: when filtering for 'post', it returns only posts (line 30); otherwise, it returns pages including null values (lines 31-32), which appropriately captures homepage and other non-post content.
25-37: LGTM!The optional filters and aggregation logic are correctly implemented with proper grouping, ordering, and pagination.
ghost/core/core/server/data/tinybird/endpoints/api_top_utm_mediums_v2.pipe (2)
4-13: LGTM!The session_data node correctly retrieves session-level utm_medium data using
argMinMerge.
15-28: LGTM!The aggregation correctly filters out empty utm_medium values (line 25), maintaining consistency with other UTM parameter endpoints.
ghost/core/core/server/data/tinybird/endpoints/api_kpis_v2.pipe (3)
93-95: Join logic between session_data and filtered_sessions_v2 is correct.Both
session_dataandfiltered_sessions_v2independently filter bysite_uuid, so the inner join onsession_idalone is safe and won't cause cross-tenant data leakage.
116-146: Pathname pageviews logic correctly combines session and hit-level filtering.The approach of joining
filtered_sessions_v2(for session-level qualification) with direct filters on_mv_hits(for hit-level counting) is appropriate. This ensures pageviews are counted only for hits that match the specific criteria within qualifying sessions.
149-162: LGTM!The
finished_datanode correctly uses left joins to preserve all timeseries dates and conditionally includes pathname-specific pageviews. Thecoalescefunctions properly handle null values.ghost/core/core/server/data/tinybird/pipes/filtered_sessions_v2.pipe (3)
25-31: Member status filter correctly handles 'comped' as a variant of 'paid'.The logic to automatically include 'comped' when 'paid' is selected is a useful business rule that ensures paid members include complementary subscriptions.
37-85: Session-level filtering with AggregatingMergeTree combinators is well-implemented.The use of
minMerge,argMinMergecombinators correctly extracts session attributes from the materialized view. The inner join withsessions_filtered_by_hit_attributesensures proper two-stage filtering.
1-85: LGTM! Well-structured reusable filtering pipe.The two-stage filtering design (hit-level → session-level) is clean and the pipe correctly exposes
sessions_filtered_by_session_attributesas the final output for downstream endpoints.ghost/core/core/server/data/tinybird/datasources/_mv_session_data_v2.datasource (2)
1-16: Well-designed AggregatingMergeTree schema for session aggregation.The schema correctly uses:
countfor pageviews accumulationmin/maxfor first/last pageview timestampsargMinfor session attributes (captures value at earliest timestamp)The sorting key
(site_uuid, session_id)optimizes for tenant-scoped queries and session grouping.
1-16: Materialized view pipe already uses correct -State combinators.The
mv_session_data_v2.pipecorrectly usescountState(),minState(),maxState(), andargMinState()combinators that match the datasource schema. No changes needed.
| {% end %} | ||
| sd.session_id, | ||
| pageviews, | ||
| pageviews = 1 as is_bounce, |
There was a problem hiding this comment.
this caught my eye at first and i was about to ask about it - but i see it comes from what was originally in mv_session_data.pipe
There was a problem hiding this comment.
Yeah we could probably remove this because we don't show bounce rate anywhere in the UI anymore, but we can handle that separately 🙂
There was a problem hiding this comment.
I believe John is actually using bounce rate in axis. We need to revisit Tinybird use outside of Ghost at some point soon.
| @@ -0,0 +1,17 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
No changes in this pipe compared to the original, since this only uses _mv_hits. Created v2 for consistency so we can configure Ghost to use v2.
| @@ -0,0 +1,15 @@ | |||
| TOKEN "stats_page" READ | |||
There was a problem hiding this comment.
No changes to this pipe compared to the original. Created duplicate v2 for consistency and to allow pointing Ghost to v2 of all endpoints.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
ghost/core/core/server/services/stats/utils/tinybird.js (1)
67-67: Consider handling legacy tbVersion parameter.By removing
tbVersionfrom the exclusion list, if any caller still passes atbVersionoption, it will now be converted totb_versionand included in query parameters. This could cause unexpected behavior or Tinybird errors.Consider explicitly filtering it out with a warning:
- if (!['dateFrom', 'dateTo', 'timezone', 'memberStatus', 'postType'].includes(key) && value !== undefined && value !== null) { + if (!['dateFrom', 'dateTo', 'timezone', 'memberStatus', 'postType', 'tbVersion'].includes(key) && value !== undefined && value !== null) { + if (key === 'tbVersion') { + logging.warn('tbVersion option is deprecated, use config.version instead'); + return; + } // Convert camelCase to snake_case for Tinybird APIghost/core/test/unit/server/services/stats/utils/tinybird.test.js (1)
120-137: Local mode intentionally ignoresversionin configThis test formalizes that when
local.enabledis true,buildRequestalways uses the base pipe name (test_pipe.json) even if aversionis present in config. That keeps local setup simple, but it also means you can’t exercise version‑specific pipes in local mode; if you later need that, consider allowing an opt‑in to applyversionfor local as well.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
apps/admin-x-framework/src/providers/framework-provider.tsx(1 hunks)apps/admin-x-framework/src/utils/stats-config.ts(1 hunks)ghost/core/core/server/data/tinybird/README.md(1 hunks)ghost/core/core/server/data/tinybird/endpoints/api_active_visitors_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/endpoints/api_post_visitor_counts_v2.pipe(1 hunks)ghost/core/core/server/data/tinybird/tests/api_active_visitors_v2.yaml(1 hunks)ghost/core/core/server/data/tinybird/tests/api_post_visitor_counts_v2.yaml(1 hunks)ghost/core/core/server/services/stats/ContentStatsService.js(1 hunks)ghost/core/core/server/services/stats/utils/tinybird.js(2 hunks)ghost/core/core/server/services/tinybird/TinybirdService.js(1 hunks)ghost/core/test/unit/server/services/stats/utils/tinybird.test.js(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- ghost/core/core/server/data/tinybird/README.md
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: TryGhost/Ghost PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T14:28:50.351Z
Learning: Analytics using Tinybird should reference scripts in `ghost/core/core/server/data/tinybird/scripts/` and datafiles in `ghost/core/core/server/data/tinybird/`
📚 Learning: 2025-11-25T14:28:50.351Z
Learnt from: CR
Repo: TryGhost/Ghost PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T14:28:50.351Z
Learning: Analytics using Tinybird should reference scripts in `ghost/core/core/server/data/tinybird/scripts/` and datafiles in `ghost/core/core/server/data/tinybird/`
Applied to files:
ghost/core/core/server/services/tinybird/TinybirdService.jsghost/core/core/server/services/stats/utils/tinybird.jsghost/core/test/unit/server/services/stats/utils/tinybird.test.js
📚 Learning: 2025-07-09T04:45:46.968Z
Learnt from: cmraible
Repo: TryGhost/Ghost PR: 24295
File: ghost/core/core/server/data/tinybird/scripts/configure-ghost.sh:12-15
Timestamp: 2025-07-09T04:45:46.968Z
Learning: The Tinybird CLI does not support a --json option for output formatting in the latest version, so text parsing with grep and awk is necessary when extracting tokens and workspace information from tb commands.
Applied to files:
ghost/core/core/server/services/stats/utils/tinybird.js
📚 Learning: 2025-10-30T17:13:26.190Z
Learnt from: sam-lord
Repo: TryGhost/Ghost PR: 25303
File: ghost/core/core/server/services/email-service/BatchSendingService.js:19-19
Timestamp: 2025-10-30T17:13:26.190Z
Learning: In ghost/core/core/server/services/email-service/BatchSendingService.js and similar files in the Ghost codebase, prefer using `{...options}` spread syntax without explicit guards like `...(options || {})` when spreading potentially undefined objects, as the maintainer prefers cleaner syntax over defensive patterns when the behavior is safe.
Applied to files:
ghost/core/core/server/services/stats/utils/tinybird.js
📚 Learning: 2025-12-11T18:43:07.356Z
Learnt from: cmraible
Repo: TryGhost/Ghost PR: 25705
File: ghost/core/core/server/data/tinybird/pipes/mv_session_data.pipe:5-5
Timestamp: 2025-12-11T18:43:07.356Z
Learning: In Tinybird pipe files (.pipe), a standalone '%' character at the start of a SQL block enables Tinybird templating (e.g., {{ String(site_uuid, ...) }}). This is valid syntax and should not be flagged as an error. Apply this guidance to all .pipe files in the repository; reviewers should not treat these templated blocks as syntax errors and should ensure tooling/linting recognizes the templating syntax.
Applied to files:
ghost/core/core/server/data/tinybird/endpoints/api_post_visitor_counts_v2.pipeghost/core/core/server/data/tinybird/endpoints/api_active_visitors_v2.pipe
📚 Learning: 2025-03-13T09:00:20.205Z
Learnt from: mike182uk
Repo: TryGhost/Ghost PR: 22471
File: apps/admin-x-activitypub/src/utils/pending-activity.ts:13-71
Timestamp: 2025-03-13T09:00:20.205Z
Learning: The pending activity utilities in the Ghost ActivityPub module are covered by tests in the file `apps/admin-x-activitypub/test/unit/utils/pending-activity.ts`.
Applied to files:
ghost/core/core/server/data/tinybird/tests/api_active_visitors_v2.yaml
📚 Learning: 2025-03-13T09:00:20.205Z
Learnt from: mike182uk
Repo: TryGhost/Ghost PR: 22471
File: apps/admin-x-activitypub/src/utils/pending-activity.ts:13-71
Timestamp: 2025-03-13T09:00:20.205Z
Learning: The pending activity utilities in Ghost's ActivityPub module are thoroughly tested in `apps/admin-x-activitypub/test/unit/utils/pending-activity.ts`, which covers `generatePendingActivityId`, `isPendingActivity`, and `generatePendingActivity` functions.
Applied to files:
ghost/core/core/server/data/tinybird/tests/api_active_visitors_v2.yaml
📚 Learning: 2025-12-01T08:42:35.320Z
Learnt from: jonatansberg
Repo: TryGhost/Ghost PR: 25552
File: e2e/helpers/environment/service-managers/dev-ghost-manager.ts:210-247
Timestamp: 2025-12-01T08:42:35.320Z
Learning: In e2e/helpers/environment/service-managers/dev-ghost-manager.ts, the hardcoded volume name 'ghost-dev_shared-config' at line 231 is intentional. E2E tests run under the 'ghost-dev-e2e' project namespace but deliberately mount the shared-config volume from the main 'ghost-dev' project to access Tinybird credentials created by yarn dev:forward. This is cross-project volume sharing by design.
Applied to files:
ghost/core/test/unit/server/services/stats/utils/tinybird.test.js
🧬 Code graph analysis (1)
ghost/core/core/server/services/stats/utils/tinybird.js (2)
ghost/core/core/server/services/public-config/config.js (1)
statsConfig(33-33)ghost/core/core/frontend/helpers/ghost_head.js (1)
localEnabled(165-165)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Legacy tests (Node 22.18.0, mysql8)
- GitHub Check: Acceptance tests (Node 22.18.0, mysql8)
- GitHub Check: Acceptance tests (Node 22.18.0, sqlite3)
- GitHub Check: Admin-X Settings tests
- GitHub Check: Legacy tests (Node 22.18.0, sqlite3)
- GitHub Check: ActivityPub tests
- GitHub Check: Ghost-CLI tests
- GitHub Check: Tinybird Tests
- GitHub Check: Lint
- GitHub Check: Unit tests (Node 22.18.0)
- GitHub Check: Build & Push Docker Image
🔇 Additional comments (9)
apps/admin-x-framework/src/providers/framework-provider.tsx (1)
13-13: LGTM!The addition of the optional
versionfield cleanly extends the StatsConfig interface to support versioned endpoints without breaking existing code.ghost/core/core/server/data/tinybird/endpoints/api_post_visitor_counts_v2.pipe (1)
1-17: LGTM!The endpoint structure follows Tinybird conventions correctly, with proper token declarations, required parameters, and SQL logic for counting unique visitors per post. The use of
_mv_hitsaligns with the PR's materialized view optimization strategy.ghost/core/core/server/data/tinybird/tests/api_post_visitor_counts_v2.yaml (1)
1-19: LGTM!Test coverage is comprehensive, covering single post, multiple posts, and edge cases with no data. The expected results align with the endpoint's logic for counting unique visitors.
ghost/core/core/server/services/tinybird/TinybirdService.js (1)
59-73: LGTM!The whitelist expansion properly includes all v2 endpoints for JWT scope authorization. The comment clearly indicates the purpose of these additions (materialized view optimization).
ghost/core/core/server/services/stats/utils/tinybird.js (1)
35-40: LGTM!The config-driven versioning approach is cleaner than per-call tbVersion. The conditional logic correctly skips versioning in local mode, which prevents URL mismatches during local development.
ghost/core/core/server/data/tinybird/endpoints/api_active_visitors_v2.pipe (1)
1-15: LGTM!The endpoint correctly implements active visitor counting with proper time-based filtering and optional post filtering. The Tinybird templating syntax is used appropriately for conditional logic.
ghost/core/core/server/data/tinybird/tests/api_active_visitors_v2.yaml (1)
14-24: LGTM!The post-filtered and custom site test cases provide good coverage for edge cases and filtering scenarios.
ghost/core/core/server/services/stats/ContentStatsService.js (1)
75-83: Tinybird option mapping now solely handles post type (version is config-driven)The
tinybirdOptionsmapping looks good:post_typecontinues to flow through aspostType, and versioning is now handled centrally by the Tinybird client/config instead of via per‑call options. No issues from this change as long as callers aren’t relying on a tb_version override here.ghost/core/test/unit/server/services/stats/utils/tinybird.test.js (1)
73-87: Config‑driven version suffix behavior is well coveredThis test cleanly captures the new contract that
buildRequest('test_pipe', …)appends_v2whentinybird:stats.versionis set, and leaves the querystring behavior unchanged. Solid coverage for the new versioning model.
|
|
||
| // Append version suffix if provided (e.g., "v2" -> "api_kpis_v2") | ||
| const finalEndpointName = config.version ? `${endpointName}_${config.version}` : endpointName; | ||
|
|
||
| return `${baseUrl}/v0/pipes/${finalEndpointName}.json?${params}`; |
There was a problem hiding this comment.
Validate version and endpointName inputs before string interpolation.
The code doesn't guard against edge cases where config.version is an empty string or endpointName is undefined/null. An empty version would produce malformed URLs like api_kpis_.json.
Consider adding validation:
- // Append version suffix if provided (e.g., "v2" -> "api_kpis_v2")
- const finalEndpointName = config.version ? `${endpointName}_${config.version}` : endpointName;
+ // Append version suffix if provided (e.g., "v2" -> "api_kpis_v2")
+ const finalEndpointName = (config.version && endpointName) ? `${endpointName}_${config.version}` : endpointName;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Append version suffix if provided (e.g., "v2" -> "api_kpis_v2") | |
| const finalEndpointName = config.version ? `${endpointName}_${config.version}` : endpointName; | |
| return `${baseUrl}/v0/pipes/${finalEndpointName}.json?${params}`; | |
| // Append version suffix if provided (e.g., "v2" -> "api_kpis_v2") | |
| const finalEndpointName = (config.version && endpointName) ? `${endpointName}_${config.version}` : endpointName; | |
| return `${baseUrl}/v0/pipes/${finalEndpointName}.json?${params}`; |
🤖 Prompt for AI Agents
In apps/admin-x-framework/src/utils/stats-config.ts around lines 16 to 20,
validate inputs before interpolating: ensure endpointName is defined/non-empty
(throw or return early with a clear error or fallback default) and normalize
config.version by trimming and checking length before appending; only append
`_${version}` when version.trim() is non-empty to avoid producing
`api_kpis_.json`, and optionally sanitize characters (e.g., remove spaces) from
version/endpointName to keep the URL well-formed.
| timezone: options.timezone, | ||
| memberStatus: options.member_status, | ||
| postType: options.post_type, | ||
| tbVersion: options.tb_version |
There was a problem hiding this comment.
The "ContentStatsService" originally had built in support for the tbVersion parameter. This has been moved into the tinybird.js file so that it applies equally to all endpoints.
9larsons
left a comment
There was a problem hiding this comment.
Question about local use.
There was a problem hiding this comment.
I don't understand why we're blocking the version when running locally. (Also asserted by the test below)
There was a problem hiding this comment.
Good catch - updated to remove the exception for localEnabled and deleted that test
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ghost/core/core/server/services/stats/utils/tinybird.js(2 hunks)ghost/core/test/unit/server/services/stats/utils/tinybird.test.js(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- ghost/core/test/unit/server/services/stats/utils/tinybird.test.js
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-25T14:28:50.351Z
Learnt from: CR
Repo: TryGhost/Ghost PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T14:28:50.351Z
Learning: Analytics using Tinybird should reference scripts in `ghost/core/core/server/data/tinybird/scripts/` and datafiles in `ghost/core/core/server/data/tinybird/`
Applied to files:
ghost/core/core/server/services/stats/utils/tinybird.js
📚 Learning: 2025-10-30T17:13:26.190Z
Learnt from: sam-lord
Repo: TryGhost/Ghost PR: 25303
File: ghost/core/core/server/services/email-service/BatchSendingService.js:19-19
Timestamp: 2025-10-30T17:13:26.190Z
Learning: In ghost/core/core/server/services/email-service/BatchSendingService.js and similar files in the Ghost codebase, prefer using `{...options}` spread syntax without explicit guards like `...(options || {})` when spreading potentially undefined objects, as the maintainer prefers cleaner syntax over defensive patterns when the behavior is safe.
Applied to files:
ghost/core/core/server/services/stats/utils/tinybird.js
🔇 Additional comments (1)
ghost/core/core/server/services/stats/utils/tinybird.js (1)
35-40: LGTM: Config-driven versioning implemented cleanly.The version suffix logic correctly reads from config and constructs versioned pipe URLs when configured. The absence of local-mode gating addresses the previous review concern about blocking versions locally.
9larsons
left a comment
There was a problem hiding this comment.
Looks good, looking forward to playing around and seeing the diff with real data!
|
Deployment check output to confirm this won't change any existing endpoints: |
ref https://linear.app/ghost/issue/NY-865/analytics-sources-not-populating-for-tangle-due-to-408-timeouts
Problem
Tinybird endpoints are timing out in production for the sites with the most data. the
mv_session_datapipe, which almost all our endpoints depend on, calculates complex aggregations at query time, which we believe to be the biggest contributor to poor performance when querying against large data sets.Fix
The solution to fix this is to convert
mv_session_datafrom a pipe to a materialized view, such that Tinybird will calculate these aggregations as ingest time instead of at query time. As this is a rather large change to the implementation, we've opted to create a duplicate v2 pipeline rather than updating the existing pipeline in place. This way we can validate the v2 pipeline in production against real production data, without actually changing any of the user-facing behavior, before cutting over to using the v2 pipeline.Changes made
mv_session_dataCreating a duplicate pipeline makes it easier to validate this in production, but it does make reviewing these changes more difficult, since the git diff doesn't clearly show what has changed in each endpoint. I've added a comment to each file that shows the diff of the v2 endpoint against the original unversioned endpoint to make reviewing this PR easier, but you'll have to run those commands locally with this branch checked out to see the changes.
Testing Ghost against v2 endpoints
This commit adds the ability to point Ghost to the v2 version of the endpoints. To use this, set
tinybird:stats:versiontov2in yourconfig.local.json, then re-runyarn dev:analytics.