[CLEAN] Synthetic Benchmark PR #25719 - Added materialized view and duplicated v2 Tinybird endpoints #231

Open
tomerqodo wants to merge 12 commits into base_pr_25719_20260121_9919 from clean_pr_25719_20260121_9919

@tomerqodo

Benchmark PR TryGhost#25719

Type: Clean (correct implementation)

Original PR Title: Added materialized view and duplicated v2 Tinybird endpoints
Original PR Description: ref https://linear.app/ghost/issue/NY-865/analytics-sources-not-populating-for-tangle-due-to-408-timeouts

Problem

Tinybird endpoints are timing out in production for the sites with the most data. The mv_session_data pipe, which almost all of our endpoints depend on, calculates complex aggregations at query time, which we believe is the biggest contributor to poor performance when querying large data sets.

Fix

The fix is to convert mv_session_data from a pipe to a materialized view, so that Tinybird calculates these aggregations at ingest time instead of at query time. As this is a rather large change to the implementation, we've opted to create a duplicate v2 pipeline rather than updating the existing pipeline in place. This way we can validate the v2 pipeline against real production data, without changing any user-facing behavior, before cutting over to the v2 pipeline.
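
To make the query-time versus ingest-time distinction concrete, here is a minimal conceptual sketch in TypeScript. The `Hit` and `SessionSummary` shapes, the `MaterializedSessions` class, and the aggregates chosen are all hypothetical and are not the real mv_session_data schema. The sketch also does not model how Tinybird actually stores or merges aggregates; it only shows where the work happens.

```typescript
// Hypothetical shapes for illustration only; not the real Tinybird schema.
interface Hit {
  sessionId: string;
  timestamp: number; // epoch milliseconds
}

interface SessionSummary {
  firstSeen: number;
  lastSeen: number;
  pageviews: number;
}

// Shared aggregation step: fold one hit into the per-session summary.
function applyHit(sessions: Map<string, SessionSummary>, hit: Hit): void {
  const current = sessions.get(hit.sessionId);
  if (current === undefined) {
    sessions.set(hit.sessionId, {
      firstSeen: hit.timestamp,
      lastSeen: hit.timestamp,
      pageviews: 1,
    });
    return;
  }
  current.firstSeen = Math.min(current.firstSeen, hit.timestamp);
  current.lastSeen = Math.max(current.lastSeen, hit.timestamp);
  current.pageviews += 1;
}

// Query-time approach: every read rescans all raw hits,
// so the cost of a query grows with the size of the data set.
function summarizeAtQueryTime(hits: Hit[]): Map<string, SessionSummary> {
  const sessions = new Map<string, SessionSummary>();
  for (const hit of hits) {
    applyHit(sessions, hit);
  }
  return sessions;
}

// Ingest-time approach: the summary is updated once per incoming hit,
// so a read is a lookup against already-aggregated state.
class MaterializedSessions {
  private readonly sessions = new Map<string, SessionSummary>();

  ingest(hit: Hit): void {
    applyHit(this.sessions, hit);
  }

  read(sessionId: string): SessionSummary | undefined {
    return this.sessions.get(sessionId);
  }
}
```

Both paths run the same aggregation logic and should produce the same summaries; the difference is that the second pays the cost as data arrives rather than on every request.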


Changes made

  • Creates a new datasource for the materialized view for mv_session_data
  • Creates mv_session_data_v2 pipe to feed the materialized view
  • Creates filtered_sessions_v2, which is backed by the new materialized view
  • Creates v2 of all endpoints, which are backed by filtered_sessions_v2 and the new materialized view
  • Duplicates all test files for endpoints, without making any changes to the test cases or expected results, to ensure that v1 and v2 of our endpoints maintain the same behavior
  • Adds the ability to switch Ghost to use v2 endpoints for the sake of validating against live production data
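
The parity goal behind the duplicated tests (v1 and v2 return the same results for the same inputs) can be sketched as a row-by-row comparison. This is only an illustration: the `Row` type and the `diffEndpointRows` helper are hypothetical and are not part of this PR or of the Tinybird test tooling.

```typescript
// Hypothetical row shape: a flat record of column name to value.
type Row = Record<string, string | number>;

// Serialize a row with its keys sorted so that column order does not matter.
function canonical(row: Row): string {
  const keys = Object.keys(row).sort();
  return JSON.stringify(keys.map((key) => [key, row[key]]));
}

// Return a list of human-readable mismatches; an empty list means parity.
function diffEndpointRows(v1: Row[], v2: Row[]): string[] {
  const mismatches: string[] = [];
  if (v1.length !== v2.length) {
    mismatches.push(`row count differs: v1=${v1.length}, v2=${v2.length}`);
  }
  const shared = Math.min(v1.length, v2.length);
  for (let i = 0; i < shared; i++) {
    const a = v1[i];
    const b = v2[i];
    if (a === undefined || b === undefined) {
      continue;
    }
    if (canonical(a) !== canonical(b)) {
      mismatches.push(`row ${i} differs`);
    }
  }
  return mismatches;
}
```

The comparison is order-sensitive across rows, which assumes both endpoint versions apply the same sort order to their output.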

Creating a duplicate pipeline makes it easier to validate this in production, but it does make reviewing these changes more difficult, since the git diff doesn't clearly show what has changed in each endpoint. I've added a comment to each file that shows the diff of the v2 endpoint against the original unversioned endpoint to make reviewing this PR easier, but you'll have to run those commands locally with this branch checked out to see the changes.

Testing Ghost against v2 endpoints

This commit adds the ability to point Ghost to the v2 version of the endpoints. To use this, set tinybird:stats:version to v2 in your config.local.json, then re-run yarn dev:analytics.
Original PR URL: TryGhost#25719
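
As a rough illustration of the tinybird:stats:version switch described above, the sketch below assumes the colon-separated key maps to nested JSON objects in config.local.json, and that v2 endpoints carry a `_v2` suffix like filtered_sessions_v2 does. The helpers `getConfigValue` and `endpointName` and the name `example_endpoint` are hypothetical and are not Ghost's actual implementation.

```typescript
// Assumed contents of config.local.json for this setting, written as an
// object literal. The nesting is an assumption based on the colon-separated key.
const localConfig = {
  tinybird: {
    stats: {
      version: "v2",
    },
  },
};

// Hypothetical helper: walk a colon-separated key through nested objects.
function getConfigValue(config: unknown, key: string): unknown {
  let node: unknown = config;
  for (const part of key.split(":")) {
    if (typeof node !== "object" || node === null) {
      return undefined;
    }
    node = (node as Record<string, unknown>)[part];
  }
  return node;
}

// Hypothetical helper: choose the versioned endpoint name.
// The `_v2` suffix for endpoints is an assumption, not confirmed by the PR text.
function endpointName(base: string, version: unknown): string {
  return version === "v2" ? `${base}_v2` : base;
}

const version = getConfigValue(localConfig, "tinybird:stats:version");
console.log(JSON.stringify(localConfig, null, 2));
console.log(endpointName("example_endpoint", version));
```

Leaving the setting unset falls back to the unversioned endpoint name in this sketch, which mirrors the intent that v1 stays the default while v2 is being validated.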
