-
Notifications
You must be signed in to change notification settings - Fork 6
Description
HogQL Insights
Current state of the HogQL conversion for insights and moving from filters
-based insights to query
-based insights.
What are we doing and why?
We are rewriting all our insights in HogQL, instead of raw ClickHouse SQL, which allows us to implement performance improvements and feature toggles (e.g. PoE modes) on this intermediate layer. This also allows us to expose the query to the end user, so they debug issues themselves or adapt queries to less frequent use cases.
In addition to the changes on the SQL layer we also change the way we store the insight configuration. Currently we have a mixin-based filters
format (flat key-value structure) that became hard to maintain and doesn't allow reusage of sub-parts. The new query
format (nested json) should allow copy-pasting parts and nesting "sources" in other queries to allow re-using the results throughout PostHog.
High-level plan of remaining steps
- Remove all
filters
from the frontend and use the frontend-sidefilterToQueryNode
function to convert all api responses to the new query format (when fetching) and thequeryNodeToFilter
function to convert them back to filters for saving/duplicating/etc. - Toggle the feature flag
query-based-insights-saving
, that then sends insights back withquery
, instead offilters
for saving/duplicating/etc. - Optional: Use the backend side
filter_to_query
function in the insights serializer to return aquery
from the backend (just this endpoint, as this is only for testing thefilter_to_query
function works as expected). - Optional: Any changes we want to make to the query schema e.g. camel casing key that are still snake cased.
- If everything works, migrate the insights backend side (iterate over them and use the backend side
filter_to_query
function to replacefilters
withquery
). - Migrate other entities that have insights e.g. activity log or notebooks.
- Convert experiments to HogQL.
- Convert cohorts to HogQL.
- Cleanup.
Remove frontend side dependency on filters
We can get rid of filters frontend side first by using the backend side filter_to_query
function to return only queries from insights (and any other places that might return filters) and adapting the frontend so that it only handles queries. For saving insights we can use the frontend side queryNodeToFilters
function to send finally send filters to the backend.
- Use the backend side
filter_to_query
function to return only queries, not filters feat(hogql): replace filters with query backend side posthog#21945 - Fix any places that might still rely on filters:
- Make
ActionFilter
andentityFilterLogic
based on series, not actions and events - Remove filter based helpers from
frontend/src/scenes/insights/sharedUtils.ts
e.g.isTrendsFilter
orisFilterWithDisplay
- Remove usage of
filtersToQueryNode
,queryNodeToFilters
, etc. in as many places as possible - Remove all usages of the
cleanFilter
function - Make experiments use HogQL queries
- Migrate: activity log, insights, notebook nodes
- Cleanup: Remove all
getQueryBasedInsightModel
- Make
After migrating to backend side filters
For insights
- replace
InsightModel
in subscriptionsLogic.test.ts - replace
InsightModel
in funnelDataLogic.test.ts - replace
InsightModel
in insightVizDataLogic.test.ts - replace
InsightModel
in trendsDataLogic.test.ts - convert all mocked insights in tests, stories etc. to be query based
For the activity log
For notebooks
Experiments backend
Experiments use the PA code backend side to generate trends/funnel results. We should swap out the legacy implementation for the HogQL one there as well.
- Replace funnels in experiments backend side
Finalize query schema
At some point we want to run a migration to replace filters with queries. After that migration it will be harder to make changes to the query schema, meaning we should clean up the schema as good as we can now.
Unfortunately this got complicated by the fact that notebooks already save insights as queries and not filters. Thus they need additional handling in https://github.com/PostHog/posthog/blob/master/frontend/src/scenes/notebooks/Notebook/migrations/migrate.ts and we need to come up with a way to clean up tech debt there. The queries are stored both in the notebook nodes and in the activity log from which the user can go back in time.
Some proposed changes to the current query schema:
- Camel case all properties
- Camel case
dateRange
propertiesdate_from
anddate_to
- Camel case
breakdownFilter
properties - Camel case math properties in entities
- Camel case and fix
hidden_legend_indexes
/hidden_legend_keys
Currently the legend items can be hidden by index or by key depending on the insight and a bug prevents the hidden entries from being saved. We should agree on a single way to hide the entries and fix saving them with the query. - Camel case
aggregation_group_type_index
- Remove/fix
breakdown_histogram_bin_count
in trends filter properties - Replace legacy entities in retention insights with new Event/ActionNodes
- Camel case
Related bugs
Trends
- Error when searching by multiple CSS selectors in Product Analytics posthog#21683 multiple selectors (frontend side issue, not HogQL)
- Persons modal doesn't open for null breakdowns (also never did in the old insights).
- Null in list if breaking down by group property
- Problem with null/other breakdowns and dashboards (1)
Funnels
- https://posthoghelp.zendesk.com/agent/tickets/9814 funnel - performance issue
- https://posthoghelp.zendesk.com/agent/tickets/13065
- persons modal does not open
Retention
- Overall retention percentages went down retroactively posthog#21700 retention data error
Cleanup
- Update mocks to query based mocks
Make it flippin' amazing
- Make sure all the bugs are fixed that we've been deferring
- Async queries don't have user friendly errors posthog#21270
- HogQL optimization that filters events early on posthog#21269
- Caching of "load more breakdown values" should work, see https://posthog.slack.com/archives/C045L1VEG87/p1706285253686739
- Bug: The user paths insight should not disable the persons modal in card view
- Bug: Clicking save before the query completes -> changes should get saved
- Follow ups from feat(insights): improve series and exclusion entity handling posthog#20089
- Changing the title of a hogql insight and clicking “save” on the title (not the whole query) restarts the query calculation. Since the query itself hasn’t changed, I don’t think it should do that (edited) https://posthog.slack.com/archives/C045L1VEG87/p1710855088205499
- The "load more breakdown values" button doesn't make sense when we have 25 values and are limiting to 25 values -> expand the breakdown limit in this case https://posthog.slack.com/archives/C045L1VEG87/p1706285253686739
- Fix issues with "Leave insight? Changes you made will be discarded" https://posthog.slack.com/archives/C0368RPHLQH/p1712589416025629
- Follow up on the commented out tests after CH 23.12 upgrade test(clickhouse): skip tests incompatible with 23.12 posthog#21437
- Implement all the new cool things users ask for Insight feature requests and bugs (trends, funnels, paths, etc) #200
- Finish the HogQL pending tasks HogQL & Data Exploration next steps #81
- Do that little BI thing BI TODOs #157
- Implement a better way to cache persons modals: Previously we had a random "cache_key", so that a calculation for given filters would only run once, but refresh when the insights was reloaded. I'd like to store the current timestamp with a cached item an use that for invalidating responses.