Skip to content

HogQL Insights #130

@thmsobrmlr

Description

@thmsobrmlr

HogQL Insights

Current state of the HogQL conversion for insights and moving from filters-based insights to query-based insights.

What are we doing and why?

We are rewriting all our insights in HogQL, instead of raw ClickHouse SQL, which allows us to implement performance improvements and feature toggles (e.g. PoE modes) on this intermediate layer. This also allows us to expose the query to the end user, so they debug issues themselves or adapt queries to less frequent use cases.

In addition to the changes on the SQL layer we also change the way we store the insight configuration. Currently we have a mixin-based filters format (flat key-value structure) that became hard to maintain and doesn't allow reusage of sub-parts. The new query format (nested json) should allow copy-pasting parts and nesting "sources" in other queries to allow re-using the results throughout PostHog.

High-level plan of remaining steps

  1. Remove all filters from the frontend and use the frontend-side filterToQueryNode function to convert all api responses to the new query format (when fetching) and the queryNodeToFilter function to convert them back to filters for saving/duplicating/etc.
  2. Toggle the feature flag query-based-insights-saving, that then sends insights back with query, instead of filters for saving/duplicating/etc.
  3. Optional: Use the backend side filter_to_query function in the insights serializer to return a query from the backend (just this endpoint, as this is only for testing the filter_to_query function works as expected).
  4. Optional: Any changes we want to make to the query schema e.g. camel casing key that are still snake cased.
  5. If everything works, migrate the insights backend side (iterate over them and use the backend side filter_to_query function to replace filters with query).
  6. Migrate other entities that have insights e.g. activity log or notebooks.
  7. Convert experiments to HogQL.
  8. Convert cohorts to HogQL.
  9. Cleanup.

Remove frontend side dependency on filters

We can get rid of filters frontend side first by using the backend side filter_to_query function to return only queries from insights (and any other places that might return filters) and adapting the frontend so that it only handles queries. For saving insights we can use the frontend side queryNodeToFilters function to send finally send filters to the backend.

  • Use the backend side filter_to_query function to return only queries, not filters feat(hogql): replace filters with query backend side posthog#21945
  • Fix any places that might still rely on filters:
    • Make ActionFilter and entityFilterLogic based on series, not actions and events
    •  Remove filter based helpers from frontend/src/scenes/insights/sharedUtils.ts e.g. isTrendsFilter or isFilterWithDisplay
    • Remove usage of filtersToQueryNode, queryNodeToFilters, etc. in as many places as possible
    • Remove all usages of the cleanFilter function
    • Make experiments use HogQL queries
    • Migrate: activity log, insights, notebook nodes
    • Cleanup: Remove all getQueryBasedInsightModel

After migrating to backend side filters

For insights

  • replace InsightModel in subscriptionsLogic.test.ts
  • replace InsightModel in funnelDataLogic.test.ts
  • replace InsightModel in insightVizDataLogic.test.ts
  • replace InsightModel in trendsDataLogic.test.ts
  • convert all mocked insights in tests, stories etc. to be query based

For the activity log

For notebooks

Experiments backend

Experiments use the PA code backend side to generate trends/funnel results. We should swap out the legacy implementation for the HogQL one there as well.

  • Replace funnels in experiments backend side

Finalize query schema

At some point we want to run a migration to replace filters with queries. After that migration it will be harder to make changes to the query schema, meaning we should clean up the schema as good as we can now.

Unfortunately this got complicated by the fact that notebooks already save insights as queries and not filters. Thus they need additional handling in https://github.com/PostHog/posthog/blob/master/frontend/src/scenes/notebooks/Notebook/migrations/migrate.ts and we need to come up with a way to clean up tech debt there. The queries are stored both in the notebook nodes and in the activity log from which the user can go back in time.

Some proposed changes to the current query schema:

  • Camel case all properties
    • Camel case dateRange properties date_from and date_to
    • Camel case breakdownFilter properties
    • Camel case math properties in entities
    • Camel case and fix hidden_legend_indexes / hidden_legend_keys
      Currently the legend items can be hidden by index or by key depending on the insight and a bug prevents the hidden entries from being saved. We should agree on a single way to hide the entries and fix saving them with the query.
    • Camel case aggregation_group_type_index
    • Remove/fix breakdown_histogram_bin_count in trends filter properties
    • Replace legacy entities in retention insights with new Event/ActionNodes

Related bugs

Trends

Funnels

Retention

Cleanup

  • Update mocks to query based mocks

Make it flippin' amazing

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions