Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event funnel property handling exceeding resource limits #304

Open
scholtzan opened this issue Nov 18, 2021 · 3 comments
Open

Event funnel property handling exceeding resource limits #304

scholtzan opened this issue Nov 18, 2021 · 3 comments
Assignees
Labels
bug Something isn't working Events Issues or enhancements to the events explores

Comments

@scholtzan
Copy link
Contributor

scholtzan commented Nov 18, 2021

It looks like the changes in #278 caused the generated queries to get more complex and requiring more CPU resources. This has caused some dashboard queries (specifically of this dashboard: https://mozilla.cloud.looker.com/dashboards-next/270) to fail with:

Failed to retrieve data - Query exceeded resource limits. This query used 24547 CPU seconds but would charge only 20M Analysis bytes. This exceeds the ratio supported by the on-demand pricing model. Please consider moving this workload to the flat-rate reservation pricing model, which does not have this limit. 24547 CPU seconds were used, and this query must use less than 5100 CPU seconds.

@wlach You probably have the most context here. Any ideas how this could be fixed?

@scholtzan scholtzan added the bug Something isn't working label Nov 18, 2021
@wlach
Copy link
Contributor

wlach commented Nov 18, 2021

It looks like the changes in #278 caused the generated queries to get more complex and requiring more CPU resources. This has caused some dashboard queries (specifically of this dashboard: https://mozilla.cloud.looker.com/dashboards-next/270) to fail with:

Failed to retrieve data - Query exceeded resource limits. This query used 24547 CPU seconds but would charge only 20M Analysis bytes. This exceeds the ratio supported by the on-demand pricing model. Please consider moving this workload to the flat-rate reservation pricing model, which does not have this limit. 24547 CPU seconds were used, and this query must use less than 5100 CPU seconds.

@wlach You probably have the most context here. Any ideas how this could be fixed?

Yeah this isn't totally surprising, #278 made these already complex queries more complex. I think we probably need to adjust it so that we just look for the event character (and not that of the properties) in the case that properties aren't specified. As things are right now, it will look at all properties individually. Unfortunately I suspect this might be a bit complicated to actually implement.

The short term solution I'd propose would be to use a sample_id for these queries. At least for Fenix, the sampled counts look pretty similar for at least one of the explores on this dashboard: https://mozilla.cloud.looker.com/explore/fenix/funnel_analysis?qid=H6Z4IJ4AD0CsjB4M7o88uf&origin_space=191&toggle=fil,vis

@wlach wlach added Glean Changes for Glean Pings and automated Glean derived tables Events Issues or enhancements to the events explores and removed Glean Changes for Glean Pings and automated Glean derived tables labels Dec 16, 2021
@sean-rose
Copy link
Contributor

I think we probably need to adjust it so that we just look for the event character (and not that of the properties) in the case that properties aren't specified. As things are right now, it will look at all properties individually. Unfortunately I suspect this might be a bit complicated to actually implement.

This has been implemented in #497, and should be deployed tomorrow.

@sean-rose
Copy link
Contributor

I also have an idea about removing unnecessary regular expression matching, which should reduce CPU resources required.

Currently the funnel SQL generation results in queries like this:

SELECT
  COUNT(
    CASE
      WHEN REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string],True))
        THEN 1
      ELSE NULL
    END
  ) AS step_1,
  COUNT(
    CASE
      WHEN (REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string],True)))
        AND (REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string,step_2.match_string],True)))
        THEN 1
      ELSE NULL
    END
  ) AS step_2,
  COUNT(
    CASE
      WHEN (REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string],True)))
        AND (REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string,step_2.match_string],True)))
        AND (REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string,step_2.match_string,step_3.match_string],True)))
        THEN 1
      ELSE NULL
    END
  ) AS step_3,
  COUNT(
    CASE
      WHEN (REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string],True)))
        AND (REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string,step_2.match_string],True)))
        AND (REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string,step_2.match_string,step_3.match_string],True)))
        AND (REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string,step_2.match_string,step_3.match_string,step_4.match_string],True)))
        THEN 1
      ELSE NULL
    END
  ) AS step_4

However, I believe the additional regular expressions for each subsequent step are supersets of the previous steps' regular expressions, so also matching all those previous regular expressions should be unnecessary, and we could have simpler queries like this:

SELECT
  COUNT(
    CASE
      WHEN REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string],True))
        THEN 1
      ELSE NULL
    END
  ) AS step_1,
  COUNT(
    CASE
      WHEN REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string,step_2.match_string],True))
        THEN 1
      ELSE NULL
    END
  ) AS step_2,
  COUNT(
    CASE
      WHEN REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string,step_2.match_string,step_3.match_string],True))
        THEN 1
      ELSE NULL
    END
  ) AS step_3,
  COUNT(
    CASE
      WHEN REGEXP_CONTAINS(funnel_analysis.events, mozfun.event_analysis.create_funnel_regex([step_1.match_string,step_2.match_string,step_3.match_string,step_4.match_string],True))
        THEN 1
      ELSE NULL
    END
  ) AS step_4

Assigning this to myself.

@sean-rose sean-rose self-assigned this Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Events Issues or enhancements to the events explores
Projects
None yet
Development

No branches or pull requests

3 participants