Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: split events lookup from exception grouping query #24711

Merged
merged 17 commits into from
Sep 2, 2024

Conversation

daibhin
Copy link
Contributor

@daibhin daibhin commented Aug 30, 2024

Problem

Inspired by #24511

The ErrorTrackingQuery to fetch the errors for a given fingerprint was super slow because we were using groupArray(tuple(...)) for all the events in the period.

While we could have rewritten the events lookup as a subquery we would then have two levels of a "limit" in the query (one at the group level and one at the events level). It all just felt like the one query was doing too much

Changes

Split things into two queries...

  1. Removing anything from ErrorTrackingQuery that relates to individual events
  2. Lookup events using a separate EventsQuery

Does this work well for both Cloud and self-hosted?

Yes

How did you test this code?

Tests should remain exactly the same except for the removed ones

Comment on lines -200 to -234
def parse_embedded_events_and_persons(self, columns: list[str], events: list):
person_indices: list[int] = []
for index, col in enumerate(columns):
if col == "person":
person_indices.append(index)

if len(person_indices) > 0 and len(events) > 0:
person_idx = person_indices[0]
distinct_ids = list({event[person_idx] for event in events})
persons = get_persons_by_distinct_ids(self.team.pk, distinct_ids)
persons = persons.prefetch_related(Prefetch("persondistinctid_set", to_attr="distinct_ids_cache"))
distinct_to_person: dict[str, Person] = {}
for person in persons:
if person:
for person_distinct_id in person.distinct_ids:
distinct_to_person[person_distinct_id] = person

for column_index in person_indices:
for index, result in enumerate(events):
distinct_id: str = result[column_index]
events[index] = list(result)
if distinct_to_person.get(distinct_id):
person = distinct_to_person[distinct_id]
events[index][column_index] = {
"uuid": person.uuid,
"created_at": person.created_at,
"properties": person.properties or {},
"distinct_id": distinct_id,
}
else:
events[index][column_index] = {
"distinct_id": distinct_id,
}

return [dict(zip(columns, value)) for value in events]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get to remove all of this which was copied over from the EventsQuery anyways

Copy link
Contributor

github-actions bot commented Aug 30, 2024

Size Change: 0 B

Total Size: 1.11 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 1.11 MB

compressed-size-action

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

Comment on lines +160 to +164
const where = [
`has(${stringifyFingerprints(
fingerprints
)}, JSONExtract(ifNull(properties.$exception_fingerprint,'[]'),'Array(String)'))`,
]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like writing HogQL on the frontend and having to parse it on the backend

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@daibhin daibhin requested a review from a team September 2, 2024 14:35
@@ -21,10 +21,10 @@ const meta: Meta = {
post: {
'/api/projects/:team_id/query': async (req, res, ctx) => {
const query = (await req.clone().json()).query
if (query.fingerprint) {
return res(ctx.json(errorTrackingGroupQueryResponse))
if (query.kind === 'ErrorTrackingQuery') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should ErrorTrackingQuery be an enum somewhere so we apply defensive programming?

@daibhin daibhin merged commit 623df91 into master Sep 2, 2024
93 checks passed
@daibhin daibhin deleted the dn-fix/split-error-event-lookup branch September 2, 2024 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants