Launch Week Blog Post: Billion event querying #3066

Closed
2 tasks
andyvan-ph opened this issue Mar 1, 2022 · 11 comments

@andyvan-ph
Contributor

andyvan-ph commented Mar 1, 2022

Issue to discuss and coordinate the blog post for the upcoming launch week: see issue.

  • Copy Deadline: March 16
  • Publish Date: March 22
  • Outline: Deep dive into how we developed the platform to handle massive event loads (tentative)

To Do

@andyvan-ph
Contributor Author

Note: the outline is only a placeholder, so feel free to suggest more interesting approaches.

@macobo
Contributor

macobo commented Mar 2, 2022

The original plan was to follow https://github.com/PostHog/product-internal/pull/240 and document that.

Of that issue, by the time of the release we will have managed to build out:

    1. Improved deployments (helm chart refactoring done by @guidoiaquinti and me)
    2. Upgrading clickhouse
    3. Sharded clickhouse on self-hosted
    4. Schema: Including event in the events sort key

In addition we've:

  • Had a few painful experiences scaling out postgres and debugging it. cc @guidoiaquinti for that side of things
  • Made it possible to run posthog with Altinity Cloud
  • Introduced a new async migrations tooling for larger data migrations going forward (cc @yakkomajuri / @tiina303)
  • Internally constructed a plan for changing product semantics and how person data is stored (Future of our person model, meta#39)
  • person_distinct_id2 for everyone

Do you think that's enough material for the blog post? What to focus on in there?

Sadly we haven't gotten to the biggest bits due to a lack of resources/time/prerequisites.

cc @marcushyett-ph

@marcushyett-ph
Contributor

marcushyett-ph commented Mar 2, 2022

Given we still have a way to go to hit the seamless Billion event querying milestone... I would advocate for focussing this article on:

How we improved query performance by X% for our largest projects

We can split this into a few short technical sections, describing the motivation and the nuances of our solutions, e.g.:

  • new person_distinct_id table
  • Schema
  • Sharded clickhouse

I think it's worth having a separate post on deployments and helm chart refactoring @andyvan-ph @guidoiaquinti (e.g. "How we simplified our 10 service cross-platform self-hosted deployment")

@andyvan-ph
Contributor Author

andyvan-ph commented Mar 2, 2022

RE: above, 'How we improved query performance by X% for our largest projects' sounds good to me if everyone's happy with that?

> I think it's worth having a separate post on deployments and helm chart refactoring @andyvan-ph @guidoiaquinti (e.g. "How we simplified our 10 service cross-platform self-hosted deployment")

Yup, agreed. That's covered in #3067

@lottiecoxon
Collaborator

An idea I wanted to run past you, @andyvan-ph: for the new query performance title, I thought it would be fun to do a sports scientist scene with hedgehogs, one running on a treadmill and the others in lab coats assessing how to make improvements?

@andyvan-ph
Contributor Author

> An idea I wanted to run past you, @andyvan-ph: for the new query performance title, I thought it would be fun to do a sports scientist scene with hedgehogs, one running on a treadmill and the others in lab coats assessing how to make improvements?

Sounds great, Lottie. Go for it

@lottiecoxon
Collaborator

Done! They're on Figma, ready for export when you need them.

@andyvan-ph
Contributor Author

@lottiecoxon thanking you 👍

@paolodamico
Contributor

The deadline for this is in two days. @macobo, need any help getting this out the door?

@macobo
Contributor

macobo commented Mar 14, 2022

Nothing written yet; I've been focussing on projects. I was planning to draft this tomorrow and Wednesday.

@guidoiaquinti
Contributor

guidoiaquinti commented Mar 14, 2022 via email


6 participants