
Implement page compilation on demand #330

Merged: leandrocp merged 13 commits into main from lp/impr-rendering on Aug 21, 2023

Conversation

@leandrocp (Contributor) commented Aug 17, 2023

The main change is getting rid of Code.eval_quoted/3, which was needed to render pages because we were storing the template AST in ETS. That approach was not scaling well.

The first design for page rendering compiled a single module containing all page templates, but compiling 700+ pages caused a huge spike in memory utilization, around 4 GB.

The second design removed that module and stored all template ASTs in ETS. That consumes only 2.6 MB of data and the lookup is incredibly fast, but it doesn't scale to serve many requests, especially sudden spikes: each Code.eval_quoted/3 call drives CPU utilization toward 100% faster than we can afford.

The third design (this PR) is a mix of both. It compiles one module per page containing a render/1 function that returns a %Phoenix.LiveView.Rendered{} struct, and also stores page metadata in ETS, including the name of each page module. So the page module stores and serves the template, while ETS acts as an index of pages used to match a request to a page module.

The last change required to make this work is compiling render/1 lazily: when the app starts the function is a stub (it returns :not_loaded), and on the first request to that page the module is recompiled with the actual implementation. This way there's no memory utilization spike during deployment, so the app can start; the recompilation is barely noticeable to users visiting the page; and Elixir behaves much better compiling modules on demand than compiling everything at once.
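
For illustration, here's a minimal sketch of that pattern; the module, table, and function names are hypothetical, not Beacon's actual internals. It shows a stub module per page at boot, an ETS index mapping each path to its module, and lazy recompilation on the first request.

```elixir
# Hypothetical sketch: not Beacon's real code, just the shape of the idea.
defmodule DemoSite.PageLoader do
  @table :demo_site_pages

  # ETS keeps only lightweight metadata (path -> page module);
  # the heavy template lives inside the compiled module.
  def init do
    :ets.new(@table, [:named_table, :public, read_concurrency: true])
  end

  def register_page(path, page_module) do
    :ets.insert(@table, {path, page_module})
  end

  # At boot every page gets a cheap stub, so deployment compiles
  # almost nothing and there is no memory spike.
  def compile_stub!(page_module) do
    ast = quote do: def(render(_assigns), do: :not_loaded)
    Module.create(page_module, ast, Macro.Env.location(__ENV__))
  end

  # On the first request the module is recompiled with the real
  # render/1, which returns a %Phoenix.LiveView.Rendered{} struct.
  # `template_ast` stands for the page's compiled HEEx AST,
  # e.g. loaded from the database.
  def compile_page!(page_module, template_ast) do
    ast =
      quote do
        def render(var!(assigns)) do
          unquote(template_ast)
        end
      end

    Module.create(page_module, ast, Macro.Env.location(__ENV__))
  end

  # Serving a request: look up the page module in ETS and lazily
  # compile it if its template was not loaded yet.
  def render(path, assigns, template_ast) do
    [{^path, page_module}] = :ets.lookup(@table, path)

    case page_module.render(assigns) do
      :not_loaded ->
        compile_page!(page_module, template_ast)
        page_module.render(assigns)

      %Phoenix.LiveView.Rendered{} = rendered ->
        rendered
    end
  end
end
```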

@leandrocp changed the title from "Improve page rendering" to "Implement page compilation on demand" Aug 17, 2023
@bcardarella (Contributor)

What is the expected memory pressure change we should see? Have you validated that this is better?

@leandrocp (Contributor, Author) commented Aug 17, 2023

What is the expected memory pressure change we should see? Have you validated that this is better?

During deployment:

- First design (a single module): ~3 to 4 GB of memory (the server kept dying before we could measure precisely)
- Current design (AST in ETS): 930 MB
- This PR (ETS + multiple modules): 630 MB

After deployment, the current design stabilizes at 540 MB and this PR at 470 MB.

Compiling a couple of modules at once, for example to render the atom.xml feed, doesn't put much pressure on the system; it's quite fast, and Elixir frees the memory once compilation is done.

@leandrocp (Contributor, Author) commented Aug 17, 2023

Regarding other metrics, Code.eval_quoted/3 is roughly 3,000x slower than a compiled function call. The average HTTP response time was also more stable and faster: with the current design it would quickly jump to 300ms after a thousand requests and reach 1.5s by the end of the test, while with a compiled module it stayed below 100ms for a longer period and topped out at a 400ms average. CPU utilization was more stable too: it reached 93% with 60k requests (never hitting 100%, which helps avoid degrading requests), while the current approach reached 100%, which is one of the main problems this PR is solving.
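
As a rough illustration (not the benchmark used here), a micro-benchmark along these lines reproduces the order-of-magnitude gap between evaluating a stored AST and calling a compiled function. Benchee is assumed as a dependency, and the exact ratio will vary with expression size and hardware:

```elixir
Mix.install([{:benchee, "~> 1.1"}])

defmodule Greeter do
  # Compiled counterpart of the quoted expression below.
  def hello(name), do: "Hello, " <> name <> "!"
end

# Stored AST, analogous to keeping a template's quoted form in ETS.
quoted = quote do: "Hello, " <> var!(name) <> "!"

Benchee.run(%{
  "Code.eval_quoted" => fn -> Code.eval_quoted(quoted, name: "world") end,
  "compiled function" => fn -> Greeter.hello("world") end
})
```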

@leandrocp (Contributor, Author) commented Aug 17, 2023

These graphs give a good perspective.

Before:
[screenshot: spike test metrics before this PR]

After (ignore the data before 11:11):
[screenshot: spike test metrics after this PR]

That's a spike test on a shared-1x-cpu@1024MB fly.io instance, running for 2 minutes with 1k simultaneous users, ~62k requests. Note that this measures only the HTTP request, i.e. the dead render. In real life the experience is better because Beacon patches pages on page transitions.

@leandrocp requested a review from bcardarella August 17, 2023 16:48
@leandrocp (Contributor, Author)

And just to be clear, compiling multiple modules eagerly causes the same memory utilization behaviour: it demanded at least 1 GB of memory and the server crashed, which I think is not viable. That's why I made it on demand, which is basically how page publishing is already handled.

@bcardarella (Contributor)

@leandrocp how does this compare to a baseline? It would be interesting to see what a mix phx.new app looks like in comparison to where Beacon is at?

Also, I assume the vast majority of the memory are due to the blog posts?

@leandrocp (Contributor, Author)

@leandrocp how does this compare to a baseline? It would be interesting to see what a mix phx.new app looks like in comparison to where Beacon is at?

I'll do a comparison and post the results.

Also, I assume the vast majority of the memory are due to the blog posts?

Yes that's correct.

@leandrocp (Contributor, Author)

To compare apples to apples, I've created a new Phoenix app with a simple LiveView that returns just "ok" with an empty layout, and the same for Beacon.

Both run on a shared-1x-cpu@1024MB fly.io instance. The spike test ran for 5 minutes with up to 1000 simultaneous users.

Baseline maxed out at 397,213 requests:

[screenshot: baseline spike test, 5m, 1000 VUs]

Beacon maxed out at 275,043 requests:

[screenshot: Beacon spike test, 5m, 1000 VUs]

That Beacon page runs behind basic auth; I couldn't determine how much that impacts the tests. I executed the test multiple times to rule out network inconsistencies and the results were pretty stable.

The previous test was a load test of the "/newsletter" page on staging; that layout is larger and the page uses components.

@leandrocp (Contributor, Author) commented Aug 17, 2023

And here's the same test with the current design. It executed 243,063 requests.

[screenshot: current design spike test, 5m, 1000 VUs]

@APB9785 (Contributor) left a comment

Looks awesome 🥳 Glad to see this approach worked out

@bcardarella (Contributor)

I suspect a better test may be comparing it to a regular heex template with the same markup as something that is in dockyard.com

@leandrocp (Contributor, Author)

I suspect a better test may be comparing it to a regular heex template with the same markup as something that is in dockyard.com

Excluding pages that query the DB, a candidate for such a test would be https://dockyard.com/newsletter

I'm gonna prepare it and post the results tomorrow.

@bcardarella (Contributor) commented Aug 18, 2023 via email

@leandrocp (Contributor, Author) commented Aug 18, 2023

@bcardarella using flame_on I was able to optimize a bit further and increase request throughput from 275,043 to 309,795 requests (compared to 397,213 for the baseline and 243,063 for the current design). Still using that simple page that returns an "ok" string:

[screenshot: Beacon improved, spike test, 5m, 1000 VUs]

If we don’t see nearly identical results we should see why

I think it's important to define what "nearly identical" means in numbers; for example, with the latest changes we're 22% behind the baseline in request throughput. There might be more room for improvement, but how much is the goal? It's also important to keep in mind that a Beacon app performs more operations than a baseline app with a regular HEEx template: page title, meta tags (site, layout, and page), raw schema, dynamic routing, generating the asset path, Beacon live data, finding the layout and page modules in memory, and so on. All those operations are either nonexistent or static in the baseline app.

In the flame graph below you can see that Beacon does not add too many calls to the stack:

[screenshot: flame graph of a Beacon request]
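
For reference, flame_on hooks into Phoenix LiveDashboard roughly like this; a sketch based on the flame_on package's usual setup, and the exact option names may differ between versions:

```elixir
# mix.exs (dev-only is usually enough):
# {:flame_on, "~> 0.6", only: :dev}

# In the router, mount LiveDashboard with the Flame On page enabled:
import Phoenix.LiveDashboard.Router

scope "/" do
  pipe_through :browser

  live_dashboard "/dashboard",
    additional_pages: [
      flame_on: FlameOn.DashboardPage
    ]
end
```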

@bcardarella (Contributor)

oh wait, we're stress testing with dockyard.com? That may be starting a bunch of other things. Sorry to keep throwing things on your plate here but are we able to deploy a Beacon instance on a new phx app? We can put this on Fly and give it all the resources it needs for comparison.

@leandrocp (Contributor, Author) commented Aug 18, 2023

oh wait, we're stress testing with dockyard.com?

staging (same config as PROD) :)
that's a shared-1x-cpu@1024MB instance

are we able to deploy a Beacon instance on a new phx app?

I think staging is fine, no?
No one is using it, it's reserved for this testing session.

We can put this on Fly and give it all the resources it needs for comparison.

That's doable, but I'm trying to find the highest number possible on that small instance, and also to prove that Beacon is resilient. I think there's value in serving 300k+ requests in a span of 5 minutes without crashing the server while keeping the average HTTP response time reasonable (it's fast until it hits 100% CPU, then it climbs to ~800ms average).

@bcardarella (Contributor) commented Aug 18, 2023

I think staging is fine, no?

It's mostly that if we are comparing this to a mix phx.new app, we should eliminate all variables for the best and most accurate results. I figure a new app with a fresh install of Beacon will give us the best picture on that. Or, if we want to frame the comparison as an actual in-use production CMS, it's OK to leave it as dockyard.com. It just depends what you want to convey to the audience. I'd bet that we have some things in dockyard.com that impact performance which are OK for our needs but may not present Beacon to its fullest potential if our baseline is a basic empty app.

@leandrocp (Contributor, Author) commented Aug 18, 2023

It just depends what you want to convey to the audience.

I believe it's more valuable to present the numbers from dockyard.com than from a dummy page or something more static, so they represent how Beacon performs for real. I'm using staging to avoid taking production down, but it should give similar results since the environments are pretty similar.

I'd bet that we have some things in dockyard.com that impact performance

There is, and I've removed some function calls for that particular release, Logger calls for example. But there's more code that I can't remove.

I figure a new app with a fresh install of Beacon will give us the best picture on that.

Yes, that would be the best scenario to profile Beacon against a baseline app in order to fine-tune it. I can use https://github.com/BeaconCMS/beacon_demo for that kind of test.

So I'll run some tests with beacon_demo against https://summer-sun-6104.fly.dev/ok (that's the baseline app) to double-check I'm not missing any obvious optimization, but I think this PR is in the right direction. Do you see any blocking issue with this PR?

@leandrocp (Contributor, Author)

Updated results, running for 5 minutes with up to 1000 users on a shared-1x-cpu@1024MB instance.

Baseline is a simple Phoenix app with a LiveView, and Beacon is https://github.com/BeaconCMS/beacon_demo with a layout and a page. Both render the same HTML in the end.

Baseline

https://summer-sun-6104.fly.dev/ok
262,329 requests

[screenshot: baseline spike test, 5m, 1000 VUs]

Beacon main (current design)

https://dy-beacon-demo.fly.dev/demo/ok
212,224 requests

[screenshot: Beacon main spike test, 5m, 1000 VUs]

Beacon with this PR's changes

https://dy-beacon-demo.fly.dev/demo/ok
247,442 requests

[screenshot: Beacon with this PR, spike test, 5m, 1000 VUs]

@bcardarella (Contributor)

This looks good. In the closest apples-to-apples comparison, the PR brings Beacon to more than 90% of the baseline.

@bcardarella (Contributor)

For the purposes of your ElixirConf presentation, it may be worth setting up a basic WordPress or Drupal site and comparing their perf numbers.

@leandrocp merged commit c781cab into main Aug 21, 2023
@leandrocp deleted the lp/impr-rendering branch August 21, 2023 18:18
@AZholtkevych linked an issue Aug 22, 2023 that may be closed by this pull request

Successfully merging this pull request may close these issues.

Core -> General Improvements: Beacon optimization