Server-side Blazor E2E performance and capacity testing #10449

**Edit: @rynowak hijacking top post for great justice**

## Summary

We need to develop infrastructure and a plan for testing Server-Side Blazor's performance and capacity (scalability) to:
- Find and fix performance problems
- Find and fix reliability problems
- Provide guidance to users for capacity planning

## Workload

We should use a port of the Blazing Pizza app to drive our testing. This is a realistic sample of a *small* app, and it is fully end-to-end (it includes data access and some background work on the server). There's a good selection of features represented here, and we have to crawl before we can run.

## Scenarios

Run these in our perf lab.

### Performance

Run Blazing Pizza on the VM, and spin up enough clients to saturate the CPU or thrash the memory. Figure out which resource is the most scarce, and then optimize, and repeat.

I propose we automate a canonical scenario, such as placing a pizza order, and then measure *render operations per second* for that fixed test script.

- [ ] Create a checked-in automated performance test in our perf lab that reports *render operations per second*.
- [ ] Analyze results and log issues for improvement for a time-boxed period. As with all perf work, this is open-ended, so the size of this item is flexible.
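
As a rough illustration of the proposed metric, here is a minimal sketch of how *render operations per second* could be derived from the times at which render operations complete during one run of the fixed script. The function name and the choice to measure over the span between the first and last completion are assumptions made for this example, not something the perf lab currently defines.

```python
def render_ops_per_second(completion_times):
    """Estimate throughput from render completion timestamps (in seconds).

    Hypothetical helper: counts the intervals between the first and last
    completed render and divides by the elapsed time between them.
    """
    if len(completion_times) < 2:
        raise ValueError("need at least two completed renders")
    ordered = sorted(completion_times)
    elapsed = ordered[-1] - ordered[0]
    if elapsed <= 0:
        raise ValueError("timestamps must span a positive duration")
    # N completions delimit N - 1 intervals.
    return (len(ordered) - 1) / elapsed
```

Reporting a single number per run like this makes it easy to compare builds, as long as the test script itself stays fixed.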

### Capacity and Reliability

Run Blazing Pizza on the VM, and spin up a bunch of clients, but have them run the test script **slowly** (with pauses), let's say one operation per second. The goal is to simulate a bunch of users actually using the web site, to get a realistic estimate of a user count and of which kinds of resources you run out of first.

- [x] Determine a baseline number for capacity in this scenario. How many clients using our standard script will exhaust a resource on the server?
- [x] Analyze results and log issues for improvement for some time-boxed amount of time. Depending on the results of investigating the capacity, the priority will vary.
- [ ] Create a checked-in automated reliability test in our perf lab that can run for a long period of time. The number of clients used for this should put the server at 75% capacity.
- [ ] Analyze results for memory growth and reliability issues. The bar for this category is stricter: we need to address all issues that cause memory growth or reliability problems.
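
To make the pacing idea concrete, here is a minimal sketch of a simulated user that runs a script at roughly one operation per second, plus a helper that turns a measured baseline into the client count for the 75% reliability run. All names here (`run_paced_script`, `clients_for_target_load`) are hypothetical, and the sleep and clock functions are injectable only so the pacing can be exercised without real delays.

```python
import asyncio
import math
import time


async def run_paced_script(operations, interval_seconds=1.0,
                           sleep=asyncio.sleep, clock=time.monotonic):
    """Run each async operation, pausing so they start about interval_seconds apart."""
    for operation in operations:
        started = clock()
        await operation()
        elapsed = clock() - started
        # If the operation took longer than the interval, do not pause at all.
        await sleep(max(0.0, interval_seconds - elapsed))


def clients_for_target_load(baseline_clients, fraction=0.75):
    """Client count that should put the server at the given fraction of capacity.

    Assumes load scales roughly linearly with client count, which the
    baseline measurements would need to confirm.
    """
    return math.floor(baseline_clients * fraction)
```

Many such simulated users can be scheduled on one event loop with `asyncio.gather`, which is part of why a lightweight client scales better than one browser per user.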

<h2 id="Security">Security</h2>

To further capacity and reliability, we have to understand the characteristics of Blazor in the event a client performs malicious actions against the server. To do this, here are some baselines we need to understand:

* [x] The maximum number of clients a server can handle before it runs out of resources (memory / CPU / others?).
* [ ] ~~https://github.com/aspnet/AspNetCore/issues/12003 - What happens when client events are raised faster than the server can process them? What is a reasonable limit to queued client events?~~
* [x] Similarly, what should the limit be for incoming JS interop calls?
* [x] ~~https://github.com/aspnet/AspNetCore/issues/11964 - The server will queue up renders until the client acks~~
    - [x] ~~What is a reasonable limit here? What does it look like once we exhaust the queue?~~
    - [x] ~~Similarly if the client disconnects but we keep accumulating renders.~~
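
The questions above about queued renders come down to bounding a queue of unacknowledged work. Here is a minimal sketch of that idea, assuming in-order acknowledgements; the class name, the limit, and the "reject when full" policy are illustrative assumptions and do not describe how Blazor actually handles this.

```python
from collections import deque


class PendingRenderQueue:
    """Toy bounded queue of render batches awaiting a client ack (hypothetical)."""

    def __init__(self, max_pending):
        if max_pending < 1:
            raise ValueError("max_pending must be at least 1")
        self.max_pending = max_pending
        self._pending = deque()

    def __len__(self):
        return len(self._pending)

    def try_enqueue(self, batch_id):
        """Queue a batch; return False once the limit is reached."""
        if len(self._pending) >= self.max_pending:
            # The caller decides what exhaustion means: drop, pause, or disconnect.
            return False
        self._pending.append(batch_id)
        return True

    def acknowledge(self, batch_id):
        """Remove the oldest batch if the ack matches it; return whether it did."""
        if self._pending and self._pending[0] == batch_id:
            self._pending.popleft()
            return True
        return False
```

A hostile or disconnected client that never acks would hit the `False` branch of `try_enqueue` after `max_pending` batches, which is the point at which the server's policy matters.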

## Issues


## Techniques

**TLDR** we're writing a headless Blazor client in .NET.

There's an appealing low-investment strategy here where we use Selenium to automate headless browsers. This would be easy to accomplish because we're already using Selenium for E2E tests. However, this doesn't meet the goals because we're going to cap out at 20-30 browsers per client machine, and then be faced with the difficult challenge of coordinating multiple client machines. Additionally, each *operation* in Selenium involves polling the DOM to see if it has changed. This makes any test executed in Selenium *slow*, which means that we probably cannot do meaningful performance testing with a small number of clients.

Another approach, with a slightly higher investment, would be to write a test client using Node.js and a DOM library in TypeScript/JavaScript. This would be more involved than Selenium because we need to mock all of Blazor's interactions with the browser and DOM. This will be faster and scale better than Selenium, and is an appealing option.

There's a higher-investment strategy where we develop a custom *test* client for writing DOM-driven tests in .NET. This would require us to write a SignalR client that's capable of doing a *Blazor* handshake and then simulating the interface between the server and the client over the SignalR connection. The test client would still need to have some representation of the DOM (for verification and synchronization) and the set of *current* event handlers (for interaction). The biggest advantage here is that we're not using any of the Blazor client-side code, so we can simulate a hostile client in many ways.

Of the choices, the last two (Node and .NET clients) both meet the requirements, but the .NET client approach will allow us to write more kinds of tests, so it seems more valuable.
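
To illustrate the state such a headless client would need to keep, here is a minimal, language-neutral sketch (written in Python only for brevity; the proposal above is for a .NET client). The edit shapes (`set_text`, `remove`, `attach_handler`) are invented for this example and are not Blazor's render batch format; the point is only that the client tracks a DOM-like view for verification and a table of current event handlers for interaction.

```python
from dataclasses import dataclass, field


@dataclass
class HeadlessClientState:
    """Toy stand-in for the DOM view and handler table of a test client."""
    elements: dict = field(default_factory=dict)  # element id -> text content
    handlers: dict = field(default_factory=dict)  # (element id, event) -> handler id

    def apply_edits(self, edits):
        """Apply a list of hypothetical edit dicts sent by the server."""
        for edit in edits:
            kind = edit["kind"]
            if kind == "set_text":
                self.elements[edit["element"]] = edit["text"]
            elif kind == "attach_handler":
                self.handlers[(edit["element"], edit["event"])] = edit["handler_id"]
            elif kind == "remove":
                removed = edit["element"]
                self.elements.pop(removed, None)
                # Handlers on a removed element are no longer current.
                self.handlers = {key: value for key, value in self.handlers.items()
                                 if key[0] != removed}
            else:
                raise ValueError(f"unknown edit kind: {kind}")

    def handler_for(self, element, event):
        """Look up the handler id a test would send back to raise an event."""
        return self.handlers.get((element, event))
```

Because the test owns this state directly, it can also do things a real browser would not, such as raising an event for a handler that was already removed.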

