
UI becomes unresponsive when there are a large number of completed/running taskruns #2324

Closed
rouke-broersma opened this issue Mar 11, 2022 · 18 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@rouke-broersma

Describe the bug

The browser crashes due to the number of TaskRuns loaded at the same time

Expected behaviour

The browser does not crash simply from opening the page

Steps to reproduce the bug

Have ~1500-2000 completed and/or running TaskRuns; this probably also happens for other resource types

Environment details

  • Kubernetes Platform:
    Kubernetes
  • Kubernetes or OpenShift version:
    AKS 1.22.6
  • Install mode (if on OpenShift):
    Helm chart
  • Cloud-provider/provisioner:
    AKS
  • Versions:
    • Tekton Dashboard:
      v0.24.1
    • Tekton Pipelines:
      v0.33.2
  • Install namespaces:
    • Tekton Dashboard:
      tekton-pipelines
    • Tekton Pipelines:
      tekton-pipelines

Additional Info

Some pagination would probably be helpful

@rouke-broersma added the kind/bug label Mar 11, 2022
@AlanGreene
Member

Thanks for reporting this @rouke-broersma. It's a known problem when there are a large number of resources on the cluster and has been discussed many times in the past both in GitHub issues and on Slack. We're currently facing some limitations of the Kubernetes API although we do have some plans to address this.

See #1978 (comment) for some information on how we currently manage this in our own dogfooding cluster by removing older runs as well as plans for integration with Tekton Results which we hope will allow us to address some of the concerns about pagination, large numbers of resources, etc.

@rouke-broersma
Author

Sorry, I looked through the open issues and could not find anything.

@AlanGreene
Member

No problem at all. It took me a while to find them and I already knew what I was looking for 😅

@rouke-broersma
Author

On my end it seems the issue is not so much the time it takes to get the response from the k8s API, but rather the rendering of all the items at once. Could the pagination be performed client side instead, so that at least the items are not all rendered?
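
Something along these lines is what I have in mind (just a rough sketch in framework-agnostic TypeScript, not the Dashboard's actual code; the `TaskRun` shape, `PAGE_SIZE` and the table selector are placeholders): keep the full list in memory, but only hand one page of it to the rendering layer, so the browser never has more than a page's worth of rows in the DOM.

```ts
// Sketch of client-side pagination: the full list stays in memory,
// but only one page of rows is ever added to the DOM.
// `TaskRun`, `PAGE_SIZE` and the selector below are illustrative only.
interface TaskRun {
  metadata: { name: string; namespace: string };
}

const PAGE_SIZE = 50;

function getPage(runs: TaskRun[], page: number): TaskRun[] {
  const start = page * PAGE_SIZE;
  return runs.slice(start, start + PAGE_SIZE);
}

function renderPage(runs: TaskRun[], page: number): void {
  const tbody = document.querySelector('table tbody');
  if (!tbody) return;
  tbody.innerHTML = ''; // drop the previous page's rows
  for (const run of getPage(runs, page)) {
    const row = document.createElement('tr');
    const cell = document.createElement('td');
    cell.textContent = `${run.metadata.namespace}/${run.metadata.name}`;
    row.appendChild(cell);
    tbody.appendChild(row);
  }
}
```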

@AlanGreene
Member

AlanGreene commented Mar 11, 2022

We tried that in the past and it didn't make a difference in terms of perceived performance, but it may be worth investigating again since we've made a number of other performance-related and client architecture changes in the meantime.

Can you share some more details about the number/size of the resources you're working with, the total response size for the list, or some examples? This could be helpful when we try to reproduce the issue and test any potential improvements.

If you could include request/response time and time to load in the UI that would be great.

It will likely be towards the end of next week before I can dedicate much time to this.

@rouke-broersma
Author

The TaskRun response is 3MB according to Google Chrome and takes 12 seconds to load. It contains about 2000 items, I think. It doesn't cause many problems when no TaskRuns are running, but when TaskRuns are running the tab becomes unresponsive and Chrome tries to kill it. We have at most about 20 TaskRuns running at the same time, though 10 is more common.

My coworker has tried to work around this with a client-side script that throws away all table rows other than the last 100, and he says he no longer has any tab-freezing issues.

@AlanGreene
Member

Interesting. Do your Tasks have many steps? Are they short-lived or longer running?

Would it be possible to share your colleague's script or provide more details about what it does / how often it runs? This would help narrow down the types of change that could be most beneficial for your use case, and provide a baseline for comparison of any performance improvements we might make.

@rouke-broersma
Author

The TaskRuns have about 5-10 steps and, depending on the type of TaskRun (split by namespace), run for between 1 and 30 minutes. The majority are of a shorter duration (a couple of minutes at most) and only once in a while do we have a TaskRun that runs for up to 30 minutes.

@maartengo could you provide the details about your modifications?

@maartengo

The script can be found here: https://pastebin.com/Mi2HNwT9
We currently have 2382 elements shown on the TaskRuns page, with at most 7 steps per run.

In short it:

  • Removes all table rows after the first 100
  • Hides any table row after the first 30; this reduces the number of errors
  • Repeats the above every 1-5 seconds, depending on the content of the page

I haven't tested what the performance would be if there was some actual pagination instead of removing the elements.
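
For anyone who prefers not to follow the pastebin link, the core of it looks roughly like this (a simplified sketch rather than the exact script; the row selector and the fixed interval are stand-ins):

```ts
// Periodically trim the TaskRuns table so only a limited number of rows
// stay in the DOM. Simplified sketch of the approach; the real script
// adjusts its interval (1-5s) based on the page content.
const KEEP_IN_DOM = 100;  // rows kept in the document
const KEEP_VISIBLE = 30;  // rows actually displayed

function trimRows(): void {
  const rows = document.querySelectorAll<HTMLTableRowElement>('table tbody tr');
  rows.forEach((row, index) => {
    if (index >= KEEP_IN_DOM) {
      row.remove();               // drop rows beyond the first 100
    } else if (index >= KEEP_VISIBLE) {
      row.style.display = 'none'; // hide rows beyond the first 30
    }
  });
}

setInterval(trimRows, 3000);
```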

@AlanGreene
Member

Thanks, this should be very helpful in trying to reproduce the issue and testing any potential improvements.

@AlanGreene
Member

AlanGreene commented Mar 18, 2022

Initial testing with #2327 against our dogfooding cluster shaves ~0.5s off a 3s load time for 1300 TaskRuns (<1MB), so at least it's heading in the right direction… I'll need to increase the number of resources and have some in progress to get a proper feel for what impact this might have for your use case, but at least it's not slower 😄

I'll see if I can publish a test release later today containing the change, otherwise feel free to pull my branch and build it locally. If this works out I'll need to clean up the change a bit and we'll likely apply it to all pages (or at least TaskRuns + PipelineRuns to start with).

@AlanGreene
Member

Published test release pagination-test-20220318.

@maartengo

Good news! Although the page load time is still ~5 seconds, the page is actually responsive afterwards! This is a huge improvement over the regular freezes we used to have.

@rouke-broersma
Author

@AlanGreene Is there a way to move forward with this change? It would be very useful to us :)

@AlanGreene
Member

I haven't had time to work on this recently but will pick it up again soon. The existing PR needs a bit of cleanup and some tests, then we should be able to apply the same changes to PipelineRuns. I'll update the PR by middle of next week 🤞

@AlanGreene self-assigned this Apr 22, 2022
@AlanGreene
Member

AlanGreene commented Apr 22, 2022

I've updated the PR with a quick first pass at adding pagination to all list pages using a slightly different approach. I'll finish cleaning it up early next week and make sure all tests are passing before marking it ready for review.

@AlanGreene
Member

Client-side pagination is now available on all list pages in the latest nightly release, e.g. https://storage.googleapis.com/tekton-releases-nightly/dashboard/previous/v20220426-14cce744d6/tekton-dashboard-release.yaml

Thanks again @rouke-broersma @maartengo for reporting the issue and helping to validate the change.

This will be included in the next Dashboard release, v0.26.0 due May 5 - 10.

@rouke-broersma
Author

That's awesome, thank you!
