Global search hangs UI for tens of seconds with large numbers of jobs #8549
Hi @optiz0r, thank you for this report! It's incredibly helpful to hear how features are performing in the wild. I have some backstory and a couple of questions for you. Clearly, having the UI hang for 18s is not a good UX, so we need to do something about that.
When you say the call takes 18s, is that the time it takes for the API request to respond, or the amount of time before the UI is interactive again? With a busy cluster, are you experiencing the UI hanging or generally chugging elsewhere, or is it just the global search feature? Are you only experiencing the UI being slow, or are all cluster actions slow?
Makes sense! I agree substring matches are more useful to have here, particularly as many of our jobs have a common prefix.
The 18s I quoted is what Chrome dev tools reported as the total time for the /v1/jobs request. The total traffic for the search requests is ~47KB, so not tiny but also not enormous. This was the longest time I saw; repeating the search multiple times in succession, I saw it drop to about 12s. The UI definitely feels frozen for longer than the download time of that one request.

It's worth noting that the UI locks up immediately on the first keypress into the search box, so quickly that it often doesn't render any of the characters typed into the search box (or maybe just the first one) until after it has finished doing its thing. The typed characters suddenly appear in the text field at the same time as the results are rendered. It feels very much like the search operation is being done in the main thread, blocking everything else, and would be better done in a separate thread/worker.

Repeating my tests this morning, I'm seeing the /v1/jobs and /v1/nodes calls complete in tens of milliseconds rather than seconds, but the UI is still locking up for ~10 seconds on each search. I was doing rolling upgrades of Nomad clients when I first noticed, so perhaps the cluster was more busy than usual running allocations at the time, but that's not the whole story here.

If it makes any difference, I had the CSI storage view open while running the search (because there are no background AJAX calls there, which made it easier to see in dev tools what was happening during the search). Does that mean you don't have any of the job/node data already available in memory to search on, so it takes longer to fetch it on demand?

Repeating the search on the jobs list view, I see slightly different behaviour in dev tools: only a /v1/nodes request appears in the network tab, no /v1/jobs request. I do see regular hits for jobs?index=... as the jobs list updates in the background, but these also stop appearing for the duration of the search operation.
I have not been looking for, or noticed, any other slowness. The UI seems reasonably snappy otherwise; it's just the global search that's noticeably slow. Most of our workload on this cluster is batch jobs which have completed but are not yet GC'd. You could probably recreate the experience quite easily by spawning a few hundred dummy batch jobs that terminate instantly (e.g. exec "echo hello world"); there's no need to have hundreds of running service jobs.

Ben
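The freeze described above is consistent with the filtering (and rendering of every match) running synchronously on the main thread. A minimal sketch of what the client-side substring matching might look like — the names (`JobSummary`, `matchJobs`) are hypothetical, not the actual Nomad UI code:

```typescript
// Hypothetical substring matcher over /v1/jobs results. Running this
// synchronously over hundreds of rows, and then rendering every match,
// is what would block the main thread; moving the work into a Web Worker
// (or at least batching renders) would keep typing responsive.
interface JobSummary {
  ID: string;
  Name: string;
}

function matchJobs(jobs: JobSummary[], query: string): JobSummary[] {
  const q = query.toLowerCase();
  // Substring (not just prefix) match, since many jobs share a common prefix.
  return jobs.filter((job) => job.Name.toLowerCase().includes(q));
}
```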
This closes #8549. Thanks to @optiz0r for the bug report. Having the global search attempt to render every returned result is obviously a mistake! I chose to have the full number of matches still render, though I also considered having it display (10+) instead. The choice of truncating at 10 is arbitrary, maybe a higher number would be preferable, I’m open to suggestions.
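The shape of that fix can be sketched roughly as follows — the names here are illustrative, not the actual patch:

```typescript
// Illustrative sketch of the fix: keep the full match count for display,
// but only render the first N results so the DOM work stays bounded
// regardless of how many jobs match.
const MAX_RENDERED = 10; // arbitrary cutoff, per the discussion above

function truncateResults<T>(matches: T[]): { shown: T[]; total: number } {
  return { shown: matches.slice(0, MAX_RENDERED), total: matches.length };
}
```

The UI can then render `shown` while still labelling the section with `total`, rather than the considered alternative of displaying "(10+)".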
Hey @optiz0r! Thanks again for your thorough report of your experience. It led @backspace to find a definite UI performance issue that will go out with the next Nomad release!
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.12.1
Issue
On a busy cluster with large numbers of jobs, the global search hangs the UI for tens of seconds while it retrieves and processes the job lists and displays the results. For example, a call to /v1/jobs has been observed to take 18s to return 762 rows of job data, and the UI was hung for all of that time. There is no visible indication that the search is doing anything: page elements stop reacting to input events, and the cursor in the search box stops flashing.
Reproduction steps
On a cluster with a large number of jobs, type a query into the global search box and observe the UI freeze until the results render.
Notes
I see the UI is making its own calls to /v1/jobs and /v1/nodes and doing the processing client side, and it seems to be doing this on the main thread. I see there is also a /v1/search API that can do prefix matching. Would it make more sense to use this API endpoint to do the filtering server side, and update the UI asynchronously?