Global search hangs UI for tens of seconds with large numbers of jobs #8549

Closed
optiz0r opened this issue Jul 28, 2020 · 4 comments · Fixed by #8571

Comments

@optiz0r
Contributor

optiz0r commented Jul 28, 2020

Nomad version

Nomad v0.12.1

Issue

On a busy cluster with large numbers of jobs, the global search hangs up the UI for tens of seconds while it retrieves and processes the job lists and displays the results. For example, a call to v1/jobs has been observed to take 18s to return 762 rows of job data, and the UI was hung for all that time. There is no visible indication that the search is doing anything, page elements stop reacting to input events, and the cursor in the search box stops flashing.

Reproduction steps

On a cluster with a large number of jobs:

  • Type something into the global search box
  • Observe UI is stalled

Notes

I see the UI is making its own calls to /v1/jobs and /v1/nodes and doing the processing client side, and seems to be doing this in the main thread. I see there is also a /v1/search API that can do prefix matching. Would it make more sense to use this API endpoint to do the filtering server side, and update the UI asynchronously?
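
For illustration, a rough sketch of what server-side filtering might look like (the request and response shapes follow Nomad's documented /v1/search API; the surrounding wiring and the async call site are just assumptions on my part):

```typescript
// Sketch only: prefix filtering done server side via /v1/search instead of
// fetching all of /v1/jobs and matching client side.
interface SearchResponse {
  Matches: { [context: string]: string[] };
  Truncations: { [context: string]: boolean };
}

async function searchJobs(prefix: string): Promise<string[]> {
  const res = await fetch('/v1/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ Prefix: prefix, Context: 'jobs' }),
  });
  const data: SearchResponse = await res.json();
  return data.Matches['jobs'] ?? [];
}

// Because the request is async, the main thread stays responsive while the
// server does the matching; results are rendered whenever they arrive.
searchJobs('my-prefix').then((ids) => console.log(ids));
```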

@DingoEatingFuzz
Contributor

Hi @optiz0r, thank you for this report! It's incredibly helpful to hear how features are performing in the wild. I have some back story and a couple questions for you.

First, a note on the /v1/search API: it only supports prefix search (to power CLI autocomplete) and we really wanted fuzzy find behavior. It wasn't an easy call to make but we figured since we already have to fetch all jobs for the jobs list page and all nodes for the clients list page that we could get away with this approach and ultimately have a better UX.

Clearly having the UI hang for 18s is not a good UX, so we need to do something about that.

For example, a call to v1/jobs has been observed to take 18s to return 762 rows of job data, and the UI was hung for all that time.

When you say the call takes 18s is that the time it takes for the API request to respond or the amount of time before the UI is interactive again?

With a busy cluster, are you experiencing the UI hanging or generally chugging elsewhere or is it just the global search feature?

Are you only experiencing the UI being slow or are all cluster actions slow?

@optiz0r
Contributor Author

optiz0r commented Jul 30, 2020

First, a note on the /v1/search API: it only supports prefix search (to power CLI autocomplete) and we really wanted fuzzy find behavior. It wasn't an easy call to make but we figured since we already have to fetch all jobs for the jobs list page and all nodes for the clients list page that we could get away with this approach and ultimately have a better UX.

Makes sense! I agree substring matches are more useful to have here, particularly as many of our jobs have a common prefix.

When you say the call takes 18s is that the time it takes for the API request to respond or the amount of time before the UI is interactive again?

The 18s I quoted is what Chrome dev tools had as the total time for the /v1/jobs request. The total traffic for the search requests is ~47KB, so not tiny but also not enormous. This was the longest time I saw. I had repeated the search multiple times in succession and saw this drop down to about 12s. The UI definitely feels frozen for longer than the download time of that one request.

It's worth noting, the UI locks up immediately on first keypress into the search box, so quickly that often it doesn't render any of the characters typed into the search box until after it finished doing its thing (or maybe just the first one). The typed characters suddenly appear in the text field at the same time as the results are rendered. It feels very much like the search operation is being done in the main thread, blocking everything else, and would be better done in a separate thread/worker.
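
To illustrate what I mean, something along these lines (the substring filter stands in for whatever matcher the UI actually uses; the file names and message shapes are made up):

```typescript
// search.worker.ts (sketch): do the matching off the main thread.
onmessage = (e: MessageEvent<{ query: string; items: string[] }>) => {
  const { query, items } = e.data;
  const matches = items.filter((name) =>
    name.toLowerCase().includes(query.toLowerCase())
  );
  postMessage(matches);
};

// main thread (sketch): keystrokes keep rendering while the worker filters.
const worker = new Worker('search.worker.js');
worker.onmessage = (e: MessageEvent<string[]>) => {
  console.log(`rendering ${e.data.length} matches`);
};
function search(query: string, items: string[]) {
  worker.postMessage({ query, items });
}
```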

--

Repeating my tests this morning, I'm seeing the /v1/jobs and /v1/nodes calls complete in tens of milliseconds rather than seconds, but the UI is still locking up for ~10 seconds on each search. I was doing rolling upgrades of Nomad clients when I first noticed, so perhaps the cluster was more busy than usual running allocations at the time, but that's not the whole story here.

If it makes any difference, I have been running that search while I had the CSI storage view open (because there's no background ajax calls, which made it easier for me to see in dev tools what was happening during the search). Does that mean you don't have any of the job/node data already available in memory to search on and so it takes longer to fetch them on-demand?

Repeating the search on the jobs list view, I see slightly different behaviour in the dev tools. I only see a /v1/nodes request appear in the network tab, no /v1/jobs request. I do see regular hits for jobs?index=... as the jobs list updates in the background, but these also stop appearing for the duration of the search operation.

Are you only experiencing the UI being slow or are all cluster actions slow?

I have not been looking for, or noticed any other slowness.

With a busy cluster, are you experiencing the UI hanging or generally chugging elsewhere or is it just the global search feature?

UI seems reasonably snappy otherwise, it's just the global search that's noticeably slow.

Most of our workload on this cluster is batch jobs which are completed but not yet GC'd. You could probably re-create the experience quite easily by spawning a few hundred dummy batch jobs that terminate instantly (e.g. exec "echo hello world"). There's no need to have hundreds of running service jobs.
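
Something like this hypothetical helper would do it (assumes the nomad CLI is on PATH, the raw_exec driver is enabled, and the datacenter is called dc1; adjust to taste):

```typescript
// Sketch: register a few hundred dummy batch jobs that exit immediately.
import { execFileSync } from 'child_process';
import { writeFileSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

for (let i = 0; i < 300; i++) {
  const spec = `
job "dummy-${i}" {
  datacenters = ["dc1"]
  type        = "batch"
  group "g" {
    task "t" {
      driver = "raw_exec"
      config {
        command = "echo"
        args    = ["hello world"]
      }
    }
  }
}
`;
  const path = join(tmpdir(), `dummy-${i}.nomad`);
  writeFileSync(path, spec);
  execFileSync('nomad', ['job', 'run', '-detach', path], { stdio: 'inherit' });
}
```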

Ben

backspace added a commit that referenced this issue Jul 30, 2020
This closes #8549. Thanks to @optiz0r for the bug report. Having
the global search attempt to render every returned result is
obviously a mistake!

I chose to have the full number of matches still render, though
I also considered having it display (10+) instead. The choice of
truncating at 10 is arbitrary, maybe a higher number would be
preferable, I’m open to suggestions.
backspace added a commit that referenced this issue Aug 5, 2020
This closes #8549. Thanks to @optiz0r for the bug report. Having
the global search attempt to render every returned result is
obviously a mistake!
@DingoEatingFuzz
Contributor

Hey @optiz0r!

Thanks again for your thorough report of your experience. It led @backspace to find a definite UI performance issue that will go out with the next Nomad release!

@github-actions

github-actions bot commented Nov 4, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 4, 2022