-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add budgets to Actor queue flushing #9022
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks like a good compromise! I'm seeing 15-20 updates per second with chrome on osx for animate.html. If I bump the main thread time budget to 2 and task budget to 20 as well, then it's back up to 30-40 updates per second which is what I was getting with my original fix.
src/util/actor.js
Outdated
this._processQueueTop(); | ||
taskCtr++; | ||
} | ||
// We've reached our budget for this frame, defer processingo the rest for the netx frame, this lets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some typos here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great improvement, but still a bummer to have a 2.5x regression for tiles stuck in a queue compared to the previous version.
One random idea that you already considered perhaps, but still thought I'd mention — can we differentiate the way we process the queue not based on arbitrary frame budgets, but based on a heuristic that somehow categorizes sources as either heavy / rarely updated (e.g. basemap vector tiles) or lightweight / frequently updated (e.g. animating GeoJSON)? That way we could have optimal performance in both cases if we can reliably categorize between the two. Perhaps by tracking how often the source is updated and switching dynamically?
Also, I wonder if it's necessary to have the task budget at all? If we just have a time budget, then a lot of fast tasks can happen during animation but it will throttle heavy operations. If I change task budget to unlimited and both time budgets to 2, then I'm back up to ~30 updates per second. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you determine that both a time and a task budget achieves the best compromise?
The original issue fixed in #8633 was that tile requests sent to a worker couldn't be cancelled due to tasks being invoked in serial order. This is primarily a concern when loading vector tiles. The animation performance regression reported in #8701 doesn't use vector tile loading; instead it uses GeoJSON sources with repeated setData
. The GeoJSON source has its own mechanism for deduplicating updates (self-sent coalesce messages), similar to how native does it. Over in the original PR, I wrote
I opted for this model instead of a more generic priority queue because we don't have any use cases for a priority queue other than canceling tasks.
However, given that we actually do seem to have differing requirements for worker processing, a priority queue may actually make more sense. Alternatively, we could mark some tasks with an immediate
flag which means they get executed immediately instead of put into the queue. This means that tasks may be executed out of order, but if we always set the immediate
flag for all requests associated with a particular source (e.g. GeoJSON sources), this may be fine.
import {serialize, deserialize} from './web_worker_transfer'; | ||
import ThrottledInvoker from './throttled_invoker'; | ||
|
||
import type {Transferable} from '../types/transferable'; | ||
import type {Cancelable} from '../types/cancelable'; | ||
|
||
// Upper limit on time in ms, the actor allocates towards flushing the task queue per frame | ||
const MAIN_THREAD_TIME_BUDGET = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're measuring time in integer milliseconds, so a limit of 1ms very coarse.
I thought about type-based categorization too, but GeoJSON sources aren't always used with light-data-fast-updates use case — e.g. the GeoJSON might be pretty heavy while never updating, making it similar in behavior to vector tiles. |
Honestly, animation performance in IE11 is terrible by default anyways, so the best heuristic might be detecting the underlying platform, all modern browsers can cancel
The task budget is for IE11 cancellation, in my testing no modern browser (
I started out with a time budget only, but I found out modern browsers never hit it ,but IE11 did hit it rather inconsistently but it wasn't leaving enough tasks in the queue to allow for any decent amount of request cancellation. I then added the task budget and tuned it until I was able to get to a compromise between request cancellation and animation updates. The tiome budget mostly prevents some main thread hitching on IE11 afaict.
|
I think there is a lot we could do to improve queuing, cancellation, prioritization but I think we should focus more narrowly on fixing the regression (if it exists) so that we can get the beta out. From the numbers here it's not clear that something regressed in IE.
The only way you cancel a request before it starts is by delaying it. What these numbers suggest to me is that we are delaying requests less. This is a good thing? #8633 tried to fix the problem that requests that were already cancelled were initiated anyways because the message that they weren't cancelled hadn't been opened yet. That was happening because workers do large tasks that delay for long chunks of time. My guess is that that is not what is happening now. My guess is that the delay in 1.5.0 was much longer than the time it took to open all the waiting messages. Zooming at the wrong speed in any browser will trigger a lot of requests and cancellations. If IE was striking a better balance in 1.5.0 it seems it was semi-random rather than something intentionally tuned to delay requests during rapid zooming. If we need to fix the rapid zooming / cancellation problem now I think we should look into:
|
Closing since #9031 fixes this! |
This PR tries to achieve some middleground between the issues reported in #8998 and #8913.
It tries to do this with 2 main changes:
ThrottledInvoker
usesrequestAnimationFrame
instead ofMessageChannel
in order to reduce latency for animations.I've tried to adjust the batch sizes on the main-thread and worker side so we can have a balance between being able to pre-empt and cancel tasks, while maintaining enough throughput for smooth animation.
Testing with IE11, which needs pre-empting the most, the number of requests sent out for the tile-cancellation debug page went down to ~60 from the ~120 on
1.5.1-beta
. But its still higher than the ~25 on1.5.0
.Animation performance on
Chrome
is roughly the same for me.@msbarry , would really appreciate if you could take a look at this as well!
Launch Checklist