UI: Add EventSource ready for implementing blocking queries #5070
Conversation
It might be a bit difficult to see what this adds to the UI, so I thought it might be a good opportunity to add some gifs!

Real Cluster Usage

The following gif shows the live updating working at a basic level. Services are deregistered and removed without a page refresh. On pages that are detail pages of removed services, we don't give you a 404; instead we leave all the information on the page so you can still use it, and show you a warning message so you know it's been deregistered (thanks to @i0rek for bouncing ideas around for the nicest way to do this for the user). Also notice how, once the data is loaded in, clicking around services is now super quick - no more loading animation.

Performance

The next gif shows 5000 services (with a lot of healthchecks :) ) with blocking queries responding with a completely new dataset every 1 second. Every single one of the 5000 services has a change every second. Performance remains entirely usable thanks to the DOM-recycling scroll pane, so we avoid jittery DOM as much as possible. Searching whilst receiving blocked updates is just as performant as when not receiving blocked updates.

Using EventSource for other things apart from blocking queries

This shows a real Consul install with some ACL tokens. We save your current ACL token in your browser, but using EventSource here (as explained above) we sync the information between tabs. Notice the 'Your Token' icon in the second tab change as we change our token in the first tab. Deleting the token in one tab also logs out the user in the other tab (or in another browser somewhere else).
Hey @johncowen, this is looking good. It's commendable that you chose to align with existing standards for what is ultimately a proprietary spec.
I tried to keep my review big picture, so to that end I didn't have much to say inline.
I think that thinking about realtime behaviors holistically like this is the right thing to do, and I think that teeing yourself up to move some of this into web worker(s) eventually is pretty exciting. I'm personally still holding out hope for this to become "not our problem" by way of Ember Data adopting workers for serialization and normalization.
On the topic of web workers, I'm curious to hear more about your performance considerations. Obviously making things fast is better and having a real world benchmark like 10-20k services is a good stick to measure performance against, but when it comes to implementation, how are you determining where the bottlenecks are? When talking about moving things off of the main thread, it only makes sense to do this if the operation is going to block for a long time. This would be things like diffing large datasets or parsing large JSON responses. Your EventSource polling code is not going to block for a long time: it's going to quickly call code that will subsequently block for a long time.
Similarly, Ember Concurrency is not going to block for a long time, since it is just an orchestrator for concurrent tasks. So it doesn't make sense to ever want to move it into a worker. More concretely, in a potential future where store manipulation is handled in a web worker, Ember Concurrency could still be used on the main thread to orchestrate message passing between the worker and the UI.
I'm indifferent towards you/Consul using Ember Concurrency. It'd be nice if all the UIs were using it just so all the codebases had some uniformity, but it's not a hill I'm willing to die on. I just want a clearer conversation about performance and hot paths.
The other yellow flag I have is around abstractions. This was a pretty difficult PR to read through. I think that's partly because there isn't an application of this subsystem yet (outside of some usage in tests and in the PR prose), but I also think it's partly because it's so abstract. Possibly too abstract. It seems like every datatype, from promises to event sources to proxies to caches, is configurable/extensible.
When an abstraction aims to be everything it ends up being nothing.
I don't think you're at the point of this abstraction being nothing, but maybe it's something worth keeping in the back of your head while you continue this work. It can be beneficial to put stakes in the ground in your abstraction to avoid either abuse of the system or confusion around how it even works.
But! All in all, this is really nice, and I know everyone is excited for the magic this will bring to the UI.
```js
  { errors: [{ status: '5050' }] },
].forEach(function(item) {
  assert.throws(function() {
    backoff();
```
You're not passing `item` into the call to `backoff`.
Hehe, doh, thanks so much for catching this! At first glance I was wondering how the tests were even passing; I've just had a better look, and of course it throws an error if the argument is undefined! Will push a fix in a sec.
```js
    dispatchEvent() {}
    close() {}
  };
};
```
What's the point of wrapping this class in a function? You're not closing over anything.
I think I was probably trying to keep things consistent. The user facing API uses a factory function, and I'm also using a factory function to create my mock promises.
It also means that if at some point I do multiple tests in the same file using the mock EventSource class, I get a fresh definition of the actual class every time (as opposed to instances all based on the same class definition). It's not important right now, I don't think, as there's nothing there that would produce a bug due to sharing the same class across tests. But each test is supposed to be isolated, and theoretically this means they are. This reason is probably accidental; I would imagine I was just trying to be consistent, or trying to plan ahead in case I needed to mock and inject anything further to create the class.
Consistency is fine, but isolation is an incorrect reason. It's a static class and you are only operating on instances of the class. Wrapping the whole thing in a function doesn't give you any further isolation, it just redundantly creates class definitions. If the VM is smart enough, it probably just JITs that wrapping function away anyway.
Hey! Thanks so much for looking over this - I appreciate it's a little abstract at the moment, so thanks for getting through it. 🙏 If you can think of everything here as an external ember addon/npm module (all of the code here is completely decoupled from the Consul API/UI), hopefully it will help you to follow why I PR'ed it like this. The option is still there to externalize it, but I think folks prefer it like this? I've tried to respond one by one to some of your queries below:
Maybe it's just me, but I don't see this happening any time soon.
Right now it's very informal - load up 100k services and make the blocking query respond every second (like in the GIF above but with way more items), then try to use the UI: scrolling, searching etc. I then try out different ideas and throwaway prototypes and see if there is any noticeable improvement - I've tried quite a few. I even tried stripping everything down to the simplest implementation I could think of, but nothing had much of an effect. The problem is mainly down to the fact that we are dealing with 100k items possibly changing/being added/being removed every second - that's the main bottleneck. The solution that will have the largest impact (I hope) is to reduce that number. The chances are only a few properties have actually changed...
Exactly. So what if we proxy the API responses on the frontend to transparently calculate that diff for us? Basically mold the API responses into something that reduces the bottleneck. Thing is, if we calculate that diff on the main thread, we are just moving the problem around without getting anywhere - trying to mop up with a wet towel. We need a dry towel: another thread, a WebWorker. ServiceWorkers would be another option, but we can't guarantee our users are not running under the restrictions that ServiceWorkers have; as far as I can see, WebWorkers don't have the same restrictions.
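To make the 'dry towel' idea concrete, here's a minimal sketch of that approach - not the PR's code, just an illustration with a hypothetical `diff-worker.js` and a deliberately naive diff:

```js
// Sketch only: a hypothetical diff-worker.js plus its main-thread usage.
// The naive index-based JSON diff is purely illustrative.

// diff-worker.js
self.onmessage = function (e) {
  const { previous, next } = e.data;
  const changed = next.filter(function (item, i) {
    return JSON.stringify(item) !== JSON.stringify(previous[i]);
  });
  self.postMessage(changed);
};

// main thread
// previousRecordset / nextRecordset would be the last two blocking query responses
const worker = new Worker('diff-worker.js');
worker.onmessage = function (e) {
  // e.data now contains only the records that changed since the last response
  console.log(e.data.length, 'changed records');
};
worker.postMessage({ previous: previousRecordset, next: nextRecordset });
```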
Exactly right. For me the performance of this
Looking at it from another angle: if you aren't accessing the DOM, it doesn't make sense to run it on the main thread. Ideally all the code on the 'left side' of my Routes (so from my Repository code backwards, through the entire model layer) could be off the main thread - it doesn't need to access the DOM at all. But right now that would be too difficult; there are far too many open questions. Where I'd eventually like to get to is maybe something like:

```js
Adapter.extend({
  query: function(query) {
    return resolve(
      new EventSource(this.urlForQuery(query))
    );
  }
});
```

..exactly like a native `EventSource`.
Completely agree here, and I can definitely understand your concerns. Consistency is hugely important in lots of cases, and I went to great effort to keep to existing standards and Ember conventions. As an aside, you could use ember-concurrency for this. But why would you replace an almost one-liner with 40kb of pretty hard to grok code that potentially can't be run in another thread? And as I mention above, the important thing here is how you use it, not so much how it works. The usage is as close to both a standard and ember conventions as I could possibly get it right now.
Nothing is configurable if, as a user of the software, you use it as documented:

```js
import { source, BlockingEventSource } from 'consul-ui/utils/event-source'
```

Everything is configured behind the scenes in that module.

If we do look briefly at how it works: it's built in a simple, modular, composable way. It prefers smaller components with well defined scopes that can be used together. Behind the scenes almost everything is configurable, but it's about how you use it, not necessarily how it works. In order to use it I don't want to have to worry about all this configuring. Pragmatically it wouldn't make sense to have to do all this configuring every time I want to use it, so we provide a simple way to use the software for the end user, tailored specifically for the environment the user is in.
Fingers crossed! 😄 Thanks again for looking at it and giving it a critical eye; hopefully the responses here give you a bit more insight into what I'm trying to achieve. Moving forwards I'm not quite sure how to organize my PR for usage - I might branch it off this PR to keep it all together before final merge. Lastly, I'll go over your other comments inline; looks like you caught a doozy there! 🙌
Weird coincidence: I just happened upon this (thanks to a conversation between @banks and @bosconi) and thought I'd share in case you find it interesting: https://docs.microsoft.com/en-us/azure/architecture/patterns/event-sourcing. I've had a brief read myself, but will probably give it a better read a little later on; it also seems to detail where I actually wanted to take this, which is reassuring. Funny how things seem to drop into place.
Thanks for addressing my concerns. I'm excited to see where this goes 🚢 🍾
… service and node listings (#5267)
1. Add `[]` to relevant computed properties
2. Clean up `index` on the URL
3. Add WithEventSource mixin to auto cleanup EventSources when leaving a view
1. Reenables the settings page
2. Adds a toggle to enable/disable blocking queries at a UI level
3. Ensures that the setting isn't cached in the event source cache
4. Ensures that changing the setting cancels any existing requests
5. Also ties up cancelling and restarting requests on tab switch
Related - TL;DR: cancelling a blocking query doesn't (currently at least) do anything in the backend. Cancelling and re-requesting would have meant we would be "wasting a lot of resources on the server", so the below turns from a "nice detail" into a "performance feature".
Real-Time Live Updates
Until recently, our primary objective for the UI has been to upgrade it to a more modern platform and provide a better base for adding new Consul features without impacting practitioners' workflows. Over the last few months we upgraded the UI and added UI support for several new Consul features on day zero.
This PR is the first of many UI features aimed at improving the overall UI experience for our users.
It's worth noting here that some of this content turned out to be appropriate for an RFC, so it would have been a good opportunity to follow up my original RFC with this. One of the issues was that it took some prototyping and investigation to find a good way to do this, so I'm hoping to figure out how I can combine this prototyping and investigation with an RFC for similarly scoped items to improve upon this process. I'll note some of my assumptions at the end; please feel free to comment on any of the content. This work will be merged onto `ui-staging`, not `master`, so there will be plenty of opportunity to refine.

What?
Modern web UIs update external data without the user having to hit the refresh button. We have long-polling blocking query support in the Consul API, so we use that to implement real-time live updates for the UI.
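For context, this is roughly what a blocking query loop looks like at the HTTP level - a hand-rolled sketch using `fetch`, not the code in this PR; Consul's `index`/`wait` query parameters and the `X-Consul-Index` response header act as the cursor:

```js
// Sketch of Consul's blocking query mechanics; the endpoint and wait value are illustrative.
async function watchServices(onChange) {
  let index = 0;
  for (;;) {
    const response = await fetch(`/v1/catalog/services?index=${index}&wait=5m`);
    if (!response.ok) {
      // a real implementation would back off properly on 5xx errors
      await new Promise((resolve) => setTimeout(resolve, 3000));
      continue;
    }
    index = parseInt(response.headers.get('X-Consul-Index'), 10) || 0;
    onChange(await response.json());
  }
}

watchServices((services) => console.log(Object.keys(services).length, 'services'));
```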
As well as implementing real-time live updating support via the Consul Blocking Query API, because we use Ember we try to stick to Ember conventions, such as maintaining conventional `Route.model` hooks. Every Route should contain a model hook: it's what Ember developers expect to see, and it makes it clear what data is being used in the view layer.

Plus, we have a few other objectives:
How?
I actually feel that how you use this as an Ember developer is more interesting/important than how it works, but it's probably best to begin with this brief overview:
The approach is inspired by the Server-Sent Events / EventSource living standard (see the nice breakdown of this and other approaches by our own @banks) and the EventTarget standard. EventSource is basically a 'tool' for solving the problem of uni-directional, 'response'-heavy, frontend/backend API communication. It features transparent reconnects, sends a 'cursor' on reconnect, and uses straightforward HTTP/1.1.
Sticking to Ember conventions, `EventTarget` is implemented mainly using the Ember `RSVP.EventTarget` mixin (event-target/rsvp). We add support for 'standard-shaped events' using a small piece of code from `event-target-shim`. Only this small piece is added here, as I don't want to include the entire shim while we are trying to use Ember as much as possible.

The basis of the approach can be simplified as:
A recursive `Promise`-returning function (sketched below) or, if you were using ember-concurrency, the equivalent task.
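The code the colon introduces isn't reproduced in this extract, but a minimal sketch of such a recursive `Promise`-returning function, with a hypothetical `fetchData` callable standing in for the real query, looks something like this:

```js
// Sketch only: `fetchData` is a stand-in for anything that returns a Promise of data.
function poll(fetchData, onData) {
  return fetchData().then(function (data) {
    onData(data);
    // schedule the next request one tick after the previous one resolves
    setTimeout(function () {
      poll(fetchData, onData);
    }, 0);
  });
}

poll(
  () => fetch('/v1/catalog/services').then((res) => res.json()),
  (data) => console.log('received', data)
);
```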
Usage

EventSource
The main building block is an `EventSource` implementation. Consul doesn't support Server Sent Events, and our data layer is using `ember-data` to abstract away the HTTP API. So, we've created a `CallableEventSource` that does exactly the same as an `EventSource`, but instead of passing it a URL you pass it a 'callable' (a function) that returns a standard `Promise`.
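The snippet referred to just below as 'the above javascript code' isn't reproduced in this extract; a rough sketch of that kind of usage, with an assumed import path and an `ember-data`-style `store.query` call, might look like:

```js
// Sketch only: the import path is an assumption and `store` stands in for an
// injected ember-data store; exactly how events get dispatched is glossed over here.
import CallableEventSource from 'consul-ui/utils/dom/event-source/callable';

const source = new CallableEventSource(function () {
  // the 'callable': any function that returns a standard Promise
  return store.query('service', { dc: 'dc-1' });
});

// listen like any other EventTarget
source.addEventListener('message', function (e) {
  console.log(e.data);
});
source.addEventListener('error', function () {
  source.close();
});
```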
The above javascript code will eternally query `ember-data` one tick after it resolves the data - essentially polling. To stop the `EventSource` you `source.close()` it. This can be done in the `message` or `error` handlers, or anywhere else in your codebase, by using `EventTarget` methods. You can add an event listener like you would with any other traditional javascript event-emitting `EventTarget`.

We've added further functionality to the standard `EventSource` interface by composing up classes to add 'reopenable' functionality, plus `cursor` tracking and a 5xx backoff/reconnect, which brings our `CallableEventSource` back to being an almost drop-in replacement for `EventSource`, but based on 'callables' so it works well with `ember-data` abstractions.

We've done this in 3 separate units to make it easier to reason about and test:
1. A callable `EventSource` - the main functionality, which is ~30 lines of code.
2. A 'reopenable' `EventSource` - ~10 lines of code.
3. An `ember`-specific class 'decorator' that adds Blocking Query functionality to an `EventSource` - 5xx backoff/reconnect functionality and `cursor`/index tracking, following the JSON-API data structure that `ember-data` uses via a `meta` property (see also UI: Prefer the word `cursor` for meta and add `configuration` option to repositories #5042, and UI: Maintain HTTP Headers from all API requests as JSON-API meta data #4946).

As a first step (a simplified view) we end up with the following (along with one of my `Repository` classes):
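The actual example isn't included in this extract; as a rough sketch of the shape it describes (module path, constructor signature and repository name are all assumptions, not the PR's API), it could look something like:

```js
// Sketch only: illustrates the idea of a Repository returning a BlockingEventSource
// whose 'callable' re-queries ember-data using a cursor kept between calls.
import Service, { inject as service } from '@ember/service';
import { BlockingEventSource } from 'consul-ui/utils/event-source';

export default Service.extend({
  store: service('store'),
  findAllByDatacenter(dc) {
    return new BlockingEventSource((configuration) => {
      // `configuration.cursor` would carry the blocking query index between requests
      return this.get('store').query('service', {
        dc: dc,
        index: configuration.cursor,
      });
    });
  },
});
```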
Note this implementation isn't tied to HTTP, the `ember-data` `RESTAdapter`, or even `ember-data` at all. This means if/when we re-write a new `HTTPAdapter` for the Consul UI, we don't necessarily have to revisit the blocking query support we've done here.

Using EventSource with Ember conventions
Promises and the `Route.model` hook

Using Ember, all your data within your current javascript environment/context is, or can be, bound. You update a property in your `Controller` and your template reflects that, usually without you having to do anything. The key thing here is all your data within your current javascript environment/context. Routes generally retrieve data from outside your current javascript environment/context, whether that be from an API, from a `WebWorker`, from `localStorage` or from another tab; and generally, to change this external data, a user interaction is needed to update/change the URL and fire the `Route.model` hook, which waits for a promise to resolve and then re-renders the view.

event-source/resolver returns a factory that gives you a way to return a `Promise` which resolves on the first `EventSource` `message` event or rejects on the very first `error` event. This is very important as it means we can now use `EventSource`s with all our pre-existing routes and continue to use our pre-existing conventional Ember route loading/blocking and error functionality. Furthermore, we've wrapped this factory and pre-configured it specifically for Ember, so usage would be:
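The usage snippet itself isn't reproduced in this extract; a minimal sketch of what such a route could look like (the repository service name, module path and exact `source` signature are assumptions for illustration) is:

```js
// Sketch only: `source` resolves its Promise on the first `message` event,
// so Ember's conventional loading/error substates keep working.
import Route from '@ember/routing/route';
import { inject as service } from '@ember/service';
import { source } from 'consul-ui/utils/event-source';

export default Route.extend({
  repo: service('repository/service'),
  model(params) {
    return source(() => this.get('repo').findAllByDatacenter(params.dc));
  },
});
```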
Which hopefully should look familiar. You'll notice I'm passing the `EventSource` straight through to my `Controller`/template. Please note the above is not exactly how I'll be using this within Consul UI; it's just for illustrative purposes.

Ember Proxies
`source` also wraps the `EventSource` in an Ember Object or Array Proxy (event-source/proxy) that updates the `content` of the proxy on every `EventSource` `message` event. This means you can pass this proxy object through to your `Controller`/templates and everything works as expected, as if the data was from your current javascript environment/context in a conventional Ember way, when actually it's coming from an external API. We don't need to change anything in our `Controller`/templates for this to work as expected using Ember conventions. In the above illustration I've changed the `Route` slightly to use `source` and `BlockingEventSource`, but I'll be using my `Repository` classes instead, meaning I won't be changing anything in my `Routes` either. All in all I'll hardly have to change anything on the Ember side of things in the Consul UI for this to work. The app stays pretty much the same as it did before adding blocking query support.
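As an illustration of what that proxying amounts to - a simplified sketch, not the actual event-source/proxy code:

```js
// Sketch only: `source` is an EventSource like the ones above; templates bound
// to `proxy` re-render whenever a message arrives.
import ArrayProxy from '@ember/array/proxy';

const proxy = ArrayProxy.create({ content: [] });
source.addEventListener('message', function (e) {
  // every EventSource message replaces the proxy's content
  proxy.set('content', e.data);
});
```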
Data caching, filtering and avoiding the loss of partly downloaded data

Lastly, instead of creating a new `EventSource` every time we enter a new page, we would like to reuse any pre-existing `EventSource` so that we maintain the `cursor`. Plus, it would be nice for a 'reopened' `EventSource` to immediately resolve a `Promise` with the previous recordset, so that the `Route` only blocks/shows the loading animation the very first time you enter that specific `Route`; therefore we begin to get rid of the rather lovely, hypnotic, but unwanted, loading animation. For this we use event-source/cache, which lets you store an `EventSource` against any string value or key, but only if the `EventSource` is successful and has a valid `cursor`. This key can be anything unique to your `EventSource`.
A nice detail here is that, once we've fired off a request to `consul`, if we click away from the page and click back again we don't really want to be continually cancelling and re-sending requests as we click backwards and forwards looking at services and their health details. So, if the blocking query hasn't responded yet, we keep it open. That way, if you are constantly clicking between the services page and a few services to monitor them, we aren't continually firing off new requests to `consul`; importantly, if you don't revisit the page and the blocking query 'naturally' responds, we don't restart the blocking query again until you re-enter that page. This also ensures that if you are mid-download of a large-ish response of, say, 10,000 services, it doesn't get cancelled midway through the download when you click away from the page.
, importantly if you don't revisit the page and the blocking query 'naturally' responds we don't restart the blocking query again until you re-enter that page. This also ensures that if you are mid-download of a large-ish response of say 10,000 services, this doesn't get cancelled midway through the download when you click away from the page.We don't use
ember-data
'sstore.peekAll/store.peek
for caching previous responses at the moment. This is mainly due to the nature of the Consul UI KV listing. The KV listing is a backend filtered listing of KV's, only within the 'folder' that you are in. In order to use thestore
for this we would require thestore.filter
method which could give us only the records for this folder, which would be slightly wasteful re-filtering and is deprecated and there doesn't seem to be a recommended alternative, so I don't want to risk depending on that. We could also manually filter the result fromstore.peekAll
but this also feels a little wasteful whenconsul
is already returning a filtered recordset of the KV's we require. Additionally, if we don't use theember-data
store
it means I don't have to worry about data reconciliation, or syncing/deleting data that exists in theember-data
store
that no longer exists inconsul
, otherwise I would have to do intensive multi-dimensional looping, checking and deleting through potentially thousands of records on the main thread, not something I'd like to do.Despite this I'm guessing that what I'm doing right now could be the slowest option I've thought of, but I've been testing this implementation with recordsets of ~10,000 items and the UI continues to run at a perfectly acceptable speed. Considering I was originally aiming at ~4,000-5,000 items, this 'ain't bad'. Lastly here, I have been experimenting with other things (full Server Sent Events emulation in a WebWorker through reconciliation/delta resolving of API responses outside of the main thread) that potentially could improve this with recordsets of ~100,000 items, my initial tests of this look very promising.
We've also built `BlockingEventSource` so you can give it a function to retrieve previous or current data. This means it should be straightforward to use the above approach for KVs and, say, a `store`-based looping/deleting/reconciliation approach for services and nodes (which would be risky for KVs due to the deprecated `filter` method).

Why not ember-concurrency
There are many reasons why I don't want to use
ember-concurrency
to do this, both from a 'how it works' and a 'how you use it' point of view. I won't go into all of them here, but my main reasons are:

Looking specifically at large scale Consul users (eg. 10k+ services), which was a main requirement for this work, experiments I ran showed that WebWorkers are the answer to providing a smooth experience for these users. ember-concurrency obviously has a very hard dependency on ember, which is fine in a DOM environment, but I'm not sure it's even possible to run Ember in a WebWorker, let alone whether we'd want to include ~500kb of code in a worker instead of ~70 lines of code. The Worker interface uses the very same EventTarget interface we've used for this code, so we're really not that far off (and we kind of already have) a working implementation.
http://ember-concurrency.com/docs/testing-debugging/
Given the product requirements, I'm uncomfortable not testing polling / live updates in an automated fashion (using something like
Ember.testing
to skip parts of the code), so not having a clear solution for that was part of the decision not to useember-concurrency
ember-concurrency
alternative.ember-concurrency
was consistently up to 3 times slower (depending on the browser) on Desktop. I didn't try mobile.Short term additions (over the next few days) to this PR include:
Actually using the above in Consul UI (I've only pushed the
EventSource
code for the moment, I've not changed any pre-existing code). I have more to say on the usage, so I've decided to split the PRs to keep things as easy to digest as possible.

Reasonable test coverage is included here, but this will be expanded upon in further PRs.
Browser connection limit workarounds. Accessing the UI over anything but
http/1.1
will multiplex the connection meaning there are no connection limits. Whereas Consul UI installations usinghttp/1.1
require HTTP connection management. Again, I've split this out as it's not necessarily a direct concern of this PR.

Longer term, I'll hopefully be coming back to the Server Sent Event WebWorker emulation investigation, but right now this is out of scope, so I'll be looking to write an RFC for this and revisit later in the New Year.
Assumptions
We decided not to do HTTP connection management for users accessing the API over HTTP/2 (and similar multiplexed protocols). Some of our users will be accessing the API via a TLS protected HTTP/2 enabled proxy and therefore shouldn’t have any sort of connection limits applied.
Following point 1, when using HTTP/2 you could potentially open a large amount of blocking queries with the UI, especially if you have a high
wait
setting. We queried this with the rest of the consul team to see if there is any restriction on the number of blocking queries we should open to consul, and there doesn't seem to be any restriction, or reason to restrict.

We feel that not cancelling sent requests when the user leaves a page (unless a user has no more connections available when using HTTP/1.1) makes for a better user experience. Partly downloaded data is not wasted, and changes that have happened since you left a previous page are immediately visible when you return; it doesn't require another request to be sent back to the API. Blocking queries will naturally time out and will not restart if the user isn't on the page, so the number of open connections will remain reasonably small even when using HTTP/2.
We originally felt that end user might want to change the
wait
query parameter of the API, and also add a frontend timeout/throttle in case of noisy clusters; we are already rethinking whether this is exactly the right approach.

Lastly, I'll be adding further PRs on top of this once I get some eyes on it, and I'll add references to them inline here also.
Related: HTTP Connection Management #5083
Related: Implementation for Service and Node listings #5267
Addresses Issues:
#4221
#3714