Subscriptions RFC: Are Subscriptions and Live Queries the same thing? #284
Comments
I think in many cases, developers use subscriptions to approximate live queries, but subscriptions are more powerful and easier to implement. For example, in my case, where I have many microservices on my backend, where some nested fields go to other services, it's not really straightforward to define how live queries would work, and I've chosen explicitly to model things as event streams. Live queries would be a nice abstraction on top, but it's only that – it's not, in the general case, a great backend building block. |
I don't think this discussion should judge whether live queries are better than subscriptions, just whether they are different enough that they should be considered independently. I think "building block" is a great way to look at it though - subscriptions are a great well-specified unit of realtime data push that can be used to build a lot of other cool stuff. The fact that it's very easy to implement a spec-compliant subscription on the server side is pretty awesome, even if it's not always the thing you want as a client-side developer. |
I'd like to know first what people consider live queries to be. What is the definition? I ask, because I think there are different perspectives or ideas in play here, and thus the discussion can run in unnecessary tangents. So, what is a live query? 😄 Scott |
Here's my impression of a live query in one sentence: "A live query is a query where some or all of the parts can be marked as 'live', and the client expects to receive updates whenever any of those parts would have ended up with a different result if refetched again." In short, they should be a drop-in replacement for polling. |
I'll quote my original definition in the RFC:
Another way @stubailo and I have described it is: "infinitely fast/cheap polling". |
Here is my definition. A live query is a query, which is designated by the client as "live". This designation is passed on to the GraphQL server (one could say it is a subscription). The server then observes for triggers or data input from the underlying data sources needed to fulfill any part of the query. This in turn means any updates from the underlying data sources will be passed to the client automatically via bi-directional communication. Scott |
If you want a demo of a live query, complete with a GraphiQL editor, see http://rgraphql.org Live queries are infinitely more powerful than subscriptions because you can model live, reactive data in a way that efficiently encodes changes all the way from the data source to the browser, into something like react and angular. And it's not true that they cannot be done at scale - it is definitely possible with a good enough scheduler / balancer. |
I agree with this.
I also agree with this. However, I think they are significantly different from subscriptions nonetheless, and both have their place in the GraphQL ecosystem. |
Agreed. I don't think live queries are necessarily difficult to do, though, either - the argument I'm arguing against is - "live queries at scale isn't a solved science, so we're going to ignore the concept entirely." |
I don't think anyone is ignoring the concept. But "live queries at scale isn't a solved science" has some truth to it. We hope to share more details in the coming months as we continue to learn from our live query experiments at Facebook. However, assuming live queries work perfectly, we believe live queries and subscriptions are different tools in the real-time API toolbox. |
Yeah I think it's important that the spec proposal doesn't say "this is the only thing we will ever do for realtime data" or even "this is the best way to do realtime data" - it should just say "this is the way of doing realtime data that is understood clearly enough to specify" |
@taion I'm curious to know if you think of subscriptions and live queries as semantically different. Suppose you had both subscriptions and live queries at your disposal, when would you use one over the other? |
That is a great question, but it blows my definition of live queries out of the water. Doesn't it? Hehehe... LOL! 😄 If I may answer too. I think with Christian's (@paralin) rgraphql system, live queries are a server-side and domain specific decision. From what I understand from the rgraphql docs, if you want a live query, the ability to observe for updates is "baked" into one or more resolvers for that query. And, I believe this is where this concept has a general concern (and something still missing in the spec too). It requires a front-end dev to have intimate knowledge of the back-end decisions, as the type of query (live or not) cannot be directly "seen" through introspection, whereas, it should be. Sure, one could add some type of comment, but is that really a good solution for flagging queries as "potentially live" with introspection? The other question that burns in my mind is, how does the server know who to broadcast these updates to? The docs mention killing long running processes. That is only scratching the scaling issue. I guess I am the stupid guy on the fence between these two solutions. I don't think GraphQL should be working with events internally. They aren't needed, as Christian's rgraphql system proves. Yet, I don't think pure live queries, without some sort of subscription system, are the right solution either. Oh. And just because a live query has a subscription system tagged to it, doesn't mean it can't be called a live query. 😉 Scott |
I do think of them as semantically different. It would be awkward to do a toast notification with a live query rather than an event-based subscription stream, for example. That said, I am mostly using subscriptions as a poor man's live query system, with easier-to-understand semantics on the backend. If I had a reactive backend that supported live queries, I would mostly move to using live queries – but I don't, and I decided it wasn't worth the architectural trade-offs required to do so. Additionally, I expect the majority of users of GraphQL subscriptions as-is to use them to do something somewhat similar to my use case of emulating live queries in an easier-to-implement manner for complex back end systems. |
@smolinari don't you mean my system? :) |
@paralin - Sorry about that. You are right. Me goes correcting Scott |
So are we saying that the difference between a "Live Query" and a "Subscription" is essentially how the updates are pushed? An LQ will automatically send you any update that affects the original query, whether it be an add/remove/update, whereas a Subscription needs a "manual" push of new data, allowing the programmer to be selective about what updates are sent? |
@Siyfion in rgraphql at least the developer still has fine control over what gets sent to the client, the system just manages getting those changes to the client and applying them properly. The only difference I can see really is that subscriptions are limited to the root level of the query only, and cannot be updated after they have begun. These properties are probably good for when you're subscribing to general streams of events. I wouldn't use it for live data though. Imagine you're trying to build a news feed with comments. What happens if someone edits a comment? Do you just push an event saying it was edited via a subscription? But then all of the logic to apply the updates has to be hand built separately for each of the types of things you might want to update. That seems wrong to me. Instead you can just subscribe to the same streams of data on the server, interpret them correctly, and then send back updates to the client tailored to the data they already have. |
@paralin - How does your system know when to send the news feed updates or rather, to which clients? Scott |
@smolinari that's up to the developer to decide. In Go we have strong concurrency patterns around streams of data, and Magellan supports all of those patterns when resolving fields. When a user subscribes to some live query the server decides how it will fill that query, and the developers code can return many different permutations of result representations, including ones that change over time. |
I missed how this can be done with rgraphql. Can you point me to the docs (or code), where this is explained (done)? Scott |
@smolinari http://github.com/rgraphql/soyuz Not much in the way of docs yet, mostly focusing on optimizing and getting in mutations right now. But the interface is the same as in Apollo. Call query, returns an observable, subscribing to the observable triggers the query to actually be applied. The system merges together the entire tree of active queries into one query object and keeps that in sync with the server. There is a lot of information on how it works in the protocol.md doc under I think Magellan (I'm on my phone right now, apologies for the lack of a link) |
Yup, that's how we would do it.
For us, the subscription payload that gets pushed to the client is the same type as a comment_edit mutation payload, and the client already has logic for updating the comments UI in response to a comment_edit mutation response. In general, on our native clients and in Relay, we have client-side infra that is smart about taking GraphQL responses, sticking them into a GraphQL cache, and updating the UIs accordingly, so it's not actually as bad as you make it sound to add logic to handle a subscription response. |
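To illustrate the point about reusing mutation-handling logic for subscription payloads, here is a minimal Python sketch. It assumes a hypothetical normalized client cache keyed by record ID; `merge_payload` and the payload shape are made up for illustration and are not Relay's or any native client's actual API.

```python
from typing import Any, Dict

# Hypothetical normalized client cache: one record per ID.
Cache = Dict[str, Dict[str, Any]]


def merge_payload(cache: Cache, payload: Dict[str, Any]) -> None:
    """Merge a comment_edit-shaped payload into the cache, field by field."""
    comment = payload["comment"]
    record = cache.setdefault(comment["id"], {})
    record.update(comment)


cache: Cache = {"c1": {"id": "c1", "body": "frist!", "author": "ann"}}

# Response to a comment_edit mutation issued by this client.
mutation_response = {"comment": {"id": "c1", "body": "first!"}}
merge_payload(cache, mutation_response)

# Subscription event with the same payload type, pushed because of
# someone else's edit; the same handler applies it.
subscription_event = {"comment": {"id": "c2", "body": "hello", "author": "bob"}}
merge_payload(cache, subscription_event)
```

Because both payloads share one type, the client needs only one code path; whether that holds in a given app depends on how its schema models mutation and subscription payloads.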
And yet you have to base every change on mutations. I'm building an app right now that is extremely reliant on outside data - that is, sensor data, position data, connectivity, etc from a large number of sources. To make a mutation to affect every little change to this data would be impossible. This type of live data is something well suited to GraphQL, because the client can subscribe to only what it needs. It's also something that cannot be done with subscriptions in any tractable way. This example I believe reveals that there are actually two types of live data that a GraphQL user might want to have: streams of updates to individual fields, along with batch updates as a result of measurable transactions. I believe this is the best argument yet for building two different live mechanisms into GraphQL. |
Just catching up on everything in this thread. I'm seeing two general questions being discussed here: Re: (1), we believe based on experience at Facebook and discussions with other folks that the general problem of implementing live queries at scale is not easy. This doesn't mean that it is always hard; with an efficient reactive backend, implementing live queries becomes fairly straightforward. As @taion mentioned, though, some folks might have "many microservices on [the] backend." Some might have tens or even hundreds of different DBs and services backing the data in their GraphQL schema. The general problem of moving all of the backing data for a GraphQL schema to a reactive backend is quite challenging. However, I think we're getting off-topic by focusing on question (1). The more relevant question for this RFC is (2). Based on my experience working with a bunch of Facebook product teams building real-time features and rolling out GraphQL Subscriptions at scale over the past two years, I believe that the answer to question (2) is yes. We've seen cases where product folks explicitly design their real-time experience around events. They need control over things like which specific events get priority when the rate is too high to deliver all updates. @paralin said previously that "Live queries are infinitely more powerful than subscriptions." I'm not sure if I agree with this, and I'm also not sure that it's useful to debate the meaning of "powerful" (super relevant talk: https://www.youtube.com/watch?v=mVVNJKv9esE) but one thing I will say about subscriptions is that they put more control into the hands of the product developers over which updates they'll receive. We have also seen examples that lend themselves nicely to live queries, and some people in this thread have mentioned examples of that sort. 
Internally, we are still experimenting and working with product teams to arrive at a general understanding of which use cases are better served by subscriptions and which are better served by live queries, but we are confident that the former is not an empty set. |
@laneyk Agreed in full. I don't dispute that I've been overstating the worth of live queries a bit, primarily because I'm passionate about seeing them considered due to their value in my particular niche applications. I don't believe that live queries are the only way to do it, just that they are an effective mechanism in a lot of small to mid scale applications. It makes sense that live data and events would have very different mechanisms. |
This is one thing that I see consistently in the design of GraphQL. Besides the debate between live queries vs. subscriptions it may be worth thinking about this client-developer-control as a key design point of GraphQL. If you think about mutations, they require a lot of work on the client developer’s side to update the cache. This is a problem that Apollo Client, Relay, and any future GraphQL clients will struggle with. A lot of GraphQL beginners really want mutations to be “magical.” They want to send a mutation to submit a comment and have that comment be automagically inserted into their pre-existing list with zero boilerplate, but GraphQL wasn’t designed to be magical; it was designed to be practical. In its practicality GraphQL tries to give both the server and the client developer as much freedom and flexibility as possible to work in and around the query language without over-prescribing. The server developer may require a token in an HTTP header, or return a JSON blob as a scalar field. The client developer may implement super custom updates to their data based on a mutation or subscription which takes into account variables only the client knows, like a local priority based on what screen the user is on. However, this practicality comes at the cost of some higher-level “magic” features that would make development much faster such as live queries or zero boilerplate mutations on the client. I like that GraphQL has chosen to be practical. It’s the same choice React has made whereas Angular has chosen the “magic” route. If you want magic in the data API space I heavily encourage you to check out Falcor. Unlike GraphQL, Falcor’s design is optimized for some of these magic features like live queries and simple mutations that people would like to see (Albeit you probably won’t get any magic from Falcor in its current form, but I think the design is there. Also forget about the fact that Falcor doesn’t have a schema!
You could easily write a version of Falcor with static types and get the same GraphiQL experience). What do you think? Do you see the same consistent choice in design decisions? Do you agree that live queries are a “magical” feature? My point isn’t so much to argue for-or-against live queries (or even for-or-against magic!), I just wanted to make an observation about the design of GraphQL that I’ve noticed from time to time 😊 (since it was mentioned this talk is amazing https://youtu.be/mVVNJKv9esE and its concepts apply to this observation as well) |
@calebmer You don't need to have a feature in the spec to build it. Projects like mine that add real-time to GraphQL operate with GraphQL in its current form, and declare their own rules as to how data is handled. Therefore they are derivative of GraphQL and perhaps compatible while not GraphQL in their own sense. GraphQL definitely can support these types of things, and I believe it's productive to at least discuss inside the bounds of GraphQL without deferring to other products entirely. Your point absolutely holds - GraphQL's spec doesn't really need to have real-time built in. It would be nice, but it would always be labeled as an optional feature anyway. Maybe it's best to leave these features to derivative projects to define, with loose guidelines in the spec? I believe subscriptions should be in the spec for sure, but real-time maybe not. That talk's really good and definitely applies here, thanks for the link! |
It seems like a lot of the discussion has been about things outside of
graphql (how upstream implements events as per @laneyk, but also the
semantics encoded within events, eg update vs new state). Is there any
difference in the (external) behavior between what has been proposed and
live queries beyond how the request is interpreted?
As best I can tell subscriptions (with a bit of hand waving) represent a
subset of one or more possible live query specs in that subscription is
based solely on the arguments passed to the root, whereas a live query will
use the whole query.[1] Everything else is either transport level stuff
(subscription, errors, etc) and so common between the two, or event
semantics (new state, update etc) and opaque to graphql.
[1] I'm cheating by ignoring any details how to design a specific live
query syntax, complexity of implementation, etc, since it's moot to my
point, as well as anything about the client updating their query
|
Ahhhhh. Very interesting. The logic to show a different value than what is persisted sort of eludes me. I guess I can't question FB's business decisions in the end. So, I won't. 😄 I also still say the event based solution is an implementation detail on how to make data change events reactive. But, I digress on that too. Thank you so much for your patience. I'll be very much looking forward to the reference implementation and learning and hopefully also helping a whole lot more in the future. Scott |
Maybe another way to think about it is that event-based subscriptions are one option for live query implementation, but in that context they're a transport-level concern. By contrast, for an event stream, the subscriptions actually do map to what's logically happening. The "like count" thing is an interesting example, because visually it resembles live queries, so I'd argue that it's closer to a workaround over real reactivity there being really, really hard – but having tried to build conceptually similar things on our end with subscriptions, it's a very defensible one. |
@taion: I agree that likes is not the best example to talk about subscriptions in this particular discussion since it's something that is probably a better fit for a live query (assuming both options exist). |
One example where product requirements might specifically dictate events over live queries is something like Twitter's timeline, which shows a badge for new updates rather than immediately displaying new updates – if the user's about to interact with a timeline entry, you don't want to bump the timeline down in an unsolicited manner and make the poor user retweet the wrong thing, or something like that. |
@taion live queries still apply there, you would just restrict the query to never add new entries without an explicit argument change. |
What's funny is that the same argument plays out over and over again. For example, Redux, at its core, is event driven, although they call them actions. It gives you a structure for producing a live view of your state in the form of its reducers and selectors. MobX has you mutate your live model directly, and to the extent that events need to trigger processes, you need to handle that in your mutation logic. There are strong reasons to build systems around the changing data itself. You don't have to worry about accounting for all the causes. There are strong reasons to build systems around events. Sometimes, user experience does care about the causes. Events can be depicted in a live query model by having a field that will be the most recent event or Likewise, subscriptions can support live queries by pushing the full state (or changes thereof) in every event. The event becomes "your data has changed". Also awkward to set up, use, and optimize. I think it's probably a good idea to have first-class approaches to both paradigms, even in the same application. |
I think we're in general agreement there, and most of us are targeting live queries. The core issue is just that live queries, even at a schema level, require making more decisions – e.g. do you use something like JSON patch to communicate the updates? Or if not what do you use? Right now a number of implementations mock live queries with polling, but I think a general solution requires the kind of general consensus on how to push live query updates that does not yet exist. |
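To make the "how do you communicate the updates?" question concrete, here is a rough Python sketch of one possible delta encoding. It borrows only the operation names (`add`, `remove`, `replace`) from JSON Patch; the tuple format, the `diff` and `apply_ops` helpers, and the restriction to flat results are assumptions made for brevity, not a proposal from this thread.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical delta format: (operation, field name, new value or None).
Op = Tuple[str, str, Any]


def diff(old: Dict[str, Any], new: Dict[str, Any]) -> List[Op]:
    """Compute the operations that turn a flat result `old` into `new`."""
    ops: List[Op] = []
    for key in sorted(old.keys() - new.keys()):
        ops.append(("remove", key, None))
    for key, value in new.items():
        if key not in old:
            ops.append(("add", key, value))
        elif old[key] != value:
            ops.append(("replace", key, value))
    return ops


def apply_ops(state: Dict[str, Any], ops: List[Op]) -> Dict[str, Any]:
    """Apply operations on the client, returning the updated result."""
    result = dict(state)
    for op, key, value in ops:
        if op == "remove":
            del result[key]
        else:  # "add" and "replace" both set the field
            result[key] = value
    return result


old = {"likes": 3, "title": "Hi", "draft": True}
new = {"likes": 4, "title": "Hi", "pinned": False}
ops = diff(old, new)
```

Nested objects and, especially, lists are where the hard decisions start (positions, moves, identity of items), which is exactly the part a flat sketch like this sidesteps.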
What events can happen that aren't persisted? If they aren't persisted, can they be? If they can be, and I know they can be, then those events can be "triggered" over live queries. Right? Can every live query be modeled into an event system? Sure they can. But, then you'd be building another separate system. I've seen this done for MongoDB in many ways for example. So it is clear the want for live querying is relatively large. Why is that? Obviously too, only databases that can send live query messages can be used in a proper live query system. Otherwise, you are back again to needing a messaging/ queuing events system/ bus, etc. I can understand why FB went with events. AFAIK, they don't have databases that support live queries. But, maybe they should? If they did, I bet this whole discussion and any solutions would get a whole lot easier. 😉 Scott |
Any sort of stream data – trades, clickstream, &c. aren't nicely modeled by live queries and would have to be emulated there. |
@smolinari If you have 15 minutes, check out the other issue I made above your comment. I'm increasingly convinced that all the pieces needed for a live query system more or less already exist in today's subscriptions. Although, since it's so far just been a big thought experiment, some details might be missing. |
There was a recent talk on Live Queries at GraphQL Summit by @rodmk, one of the engineers who works on the Live Query system at Facebook. I think it addresses several of the recurring questions in this thread. https://www.youtube.com/watch?v=BSw05rJaCpA. |
@acjay - Absolutely. I never doubted GraphQL's capabilities to accommodate Live Queries. My whole argumentation here was to say that the added event driven system to make subscriptions work is basically unnecessary for (proper) GraphQL, because it can and should support live queries and that is the better answer to subscriptions and state management. Maybe my thinking was a bit ahead of its time???? 😄 @robzhu - Hah! Wow! Excellent video! Rodrigo demonstrates everything I've been trying to get across here. I'm all giddy now. 😛 And no, I don't mean to say, "I told you so.". 😄 I do still get FB's need to not go straight away with a live query solution, because of FB's legacy systems, which Rodrigo also mentions. (i.e. you can't rewrite all of the PHP code.) It demonstrates how FB's own internal issues drive directions in its open source projects and that is all fine and dandy, as a lot of dev shops out there will have those same kinds of issues. But there are also those, who are starting anew and want the best they can get too and Live Queries are the better/ simpler solution, granted only with a true reactive data store. I've enjoyed this whole discussion and I'd like to thank you all again for the opportunity. Scott |
@taion I just re-read #284 (comment), and now I think I get exactly what you mean. And from my side thread, my conclusion to the title question, "Are Subscriptions and Live Queries the same thing?" is now "yes", qualified only by the need to answer the question of how to send updates. In the best case scenario, those semantics can be defined at the spec level, leaving very little to be decided by library and application developers. But, what if there's no natural one size fits all solution to describing updates? Much as @robzhu, since you closed this ticket with the opposite conclusion, namely that live queries should be something separate from subscriptions, I'm curious whether this would address your concerns. |
The spec thing sort of is the thing, though. We were more or less able to ship subscriptions as of v0.4.8 that added support at a parsing level. The v0.10.0 release that changed the API to add first-class support – that was very, very nice from an API perspective, but ultimately didn't amount to much more than a minor API refactor: https://github.com/edvinerikson/relay-subscriptions/pull/39/files By contrast, contra @rodmk, I can't see how to nicely implement live queries in a way that lets me handle lists efficiently, without pushing down the entire list every time the query updates, without some additional spec-level support. A subscription is so similar to a mutation from the schema perspective. A mutation isn't. There is another distinction, too. Ultimately it's not that awkward to subscribe to add, delete, and change events. Doing something like Twitter's "new tweets" alert (instead of reactively showing new tweets) with subscriptions is... possible, but extremely annoying. And there are cases where you either want to or have to ship updates in that manner (e.g. we're doing HIPAA-related stuff, we may want to only indicate the availability of new data, rather than pushing down new private-ish data to the client... ). |
I'm not sure if my point isn't clear, or if I'm missing something you're saying. I think we agree that lists would seem to be the trickiest data type for coming up with a globally acceptable scheme of representing updates. But do you get my point in analogizing that with the scalar situation? The handling of custom scalars is one of the more interesting (and initially confusing) parts of GraphQL to me. The spec basically completely punts on anything having to do with how they're represented. They're just dumb leaf data. It's up to the client and server to determine the convention for their representation. This is great, because it avoids clogging up the spec with arbitrary choices for things like dates and times. Can't the same approach be used for the representation of updates, since there are several reasonable approaches? On a really simplified level, the server needs to implement some function
Yeah, I get that, but for reasons I think everyone agrees with, the event approach just isn't a great fit for every application. I'm just trying to say, I don't think it's actually that much more complex to do live queries using the exact same mechanism as has been built for events, with really just one additional concept of what I might call "modular update representation". I hope this makes my point clearer, and sorry if I've misunderstood what you're trying to say. |
What you're saying makes sense. The distinction I was drawing was that, with subscriptions, there was an "obvious" choice of the semantic GraphQL payload to send back to the client that exactly matches what things look like with a mutation. The issue with live queries (esp. lists) is exactly as you say – the specific implementation needs to define its own format to use for encoding deltas, which is a problem that didn't arise with event subscriptions. It's just more stuff to decide for the app developer. |
Just to throw in what I've been understanding as a live query, which seems to be different to the discussion here and even a bit to what Rodrigo explained too, but I believe live queries shouldn't return whole datasets or deltas of the changed data, but rather only send a trigger to the client to re-request its "affected" query again. That way, the back-end can stay fairly dumb, because the client is the one asking for the new data through the particular query and only the updated data gets "pulled" back into the client. Does that make any sense? Scott |
That sucks because the server has to look everything up again and can't keep any context in memory. You're getting caught up in the implementation details. There are a lot of ways of accomplishing this. Two way socket, pub-sub change notification channel, long polling, Merkle tree data hash comparison and state sync, server-side in-memory Merkle tree result caches.... |
Re-reading the thread now, I have not found compelling arguments for why the answer to this question is "yes". To quote from Rodrigo's presentation at GraphQL Summit, "Live Queries observe data, subscriptions observe events" For example, suppose you had a server-side clock that tracks the current time. The current time has two interesting properties: the value itself, and when it "ticks". If you want to observe the current time, use a Live Query. These are (awkwardly) isomorphic because you can always record the set of events in a list and observe that list. For example, you can use a CQRS-style log, but it seems silly to have a CQRS log for seconds in the day. Another angle: a Live Query is essentially a Query. You can poll any Query to simulate its behavior as a Live Query. By contrast, polling a Subscription (where the subscription does not have a stateful channel between polls) doesn't make sense. Hope that communicates my current thinking. I'm not seeing the recent arguments cover new ground, so I'm inclined to keep the issue closed, but please let me know if I'm missing some context. |
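One way to read the clock example is sketched below in Python. The `Clock` class and its `observe_now` and `subscribe_ticks` methods are hypothetical names invented for this illustration; the only claim is the contrast between observing a value and observing events.

```python
from typing import Callable, List


class Clock:
    """Hypothetical server-side clock exposing both access styles."""

    def __init__(self) -> None:
        self.now = 0
        self._value_observers: List[Callable[[int], None]] = []
        self._tick_subscribers: List[Callable[[str], None]] = []

    def observe_now(self, observer: Callable[[int], None]) -> None:
        """Live-query style: deliver the current value, then every new value."""
        observer(self.now)
        self._value_observers.append(observer)

    def subscribe_ticks(self, subscriber: Callable[[str], None]) -> None:
        """Subscription style: deliver only events that happen from now on."""
        self._tick_subscribers.append(subscriber)

    def tick(self) -> None:
        self.now += 1
        for subscriber in self._tick_subscribers:
            subscriber("tick")
        for observer in self._value_observers:
            observer(self.now)


clock = Clock()
clock.tick()  # happens before anyone is listening

values: List[int] = []
events: List[str] = []
clock.observe_now(values.append)
clock.subscribe_ticks(events.append)
clock.tick()
clock.tick()
```

The observer sees the state as of registration plus later states, while the subscriber never learns about the tick it missed. That asymmetry is also why polling makes sense for the value but not for the event stream.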
@robzhu summarizes it nicely. It's easily possible to add an events (subscriptions) implementation on top of whatever live query system, and it's also probably possible to make a live query system using subscriptions as some kind of awkward transport. At the end of the day data is data and the way you transfer it depends on what you want to do with it and how often it changes. |
@paralin But the point I'm trying to make is that if we can "forget" for a minute that |
@acjay Those two things that you just described - including "modular scheme for representing those updates" - is a live queries system. There's no reason to use a subscription channel as your transport for a live queries system. It adds nothing over just a websocket transport. Therefore a subscription channel is not suitable for live queries, as well. It's suitable for the event-based paradigm, which was what it was designed for. I built a prototype of an efficient live-queries system with magellan and it doesn't look anything similar to the subscriptions system - for performance I binary encode and batch changes to different parts of the result tree, which wouldn't be possible via a subscriptions channel anyway. |
@paralin Maybe I'm missing something, but if the assumption is that a web socket server could simply choose to interpret a vanilla |
@acjay I think that's exactly right. A minimum (not especially efficient) implementation could just hold onto the full query and re-run the entire thing and push the results down to the client every time it gets an update. That's in fact how I read the "call to make a prototype" bit at the end of @rodmk's talk. |
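As a rough Python sketch of that minimum implementation: the server keeps each registered query, re-runs it whenever the backing data changes, and pushes the result. `NaiveLiveQueryServer`, the in-memory `store`, and modeling a query as a plain function are all assumptions for illustration; skipping pushes when the result is unchanged is an extra choice not mentioned above.

```python
from typing import Any, Callable, Dict, List

Store = Dict[str, Any]
Query = Callable[[Store], Any]  # stand-in for executing a GraphQL query
Push = Callable[[Any], None]    # stand-in for sending data to one client


class NaiveLiveQueryServer:
    def __init__(self, store: Store) -> None:
        self.store = store
        # Each entry is [query, push, last_result].
        self._live: List[List[Any]] = []

    def register(self, query: Query, push: Push) -> None:
        """Run the query once, push the result, and remember the query."""
        result = query(self.store)
        push(result)
        self._live.append([query, push, result])

    def on_update(self, key: str, value: Any) -> None:
        """Apply a data change, then re-run every held query in full."""
        self.store[key] = value
        for entry in self._live:
            query, push, last = entry
            result = query(self.store)
            if result != last:  # added choice: suppress no-op pushes
                entry[2] = result
                push(result)


server = NaiveLiveQueryServer({"likes": 1, "title": "Post"})
received: List[Any] = []
server.register(lambda s: {"likes": s["likes"]}, received.append)
server.on_update("likes", 2)         # affects the query: pushed
server.on_update("title", "Edited")  # does not affect it: nothing pushed
```

Every update costs a full re-execution of every held query, which is the inefficiency acknowledged above; smarter systems would track which queries depend on which data.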
@taion @acjay I would struggle to call that a live query system at all. As we're discussing what a real implementation of something like that would look like, or in essence trying to figure out what the "best approach" would be, I'm not really considering hacks like sending the entire state over a subscription channel as a "live query system." You can do the exact same thing with just a websocket and a server-side polling [run query, check if changes happened, wait 3 seconds] loop, and remove the entire graphql stack. In that way it's not useful to have the subscriptions stack in the mix at all for something like this. It is for this reason that I would say that the two things are entirely separate and should be treated as such. I went and watched Rodrigo's talk and while I would argue that saying Subscriptions and Live Queries are interchangeable is misleading, he is right in that you can build almost any application with either approach. One approach will just be better for certain types of things than the other. |
Live Queries observe "data store events", i.e. record creations, updates and deletes. Also, those data store events could be due to other events. Scott |
Re-define Live Queries? Can Live Queries make Subscriptions unnecessary?
@smolinari @paralin @laneyk @dschafer @taion @Siyfion @jamesgorman2 @leebyron
Continuing the conversation from: #267 (comment)