-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(GraphQL support): RFC for GraphQL support in datahub-frontend (Part 1/2) - Queries #2042
Conversation
Pretty comprehensive! Thanks for this great first time PR! |
/* | ||
* Execute GraphQL Query | ||
*/ | ||
ExecutionResult executionResult = _engine.execute(executionInput); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of observability do you get on query execution? metrics about how data was loaded from different backends would be good to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question! Luckily graphql-java provides an Instrumention interface which can be implemented to provide custom tracing / tracking behavior.
In practice, we'd need some where to write these metrics if we provided an implementation of the interface. This may differ from deployment to deployment. Perhaps we could provide a simple "logging instrumentation" that logs the query metrics (as opposed to sending them to a remote monitoring service).
Do you think we should include this as part of the initial PoC? I'll include a small section on instrumentation within the RFC :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Would be good to clarify at least the skeleton. We would want to verify batching etc. is working as intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an Instrumentation section to the doc, and will update the reference poc PR to include the default TracingInstrumentation. Please take another look!
This is a perfect contribution @jjoyce0510, thanks for this. GraphQL support is one of the items in our roadmap and this is a big and important feature and it's worth a RFC doc. I would appreciate if you can send a formal RFC PR by following our RFC guide https://github.com/linkedin/datahub/blob/master/docs/rfc.md. |
What are the pros and cons of supporting GraphQL using Play versus Node? |
I think @mars-lan @igbopie and @cptran777 should also take a look :) |
This isn't really an "RFC", at least in our terminology. "RFC" to us is basically a design doc / proposal. I'm still happy to discuss over the code, but a if you feel a design doc is warranted, then check out that RFC process :) https://github.com/linkedin/datahub/blob/master/docs/rfc.md EDIT: Ah, there is an RFC doc in here, it isn't just code :) Maybe consider separating them out in to different PRs? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't gotten very far, here's a few comments so far. Might be a bit before I return to the review, so I wanted to leave the comments I had.
Thanks for the contribution, @jjoyce0510. This indeed will help modernize the mid-tier. Agree with @jplaisted that we should reach a consensus on the RFC before jumping into actual implementation. |
Hi folks! Thanks for all the activity so far. I'll separate into a RFC PR and a corresponding PoC PR :) I want to call out that the code changes are not supposed to represent the final iteration, more as a tool of illustration for the concepts discussed in the doc. A side note, I think it is convenient to have code changes colocated with the RFC for easy reference... John |
|
||
## Summary | ||
|
||
This RFC outlines a proposal to implement a GraphQL specification in `datahub-frontend`. Ultimately, this proposal aims to model the following using GraphQL: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also add a simple architecture diagram illustrating the relationships between different blocks mentioned here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not completely following... Are you suggesting something like:
- Reads --> 2. Writes --> 3. Search --> 4. Browse?
To depict the sequencing of work suggested here?
Minor nit but for the OP it uses markdown and You can also just post with empty boxes, and then check the rendered boxes in GitHub!
|
I agree that seeing the proof-of-concept code actually helped me understand the proposal better. However, it is also true that we would want to just commit the rfc doc, and not the proof-of-concept code. I think it is worth supporting an RFC process where prototype code is available to review (to the extent that it helps with reviewing the rfc). Later, after consensus on the RFC (which might lead to changes in the POC), we can ensure that the code gets split off into an implementation POC, which the author can further refine before making a future PR, while the main "design" RFC gets merged in. Thoughts @mars-lan and @jplaisted ? |
I don't think it hurts to have two PRs at the same time. One for the RFC and one for the PoC. It can help narrow the discussion on each and reduce some confusion (I initially didn't realize there was even an RFC doc here; I just saw code; granted the doc is too big and github won't show it by default). It also helps prevent accidentally submitting the PoC when the RFC is approved. |
I'd seen somewhere that exploring a NodeJS frontend was on the DH roadmap. While I agree that it may be a reasonable direction in the longer term, it would require quite a bit of work just to spin up a NodeJS frontend that achieves functional parity with the existing Play frontend. Specifically, there'd be a lot of effort in simply implementing the DAOs that talk to GMA in Node (since we'd have no auto-generated client binding help from Pegasus). When I realized this, I considered alternate options. I found that we could quite easily extend the existing Play frontend to accommodate GQL via the addition of a single new endpoint. The wonderful thing about adopting GQL is that our API layer becomes technology-agnostic. The client need not be aware of whether a Play server or a NodeJS server is serving the graph. Instead, the client can depend on an API contract and consider the fulfillment of that contract to be opaque, a responsibility of the server. By introducing the intermediate API layer, it becomes trivial to swap the frontend server implementation (so long as clients depend exclusively on the GQL API). So I consider GQL-on-Play to be a natural stepping stone toward a NodeJS frontend, making migration down the road that much easier! With this approach, we will avoid the need to coordinate a "big bang" migration of client side + frontend server logic in lockstep. |
Oh jeez GitHub deleted a super long comment I had somehow. Great :( This one will be more brief as a result. I'd also like us to consider server driven rendering. I'm not sure if that conflicts with GQL or not, I haven't thought about this too much, just at a high level. But I think what this shows is that A) we shouldn't dive head first into GQL yet because we should first have B) some high level vision planning / discussions around where we want the frontend to go, architecture wise. Maybe that ends up being GQL, SDR, something else, a mix of things, etc. I'll chat within the team this week about getting this started. I'm not really on the DataHub team itself (I'm on the GMA team), so I want DataHub folks to lead this conversation. I can just start it. As part of this vision discussion stuff they can consider GQL and SDR as proposals. |
Quick update on my last post: I've set up a small meeting tomorrow to help hand off this proposal as well as my SDR idea to the DataHub team. After that we should have more traction :) |
@jplaisted Thank you for all your time & effort! On your points -- We did consider server-driven UI pattern a bit. I'll provide some general thoughts.. IMO, Server-Driven UI is particularly valuable in the following scenarios
It's important to remember that SDR does not come for free; it adds complexity by virtue of an additional layer of indirection, makes the API less general-purpose, & tightly couples the client + server. I think it may make sense to leverage server driven UI for those particularly pluggable pieces of the experience, for example entity detail pages, eventually. Less reasonable for the more concrete pillars: search, authentication, browsing, account management. In this world, GQL + SDR definitely do not conflict; Server-Driven UI models would be part of our GraphQL schema. (Airbnb does this) So while I think it's worthwhile to consider the possibility of pushing portions of the experience to the server when designing the UI + API layers, I'd prefer to err on the side of simplicity unless its quite evident that the benefits of server driven UI outweigh the marginal cost. I think it's most critical that we design the UI in a way that is conducive to moving toward the server-driven pattern in the event we find it compelling down the road. To me, this is a matter of clearly separating the UI from the data layer within the client app. Interested to hear your thoughts. Are there additional benefits I'm be overlooking or cost I'm over-indexing on? As for the frontend direction more broadly, I agree we should orient ourselves around some shared vision. Please keep me in the loop on what you folks are thinking! :) |
Good response, this is exactly the kind of thing I was looking for, thanks! I'm not set on it either way, and would like to think through it. And to be clear I'm not looking to be a hard blocker on GQL here; mostly I want to present both of these things to DataHub team and once and have them guide things. That's historically what SDR has been used for and marketed that way. I agree we won't use it that way and thus isn't maybe as appealing as it could be. But we do get another benefit that isn't touted: decoupling the frontend from all the business logic and data models. I kind of want to use SDR for exactly the opposite of the "code can be reused on the server" point. Here eBay has a good diagram of how SDR allows them to cut down on copy and pasted code across multiple platforms. We only have web, so yeah, it isn't a win. But in web are many entities (datasets, charts, flows, users, etc). I want to think about sharding all that logic out to different microservices. The frontend seems like a bottleneck today.
It'll for sure be a gargantuan effort, but it'll make the frontend way more extensible, if all you have to do is spin up a GMS, a "GMS FE", and then magically DataHub can render a page for your entity without needing any rebuilds or redeployments. Whether that's worth it, idk. I think its worth discussing, at the very least. Again my main point was more I think these two proposals are pretty philosophical for the DataHub team and they should A) take them over and B) probably do some larger roadmap work to see if / how they fit in. GQL is 100% the lesser of the two projects effort wise.
Not true if you don't add the FE to the GMS and consider it a separate service instead (see above). You'd leave your API untouched. The FE would be some other API/service that really only DataHub would care about; the rest of your metadata backend wouldn't ever want to query it.
I'd be curious to see what this look like, since I don't really get what a query would look like here. The query really is just the URL; quite literally "what is the view model for this page" and the response would always be some list of generic Component or Node. Again, I wasn't sure if they'd conflict or not, but I can't envision it, at least of the top of my head. I did see that AirBnB published Lona, but I haven't delved into it too deep, so I don't know if it uses GQL or not. Would be interesting to see.
Good point. I will say that the eventual pieces here eventually become more simple (web app only has super simple components; "frontend" servers just build view models; things follow SRP); but yes the system overall has way more complexity, sadly.
Agree, but how does GQL help with this? To be clear; here we're talking about not having a DatasetView, but a TableView, and then some other logic that can take a Dataset and make a TableView (logic which could one day be moved to some server), or something similar, right? I'm not super familiar with all our frontend code today, but if its all already tightly coupling presentation logic with models, aren't you just swapping out the query part of that and it all still stays tightly coupled? Edit: I'll stop bogging down your RFC with my ideas :) If the DataHub team is interested enough in this we can start another RFC and discuss there. |
The "decoupled frontend" idea reminds me of the old Fizzy+Dust combo that was used heavy within LI once upon a time..
Is speed of deployment really the metric we should be optimizing for? What about operational simplicity, speed of development, debuggability concerns, UI flexibility / customizability, etc? It seems that we should work to more clearly understand the tradeoffs at hand. I can understand the scalability benefits of the design - in particular how it facilitates distributed ownership & decouples the evolution of GMS verticals. But is this what the community outside of LI really needs right now? Will the longer-term value introduced by vertical distribution of ownership be worth the complexity cost for DataHub operators in the near-term?
This is what I mean. Not only would they not want to query it, they wouldn't be able to derive any value from querying it because from the PoV of anyone outside of the UI federator the API is useless. (Not claiming that is inherently bad, just the price of specialization)
Not sure I'm following this -- Typically you'll have some single-page app acting as a harness to handle things like navigating routes, fetching JSON from the server, and rendering the response. You wouldn't necessarily render the entire app via push from the server. For example, the auth flow would likely make sense to be defined local to client. There can exist a hybrid, where some portions of your GQL schema are modeled to supply a renderer, while others are used for fetching data alone.
It does not!
Astute observation! Precisely for this reason we do not intend to just swap GQL in place of existing model layer. More to come on this theme :) Great discussion! Let's find a dedicated forum to continue. |
@jplaisted: Looks like consensus has been reached on the current shape of the proposal (as it relates to GraphQL) and it has been lingering for a while. |
I am not an expert in GQL and do not have ownership of the frontend or midtier. Please get a team member from that team to do the review, and just so there's no ambiguity, get a Github "approval" from them. I met with the team Tuesday to hand this over, I am no longer sure what the status is. Thanks.
Agree, I went off topic, apologies. |
Also @keremsahin1 and @mars-lan may be good reviewers too (though I'd argue not ultimate approvers, that should be the DataHub (frontend/midtier) team). I know they were investigating GQL before. Edit: @mars-lan has taken one pass, maybe an explicit LGTM would be nice? hint hint :) |
I am supportive of this proposal and think it will move DataHub in the right direction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Dismissing, since the comments have been addressed.
```POST /api/v2/graphql controllers.api.v2.GraphQLController.execute()``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we evaluate having this API at the GMS layer ( or a layer in b/w GMS and frontend) as opposed to in the mid tier ?
This can help other services (apart from frontend) leverage graphql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that a separate service might be less intrusive and can help gradual migration off the current Play frontend. In fact @arunvasudevan from Expedia also proposed a similar approach. WDYT @jjoyce0510?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi people!
I love the idea of working up the stack to implement GraphQL at the GMS layer. I see the key benefit being client language interop. No longer do clients of GMS need to be running on the JVM to be able to easily access the service (via the provided DAOs).
Either way, implementing GraphQL within DataHub frontend will provide the benefit of improving iteration speed on the client side.
I'd love to work with @arunvasudevan and folks from LinkedIn to propose a GMS GraphQL API as well. In doing so, we can figure out what the relation between the proposed frontend Graph and the GMS Graph should look like. Perhaps, the frontend can act as that federator layer, stitching together the downstream GMS service graphs!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the Graph to the Play server unlocks iteration on the client side in the short term. If we choose to migrate away from Play, we can do so once we have a mechanism for communicating with GMS without requiring JVM + DAOs, ie we can at that point consider migrating to a Node frontend.
If we try to build the new service today, we will be blocked on progress until the GMS GQL API completes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the GQL graph look different as a mid tier or backend? If we just go with this proposal (GQL layer calls existing Restli GMSes), could we easily move the GQL to the backend later? Assuming the actual models for GQL don't change at all, it seems like the frontend wouldn't even notice :)
So I think we can start with this current proposal and build our way out from there. Get the benefits for the frontend now, and worry about benefits to the backend later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So how I see it is next step should be to understand the GMS GQL API story. Once we have that we can decide to either:
- Replace the GMS Dao-based access from datahub-frontend Play service with GQL clients
- Create an entirely new Node service to communicate via GQL to GMS
That is, we work towards meeting in the middle! To me this provides the optimal ability to iterate on multiple pieces in parallel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm supportive of GQL in general but would also like to see explicit approval from LI side before merging this in. |
This PR proposes extending the
datahub-frontend
service to implement the GraphQL spec. Thegraph_ql_frontend
RFC outlines the first part of a 2-part proposal to migrate the existing frontend service to GraphQL. Subsequent RFCs will cover mutation, auth, search, browse & more.I've created a reference branch to showcase the changes outlined in this document. A PR from that branch can be viewed here. Notice that the branch has evolved to contain both changes related to GraphQL Queries (RFC 1/2) and Mutations (RFC 2/2)
Checklist
I look forward to engaging with the community around the ideas outlined in the RFC!
John