Skip to content

Commit

Permalink
Addressing comments
Browse files Browse the repository at this point in the history
  • Loading branch information
jjoyce0510 committed Dec 22, 2020
1 parent ffc5453 commit 73cd78a
Showing 1 changed file with 36 additions and 26 deletions.
62 changes: 36 additions & 26 deletions docs/rfc/active/graph_ql_frontend/README.md
Original file line number Diff line number Diff line change
@@ -1,19 +1,19 @@
- Start Date: 12/17/2020
- RFC PR: <TBD>
- Implementation PR(s): <TBD>
- RFC PR: 2042
- Implementation PR(s): 2044

# GraphQL Frontend (Phase 1)

## Summary

This RFC outlines a proposal to implement a GraphQL specification in `datahub-frontend`. Ultimately, this proposal aims to model the following using GraphQL:

1. Reads against the Metadata Graph
2. Writes against the Metadata Graph
3. Search against the Metadata Graph
4. Browsing of the entities in the Metadata Graph
1. Reads against the Metadata Catalog
2. Writes against the Metadata Catalog
3. Search against the Metadata Catalog
4. Browsing of the entities in the Metadata Catalog

We propose that this initiative take place in phases, starting with CRUD-style read support against the entities, aspects, and relationships comprising the DataHub graph. The scope of this RFC is limited to Phase 1: reading against the graph. It will cover the topics of introducing a dedicated GraphQL endpoint into ``datahub-frontend`` and provide a recipe for onboarding GMS entities to GQL. Subsequent RFCs will address writing, searching, and browsing against the graph.
We propose that this initiative take place in phases, starting with CRUD-style read support against the entities, aspects, and relationships comprising the DataHub catalog. The scope of this RFC is limited to Phase 1: reading against the catalog. It will cover the topics of introducing a dedicated GraphQL endpoint into ``datahub-frontend`` and provide a recipe for onboarding GMS entities to GQL. Subsequent RFCs will address writing, searching, and browsing against the catalog.

Along with the RFC, we've included a proof-of-concept demonstrating partial GQL read support, showcasing `Dataset` and its relationship to `CorpUser`. The following files will be useful to reference as you read along:
- `datahub-frontend.graphql` - Where the GQL Schema is defined.
Expand All @@ -22,22 +22,22 @@ Along with the RFC, we've included a proof-of-concept demonstrating partial GQL
- `datahub-frontend/app/graphql/resolvers` - Definition of GQL DataFetchers, discussed below.
- `datahub-dao` - Module containing the DAOs used in fetching downstream data from GMS

It is important to note that there are some questions that would be best discussed among the community, especially those pertaining to modeling of the Metadata Graph on the frontend. These will be covered in the **Unresolved Questions** section below.
It is important to note that there are some questions that would be best discussed among the community, especially those pertaining to modeling of the Metadata Catalog on the frontend. These will be covered in the **Unresolved Questions** section below.


## Motivation
Exposing a GQL API for client-side apps has numerous benefits with respect to developer productivity, maintainability, performance among other things. This RFC will not attempt to fully enumerate the benefits of GraphQL as an IDL. For a more in-depth look at these advantages, [this](https://www.apollographql.com/docs/intro/benefits/) is a good place to start.

We will provide a few reasons GraphQL is particularly suited for DataHub:
- **Reflects Reality**: The metadata managed by DataHub can naturally be represented by a Graph. Providing the ability to query it as such will not only lead to a more intuitive experience for client-side developers, but also provide more numerous opportunities for code reuse within both client side apps and the frontend server.
- **Reflects Reality**: The metadata managed by DataHub can naturally be represented by a graph. Providing the ability to query it as such will not only lead to a more intuitive experience for client-side developers, but also provide more numerous opportunities for code reuse within both client side apps and the frontend server.

- **Minimizes Surface Area**: Different frontend use cases typically require differing quantities of information. This is shown by the numerous subresource endpoints exposed in the current api: `/datasets/urn/owners`, `/dataset/urn/institutionalmemory`, etc. Instead of requiring the creation of these endpoints individually, GraphQL allows the client to ask for exactly what it needs, in doing so reducing the number of endpoints that need to be maintained.

- **Reduces API Calls**: Frontend apps are naturally oriented around pages, the data for which can typically be represented using a single document (view). DataHub is no exception. GraphQL allows frontend developers to easily materialize those views, without requiring complex frontend logic to coordinate multiple calls to the API.

## Detailed design

This section will outline the changes required to introduce a GQL support within ``datahub-frontend``, along with a description of how we can model GMA entities on the graph.
This section will outline the changes required to introduce a GQL support within ``datahub-frontend``, along with a description of how we can model GMA entities in the graph.

At a high level, the following changes will be made within `datahub-frontend`:
1. Define a GraphQL Schema spec (datahub-frontend.graphql)
Expand All @@ -61,7 +61,7 @@ GraphQL APIs must have a corresponding schema, representing the types & relation
for this purpose:

*datahub-frontend.graphql*
```
```graphql
schema {
query: Query
}
Expand Down Expand Up @@ -89,7 +89,7 @@ User-defined types model the entities & relationships in your domain model. In t
The `Query` type defined above representing a special system type that serves as the entry point into the graph, with each field member representing a query for a particular type of data. Based on the types defined above, the following would be a valid GQL query:

*Example Query*
```
```graphql
query datasets($urn: String!) {
dataset(urn: $urn) {
ownership {
Expand Down Expand Up @@ -280,6 +280,13 @@ For more information about `DataLoaders` see the [Using DataLoader](https://www.

For a full reference implementation of the query execution process, see the ``GraphController.java`` class associated with this PR.

### Bonus: Instrumentation
[graphql-java](https://www.graphql-java.com/) provides an [Instrumentation](https://github.com/graphql-java/graphql-java/blob/master/src/main/java/graphql/execution/instrumentation/Instrumentation.java) interface that can be implemented to record information about steps in the query execution process.

Conveniently, `graphql-java` provides a `TracingInstrumentation` implementation out of the box. This can be used to gain a deeper understanding of the performance of queries, by capturing granular (ie. field-level) tracing metrics for each query. This tracing information is included in the engine `ExecutionResult`. From there it can be sent to a remote monitoring service, logged, or simply provided as part of the GraphQL response.

For now, we will simply return the tracing results in the "extensions" portion of the GQL response, as described in [this doc](https://github.com/apollographql/apollo-tracing). In future designs, we can consider providing a more formal extension points for injecting custom remote monitoring logic.


### Onboarding GMA Models

Expand All @@ -290,12 +297,12 @@ using GQL.

The proposed steps for onboarding a GMA entity, it's related aspects, and the relationships among them will be outlined next.

####1. **Model an entity in GQL**
#### 1. **Model an entity in GQL**

The entity model should include its GMA aspects, as shown below, as simple object fields. The client will need not be intimately aware of the concept of "aspects". Instead, it should be concerned with entities and their relationships.

*Modeling Dataset*
```javascript
```graphql
"""
Represents the GMA Dataset Entity
"""
Expand Down Expand Up @@ -361,12 +368,14 @@ Notice that the Dataset's ``Ownership`` aspect includes a nested ``Owner`` field

To support traversal of this relationship, we additionally include a ``CorpUser`` type:

```javascript
```graphql
"""
Represents the CorpUser GMA Entity
"""
type CorpUser {

urn: String!

username: String!

info: CorpUserInfo
Expand All @@ -375,27 +384,27 @@ type CorpUser {
}
```

####2. **Extend the 'Query' type**:
#### 2. **Extend the 'Query' type**:

GraphQL defines a top-level 'Query' type that serves as the entry point for reads against the graph. This is extended to support querying the new entity type.

```javascript
```graphql
type Query {
dataset(urn: String!): Dataset # Add this!
datasets(urn: [String]!): [Dataset] # Or if batch support required, add this!
}
```

####3. **Define & Register DataLoaders**
#### 3. **Define & Register DataLoaders**
This is illustrated in the previous section. It involves extending `DataLoader` and registering the loader in the `DataLoaderRegistry`.

####4. **Define & Register DataFetcher (Data Resolver)**
#### 4. **Define & Register DataFetcher (Data Resolver)**
This is illustrated in the previous section. It involves implementing the DataFetcher interface and attaching it to fields in the graph.

####5. **Query your new entity**
#### 5. **Query your new entity**
Deploy & start issuing queries against the graph.

```javascript
```graphql
// Input
query datasets($urn: String!) {
dataset(urn: $urn) {
Expand Down Expand Up @@ -459,7 +468,7 @@ Aspects should be modeled as nothing more than fields on their parent entity. Th

For example, retrieving the `Ownership` aspect of `Datasets` is done using the following GQL query:

```javascript
```graphql
query datasets($urn: String!) {
dataset(urn: $urn) {
ownership {
Expand All @@ -475,7 +484,7 @@ query datasets($urn: String!) {

as opposed to the following:

```javascript
```graphql
query ownership($datasetUrn: String!) {
ownership(datasetUrn: $datasetUrn) {
owners {
Expand All @@ -502,10 +511,11 @@ There are 2 primary options:

We're interested to get community feedback on this question!

**Should foreign-keys corresponding to outbound relationships be included in the GQL sche,a?**
**Should foreign-keys corresponding to outbound relationships be included in the GQL schema?**

Using the example from above, that would entail a model like:
```"""
```graphql
"""
Represents an Owner
"""
type Owner {
Expand All @@ -532,7 +542,7 @@ type Owner {
```
We should *not* need to include such fields. This is because we can efficiently implement the following query:

```javascript
```graphql
query datasets($urn: String!) {
dataset(urn: $urn) {
urn
Expand Down

0 comments on commit 73cd78a

Please sign in to comment.