Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overfetching on union types with conditional fragments in query operation #2759

Closed
rhino88 opened this issue Aug 31, 2023 · 6 comments
Closed
Labels
P2 An issue that needs to be addressed on a reasonable timescale. sizing/large Rough estimation of effort required to implement sizing/medium Rough estimation of effort required to implement

Comments

@rhino88
Copy link

rhino88 commented Aug 31, 2023

Issue Description

When including conditional fragments on a union field the resolvers for fragments appear to be requested unconditionally from the router.

In the example below, the resolver for Product#c is invoked when the conditional fragment for ... on ComponentC is not needed given a response without any instances of ComponentC from the card service. Removing the conditional fragment for ... on ComponentC prevents the Product#c value from being requested, as expected.

Our production router is currently using v1.20.0 but we also confirmed that this issue exists on v1.28.0. This also impacts the gateway as seen in the reproduction link.

Subgraph A Schema(Card)
 extend schema
      @link(
        url: "https://specs.apollo.dev/federation/v2.3"
        import: ["@key"]
      )

    type Query {
      card: ProductCard
    }

    type ProductCard @key(fields: "id") {
      id: ID!
      components: Component
    }

    type Product @key(fields: "id") {
      id: ID!
    }

    union Component = ComponentA | ComponentB | ComponentC

    type ComponentA {
      a: String
      product: Product
    }

    type ComponentB {
      b: String
      product: Product
    }

    type ComponentC {
      c: String
      product: Product
    }
Subgraph B Schema (Product)
     extend schema
      @link(
        url: "https://specs.apollo.dev/federation/v2.3"
        import: ["@key", "@interfaceObject"]
      )

    type Product @key(fields: "id") {
      id: ID!
      a: String!
      b: String!
      c: String!
    }
Query
query Foo {
  card {
    components {
      __typename
      ... on ComponentA {
          product {
            id
            a
          }
        }
        ... on ComponentB {
          product {
            id
            b
          }
        }
        ... on ComponentC {
          product {
            id
            c
          }
        }
    }
  }
}
Query Plan
QueryPlan {
  Sequence {
    Fetch(service: "card") {
      {
        card {
          components {
            __typename
            ... on ComponentA {
              product {
                __typename
                id
              }
            }
            ... on ComponentB {
              product {
                __typename
                id
              }
            }
            ... on ComponentC {
              product {
                __typename
                id
              }
            }
          }
        }
      }
    },
    Flatten(path: "card.components.product") {
      Fetch(service: "product") {
        {
          ... on Product {
            __typename
            id
          }
        } =>
        {
          ... on Product {
            a
            b
            c
          }
        }
      },
    },
  },
}
Response
{
  "data": {
    "card": {
      "components": {
        "__typename": "ComponentA",
        "product": {
          "id": "product1",
          "a": "A1"
        }
      }
    }
  }
}

Link to Reproduction

https://codesandbox.io/p/sandbox/eloquent-christian-kcs989

Reproduction Steps

Run the query above, and notice the C resolver -- BAD and B resolver -- BAD in the terminal output.

@rhino88 rhino88 changed the title Router overfetches on union types with conditional fragments in query operation Overfetching on union types with conditional fragments in query operation Aug 31, 2023
@korinne
Copy link
Contributor

korinne commented Sep 1, 2023

Thanks for raising this request! Our team has added the item to our backlog, but it does require a bit more design and thought before we can prioritize it. We will definitely keep this issue updated once it's been officially prioritized, and we are also open to community contributions.

@korinne korinne added P4 An issue that should be addressed eventually. sizing/medium Rough estimation of effort required to implement sizing/large Rough estimation of effort required to implement labels Sep 1, 2023
@XtofX
Copy link

XtofX commented Sep 1, 2023

from our stand point, this issue would deserve a better priority that P4 as it affects one fundamental aspect of graphql (prevent overfetching). The resolution of this issue by Apollo is very important to us at Wayfair.

@gwardwell
Copy link

We're seeing a latency impact from the query planner "popping" fields from union fragments (which should be conditional on the type being present in the actual result set) up into a sequential fetch and over fetching. The reproduction looks something like this:

# Subgraph 1
type Query {
  card: ProductCard
}

type ProductCard @key(fields: "id") {
  id: ID!
  components: Component
}

type Product @key(fields: "newKey") {
  newKey: ID!
}

union Component = ComponentA | ComponentB | ComponentC

type ComponentA {
  a: String
  product: Product
}

type ComponentB {
  b: String
  product: Product
}

type ComponentC {
  c: String
  product: Product
}
# Subgraph 2
type Product @key(fields: "oldKey") @key(fields: "newKey") {
  oldKey: ID!
  newKey: ID!
  a: String!
}
# Subgraph 3
type Product @key(fields: "oldKey") {
  oldKey: ID!
  b: String!
}
# Subgraph 4
type Product @key(fields: "oldKey") {
  oldKey: ID!
  c: String!
}
# Query
query Foo {
  card {
    components {
      ... on ComponentA {
        __typename
        product {
          __typename
          a
        }
      }
      ... on ComponentB {
        __typename
        product {
          __typename
          b
        }
      }
      ... on ComponentC {
        __typename
        product {
          __typename
          c
        }
      }
    }
  }
}

Expected behavior

The expectation is that Router would:

  • Go to subgraph 1 to get blocks
  • Go to Subgraph 2 to convert newKey to oldKey
  • Conditionally go to Subgraph 2, 3, or 4 to get the a, b, and c fields based on the type returned in blocks
    • This expectation is because, be definition, the union is of a limited number of unique types without overlap. Regardless of whether underlying fields in the union have overlap, the union members themselves do not and should be treated as unique objects. The expectation is that Router would craft a static query plan allowing it to pivot to the correct union member resolution since they represent unique outcomes.

Here's the expected query plan (or something similar to this is expected):

QueryPlan {
  Sequence {
    Fetch(service: "subgraph-1") {
      {
        card {
          components {
            __typename
            ... on ComponentA {
              __typename
              product {
                __typename
                newKey
              }
            }
            ... on ComponentB {
              __typename
              product {
                __typename
                newKey
              }
            }
            ... on ComponentC {
              __typename
              product {
                __typename
                newKey
              }
            }
          }
        }
      }
    },
    Flatten(path: "card.components.product") {
      Fetch(service: "subgraph-2") {
        {
          ... on Product {
            __typename
            newKey
          }
        } =>
        {
          ... on Product {
            __typename
            oldKey
          }
        }
      },
    },
    Parallel {
Flatten(path: "card.components.product") {
        Fetch(service: "subgraph-2") {
          {
            ... on Product {
              __typename
              newKey
            }
          } =>
          {
            ... on Product {
              __typename
              a
            }
          }
        },
      },
      Flatten(path: "card.components.product") {
        Fetch(service: "subgraph-3") {
          {
            ... on Product {
              __typename
              oldKey
            }
          } =>
          {
            ... on Product {
              __typename
              b
            }
          }
        },
      },
      Flatten(path: "card.components.product") {
        Fetch(service: "subgraph-4") {
          {
            ... on Product {
              __typename
              oldKey
            }
          } =>
          {
            ... on Product {
              __typename
              c
            }
          }
        },
      },
    },
  },
}

Actual behavior

Instead, Router does the following:

  • Goes to subgraph 1 to get blocks
  • Goes to subgraph 2 to convert newKey to oldKey. It is at this point overfetching occurs as Router pops field a out of the union and fetches it at the same time as the key conversion. In practice, this is resulting in a latency cost as field a represents a higher latency field that, in most cases, we do not want to fetch.
  • Goes to subgraph 3 or 4 based on the union return type.

Here's the query plan:

QueryPlan {
  Sequence {
    Fetch(service: "subgraph-1") {
      {
        card {
          components {
            __typename
            ... on ComponentA {
              __typename
              product {
                __typename
                newKey
              }
            }
            ... on ComponentB {
              __typename
              product {
                __typename
                newKey
              }
            }
            ... on ComponentC {
              __typename
              product {
                __typename
                newKey
              }
            }
          }
        }
      }
    },
    Flatten(path: "card.components.product") {
      Fetch(service: "subgraph-2") {
        {
          ... on Product {
            __typename
            newKey
          }
        } =>
        {
          ... on Product {
            __typename
            a
            oldKey
          }
        }
      },
    },
    Parallel {
      Flatten(path: "card.components.product") {
        Fetch(service: "subgraph-3") {
          {
            ... on Product {
              __typename
              oldKey
            }
          } =>
          {
            ... on Product {
              __typename
              b
            }
          }
        },
      },
      Flatten(path: "card.components.product") {
        Fetch(service: "subgraph-4") {
          {
            ... on Product {
              __typename
              oldKey
            }
          } =>
          {
            ... on Product {
              __typename
              c
            }
          }
        },
      },
    },
  },
}

Current Band-aid fixes

I have identified two band-aid fixes for this example. But, to be clear, the do not fix the problem. These are just two ways to try and force the query plan shape to change.

Option 1

If the oldKey in this situation is eliminated, then the query plan reflects the ideal state as there is no sequential fetch that causes a to pop out. It stops being always fetched. This is a challenging remedy as it requires architectural/data changes to support the new identifier.

Option 2

If field a was no longer part of Subgraph 2, but instead was moved to, say, Subgraph 5, then it would achieve the desired query plan effect. However, moving field a to, say, subgraph 3 would result in shifting the over fetching from every request to any request where the response included Component A or Component B. So it's not really a "fix" but just assertion of a small amount of influence over the query plan.

Summary

The crux of the issue is that router is overly eager to reduce subgraph requests at the cost of overfetching. This breaks a core tenant of GraphQL which is only getting what you ask for. The returned shape of the union does not dictate that field a is necessary, yet it is always fetched regardless. And this is only a single example of such behavior.

Ultimately, the Query planner needs to consider fragments on union members as unique units of the operation and should not pop fields out of those fragments. In some cases, yes it will result in additional requests that in some cases could have been avoided. But it's that or over fetching and, IMO, over fetching is the worse of two evils. This is until the query planner is able to include if/else conditions in it's static flow (which is another conversation for another day).

This is why I support elevating this higher than a P4 issue. This really does feel more like a P1 or P2 at the lowest because it will result in UI, UX, and very difficult to determine, quantify, and communicate schema design constraints. I'm already having conversations with teams about what features they need to cut, or what subgraphs need to be broken apart to try and influence the query plan to avoid this overfetching. That's not a concern that should be leaking into the domain separation.

@trevor-scheer trevor-scheer self-assigned this Sep 21, 2023
@gwardwell
Copy link

FWIW, I haven't confirmed this but I imagine the problem could surface for interface fragments as well. That should probably also be considered when solving this problem.

@hwillson hwillson assigned clenfest and unassigned trevor-scheer Nov 7, 2023
clenfest added a commit that referenced this issue Nov 30, 2023
…s that I need to go back and review to see if any are real problems. Also, I think that the initial fetch of `oldKey` in the added test may need to include __typename

Refs #2759
@clenfest clenfest added P2 An issue that needs to be addressed on a reasonable timescale. and removed P4 An issue that should be addressed eventually. labels Dec 5, 2023
@clenfest clenfest removed their assignment Jan 19, 2024
@rhino88
Copy link
Author

rhino88 commented Sep 11, 2024

The experimental_type_conditioned_fetching flag should fix this in router 1.46+

@dariuszkuc
Copy link
Member

👋 looks like we forgot to close this issue with the #2949 PR.

This should be available since fed v2.7.3 and supported by router version 1.46+. Thanks for the reminder!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 An issue that needs to be addressed on a reasonable timescale. sizing/large Rough estimation of effort required to implement sizing/medium Rough estimation of effort required to implement
Projects
None yet
Development

No branches or pull requests

7 participants