Overfetching on `union` types with conditional fragments in query operation #2759

rhino88 · 2023-08-31T15:37:36Z

Issue Description

When including conditional fragments on a union field the resolvers for fragments appear to be requested unconditionally from the router.

In the example below, the resolver for Product#c is invoked when the conditional fragment for ... on ComponentC is not needed given a response without any instances of ComponentC from the card service. Removing the conditional fragment for ... on ComponentC prevents the Product#c value from being requested, as expected.

Our production router is currently using v1.20.0 but we also confirmed that this issue exists on v1.28.0. This also impacts the gateway as seen in the reproduction link.

Subgraph A Schema(Card)

 extend schema
      @link(
        url: "https://specs.apollo.dev/federation/v2.3"
        import: ["@key"]
      )

    type Query {
      card: ProductCard
    }

    type ProductCard @key(fields: "id") {
      id: ID!
      components: Component
    }

    type Product @key(fields: "id") {
      id: ID!
    }

    union Component = ComponentA | ComponentB | ComponentC

    type ComponentA {
      a: String
      product: Product
    }

    type ComponentB {
      b: String
      product: Product
    }

    type ComponentC {
      c: String
      product: Product
    }

Subgraph B Schema (Product)

     extend schema
      @link(
        url: "https://specs.apollo.dev/federation/v2.3"
        import: ["@key", "@interfaceObject"]
      )

    type Product @key(fields: "id") {
      id: ID!
      a: String!
      b: String!
      c: String!
    }

Query

query Foo {
  card {
    components {
      __typename
      ... on ComponentA {
          product {
            id
            a
          }
        }
        ... on ComponentB {
          product {
            id
            b
          }
        }
        ... on ComponentC {
          product {
            id
            c
          }
        }
    }
  }
}

Query Plan

QueryPlan {
  Sequence {
    Fetch(service: "card") {
      {
        card {
          components {
            __typename
            ... on ComponentA {
              product {
                __typename
                id
              }
            }
            ... on ComponentB {
              product {
                __typename
                id
              }
            }
            ... on ComponentC {
              product {
                __typename
                id
              }
            }
          }
        }
      }
    },
    Flatten(path: "card.components.product") {
      Fetch(service: "product") {
        {
          ... on Product {
            __typename
            id
          }
        } =>
        {
          ... on Product {
            a
            b
            c
          }
        }
      },
    },
  },
}

Response

{
  "data": {
    "card": {
      "components": {
        "__typename": "ComponentA",
        "product": {
          "id": "product1",
          "a": "A1"
        }
      }
    }
  }
}

Link to Reproduction

https://codesandbox.io/p/sandbox/eloquent-christian-kcs989

Reproduction Steps

Run the query above, and notice the C resolver -- BAD and B resolver -- BAD in the terminal output.

The text was updated successfully, but these errors were encountered:

korinne · 2023-09-01T18:05:44Z

Thanks for raising this request! Our team has added the item to our backlog, but it does require a bit more design and thought before we can prioritize it. We will definitely keep this issue updated once it's been officially prioritized, and we are also open to community contributions.

XtofX · 2023-09-01T20:35:35Z

from our stand point, this issue would deserve a better priority that P4 as it affects one fundamental aspect of graphql (prevent overfetching). The resolution of this issue by Apollo is very important to us at Wayfair.

gwardwell · 2023-09-02T02:23:56Z

We're seeing a latency impact from the query planner "popping" fields from union fragments (which should be conditional on the type being present in the actual result set) up into a sequential fetch and over fetching. The reproduction looks something like this:

# Subgraph 1
type Query {
  card: ProductCard
}

type ProductCard @key(fields: "id") {
  id: ID!
  components: Component
}

type Product @key(fields: "newKey") {
  newKey: ID!
}

union Component = ComponentA | ComponentB | ComponentC

type ComponentA {
  a: String
  product: Product
}

type ComponentB {
  b: String
  product: Product
}

type ComponentC {
  c: String
  product: Product
}

# Subgraph 2
type Product @key(fields: "oldKey") @key(fields: "newKey") {
  oldKey: ID!
  newKey: ID!
  a: String!
}

# Subgraph 3
type Product @key(fields: "oldKey") {
  oldKey: ID!
  b: String!
}

# Subgraph 4
type Product @key(fields: "oldKey") {
  oldKey: ID!
  c: String!
}

# Query
query Foo {
  card {
    components {
      ... on ComponentA {
        __typename
        product {
          __typename
          a
        }
      }
      ... on ComponentB {
        __typename
        product {
          __typename
          b
        }
      }
      ... on ComponentC {
        __typename
        product {
          __typename
          c
        }
      }
    }
  }
}

Expected behavior

The expectation is that Router would:

Go to subgraph 1 to get blocks
Go to Subgraph 2 to convert newKey to oldKey
Conditionally go to Subgraph 2, 3, or 4 to get the a, b, and c fields based on the type returned in blocks
- This expectation is because, be definition, the union is of a limited number of unique types without overlap. Regardless of whether underlying fields in the union have overlap, the union members themselves do not and should be treated as unique objects. The expectation is that Router would craft a static query plan allowing it to pivot to the correct union member resolution since they represent unique outcomes.

Here's the expected query plan (or something similar to this is expected):

QueryPlan {
  Sequence {
    Fetch(service: "subgraph-1") {
      {
        card {
          components {
            __typename
            ... on ComponentA {
              __typename
              product {
                __typename
                newKey
              }
            }
            ... on ComponentB {
              __typename
              product {
                __typename
                newKey
              }
            }
            ... on ComponentC {
              __typename
              product {
                __typename
                newKey
              }
            }
          }
        }
      }
    },
    Flatten(path: "card.components.product") {
      Fetch(service: "subgraph-2") {
        {
          ... on Product {
            __typename
            newKey
          }
        } =>
        {
          ... on Product {
            __typename
            oldKey
          }
        }
      },
    },
    Parallel {
Flatten(path: "card.components.product") {
        Fetch(service: "subgraph-2") {
          {
            ... on Product {
              __typename
              newKey
            }
          } =>
          {
            ... on Product {
              __typename
              a
            }
          }
        },
      },
      Flatten(path: "card.components.product") {
        Fetch(service: "subgraph-3") {
          {
            ... on Product {
              __typename
              oldKey
            }
          } =>
          {
            ... on Product {
              __typename
              b
            }
          }
        },
      },
      Flatten(path: "card.components.product") {
        Fetch(service: "subgraph-4") {
          {
            ... on Product {
              __typename
              oldKey
            }
          } =>
          {
            ... on Product {
              __typename
              c
            }
          }
        },
      },
    },
  },
}

Actual behavior

Instead, Router does the following:

Goes to subgraph 1 to get blocks
Goes to subgraph 2 to convert newKey to oldKey. It is at this point overfetching occurs as Router pops field a out of the union and fetches it at the same time as the key conversion. In practice, this is resulting in a latency cost as field a represents a higher latency field that, in most cases, we do not want to fetch.
Goes to subgraph 3 or 4 based on the union return type.

Here's the query plan:

QueryPlan {
  Sequence {
    Fetch(service: "subgraph-1") {
      {
        card {
          components {
            __typename
            ... on ComponentA {
              __typename
              product {
                __typename
                newKey
              }
            }
            ... on ComponentB {
              __typename
              product {
                __typename
                newKey
              }
            }
            ... on ComponentC {
              __typename
              product {
                __typename
                newKey
              }
            }
          }
        }
      }
    },
    Flatten(path: "card.components.product") {
      Fetch(service: "subgraph-2") {
        {
          ... on Product {
            __typename
            newKey
          }
        } =>
        {
          ... on Product {
            __typename
            a
            oldKey
          }
        }
      },
    },
    Parallel {
      Flatten(path: "card.components.product") {
        Fetch(service: "subgraph-3") {
          {
            ... on Product {
              __typename
              oldKey
            }
          } =>
          {
            ... on Product {
              __typename
              b
            }
          }
        },
      },
      Flatten(path: "card.components.product") {
        Fetch(service: "subgraph-4") {
          {
            ... on Product {
              __typename
              oldKey
            }
          } =>
          {
            ... on Product {
              __typename
              c
            }
          }
        },
      },
    },
  },
}

Current Band-aid fixes

I have identified two band-aid fixes for this example. But, to be clear, the do not fix the problem. These are just two ways to try and force the query plan shape to change.

Option 1

If the oldKey in this situation is eliminated, then the query plan reflects the ideal state as there is no sequential fetch that causes a to pop out. It stops being always fetched. This is a challenging remedy as it requires architectural/data changes to support the new identifier.

Option 2

If field a was no longer part of Subgraph 2, but instead was moved to, say, Subgraph 5, then it would achieve the desired query plan effect. However, moving field a to, say, subgraph 3 would result in shifting the over fetching from every request to any request where the response included Component A or Component B. So it's not really a "fix" but just assertion of a small amount of influence over the query plan.

Summary

The crux of the issue is that router is overly eager to reduce subgraph requests at the cost of overfetching. This breaks a core tenant of GraphQL which is only getting what you ask for. The returned shape of the union does not dictate that field a is necessary, yet it is always fetched regardless. And this is only a single example of such behavior.

Ultimately, the Query planner needs to consider fragments on union members as unique units of the operation and should not pop fields out of those fragments. In some cases, yes it will result in additional requests that in some cases could have been avoided. But it's that or over fetching and, IMO, over fetching is the worse of two evils. This is until the query planner is able to include if/else conditions in it's static flow (which is another conversation for another day).

This is why I support elevating this higher than a P4 issue. This really does feel more like a P1 or P2 at the lowest because it will result in UI, UX, and very difficult to determine, quantify, and communicate schema design constraints. I'm already having conversations with teams about what features they need to cut, or what subgraphs need to be broken apart to try and influence the query plan to avoid this overfetching. That's not a concern that should be leaking into the domain separation.

gwardwell · 2023-09-22T13:32:18Z

FWIW, I haven't confirmed this but I imagine the problem could surface for interface fragments as well. That should probably also be considered when solving this problem.

…s that I need to go back and review to see if any are real problems. Also, I think that the initial fetch of `oldKey` in the added test may need to include __typename Refs #2759

rhino88 · 2024-09-11T20:47:18Z

The experimental_type_conditioned_fetching flag should fix this in router 1.46+

dariuszkuc · 2024-09-11T21:46:19Z

👋 looks like we forgot to close this issue with the #2949 PR.

This should be available since fed v2.7.3 and supported by router version 1.46+. Thanks for the reminder!

rhino88 changed the title ~~Router overfetches on union types with conditional fragments in query operation~~ Overfetching on union types with conditional fragments in query operation Aug 31, 2023

korinne assigned trevor-scheer and unassigned trevor-scheer Sep 1, 2023

korinne added P4 An issue that should be addressed eventually. sizing/medium Rough estimation of effort required to implement sizing/large Rough estimation of effort required to implement labels Sep 1, 2023

trevor-scheer self-assigned this Sep 21, 2023

hwillson assigned clenfest and unassigned trevor-scheer Nov 7, 2023

clenfest added P2 An issue that needs to be addressed on a reasonable timescale. and removed P4 An issue that should be addressed eventually. labels Dec 5, 2023

aliosmanisikk mentioned this issue Dec 8, 2023

@interfaceObject error on interface inline fragments #2405

Open

clenfest removed their assignment Jan 19, 2024

dariuszkuc closed this as completed Sep 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Overfetching on `union` types with conditional fragments in query operation #2759

Overfetching on `union` types with conditional fragments in query operation #2759

rhino88 commented Aug 31, 2023 •

edited

Loading

korinne commented Sep 1, 2023

XtofX commented Sep 1, 2023

gwardwell commented Sep 2, 2023

gwardwell commented Sep 22, 2023

rhino88 commented Sep 11, 2024 •

edited

Loading

dariuszkuc commented Sep 11, 2024

Overfetching on union types with conditional fragments in query operation #2759

Overfetching on union types with conditional fragments in query operation #2759

Comments

rhino88 commented Aug 31, 2023 • edited Loading

Issue Description

Link to Reproduction

Reproduction Steps

korinne commented Sep 1, 2023

XtofX commented Sep 1, 2023

gwardwell commented Sep 2, 2023

Expected behavior

Actual behavior

Current Band-aid fixes

Option 1

Option 2

Summary

gwardwell commented Sep 22, 2023

rhino88 commented Sep 11, 2024 • edited Loading

dariuszkuc commented Sep 11, 2024

Overfetching on `union` types with conditional fragments in query operation #2759

Overfetching on `union` types with conditional fragments in query operation #2759

rhino88 commented Aug 31, 2023 •

edited

Loading

rhino88 commented Sep 11, 2024 •

edited

Loading