Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Memoize the printing of queries #2787

Closed
keeganstreet opened this issue Nov 8, 2022 · 10 comments · Fixed by #2847
Closed

RFC: Memoize the printing of queries #2787

keeganstreet opened this issue Nov 8, 2022 · 10 comments · Fixed by #2847
Labels
future 🔮 An enhancement or feature proposal that will be addressed after the next release

Comments

@keeganstreet
Copy link

keeganstreet commented Nov 8, 2022

Summary

✌️ @jian-ong

We note that the formatDocument util uses memoization to avoid walking over the same query AST multiple times:

https://github.com/FormidableLabs/urql/blob/30ecefe9f69a4c33849f6c9c43bf5083b35878c2/packages/core/src/utils/typenames.ts#L64

In our application we have observed a similar bottleneck with the print function from the graphql module. Many pages use the same query with different variables, but they are all having to print the exact same query.

Proposed Solution

We propose adding a Map to the makeFetchBody function to memoize printing of GraphQL queries in a similar way to the formatDocument util.

Example implementation:
7edb1e2

We are keen to hear your feedback on this. Are there other use-cases where this would be a problem, for example queries being constructed dynamically at runtime, resulting in different query keys, or is that not a supported use-case?

@keeganstreet keeganstreet added the future 🔮 An enhancement or feature proposal that will be addressed after the next release label Nov 8, 2022
@JoviDeCroock
Copy link
Collaborator

The thing that I fear here is exchanges that alter the query i.e. adding/removing directives/adding __typename won't recalculate the key as that is what lines them up with what the client thinks is the truth 😅 That's currently one of the main fears I would have for memoizing by key you would always print the original query which in the case of the fetchExchange probably isn't desired.

@keeganstreet
Copy link
Author

Hey Jovi 👋

I know the cache exchange and graphcache exchange insert __typename fields into a query, but they are not inserted conditionally - they are always inserted, so the AST input and string output for a given query would always be the same.

This may be my misunderstanding of directives but I thought they were typically always included in a query but then evaluated based on the value of a variable. But are you saying that exchanges might conditionally alter a query to add or remove directives? Wouldn't that already present a problem for the existing formattedDocs cache object used by the cache exchange and graphcache exchange?

@JoviDeCroock
Copy link
Collaborator

JoviDeCroock commented Nov 8, 2022

I was mainly referring to what if someone writes an exchange that looks at a query string and looks for @local as a directive, this is a directive used to communicate that we want to resolve this field on the client and not send it to the server as it might be a computed field or result in an error. We currently don't have such an exchange but tweaking print in the fetchExchange would for instance completely remove the possibility of someone creating this.

@keeganstreet
Copy link
Author

Hey Jovi, could you please help me understand this issue?

Let's call this hypothetical exchange Local Exchange. If the Local Exchange removes @local directives from the query, by the time the query is piped through to the Fetch Exchange, the @local directives would already be gone. The Fetch Exchange would be printing a query AST without @local directives to a string on every operation, as this is the printed query that needs to be sent to the server. So where is the harm in memoizing that print function?

It seems like it would already be problematic today for an exchange to conditionally alter a query if that exchange is positioned before the cache or graphcache exchange. These exchanges pass the query through the formatDocument util which already memoizes the mutation it performs on the query. So I can't quite see how memoizing the print of the query in the fetch exchange would be any different to this.

@kitten
Copy link
Member

kitten commented Nov 9, 2022

Note: See Edit below; I didn't realise, we didn't end up using stringifyDocument in the fetch body helper

This is already implemented. I think the concern here was to accidentally reuse printed strings of ASTs that have been mutated rather than cloned, but eventually we discovered that the possibility of this is rather irrelevant, and I believe this was then introduced without much disruption.

See: https://github.com/FormidableLabs/urql/blob/30ecefe9f69a4c33849f6c9c43bf5083b35878c2/packages/core/src/utils/request.ts#L30-L61

As per this, the result of the print call is stored on (and read back from) node.loc.source.body, which is the "canonical" place for us to put it.

Furthermore, parsing is already pretty optimised to prevent parsing the same string multiple times as much as possible. That said, we'd assume that in optimised applications where this process is necessary, ASTs would already exist as pre-parsed JSON objects and this would become irrelevant. In that case, we indeed store the key as is computed by its reference.

Lastly, formatDocument prevents duplicate work in a similar manner by using our internal __key property: https://github.com/FormidableLabs/urql/blob/30ecefe9f69a4c33849f6c9c43bf5083b35878c2/packages/core/src/utils/typenames.ts#L66-L89

This property was explicitly chosen and is used to make sure that any spread/cloned AST would still yield the same result, making any "wrong" stringification obvious while also surfacing how we'd expect the AST to at least have a new root DocumentNode reference when it's cloned.

So, I'm inclined to close this as "completed" 😅

On a side note, having done extensive benchmarks on almost every part of graphql's client-side code as part of research we did that went into graphql-web-lite (which alters several of its components to reduce size), graphql.js's runtime characteristics are heavily optimised. Given enough time, the v8 JIT will keep runtime cost low by a tremendous margin. This may not always apply to first execution (and it's an interesting thought that I'll need to put some time into researching by disabling Turbofan in some benchmarks; then again, this would then be different in other browsers), but it's currently not a big source of concern for me personally, given that we already have put in some time to make sure we avoid duplicate work ✌️

Edit: Ah, I just read your change on: https://github.com/FormidableLabs/urql/blob/30ecefe9f69a4c33849f6c9c43bf5083b35878c2/packages/core/src/internal/fetchOptions.ts#L17

Hm, I believe the reason why we had to remove this was due to this protection not originally existing: https://github.com/FormidableLabs/urql/blob/30ecefe9f69a4c33849f6c9c43bf5083b35878c2/packages/core/src/utils/request.ts#L24-L28

But these days it should be save to replace with stringifyDocument 🤔 I'll throw up a branch and we can verify that.

The only problem I see here is that this is a breaking change. I'm not really willing to duplicate a small cache for the print output rather than reusing stringifyDocument, since it really would do the same job, however, currently we add the document name to the output for computing a key of the query because we originally encountered issues with graphql-tag/loader which generated invalid loc.source strings: #1315

We could add an extra check for node.loc.source.name being our own tag ("gql") and potentially just not trust the string otherwise to be fair

Edit 2: This could look like the following: main...fix/stringify-in-body

But this will potentially introduce extra work if any other GraphQL utility happens to try to prevent future print work by providing loc.source.body specifically "for urql", so to speak

Edit 3: (Yes, a lot of edits down the line)
This is potentially also breaking the persistedFetchExchange here: https://github.com/FormidableLabs/urql/blob/30ecefe9f69a4c33849f6c9c43bf5083b35878c2/exchanges/persisted-fetch/src/persistedFetchExchange.ts#L72-L76

So, it may be simplest to just expose an extra way for us to reuse stringifyDocument's output, which may in turn increase memory usage. But, I suppose, that's the only way to get the canonical print output without modifications that may alter what ends up being hashed, unless I'm missing something.

@keeganstreet
Copy link
Author

keeganstreet commented Nov 9, 2022

Thanks for all that info @kitten!

This is potentially also breaking the persistedFetchExchange

How would this break the persisted fetch exchange? I understand that the persisted fetch exchange hashes the request body query, and we need to have deterministic hashes for APQ. But changing makeFetchBody to call stringifyDocument instead of print directly should still return the same output right?

Edit: also, we're using graphql-tag/loader, and we like being able to generate ASTs at build time instead of runtime, even if they are heavily cached at runtime. So it would be a shame if the memoisation of the print function only works when node.loc.source.name is "gql". I guess #1315 was not a problem for us because we have a different GraphQL document per query.

@kitten
Copy link
Member

kitten commented Nov 9, 2022

How would this break the persisted fetch exchange?
But changing makeFetchBody to call stringifyDocument instead of print directly should still return the same output right?

Because we'd need to hash the output of a parsed then printed AST, i.e. a normalised output. However, stringifyDocument doesn't output a normalised output right now, so I'd have to look into why we did that and when that becomes an issue.

So it would be a shame if the memoisation of the print function only works when node.loc.source.name is "gql"

Just to clarify what I said, the memoisation would still work — however, we wouldn't be able to trust the loc.source string as it's potentially incorrect when specifically graphql-tag/loader outputs it. It's a terribly old and frankly poorly maintained library, and I think it's reached its EOL, but its popularity forces us to consider some of its shortcomings.

Also, even in your setup, as you're describing it, I'd suspect loc.source is still not equivalent to printing the AST you've got. That's essentially the problem.

@keeganstreet
Copy link
Author

Hi @kitten, shall I close this issue and open a bug issue instead for clarity, linked to this?

@kitten
Copy link
Member

kitten commented Dec 1, 2022

I made some progress on this and found a stable set of redactors that should satisfy all the requirements we have.

However, there'll be plenty of moving parts to test, e.g.

  • integrations with APIs that support Automatic Persisted Queries
  • Formatting regressions and changes shouldn't be introduced (we'll have to check that)
  • Validating that GET requests still format vaguely the same
  • General testing with tooling like GCG and graphql-tag (the latter for legacy support)

After that's done this should be good to go

@keeganstreet
Copy link
Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
future 🔮 An enhancement or feature proposal that will be addressed after the next release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants