Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Scalar serialize as built-in scalar type #326

Closed
wants to merge 1 commit into from

Conversation

leebyron
Copy link
Collaborator

Currently, Custom Scalars only describe their name and no other behavior about them can be programmatically determined. However in practice most custom scalar types are specializations of one of the existing built-in scalar types (String, Int, Float, etc.).

This proposes adding one additional piece of metadata to custom scalars, a "serializes as" type which must be a built-in scalar type. This serialize as type describes what kinds of values are allowed to be provided as input to that type and what kind of value is expected to serialize from that type. This metadata is useful for code-generation systems which expect to generate code in well typed environments. It can also improve the validation accuracy and IDE ergonomics by providing a similar level of information about custom scalars as exist for the built-in scalars.

To provide this metadata to those kinds of tools, this also extends the introspection system to return the serialize as type in the ofType field.

This spec text has a reference implementation here: graphql/graphql-js#914

Currently, Custom Scalars only describe their name and no other behavior about them can be programmatically determined. However in practice most custom scalar types are specializations of one of the existing built-in scalar types (String, Int, Float, etc.).

This proposes adding one additional piece of metadata to custom scalars, a "serializes as" type which must be a built-in scalar type. This serialize as type describes what kinds of values are allowed to be provided as input to that type and what kind of value is expected to serialize from that type. This metadata is useful for code-generation systems which expect to generate code in well typed environments. It can also improve the validation accuracy and IDE ergonomics by providing a similar level of information about custom scalars as exist for the built-in scalars.

To provide this metadata to those kinds of tools, this also extends the introspection system to return the serialize as type in the `ofType` field.
@stubailo
Copy link
Contributor

a "serializes as" type which must be a built-in scalar type.

What if I'm using a transport which supports additional types, like ordered maps or something, which aren't one of the built in types? Also, this wouldn't support the currently widely used JSON scalar type.

I think this is a great proposal, but it's too restrictive if all scalars have to be serializable in this way. It would be awesome if it were optional, so you could get default type generation but only for the types that have this specified.

@IvanGoncharov
Copy link
Member

@leebyron Great PR 👍

I think this is a great proposal, but it's too restrictive if all scalars have to be serializable in this way.

@stubailo According to this PR "serialize as" is fully optional:

A custom scalar MAY provide a built-in scalar type which describes how the custom scalar is serialized.

So it doesn't introduce any breaking changes and you can still define custom scalar as previously, e.g. scalar JSON. Moreover, I think restricting to base types is a great improvement over my proposal in #308 since it solves the problem of deeply nested scalars without introducing new entity.

Without this restriction, GraphQL schema can have arbitrarily long chains of scalars like AvatarUrl => ImageUrl => Url => String which is great in theory but puts additional overhead on authors of libraries and tools.

@OlegIlyenko
Copy link
Contributor

in general i like this addition, Though I have concerns about this part:

if one is provided it must be one of the built-in scalar types (Int, Float, String, Boolean, ID).

Do you think it would make sence to relax this rule and allow custom scalar values to be defined in terms of other custom scalar values? I do see that it can simplify this feature from client perspective, but there is still some value to be gained when server is able to describe nested constraints.

That said, i just checked how I defined ScalarAlias in sangria, and it turs out that it does not allow nested dependencies either (so you can't define a scalar alias in terms of another scalar alias) :) Though scalar aliases serve a bit different purpose.

On the other hand, if we don't allow nested definitions, then i'm a buit lost. From this description:

as the real value extracted by doing this is to allow clients to understand the serialized response. In that sense, we do actually want to consider this in terms of the semantic scalar values.

I undestood that the whole point of this feature is to focus on the sematic meaning of the scalar value. By restricting it to only built-in scalars, we greatly deminish it's value, in my opinion (built-in scalars do not provide much of veriety). In this case I feel that it actually would be better to expose literal information instead. This has an a few advantages:

  • we would be able to describe all built-in types as well as any custom scalar type
  • base type question is not relevant anymore sice we only have a limited set of literals

Still not 100% about soundness of my argument. Would love to hear your thoughts on this :)

@stubailo
Copy link
Contributor

Sorry, I totally misread! Thanks for the clarification @IvanGoncharov

@leebyron
Copy link
Collaborator Author

@OlegIlyenko

Do you think it would make sence to relax this rule and allow custom scalar values to be defined in terms of other custom scalar values? I do see that it can simplify this feature from client perspective, but there is still some value to be gained when server is able to describe nested constraints.

I think @IvanGoncharov made a good point above about this. Allowing custom scalars to be defined in terms of other custom scalars essentially produces a chain, where the end of that chain is either something that client tooling understands or something it does not. Ultimately that tooling needs to resolve the chain and determine if it has the knowledge to do anything more specific.

The motivation for restricting to the built-in scalar types is that any other type a client tool won't know what to do with, and biasing towards a simpler solution protects us from incidental complexity in the future. But I'm definitely open to discussion on this. I would be curious to hear of an example where the nested constraints would be useful for solving a problem which would weigh in favor of a slightly more complicated solution.

I undestood that the whole point of this feature is to focus on the sematic meaning of the scalar value. By restricting it to only built-in scalars, we greatly deminish it's value, in my opinion (built-in scalars do not provide much of veriety). In this case I feel that it actually would be better to expose literal information instead.

I do think there are two related but different problems here.

The one that drove this proposal is client-side tooling which seek to understand how to treat custom scalars. What should client tooling do with a URL or DateTime or the like? Explaining how they serialize in terms of existing scalars allows client tools to only need to understand the built-in scalars and via this addition understand how to treat nearly all other custom scalars as well. This can also handle literals as well, that anything which serializes as String should accept String literals.

A different but related problem is how to define custom scalars which actually expose new serialization behavior. For example a Long 64-bit int value. It would not be correct to say that this serializes as Int, however it would be useful to say that it accepts INT literals to incrementally improve validation.

My opinion is that the first problem is more important to solve since client code-gen has become important for nearly all popular GraphQL client frameworks (Apollo, Relay, and FB's native apps), where defining custom scalars in terms of the literals they accept rather than how they serialize would not help.

The second problem is interesting in its ability to allow GraphQL to grow to support more kinds of serialization behavior, but is not as much a problem encountered by clients now and to solve would likely require more than just describing the literals that type accepts.

Is that a fair characterization?

@OlegIlyenko
Copy link
Contributor

Yes, this sounds good! After thinking about it fo a while, I also think that it is a good idea to approach this from a practical perspective. Approaching it from a client and codegen perspective is quite helpful. At least for numbers, these tools need to know the actual size of a number, the literal information is not very helpful here. So even limited subset of semantic information is already quite helpful.

This gives me another idea: should we distinguish between input and output base type? for a Long it would be quite helpful - it can declare itself as an Int-compatible on the input side, but undefined on the output side. the same is for BigInt and BigDecimal. Not sure whether this information is worth added complexity. if a codegen tool encounters such type, it probably will need a dedicated mapping anyways to fully support such scalar type.

@wincent
Copy link
Contributor

wincent commented Jun 20, 2017

I know this is an early draft, but from the word-smithery perspective some of the wording here is a bit awkward. In a few places we have this structure:

Custom scalars may provide a built-in scalar type it serializes as in ofType, which describe primary data format and scalar coercion rules.

The "which it serializes as" pattern is a bit of a mouthful and I suspect may be harder to read for English-as-a-second-language readers. I'd suggest reordering these to avoid the trailing "as". In the example above, for instance, we could say:

Custom scalars may indicate how they serialize by returning a built-in scalar type from ofType.

(Or similar.)

Likewise:

If a custom scalar provides a built-in scalar type which it serializes as, ...

could become:

If a custom scalar serializes as a built-in scalar type, ...

@IvanGoncharov
Copy link
Member

IvanGoncharov commented Nov 29, 2017

As we discussed during the last WG meetup there was no major objection to this proposal.

However, I want to raise my concern about SDL syntax. With the proposed change you will be able to define custom scalars in two ways:

  1. scalar SomeScalarType - to define scalars with new serialization behavior e.g. 64-bit integers.
  2. scalar SomeScalarType as String - to define domain-specific scalars as HTML, CreditCardNumber, etc. as serializable to one of the standard scalars.

I'm concerned that developers may use the first form (because it shorter) for the cases where the second form is the correct choice.
I think it's bad to create syntax that encourages developers to define scalars with custom serialization since in most of the cases it's not what developer intend. Again, I think the first form would be used because it short and because you need to know that the second form exists.

Also, I think the majority of APIs either don't need scalar with custom serialization behavior or they use scalars provided by GraphQL implementations (e.g. Sangria's Long).

So there is an issue: the short form (scalar SomeScalarType) is used for more complex and infrequent case and the long form (scalar SomeScalarType as String) is used for the simple and very common case.

I expect the following example scenario:

  1. a developer defines API with scalars like scalar Quantity and assumes it's integer >= 0. But he doesn't know about as Int part and everything works just fine without it.
  2. runs some codegen tool that assumes that either:
  • Quantity is a string (e.g. apollo-ios)
  • produces error
  1. developer either decides that defining scalars was a bad idea and just uses Int instead or
    spends time googling just to find out about the second form.

I strongly believe that the syntax for defining scalars with the new serialization behavior should signalize (like dangerouslySetInnerHTML in React) that by doing so, you create problems to code generation, some of the tools, etc

I'm still not sure about how alternative syntax may look like but here are a few of the possible variants:

custom SomeScalarType
custom scalar SomeScalarType
scalar SomeScalarType asCustom
scalar SomeScalarType as Custom
scalar SomeScalarType as Any

Would be great if someone can come up with a better one.

I think it is still possible to introduce such changes to the SDL because it's not merged into the spec yet and because Descriptions as strings PR also introduces breaking change.

What do you think about it?

P.S. @leebyron Would be great to discuss it before Descriptions as strings PR is merged.

@schickling
Copy link

Is there any further movement on this @IvanGoncharov? I remember you championing this. 👑

@IvanGoncharov
Copy link
Member

@schickling Thanks for ping 👍
I pretty happy with PR and it's current state.
It's look like next step would be implementing it inside graphql-js.
I rebased graphql/graphql-js#914 as graphql/graphql-js#1173 and will work on adding more tests.

But I'm still concern about SDL problem I described in my previous comment and think it should be disscussed before SDL syntax will be set in stone.
What do you think about it and proposed solution?

@IvanGoncharov
Copy link
Member

As suggested by @leebyron during last WG I'm adopting this PR as #521

@leebyron leebyron deleted the rfc-scalar-as branch July 1, 2022 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💭 Strawman (RFC 0) RFC Stage 0 (See CONTRIBUTING.md)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants