-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
encoding: add ScalarMarshaler and ScalarUnmarshaler #56235
Comments
A search through all public modules known by the module proxy showed no implementations of the |
Overall the proposal sounds like a good idea to me. The only surprising aspect is that MarshalText would usually take precedence over MarshalScalar. If I have a type which could marshal itself as either an integer or a string, I don't think I could implement both methods correctly, as one would always win. I guess I could instead implement MarshalJSON, but that is specific to a single encoding format. |
I think that's up to the encoding package in question. The proposal seems to just be suggesting that precedence for the |
Whichever way the precedence ends up between the two, I think implementing the case I mention above with MarshalText and MarshalScalar would not be possible. Maybe that limitation by design is fine - I am simply pointing it out. |
In my original proposal I argued against the inclusion of sub-64-bit integers since the value can be represented by the larger width kind. However, allowing smaller bit widths may be helpful for formats that encode smaller width integers differently. Analysis:
Thus, there is some utility (for formats like BSON) in supporting the smaller-width numeric kinds (e.g., int8, int16, int32, etc.). However, it is unclear whether the benefit is worth the complexity. Adding various bit widths will expand the set of scalar types by 8 (i.e., int8, int16, int32, uint8, uint16, uint32, float32, complex64). This will greatly increase the complexity of encoding libraries that need to check for these. If we don't expose the width here, that information can always be exposed in a side-channel mechanism. For example: type MyStruct struct {
NumProcessed atomic.Int32 `bson:",int32"`
} I'm still in favor of excluding the smaller width types, but wanted to call this caveat out. |
I thought more about precedence order between Consider the following example: // Color is a finite enumeration of supported colors.
type Color int
const (
Red Color = 1
Green Color = 2
Blue Color = 3
)
func (c Color) MarshalText() ([]byte, error)
func (c Color) MarshalScalar() (int64, error)
func (c *Color) UnmarshalText(b []byte) error
func (c *Color) UnmarshalScalar(v int64) error For most text-based serializations (e.g., JSON), you probably want to use When unmarshaling, you could imagine JSON using |
This proposal has been added to the active column of the proposals project |
I wonder, can we decide to add sub-64-bit integer types later? It's not clear to me whether that would be considered a breaking change given the type parameter.
That sounds like the right choice to me. As an implementer, I want to offer the encoding methods which makes sense for my type. I don't want to dictate which one should always be used over another. An encoding package can inspect which methods I implement and make a reasonable decision about which one to use, like yous ay. |
I'm not sure, but I think so? \cc @ianlancetaylor |
The only fashion by which constraints containing union elements are contravariant with their constituents is in the operation set. I.e., if adding a type to a union element creates a smaller operation set, then programs using that constraint can break. There are two reasons why this doesn't apply here:
If we remove both of these properties, it is possible to write programs which fail to compile when adding another type to the union: https://go.dev/play/p/fbi6-RzzspL, https://go.dev/play/p/CGnAxdndZSA (It is weird to think of untyped literals as "operations." I twisted myself up a couple times writing this.) |
I agree that in a case like this adding additional terms to the type union in the constraint should be fine. |
It sounds like people are generally happy with
What do we do after that? Specifically, it seems like we can either add support for them to marshalers (like encoding/gob), or we can add implementations to existing types in the standard library, but we can't do both, because that would likely change the wire format generated by some of these packages. So which do we do? It sounds like above all we want to add them to the marshalers (like gob and json (and xml?)) and then after that, we can add these methods to types that gob and json (and xml?) currently cannot marshal at all and therefore have no existing wire format. This would include atomic.Uint32 and so on. Do I have that right? Also, is encoding/json really going to do 10 extra dynamic interface satisfaction checks on every value it marshals and unmarshals? Will we notice the expense? |
@rsc Technically speaking adding them to the marshalers would be a backwards incompatible change, as types (not just in the stdlib) currently could have those methods and would then change how they are encoded. Mayhaps something like json.Decoder.UseNumber? That would be strictly backwards compatible and allow to implement the interfaces for types in the stdlib. Or we ignore the theoretical breakages in lieu of evidence of practical ones. |
I think |
@Merovius, per #56235 (comment), there were no detected methods of the proposed name according to all code known by the module proxy.
That's my plan for implementing this. The marshal path for "json" already has a |
@dsnet Fair enough (Next time I chide someone for not reading the entire discussion before commenting, feel free to remind me of this 🤦) |
No worries at all. The throughput of comments on golang/go issues and discussions is astonishing and unreasonable to expect every person to be at the same level of state. |
Moving this to likely accept, but it seems like we should wait for Go 1.21 to make sure we land the encoder updates and the new implementations of these methods together. |
Agreed. To be clear, acceptance of this proposal implies concurrent support within relevant "encoding/..." packages? This would include the following packages:
It is unclear whether to add support in "encoding/asn1" since that package currently does not support |
Given the existing lack of support for |
Based on the discussion above, this proposal seems like a likely accept. |
No change in consensus, so accepted. 🎉 |
See golang/go#56235 for more information.
Change https://go.dev/cl/553176 mentions this issue: |
Change https://go.dev/cl/553175 mentions this issue: |
I find this proposal valuable and didn't see any open CLs mentioned here, so I sent in CL 553175 defining the interfaces and CL 553176 implementing their use in encoding/json. I can do the same for encoding/xml and encoding/gob soon. |
Change https://go.dev/cl/568896 mentions this issue: |
Change https://go.dev/cl/569276 mentions this issue: |
There are often types that are fundamentally opaque wrappers around a scalar type. One such example is
atomic.Bool
and friends. In order to get these types to operate well with serialization libraries, they would need to implement one of theMarshaler
orUnmarshaler
interfaces (see #54582).The current interfaces in the "encoding" package fall short:
encoding.TextMarshaler
andencoding.TextUnmarshaler
only treat the type as if it were a string, and so can't natively represent a boolean or number.json.Marshaler
andjson.Unmarshaler
is specific to JSON, which means that only "encoding/json" benefits and not other encoding libraries (in stdlib such as "encoding/xml" or third-party packages).MarshalText
orMarshalJSON
always allocates, which is rather heavy-weight for something that is encoding a scalar.To work around these detriments, I propose the addition of the following interface into the "encoding" package:
The type list specifies exactly 5 types, which matches the superset kind of all the scalar kinds in Go:
~bool | ~int64 | ~uint64 | ~float64 | ~complex128
) to keep the possible set of concrete interfaces a finite set that encoding packages can actually check for.TextMarshaler
andTextUnmarshaler
, which are more or less the same thing.Many serialization formats have first-class support booleans and various numeric types, so this interface provides a way to express that fact.
It is a feature (not a bug) that the 5 possible representations of this interface all use the same method name. That is, for a given type, it can only choose to implement scalar representation for one of the scalar kinds. We do not need to worry what happens when a type implements both
MarshalScalar() (int64, error)
andMarshalScalar() (float64, error)
at the same time.If
MarshalJSON
andMarshalText
andMarshalScalar
are all present, it is the discretion of the encoding package to choose the precedence order. I recommendMarshalJSON
taking precedence overMarshalText
, taking precedence overMarshalScalar
.It is up to an encoding package to decide which of the scalar kinds it wants to support. For example, "encoding/json" would support scalar marshaling for bool, int64, uint64, and float64, but reject complex128 since JSON has no representation of complex numbers. This matches the existing "encoding/json" behavior where it can't serialize complex128.
\cc @johanbrandhorst @mvdan
The text was updated successfully, but these errors were encountered: