Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

any way to let int64 marshal to int not string? #438

Closed
jinleileiking opened this issue Jul 18, 2017 · 35 comments
Closed

any way to let int64 marshal to int not string? #438

jinleileiking opened this issue Jul 18, 2017 · 35 comments

Comments

@jinleileiking
Copy link

#296

But I really need this feature.

@jinleileiking
Copy link
Author

use float

@achew22
Copy link
Collaborator

achew22 commented Jul 18, 2017

You can also use int32 if you want your data to be represented as an int still.

@jinleileiking
Copy link
Author

@achew22 my data is bigger than int32. Can int32 work?

@achew22
Copy link
Collaborator

achew22 commented Jul 24, 2017

If your data is bigger than an int32 then probably not but it depends on your usecase. Talk around with your team and see if there may be another strategy for attacking the problem (or use a float)

@jinleileiking
Copy link
Author

I just want to set a unixtime(epoch) which is a int64

@achew22
Copy link
Collaborator

achew22 commented Jul 25, 2017

If you want to store a time, you should use the well known type Timestamp. This will correctly get marshalled to a Date object if you're using a swagger generated client.

https://developers.google.com/protocol-buffers/docs/reference/csharp/class/google/protobuf/well-known-types/timestamp

@Raviaryacsc
Copy link

What is the reasoning behind marshaling the int64, uint64 to string instead of number. unit32, float, doubles are marshaled as numbers. Is not it a bug which should be fixed?
Suggestion of using timestamp for epoch is not going to work.

@achew22
Copy link
Collaborator

achew22 commented Sep 20, 2017

To partially quote an earlier issue comment:

gRPC gateway follows the serialization rules of proto3 and the proto3 spec says that int64, fixed64, and uint64 are all strings.

This is the case because of IEEE 754 Double precision which JS implements using a 52 bit mantissa (1 for sign, and 11 for exponent). This means the largest number that JS can represent without losing precision is 2^52. Try it out:

(Math.pow(2,53) + 0) == (Math.pow(2,53) + 1)
true

So, rather than pretending that Javascript can handle it (and lose data) proto3 prefers to have the client interact with strings. Fewer footguns == better libraries.

If you want to interact with int64 as numbers, there are a bunch of JS bigint libraries out there that you can use to parse the string, perform whatever operations you want and get back a string. That would probably be a better way to handle this.

That is why I am suggesting using timestamp for epoch, since it was designed to work around the issue of the 52 bit mantissa.

@Raviaryacsc
Copy link

Proto3 to JSON Mapping:
int64, uint64 ---> String
float, double ----> number.
Does it seems right? I see point in your reply however should not this logic should apply to floating point, doubles instead of 64 bit integers?

@achew22
Copy link
Collaborator

achew22 commented Sep 20, 2017

A float (or single as it is sometimes called) as defined in proto as a 32 bit sequence:

  • Sign bit: 1 bit
  • Exponent width: 8 bits
  • Significand precision: 24 bits (23 explicitly stored)

So it can only faithfully hold values up to 2^23, which is smaller than 2^52 no problem there. "Single precision is termed REAL in Fortran, float in C, C++, C#, Java, Float in Haskell, and Single in Object Pascal (Delphi), Visual Basic, and MATLAB. However, float in Python, Ruby, PHP, and OCaml and single in versions of Octave before 3.2 refer to double-precision numbers. In most implementations of PostScript, the only real precision is single." Javascript does not have an implementation of single.

While a Double is defined as:

  • Sign bit: 1 bit
  • Exponent: 11 bits
  • Significand precision: 53 bits (52 explicitly stored)

So it can faithfully hold values up to 2^52 which isn't a coincidence because it was specified by the ECMAScript standard. "All arithmetic in JavaScript shall be done using double-precision floating-point arithmetic."

All quotes from the linked wikipedia page.

@Raviaryacsc
Copy link

Exactly. Double too fits in that requirement and should be JSONed as string instead of number why it is not ?
Also, JSON does not defined that Unit64 should be represented as string in JSON. it is a shortcoming of JS and not JSON. We can create the JSON with big numbers by many ways and JS is going to have problem with that. Then why this trick is done in Proto3 to Json mapping?

Today JSON is used with all sort of different languages and not just ties to JS.
I am fine with writing Un-Marshaller to convert String to Unit64 based on the field Name and Type and continue with my stuff but I see this is not a logical mapping.

@achew22
Copy link
Collaborator

achew22 commented Sep 20, 2017

Exactly. Double too fits in that requirement and should be JSONed as string instead of number why it is not ?

No, the double definition used in proto is the same float64/double definition used by javascript, IEE754 double. There is no need to convert since they are representing the same data the same way with the same edge cases.

Also, JSON does not defined that Unit64 should be represented as string in JSON. it is a shortcoming of JS and not JSON. We can create the JSON with big numbers by many ways and JS is going to have problem with that. Then why this trick is done in Proto3 to Json mapping?

Unfortunately the JSON (JavaScript Object Notation) spec defines everything relative to number so it does suffer from the same problem as Javascript since it is, definitionally, Javascript.

Today JSON is used with all sort of different languages and not just ties to JS.
I am fine with writing Un-Marshaller to convert String to Unit64 based on the field Name and Type and continue with my stuff but I see this is not a logical mapping.

You are always welcome to write a custom marshaller and if you think this is the best behavior for you then you should go for it. I would just advise you that you need to inspect all of your client libraries implementations to make sure they are not adhering to the JSON spec, which again has every number defined as a IEEE754 double precision number. If you fail to verify their incorrect behavior then you will be in a world of hurt and data loss. Every browser will do the correct thing and destroy your data.

However, if you're taking the time to write this in a non-browser environment, why not just use the gRPC generated client code in the language of your choosing? There are generators for most every language.

@kurin
Copy link

kurin commented Jul 17, 2018

I'd like to revisit/reopen this if possible. While the proto3 spec defines the int64/uint64 json representations as strings, and the above discussion points out the good reasons for that, the fact is that very often json is used as an interchange format between parties who aren't using javascript.

Anyone moving to a protobuf service definition from a different system is probably going to hit a particularly difficult snag when they need to send timestamps, file sizes, or any other data that can reasonably expected to grow beyond a 32 bit representation.

For my part, I had to copy and paste not only the jsonpb package (and then remove seven lines), but I had to copy a lot of the runtime package as well.

I don't know how to provide this behavior without also modifying jsonpb (and having to make my case there as well), but would you be open to providing a toggle that would enable this behavior?

@johanbrandhorst
Copy link
Collaborator

If you're using JSON as an interchange format, why not just use float64 in your proto definitions?

@kurin
Copy link

kurin commented Jul 18, 2018

In the specific scenario in which I find myself, I am emulating an already-existing service whose API I can't modify, and sometimes I really do need to send integers that are >2^54 or whatever it is (for example, identifiers sampled uniformly over uint64). But also it's semantically gross.

(I am actually a little bit surprised that json parsers will barf when given a "string" instead of an "int"; Go's encoding/json can be told to expect a value in quotes, but will then fail if it comes without quotes. In the interests of client compatibility, this isn't something I want to have to worry about.)

@johanbrandhorst
Copy link
Collaborator

You can't send values like that in valid JSON. It seems to me you must quote it or hope your marshaller supports invalid JSON?

@kurin
Copy link

kurin commented Jul 19, 2018

That's basically what I'm saying, I haven't done a survey but I strongly suspect that most non-JS marshallers are like "yeah that's an int it's fine." Also a quick peek at the RFC seems also to suggest that implementations are free to do whatever they want with the "number" type.

I'm not saying it's a great idea to do this, but it's a better than copying and modifying packages locally. Right now there are json endpoints that are emitting large ints as json numbers, and unless we know that all json clients can accept strings or numbers without modification (and we know that for encoding/json in fact they can't) then people are going to be doing end-runs around this part of the implementation no matter how well-meaning it is.

@johanbrandhorst
Copy link
Collaborator

johanbrandhorst commented Jul 19, 2018

I have to disagree with the assertion that this scenario affects many users. You are of course welcome to submit a PR with your suggested changes but I don't think this is something that affects many users, and as such I doubt anyone of the maintainers will be keen to pick this up. My sympathies that this is out of your control, but have you considered some alternative solution like a pre-gateway number transformer handler? Might be as simple as running a regex over the incoming strings and quoting large numerical sequences.

@DoctorQ
Copy link

DoctorQ commented Aug 3, 2018

DANGER DANGER DANGER THIS WILL SILENTLY DESTROY YOUR DATA IF YOU FOLLOW THIS ADVICE

https://github.com/golang/protobuf/blob/master/jsonpb/jsonpb.go

	needToQuote := string(b[0]) != `"` && (v.Kind() == reflect.Int64 || v.Kind() == reflect.Uint64)
	if needToQuote {
		out.write(`"`)
	}
	out.write(string(b))
	if needToQuote {
		out.write(`"`)
	}

DANGER DANGER DANGER THIS WILL STILL SILENTLY DESTROY YOUR DATA

add needToQuote = false

then

DANGER DANGER DANGER ARE YOU STILL IGNORING THE WARNINGS? THIS REALLY WILL DELETE DATA WITHOUT WARNING YOU

	needToQuote := string(b[0]) != `"` && (v.Kind() == reflect.Int64 || v.Kind() == reflect.Uint64)
        needToQuote = false
	if needToQuote {
		out.write(`"`)
	}
	out.write(string(b))
	if needToQuote {
		out.write(`"`)
	}

DANGER DANGER DANGER NO, SERIOUSLY -- THIS WILL BREAK YOU IN UNANTICIPATED WAYS THAT ARE INCREDIBLY DIFFICULT TO DIAGNOSE.

for in need

@johanbrandhorst
Copy link
Collaborator

For posterity, don't do what @DoctorQ suggests, use uint32 or don't use JSON at all when working with integers larger than 53 bits.

@kurin
Copy link

kurin commented Aug 3, 2018

I mean, again, we don't always have the luxury of those choices. There is a large number of existing JSON APIs that use JSON numbers for integers with more than 53 bits. The standard json package even encodes int64 as a number. Switching away from JSON, or encoding those values as strings, would break users.

@achew22
Copy link
Collaborator

achew22 commented Aug 3, 2018

@kurin, what is the client that is calling you written in?

@kurin
Copy link

kurin commented Aug 5, 2018

There are multiple clients in many languages, but to pick two I know of: Java, Go.

@achew22
Copy link
Collaborator

achew22 commented Aug 5, 2018

If that's the case then could you use the grpc client directly? You won't have any of these kinds of problems if you're using the grpc generated client library directly.

@kurin
Copy link

kurin commented Aug 5, 2018

I'm sorry, I think there's some confusion in this thread about the kinds of ways in which API creators and implementors can get locked into an API design.

I am in the middle of implementing the server side of a JSON API I do not own and cannot change. The API is defined by a third party with whom I have no say, and one of the parts of the definition is that numbers (such as files sizes into the many TB) are JSON numbers.

However, even if I could change the JSON definition, it would not be possible to do so without breaking the many, many, many clients that have already been written to target this API, some of which are on github, but many of which are possibly proprietary.

But even if I could contact all the owners of client implementations, telling them to change their client behavior to support a third party service is a complete non-starter.

And even if I could change the official implementation and all the API docs, it would still break all the clients that all work today, and be a terrible thing to do.

I know that the proto3 spec specifies that 64 bit ints get marshalled as strings, and I know that this is because javascript implements numbers as floats, but please believe me when I say that even in the face of all of that, @DoctorQ's solution is sometimes more correct than trying to "fix" the API.

@johanbrandhorst
Copy link
Collaborator

I proposed another solution too: you can wrap the gRPC gateway mux in your own handler that just converts large number sequences to quoted sequences. It would just be a simple http.Handler with some regex parsing perhaps? Or just manually scan and find large numbers.

@achew22
Copy link
Collaborator

achew22 commented Aug 5, 2018

I know that the proto3 spec specifies that 64 bit ints get marshalled as strings, and I know that this is because javascript implements numbers as floats,

It sounds to me like the argument should be a float64. If they specified that the API is a JSON number then that == a float64. Since it sounds like this is an open source API, maybe you could share which one it is. I'm curious about what open-source APIs have IDs that exceed 52^2.

@kurin
Copy link

kurin commented Aug 5, 2018

I'm not going to lie, having to use float64 instead of a more semantically appropriate type to hold integer valued data because a different programming language that I don't use is broken makes me a little bit resentful.

I don't think (but I'm not sure) that any of my integer values goes above 2^53; maybe they will fit in floats.

But this is really, IMO, better served by giving users a knob, even if the knob has a warning label on it.

@achew22
Copy link
Collaborator

achew22 commented Aug 6, 2018

@kurin,

I'm not going to lie, having to use float64 instead of a more semantically appropriate type to hold integer valued data because a different programming language that I don't use is broken makes me a little bit resentful.

I think that is fair. Being displeased because of these circumstance is not ideal is well justified.

Unfortunately, the reality of the world we live in is that JSON hasn't aged well compared with alternative marshalling formats and is relatively restricted due to it. It's great that it existed in 1999 and it has propelled a lot of development over the years, but I, for one, am glad to see it shuffle off this mortal coil. Since it is a schemaless exchange format you're stuck dealing with the lowest common denominator of clients that receive it. Originally it was created for use in Javascript, so I guess it's a bit fitting that Javascript is what holds it back.

In this particular case if you look at the ECMA-404 (JSON spec) for their definition of a "number" you will see this:

A number is a sequence of decimal digits with no superfluous leading zero. It may have a preceding minus sign (U+002D). It may have a fractional part prefixed by a decimal point (U+002E). It may have an exponent, prefixed by e (U+0065) or E (U+0045) and optionally + (U+002B) or – (U+002D). The digits are the code points U+0030 through U+0039.

That's it.

Orly! No limits on how many decimals are allowed to be included? No limit on how far down the decimal places there are going to be? Thank you, JSON, for forcing me to load "math/big" in Go or java.math.BigDecimal so that I can interpret this random JSON blob. How should we handle the situation where someone sends us a 600 digit number with 600 sigfigs? Proto has to draw a line somewhere. I can easily imagine another issue being about how protobuf returns a big.Float which is inconvenient to use in golang instead of the much more reasonable int64 that it currently returns.

I don't think (but I'm not sure) that any of my integer values goes above 2^53; maybe they will fit in floats.

Fingers crossed, I ran into this issue at a company I worked at before and we had to go through the migration from 52bit ints to strings. It was horrible, just horrible.

We are trying to map JSON based REST concepts into proto3. If there are things where JSON is more flexible than proto3, then we either have to expand proto3's/our understanding to contain more stuff (see #712 for an example of that in combination with googleapis/googleapis#512), or conform the old API to the new specs. I don't really see another way to do that -- something has to give when there is inconsistency between two APIs.

But this is really, IMO, better served by giving users a knob, even if the knob has a warning label on it.

I would like to take this chance to give you that knob. If you want to change the way that grpc-gateway marshalls/unmarshalls, you can implement the your own marshaller and unmarshaller. It shouldn't take more than a few hours to put something together that handles your use case. You could even put it up on github and share it around. For more docs on this, it is the first entry in our FAQ.

Maybe you could help me flesh that out with more context/content so future people don't get quite this frustrated. What do you think about that plan?

@kurin
Copy link

kurin commented Aug 6, 2018

The thing I actually ended up doing was more or less identical to what @DoctorQ suggested above: kurin/blazer@ee030c6. (And note that this commit is as old as this conversation; I'm not advocating for my specific use-case, I'm pointing out that this is sometimes going to be the least worst thing to do.)

But doing this will cause my copy of jsonpb to fall behind the released version. Mine is a very quick-and-dirty implementation, so I don't really care, but I feel like @johanbrandhorst's response disregards tradeoffs that could be rationally made. (e.g. for two Go endpoints, for example, trading large ints via floats will introduce strictly more risk of silently casting too large an int.) But any custom marshaller is going to effectively do the thing that has big warning signs all over it here. If we want the hoops that users have to jump through to get that to take hours and be the rather sizable yak that that would be, well, okay. But if we agree that, despite the risks, it is a thing users should sometimes do, I would want to make it easy to do.

@dustinxie
Copy link

dustinxie commented May 11, 2019

I can understand that the proto3 spec says that int64, fixed64, and uint64 are marshaled to strings.
On the other hand, grpc-gateway is a proxy service designed to translate gRPC <==> REST according to proto definition. It would be great to have grpc-gateway correctly marshal to uint64.

To give an example, the following proto message
message Seq {
uint64 seq = 1;
}
marshals 123456 to "123456", no problem -- that's the proto3 spec.

However, note that grpc-gateway service has the above proto definition. In this case, I would expect to receive a JSON object {"seq":123456}, instead of {"seq":"123456"} -- because grpc-gateway knows the field is uint64, it has the capability to generate correct JSON, and it should follow the proto definition

@achew22
Copy link
Collaborator

achew22 commented May 12, 2019

@dustinxie if your numbers are in the neighborhood of 6 digits, int32 can faithfully represent numbers up to 2,147,483,648 positive and negative and it will be typed as a number in the result.

@dustinxie
Copy link

@achew22 thanks for the reply and suggestion.

the unpleasant side of the thing is that the service has already been deployed out there for some time, and historic data persisted to DB on disks. If you change it to int32/float64, it would be a breaking change for existing data. I think many users are facing the same compatibility issue.

grpc-gateway is truly a great convenience, it has made us effortless setting REST proxy for our gRPC service, and I am amazed to see this is accomplished by a core team of only 6 members.

Currently I am using a strconv.ParseInt() to convert string to uint64 -- no big deal. But IMHO this could (and should) be done naturally by grpc-gateway internally, it would make it an elegant solution -- everything follows the proto definition, no gimmicks, no tricks :)

@johanbrandhorst
Copy link
Collaborator

All you'd need to accomplish this is a custom marshaller. We're very unlikely to make this change, but for users who are desperate you could simply fork jsonpb and change the way it handles large integers.

@achew22
Copy link
Collaborator

achew22 commented May 12, 2019

grpc-gateway is truly a great convenience, it has made us effortless setting REST proxy for our gRPC service, and I am amazed to see this is accomplished by a core team of only 6 members.

Thank you so much for the kind words. I'm really proud of all the work that has gone into this project. The adoption and community that has grown has been astounding and I'm very proud to be a part of it.

no big deal. But IMHO this could (and should) be done naturally by grpc-gateway internally, it would make it an elegant solution -- everything follows the proto definition, no gimmicks, no tricks :)

To be very explicit about this, the grpc-gateway default marshaller does not make any determinations on how to serialize anything. The code is here for anyone to review. We don't make any editorial decisions about what the structure of your data should be, we call the go-proto team's marshaller. After all, it is the canonical translator. As a result we quite literally have no ability to modify this. If you would like to create an issue against the golang/protobuf or protocolbuffers/protobuf repo to have them change their interchange format specification, you're very welcome to. Since we depend on their implementation, this repo is simply not the correct place to raise these concerns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants