Observations on type primitives #5

Closed
webron opened this issue Mar 15, 2014 · 8 comments

Comments


webron commented Mar 15, 2014

Issue by quasipedia from Sunday Mar 02, 2014 at 00:16 GMT
Originally opened as https://github.com/wordnik/swagger-docs/issues/2


While I understand the present specification is an attempt at documenting how swagger works today, I thought the following observations could be of some use at least for discussion / version 1.3.

number

The JSON schema specification says for this primitive:

Any JSON number. Number includes integer.

This possibly means that number should also have the int32 and int64 formats available (as they are valid), and that eventually the integer type could be killed altogether.

date & date-time

In my opinion these are not formats. A format would be ISO 8601 (which specifies both date and date-time), for example. Mixing date and date-time with int32 and int64 is however IMO mixing oranges and apples: the former define the semantics of the string, the latter their boundaries.

In general

I believe this section may be a candidate for review: since in JSON the number is encoded as its human-readable representation, there is no need to attach information about the number of bits or the signed/unsigned property.

If the thought was to provide the client with information about the boundaries of the values, this would be better achieved with explicit maximum and minimum fields, and possibly with a resolution field giving the last significant digit of a result (a far more interesting and useful property to know in math applications).
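A minimal sketch of this suggestion in Python: the bounds are stated as explicit `minimum` and `maximum` fields rather than implied by a bit width. The `resolution` field and both helper functions are hypothetical names used only for illustration; they are not part of Swagger or JSON Schema.

```python
from decimal import Decimal

# Hypothetical parameter description: explicit bounds, no int32/int64 hint.
# "resolution" is an assumed field name, not an existing Swagger keyword.
PERCENTAGE = {"type": "number", "minimum": 0, "maximum": 100, "resolution": "0.01"}


def check_bounds(value, spec):
    """Return True if value lies within the explicit minimum/maximum of spec."""
    low = spec.get("minimum")
    high = spec.get("maximum")
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


def round_to_resolution(value, spec):
    """Round value to the last significant digit named by the 'resolution' field."""
    step = Decimal(spec["resolution"])
    return Decimal(str(value)).quantize(step)
```

With a description like this, a client learns the valid range and precision directly, without inferring either from a storage type.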


webron commented Mar 15, 2014

Comment by webron from Sunday Mar 02, 2014 at 14:18 GMT


Regarding number - the JSON Spec also describes an integer data type specifically, and since most development languages have an integer distinction, I think we'll end up keeping that.

Regarding the date and date-time, we also follow the JSON Schema - http://json-schema.org/latest/json-schema-validation.html#anchor108. Granted, it only describes date-time, and we need to better clarify what date-time is (based on the JSON Schema), but as the schema says, it is an extension of a string value.

As for the number of bits and the signed/unsigned property - I disagree with you. Again, most development languages distinguish between 32-bit and 64-bit numbers, and the min/max limitations are meant as validation restrictions, not type restrictions. Granted, you could let the client choose whatever data type it likes and, if you want to limit a value to 32 bits, always use the min/max values to denote that, but I think that would lead to a messier swagger spec output than simply denoting the type (again, as most development languages do anyway).


webron commented Mar 15, 2014

Comment by quasipedia from Sunday Mar 02, 2014 at 23:02 GMT


@webron - Thanks for looking into this.

number - integer

I'd be happy to have integer kept as a type because it is a type. Here the initial bad design is that of the JSON-Schema, as it defines a type (integer) without defining other commonly used ones, using the kitchen sink number for everything else.

The additional problem that the swagger specs introduce in their present form is that they take number and treat it as if it were float, disallowing the integer formats on it.

If the API were consistent, all valid formats for integer would automatically be available on number as well, as integer is a subset of number, not an alternative.

Bit length & sign

Maybe in the end we will simply have to disagree. :) However, let me try one last time to convince you of why having int32 or double as formats might be seen as poor specification design:

  1. They are not formats. float, double, int64 are data types. A format describes the way data is represented, as is the case for date and date-time (assuming those will be unequivocally tied to an unambiguous formatting, such as the ISO 8601 standard; otherwise they are semantic markers).
  2. The fact many languages have distinct types for int32 and int64 or byte and long is no good reason per se to have this information sent over as part of the data structure. By analogy, since most languages have distinctive types for UTF-8 and byte strings, then you should have a format distinguishing those too. The point I am trying to make here is this: the reason why in languages such as C you have to declare if the type is signed/unsigned is that otherwise you might misinterpret the data (128 → 255 for example). However the JSON representation already takes care of this and numbers are unequivocally clear. So the information about how that number was stored on the system that originated the response is not needed.
  3. Using a type to infer the boundaries of the data relies on a number of assumptions that might just turn out to be wrong. You could for example have a system that is returning a percentage in the form of an integer between 0 and 100, but may have obtained that by using double precision floats. I understand where the idea used in swagger comes from, but "choosing a type to match the boundaries of what that variable will possibly be during program execution" (what programmers routinely do) does not necessarily mean that the type will be optimised relative to the final value of the variable.
  4. Suggesting what data type to use on the target system is a problem that should lie outside the scope of an API-format specification. All you want to achieve with an API-format is to make sure data is correctly interpreted / validated / consumed in the exchange between machines. It is none of an API-format's business to try to optimise the memory consumption for numbers on either machine. If this is an important aspect of a given business case, this should be taken care of by the specific API implementation, not by the API-format specifications.
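
To illustrate the kind of misreading point 2 refers to, here is a small Python sketch. It reads one raw byte under a signed and an unsigned declaration, then parses the same quantity from JSON text. The byte value `0xFF` is chosen purely for illustration.

```python
import json
import struct

# One raw byte, as a C program might hold it in memory.
raw = bytes([0xFF])

# The same bits read under two different type declarations.
as_unsigned = struct.unpack("B", raw)[0]  # unsigned char
as_signed = struct.unpack("b", raw)[0]    # signed char

# In JSON the value travels as decimal text, so there is only one reading.
from_json = json.loads("255")

# The textual form also carries values wider than 64 bits without ambiguity;
# Python's json module parses them into arbitrary-precision integers.
big = json.loads("18446744073709551616")
```

The binary byte needs a type declaration to be interpreted; the JSON text does not.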

If you are still not convinced...

...please at least consider these marginal improvements to the present situation:

  1. If you really want to send the "internal data type of the originating system" in the API, you could at least dedicate a specific, aptly named attribute for that (examples I can think of: machine-hint, C-datatype, bit-storage, ...), keeping the format attribute... well... for formats! :)
  2. Change the formulation of the sentence:

If the format field is used, the respective client MUST conform to the elaborate type.

into:

When the machine-hint field is used, the respective client MAY reliably discard the possibility that transmitted data won't fit the hinted type.

In other words: the only thing a client MUST do, is to be able to accept an integer of any size if the machine-hint is not given for integer (and the same for floating point numbers).

Peace & Love!
/mac


webron commented Mar 15, 2014

Comment by webron from Monday Mar 03, 2014 at 12:37 GMT


I disagree with a few of the points you raise (especially the format reference, as one can argue that the number of bits is a format specification), but in general I see the point you're trying to make.

The thing is this - we can hide behind technical jargon all we want but let's look at a practical point of view for a moment.

Say I write a REST API in.. say.. Java. Now, if I go and declare an API operation that accepts an integer (int32 in this case), if a client sent me a long (int64) number (that is, something that's more than int32) that would simply not work and I would end up throwing some error.
Yes, I can go and declare in every single operation the upper and lower limit, and yes, it's machine generated and read so it's not really that much of a hassle, but it does make the resulting spec description much more cluttered.
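
A rough Python sketch of the trade-off described here, assuming the signed 32-bit range of a Java `int`. The helper `fits_int32` and the two dictionaries are illustrative only and are not taken from any Swagger tooling.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value):
    """Return True if value is an integer inside the signed 32-bit range."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return INT32_MIN <= value <= INT32_MAX


# Compact form: the range is implied by the format.
with_format = {"type": "integer", "format": "int32"}

# Explicit form: the same range spelled out on the parameter itself.
with_bounds = {"type": "integer", "minimum": INT32_MIN, "maximum": INT32_MAX}
```

Both descriptions admit the same set of values; the difference is only in how much text each parameter carries.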

Regarding the usage of number for integers, there's absolutely no problem there. Floating point numbers include integers. The only issue is that number doesn't enforce the value to be an integer, which in many cases you'd want.

I did check a few API descriptions out there, and I consider them to be a problem in many cases. Some state that a field is a number but it's obvious from the field description that only an integer would be valid. They normally don't say what will happen if you send the wrong format (that is, a float value). In my book, that's a poor API design. Sure, they can throw an error, but if you're trying to mechanize the api description, it should be done properly.
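
As a small Python sketch of that distinction: a check for number accepts both integers and floats, while a check for integer rejects a float value. The two helper names are hypothetical, and treating `3.0` as a non-integer is a policy choice made for this sketch.

```python
import json


def is_integer_value(value):
    """Return True only for JSON numbers that parsed as whole integers."""
    return isinstance(value, int) and not isinstance(value, bool)


def is_number_value(value):
    """Return True for any JSON number, integer or not."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


whole = json.loads("3")         # parsed as int
fractional = json.loads("3.5")  # parsed as float
```

A description that says number but means integer leaves the `3.5` case undefined for the client.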

That said, I do think you raise valid points, and would love to hear the input of others as well.


webron commented Mar 15, 2014

Comment by quasipedia from Monday Mar 03, 2014 at 21:25 GMT


Hi @webron!

...as I said... in the end we might have to end up agreeing to disagree. :)

I don't think that having to specify limits (if you want to do just that) does clutter the API, honestly. For example, in the framework I created in Python, I mapped all native datatypes (both the JSON-Schema's and Python's ones) to classes which are subclassed by a generic Model class. So everywhere in my code, I can define variables for the endpoints as for example Float('The first parameter', min=0, max=1).

The framework itself will do the heavy lifting of creating a model with all the right object attributes, and - most importantly - all that a user inspecting that specific operation will read will be "type": "Float". Since the model Float is defined at the very bottom of the document, this solution does not clutter the document at all, or that's my intention, anyhow. It also does not require you to "go and declare in every single operation the upper and lower limit", as you put it, as the model is shared among the parameters using it (so you only have one Int64 model, but you can have 42 different endpoints using an Int64 parameter).
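
The framework itself is not shown in this thread, so the following Python sketch is only a guess at its general shape. Every class and method name here is an assumption, as is the way bounds are attached to a model; only the call `Float('The first parameter', min=0, max=1)` comes from the comment above.

```python
class Model:
    """Hypothetical base class: a description plus optional explicit bounds."""

    json_type = "number"

    # 'min' and 'max' shadow the builtins on purpose, to match the quoted call.
    def __init__(self, description, min=None, max=None):
        self.description = description
        self.min = min
        self.max = max

    def reference(self):
        """What an operation shows for this parameter: just the model name."""
        return {"type": type(self).__name__, "description": self.description}

    def definition(self):
        """The model definition, meant to be emitted once in the models section."""
        spec = {"id": type(self).__name__, "type": self.json_type}
        if self.min is not None:
            spec["minimum"] = self.min
        if self.max is not None:
            spec["maximum"] = self.max
        return spec


class Float(Model):
    json_type = "number"


class Int64(Model):
    json_type = "integer"

    def __init__(self, description):
        # Bounds are fixed by the model, so every Int64 parameter shares them.
        super().__init__(description, min=-2**63, max=2**63 - 1)


first = Float('The first parameter', min=0, max=1)
```

Here an operation would list only `first.reference()`, while the bounds live in `first.definition()`.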

Also, if you look at RAML and Blueprint, you will appreciate their approach is similar to the one I am advocating for (not my solution with classes, but my idea that an API specification is not the right place for stuff like int64 & co., and boundaries should be specified with min/max attributes instead).

And with this, your honour, I rest my case! [but I too would be interested to hear what other people think about this]

Best regards!

fehguy added a commit that referenced this issue Sep 8, 2014

webron commented Mar 27, 2016

Parent issue: #579.


webron commented Mar 27, 2016

Potentially related to #607.


webron commented Jul 21, 2016

Tackling PR: #741


webron commented Feb 22, 2017

I'm closing this as a fairly old ticket. While it has merit, at the moment we're pretty much following what JSON Schema does. We may explore that again in the future.
