Remove "method", URI input with "hrefSchema" (object version) #179

Closed
wants to merge 1 commit into from

handrews
Contributor

@handrews handrews commented Dec 5, 2016

[EDIT: I was originally calling this "hrefVars" because in an earlier incarnation it was not a schema. But now it is, so I've edited everything to refer to "hrefSchema", which makes much more sense. However, much of the discussion still refers to hrefVars; just mentally substitute hrefSchema, please.]


This is an alternative to #159 that avoids special rules for processing "hrefSchema" schemas and instead applies a normal JSON Schema to the user input. This is both more powerful and more consistent than #159.

@jdesrosiers @Anthropic @slurmulon it would be great to get your input on this in addition to @awwright and @Relequestual

In the most recent draft, "method" simply controlled whether link
input was placed in the URI (for a value of "get") or in the request
body (for a value of "post"). While correlating with HTML, this
was both confusing, because "get" and "post" do not necessarily indicate
the HTTP methods of the same name, and limiting, in that users/clients
could not submit data through both the URI and the request body
at the same time.

This introduces "hrefSchema" which provides a schema for user input
matching the "href" URI Template variables. It removes "method"
and makes "schema" and "encType" unambiguously apply to the request
body in all cases.

Clients should choose when to use the request body submission based
on the rules of the protocol given by the URI scheme and the semantics
indicated by the link relation.

Additionally, the complex and apparently rarely if ever used
preprocessing rules have been removed, and both the resulting and
pre-existing limitations have been documented. They will be the
subject of work for future drafts.

The meta-schemas have been updated accordingly, and the LDO schema has
been brought over from the web site repo and updated. Improper
use of "$ref" has been fixed with "allOf".

@Relequestual Relequestual added this to the draft-6 (next draft) milestone Dec 5, 2016
@awwright awwright modified the milestones: draft-next (draft-6), Meta-schema draft-05 Dec 11, 2016
@jdesrosiers
Member

I kinda like the symmetry between LDOs and HTML Forms. But it is undeniable that it is confusing even for the relatively well informed.

I don't think it is important to be able to specify data in the URL and in the request body at the same time. Any time I have been faced with this limitation, I found myself happier with the design once I had worked around the problem. I've come to think of specifying data in the URL and in the request body at the same time as a design smell. However, I don't have a problem if others find it useful.

The changes in the PR don't seem to match with the description. It adds the hrefVars keyword, but doesn't remove method. There are no updates to the spec, just the meta schema. Specifically, it is not clear what preprocessing rules you are proposing to remove.

@handrews
Contributor Author

@jdesrosiers you need to click "load diff" under "jsonschema-hyperschema.xml" in order to see the main part of the diff. I don't know why it does this- I got confused earlier today myself.

However, I did accidentally drop the removal of method from the meta-schemas when I rebased the change this morning, so I've fixed that. It still makes you click to load the XML file diff, though.

@handrews
Contributor Author

handrews commented Dec 13, 2016

I don't think it is important to be able to specify data in the URL and in the request body at the same time. Any time I have been faced with this limitation, I found myself happier with the design once I had worked around the problem. I've come to think of specifying data in the URL and in the request body at the same time as a design smell.

Could you elaborate on this?

I use the URI template variables (not just URL query parameters) for identifying resources, while the body is for the representation. Sometimes there is overlap (if a database id is used in the URI, it's usually also in the body) but often there is not.

Note that "specify data" does not necessarily mean that a human is inputting the data. When driving a system with HATEOAS it is common to need to map data from resource A's representation into URI template variables and/or representation fields of related resource B.

I'm trying to remember a specific use case- one had to do with hierarchical configurations, the parent's id (which needed to go in the URI for any request on the parent) was in the child representation as "parentId" or something like that. So if you had a child representation and you needed to update the parent, the link relation instructing you how to do that required that you fill in the parentId into the URI template while passing the updated representation on a PUT request.

I can write that out in actual schema if that will help.

@Anthropic
Collaborator

@handrews I thought I had mentioned somewhere, but can't find the evidence I did, that I felt a more generic name like substitute rather than hrefVars would be more future proof if we want to ensure the same substitution ability for any other fields in future.
ps. just checking you got my email yesterday?

@handrews
Contributor Author

handrews commented Dec 13, 2016

[EDIT: changed hrefVars to hrefSchema]

Here's what that general idea looks like in schema, @jdesrosiers

{
    "type": "object",
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "parentId": {"type": "integer", "minimum": 1},
        "stuff": {"type": "string"}
    },
    "links": [
        {
            "rel": "self",
            "href": "/foos/{id}"
        },
        {
            "rel": "up",
            "href": "/foos/{parentId}"
        }
    ]
}

So if I have a child C, and I want to set its parent's "stuff" field to "lorem ipsum whatever", then I need to fill out the "up" link with the child's "parentId" and do either of:

  • GET the parent, change "stuff" to "lorem ipsum whatever", and PUT it back
  • PATCH the parent to change "stuff" to "lorem ipsum whatever"

In the GET case I first use child data to fill out the parent link relation, and then separately make use of the body. But the URI is still the one I filled out with the child data. In the PATCH case I am actually making use of both mechanisms in one request.
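
To make the PATCH case concrete, here is a minimal Python sketch of filling out the "up" link from child data and pairing it with a separate body. The child values, the helper names (`resolve_href`, `find_link`), and the merge-patch-style body are assumptions made for illustration; only plain `{name}` template variables are handled.

```python
import re

# Hypothetical child instance that validates against the schema above.
child = {"id": 42, "parentId": 7, "stuff": "child stuff"}

links = [
    {"rel": "self", "href": "/foos/{id}"},
    {"rel": "up", "href": "/foos/{parentId}"},
]


def resolve_href(href, instance):
    """Fill plain {name} template variables from the instance.

    Full URI Template (RFC 6570) support and percent-encoding are
    deliberately left out of this sketch.
    """
    return re.sub(r"\{(\w+)\}", lambda m: str(instance[m.group(1)]), href)


def find_link(links, rel):
    """Return the first link with the given relation."""
    return next(link for link in links if link["rel"] == rel)


# The URI comes from the child's data...
parent_uri = resolve_href(find_link(links, "up")["href"], child)

# ...while the body is supplied separately (assumed merge-patch-like shape).
patch_request = {
    "method": "PATCH",
    "uri": parent_uri,
    "body": {"stuff": "lorem ipsum whatever"},
}
```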


Now consider the possibility that I just have an id for one of these "foo" things, and I know that I want to change that foo's "stuff" to "lorem ipsum whatever". Suppose my root (entry point) resource looks something like this:

{
    "links": [
        {
            "rel": "tag:example.com,2016:foos",
            "href": "/foos/{id}",
            "hrefSchema": {
                "type": "object",
                "properties": {
                    "id": {
                        "type": "integer",
                        "minimum": 1
                    }
                }
            }
        }
    ]
}

There is no representation of the root resource (or it's an empty object or something). It's just there to be a place to find the initial set of links. So if I already know which "foo" I need and what its new "stuff" should be, then my "id" input goes into the URI template while my "stuff" input goes into the request body. There is no other way to accomplish this: it is absolutely wrong to put "stuff" in the URI, and it's not possible to identify the resource without putting "id" in the URI.
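
A rough Python sketch of that split, assuming a client that checks the "id" input against "hrefSchema" before expanding the template and treats the body as a separate input. The validator here is a tiny hand-written stand-in that understands only "type": "integer" and "minimum"; `check_href_input` and `build_request` are hypothetical names, not part of any library or of the spec.

```python
import re

ldo = {
    "rel": "tag:example.com,2016:foos",
    "href": "/foos/{id}",
    "hrefSchema": {
        "type": "object",
        "properties": {"id": {"type": "integer", "minimum": 1}},
    },
}


def check_href_input(href_schema, data):
    """Stand-in for a real JSON Schema validator (integer/minimum only)."""
    errors = []
    for name, rules in href_schema.get("properties", {}).items():
        if name not in data:
            continue
        value = data[name]
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if rules.get("type") == "integer" and not is_int:
            errors.append(f"{name}: expected integer")
            continue
        if "minimum" in rules and is_int and value < rules["minimum"]:
            errors.append(f"{name}: below minimum {rules['minimum']}")
    return errors


def build_request(ldo, href_input, body):
    """URI variables come from href_input; the body is kept separate."""
    errors = check_href_input(ldo["hrefSchema"], href_input)
    if errors:
        raise ValueError("; ".join(errors))
    # A missing variable raises KeyError; fine for a sketch.
    uri = re.sub(r"\{(\w+)\}", lambda m: str(href_input[m.group(1)]), ldo["href"])
    return {"uri": uri, "body": body}


request = build_request(ldo, {"id": 12}, {"stuff": "lorem ipsum whatever"})
```

Here "id" only ever reaches the URI and "stuff" only ever reaches the body, which is the separation described above.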

@handrews
Contributor Author

handrews commented Dec 13, 2016

@Anthropic That was probably in #159 which was an earlier, more confusing version of this idea.

I ripped off the "hrefVars" name from Mark Nottingham's JSON Home project, in case anyone was wondering. I'd consider "substitute" if you can sell me on that more general idea, but I don't recall being convinced yet :-) [EDIT: But now I think hrefSchema makes the most sense, assuming an href-specific name]

I did get your email, will follow up tonight or tomorrow, too many other distractions today for me to look into the things you referenced.

@jdesrosiers
Member

Could you elaborate on this?

It's hard to explain because it's more of an intuition than something I can strongly defend. That is why I have no problem with others doing this if they find it useful. I wish I had an example at hand, but I don't. My main reason for shying away from this is that I think it is a sign that I am trying to do too many things at once. In other words, the task could be broken down into two steps, one that gets the URI data from the user and another that gets request body data from the user. This separation can lead to more requests, but it can also lead to better caching and more flexible use of the API. Unfortunately it's been so long since I put thought into this particular issue that I remember the conclusions I came to more clearly than the reasons for those conclusions.

Note that "specify data" does not necessarily mean that a human is inputting the data. When driving a system with HATEOAS it is common to need to map data from resource A's representation into URI template variables and/or representation fields of related resource B.

Agreed. I don't have a problem with that. In fact, you couldn't do much useful without it. But, that isn't something that JSON Hyper-Schema can't do without hrefVars. What I specifically try to avoid (personally, not necessarily for the spec) is user input being used to construct the URI and the request body at the same time.

Regarding your example, I see what you are doing, but I wouldn't do something like this. Remember that I consider a JSON document and its hyper-schema as a single entity. If you know the foo you want to change, then at some point you must have retrieved that foo and therefore have its hyper-schema, which tells you how to make the call without having to build the URI. You shouldn't need a root link. I'd rather force the user to retrieve the resource before they can modify it. Subsequent calls can use the cached copy so you don't have to retrieve it multiple times. I think root links should be limited to creating new resources to initiate some workflow, or to locating an existing resource you don't yet have a copy of so you can do work on that resource.

@Anthropic
Collaborator

@handrews no problem had other email go into junk before... whenever you are ready...

I could imagine a label field built from title wanting to substitute for "User {arrayIndex} {?name}", or a UI policy may want to substitute within a JSON-Pointer to update "/user/{selectedUserIndex}/" to mandatory.
We have the option to copyValueTo which could be handled with a substitute on field default.

While these are cases primarily for a UI-Schema I would be surprised if we can't find other cases where $data isn't enough alone and it needs to be inserted or wrapped in a prefix and/or postfix.

I'd favour a core property that can be re-used in all extensions of json-schema to avoid multiple aliases.
I understand that href vars differ in that they are processed at execution rather than requiring a value watcher and live interpolation, but the result is the same which is why I think it is worth considering.

@handrews
Contributor Author

@jdesrosiers several replies:

have its hyper-schema which tells you how to make the call without having to build the URI.

I don't follow this- the hyper-schema tells you how to build the URI, that's how you make the call.

I'd rather force the user to retrieve the resource before they can modify it.

That's why I put the GET/modify/PUT cycle in there, to show that you still need all of those capabilities even when you do a GET/PUT (FWIW, I make people jump through a ton of hoops proving that they need PATCH before I'll let anyone use it- PATCH is almost always premature optimization in my experience).

at some point you must have retrieved that foo

In a large system (I'm talking about a couple hundred engineers working on multiple interconnected products) sometimes bits of information become divorced from their original context for various reasons. The least tractable reason being legacy software. I need JSON Schema to support the ideal hypermedia system, but also make reasonable accommodations for the realities of long-term large-scale software development.


But, that isn't something that JSON Hyper-Schema can't do without hrefVars

Could you give an example? I have not come up with any alternative that was close to what I needed, either on my own or with the two collaborators I had at my previous job.

user input being used to construct the URI and the request body at the same time.

I'm rarely dealing directly with user input at all. For me, the primary use case is enabling a programmatic hyperclient to connect the data returned from one call in a sequence into the next call. "User input" is usually either the modify in a GET/modify/PUT, or something like filter/pagination parameters.

the task could be broken down into two steps, one that gets the URI data from the user and another that gets request body data from the user.

Again, I'm rarely concerned with direct user interaction so while this is an important use case, it's not what I'm trying to solve. Instead of "user" input it's best just thought of as input that comes from somewhere other than the current instance. If there's a UI involved it might gather things in multiple steps, but the hyperclient should not be constrained by human UI factors (nor should it constrain the design of human UIs).

@handrews
Contributor Author

handrews commented Dec 13, 2016

[EDIT: changed hrefVars to hrefSchema]

@Anthropic yes, I understand now. Actually that sort of thing has been proposed, although I opted not to address it in this first step (the original proposal was very tentative and muddled- you have much more clear use cases driving this which is great).

My general thought was that if schema keyword "foo" can be templated, then "fooSchema" would be how to resolve its template. This makes the correlations clear, rather than having one giant "substitute" and figuring out how it applies to everything. I very much like treating the variable names as JSON Property names and just using normal schema rules (including $data, for an example of how $data fits with this see #108 which is the full proposal of which this is the first step).

@jdesrosiers
Member

have its hyper-schema which tells you how to make the call without having to build the URI.

I don't follow this- the hyper-schema tells you how to build the URI, that's how you make the call.

Sorry, I meant: have its hyper-schema which tells you how to make the call without having to build the URI with user input.

But, that isn't something that JSON Hyper-Schema can't do without hrefVars

Could you give an example?

Never mind. I misunderstood what you were describing. But maybe I'm still missing something because I don't see how hrefVars helps.

I'm rarely dealing directly with user input at all. For me, the primary use case is enabling a programmatic hyperclient to connect the data returned from one call in a sequence into the next call. "User input" is usually either the modify in a GET/modify/PUT, or something like filter/pagination parameters.

That's all very reasonable, but I don't see the difference between working with user input and working with data returned from another call. The API doesn't know or care where the data came from.

@handrews
Contributor Author

handrews commented Dec 14, 2016

[EDIT: Changed hrefVars to hrefSchema]

That's all very reasonable, but I don't see the difference between working with user input and working with data returned from another call. The API doesn't know or care where the data came from.

Hmm... @jdesrosiers I think we're kind of trying to say the same thing but getting tripped up somehow. I definitely agree with your last sentence here. And let's just call it "outside data", for lack of a better term. It may come directly from the user, or indirectly, or from elsewhere in the system, whatever that is.

But as of now (before this PR), you have three options which don't add up to much:

  1. Your URI Template variables are resolved from the instance.
    • If the instance has the data, then the variables MUST be resolved from the instance.
    • If the instance does not have the data, then "values MAY be provided from another source (such as default values). Otherwise, the link definition SHOULD be considered not to apply to the instance."
  2. You set "method"="get" and use "schema": {...} to define and validate how to accept outside data as URL-encoded parameters in the query string
    • This can only put outside data into the query string as URL parameters
    • "schema" is independent of, and will quite likely not match, the URI Template variables
    • For any URI Template variables that are not URL query string parameters, "schema" cannot supply those variables in any way
  3. You set "method"="post" and use "schema": {...} to define and validate the request body
    • The request body can only come from outside data
    • You can, of course, manually pull data from the instance and put it in the body, but that's not a hyper-schema feature
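
Options 2 and 3 can be sketched in Python as follows. This is only an illustration of the behavior described in this thread, not text from the draft: `draft05_request` is a hypothetical helper, and `href` is assumed to have already been resolved from instance data as in option 1.

```python
from urllib.parse import urlencode


def draft05_request(href, method, outside_data):
    """Place outside data the way the pre-PR "method" keyword directs.

    "get"  -> data becomes URL-encoded query string parameters
    "post" -> data becomes the request body
    There is no value that sends outside data to both places.
    """
    if method == "get":
        return {"uri": href + "?" + urlencode(outside_data), "body": None}
    if method == "post":
        return {"uri": href, "body": outside_data}
    raise ValueError(f"unsupported method value: {method!r}")


as_query = draft05_request("/foos/7", "get", {"q": "lorem"})
as_body = draft05_request("/foos/7", "post", {"stuff": "lorem"})
```

Either call gets outside data into exactly one place, which is the limitation the bullets above describe.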

So, what does "hrefSchema" do?

  • It removes the arbitrary and nonsensical disconnect between URI Template parameters (instance data) and input schema (outside data)
  • It removes the crippling restriction that outside data can only go into the URL query string parameters
  • It allows for both external data (validated against "hrefSchema") and instance data (validated with the instance schema)
  • It eliminates the crippling restriction that "schema" can apply only to the URL query string or to the request body, which prevents any ability to use both for any sort of external data
  • [NOT PART OF THIS PR] If "$data" and something like relative JSON pointer are accepted, then using both "hrefSchema" and "schema" will be able to make use of both external and instance data
    • "bar": {"const": {"$data": "0/foo/bar"}} hardwires the bar variable to the "bar" property within the "foo" property, and does not allow external data
    • "bar": {"default": {"$data": "0/foo/bar"}} defaults the bar variable to the "bar" property within the "foo" property, but still allows external data if it is supplied

Does that make it seem any more useful? None of these things are currently possible- if you think they are, please give an example of how they currently work.
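
For the two "$data" bullets above, here is a speculative Python sketch of how a client might pick the value of the bar variable. Nothing here is specified anywhere: it assumes the relative pointer is evaluated from the instance root, supports only the "0" prefix and object members, and uses hypothetical helper names.

```python
def resolve_relative(pointer, instance):
    """Resolve pointers shaped like "0/foo/bar" against the instance.

    Simplification: only the "0" prefix and object members are handled.
    """
    prefix, _, path = pointer.partition("/")
    if prefix != "0":
        raise ValueError("only the '0' prefix is handled in this sketch")
    value = instance
    for token in (path.split("/") if path else []):
        value = value[token]
    return value


def template_value(name, rules, instance, external):
    """Choose a template variable's value from instance and external data."""
    if "const" in rules:
        # "const": the instance value is the only acceptable value.
        fixed = resolve_relative(rules["const"]["$data"], instance)
        if name in external and external[name] != fixed:
            raise ValueError(f"{name} is fixed by the instance")
        return fixed
    if name in external:
        # "default": external data wins when it is supplied.
        return external[name]
    return resolve_relative(rules["default"]["$data"], instance)


instance = {"foo": {"bar": "from-instance"}}
const_rules = {"const": {"$data": "0/foo/bar"}}
default_rules = {"default": {"$data": "0/foo/bar"}}

hardwired = template_value("bar", const_rules, instance, {})
overridden = template_value("bar", default_rules, instance, {"bar": "external"})
defaulted = template_value("bar", default_rules, instance, {})
```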

@handrews
Contributor Author

Sorry, I meant: have its hyper-schema which tells you how to make the call without having to build the URI with user input.

I'm still not following. If I have a filtered collection and I want to apply another filter to it, having the hyper-schema doesn't magically supply the filter value. I've also worked with APIs that do operations other than GET on filtered URIs (it's an easily implemented technique to avoid un-RESTful batch processors and overly complex PATCHes when you cannot afford multiple round trips due, for instance, to an extremely bandwidth-constrained environment).

So if I want to apply a filter and PUT an empty collection, e.g. {"elements": []}, to it (this is one pattern I've seen and used for deleting everything that matches the filter), I need the hyper-schema to give me the URI template in the first place, but I also need to both construct the correct body and apply the desired filter. Both are in some way coming from outside of the instance.
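
As an illustration of that pattern, here is a small Python sketch in which the filter goes into the URI and the empty collection goes into the body. The "/widgets{?status}" template, the "status" variable, and the `expand` helper are all hypothetical; only the form-style query expression `{?a,b}` is handled.

```python
import re
from urllib.parse import quote


def expand(template, variables):
    """Expand {?a,b} form-style query expressions, skipping undefined names.

    Other URI Template expression types are not handled in this sketch.
    """
    def replace(match):
        names = match.group(1).split(",")
        pairs = [
            f"{name}={quote(str(variables[name]), safe='')}"
            for name in names
            if name in variables
        ]
        return ("?" + "&".join(pairs)) if pairs else ""

    return re.sub(r"\{\?([\w,]+)\}", replace, template)


# The filter value comes from outside the instance and goes into the URI...
filtered_uri = expand("/widgets{?status}", {"status": "expired"})

# ...and the empty collection, also outside data, goes into the body.
delete_matching = {"method": "PUT", "uri": filtered_uri, "body": {"elements": []}}
```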

@jdesrosiers
Member

It removes the arbitrary and nonsensical disconnect between URI Template parameters (instance data) and input schema (outside data)

I don't have a problem with this separation and I haven't found it to be restrictive.

It removes the crippling restriction that outside data can only go into the URL query string parameters

Yes, this I would find useful.

It eliminates the crippling restriction that "schema" can apply only to the URL query string or to the request body

I don't have a problem with this either (like I've been attempting to explain).

I think we are on the same page, we just have different needs.

@handrews
Contributor Author

@jdesrosiers I'm not sure how to respond to "I don't need this". It's not really offering a point of discussion. If you don't need it, you don't need it, but any one person not needing a thing has no bearing on whether it's needed in general.

Are you arguing that this should not be done because you don't need it, or are you just observing that it's outside of your usage but you're OK with going ahead with the change?

My problem in dealing with hyper-schema in general is that I have found very, very few people who are attempting to work on the same scale. This is why I get very frustrated with examples like "consider a blog and blog entries." They're so trivial that depending on business needs I might not even bother to worry about whether the HTTP API involved was RESTful.

You need REST when you have large-scale open-ended systems that need to last for years or longer. Even most supposedly "RESTful" APIs are really only designed to last a year or two, and then get re-done in a new "version".

I need a system that will hold up for a decade or more with multiple distributed teams writing interrelated services, plus potentially including independently authored 3rd-party services that interact with the core services. Many services involve complex relational configurations. They will be written in a variety of languages and need to function in a variety of performance vs ease-of-use constraints.

So I need all of this stuff. I needed it in my past work (and used something very much like this) and I need it all again in my next project.

But I'm not sure how to advance this when the most common reply (not just here) from others is "I'm not doing anything that complicated." That's great for you, but it doesn't help me.

@jdesrosiers
Member

I didn't say "I don't need this". I said, "I don't have a problem with this". What I mean is that I don't see the constraint as necessarily a bad thing. Sometimes a constraint leads to a better design. GOTO is a famous example. Adding the constraint that you cannot use GOTO led to better designed programs. Working within Hyper-Schema's constraints taught me a lot about how to design good hypermedia systems. Maybe it's over-constrained, but it helped me avoid mistakes I would have otherwise made. I guess you can say I have an appreciation for its constraints.

are you just observing that it's outside of your usage but you're OK with going ahead with the change?

Yes. That's what I've said from the beginning. I find many of these constraints useful, so I will probably continue designing schemas largely the way I did before. For example, I will avoid building both URI and request body from outside data at the same time. But, if someone else finds it useful, I have no compelling reason why it should not be allowed.

I have built large scale hypermedia systems before (although it has been a few years). Nothing as enormous as yours, but still pretty big. When I say I haven't found something to be a problem, I'm not just talking about trivial systems, but I recognize that you have probably come across many things that I have not.

@handrews
Contributor Author

Maybe it's over-constrained, but it helped me avoid mistakes I would have otherwise made. I guess you can say I have an appreciation for its constraints.

I would really like to understand this better rather than going down an "agree to disagree" route (one of my least favorite phrases ever). This could be a compelling reason with examples of where it has helped and what sort of solutions you came up with as a result.

I don't want to over-sell my experience. I harp on the scale because I get a lot of people throwing trivial examples around, and I want to make the point that those examples don't cover what I need. But nothing I've done is really earth-shattering, and I don't have any success over a period of longer than a few years to point to so far. The challenges I see are somewhat technical (more elaborate HATEOAS, usually) but mostly social (how do you get many teams to go in the same direction, when it is not necessarily intuitive to everyone involved?)

That said, I'm having trouble figuring out exactly how you think things should work, because I was flatly unable to make it work at all with Draft 4. We just came up with an alternative loosely based on hyper-schema instead. A few things in that alternative were undeniably bad ideas, but the stuff I'm advocating here was successful and at least seemed essential.

So I would really like to understand your solutions better. I'm entirely willing to reconsider my approach if anyone can demonstrate an alternate view that produces a usable system. I often advise teams to redesign resources rather than jump to the shiniest shortcut (e.g. PATCH), so I am sympathetic to the notion of helpful guiding constraints.

I just don't see how these constraints are at all helpful. I don't see the harm you seem to see in putting data in both places, for instance. It's not clear to me what sorts of bugs or design problems that would produce, compared to what it has, in my experience, solved.

@awwright
Member

I just noticed the patch references RFC 7409, which is "Forwarding and Control Element Separation (ForCES)". Is that right?

@handrews
Contributor Author

handrews commented Jan 3, 2017

@jdesrosiers pinging you again now that the holidays are through. I would still really like to understand one or two situations where you feel that the current rules guide schema authors to a better design than what this proposal would allow. Not understanding that clearly is making me nervous about the change.

If you can comment within the next two weeks (even if only to say that you need more time) that would be great. Otherwise since you've said that you are OK with this we will go ahead, but I'd really prefer to understand the trade-off this would make.

@handrews handrews mentioned this pull request Jan 5, 2017
@JoergAdler

JoergAdler commented Jan 6, 2017

Hi, I've a question about the "method" attribute and why, in my opinion, it should not be removed.

There are more HTTP verbs than GET and POST. Imagine you have two links: one is for getting an order and one is for deleting it.
The URI for both is /orders/1. The only difference between them is that one does a DELETE HTTP request and the other a GET. (Of course they have different rels.) With a "method" attribute (which in fact can be any HTTP verb, see the draft04 schema) you can easily do the request without having to read any documentation. Other advantages are:

  • the api is less prone to errors, because the client programmer could not accidentally delete something instead of getting it
  • the api calls are not so chatty. Without having the correct verb in the schema, I've to make something like response=followRel("delete").withHttpVerbDelete() in the client.

The information has to be in the server anyway, so why not write it in a machine readable way, instead of having to document it?
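
A minimal Python sketch of the kind of generic client described here, assuming each link carries a Draft-04-style "method" naming the HTTP verb and an already-resolved "href". The "delete" rel, the order links, and `follow_rel` are hypothetical.

```python
def follow_rel(links, rel):
    """Return (verb, uri) for the link with the given rel.

    The verb is read from the link itself, so the caller never names it.
    Falls back to GET when "method" is absent (an assumption of this sketch).
    """
    for link in links:
        if link["rel"] == rel:
            return link.get("method", "GET"), link["href"]
    raise LookupError(f"no link with rel {rel!r}")


order_links = [
    {"rel": "self", "href": "/orders/1", "method": "GET"},
    {"rel": "delete", "href": "/orders/1", "method": "DELETE"},
]

verb, uri = follow_rel(order_links, "delete")
```

The client code only says which rel to follow; the verb travels with the link, which is the encapsulation argued for above.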

@handrews
Contributor Author

handrews commented Jan 6, 2017

@JoergAdler thanks for commenting- it's great to see that more people are looking at this.

What you want is proposed in issue #73, for an "allow" property that hints at what the Allow HTTP header for the target resource would show. Like "targetSchema" and "mediaType", "allow" would be advisory only. The target resource may have reason to only allow a particular method in certain states, for instance, so sometimes it may reasonably return a 405 for a method the schema claims is allowed.

Anyway, you'll see discussion of several of your points in #73. It's essentially what you were suggesting as "http-verb", but not necessarily HTTP-specific (there are other protocols :-)

From your get/delete example, I would suggest that you are aligning the concept of "rel" too closely with protocol operations. "rel" should tell you why you care about the related thing. "next" and "previous" links do that- they let you know why you would or wouldn't follow each link.

As for getting or deleting, if GET works you can get it, and if DELETE works you can delete it. You don't need anything to tell you how to make those requests (or a PUT) because they always work the same. So you don't need multiple links for that. A PATCH should come with a request media type specified in the Accept-Patch header, which tells you what you need to know about how to build a patch- no need for JSON Schema to do that.

A POST is the only method where you need to be told how to build the request, so that is what "schema" is for. If you have a link, and the "rel" tells you that you want to POST to the target resource (the obvious example is when the "rel" is "collection"), then you use "schema" to figure out what to send.

But if you're using some other method, you ignore "schema" because the protocol (and possibly the patch media type) already tells you how to build that request.

The problem with this in Draft 05 and earlier is that "schema" was being overloaded to describe URI parameters for GET. This was just a holdover from HTML forms, which are often URL-encoded for GET requests. That's deeply entrenched behavior in HTML, but it doesn't really fit in a larger hypermedia context where HTTP's full feature set is available. Conceptually, resolving the correct URI (and therefore the correct resource) is different from sending the selected resource a request document. And you may want to do both.

So now, with this PR, when it comes to HTTP, "schema" is only used for POST. That is the only method that could possibly need it to describe a request body, assuming you're using HTTP properly. URI variables (form-encoded url query parameters or otherwise) are now handled through "hrefSchema". This is unambiguous, and you can use both mechanisms at once. There is nothing specifically indicating HTTP methods at all, so this also works for other protocols. And protocol method hints are likely to come back under a new, more clear name, as is being discussed in #73.
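
Read as client logic, the rule in this comment might look like the following Python sketch. It reflects only the interpretation given here (with HTTP, "schema" is consulted for POST and ignored otherwise); the "collection" link, its schema, and `request_body_schema` are made up for illustration.

```python
def request_body_schema(ldo, http_method):
    """Return the schema that describes the request body, if one applies.

    Under the reading above, PUT sends a full representation and PATCH is
    governed by its patch media type, so only POST consults "schema".
    """
    if http_method.upper() == "POST":
        return ldo.get("schema")
    return None


# Hypothetical link whose rel suggests POSTing a new item to the target.
collection_link = {
    "rel": "collection",
    "href": "/foos",
    "schema": {"type": "object", "properties": {"stuff": {"type": "string"}}},
}

post_schema = request_body_schema(collection_link, "POST")
put_schema = request_body_schema(collection_link, "PUT")
```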

@JoergAdler

JoergAdler commented Jan 6, 2017

@handrews
I think you misunderstood my intent. What I want is the complete encapsulation of HTTP-specific things in the link description objects, so that one can write generic clients that only follow rels. As you can see here: java example

There is no protocol leaking at that level of abstraction, and that is completely destroyed if the "method" attribute is removed.

You state:

As for getting or deleting, if GET works you can get it, and if DELETE works you can delete it. You don't need anything to tell you how to make those requests (or a PUT) because they always work the same.

That's not the way HATEOAS should work anyway. If the current state of the application allows a delete, there will be a delete link in the response; if not, there is no delete link. Here is a very good blog post about this. (The section on URL bookmarking should be read with care, though.)
Maybe I misunderstood, and JSON Hyper-Schema is not meant to be used for HATEOAS?

re-edit for better understanding: maybe we then need a property like "http-verb", but this will be another issue

@Relequestual
Member

when it comes to HTTP, "schema" is only used for POST. That is the only method that could possibly need it to describe a request body, assuming you're using HTTP properly.
@handrews

In one of the APIs I'm working on, we actually have a large body in our GET request. It's not a RESTful API, and it only has the one route (currently), but may have more later.

Consider: you have a patient record with ontology-encoded terms related to that patient. That needs a JSON structure. The API finds similar patients based on a patient in another system. The patient record is encoded into JSON and sent as the body of a GET request. If it were to be RESTful, you would make the request with a query string containing the patient ID, and then the queried system would request that patient from the original system. That is not as practical, and it means an additional API request inside an API request, which doesn't feel right either.

What I'm trying to say is, sometimes people don't want to follow the rules or make a proper RESTful API for whatever reason, valid or otherwise. That choice should not exclude them from using JSON Hyper-Schema to define their API. We can recommend (or use some other key word), but we shouldn't exclude or make it impossible.

@handrews
Contributor Author

handrews commented Jan 6, 2017

@JoergAdler I'll reply to you in detail a bit later today, but first, quickly...

@Relequestual I've filed several different issues previously that all were some variation on "is this invalid (non-RESTful) use of HTTP something we want to support in JSON Schema?" and the answer has seemed to be "no, we don't". A GET request with a body would be such a thing; per RFC 7231:

A payload within a GET request message has no defined semantics;
sending a payload body on a GET request might cause some existing
implementations to reject the request.

Another one I asked about was dealing with a legacy HTTP API where GET responses have an envelope wrapping the representation, plus some "standardized" response meta-data, while PUT requests just take a plain representation. This breaks the intended "you can PUT back a GET" semantics of HTTP, and my question of whether it could be described with JSON Schema seemed to meet a "no".

We need to decide what level of support for non-strictly-HTTP-compliant HTTP APIs we intend to provide. That is beyond the scope of this PR (because this PR does not make that situation worse, it tells you exactly as much as "method" did in Draft 05).

@handrews
Contributor Author

handrews commented Jan 6, 2017

@JoergAdler There are a few underlying concerns that determine how we should look at this:

  1. What is the role of the link relation ("rel")?
  2. How does HATEOAS, which is a runtime concept, fit with a static description such as JSON Schema?
  3. What constitutes protocol abstraction vs leakage?

[aside to @awwright: I hope you get a chuckle out of this, I'm pretty sure I'm repeating your answers to me from a few months ago when I was asking more or less the same thing :-) ]

What does the relation tell us?

From Section 4 of RFC 5988 (Web Linking), link relation type definitions:

can specify the behaviours and properties of the target resource (e.g., allowable HTTP methods, request and response media types that must be supported).

(that's actually from 4.1, registered links, but reading 4.2 about extended links it seems clear that it applies to them as well).

This means that if you want to forbid deletes as part of the link description, then your "rel" should indicate that deletes are forbidden. Not some other part of the link description. You can also have something like the "allow" hints, but really your relation should be descriptive enough that it tells you what operations might be possible. I would only use "allow" to indicate that, for some specific resource, only a subset of the usual operations defined for this "rel" are available.

How does HATEOAS fit with JSON Hyper-Schema?

HATEOAS is a runtime concept, while each JSON Hyper-Schema is a static document. Given a schema URI, it must always resolve to the same schema. This leaves us with a few ways to document behavior at runtime (most notably presenting or not presenting specific links based on the resource state):

  1. Document all possible behavior, and allow discovery of which behavior is currently available at runtime. If your relation type is properly specified, it already does this. As noted above, you might get more specific with "allow". And at runtime, you may still get a 405 if the current state of the resource forbids a particular method.

  2. Document different possible sets of behaviors as multiple links, each with a different relation type, and ensure that each link is only usable when the resource is in the correct state. Using "oneOf", "anyOf", or "dependencies" (or if Validation: if/then/else #180 eventually goes in, "if"/"then"/"else"), make the presence or absence of a link clear by tying the LDO to a specific property being present or absent, or having a specific value. So for example, you can have a link indicating that you can interact with a related resource only present if the "locked" field is set to false.

  3. Connect different schemas to the resource at runtime. Using profile and/or describedBy, attach the right schema to the resource based on its current state. If it is currently writeable, attach a schema that has a link for writing. If not, attach a schema that omits that link.

These each have different tradeoffs in terms of flexibility vs being able to describe (e.g. in generated documentation) the full potential behavior of the API statically.
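
Option 2 could look roughly like the following Python sketch. The schema, the "locked" property, the relation names, and the `branch_matches` and `applicable_links` helpers are all hypothetical, and the matcher is a tiny stand-in that understands only "required" and "enum", where a real client would run a full validator.

```python
# Hypothetical hyper-schema: the "edit" link only applies while "locked" is false.
schema = {
    "links": [{"rel": "self", "href": "/things/{id}"}],
    "oneOf": [
        {
            "required": ["locked"],
            "properties": {"locked": {"enum": [False]}},
            "links": [{"rel": "edit", "href": "/things/{id}"}],
        },
        {
            "required": ["locked"],
            "properties": {"locked": {"enum": [True]}},
            "links": [],
        },
    ],
}


def branch_matches(branch, instance):
    """Tiny stand-in for validation: handles only "required" and "enum"."""
    if any(name not in instance for name in branch.get("required", [])):
        return False
    for name, rules in branch.get("properties", {}).items():
        if name in instance and "enum" in rules and instance[name] not in rules["enum"]:
            return False
    return True


def applicable_links(hyper_schema, instance):
    """Collect top-level links plus links from every matching "oneOf" branch."""
    links = list(hyper_schema.get("links", []))
    for branch in hyper_schema.get("oneOf", []):
        if branch_matches(branch, instance):
            links.extend(branch.get("links", []))
    return links


rels_unlocked = [l["rel"] for l in applicable_links(schema, {"id": 7, "locked": False})]
rels_locked = [l["rel"] for l in applicable_links(schema, {"id": 7, "locked": True})]
```

The schema itself never changes; which links apply is decided by the instance data at runtime.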

What does it mean to encapsulate the protocol?

So, given the above, what constitutes encapsulating HTTP, and what constitutes leaking it? We also need to keep in mind that JSON Hyper-Schema is not tied to HTTP in any way. It is simply that HTTP is the most common case. It's just as valid to specify an email link:

{
    "links": [{
        "rel": "author",
        "href": "mailto:someone@example.com?subject={subject}",
        "hrefSchema": {
            "properties": {"subject": {"type": "string"}},
            "required": ["subject"]
        },
        "encType": "text/plain",
        "schema": {"type": "string"}
    }]
}

Pretty much the only thing you can do with a "mailto:" URI is send to it. An "httpMethod" field would be not just irrelevant but incorrect. And even with a more protocol-neutral field, you don't need anything here: "mailto:" tells you all you need to know. This link tells you how to send a message to the author of this resource, and requires you to specify the subject as well.

When it comes to generic clients, either your hyper-schema client understands "mailto:" URIs and can construct and send emails, or, if it doesn't, it might try to launch an external app that it knows does understand email (this, of course, is exactly what web browsers usually do with "mailto:" URIs).

If there is anything in a link description object for a "mailto:" URI that has anything to do with HTTP, then that is HTTP leaking into JSON Hyper-Schema. The abstraction for hyper-schema is that it does not care what the protocol is.

The URI tells you the protocol. JSON Hyper-Schema doesn't need to do anything else to communicate it.
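
As a rough sketch of what that means for a generic client, the Python below dispatches purely on the scheme of the link's "href". The `interaction_for` function, its return strings, and the example URIs other than the "mailto:" one are assumptions for illustration only.

```python
from urllib.parse import urlsplit


def interaction_for(href):
    """Return a rough description of what a generic client can do with href."""
    scheme = urlsplit(href).scheme
    if scheme == "mailto":
        # The only sensible interaction with a "mailto:" URI is sending to it.
        return "compose and send an email"
    if scheme in ("http", "https"):
        # Which HTTP method to use follows from the relation and from HTTP itself.
        return "make an HTTP request"
    # Anything else is delegated, much as a browser launches an external app.
    return "hand off to an external handler for " + scheme


mail = interaction_for("mailto:someone@example.com?subject={subject}")
web = interaction_for("https://example.com/authors/42")
other = interaction_for("ftp://example.com/authors/42.txt")
```

No keyword in the link description object is consulted to find the protocol; the URI alone is enough.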

The link relation tells you why you care about the target resource, which should include some indication of what you can do with that resource. The link relation generally should not constrain the protocol. Many relation type+protocol combinations may be nonsensical, but "author" is a good example.

An HTTP link for "author" tells you that you can interact with the "author" resource through HTTP, so potentially retrieve, update, delete, etc. such a resource. A mailto link for an "author" resource tells you that you can send it email. Both of these things make sense. You could also interact with an "author" resource through FTP (why you would try to describe a stateful protocol interaction in a system mostly geared towards REST I don't know, but you could do it).


So let's get back to HTTP links. Your "href" tells you that you are using HTTP (or HTTPS, which is the same for our purposes). Your "rel" should convey what you might be able to do with the resource. Many link relation types will imply that multiple HTTP methods are valid; you should not need to tie each relation type to a specific method (look over the IANA registered link types, for instance).

We can (and quite likely will, I'm guessing) add an "allow" hints field or something similar. For HTTP links you would be able to put HTTP methods there if you want to describe them in more detail statically in the schema. But (even if we're taking "allow" from the HTTP header) you could also put other things in there for other protocols (FTP commands? I guess? I need a better example- my "mailto:" example has no use for "allow"... maybe git commands for use with "git+ssh" scheme?)

I realize this may not be the sort of answer you are looking for, but does this make sense as a possible way to look at hypermedia and HATEOAS? If so, we can start from there and work towards whether it is the ideal way to do so.

@wuan

wuan commented Jan 6, 2017

Please do not remove the 'method' property, as it would require clients to know more than they should.

Consider a POST request for creating an entity. Such a link can easily be updated on the server side to use a PUT request with a generated fixed entity id (in order to achieve an idempotent operation). If the HTTP method is contained within the LDO, the client will still work without any modification if implemented properly.

This will no longer be possible once the "method" property is removed from the LDO. By the way, this is also an example where PUT would require a schema for the body as well.
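
A minimal Python sketch of the kind of client described here, assuming Draft 04 style links that carry a literal HTTP method. The "create" relation, the URIs, and the `find_link` and `build_request` helpers are hypothetical.

```python
def find_link(links, rel):
    """Return the first link description object with the given relation."""
    return next(ldo for ldo in links if ldo["rel"] == rel)


def build_request(ldo, body):
    """Take the method from the LDO itself; fall back to GET when it is absent."""
    return {
        "method": ldo.get("method", "GET").upper(),
        "uri": ldo["href"],
        "body": body,
    }


# Hypothetical links as the server might publish them before and after the change.
links_v1 = [{"rel": "create", "href": "/orders", "method": "POST"}]
links_v2 = [{"rel": "create", "href": "/orders/3f2a9c", "method": "PUT"}]

order = {"item": "book", "quantity": 1}
request_v1 = build_request(find_link(links_v1, "create"), order)
request_v2 = build_request(find_link(links_v2, "create"), order)
```

The client code is identical for both versions; only the link published by the server differs.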

@handrews
Contributor Author

handrews commented Jan 6, 2017

@wuan if you are coming from Draft 04, I understand your concern.

However, as of Draft 05, "method" cannot explicitly specify PUT, DELETE, or PATCH, and "get" and "post" were declared to not necessarily map to HTTP GET and POST. See #96 (comment) for an exhaustive discussion of the concerns that this change provoked (I strongly advise against reading the whole issue, just start with that comment). That discussion led to the proposal of #73 ("allow").

So this PR is attempting to take where we are with Draft 05, look at what the Draft 05 change was trying to accomplish, and make that clearer (by dumping the confusing "get" doesn't necessarily mean "GET"/"post" doesn't necessarily mean "POST" aspects, and removing "method" because people think it still means literal HTTP methods, which is misleading), less ambiguous (by making "schema" always mean the same thing instead of switching between the message body and the URL query string parameters), and more fully functional (by allowing both message body input with "schema" and URI parameter input with "hrefSchema").

This PR does not fully restore the functionality of Draft 04. However, this PR plus issue #73 does produce a spec that is more functional than Draft 04. We have not yet moved on #73 because it is part of a larger topic of target hints (protocol specific) and otherwise that we need to sort out. I hope to have it sorted by Draft 07.

So to recap:

  • If you need to explicitly list HTTP methods, Draft 04 will still work for you
  • Draft 05 does not allow explicitly listing HTTP methods
  • If this PR goes into Draft 06, it will have more functionality in this area than Draft 05
    • This PR does not remove any HTTP support that Draft 05 provided
    • It does remove "href" preprocessing, but we're pretty sure that's practically never used
  • If you need to explicitly list HTTP methods, I suggest waiting for Draft 07

It's pretty clear that delaying Draft 06 for #73 would put it off by months, quite likely to the point where we'd have Draft 07 ready anyway. Draft 06 has a lot of other things in it that people can use right now, and improves on Draft 05's hyper-schema support even without explicit listing of HTTP methods. So it's valuable to us to get Draft 06 out ASAP (hopefully this month or at the latest February).

Does that make sense? I realize it is not ideal but I think it is our best avenue for moving forward right now.

@handrews
Contributor Author

handrews commented Jan 6, 2017

@wuan also, if you want Draft 06 features but need HTTP methods specified for your client to work, I'd just implement "allow" as a custom keyword. Chances are good you'll be able to easily migrate to a fully supported standard keyword in Draft 07 (or we'll have a complete story around how else this sort of thing can be handled).

@JoergAdler

JoergAdler commented Jan 9, 2017

@handrews
I edited the suggestion with "http-verb" away before your first answer (I just wrote it back in). Sorry for my edit. That was because I realized that the "method" attribute here covers my suggestion in a much more general way, without HTTP leaking into JSON Hyper-Schema, which is (as you also state) much better :-)

In your enumeration of the possible ways to use static schemas together with HATEOAS, you missed the way we do it here.
We deliver the schema in every response. It's like HAL. The advantages are that we can have the "method" attribute defined in the schema and not in the documentation, that the schema of the responses can be modified at runtime, and that we can validate whether server and client are still in sync and fill in default, allowed and required values at runtime.

I completely missed that there is a Draft 05, because the website only states Draft 04 as the current version. Wouldn't it be a better idea to only deprecate the "method" keyword (in the form it had in Draft 04) for Draft 06 and remove it, with "allow" as the alternative, in Draft 07?

edited typo

@Relequestual
Member

Relequestual commented Jan 9, 2017

@Relequestual I've filed several different issues previously that all were some variation on "is this invalid (non-RESTful) use of HTTP something we want to support in JSON Schema?" and the answer has seemed to be "no, we don't". A GET request with a body would be such a thing; per RFC 7231: (#179 (comment))

I disagree. Doing so would be limiting. It is not an invalid use of HTTP; it's just not recommended, and there may be valid reasons for doing so. Elasticsearch actually uses a body in GET, so this change would prevent anyone who uses Elasticsearch from defining their API with Hyper-Schema. That feels like a bad thing to me. Docs: https://www.elastic.co/guide/en/elasticsearch/guide/current/_empty_search.html

If appropriate, I can open a new issue if you feel this issue shouldn't address my point. I don't remember seeing the discussion about supporting non-restful APIs.

Also, I don't believe doing so is in violation of HTTP... http://stackoverflow.com/questions/978061/http-get-with-request-body#comment56145237_983458

@handrews
Contributor Author

handrews commented Jan 9, 2017

@Relequestual as I've said to @JoergAdler and @wuan, your concerns have nothing to do with this PR and everything to do with the change made between Draft 04 and Draft 05. Which I did not do and based on past conversations, I can't undo.

Is anyone interested in talking about the changes made by this PR?

@handrews
Contributor Author

handrews commented Jan 9, 2017

@Relequestual I've filed #226 for the topic of using hyper-schema with APIs that abuse HTTP (treating "no defined semantics" as "do whatever you want" is definitely abuse).

@Relequestual
Member

I'd disagree that it's abuse when the key word SHOULD is used. I pasted the SO quote in the issue you created.

@handrews
Contributor Author

@Relequestual "Relies on behavior specifically documented to be unreliable" is what I am getting at, which is why I said "abuse" rather than "violate". It's not a violation to send such a request, but neither is it a violation for a server or intermediary to reject such a request. See #226 for details.

@handrews
Contributor Author

@JoergAdler I'm fully aware of dynamically choosing a schema at runtime, that is my option 3. As far as I can tell the only difference between how I described it and what you are doing is that I would change the "profile" and/or "describedBy" links to change which schema is delivered, rather than including the schema in the response. This is much more network-efficient unless every single response across the entire lifetime of the resource has a different schema (I can't think of a use case that would produce that).
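
A minimal Python sketch of switching the schema link instead of embedding the schema, assuming the server attaches it with a "describedby" relation in a Link header. The schema URIs, the "locked" field, and the `response_headers` helper are hypothetical.

```python
# Hypothetical schema URIs; a real API would publish its own.
SCHEMAS = {
    True: "https://schemas.example.com/thing-writable",
    False: "https://schemas.example.com/thing-read-only",
}


def response_headers(resource):
    """Pick the schema link from the resource's current state."""
    writable = not resource.get("locked", False)
    return {
        "Content-Type": "application/json",
        # Only the link target changes; the schema documents themselves stay static.
        "Link": '<%s>; rel="describedby"' % SCHEMAS[writable],
    }


unlocked = response_headers({"id": 7, "locked": False})
locked = response_headers({"id": 7, "locked": True})
```

Because each schema URI always resolves to the same document, clients can cache both schemas and only need to follow whichever link a given response carries.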

The web site (specifically http://json-schema.org/documentation.html ) does link to Draft 05 (in IETF parlance, it is draft-wright-*-00, because of their requirement to reset the number when the author changes). There is no meta-schema for Draft 05, and there is some disagreement on whether it is worth publishing one or just getting to Draft 06. I'm not going to re-iterate that entire disagreement here, you can dig around and find it if you really want to.

I am not the person you need to lobby about the changes made to "method" between Draft 04 and Draft 05. I am just trying to improve how we describe and validate user input for various URI and request body use cases. See this wiki page I just wrote for more of the history.

The advantage of removing "method" from the specification in Draft 06 is that you are free to treat it as an extension keyword and do whatever you want with it. Taking it out of the spec means you are not constrained by Draft 05's approach, and can implement whatever stop-gap you need while we sort out #73.

@handrews
Contributor Author

After discussion with @Relequestual I am closing this in favor of #228.

I have created a wiki page explaining the history of "method" in hopes of keeping discussions about explicitly documenting every HTTP method out of #228 (which is not about making that any harder or easier, it's about being more flexible with input specification).

@JoergAdler please post any follow-up comments about HTTP methods in #73, I will be happy to continue the discussion there.

@Anthropic I am not sold on a generic "substitute" and do not see any sort of consensus emerging on that before Draft 06. We should continue the discussion in its own issue. There is nothing to prevent us from introducing "substitute" in a later draft and dropping "hrefSchema" in its favor. Such a change would be easy to script for people migrating from one draft to the next, so I would prefer not to delay Draft 06 further.

@jdesrosiers I still hope to hear more from you about design constraints over in #228.

@handrews handrews closed this Jan 10, 2017
@handrews
Contributor Author

@JoergAdler and @wuan : I have changed PR #228 to NOT remove "method". It now only adds "hrefSchema". While this does not restore the Draft 04 functionality that you prefer, it leaves that as a more clear option.

Thank you both for your input. While I resisted at first, it sat around in the back of my head until I realized this would work just fine.

@wuan

wuan commented Jan 13, 2017

Thank you @handrews for leaving "method" in for the moment!

@JoergAdler

Thanks @handrews :-)
