Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separated files are not validated correctly #396

Open
curry684 opened this issue Mar 20, 2017 · 26 comments
Open

Separated files are not validated correctly #396

curry684 opened this issue Mar 20, 2017 · 26 comments
Labels

Comments

@curry684
Copy link

Take a look at the split up OpenAPI/Swagger example definition at https://github.com/OAI/OpenAPI-Specification/blob/master/examples/v2.0/json/petstore-separate/spec/swagger.json.

It's split up into multiple files, for example:

            "schema": {
              "$ref": "../common/Error.json"
            }

Running the following command on a local copy of that file (schema here):

vendor/bin/validate-json sample/spec/swagger.json res/schema.json

Happily outputs that validation succeeded... even if I add dozens of random characters anywhere in any of the included documents.

So it would seem that:

  • These references are not resolved at all, and
  • The failure to resolve them is completely ignored while it should be blocking successful validation
@erayd
Copy link
Contributor

erayd commented Mar 20, 2017

Yikes, that's not good! Thanks for the heads-up, will investigate.

@curry684
Copy link
Author

I actually discovered this while using that example spec as a fixture for another library, and suddenly wondered how it could ever work as I was json_decoding it myself before feeding it to the validator, which therefore never even saw the file or base path. So none of the default settings of the Validator itself appear to be causing any kind of check for whether references can be loaded.

On a related note: as there's no documentation whatsoever regarding external references I'm not even sure how I would go about fixing this even if an error were thrown. It would really be a good idea to have some examples for cases like this.

@erayd
Copy link
Contributor

erayd commented Mar 20, 2017

OK, so I've had a look through the schema, and it only refers to itself and (using an absolute path) to the draft-04 meta-schema. However, swagger.json does contain references to ../common/Error.json.

I have tested your example, and have confirmed via wireshark that the draft-04 spec is being correctly fetched from http://json-schema.org/draft-04/schema.

Could you please clarify whether you are reporting:

  1. a problem with resolving references in the schema (bug); or
  2. a problem with resolving references in the file being validated (not a bug).

[Ed: If it's (2), then it's not a bug, and is working as intended - JSON documents do not have any means of referring to other documents (see here). $ref is a schema thing only, and thus is only supposed to be resolved if it occurs in the context of a schema.]

@erayd
Copy link
Contributor

erayd commented Mar 20, 2017

Which version of json-schema are you using? This seems to be working fine from v2.0.0 upwards, including in the current 5.1.0 and in the 6.0.0-dev tree...

$ ./validate-json test.json swagger.json

Output (v5.1.0):

JSON does not validate. Error:
file_get_contents(file:///common/Error.json): failed to open stream: No such file or directory
Error code: 0

test.json:

{
    "paths": {
        "/pets": {
            "get": {
                "responses": {
                    "default": {
                        "schema": {
                            "invalidProperty": "invalidData"
                        }
                    }
                }
            }
        }
    }
}

@erayd
Copy link
Contributor

erayd commented Mar 20, 2017

On a related note: as there's no documentation whatsoever regarding external references...

Documentation for $ref is available here.

@curry684
Copy link
Author

curry684 commented Mar 20, 2017

Thanks for the swift responses! I'm kind of confused with what you are saying.

Let's take the most simplified example Swagger file. It contains internal references within its own document:

              "items": {
                "$ref": "#/definitions/Pet"
              }

To my best knowledge this is a native JSON schema feature, and validating this file against the spec should in this case inline expand the Pet definition and validate its contents recursively for whether it's appropriate and valid in this context. If I understand you correctly this library currently just validates this $ref property as a string and performs no further processing, so in this case it would just accept the reference resolving to an invalid target type?

As for the documentation: I was referring on how to properly use the SchemaStorage and UriRetriever functionality in this library, sorry :) The documentation on that is currently very thin with just a single simple example in the README.

@erayd
Copy link
Contributor

erayd commented Mar 20, 2017

@curry684 It's bedtime here - but give me a few hours, and I'll explain in a bit more detail how it works, and what I think you're running into here.

@erayd
Copy link
Contributor

erayd commented Mar 20, 2017

To my best knowledge this is a native JSON schema feature, and validating this file against the spec should in this case inline expand the Pet definition and validate its contents recursively for whether it's appropriate and valid in this context.

This is where you're running into problems. You're correct that $ref is a JSON schema feature, but you are trying to use it in the context of an ordinary JSON document.

A JSON schema validator takes two inputs - one is the document to be validated, and the other is the schema to validate against.

  • The document to validate (in this case your swagger.json example) is just standard JSON, and treated according to the spec at http://json.org/.
  • The schema is a much richer standard, and is treated according to the spec at http://json-schema.org/.

I think the bit that's tripping you up is that you are confused why the references in swagger.json are not being expanded. This is because, for the purposes of validation the way you are trying to do it, swagger.json is not a schema file - it's just an ordinary JSON document, and ordinary JSON documents have no concept of references. This is not a bug; it's how schema validation is supposed to work. If you try to do the same thing with other validation libraries, you'll run into the same problem.

So, in your example, you are asking the question, "Is swagger.json a valid document, according to the rules in schema.json?" The answer to that one is obviously yes.

What your example does not have is the next step of the process, which would be to ask, "Is random-document.json a valid document, according to the rules in swagger.json?" If you ask the question this way, you'll find that the references are expanded properly, as per my example here.

For the purposes of additional clarification, I would like to point out that the thing which determines whether a file is treated as a schema or as an ordinary JSON document is not what the content of the file looks like. The determining factor is whether you tell the validation library that it's a schema (by passing it as the 'schema' argument), or if you tell the library that it's a normal document (by passing it as the 'please validate this thing against the schema' argument).

In summary - this is user error, not a bug. However, if you are still confused by this, please let me know which part you are finding unclear or strange, and I'll try to expand on it a bit more.

@erayd
Copy link
Contributor

erayd commented Mar 20, 2017

As for the documentation: I was referring on how to properly use the SchemaStorage and UriRetriever functionality in this library, sorry :) The documentation on that is currently very thin with just a single simple example in the README.

You are right - we definitely need to improve the documentation, particularly in those areas! I think we need to properly nail down which parts of the API are public first (see #392), but once that's done more documentation should definitely be on the cards.

@curry684
Copy link
Author

Thank you very much for the verbose and helpful explanation. I was wrongly under the impression that, in the context of validation, properties starting with a $ were 'special' in both the document to be validated and the schema file. After all the $schema property in the root object sort of is - it's not defined in the schema itself yet parsed and processed correctly. While I do see the $ref property only being defined as a "dumb string" where allowed indeed.

The end result is however afaics that there is no way whatsoever to properly validate an OpenAPI definition right now. Not only is the split-file support absent, it also cannot see then whether an internal reference actually points to an object that is valid in that location.

So the followup question: is there some kind of way to patch this behavior into the validator for me as a consumer? All I would actually need is to inline resolve a $ref property as a JSON pointer through the UriRetriever.

@erayd
Copy link
Contributor

erayd commented Mar 20, 2017

After all the $schema property in the root object sort of is - it's not defined in the schema itself yet parsed and processed correctly.

This is a deviation from the spec (the spec defines it as a string URI that references a schema's meta-schema). This library uses it as a fallback option if the user does not provide a schema (by looking there for an inline schema), however this behavior is nothing to do with the standard. It's just an additional non-standard feature that previous developers of this library have chosen to implement.

The end result is however afaics that there is no way whatsoever to properly validate an OpenAPI definition right now. Not only is the split-file support absent, it also cannot see then whether an internal reference actually points to an object that is valid in that location.

Unfortunately, without additional logic, this is indeed currently the case. The official JSON standard doesn't support multi-file documents, so any validation behavior that is capable of processing such a thing very definitely falls under the category of custom / non-standard features.

So the followup question: is there some kind of way to patch this behavior into the validator for me as a consumer? All I would actually need is to inline resolve a $ref property as a JSON pointer through the UriRetriever.

Certainly, although I don't think this kind of behavior belongs in the actual validation library (@shmax / @bighappyface?). You could however certainly repurpose the UriRetriever component as part of your own solution, if you feel that it meets your needs.

My recommendation for your particular use-case would be to write a simple preprocessor. This preprocessor would:

  • Accept a single target document as its input;
  • Look for any property named $ref, and dereference it inline (i.e. replace the reference with the contents of its target)
  • Error if the reference pointer is invalid
  • Output a single, dereferenced JSON document

This would then allow you to preprocess the references, then feed the entire thing to the validator. It would ensure that everything is properly validated against the schema, including your split files, without needing to introduce nonstandard behavior into the library.

An example preprocessor could look something like this (pseudocode):

objectStorage = [
    'targetURI' : '/path/to/target/file'
]


preprocess(target) {
    if (target.$ref is set) {
        file = the part of target.$ref prior to '#'
        internalRef = the part of target.$ref after '#'
        if (objectStorage[file] is not set) {
            objectStorage[file] = fetch & decode content(file)
        }
        refContent = content at (objectStorage[file]/internalRef)
    }
    for (each object in target) {
        target.object = preprocess(object);
    }
    return target;
}

Please let me know if you'd like me to clarify any of that further, or if you have additional questions.

@curry684
Copy link
Author

curry684 commented Mar 20, 2017

Actually that's more or less what I was writing right now :) The following implementation appears to work fine, completely standalone with the separated example:

$definition = json_decode(file_get_contents(SWAGGER_PATH));
$basePath = dirname(SWAGGER_PATH);

$iterator = new \JsonSchema\Iterator\ObjectIterator($definition);
foreach ($iterator as $key => $element) {
    if (property_exists($element, '$ref')) {
        list($uri, $fragment) = array_pad(explode('#', $element->{'$ref'}, 2), 2, null);
        if (!empty($uri)) {
            $target = json_decode(file_get_contents($basePath . '/' . $uri));
        } else {
            $target = $definition;
        }
        if (null !== $fragment) {
            $fragment = array_filter(explode('/', ltrim($fragment, '/')));
            foreach ($fragment as $part) {
                $target = $target->{$part};
            }
        }
        unset($element->{'$ref'});
        foreach (get_object_vars($target) as $name => $value) {
            $element->{$name} = $value;
        }
    }
}

$validator = new \JsonSchema\Validator();
$validator->validate($definition, (object) ['$ref' => 'file://' . SWAGGER_SCHEMA]);
if (!$validator->isValid()) {
    foreach ($validator->getErrors() as $error) {
        echo sprintf("[%s] %s\n", $error['property'], $error['message']);
    }
}

Obviously lacking a lot of error checking, and not sure how it handles circular references right now, but it's now passing validation fine and triggering the right errors when dependencies are broken.

@erayd
Copy link
Contributor

erayd commented Mar 20, 2017

Actually that's more or less what I was writing right now :)

That looks pretty good :-). 👍

Note that with ObjectIterator, $key is fairly meaningless - it's just an internal reference to how ObjectIterator has flattened its input object.

I'd also recommend catching a few more error cases if this will be handling untrusted data:

  • file_get_contents failure or malicious input URI
  • json_decode failure
  • $target->$part being undefined

If you aren't feeding it untrusted data, then obviously you may not care about all of those.

[Ed: Looks like you were a step ahead of me - I think you edited your comment to mention the error situation while I was still typing this, or I didn't see it - either way, sounds like you're already on it ;-)]

@erayd
Copy link
Contributor

erayd commented Mar 20, 2017

One other thought - your preprocessor as you have written it will not handle recursive or nested references. You may not care about that, but figured it was worth mentioning.

@curry684
Copy link
Author

Yeah I was adding the error checking part to save you the effort of reviewing a proof of concept prototype which I knew was flawed in a lot of ways hehe. The recursion would be a simple matter of repeating the process

I noticed that the logic I wrote here more or less resembles what SchemaStorage::addSchema performs internally. I think it does make sense to extract this functionality or integrate it into the library.

See, your point that the ECMA-404 JSON standard does not in any way define references is completely correct, and as such the point is acceptable that the library does not support them. However JSON Pointers themselves are an accepted standard on their own in RFC 6901 as an optional extension to JSON documents. Hence in my opinion, given how many people are working with OpenAPI/Swagger or related schemas, it would be a good idea to provide optional support for automatically resolving JSON Pointers in a validation library. A simple $factory->setReferenceKey('$ref') would be a verbose and clean way to enable this functionality which is already 99% implemented because of the schema end of things.

@erayd
Copy link
Contributor

erayd commented Mar 21, 2017

@shmax What's your take on this?

@handrews
Copy link

@curry684 With JSON Schema Draft 04, there was a separate draft RFC for JSON Reference. So OpenAPI could refer to that as its own thing. With JSON Schema Draft 05 (and the forthcoming Draft 06), "$ref" was moved into JSON Schema, which makes it a little less clear how something like OpenAPI should use it.

I am personally hoping to put some thought into how JSON Schema can be used within other formats, as that is very common (it's how OpenAPI uses it). That should re-clarify how to use "$ref" outside of application/schema+json documents. But that's just something I'm personally looking at, it's not on anyone's official road map or anything.

@shmax
Copy link
Collaborator

shmax commented Mar 22, 2017

What's your take on this?

Are you asking if I think that the json-schema library (meaning this one) should be responsible for resolving $refs in the incoming document? I'm not sure. I'm sitting here trying to form an opinion and getting nowhere. It's a use case that's never occurred to me--so far I've only been using json-schema to validate incoming ajax requests from my site's front end, so what you guys are talking about has never come up. That said, I guess I can't think of any real reason why we couldn't do it, from a technical sense; in fact, haven't we already provided everything needed to do it? Something like:

 $schemaStorage = new SchemaStorage();
$resolvedInput = $schemaStorage->resolveRefSchema($input);
$validator->validate($resolvedInput, $schema);

@shmax
Copy link
Collaborator

shmax commented Mar 22, 2017

Well, no, I guess that wouldn't recurse; I'll poke around a bit more later.

@erayd
Copy link
Contributor

erayd commented Mar 22, 2017

Are you asking if I think that the json-schema library (meaning this one) should be responsible for resolving $refs in the incoming document?

@shmax Pretty much. It would be technically fairly easy to implement, although it also raises all kind of other questions (e.g. do we support differing resolution bases the way JSON schema does via id).

I guess where I'm coming from is that I know we can, but I don't think we should... but we have a real-world user who needs the functionality, and he won't be the only one. This feels like a case that goes beyond the remit of a JSON validator, particularly noting how easily solveable this one is with a preprocessor, and I want to say no because of that, but I'm feeling conflicted about it.

@erayd
Copy link
Contributor

erayd commented Mar 22, 2017

@handrews You will have seen many more such cases than this one I suspect, and may have a better perspective - do you have an opinion on whether we should add an option to dereference and expand $ref in input data?

@curry684
Copy link
Author

This feels like a case that goes beyond the remit of a JSON validator, particularly noting how easily solveable this one is with a preprocessor

I'm not even opposed to this. The suggestion I made about the setReferenceKey call would make it fully automatic, and that might indeed be a bridge too far. However all the functionality is there, maintained, tested and error checked, and that on the other hand seems wasteful to me not to use while forcing many end users to write their own error-prone half-assed preprocessor.

So the question in that case becomes: how to expose the functionality. I see multiple options, like a plain and simple NodeProcessorInterface I can implement to do something to an input node before it's being validated. With, for bonus points, a ReferenceProcessor default implementation providing this specific behavior. The main challenge would be to resolve external relative references correctly (an inline exploded partial document has to know its own origin to be able to resolve those correctly).

@erayd
Copy link
Contributor

erayd commented Mar 22, 2017

@curry684 You make some good points - perhaps there's room for a middle-ground approach here.

If we were to tweak the architecture so that there was a generic ReferenceResolver class, and documented that class as part of the public API, does that sound like a reasonable solution?

The more I think about it, the more this idea seems reasonable - we don't have to add any nonstandard functionality to the validator (which I would really rather we not do), but there will be a class there that is documented in the public API that can be used to do all the heavy lifting in a preprocessor. The end result could be end-user code that looked somewhat like this:

use \JsonSchema\ReferenceResolver;
use \JsonSchema\Validator;

// recursively resolve, fetch & expand $ref
$input = ReferenceResolver::resolve($input, RESOLVE_MODE_FETCH);

$validator = new Validator();
$validator->validate($input, $schema);

You'd still need to write the bit above, obviously... but it does seem that it addresses your primary concern here, while avoiding adding anything to the library that would deviate from the standard.

@shmax / @bighappyface / @handrews Your thoughts on this? I'm happy to do the work, if we all agree that this is a reasonable approach. If we don't agree, then I'm happy to stick with the status quo until we agree on an alternative way of doing things (or decide to do nothing, which is also a perfectly valid option).

@erayd
Copy link
Contributor

erayd commented Mar 22, 2017

Obviously such a ReferenceResolver would also need to be able to take a base URI to dereference relative paths.

@curry684
Copy link
Author

Your suggested code seems perfectly reasonable. If we can reduce a common semi-custom end-user requirement to a single line of code I think that's the optimal solution by definition.

@handrews
Copy link

@erayd I can come up with different answers depending on the goal:

From a strict compatibility standpoint, application/json and most application/foo+json media types do not support references, so the default behavior should be to ignore them and treat them as normal JSON properties.

But as an option that can be configured- yeah it seems useful. JSON Reference was a separate thing at one point so it made more sense for OpenAPI to use it without necessarily defining it on their own back then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants