Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic in LazyRefValidator::lazy_compile with jsonschema 0.21 #547

Closed
jrudolph opened this issue Oct 2, 2024 · 23 comments · Fixed by #556
Closed

panic in LazyRefValidator::lazy_compile with jsonschema 0.21 #547

jrudolph opened this issue Oct 2, 2024 · 23 comments · Fixed by #556

Comments

@jrudolph
Copy link

jrudolph commented Oct 2, 2024

With jsonschema 0.21, I get a panic out of jsonschema which is hard to debug because of missing detail information:

panicked at .../index.crates.io-6f17d22bba15001f/jsonschema-0.21.0/src/keywords/ref_.rs:103:61:
Invalid schema: ValidationError { instance: Null, kind: Referencing(InvalidUri(Resolve(ResolveError(InvalidBase)))), instance_path: JsonPointer([]), schema_path: JsonPointer([]) }

The schemas in question are relatively reference heavy, I can try to pinpoint it more closely but wanted to report it already.

@jrudolph
Copy link
Author

jrudolph commented Oct 2, 2024

It seems that the failing schema uses local references like this:

{
    "$id": "/schemas/base/spatial",
    "location": {
        "$id": "#/location",
        "type": "object",
        "properties": {
            "location": {
                "type": "string"
            },
            "coordinates": {
                "$ref": "#/lat-lon-coords"
            }
        },
        "required": [
            "location",
            "coordinates"
        ],
        "additionalProperties": false
    },
    "lat-lon-coords": {
        "$id": "#/lat-lon-coords",
        "type": "object",
        "properties": {
            "lat": {
                "type": "number",
                "minimum": -90,
                "maximum": 90
            },
            "lon": {
                "type": "number",
                "minimum": -180,
                "maximum": 180
            }
        },
        "required": [
            "lat",
            "lon"
        ]
    },
}

It panics when trying to compile the #/location schema which itself references the sibling schema #/lat-lon-coords. It might be that the references are not completely sound wrt the spec, in any case jsonschema should not panic...

@Stranger6667
Copy link
Owner

Thank you so much for reporting! I'll take a look at it soon :)

@jrudolph
Copy link
Author

jrudolph commented Oct 2, 2024

Thanks for this useful library ;)

@jrudolph
Copy link
Author

jrudolph commented Oct 7, 2024

Thanks for fixing the panic, @Stranger6667. Now the panic is gone but the regression itself is not fixed. It still says InvalidBase as the only indication of what might be wrong. So, for us this is still a regression from 0.20.

@jrudolph
Copy link
Author

jrudolph commented Oct 7, 2024

I might be misunderstand the specification but I think the above case is similar to the case described in https://json-schema.org/understanding-json-schema/structuring#defs:

{
  "$id": "https://example.com/schemas/customer",

  "type": "object",
  "properties": {
    "first_name": { "$ref": "#/$defs/name" },
    "last_name": { "$ref": "#/$defs/name" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],

  "$defs": {
    "name": { "type": "string" }
  }
}

The $defs syntax is a convention where to put sub-schemas which we didn't adopt but afaics there's nothing that would disallow plain #/xyz references from working.

@jrudolph
Copy link
Author

jrudolph commented Oct 7, 2024

Pasting in the above schema into the test you added works, so probably we now arrived a different but related issue after all. Will investigate, the problem is still that I have to add printlns to jsonschema to figure out which sub-schema file is the problem...

@Stranger6667
Copy link
Owner

hi @jrudolph

I believe that the issue is in the $id value, in the original example it is a JSON pointer, but for such cases, internally a fake URI base is used, so it should be fine (I've added a test with a minimized version of your schema).

After finishing with drafts support I plan to address error messages and there will be much more context for such cases, which will also pinpoint the exact error location & will have a more descriptive message.

However, maybe some immediate patches to this specific area of URIs will help, I'll take a look (maybe just adding the input values to the error message will help with debugging)

@jrudolph
Copy link
Author

jrudolph commented Oct 7, 2024

That sounds good and I'll also try to debug where the problem is now.

@jrudolph
Copy link
Author

jrudolph commented Oct 7, 2024

I added printlns and a panic at the right position and ended up with this:

Trying to resolve id: #/location base: https://example.org/schemas/base/spatial#/location
Invalid URI: base URI/IRI with fragment
stack backtrace:
   0: rust_begin_unwind
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panicking.rs:74:14
   2: <referencing::error::Error as core::convert::From<fluent_uri::error::ResolveError>>::from
             at /home/johannes/git/opensource/_2024/jsonschema-rs/crates/jsonschema-referencing/src/error.rs:183:9
   3: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/result.rs:1989:27
   4: referencing::uri::resolve_against
             at /home/johannes/git/opensource/_2024/jsonschema-rs/crates/jsonschema-referencing/src/uri.rs:13:8
   5: referencing::resolver::Resolver::in_subresource
             at /home/johannes/git/opensource/_2024/jsonschema-rs/crates/jsonschema-referencing/src/resolver.rs:164:37
   6: jsonschema::compiler::Context::in_subresource
             at /home/johannes/git/opensource/_2024/jsonschema-rs/crates/jsonschema/src/compiler.rs:73:24
   7: jsonschema::compiler::compile
             at /home/johannes/git/opensource/_2024/jsonschema-rs/crates/jsonschema/src/compiler.rs:353:15
   8: jsonschema::keywords::ref_::LazyRefValidator::lazy_compile::{{closure}}
             at /home/johannes/git/opensource/_2024/jsonschema-rs/crates/jsonschema/src/keywords/ref_.rs:145:13

So, it does not happen inside the document but when reference from an outside schema. I'll try to figure out a test case.

@jrudolph
Copy link
Author

jrudolph commented Oct 7, 2024

Here's a failing test case for ref_.rs:

#[test]
    fn test_relative_base_uri_in_referenced_schema_doc() {
        let schema = json!({
            "$id": "https://example.com/root",
            "properties": {
                "location": {
                    "$ref": "/indirection#/$defs/baz",
                }
            }
        });

        struct MyRetrieve {}
        impl Retrieve for MyRetrieve {
            fn retrieve(
                &self,
                uri: &Uri<&str>,
            ) -> Result<Value, Box<dyn std::error::Error + Send + Sync>> {
                println!("Retrieving {}", uri);
                match uri.path().to_string().as_ref() {
                    "/indirection" => Ok(json!({
                        "$id": "/indirection",
                        "$defs": { // another issue is that $defs is required here
                            "baz": {
                                "$id": "#/$defs/baz",
                                "$ref": "/types/#foo"
                            }
                        }
                    })),
                    "/types" => Ok(json!({
                        "$id": "/types",
                        "foo": {
                            "$id": "#/foo",
                            "$ref": "#/bar"
                        },
                        "bar": {
                            "$id": "#/bar",
                            "type": "integer"
                        },
                    })),
                    _ => panic!("Not found"),
                }
            }
        }

        let validator = Validator::options()
            .with_draft(Draft::Draft201909)
            .with_retriever(MyRetrieve {})
            .build(&schema)
            .expect("Invalid schema");

        assert!(validator.is_valid(&json!(2)));
    }

@Stranger6667
Copy link
Owner

Thanks, that would help me a lot! I merged a change for better URI-related errors and will take a look at the issue soon

@jrudolph
Copy link
Author

jrudolph commented Oct 8, 2024

Great, thanks!

@Stranger6667
Copy link
Owner

Right now, the reference resolving process works as follows:

  1. Resolving /indirection#/$defs/baz:
  • Drop the fragment and look for a resource with "$id": "/indirection"
  • Treat the fragment as a JSON pointer, resolving to {"$id": "#/$defs/baz", "$ref": "/types/#foo"}.
  1. Resolving /types/#foo:
  • Look for a resource with "$id": "/types/" (likely a typo, should be /types without trailing slash).
  • Treat foo as an anchor, which doesn't exist in the schema.

The issue appears to be a typo in /types/#foo. It should likely be /types#/foo, which would be treated as a pointer within the /types resource.

This version should work, but there is an issue with eager resolving that does not check baz in the /indirection resource:

# root (note that there is no $id)
{
    "$ref": "/indirection#/baz"
}

# /indirection
{
    "$id": "/indirection",
    "baz": {
        "$ref": "/types#/foo"
    }
}

# /types
{
    "$id": "/types",
    "foo": {
        "$ref": "#/bar"
    },
    "bar": {
        "type": "integer"
    }
}

But it works with Python's jsonschema:

import jsonschema
from referencing import Registry, Resource
from referencing.jsonschema import DRAFT201909

a = {
    "$id": "/indirection",
    "baz": {"$ref": "/types#/foo"},
}

b = {
    "$id": "/types",
    "foo": {"$ref": "#/bar"},
    "bar": {"type": "integer"},
}


def retrieve(uri):
    contents = {"/indirection": a, "/types": b}[uri]
    return Resource.from_contents(contents, default_specification=DRAFT201909)


registry = Registry(retrieve=retrieve)
schema = {
    "$ref": "/indirection#/baz",
}
validator = jsonschema.Draft201909Validator(schema, registry=registry)
assert validator.is_valid(2)
assert not validator.is_valid("key")

So, there is definitely a bug with eager ref resolving which should be fixed.

On the schema side, do you think that my assumption about the typo is correct?

At least this is my understanding of the resolution process, which was more or less ported from Python's referencing and appears to be correct (given it passes the referencing test suite + the official JSON Schema test suite). If there is any aspect about the schema structure that I've missed, please, let me know.

@jrudolph
Copy link
Author

jrudolph commented Oct 8, 2024

Oh indeed, the trailing slash is a typo, but is that really related to the error which complains about a fragment in the base URL?

@Stranger6667
Copy link
Owner

Stranger6667 commented Oct 8, 2024

I think it is misleading right now:

Failed to resolve '/types/#foo' against 'https://example.com/indirection#/$defs/baz': base URI/IRI with fragment

That actually could be just Unresolvable: /types/#foo or something like that.
Additionally, it could check for typos like that, where the anchor does not exist, but if treated as a pointer, then it resolves to some sub-schema.

@jrudolph
Copy link
Author

jrudolph commented Oct 9, 2024

For me it still fails without the trailing slash. Basically, I'd like to have both tests passing that I added here:

https://github.com/Stranger6667/jsonschema-rs/compare/master...jrudolph:jsonschema-rs:hash-references?expand=1

@jrudolph
Copy link
Author

jrudolph commented Oct 9, 2024

I think it is misleading right now:

Failed to resolve '/types/#foo' against 'https://example.com/indirection#/$defs/baz': base URI/IRI with fragment

Not sure why that would be misleading? Isn't it an error coming directly out of fluent_uri::UriRef::resolve_against and happens because the base has a fragment defined?

@Stranger6667
Copy link
Owner

Sorry for the delay, I am going to make a new release with the current changes (as there are quite a lot of things already) and will follow up a bit later next week with fixes on the issues we've discussed here. Right now I don't have much time to dive into resolving to implement a proper fix.

Not sure why that would be misleading? Isn't it an error coming directly out of fluent_uri::UriRef::resolve_against and happens because the base has a fragment defined?

Oh, you're right. Now I re-read it and it makes sense, as with that typo it becomes a URI with a fragment. My intention was to find a way to make the error more connected with what is defined in the schema, so there is something actionable in the message

@jrudolph
Copy link
Author

Sorry for the delay, I am going to make a new release with the current changes (as there are quite a lot of things already) and will follow up a bit later next week with fixes on the issues we've discussed here. Right now I don't have much time to dive into resolving to implement a proper fix.

Yeah, no hurry, we can stay on 0.20 for the time being, just wanting to make sure that we will have a path forward in the future.

My intention was to find a way to make the error more connected with what is defined in the schema, so there is something actionable in the message

Indeed, that's already a great improvement.

@Stranger6667
Copy link
Owner

@jrudolph should be fixed via #604 (see the test there). Released in 0.24.1, please let me know if it fixes the issue for you - I believe, it should.

@jrudolph
Copy link
Author

Thanks, I'll try to see if it worked.

@jrudolph
Copy link
Author

jrudolph commented Nov 7, 2024

I finally got to checking the new version. The error message is now much better but resolution still fails... looking into it

@jrudolph
Copy link
Author

jrudolph commented Nov 7, 2024

I created a reproducer at #633. The issue is that $id fields still seem to include the fragment while resolving basically against itself...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants