Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore JSON validation #949

Closed
wants to merge 7 commits into from

Conversation

marko-bekhta
Copy link
Member

This PR is not to be merged and only to see if we need to do anything else in #938 on top of which it is based. The last commit has next changes:

  • Added a "mocked" version of JsonMetadataProvider which returns a couple of constraints for properties like name or email.
  • Added JsonProperty and JsonConstrainableType which extend from the property API that we have. Note that I've hardcoded the list of possible properties into JsonConstrainableType but in real case I think we wouldn't need that at all and we would just programmatically define some constraints (probably) without any validation if the property is present or not ...
  • And there's this simple test to check how it would work:
@Test
public void testValidatePropertyWithRedefinedDefaultGroupOnMainEntity() {
    Validator validator = getValidator();
    JsonObject jsonObject = Json.createObjectBuilder()
            .add( "firstName", "name" )
            .add( "lastName", "" )
            .add( "age", BigDecimal.valueOf( 20 ) )
            .add( "email", "not-an-email-string" )
            .build();
    Set<ConstraintViolation<JsonObject>> constraintViolations = validator.validate( jsonObject );
    assertThat( constraintViolations ).containsOnlyViolations(
            violationOf( Email.class ).withProperty( "email" ),
            violationOf( NotBlank.class ).withProperty( "lastName" )
    );
}

I've copied this from the comment #938 (comment)

@marko-bekhta
Copy link
Member Author

What I'm not very clear about is how you would define the metadata for nested properties. You either need a way to define the path of the property or you would use some sort of cascading mechanism and you would need an API to set the type of a nested object so that it properly cascades.

@gsmet I was thinking about something like next:

ConstraintMapping newMapping = config.createConstraintMapping();
newMapping
        .type( "FirstJsonType" )
        .property( "prop1" ).constraint( new NotNullDef() ).constraint( new SizeDef().max( 100 ) )
        .type( "SecondJsonType" )
        .property( "prop2" ).constraint( new EmailDef() )
        .property( "prop3" ) .valid("FirstJsonType");

config.addMapping( newMapping );
Validator validator = config.buildValidatorFactory().getValidator();

defining the "types" as unique string names, and in case of cascading - passing a type-string as an argument.
I've discarded the idea as it wouldn't fit the default API. But if we could add another validate() something like 'validateJson(..) the idea above might be an option.

@gsmet
Copy link
Member

gsmet commented Apr 9, 2018

and in case of cascading - passing a type-string as an argument.

How would you do that? I can see how you pass the initial type to the initial validate() call. But I'm unclear how you would pass the information for cascading in the case of nested objects?

Note that we might want a totally different API for JSON e.g. having a specific ValidatorFactory and/or a specific Validator but it would be nice if we could be able to reuse all the existing validation machinery.

@marko-bekhta
Copy link
Member Author

I was thinking to pass it while defining constraints:

 .property( "prop3" ) .valid( "FirstJsonType" );

which would mean that we want to cascade the validation on property prop3 which is of "type" FirstJsonType. It won't do the "run-time" detection but be just a static metadata definition.

@gsmet
Copy link
Member

gsmet commented Apr 9, 2018

Ah yes, that would work. Wondering if we should make the type a class to support inheritance in the rules.

Another option would be to define a property path, something like property('foo').asObject().property('bar'), and HV would interpret this in terms of cascading.

It's probably less complicated than the valid() version for the user but it would be impractical when the user wants to reuse part of a config, so your option is probably better.

As for the architecture, I think I would make it a totally separate module with a JsonValidatorFactory and a JsonValidator. I think it would be safer and cleaner. Just a bit worried about how much duplication we would have.

@yrodiere
Copy link
Member

yrodiere commented Apr 10, 2018

It's probably less complicated than the valid() version for the user but it would be impractical when the user wants to reuse part of a config, so your option is probably better.

Funny, I would have mixed the two ideas:

  • Expose a separate API to define JSON "types": essentially a map "property name" -> "type of that property" (the type could be a primary type such as String, or a more complex, composite type that the user declared)
  • Rely internally on what was contributed through that API whenever the user declares constraints. If you know the root type, you should be able to infer the type of every property, even at the end of chains of properties.

That way you get the best of both worlds. The user can re-use type definitions, and the constraint mapping API stays simple and to the point: just map constraint to properties.
Also, forcing users to define whether a "primary type" property is a String or an Integer, or... might help you to choose the right constraint validator at bootstrap, which may help to improve performance and detect configuration errors.

@gsmet
Copy link
Member

gsmet commented Apr 12, 2018

That way you get the best of both worlds.

Yes the only drawback is that you would need to define the whole schema. But indeed, I think the engine will need the types if we don't want to make significant changes to it.

@yrodiere yrodiere closed this Apr 12, 2018
@yrodiere yrodiere reopened this Apr 12, 2018
@yrodiere
Copy link
Member

(Sorry, clicked the wrong button...)

Just wanted to mention that you would only need to define the part of the schema that is actually validated, which may make a big difference in some cases.

@marko-bekhta
Copy link
Member Author

I played a bit more with this and here's what I've ended up with so far:

JsonConstraintMapping mapping = /*get the mapping*/;
mapping.type( User.class )
        // uses string as a type by default
        .property( "name" ) 
            .constraint( new NotNullDef() ).constraint( new LengthDef().min( 3 ) )
        // if it's not a string we need to pass a type with the name for it
        .property( "age", Integer.class ) 
            .constraint( new MinDef().value( 18 ) )
        // in case of complex type (inner json) we can pass a type and use valid() on it
        .property( "address", Address.class )
            .valid();
// and define constraint for the child json as for another type
mapping.type( Address.class )
        .property( "street" )
            .constraint( new NotBlankDef() )
        .property( "building_number", Long.class )
            .constraint( new NotNullDef() ).constraint( new MinDef().value( 1 ) );

We would need to pass a type of each property when defining constraints for it as otherwise we won't be able to create a correct JsonProperty i.e. we cannot predict the type of it on our own and we cannot wait until a json is already passed for validation.
It would also mean that we would need to check for any type mismatches during validation.

I have all this metadata definition part ready and started to look into how to integrate it with the rest of the engine, and that's where there are some problems. Through all the engine code we have bean and it's type bound toughener for example in

public class ValidationContext<T> {
    /**
     * The root bean of the validation.
     */
    private final T rootBean;

    /**
     * The root bean class of the validation.
     */
    private final Class<T> rootBeanClass;

    /**
     * The metadata of the root bean.
     */
    private final BeanMetaData<T> rootBeanMetaData;
}

which might work if we pass JsonObject everywhere for json validation, but it would be nice if we could have the metadata for Address but the rootBean = jsonObject. I'll need to double check if this causes any problems, because it was too late when I've stumbled on it initially 😃
But what bugs me more is that we cannot simple reuse ValidatorImpl as is. For example we cannot make calls like:

beanMetaDataManager.getBeanMetaData( value.getClass() )

as the value would always be json in our case but we need to get the type assigned to that particular property instead.
And it looks like that we might need to do a lot of changes in the ValidatorImpl so that we decrease the duplication.

@gsmet
Copy link
Member

gsmet commented Apr 17, 2018

And it looks like that we might need to do a lot of changes in the ValidatorImpl so that we decrease the duplication.

Yeah, from the beginning, I thought a JsonValidator would be needed. I think it makes sense to have it separate rather than polluting an already too big ValidatorImpl with additional methods.

Not sure about the amount of duplication we would have though.

That being said, ValidatorImpl is a big piece of code, splitting it in smaller digestable chunks would be a good perk :).

About ValidationContext, do you remember https://hibernate.atlassian.net/browse/HV-1526 ? Maybe it's time to have different ValidationContexts? Not sure if it makes sense, just thought about this issue when reading you.

Sorry a lot of handwaving :). HTH though.

@gsmet
Copy link
Member

gsmet commented Apr 18, 2018

// uses string as a type by default

By the way, minor nitpicking but we discussed it with my colleague Yoann and we both think having a default type is a bad idea. Let's make the user use a consistent API and define the type explicitly.

Yoann is also a bit skeptical about the use of classes to reference the types. I was the one suggesting it initially as I thought it would be easier to fit in the API but if we have a separate JSONValidator class and it does not break anything (typically I wouldn't want us to change the engine for that and I'm a bit worried about the @Valid case), maybe it would be better to use strings: the main argument of Yoann is that it might then be possible to generate the schema from JSON annotations.

@marko-bekhta
Copy link
Member Author

I did some more investigation around this. I've followed the ideas around PropertyHolder and string names for mappings. So far here's an updated version for the mapping:

mapping.type( "User.class" )
        // uses string as a type by default
        .property( "name", String.class )
            .constraint( new NotNullDef() ).constraint( new LengthDef().min( 3 ) )
        // if it's not a string we need to pass a type with the name for it
        .property( "age", Integer.class )
            .constraint( new MinDef().value( 18 ) )
        // in case of complex type (inner json) we can pass a type and use valid() on it
        // note that in case a property is a propertyholder itself the starting method
        // name is different
        .propertyHolder( "address", "Address.class" )
            .valid();
// and define constraint for the child json as for another type
mapping.type( "Address.class" )
        .property( "street", String.class )
            .constraint( new NotBlankDef() )
        .property( "building_number", Long.class )
            .constraint( new NotNullDef() ).constraint( new MinDef().value( 1 ) );

Main change from above mapping is usage of strings instead of classes. This required to change the logic for inner "property holders" as for them we need to pass a mapping name which is now a string I've added a different method for such case:

public interface PropertyHolderTarget {
    PropertyConstraintMappingContext propertyHolder(String property, String mappingName);
}

That was the easy part :)
Now about the different property holders. As we build this metadata upfront and store objects from org.hibernate.validator.internal.properties in metadata (Property in particular) there seems to be two ways how we can move forward.
First option is to pass a PropertyHolderPropertyExtractor type when retrieving the HibernateValidatorConfiguration. It will look something like:

// for json
PropertyHolderConstraintMapping jsonMapping = configuration.createPropertyHolderConstraintMapping(
        JavaxJsonPropertyHolderPropertyExtractor.class
);
for Map
PropertyHolderConstraintMapping mapMapping = configuration.createPropertyHolderConstraintMapping(
        MapPropertyHolderPropertyExtractor.class
);

this argument of createPropertyHolderConstraintMapping() will be passed to some property holder extractor manager and it will give the right extractor for us to use. In such case the extractor need to create org.hibernate.validator.internal.properties.Property for us. Something along next lines:

public interface PropertyHolderExtractor{
    Property createProperty(String proeprtyName, Class<?> proeprtyType);
    Property createPropertyHolder(String proeprtyName, String mappingName);
}

this extractor than can be used to create properties used in metadata.
The possible downsides of this might be:

  • need to pass additional parameters when creating mappings
  • need to create and store the same metadata for different propertyHolders. The creating part can be solved by doing it in the method that receives a PropertyHolderConstraintMapping as a parameter.
    The benefit is that we create a metadata that could access properties from property holders right away.

Another option might be to introduce some sort of a wrapper around actual properties, something like

public interface PropertyHolderProperty {
    Object getPropertyValue(String propertyName, Class<?> propertyType);
}

and then each time we need to call Property#getValueFrom(Object) we wrap the actual object with this wrapper.
It would allow to have a generic property holder metadata but then would require creation of additional object at runtime. Hence I'm leaning more towards the first approach, with which I'll proceed, unless there are better ideas :).

And another question that I have - how crazy the changes can be around ValueContext and ValidationContext. Why am I asking? I introduced PropertyHolderMetaData similar to BeanMetaData to store the info about property holders. I think we could later add a collection of PropertyHolderMetaDatas to BeanMetaData to support case that you mentioned:

class MyBean {
    @NotNull
    private String name;
    @ValidPropertyHolder("mapping")
    private JsonObject myJsonObject;
}

Initially I was trying to fit it all within existing contexts but it feel like hacking things to make them work. At this moment I was thinking about extracting a couple of interfaces from these contexts that we can later pass around. For example in case of ValidationContext methods like:

public T getRootBean() {
    return rootBean;
}

public Class<T> getRootBeanClass() {
    return rootBeanClass;
}

public BeanMetaData<T> getRootBeanMetaData() {
    return rootBeanMetaData;
}

public Executable getExecutable() {
    return executable;
}

are not used in ConstraintTree classes and it would help if we separate them out as in case of PropertyHolderValidationContext, instead of above methods, we would jsut have something like:

public T getRootPropertyHolder() {
    return rootPropertyHolder;
}

public PropertyHolderMetaData getPropertyHolderMetaData() {
    return propertyHolderMetaData;
}

But before going further with this I wanted to confirm if that would be ok :) that's why the question - "how crazy the changes can be?" If so I'll start working on the decoupling the contexts before getting back to property holders.

@gsmet
Copy link
Member

gsmet commented Apr 25, 2018

So let's start with the easy question: "how crazy the changes can be around ValueContext and ValidationContext?".

There are no crazy ideas. IMO, there are only a few guidelines we have to follow: avoid creating new objects in the hot paths and keep the code readable. Apart from that, feel free to do whatever changes you see fit as 6.1 is a major version.

I'm not very happy with ValidationContext at the moment, for various reasons so I suppose we will make quite a lot of changes to it in the future.

I concur with you that a specific ValidationContext for root property holder validation would be in order.

Now the rest

  • the comment in the mapping example you pasted are outdated, String is not the default anymore. Moreover I think I would drop the .class from the mapping name and simply call it address for instance.
  • about the creation of the mapping, I'm wondering if we really need to create separate one for Map and JSON or it could be generic enough to have the free form mappings totally agnostic. TBH, I'm not sure it's doable but I think I would try this first.

In such case the extractor need to create org.hibernate.validator.internal.properties.Property for us.

For me the property value extractor was a way to extract the value from a PropertyHolder. So basically, you would have a path of property value extractors as you have a path of value extractor for container elements at the moment.

What I envisioned (without experimenting with it or playing with the code, so consider this with care):

  • when you validate a PropertyHolder, it would be the root object and you would apply the MetaConstraint to it, the MetaConstraint being referenced by nested locations similar to what we do with container elements.
  • when you validate a PropertyHolder as a property of a standard bean, the MetaConstraint location would mix classical FieldConstraintLocation with PropertyHolderPropertyConstraintLocation (not very happy about the name but you get the idea).

In any case, you would know which mapping to apply to build the MetaConstraints at bootstrap (when I mean bootstrap I mean when you build the metadata, which of course is done at runtime when validating the first bean of a given type) because you would either have the root mapping name as the parameter of the validatePropertyHolder() call or as an attribute of the @ValidPropertyHolder annotation.

So all in all, I would leverage most of the existing infrastructure. Not sure it works but it would be worth a try IMHO.

@gsmet gsmet changed the title Json validation Explore JSON validation May 3, 2018
Copy link
Member

@gunnarmorling gunnarmorling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @marko-bekhta, thanks a lot for this huge piece of work. Just a few comments from the side line here (didn't take a very detailed look, just a few things caught my intention). On a general level, it seems as if there's some redundancy added now to the programmatic API? Also, it's not quite clear yet to me whether this targets JSON specifically or a more general means of "free-form" validation (i.e. anything non-JavaBeans-based)?

@Incubating
public interface HibernateFreeFormValidator {

Set<ConstraintViolation<JsonObject>> validateJson(JsonObject json, Class<?> typeToValidate, Class<?>... groups);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is typeToValidate? Also the scope of the interface is a bit blurry to me. The class comment describes it as a generic on, but then the method signature is tied to JSON. Can we make it truly generic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this first experiment, Marko did use a class as a way to define the JSON type, thus this parameter here and the beanClass attribute you mention a bit later.

We already decided we don't want that so it will be changed.

* @author Gunnar Morling
* @author Kevin Pollet &lt;kevin.pollet@serli.com&gt; (C) 2011 SERLI
*/
public interface Cascadable<C extends Cascadable<C>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it that all these interfaces are copied for JSON?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue is that when declaring something as cascading, we need to define the new JSON type that will be used.

Maybe we could try to use some more generics in the DSL and have this difference managed this way (e.g. having a generics for the Cascadable DSL and inject it where it makes sense).

Not sure it's doable nor very future proof.

*
* @author Marko Bekhta
*/
public class ConstrainedProperty extends AbstractConstrainedElement {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, ConstrainedElement always was meant to represent the "physical" elements of a type (i.e. fields, getters etc.), whereas the unified handling of fields and getters as "properties" only was part of the "aggregated model". I'm not sure whether it's a good idea to mix these two levels now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know that, thanks for the clarification. I'll take this into account in the other PR for the "abstraction".


private final ConfigurationSource source;

private final Class<T> beanClass;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the beanClass in case of JSON?

@gunnarmorling
Copy link
Member

In terms of referencing types in the programmatic config API, that's what I had in mind for that:

mapping.type( new JsonType ( "com.example.User" )
        .property( "name", String.class )
        ...

The current form (mapping.type( User.class )) would be a short-cut for mapping.type( new BeanType ( User.class ).

@marko-bekhta
Copy link
Member Author

@gunnarmorling I see that @gsmet answered almost everything. This PR is a bit outdated now. I do have other patches in this direction of free-form validation. Right now we wanted to get the preparation work done first:

  • build an abstraction over reflection in the engine
  • make some changes to Validation and Value Contexts (IIRC some changes are already merged in)

As for the "type" we ended up agreeing on using String as a mapping name. But I think that your idea with new JsonType ( "com.example.User" ) might be useful for some internals, so thanks for it! :)

@gsmet
Copy link
Member

gsmet commented Nov 7, 2018

Closing in favor of #989 .

@gsmet gsmet closed this Nov 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants