Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom validation of built struct #67

Closed
faern opened this issue Apr 9, 2017 · 37 comments
Closed

Custom validation of built struct #67

faern opened this issue Apr 9, 2017 · 37 comments

Comments

@faern
Copy link

faern commented Apr 9, 2017

It would be nice to allow custom code to validate an object before it is returned from the build method. This topic is touched upon in #53 and #56, but none of them are really about this, nor discuss it in detail.

I mean to validate the entire struct upon build, not individual fields on set. This is for situations where there are no invalid values for individual fields, but some combinations of field values are invalid.

Example

#[derive(Builder)]
#[builder(validatefn=my_validate)]
pub struct Foo {
    src: IpAddr,
    dst: IpAddr,
}

impl FooBuilder {
    fn my_validate(foo: &Foo) -> Result<(), String> {
        if foo.src.is_ipv4() == foo.dst.is_ipv4() {
            Ok(())
        } else {
            Err("Source and destination addresses not in same address family".to_owned());
        }
    }
}

// And the generated builder looks something like:
impl FooBuilder {
    fn build(&mut self) -> Result<Foo, String> {
        let foo: Result<Foo, String> = whatever_it_does_now_to_create_foo();
        foo.and_then(|foo| Self::my_validate(&foo).map(|_| foo))
    }
}

The value sent to validatefn should probably be limited to an identifier, a method name, and the code generated will be Self::$validatefn(&value)...

Structured errors

Structured errors, instead of String is discussed in issue #60. I really like the idea of returning structured errors and support that. This custom validation can be made to support that by changing the requirement on validatefn from (T is the type we are building):

Fn(&T) -> Result<(), String>

to

Fn(&T) -> Result<(), E> where E: std::error::Error

And introduce another structured error variant ValidationFailed(E), giving us a build method similar to:

impl FooBuilder {
    fn build(&mut self) -> Result<Foo, FooBuilderError> {
        let foo: Result<Foo, FooBuilderError> = whatever_it_does_now_to_create_foo();
        foo.and_then(|foo| {
            Self::my_validate(&foo)
                .map(|_| foo)
                .map_err(|e| FooBuilderError::ValidationFailed(e))
        })
    }
}

Allowing the user of derive_builder to return any error type they want in their validation function.

@faern
Copy link
Author

faern commented Apr 9, 2017

Should just add that I'm willing to help build this. But I want to see how it is received first.

@faern faern closed this as completed Apr 9, 2017
@faern faern reopened this Apr 9, 2017
@faern
Copy link
Author

faern commented Apr 9, 2017

Regarding this feature together with structured errors. If no validation function is given we should probably not even generate a ValidationFailed variant to the accompanying error. Only generate error variants that is relevant to the specific builder.

@colin-kiegel
Copy link
Owner

The proposal looks good to me. Thanks for starting the discussion. 👍

Some questions circulating my mind are:

  • how often do you need to do validation as a post-processing step?
  • how often do you want to do mutation as a post-processing step? (e.g. silently coerce/sanitize values into a valid range).
  • and are those needs best treated separately or in one go?
  • how far do you get with just making the generated build method private and wrapping it in a custom public build method?

I would prefer to have a rough overall plan for validation and mutation, before implementing just one of these.

I agree that validation would be more powerful on the struct level instead of individual fields. With regard to mutation, I'm not so sure. It may be sufficient to offer an API for mutation on the field level? Because if you need mutation on the struct level you could just wrap the build method, no?

@faern
Copy link
Author

faern commented Apr 9, 2017

Can I hide the generated build method in the current version? I could easily add my own build method, but I don't get how I would hide the existing one.

If we also want mutation/sanitize, we could easily implement so that the built value is passed with ownership to the validator and that the method return it again, possibly modified:

Fn(T) -> Result<T, E> where E: std::error::Error

But then we might need to do some bikeshedding regarding the name for that functionality, since it's not strictly validating any longer.

For sanitizing things, such as keeping numbers in a desired range, I'm not sure what solution I prefer. In general I don't like silently doing stuff like that. I would probably treat an out of range value as a validation error instead of modifying it.

@colin-kiegel
Copy link
Owner

No you can't hide the build method right now, but it's somewhere on my todo list (I just created a github issue #70). It should not be very difficult to implement, in case you want to give it a go. ;-)

I am also a bit wary about encouraging such possibly surprising things like mutating post-processing. The limitation to sanitazing is one of the things I liked about your proposal. But on the other hand I'm not sure if that would rule out some valid use cases and what those use cases might look like / need.

Just throwing in possible names

  • and_then could denote a function, which is allowed to mutate and error (in analogy to Result::and_then)
  • map could denote a function, which is allowed to mutate, but not error
  • filter could perhaps be used instead of validatefn, although filters in std seem to return only bool. But I feel either validatefn or validate would fit better

@faern
Copy link
Author

faern commented Apr 10, 2017

Cool, then we agree about the mutating post-processing. I just suggested that because I got the impression you wanted to support that use case. No, I prefer only allowing validation and then error out on invalid settings instead.

I think a good start is to just support validation like how I proposed in the beginning. Personally I don't see any use case for and_then or map.

@colin-kiegel
Copy link
Owner

colin-kiegel commented Apr 10, 2017

Ok, then let's go with your original proposal. 👍

Note: It must be #[builder(validatefn="my_validate")] instead of #[builder(validatefn=my_validate)] or the compiler will reject this.

I will accept PRs. so if you like, you can work on this and contact me for any questions. These are the important files:

  • derive_builder_core/src/build_method.rs (<-- you can start here and add some unit tests)
  • derive_builder/src/options/* (<-- the attribute parsing and option propagation is a bit involved right now, I'd do that as a second step)
  • last but not least, please update changelog, readme and docs and add some integration tests. :-)

@faern
Copy link
Author

faern commented Apr 10, 2017

Awesome. Thanks for the hints! Will see when I get time to do this, hopefully at least start looking at it today or tomorrow. Will reach out to you if I face problems.

@TedDriggs
Copy link
Collaborator

@faern, wouldn't validations like this appear in Foo's new method or the equivalent, rather than in a separate builder method? Particularly for RAII cases, it seems strange to build the struct before validation occurs.

@faern
Copy link
Author

faern commented Apr 10, 2017

@TedDriggs Do you mean the new method as it is proposed in #69? There we would only know the values of the required fields, and the validation might need to compare optional fields. My intention is to be able to verify if the builder state is sane before it is allowed to ship a built struct to whoever calls build()

To do validation before constructing the struct you mean to send the &FooBuilder instead of &Foo to the validation method? That might make sense. Would be more efficient to not construct Foo indeed. The downside is that then the validation method must work with Option<FieldType> instead of FieldType directly, and it will not have access to the actual values for fields that has not been set and will receive their default values. But as a writer of a validation method you should know what the fields default to anyway, so it should still be possible to do what you need.

@colin-kiegel
Copy link
Owner

colin-kiegel commented Apr 10, 2017

I think validation belongs in the build method. It's however a tough question, whether this should be post- or pre-processing. @faern already pointed out some advantages of both variants. I can add these arguments, from the perspective of the initial proposal:

  • pro: post-processing would also detect erroneous default values - which is a very good thing
  • contra: at the same time that might confuse downstream users with weird errors about values they didn't set
  • (contra: it may feel a bit weird to write functions on a struct impl that neither take or return the struct. But since the validatenfn is an internal helper function it can be private and all is good.)

I'm leaning towards post-processing as initially suggested. Knowing that the final construction is valid is very valuable IMO.

@TedDriggs
Copy link
Collaborator

Knowing that the final construction is valid is very valuable IMO.

If this is the intention, I'd imagine that the exporting author should implement a new method on the target type, and then:

  1. Ask the builder to guarantee all fields are present or have default values.
  2. Feed the values to a method on the target type in a custom "build" step.

This keeps argument validation in the target type, enables someone to bypass the builder completely without bypassing validation, and allows the author to decide how to handle defaults; they can keep the source of truth on defaults in the builder, or they can only read user-specified values from the builder and dispatch to an override on the target type such as with_opt(...).

Caveat

There may be a philosophical question here: Are *Builder objects meant to be the primary means of instantiating a struct, or are they meant to be utilities to help with gathering the values together in code (possibly alongside other means of object creation such as serde deserialization)?

  1. If it's the former, then putting the validation into the builder makes sense, but the ability to add #[serde] attributes to the builder becomes more important; the user should deserialize wire input into a builder, then rely on the build method to perform validation.
  2. If it's the latter, then encouraging the user to put validation in a builder-specific path opens the door to other libraries not getting that validation, or forces users to write their validation multiple times.

Approach #1 has a lot of merit, but at the moment the suggestion is that deserializing into a builder isn't a priority, which pushed me (and probably others) towards option 2.

@faern
Copy link
Author

faern commented Apr 12, 2017

I started leaning towards validation before construction a few posts back actually. It does feel strange to construct an object that we don't even know if it's valid or not yet. I'm working on this and I will bring a draft of that to a show-and-tell PR later for discussion :)

I'm not particularly fond of doing stuff in the target type at all. It feels like it creates circular dependencies between the two structs. Take Command and Child from libstd as an example. There is no way to get a hold of a Child without first going through a Command.

Serializing builders sounds strange to me. The builder is not a data struct meant for moving around IMO, it's a factory that sits in one place and spits out targets. I had not even thought about the serializiation problem. I just imagined the builder to be the source of truth for target instances. I don't understand your point 2 really. If other libraries want to use your target type they should also import your builder. Where I currently use derive-builder there are no constructors and only hidden fields on my target types, so the only way to get hold of one is through the builder, and that is how I mean to export it to others as well.

I'm not sure what is best any longer. I only thought about builders as the one and only way to create target types and they contain all the logic, while the targets can rest assured they are always OK.

Idea: Treat builders as primary creators of targets and put all logic in builder. When people have many ways to create targets it's less of our responsibility so those people can ignore this custom validation features and they implement their validation directly on the target and they call it manually. They have to call that manually after serde deserialization anyway, so would be more symmetrical if they had to call that manually for all their means of creating that type.

I will continue coding, will be easier to discuss when we have examples at hand.

@colin-kiegel
Copy link
Owner

@TedDriggs Ok, keeping the validation function on the target type makes sense to me. This would be a minor change to the original proposal and it could be a private function

impl Foo {
    fn my_validate(&self) -> Result<(), String> {
        // .. as before
    }
}

But then it becomes much more important what the error type is, because it is supposed to be shared functionality. It would be nice, if the user can pick the error type as long as it is in certain boundaries. We could keep them very loose like E: Display + Sized. It would be best if we could put these requirements into some Validation trait with associated type then. The user would need to impl Validation for Foo and would get nicer errors if some boundary doesn't hold. It would also simplify the API to opt into validation, since the function name can be omitted. :-)

The only problem with this is that a procedural macro like derive_builder can't export any items. I read about a workaround somewhere but haven't tried it and remain very skeptical if that trick would work (it would require some wrapper crate to re export the custom derive, which I'm skeptical that it would work).

@colin-kiegel
Copy link
Owner

colin-kiegel commented Apr 12, 2017

@faern cool that your working on this. 👍

@faern
Copy link
Author

faern commented Apr 12, 2017

A reasonable limit to the error type is std::error::Error I feel :) That would pull in Display as well. Should encourage more of the ecosystem to follow the standard error trait.

@TedDriggs
Copy link
Collaborator

Serializing builders sounds strange to me.

@faern agreed, but I'm not talking about serializing builders. Here's the workflow I'm describing:

  1. Deserialize from a file or the wire into a FooBuilder
  2. [Optional] Apply local overrides to the deserialized struct
  3. Call build()
  4. Use the built and validated Foo
  5. [Later] Serialize Foo on to some other system

In order for this to work, I'd need to be able to derive serde::Deserialize on the builder, and add deserialization attributes to the builder fields.

I'm not particularly fond of doing stuff in the target type at all. It feels like it creates circular dependencies between the two structs. Take Command and Child from libstd as an example. There is no way to get a hold of a Child without first going through a Command.

This is a totally valid interpretation, though at the moment I can't get there because of the deserializing aspect described above. Hand implementing a separate deserialization container is too heavyweight to be interesting.

If other libraries want to use your target type they should also import your builder. Where I currently use derive-builder there are no constructors and only hidden fields on my target types, so the only way to get hold of one is through the builder, and that is how I mean to export it to others as well.

See #73; I'd prefer to only import the target type, and have that expose a static method to get a builder instance. This keeps the imports shorter, and will improve the refactoring experience that RLS is working on - the "rename type" refactor wouldn't be able to rename the derived builder when the user renames the base type, which would cause build breaks.

@faern
Copy link
Author

faern commented Apr 12, 2017

The following is in no way criticism towards anyone. It's just me thinking out loud about use cases in general :)

There surely are quite different use cases for builders I have come to realize. And it would of course be awesome if the crate would support as many of them as possible. It would be nice to find the core properties of most use cases and find out which ones can be supported and document them. Then all future issues and PRs can be mapped to them and it would be easy to say if the issue/PR would map nicely into an existing one, if the crate should support yet another use case or if it simply moves in the wrong direction and should not be considered. My main point being that if there are two clusters of use cases that are a bit too far away from each other they might be better solved by two different crates.

Not saying this problem exists with current issues/PRs, just trying to draw a mental model here. This might suit better in a separate issue titled "General design goals and supported use cases" or similar :)

@faern
Copy link
Author

faern commented Apr 12, 2017

Particularly for RAII cases, it seems strange to build the struct before validation occurs.

@TedDriggs What about your earlier argument that validation should occur before target struct creation? Or did I interpret that quote wrong? Feels like then we need to put the validation in the builder. Or should the validator on the target type take the builder as argument to validate you mean? Maybe you completely changed stance since that quote.

@faern
Copy link
Author

faern commented Apr 12, 2017

Any of you guys available on IRC or some other chat protocol? I'm on the Mozilla IRC server.

@TedDriggs
Copy link
Collaborator

@faern yep - I'm "teddriggs" on there too.

@TedDriggs
Copy link
Collaborator

I've written up a vision doc with an e2e example here.

@colin-kiegel
Copy link
Owner

I'm not very often on IRC at all, but my nick is colin-kiegel. I'd prefer if important conversations happen on github, so all arguments are in one place and other people can chime in. But that's not a must. When you see me in IRC, you can contact me of course. :-)

@colin-kiegel
Copy link
Owner

PS: I have to catch up with your latest arguments later - maybe over the weekend.

@TedDriggs
Copy link
Collaborator

@colin-kiegel totally reasonable; @faern had some questions that were difficult in this high-latency format. I've updated the gist, look forward to both feedback from both of you.

Would it be easier for this doc if I submit a dummy PR? That would let us add inline comments etc.

@colin-kiegel
Copy link
Owner

@TedDriggs sure we can try that

@colin-kiegel
Copy link
Owner

  • 👍 for std::error::Error.
  • 👍 for we should try to formulate the primary use cases and their core properties/requirements that we are targeting
  • I left some comments in the general vision proposal Vision proposal #79. TL;DR: In terms of general direction, I'd prioritize static validation of required fields and integrated validation of values higher than the kind of flexibility suggested in this proposal. But hopefully we can still work out some compromises. :-)
    But hopefully we can still work out some compromises. :-)
    Ok, back to this proposal:

Where to do validation

Option 1: post-processing on target type

  • validate function lives in impl Foo { /* ... */ }
    • can be shared with other ways to construct a Foo, e.g. deserialize, then validate
    • can be a private method, which does not appear in the public API of Foo
    • any error type that implements std::error::Error
  • validation is easy, since fields are not wrapped in Options
  • validation is final - nothing happening after validation means nothing can go wrong (e.g. defaulted values could be wrong without being noticed)

Variant 1b: ... via validation trait

  • validate function lives in impl Validation for Foo { /* ... */ }
    • can be shared
    • will be public
    • any error type that implements std::error::Error
  • validation is easy
  • validation is final
  • separation of concerns instead of implicit coupling (e.g. the principle Foo should not know any Details about FooBuilder would be slightly violated if the validation function lives in Foo - because its signature needs to know what FooBuilder expects)

Option 2: pre-processing on builder type

  • validate function lives in impl FooBuilder { /* ... */ }
  • separation of concerns
  • validation can produce better error messages on defaulted values (e.g. the validation knows about unspecified fields, which will take some default value that might be in some conflict with other specified values)

=> My impression is that we have much stronger arguments for option 1 / 1b. My current favourite is 1b, but it might be tricky to get a Validation trait into scope (a procedural macro like derive_builder can't export any items). We could start with option 1 and deprecate that in favour of 1b in the future, as soon as we find a good way to export items from a proc macro crate. IMO starting at 1 would put as closer to the final 'goal' 1b.

We can add a note to the documentation that pre-processing can be easily achieved by manually wrapping a private renamed build function #70.

Deserializing a builder

@TedDriggs Whether/How there will be a way to de/serialize a builder seems orthogonal to how the build method works (including validation) IMO. Let's discuss it in another ticket if possible (e.g. #43), or please try again to explain why this is related to validation. :-)

@colin-kiegel
Copy link
Owner

What do you think about my attempt to summarize the pros/cons of where to put the validation method?

PS: I've sent both of you my email address via IRC. In case you have questions about the code base or other chit-chat. ^^

@TedDriggs
Copy link
Collaborator

I didn't get the email address in IRC; my client doesn't seem to do well when my computer isn't online :/

Having validation run as a post-processing step on the target type feels wrong, because it means that for a period of time - however brief - an object of Foo existed which violated Foo's invariants; that shouldn't be possible. The RAII concern stems from this; if the building of the object has side-effects, then building one and tearing it down on subsequent validation may not be as invisible as the caller expected.

If validation lives on the target type but in a custom constructor, then the author doesn't need to take a dependency on the builder type, and has a clear place to add any side-effects that may occur during validation or construction.

@colin-kiegel
Copy link
Owner

Ok, side-effects are definitely an interesting aspect. :-) With the current derive_builder they can only happen when we create a defaulted value for some field (independently of whether we move that value into a new instance of Foo). So you're basically saying, we shouldn't compute any defaulted values before validation of the required ones, right?

But we can already run into similar situations, because right now default-fns can both have side-effects and/or fail (#[builder(default="return Err(_)")]). I.e. the first default-fn could have a side-effect and the second one could fail, aborting the construction of Foo. This is of course a bit theoretical, but already possible. To avoid this situation we would need to require that no default-fn may ever fail or panic. While we could enforce never-fail we cannot enforce never-panic, which still leaves an unsoundness loophole I guess. You can have the same situation, if you do this manually Foo { a: foo()?, b: bar()? }. This can also have side-effects on an aborted construction of Foo. I wonder if it's worth it worrying about then.

One thing I don't like about the custom constructor approach is field ordering (again ^^). Say if we have multiple required fields all of the same type, then changing the field order somewhere might lead to wrong assignments - without compiler warnings. E.g. Foo { a: usize, b: usize } but new(b: usize, a: usize) - Oops! ;-)

@TedDriggs
Copy link
Collaborator

One thing I don't like about the custom constructor approach is field ordering (again ^^). Say if we have multiple required fields all of the same type, then changing the field order somewhere might lead to wrong assignments - without compiler warnings. E.g. Foo { a: usize, b: usize } but new(b: usize, a: usize) - Oops! ;-)

100% agree; that's why in #79 I have the author implement the build function in order to override validation. That way they control the order of parameters read from the builder and passed to the constructor function, and can fully document that process. Having those getters return the Result type makes abandoning construction when finding a missing field much easier to write, but having the author write that by hand ensures it won't break due to a field order edit or field addition or something.

@colin-kiegel
Copy link
Owner

Hm. Would getters #7 return the default value, if the field is unset but defaulted? If yes, they could have side-effects. If no, then manual composition of the build function would seem complicated - and direct field access could be used as well instead of getters.

@faern
Copy link
Author

faern commented Apr 18, 2017

If you write the both the constructor and the build method by yourself, then what do you gain from having a builder at all? You get an extra struct with identical fields wrapped in options and setters for them. Feels like you lose most of the benefits of an automatic builder.

@TedDriggs
Copy link
Collaborator

The setters and optional fields are the biggest chunk of boilerplate to me; I'd happily write one constructor method which doesn't even have to expose all the Into<...> generics in exchange for having the boilerplate of the builder taken care of (and kept in sync with the struct). Writing the build method and constructor also provides a very natural way for "me" (the author) to surface structured errors in a manner that fits with "my" crate's error-handling strategy.

For cases where a struct literal works for initialization, I can see some value in letting the user pass a validation function, though I agree with @colin-kiegel's concerns about argument ordering and backwards compatibility. Another approach might be accepting a path to a function that takes &FooBuilder; that would avoid the argument ordering worries at the expense of making the validation depend on the caller first making a builder. Though that brings us back to the issue of deserializing into builders - otherwise how does the author make sure the validation was run?

@faern
Copy link
Author

faern commented Apr 19, 2017

But you can write the custom constructor and builder method with the current version of the crate, or did I miss something? Since we allow not generating the build method we already support this more manual use case.

But there are also use cases where it does not matter if the struct is initialized, validated and then just thrown away and similar. My current use case is just a data only struct, no methods/functionality. And all I need to validate is if two IpAddr fields are in the same address family. Performance, initialization order, side effects etc are of no concern, I just want to do it with minimal amount of code. Which makes option 1 from @colin-kiegel sound interesting.

@TedDriggs
Copy link
Collaborator

I don't think we do support build method suppression; I submitted a PR for it but I don't think it's been merged yet.

The parts I'm missing are:

  1. Ability to derive serde traits (PR is awaiting review)
  2. Ability to control derivations

With those two things, the pattern I'm describing comes to fruition.

I think the pattern you're describing could be done as part of the build_fn attribute. It would not be allowed if the user not opted to skip the automatic build method, and for hygiene reasons it would be a syn::Path to a function. I could get a PR out for this, though I'm still not clear what the signature should be.

TedDriggs added a commit to TedDriggs/rust-derive-builder that referenced this issue Apr 19, 2017
This adds a validation hook when the macro-generated build function is used. The hook borrows the builder and returns a Result<_, String> if any errors are discovered.
This was referenced Apr 19, 2017
TedDriggs added a commit to TedDriggs/rust-derive-builder that referenced this issue Apr 21, 2017
This adds a validation hook when the macro-generated build function is used. The hook borrows the builder and returns a Result<_, String> if any errors are discovered.
@colin-kiegel
Copy link
Owner

Just released with v0.4.6 - again many thanks to @TedDriggs. 🎉 :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants