-
Notifications
You must be signed in to change notification settings - Fork 50
[WIP] [ZF3] Refactor input filter component #111
Comments
I think validators should always be run before the filters. Four years ago, I had an issue with the If I take your example on phone numbers, the user may have intentionally put a space and expects to get it in this field. If you filter it before validating it, the It doesn't make sense to filter the value before its validation. If I take the example of a specific input type, such as dates, we should check if the value is really a date before converting it to the expected date format. If we filter it before, it makes no sense to validate it: if the value is converted to a date, it will be a date anyway. For these reasons I think the value should be validated before being filtered. The |
Dates and files are special cases. Most other form data often can have superfluous characters and/or require some form of casting before it can be tested for validity. As examples:
Even with dates, the input can vary based on the input type (is it a time field? A date field? A month?). It was for these reasons we introduced filters as a pre-validation workflow in zf1's form component, and why the practice persisted into this component in zf2. These normalization filters are primarily intended to strip unwanted characters prior to validation; if they cannot handle the input, they are expected to return the original verbatim. For things like dates, I would likely validate that the value is correctly formatted, and can be car to a However, removing pre-approval filtering/normalization is a non-starter, for the reasons explained above. |
There is no radio button validator. If you are talking about the
Same as radio button, however it uses the
I never used the
As I said before, whitespaces may have been accidentally introduced ... or not. Although they are often unwanted, we can't assume that this is always the case. The question is more "Are whitespaces before or after input allowed?" than "Were the whitespaces wanted?".
The validator may have an option to force strict comparison, and if it is not used, it must filter the value before processing using its dependencies. In PHP, number validators filter the value internally before proceeding with the validation (is_int, is_float, is_numeric, ...).
I don't understand what you mean. If we need a date, we ask the user to input a date. Once we know the date is valid, we convert it to the expected format.
Returning the original value when we can't filter it can lead to security issue if no validator is run after the filter. Of course, doing pre- and post-filtering will correct these issues, but in my opinion the pre-filtering process is unnecessary. |
This sentence is irrelevant, because zend-inputfilter does not force you to use a filter to remove the whitespace. It's your choice!
A simple example: a text field for the age of a person. Many values are possible:
With pre-filtering all values will pass the validator. Now the question: which filtering is user-friendly? 😉 The current problem is the "background magic" like the
This will remove the "background magic" and introduce many more possibilities for the InputFilter. 👍 |
@froschdesign |
Never trust any input! You do not know if the value comes from this form field. |
@froschdesign |
Filtering should always come before validation, otherwise, validation may pass pre-filtering, but fail post-filtering. |
@GeeH depends on use case. I had situations when I had to change the value after validation, but also had situations when I had to normalize input to validate. So to have both would be perfect. |
@GeeH When I code, I write: function($arg)
{
if(!is_int($arg)) {
throw new Exception('Integer expected.');
}
$arg = intval($arg);
// ...
} and not: function($arg)
{
$arg = intval($arg);
if(!is_int($arg)) {
throw new Exception('Integer expected.');
}
//...
} |
I think you need to consider that the framework is written for the 90% usecase and not for your specific uses. I agree, your examples are totally valid, but what you are talking about is type casting and not filtering here. Considering that all form data gets passed as a string, your example of using I'm not saying that there are no situations when you would need to validate before filtering, I'm saying that the 90% use case is to filter before validating, and that's what we'll have to support. With that said and after thinking more about it I think the ability to attach pre-filters for cleaning input, and post-filters for casting etc is a good idea and I'd like to see it implemented if there's no BC break. |
The milestone for this issue is 3.0.0 and so BC breaks are allowed. 😃 |
@GeeH |
@Seryus |
In your example, the validator won't validate the user input, it will validate the filter output. If the validation passes, the user won't be notified that its input was wrong and has been modified. For this reason, filtering before validation is a bad pratice. Of course, if the field expects to get the age of a person, whitespaces were probably not wanted. In this case, it would be better to configure the validator to allow whitespaces, and then filter the value. |
Without pre-filtering you will generate frustration on the user side. One whitespace at the beginning of an email address, a whitespace in the middle of a bank account number, empty lines at the end of a user message and so on.
You want really allow whitespace in an email validator? var_dump(filter_var('mail@example.com', FILTER_VALIDATE_EMAIL)); // 'mail@example.com'
var_dump(filter_var(' mail@example.com', FILTER_VALIDATE_EMAIL)); // false
var_dump(filter_var('mail@example.com ', FILTER_VALIDATE_EMAIL)); // false
var_dump(filter_var(' mail@example.com ', FILTER_VALIDATE_EMAIL)); // false |
I would prefer to use the HTML5 form input or JavaScript to avoid this frustration on the user side. About allowing whitespaces in email validator, I meant if the developer wants, it can do so by extending the Zend Validator and using dependency injection. We shouldn't allow whitespaces in the validator in general. However, I understand your point. So if we rename it and add post-filtering, should we create a new component? Or should we just use a new NormalizationChain in Input/InputCollection? |
You can not trust the client side, because you don't know them.
In my opinion a new component should not the first goal. I will quote myself:
Add your ideas here or provide a some code snippets. For example:
Please look also at the RFC for the new validator component. Some code can you find in this repository: https://github.com/weierophinney/zend-datavalidator Thanks for your contribution! 👍 |
@Seryus |
I'm looking at it, I didn't know this forum existed, thank you! |
This repository has been closed and moved to laminas/laminas-inputfilter; a new issue has been opened at laminas/laminas-inputfilter#5. |
This issue has been moved from the
zendframework
repository as part of the bug migration program as outlined here - http://framework.zend.com/blog/2016-04-11-issue-closures.htmlOriginal Issue: https://api.github.com/repos/zendframework/zendframework/issues/4772
User: @bakura10
Created On: 2013-07-02T12:52:46Z
Updated At: 2014-09-28T00:34:50Z
Body
Hi everyone,
This is a branch I'm working on on my free time, to refactor the input filter component for ZF 3. I mostly push this for people who want to see the code and help.
It is not yet usable because it relies on some refactoring I'm doing in both Validator and Filter components. It completely embraces the service manager, so now we should not have the VERY annoying problem of adding custom validators/filters, and not being able to use them using the short name because it uses plugin managers that were not fetched from SM.
Under the hood, the component is completely rewritten and now uses SPL RecursiveFilter. From my first experiments, this allow an around x5-8 performance improvements and a memory consumption divided by 13 for complex input filters (magic happen here: https://github.com/bakura10/zf2/blob/943abe025ed578c4068cf19df079b34726d449a3/library/Zend/InputFilter/BaseInputFilter.php#L274-L300)
What I've seen is that the input filter component got more and more bloated, and it is now slow as hell. Checking the isValid method as it is now makes it quite clear where the problem is.
Therefore, for ZF3, I suggest that the base input filter to stay as simple as possible, and people wanting to have custom behaviors to extend the class and override the isValid for their use-cases (as a more general rule, I'd like this for most components too).
To help this, it also uses internally SPL RecursiveFilter. Basically, it allows to have very powerful validation group rules. For instance, this branch features a ValidationGroupRegex filter, and usage would be as follow:
Those filters are really easy to write. They must extend AbstractValidationGroupFilter and implements the "accept" method (like this: https://github.com/bakura10/zf2/blob/943abe025ed578c4068cf19df079b34726d449a3/library/Zend/InputFilter/Filter/ValidationGroupRegexFilter.php)
What do you think ? :)
NOTE : not all features have been back ported yet.
Comment
User: @desem
Created On: 2013-07-02T13:33:40Z
Updated At: 2013-07-02T13:33:40Z
Body
And I like it :)
http://www.ivangospodinow.com/simple-form-validator-for-zend-framework-2-forms
Comment
User: @bakura10
Created On: 2013-07-02T13:38:22Z
Updated At: 2013-07-02T13:38:22Z
Body
Thanks :). I change a little bit the logic so now any recursive iterator filter can be specific, so you can use the RecursiveCallbackFilterIterator:
This will only validate fields except the one different than "toto".
Pretty powerful :D.
Comment
User: @desem
Created On: 2013-07-02T13:56:33Z
Updated At: 2013-07-02T13:56:33Z
Body
It's interesting. But I would start with a common forum for zend :)
Comment
User: @texdc
Created On: 2013-07-04T19:13:03Z
Updated At: 2013-07-04T19:13:03Z
Body
Please see my notes.
Comment
User: @bakura10
Created On: 2013-07-04T19:23:45Z
Updated At: 2013-07-04T19:23:45Z
Body
I answered :).
Comment
User: @bakura10
Created On: 2013-08-09T09:51:01Z
Updated At: 2013-08-09T09:51:01Z
Body
I tried to add more exciting stuff to this refactor. Now you can add a callback recursive filter to act as a validation group:
Comment
User: @bakura10
Created On: 2013-08-09T10:17:43Z
Updated At: 2013-08-09T10:17:43Z
Body
Based on ocramius feedback, I've renamed the InputFilter class to InputCollection, which may be clearer for people. What do you think?
The only problem is that people may be confused between the InputCollection and CollectionInputCollection (previously CollectionInputFilter). I'll try to have a better idea for this.
Comment
User: @texdc
Created On: 2013-08-09T15:06:32Z
Updated At: 2013-08-09T15:06:32Z
Body
I like the 'InputCollection' name! I've just started to use 'CollectionInputFilter', so I understand the issue. Why not just 'CollectionInput'?
On Aug 9, 2013, at 5:17 AM, Michaël Gallego notifications@github.com wrote:
Comment
User: @bakura10
Created On: 2013-08-10T08:21:41Z
Updated At: 2013-08-10T08:21:41Z
Body
@texdc , I'm not even sure the CollectionInputFilter is really needed. After all, the collection input filter does nothing more than applying the same input filter to a collection. Can't we simply run the same input filter against different set of data ? (if I managed to make the input filter stateless it's going to be even more true… but turning it stateless is really hard…)
Comment
User: @bakura10
Created On: 2013-08-10T08:39:19Z
Updated At: 2013-08-10T08:42:23Z
Body
By the way, here are some performance figures. I simply created a single input filter. Please note that my current implementation does a little more than the current one as each element are created using a factory (both input and input collection), so that now the global validator and filter plugin managers are always correctly injected (this was always hard in current implementation). Please note also that a major time during construction is instead spent on the service locator, therefore I've decided to create the element using new instead of using the plugin manager (after doing this test I realized how service manager is slow).
In order not to make the test wrong, no validator nor filters are attached to elements. This test just take into account the traversal time.
With 50 elements
ZF2 :
Input filter construction time: 0.0015020370483398 msec
Traversal time: 0.0062439441680908 msec
Memory consumption: 3 Mb
ZF3 :
Input filter construction time: 0.00066399574279785 msec
Traversal time: 0.0010840892791748 msec
Memory consumption: 2,75 Mb
With 500 elements
ZF2 :
Input filter construction time: 0.010568141937256 msec
Traversal time: 0.058469772338867 msec
Memory consumption: 6 Mb
ZF3 :
Input filter construction time: 0.0047299861907959 msec
Traversal time: 0.0099430084228516 msec
Memory consumption: 3,75 Mb
With 5000 elements
ZF2 :
Input filter construction time: 0.1059877872467 msec
Traversal time: 0.60562801361084 msec
Memory consumption: 37 Mb
ZF3 :
Input filter construction time: 0.048290014266968 msec
Traversal time: 0.10112690925598 msec
Memory consumption: 11,75 Mb
We can expect the difference to be even more when using nested input filters (which is quite common), but already, the implementation that uses RecursiveIteratorIterator, once most important features were back ported, traversal is around 6 times faster and construction 2 times faster, and consumes a lot less memory with a lot of elements.
Furthermore, all the validation group features, even with complex ones like Callback, come nearly for free with RecursiveIteratorIterator.
Comment
User: @bakura10
Created On: 2013-09-02T15:54:03Z
Updated At: 2013-09-02T15:54:03Z
Body
Hi everyone,
I've taken into account a lot of feedbacks and I've completely rewritten some parts of it.
The big news is that now input filter component is completely stateless. Both input and input collection. Because of this, I had to abandon the RecursiveIteratorIterator implementation that was not flexible enough. Now it uses IteratorIterator instead. As a consequence, perf improvements are a bit less drastic (especially in construction time).
How to use it
Now, input filter component is stateless. This means that there is no longer "isValid" method. Instead, you are expected to use the "validate" method:
Because validation result is completely decoupled, it can now be serialized or converted to JSON:
Validation result
One of the main goal of this refactor was, as I said it, to simplify it and remove a lot of crap because of everyone adding its edge cases.
Instead, I've written several methods that make it easy to extend default behaviour:
Also, the validation group feature got more powerful thanks to SPL filters. Iteration is not made using a simple foreach, but using an IteratorIterator:
SPL filter iterators are automatically executed for us and allow to accept or refuse a specific key.
By default, ZF attaches a "Zend\InputFilter\ValidationGroup\NoOpFilterIterator" filter that accepts all fields (it is the same as setting an empty array as validation group). But you can also attach more complex filter. For instance using an array:
Now, only the "foo" input of this input collection will be validated.
It comes with other filter:
This should open the way to very fine filtering, as an input collection can contain another input collection that contains its own validation group filter. In the future, I'll write the code to be able to specifcy an input filter through config:
Performance
I've made some performance test, once again without adding any validators/filters so that the test focus only on input filter:
ZF2:
5 elements,
3 input collections with 5 elements inside
Construction time : 0.00055599212646484 secondes
Traversal time : 0.0064549446105957 secondes
Memory: 3 Mb
50 elements,
10 input collections with 50 elements inside
Construction time : 0.0096118450164795 secondes
Traversal time : 0.11807513237 secondes
Memory: 6,5 Mb
100 elements,
25 input collections with 100 elements inside
Construction time : 0.033029079437256 secondes
Traversal time : 0.51178097724915 secondes
Memory: 19,75 Mb
ZF3
5 elements,
3 input collections with 5 elements inside
Construction time : 0.00034809112548828 secondes
Traversal time : 0.0011918544769287 secondes
Memory: 2,5 Mb
50 elements,
10 input collections with 50 elements inside
Construction time : 0.0085010528564453 secondes
Traversal time : 0.01695990562439 secondes
Memory: 2,75 Mb
100 elements,
25 input collections with 100 elements inside
Construction time : 0.031938076019287 secondes
Traversal time : 0.060458183288574 secondes
Memory: 4,25 Mb
Comment
User: @Ocramius
Created On: 2013-09-02T16:17:17Z
Updated At: 2013-09-02T16:17:17Z
Body
As discussed with @bakura10 on IRC, we should have the same interface for
Input
andInputFilter
. AnInputFilter
is just a normalInput
with anarray
orTraversable
value. Otherwise, I see no difference with theInput
API.Any pluralization of method names in the API shall be removed.
Comment
User: @bakura10
Created On: 2013-09-02T16:22:36Z
Updated At: 2013-09-02T16:24:21Z
Body
Done.
So now InputCollection extends Input, and have the same "validate" method. I didn't change it yet but Input should return a ValidationResult too.
The question now is that InputFilter will receive an array of ValidationResult, so we must find an efficient way to inject the result into the main ValidationResult.
As @Ocramius suggested, I've alos removed the fallback value capability. In fact this is a Filter job. So maybe we should create a FallbackValueFilter, but this sohuld not be handled in Input Filter component.
Comment
User: @bakura10
Created On: 2013-09-02T17:47:51Z
Updated At: 2013-09-02T17:48:47Z
Body
I have integrated several new things:
Currently, whenever an input collection is validated, a new ValidationResult is built. If the input collection contains no errors, then the values are filtered. This means that in the following hiearrchy:
if input 3, 4 and input filter 2 are valid, then the values are filtered EVEN THOUGH input 1 failed. The filter step is "on an input collection" basis.
This may not be the most efficient way because we need to do the recursion twice... If you have any better idea, please shout!
I also need people to evaluate this: https://github.com/bakura10/zf2/blob/d0cfc2e6643732b1e5e020f6154a6617e7e4d20d/library/Zend/InputFilter/InputCollection.php#L157
Comment
User: @bakura10
Created On: 2013-09-02T19:14:05Z
Updated At: 2013-09-02T19:14:05Z
Body
I've been using IteratorAggregate, which is around 30% faster than implement IteratorIterator. We are now even faster than when using RecursiveIteratorIterator :D. Some funny benchmark with HEAVY input filters (450 inputs, and 50 sub-input collections with 450 elements inside):
ZF2:
Construction time : 0.47534704208374 secondes
Traversal time : 7.0274820327759 secondes
Memory consumption: 154 Mb
ZF3:
Construction time : 0.48701810836792 secondes
Traversal time : 0.83196616172791 secondes
Memory consumption: 18 Mb
Comment
User: @Netiul
Created On: 2013-09-03T06:58:37Z
Updated At: 2013-09-03T06:58:37Z
Body
Impressive, I must say. Keep up the great work!
Comment
User: @bakura10
Created On: 2013-09-03T09:58:39Z
Updated At: 2013-09-03T09:58:39Z
Body
Thanks @Netiul .
I've just noticed something strange, I'd like your feedback. Since this time, I've always thought that the input filters does : validation using the raw data, then filtering. However, it seems that I was wrong: https://github.com/zendframework/zf2/blob/master/library/Zend/InputFilter/Input.php#L309
The filtering is done prior to the validation. Is this something expected? Is this normal behaviour and what we want? I'm always lost about this.
Comment
User: @bakura10
Created On: 2013-09-03T12:26:28Z
Updated At: 2013-09-03T12:26:28Z
Body
Another massive performance improvement for today (around 20%), because the whole input collection is now traversed only once instead of twice.
This is possible thanks to the previous remark about filtering data, some talk with @Ocramius, and the new stateless that allows interesting optimizations.
First, the ValidationResult has been renamed to InputFilterResult. This is because I'll try to make Validator component completely stateless too. It will return ValidationResult, so it was important to make the two different by the naming.
About the performance improvement it comes from the reason of my previous comment:
I realized that current implementation of Input, the "isValid" method filter data before validating it. In fact, it makes sense (if you have an input with a "Alnum" validator and a trimming Filter, you want " abc " to be valid). The thing is that in ZF2, values were filtered twice: once when the "isValid" method of the input filter is called, and another time you do "getValues". This is useless work.
Now, when the "validate" method for an input is called, as the value is filtered anyway, it's stored directly in the ValidationResult, and aggregated by the ValidationResult of the Input Collection. So calling "getData" from the input collection does not incur another recursion.
The second thing is that as I said before, InputCollection now can have their validators and filters (which shuld remove the need of the previous CollectionInputFilter class). I'm not sure about the workflow:
Let's assume the following input collection, called "coll", which contains two inputs "a", "b".
When calling validate, the following happens:
1.1 If invalid, register error messages.
1.2 If input collection is configured to break on failure, we immediately return an input filter result.
Does the order make sense to everyone?
Another question I have is: if an input has failed (not valid), should we insert the filtered value into the input filter result ?
Remark: because InputFilter is a stateless component now, I think we can safely switch the plugin manager to share instance by default. This should be a pretty memory consumption improvment for people using a lot of input filters.
Comment
User: @dphn
Created On: 2013-09-03T12:35:53Z
Updated At: 2013-09-03T12:35:53Z
Body
Tell me, please, why do not you rename InputFilter to InputSpecification? After all, it contains both filters and validators?
Comment
User: @bakura10
Created On: 2013-09-03T12:37:35Z
Updated At: 2013-09-03T12:37:35Z
Body
??? You're mixing everything again... InputSpecifciation does not exist. No classes have been renamed to InputSpecification.
InputFilter has been however renamed to InputCollection, because it represents more what it is (a collection of inputs).
Comment
User: @dphn
Created On: 2013-09-03T12:38:54Z
Updated At: 2013-09-03T12:38:54Z
Body
Thank you, I get it. So really better.
Comment
User: @weierophinney
Created On: 2013-09-03T16:10:05Z
Updated At: 2013-09-03T16:10:05Z
Body
@bakura10 :
Yes, this is normal behavior, and has been since ZF1. The idea is that you do normalization tasks, and then validate the normalized data (this is particularly useful for things like phone numbers and CC numbers, to allow stripping non-digit input such as spaces, dots, parens, etc). I've seen some requests in the past for a post-filter process as well, but the idea has never gained traction.
Comment
User: @bakura10
Created On: 2013-09-03T16:11:22Z
Updated At: 2013-09-03T16:11:22Z
Body
Ok thanks to confirm this then :).
Comment
User: @dphn
Created On: 2013-09-05T16:25:15Z
Updated At: 2013-09-05T16:25:15Z
Body
I wonder, in the third version will remain a mass func_get_arg () in the constructor of each validator?
Comment
User: @Ocramius
Created On: 2013-09-05T16:35:37Z
Updated At: 2013-09-05T16:35:37Z
Body
@dphn not sure if that's relevant to this PR - the main focus here is moving to a stateless interface first. Otherwise we'd need an
Options
object for every validator, which is probably a further step for a different PRComment
User: @dphn
Created On: 2013-09-05T16:42:16Z
Updated At: 2013-09-05T16:42:16Z
Body
I'm sorry, I just took the most active channel of communication :) Not anymore.
Comment
User: @dphn
Created On: 2013-09-05T22:00:49Z
Updated At: 2013-09-05T22:00:49Z
Body
It would be nice if it were true. I'm trying to add a filter in the multiupload form, but it is always executed after validators.
Comment
User: @bakura10
Created On: 2013-09-05T22:02:35Z
Updated At: 2013-09-05T22:02:35Z
Body
Yes, this is expected for file upload though and won't change. For file it makes sense to first validate and then filter (which include things like file upload).
Comment
User: @dphn
Created On: 2013-09-05T22:11:39Z
Updated At: 2013-09-05T22:11:39Z
Body
I think that uploading files is not a task to be performed by the filter. If the filters and validators can be added to Form as listeners, this problem would not exist. But what can I do in the current situation?
Comment
User: @bakura10
Created On: 2013-09-05T22:13:49Z
Updated At: 2013-09-05T22:13:49Z
Body
In fact, I was a bit vague. Filters does not "upload", they rename. Eventually, they could rename and move to something else, which would upload the file to another server. But if you think at it, "moving" or "renaming" can be the job of a filter.
The text was updated successfully, but these errors were encountered: