Allow the schema to be an associative array #389
Conversation
Grrr.... didn't think this one through properly and screwed it up. Please don't merge or waste time with code reviews until I've fixed it (will be tomorrow sometime; it's bedtime here now).
Okie-doke. Quick comment, though - I guess I was hoping to remove object casting altogether. Is that outside the realm of possibility?
@shmax Do you mean get rid of it in your own code, or get rid of it within the library code?
Right, now that my idiotic mistake has been corrected.... @shmax, any chance you'd be willing to review this? Also, what do you mean by "hoping to remove object casting altogether" - can you expand on that a bit?
For the record, my idiotic mistake was forgetting to consider that some arrays are supposed to be arrays, and should remain that way in order to process properly.
```php
 * @return object
 */
public static function arrayToObjectRecursive($array)
{
```
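For readers without the full diff, here is a minimal sketch of what a recursive array-to-object conversion like this might look like - an assumption based on the surrounding discussion, not the PR's actual method body (which is not shown in this excerpt):

```php
<?php

// Sketch only: associative arrays become stdClass objects, while
// sequential (list-style) arrays remain arrays - the behaviour the
// author describes correcting above.
function arrayToObjectRecursiveSketch($array)
{
    $result = array();
    foreach ($array as $key => $value) {
        // Recurse into nested arrays first
        $result[$key] = is_array($value) ? arrayToObjectRecursiveSketch($value) : $value;
    }

    // Only cast associative arrays; plain lists keep their array form
    $isList = $result === array() || array_keys($result) === range(0, count($result) - 1);

    return $isList ? $result : (object) $result;
}
```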
is this any faster than just doing this?

```php
return json_decode(json_encode($array), false);
```
I haven't benchmarked it. Do you have an opinion on which is better?
I would think my version is faster, because you're letting all the work happen in C, and you don't have the overhead of the recursive PHP function calls. Time it for yourself though and see what you think.
You make a good point 👍
I think it is worth considering the performance in this case. Perhaps a continuous means of measuring performance could be added in another PR, to facilitate ongoing improvement and scrutiny?
Both. I'm wondering if there's any real reason to use an object for the schema internally, at all. For people like me who use native PHP arrays it adds overhead. People who use real JSON files have to do a conversion whether we use arrays or objects, so it makes no difference to them. There's probably something I'm not considering, so it would be nice to talk through this more. I imagine the original authors would have some insight.
It's better performance-wise to use one or the other internally, rather than allowing both and dealing with it on-the-fly (e.g. via the `LooseTypeCheck` class). That said, it doesn't particularly matter which one is chosen - the library uses objects at the moment, but could just as easily use arrays. I must admit I prefer object syntax, but that's a personal preference only.
Sure, to most people it just looks a little sexier: `$v = $foo->bar;` vs `$v = $foo['bar'];`. I guess I'd like to know exactly how much overhead the conversion to object typically takes before I start making too much fuss over this. I can do a little more research this weekend.
My instinct would be that it's trivial, and not enough to matter. There's probably a small difference (I suspect the arrays are slightly faster for internal use, and yes, the object cast will incur a little overhead), but I'm not particularly keen to refactor the entire library to use arrays unless it's significant. If you're keen to benchmark it though, I'm curious - will wait to see what you come up with :-).
Mmm, I like the object notation mainly because it's nicer to type, and eliminates a set of brackets ;-).
Will try to get some numbers this weekend.
I used a tool I found online to benchmark object casting vs not. With casting: https://3v4l.org/0V0iK/perf#output Nothing too shocking about the results - looks like you save less than a tenth of a second by not casting, so I'm not going to sweat it.
I also ran the same analysis on @erayd's method: https://3v4l.org/Xe9Mh/perf#output It looks like yours has a faster "system time", and mine has a faster "user time". I have no idea what the difference is.
@shmax Looks too close to call really - the difference isn't enough to tell them apart. Do you have a preference for which approach you'd like this PR to use going forward?
If it were me, and all other things being equal, I'd go for the version that is less code - but that said, I wish I knew what is meant by "user time" and "system time".
The system time is the amount of time the process spends in system calls to the kernel. The user time is the amount of time the process spends in userspace.

Re code amounts, they're about equal, assuming the return status of `json_encode` is properly handled (i.e. checked for `false` before the result is passed to `json_decode`). One other consideration is that `json_encode` will fail on input that has no JSON representation, so a failed encode should not be silently ignored.

Thoughts?
I don't think you need to worry about that. Presumably PHP will not produce syntax errors in its own JSON output.
I guess I'd have to see a real use case, but a JSON schema is supposed to be composed of JSON, isn't it? My vote is for `json_decode(json_encode())`.
It's not syntax errors I was concerned about - it's the potential to end up with an empty schema if encoding fails - hence why I said error checking of `json_encode` matters. For example:

```php
<?php
$fp = fopen('php://stdout', 'w');
$schema = [
    'myProperty' => 'myValue',
    'myResource' => $fp
];
$schema = json_decode(json_encode($schema));
var_dump($schema); // NULL
```

A JSON schema is supposed to be JSON, yes. However, PHP arrays are not JSON, and may contain things which are not valid in the context of JSON (like resources), most likely because the user is generating the array on-the-fly using something else that may output resources. If we're going to cast them, we need to ensure that this is done without silently dropping important data on the floor - which means we have to have error checking.
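A minimal sketch of such an error-checked cast, assuming an `InvalidArgumentException` as the failure signal (an illustrative choice, not necessarily what the PR throws):

```php
<?php

// Cast an associative-array schema to an object, refusing to proceed
// if json_encode() fails (e.g. because the array contains a resource,
// as in the example above). Sketch only; the exception class is an
// illustrative assumption.
function castSchemaToObject(array $schema)
{
    $encoded = json_encode($schema);
    if ($encoded === false) {
        // json_last_error_msg() requires PHP >= 5.5
        throw new \InvalidArgumentException(
            'Schema could not be encoded as JSON: ' . json_last_error_msg()
        );
    }

    return json_decode($encoded);
}
```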
OK - I will change it to use this.
I think you misunderstand - what we care about is the total time, regardless of where that time is spent. Using less user time at the expense of more kernel time doesn't mean it's faster, it just means that the execution is occurring in a different place. The correct metric to compare them on is SUM(user, system). By that metric, they're essentially identical.
How's that?
Looking good. Regarding system vs user time, I received a response from the maintainer of that benchmarking tool about it, and he says:
If the system time for our case is mostly PHP's own startup overhead, then that means the difference in system time is jitter in his benchmark platform, and probably means that the margin of error is larger than I had thought - running the tests in a loop with a high iteration count would provide better insight. Simply throwing the system time out is not a good idea - it's still a relevant part of the measurement. System time can be influenced by making fewer / less expensive system calls - however for our particular use-case, he's right that there's not much we can do about it. How much scope is available for changing that kind of thing depends very much on what you're actually trying to achieve.
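For anyone wanting to reproduce this with the high-iteration loop suggested above, a rough sketch (the schema shape and iteration count are arbitrary choices; `arrayToObjectRecursiveSketch()` is the earlier sketch from this thread):

```php
<?php

// Rough micro-benchmark comparing the two casting approaches in a loop,
// which averages out per-run jitter such as interpreter startup.
// (Define or include arrayToObjectRecursiveSketch() from the earlier
// sketch before running this.)
$schema = array(
    'type'       => 'object',
    'properties' => array(
        'name' => array('type' => 'string'),
        'tags' => array('type' => 'array'),
    ),
);

$iterations = 100000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    json_decode(json_encode($schema));
}
printf("json cast:      %.4fs\n", microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    arrayToObjectRecursiveSketch($schema);
}
printf("recursive cast: %.4fs\n", microtime(true) - $start);
```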
FFS, then benchmark it yourself. |
I have no interest in benchmarking it again. The benchmarks you've already run show that the two cases are close enough that IMO it's not worth fussing about which one is used from a performance perspective, so I'm happy with either on that score. My discussion of the benchmarking metrics was purely because you mentioned that you did not understand them, so I figured it might be useful to explain how they actually work. You've mentioned a preference for `json_decode(json_encode())`, so that's the approach this PR now uses.
Yes, and you were wrong. When I provided evidence, instead of just moving on you began this elaborate explanation about how the tests can't be trusted, and I officially ran out of patience. Do what you like.
I don't think I was wrong (and I don't think the quote you provided was wrong either - it certainly doesn't disagree with what I was saying) - however this seems to be turning a bit toxic, so let's just leave things where they are, agree to disagree, and move on. I was not trying to say the tests can't be trusted, and if I gave that impression then I apologise, and should probably have worded things a bit better than I did.
You said:

> The correct metric to compare them on is SUM(user, system).

Meanwhile, the author of the tool himself explains that system time is concerned with PHP startup time, and can essentially be discounted. It is NOT the sum of the two values we are talking about, and it has nothing to do with work moving to the kernel in my version.
Er, yes, that is exactly what you did:

> ...probably means that the margin of error is larger than I had thought...

Sorry for being irritable, but I suggested `json_decode(json_encode())` in the first place.
Based on that, it sounds like you want to debate this further, rather than simply disagree and move on... I only started discussing the metrics in the first place because you expressed a desire to know what they meant! So I will try to clarify what I was trying to say earlier.

Please bear in mind that I'm not trying for 'alternative facts' here - I'm just trying to clarify my position, as I feel you have badly misunderstood my intentions. I'm not saying that the situation is your fault - it could well be mine, and perhaps I could have worded some of my earlier replies more clearly. So:
For the record, I have never tried to claim that either approach was faster. I did claim that they are essentially identical in speed, given the numbers available, and provided additional explanation of the benchmark metrics (per above) to clarify why I believe that to be the case. I said I was happy to go with your preference, and implemented it. And I suggested a way of controlling for the jitter (looping) in case you wished to obtain a more accurate benchmark result, as it seems you care about micro-optimising this case (which I have no problem with; I also tend to micro-optimise when I have the time).

It's worth bearing in mind here that we are arguing about a difference of single-digit milliseconds, based on a single run of only one test case, for a runs-only-once piece of code! This is not IMO worth anything like the amount of angst in this thread.

As far as less code is concerned... it's objectively not. The only way you get less code is by ignoring error cases. Ignoring the result of an error here is fairly serious (i.e. feeding a failed `json_encode` result into `json_decode` silently yields a `NULL` schema).

That said, while I would prefer to do proper error checking, my preference is mild - as I said above, somebody feeding us a schema with resources in it is fairly unlikely. If you'd rather have no error checking for the sake of doing all this on a single line, then I'm happy to do that.
No, I do not. Code LGTM.
Then I'll leave the subject alone. 🤝
Thanks for reviewing :-).
+1
* Allow the schema to be an associative array. Implements jsonrainbow#388.
* Use json_decode(json_encode()) for array -> object cast
* Skip exception check on PHP versions < 5.5.0
* Skip test on HHVM, as it's happy to encode resources
* Add URI translation for retrieval & add local copies of spec schema
* Add use line for InvalidArgumentException & adjust scope (#372). Fixes issue #371.
* add quiet option (#382)
* add quiet option
* use verbose instead of quiet
* always output dump-schema
* always output dump-schema-url
* fix typo and ws
* [BUGFIX] Add provided schema under a dummy / internal URI (fixes #376) (#378)
* Add provided schema under a dummy / internal URI (fixes #376). In order to resolve internal $ref references within a user-provided schema, SchemaStorage needs to know about the schema. As user-supplied schemas do not have an associated URI, use a dummy / internal one instead.
* Remove dangling use
* Change URI to class constant on SchemaStorage
* Add option to disable validation of "format" constraint (#383)
* Add more unit tests (#366)
* Add test coverage for coercion API
* Complete test coverage for SchemaStorage
* Add test coverage for ObjectIterator
* Add exception test for JsonPointer
* MabeEnum\Enum appears to use singletons - add testing const
* Don't check this line for coverage. mbstring is on all test platforms, so this line will never be reached.
* Add test for TypeConstraint::validateTypeNameWording()
* Add test for exception on TypeConstraint::validateType()
* PHPunit doesn't like an explanation with its @codeCoverageIgnore...
* Add various tests for UriRetriever
* Add tests for FileGetContents
* Add tests for JsonSchema\Uri\Retrievers\Curl
* Add missing bad-syntax test file
* Restrict ignore to the exception line only
* Fix exception scope
* Allow the schema to be an associative array (#389). Implements #388.
* Use json_decode(json_encode()) for array -> object cast
* Skip exception check on PHP versions < 5.5.0
* Skip test on HHVM, as it's happy to encode resources
* Enable FILTER_FLAG_EMAIL_UNICODE for email format if present (#398)
* Don't throw exceptions until after checking anyOf / oneOf (#394). Fixes #393.
* Fix infinite recursion on some schemas when setting defaults (#359) (#365)
* Don't try to fetch files that don't exist. Throws an exception when the ref can't be resolved to a useful file URI, rather than waiting for something further down the line to fail after the fact.
* Refactor defaults code to use LooseTypeCheck where appropriate
* Test for not treating non-containers like arrays
* Update comments
* Rename variable for clarity
* Add CHECK_MODE_ONLY_REQUIRED_DEFAULTS. If CHECK_MODE_ONLY_REQUIRED_DEFAULTS is set, then only apply defaults if they are marked as required.
* Workaround for $this scope issue on PHP-5.3
* Fix infinite recursion via $ref when applying defaults
* Add missing second test for array case
* Add test for setting a default value for null
* Also fix infinite recursion via $ref for array defaults
* Move nested closure into separate method
* $parentSchema will always be set when $name is, so don't check it
* Handle nulls properly - fixes issue #377
* Add option to also validate the schema (#357)
* Remove stale files from #357 (obviated by #362) (#400)
* Stop #386 sneaking in alongside another PR backport
What

If the schema supplied to `Validator::validate` is an array, convert it to an object prior to using it for validation.

Why

Because it was requested in #388, and it will improve the user-friendliness of the library interface, particularly noting that we already support the data to be validated being an associative array.
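A usage sketch of what this PR enables, assuming the library's `Validator` API as referenced above (the schema and data contents are illustrative):

```php
<?php

use JsonSchema\Validator;

// With this change the schema can be passed as a native associative
// array rather than a pre-cast object. Contents are illustrative.
$schema = array(
    'type'       => 'object',
    'properties' => array(
        'name' => array('type' => 'string'),
    ),
    'required'   => array('name'),
);

$data = json_decode('{"name": "example"}');

$validator = new Validator();
$validator->validate($data, $schema);

echo $validator->isValid() ? "Valid!\n" : "Invalid.\n";
```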