Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate PHP classnames in di.xml files via schema #8743

Merged
merged 2 commits into from
Mar 11, 2017

Conversation

ktomk
Copy link
Contributor

@ktomk ktomk commented Mar 1, 2017

The preferenceType @for/@type attributes, the typeType @name attribute,
the virtualTypeType @type attribute and the pluginType @type attribute
contain class-names (FQCNs) which should not start with a leading
backslash (U+005C "") and should not contain other invalid character
sequences that would represent an invalid PHP class-name.

Previously this was possible and these errors got unnoticed within di.xml
configuration file validation.

The ObjectManager - a user of these configurations - handles this common
error in user input in part so far by removing any trailing slashes with
multiple calls like:

    $type = ltrim($type, '\\');

This change has been introduced in 33ebc24 and could be classified as a
workaround for a quite common mistake when specifying an FQCN that despite
the varying notations in the PHP manual due to the contextual resolution
rules (and different to a file-systems absolute path in POSIX) must not
start with a leading separator as type or class-name.

When using a string-value as class-name (e.g. class_exists() calls) the
class name is always an FQCN as namespacing in PHP is contextual within
source-code for identifiers only and not for strings.

So relative class-names (like those with a leading backslash) do not
apply for strings. This is the case in configuration files like di.xml
files. The di.xml files use the

urn:magento:framework:ObjectManager/etc/config.xsd

schema location which is resolved by UrnResolver (6379732) to

lib/internal/Magento/Framework/ObjectManager/etc/config.xsd

That schema did validate class-name attribute values only against the
simple type of being strings (xs:string). As a class-name in PHP is a
valid string also if starting with the backslash character (and other
invalid names, like digits in front), this commit extends the schema
to validate against the regular expression provided by the PHP manual [1]:

^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$

by adding a new simple-type called "phpFqcn" that qualifies the string-
base-type with the from PHP manual translated pattern:

[a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*

extended for namespaced class-names:

([a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)(\\[a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)*

The change ensures that the said attribute values are validated and
invalid class-names are recognized during schema based validation.

This change prevents that configured PHP-types can be autoloaded when
used w/o smudging (see the ltrim() operation). It has been documented [2]
that the leading backslash prevents correct file-resolution when auto-
loading with the Composer autoloader, a component actively used by the
Magento application.

This change adheres to existing PR #8638 and relates to issue #8635.

Refs:

@maghamed maghamed self-requested a review March 1, 2017 20:55
@maghamed maghamed self-assigned this Mar 1, 2017
@orlangur
Copy link
Contributor

orlangur commented Mar 1, 2017

@ktomk, the only thing I think could be done better is using #8638 implementation without changes. in such case there would be two different commits from different authors in your PR and they both would be marked Merged after your PR is merged.

Still fixable with force push ;)

@ktomk
Copy link
Contributor Author

ktomk commented Mar 1, 2017

@orlangur Just checked, it's possible, cherry-picking 0b243b8 applied cleanly and my check-script reports all cases fixed. I let the travis build finish then I can force-push with the cherry-pick. It should show the correct authorship then. I merely did the second commit to show that the schema changes first show red and then green.

The commit is auto-generated anyway based on the check script I hacked together yesterday.

@orlangur
Copy link
Contributor

orlangur commented Mar 1, 2017

@ktomk cherry-picking changes hash of commit thus older PR will most likely not be marked as merged.

Easiest way is to recreate branch from that PR and add your changes on top so that hash of another commit is preserved.

@ktomk
Copy link
Contributor Author

ktomk commented Mar 1, 2017

I can do that, too. The unit suite is still red on Travis, I see that one test failed but the output does not share which one. Is that normal? I will rebase now on top of the other change.

@ktomk ktomk force-pushed the feature/config-xsd-checks branch from dc940f6 to 81e3b67 Compare March 1, 2017 22:43
@orlangur
Copy link
Contributor

orlangur commented Mar 2, 2017

I see that one test failed but the output does not share which one. Is that normal?

Nope. It happens if there is a fatal error or some test closed STDOUT :) Will check locally now.

@orlangur
Copy link
Contributor

orlangur commented Mar 2, 2017

Here it is:

There was 1 failure:

1) Magento\Framework\ObjectManager\Test\Unit\Config\XsdTest::testSchemaCorrectlyIdentifiesValidXml
Element 'virtualType', attribute 'name': [facet 'pattern'] The value '' is not accepted by the pattern '([a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)(\\[a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)*'.
Line: 10

Element 'virtualType', attribute 'name': '' is not a valid value of the atomic type 'phpFqcn'.
Line: 10

Element 'virtualType', attribute 'name': Warning: No precomputed value available, the value was either invalid or something strange happend.
Line: 10

Element 'virtualType', attribute 'type': [facet 'pattern'] The value '' is not accepted by the pattern '([a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)(\\[a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)*'.
Line: 10

Element 'virtualType', attribute 'type': '' is not a valid value of the atomic type 'phpFqcn'.
Line: 10

Failed asserting that an array is empty.

Besides adopting test it would be nice to add all missing cases for phpFqcn if there are some not covered yet.

Here is the bad test: orlangur@bbe2944

@ktomk
Copy link
Contributor Author

ktomk commented Mar 2, 2017

Thanks for digging it up, I can reproduce locally now. Can you say if empty name and type is actually allowed for virtual type? From a quick look it seems to me the test has a rot XML fixture.

@ktomk
Copy link
Contributor Author

ktomk commented Mar 2, 2017

@orlangur: The type-name "phpFqcn" is a bit cryptic, I think we should change it to "phpClassname", that makes the error messages more readable.

I'll add a test for invalid classnames so that these are spotted precisely.

The preferenceType @for/@type attributes, the typeType @name attribute,
the virtualTypeType @type attribute and the pluginType @type attribute
contain class-names (FQCNs) which should not start with a leading
backslash (U+005C "\") and should not contain other invalid character
sequences that would represent an invalid PHP class-name.

Previously this was possible and these errors got unnoticed within di.xml
configuration file validation.

The ObjectManager - a user of these configurations - handles this common
error in user input in part so far by removing any trailing slashes with
multiple calls like:

        $type = ltrim($type, '\\');

This change has been introduced in 33ebc24 and could be classified as a
workaround for a quite common mistake when specifying an FQCN that despite
the varying notations in the PHP manual due to the contextual resolution
rules (and different to a file-systems absolute path in POSIX) must not
start with a leading separator as type or class-name.

When using a string-value as class-name (e.g. class_exists() calls) the
class name is always an FQCN as namespacing in PHP is contextual within
source-code for identifiers only and not for strings.

So relative class-names (like those with a leading backslash) do not
apply for strings. This is the case in configuration files like di.xml
files. The di.xml files use the

    urn:magento:framework:ObjectManager/etc/config.xsd

schema location which is resolved by UrnResolver (6379732) to

    lib/internal/Magento/Framework/ObjectManager/etc/config.xsd

That schema did validate class-name attribute values only against the
simple type of being strings (xs:string). As a class-name in PHP is a
valid string also if starting with the backslash character (and other
invalid names, like digits in front), this commit extends the schema
to validate against the regular expression provided by the PHP manual [1]:

    ^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$

by adding a new simple-type called "phpClassName" that qualifies the string-
base-type with the from PHP manual translated pattern:

    [a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*

extended for namespaced class-names:

    ([a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)(\\[a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)*

The change ensures that the said attribute values are validated and
invalid class-names are recognized during schema based validation.

This change verifies that configured PHP-types can be autoloaded when
used w/o smudging (see the ltrim() operation). It has been documented [2]
that the leading backslash prevents correct file-resolution when auto-
loading with the Composer autoloader, a component actively used by the
Magento application.

This change adheres to existing PR magento#8638 and relates to issue magento#8635.

Additionally this commit adds tests for the new phpClassName xsd type with
config xml material.

Refs:

- magento#8635

- magento#8638

- [1] https://php.net/language.oop5.basic

- [2] http://magento.stackexchange.com/a/161010

- 33ebc24

- 6379732
@ktomk ktomk force-pushed the feature/config-xsd-checks branch from 81e3b67 to a207012 Compare March 2, 2017 07:20
@ktomk
Copy link
Contributor Author

ktomk commented Mar 2, 2017

I ammended the PR changing the xsd-type to phpClassName, fixed the fixture and added invalidation tests, please review.

@@ -7,7 +7,7 @@
-->
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:ObjectManager/etc/config.xsd">
<preference for="Some_For_Name" type="Some_Type_Name" />
<virtualType name="" type="" shared="true">
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is my understanding that these attributes can not be empty. I would like someone else to confirm or decline my understanding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you right. these attributes can't be empty

@ktomk
Copy link
Contributor Author

ktomk commented Mar 2, 2017

@orlangur Please see the updated changes addressing what you raised. Also a bit clarification from a fresh pair of eyes on the valid_config.xml changes, I'm a bit short on time and can't pull a reference in docs or similar. As this changes the xsd I think it is worth to double check.

@orlangur
Copy link
Contributor

orlangur commented Mar 2, 2017

@ktomk, cool! Just as I like - simply two clean commits doing their job in mainline 😄

I did not immerse into the whole XSD thingy yet and there is no reason to do it now. @maghamed will do review better and with less efforts as he participated in validation against XSD initial implementation.

@tkn98
Copy link
Contributor

tkn98 commented Mar 2, 2017

@orlangur: me from the office, very fine, great to read. As you were involved already, the pair of fresh eyes wouldn't qualify for you doing the review anyway :). And the build is green again. very fine. :)

@maghamed maghamed added this to the March 2017 milestone Mar 2, 2017
Copy link
Contributor

@maghamed maghamed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktomk @ajpevers @orlangur terrific job, guys! Well done! Fantastic example of community distributed collaboration

@tkn98
Copy link
Contributor

tkn98 commented Mar 2, 2017

Not that it get lost: The ltrim() code can be removed then, see 33ebc24. If you provide feedback about that, I can file another commit later that is removing these ltrim() calls as not needed any longer.

@ktomk
Copy link
Contributor Author

ktomk commented Mar 2, 2017

Okay, I won't propose the change to remove (the two) ltrims in ObjectManager with this PR any longer as this is an backwards incompatible change in the public interface. Considering this b/c when the get or create methods are used, these are still prone to the same kind of error we have in the current di.xml files just by providing strings (e.g. in tests etc.). Maybe worth to put it to a test on a branch of it's own throwing IllegalTypeNameExceptions around to see what shakes up.

Copy link
Contributor Author

@ktomk ktomk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a mistake in translating the regex pattern regarding the range of supported characters and need some feedback of what is wanted regarding character support for class-names used with DI.

</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:string">
<xs:pattern value="([a-zA-Z_&#x7f;-&#xff;][a-zA-Z0-9_&#x7f;-&#xff;]*)(\\[a-zA-Z_&#x7f;-&#xff;][a-zA-Z0-9_&#x7f;-&#xff;]*)*"/>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maghamed I'm thinking about dropping the higher code-points b/c I think it would make sense to only allow PHP type-names that can be expressed in us-ascii (ending at x7f). PHP allows more, e.g. classes with asian characters if the files are utf-8 encoded. This XSD pattern is incorrect in that sense already (just came to my attention) as the pattern given in the PHP manual is for single-byte sequences whereas in the XSD the numeric character entities represent Unicode code-points. So question is, what should be supported. I'm not decided, but would tend to be more strict than less strict. This would leave less room for surprises in the long run.

@ktomk ktomk force-pushed the feature/config-xsd-checks branch from 8ba3ba9 to a207012 Compare March 3, 2017 02:51
@AntonEvers
Copy link
Contributor

@maghamed @ktomk @orlangur @tkn98 Thank you all. I've just read up on the discussion. If there is anything more I can do for this issue I'd be happy to. So far that's not needed am I right?

@ktomk
Copy link
Contributor Author

ktomk commented Mar 3, 2017

Good question. I started playing around with diverse locations where the type-names in strings are prefixed with backslashes. This is far from concrete results, but just to give an example, a change like ktomk/magento2@a207012...8ba3ba9 still is green. That is when class-names are used as string arguments, configured within DI.xml files.

@ktomk
Copy link
Contributor Author

ktomk commented Mar 3, 2017

@orlangur: I ran the "unit" testsuite locally like on Travis, but it exited with a fatal error at the end. Is there anything I need to know when running locally?

@orlangur
Copy link
Contributor

orlangur commented Mar 9, 2017

@ktomk sorry, I somehow opened the tab, then forgot about it and didn't answer. No, it should work just fine, please post your stack trace.

@okorshenko okorshenko self-assigned this Mar 10, 2017
@magento-team magento-team merged commit a207012 into magento:develop Mar 11, 2017
@okorshenko
Copy link
Contributor

@ktomk thank you for your contribution! Your pull request successfully merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants