- 
                Notifications
    You must be signed in to change notification settings 
- Fork 9.4k
Validate PHP classnames in di.xml files via schema #8743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate PHP classnames in di.xml files via schema #8743
Conversation
| @orlangur Just checked, it's possible, cherry-picking 0b243b8 applied cleanly and my check-script reports all cases fixed. I let the travis build finish then I can force-push with the cherry-pick. It should show the correct authorship then. I merely did the second commit to show that the schema changes first show red and then green. The commit is auto-generated anyway based on the check script I hacked together yesterday. | 
| @ktomk cherry-picking changes hash of commit thus older PR will most likely not be marked as merged. Easiest way is to recreate branch from that PR and add your changes on top so that hash of another commit is preserved. | 
| I can do that, too. The unit suite is still red on Travis, I see that one test failed but the output does not share which one. Is that normal? I will rebase now on top of the other change. | 
dc940f6    to
    81e3b67      
    Compare
  
    | 
 Nope. It happens if there is a fatal error or some test closed STDOUT :) Will check locally now. | 
| Here it is: Besides adopting test it would be nice to add all missing cases for  Here is the bad test: orlangur@bbe2944 | 
| Thanks for digging it up, I can reproduce locally now. Can you say if empty name and type is actually allowed for virtual type? From a quick look it seems to me the test has a rot XML fixture. | 
| @orlangur: The type-name " I'll add a test for invalid classnames so that these are spotted precisely. | 
The preferenceType @for/@type attributes, the typeType @name attribute, the virtualTypeType @type attribute and the pluginType @type attribute contain class-names (FQCNs) which should not start with a leading backslash (U+005C "\") and should not contain other invalid character sequences that would represent an invalid PHP class-name. Previously this was possible and these errors got unnoticed within di.xml configuration file validation. The ObjectManager - a user of these configurations - handles this common error in user input in part so far by removing any trailing slashes with multiple calls like: $type = ltrim($type, '\\'); This change has been introduced in 33ebc24 and could be classified as a workaround for a quite common mistake when specifying an FQCN that despite the varying notations in the PHP manual due to the contextual resolution rules (and different to a file-systems absolute path in POSIX) must not start with a leading separator as type or class-name. When using a string-value as class-name (e.g. class_exists() calls) the class name is always an FQCN as namespacing in PHP is contextual within source-code for identifiers only and not for strings. So relative class-names (like those with a leading backslash) do not apply for strings. This is the case in configuration files like di.xml files. The di.xml files use the urn:magento:framework:ObjectManager/etc/config.xsd schema location which is resolved by UrnResolver (6379732) to lib/internal/Magento/Framework/ObjectManager/etc/config.xsd That schema did validate class-name attribute values only against the simple type of being strings (xs:string). As a class-name in PHP is a valid string also if starting with the backslash character (and other invalid names, like digits in front), this commit extends the schema to validate against the regular expression provided by the PHP manual [1]: ^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$ by adding a new simple-type called "phpClassName" that qualifies the string- base-type with the from PHP manual translated pattern: [a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]* extended for namespaced class-names: ([a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)(\\[a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)* The change ensures that the said attribute values are validated and invalid class-names are recognized during schema based validation. This change verifies that configured PHP-types can be autoloaded when used w/o smudging (see the ltrim() operation). It has been documented [2] that the leading backslash prevents correct file-resolution when auto- loading with the Composer autoloader, a component actively used by the Magento application. This change adheres to existing PR magento#8638 and relates to issue magento#8635. Additionally this commit adds tests for the new phpClassName xsd type with config xml material. Refs: - magento#8635 - magento#8638 - [1] https://php.net/language.oop5.basic - [2] http://magento.stackexchange.com/a/161010 - 33ebc24 - 6379732
81e3b67    to
    a207012      
    Compare
  
    | I ammended the PR changing the xsd-type to phpClassName, fixed the fixture and added invalidation tests, please review. | 
| --> | ||
| <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:ObjectManager/etc/config.xsd"> | ||
| <preference for="Some_For_Name" type="Some_Type_Name" /> | ||
| <virtualType name="" type="" shared="true"> | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is my understanding that these attributes can not be empty. I would like someone else to confirm or decline my understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you right. these attributes can't be empty
| @orlangur Please see the updated changes addressing what you raised. Also a bit clarification from a fresh pair of eyes on the valid_config.xml changes, I'm a bit short on time and can't pull a reference in docs or similar. As this changes the xsd I think it is worth to double check. | 
| @orlangur: me from the office, very fine, great to read. As you were involved already, the pair of fresh eyes wouldn't qualify for you doing the review anyway :). And the build is green again. very fine. :) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Not that it get lost: The  | 
| Okay, I won't propose the change to remove (the two)  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a mistake in translating the regex pattern regarding the range of supported characters and need some feedback of what is wanted regarding character support for class-names used with DI.
| </xs:documentation> | ||
| </xs:annotation> | ||
| <xs:restriction base="xs:string"> | ||
| <xs:pattern value="([a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)(\\[a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]*)*"/> | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maghamed I'm thinking about dropping the higher code-points b/c I think it would make sense to only allow PHP type-names that can be expressed in us-ascii (ending at x7f). PHP allows more, e.g. classes with asian characters if the files are utf-8 encoded. This XSD pattern is incorrect in that sense already (just came to my attention) as the pattern given in the PHP manual is for single-byte sequences whereas in the XSD the numeric character entities represent Unicode code-points. So question is, what should be supported. I'm not decided, but would tend to be more strict than less strict. This would leave less room for surprises in the long run.
8ba3ba9    to
    a207012      
    Compare
  
    | Good question. I started playing around with diverse locations where the type-names in strings are prefixed with backslashes. This is far from concrete results, but just to give an example, a change like ktomk/magento2@a207012...8ba3ba9 still is green. That is when class-names are used as string arguments, configured within DI.xml files. | 
| @orlangur: I ran the "unit" testsuite locally like on Travis, but it exited with a fatal error at the end. Is there anything I need to know when running locally? | 
| @ktomk sorry, I somehow opened the tab, then forgot about it and didn't answer. No, it should work just fine, please post your stack trace. | 
| @ktomk thank you for your contribution! Your pull request successfully merged! | 
The preferenceType @for/@type attributes, the typeType @name attribute,
the virtualTypeType @type attribute and the pluginType @type attribute
contain class-names (FQCNs) which should not start with a leading
backslash (U+005C "") and should not contain other invalid character
sequences that would represent an invalid PHP class-name.
Previously this was possible and these errors got unnoticed within di.xml
configuration file validation.
The ObjectManager - a user of these configurations - handles this common
error in user input in part so far by removing any trailing slashes with
multiple calls like:
This change has been introduced in 33ebc24 and could be classified as a
workaround for a quite common mistake when specifying an FQCN that despite
the varying notations in the PHP manual due to the contextual resolution
rules (and different to a file-systems absolute path in POSIX) must not
start with a leading separator as type or class-name.
When using a string-value as class-name (e.g. class_exists() calls) the
class name is always an FQCN as namespacing in PHP is contextual within
source-code for identifiers only and not for strings.
So relative class-names (like those with a leading backslash) do not
apply for strings. This is the case in configuration files like di.xml
files. The di.xml files use the
schema location which is resolved by UrnResolver (6379732) to
That schema did validate class-name attribute values only against the
simple type of being strings (xs:string). As a class-name in PHP is a
valid string also if starting with the backslash character (and other
invalid names, like digits in front), this commit extends the schema
to validate against the regular expression provided by the PHP manual [1]:
by adding a new simple-type called "phpFqcn" that qualifies the string-
base-type with the from PHP manual translated pattern:
extended for namespaced class-names:
The change ensures that the said attribute values are validated and
invalid class-names are recognized during schema based validation.
This change prevents that configured PHP-types can be autoloaded when
used w/o smudging (see the ltrim() operation). It has been documented [2]
that the leading backslash prevents correct file-resolution when auto-
loading with the Composer autoloader, a component actively used by the
Magento application.
This change adheres to existing PR #8638 and relates to issue #8635.
Refs:
Incompatibility with composer ~1.3.0 and setup:di:compile #8635
Make Magento compatible with composer ~1.3.0 #8638
[1] https://php.net/language.oop5.basic
[2] http://magento.stackexchange.com/a/161010
33ebc24
6379732