Error processing cyrillic strings in Tokenizer #1462
Comments
I don't get any errors running PHPCS 3 over your sample code. The line reporting the error is actually muting any error output from iconv_strlen, so I'm wondering if you have set an encoding that is incompatible with the content of your files. Possibly it was set while using 2.x because the default was not utf-8, but version 3 uses utf-8 by default. Does any of that sound possible? It might also be worth running phpcs over the sample code and using the |
I am running phpcs as an inspection in PHPStorm. The exact error message PHPStorm is giving me is:
The file analyzed is encoded in UTF-8.
As far as I can tell, PHPStorm doesn't change the default encoding in the phpcs settings; I could not find any way to pass configuration to phpcs when running it from PHPStorm. These are the exact contents of test.php:
This is the output of the Processing ruleset /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/ruleset.xml Adding sniff files from /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs directory => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/Classes/ClassDeclarationSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/Commenting/ClassCommentSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/Commenting/FileCommentSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/Commenting/FunctionCommentSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/Commenting/InlineCommentSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/ControlStructures/ControlSignatureSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/ControlStructures/MultiLineConditionSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/Files/IncludingFileSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/Formatting/MultiLineAssignmentSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/Functions/FunctionCallSignatureSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/Functions/FunctionDeclarationSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/Functions/ValidDefaultValueSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/NamingConventions/ValidClassNameSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/NamingConventions/ValidFunctionNameSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/NamingConventions/ValidVariableNameSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/WhiteSpace/ObjectOperatorIndentSniff.php => 
/vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/WhiteSpace/ScopeClosingBraceSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/PEAR/Sniffs/WhiteSpace/ScopeIndentSniff.php Processing rule "Generic.Functions.FunctionCallArgumentSpacing" => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/Generic/Sniffs/Functions/FunctionCallArgumentSpacingSniff.php Processing rule "Generic.NamingConventions.UpperCaseConstantName" => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/Generic/Sniffs/NamingConventions/UpperCaseConstantNameSniff.php Processing rule "Generic.PHP.LowerCaseConstant" => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/Generic/Sniffs/PHP/LowerCaseConstantSniff.php Processing rule "Generic.PHP.DisallowShortOpenTag" => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/Generic/Sniffs/PHP/DisallowShortOpenTagSniff.php Processing rule "Generic.WhiteSpace.DisallowTabIndent" => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/Generic/Sniffs/WhiteSpace/DisallowTabIndentSniff.php Processing rule "Generic.Commenting.DocComment" => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/Generic/Sniffs/Commenting/DocCommentSniff.php Processing rule "Generic.Files.LineLength" => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/Generic/Sniffs/Files/LineLengthSniff.php => property "lineLimit" set to "85" => property "absoluteLineLimit" set to "0" Processing rule "Generic.Files.LineEndings" => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/Generic/Sniffs/Files/LineEndingsSniff.php => property "eolChar" set to "\n" Processing rule "Generic.Functions.FunctionCallArgumentSpacing.TooMuchSpaceAfterComma" => /vagrant/vendor/squizlabs/php_codesniffer/src/Standards/Generic/Sniffs/Functions/FunctionCallArgumentSpacingSniff.php => severity set to 0 Processing rule "Generic.ControlStructures.InlineControlStructure" => 
/vagrant/vendor/squizlabs/php_codesniffer/src/Standards/Generic/Sniffs/ControlStructures/InlineControlStructureSniff.php => property "error" set to "false" => Ruleset processing complete; included 27 sniffs and excluded 0 *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => do Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_CURLY_BRACKET => { Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token [2]: T_WHILE => while Process token [3]: T_WHITESPACE => В· Process token 4 : T_OPEN_PARENTHESIS => ( Process token [5]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => ; Process token [2]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => while Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_PARENTHESIS => ( Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET => { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => for Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_PARENTHESIS => ( Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET => { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => if Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_PARENTHESIS => ( Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET => { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token 
[0]: T_OPEN_TAG => foreach Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_PARENTHESIS => ( Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET => { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => } Process token [2]: T_WHITESPACE => В· Process token [3]: T_ELSE => else Process token [4]: T_WHITESPACE => В· Process token [5]: T_IF => if Process token [6]: T_WHITESPACE => В· Process token 7 : T_OPEN_PARENTHESIS => ( Process token [8]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET => { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => } Process token [2]: T_WHITESPACE => В· Process token [3]: T_ELSEIF => elseif Process token [4]: T_WHITESPACE => В· Process token 5 : T_OPEN_PARENTHESIS => ( Process token [6]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET => { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => } Process token [2]: T_WHITESPACE => В· Process token [3]: T_ELSE => else Process token [4]: T_WHITESPACE => В· Process token 5 : T_OPEN_CURLY_BRACKET => { Process token [6]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => do Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_CURLY_BRACKET => { Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** Creating file list... 
DONE (1 files in queue) Changing into directory /vagrant Processing test.php *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => $foo Process token [2]: T_WHITESPACE => В· Process token 3 : T_EQUAL => = Process token [4]: T_WHITESPACE => В· Process token [5]: T_CONSTANT_ENCAPSED_STRING => 'С‹' Process token 6 : T_SEMICOLON => ; Process token [7]: T_WHITESPACE => \r\n *** END PHP TOKENIZING *** *** START TOKEN MAP *** *** END TOKEN MAP *** *** START SCOPE MAP *** *** END SCOPE MAP *** *** START LEVEL MAP *** Process token 0 on line 1 [col:1;len:5;lvl:0;]: T_OPEN_TAG => $foo Process token 2 on line 2 [col:5;len:1;lvl:0;]: T_WHITESPACE => В· Process token 3 on line 2 [col:6;len:1;lvl:0;]: T_EQUAL => = Process token 4 on line 2 [col:7;len:1;lvl:0;]: T_WHITESPACE => В· Process token 5 on line 2 [col:8;len:3;lvl:0;]: T_CONSTANT_ENCAPSED_STRING => 'С‹' Process token 6 on line 2 [col:11;len:1;lvl:0;]: T_SEMICOLON => ; Process token 7 on line 2 [col:12;len:0;lvl:0;]: T_WHITESPACE => \r\n *** END LEVEL MAP *** *** START ADDITIONAL PHP PROCESSING *** *** END ADDITIONAL PHP PROCESSING *** [PHP => 8 tokens in 2 lines]... DONE in 3ms (2 errors, 0 warnings) FILE: /vagrant/test.php ---------------------------------------------------------------------------------- FOUND 2 ERRORS AFFECTING 2 LINES ---------------------------------------------------------------------------------- 1 | ERROR | [x] End of line character is invalid; expected "\n" but found "\r\n" 2 | ERROR | [ ] Missing file doc comment ---------------------------------------------------------------------------------- PHPCBF CAN FIX THE 1 MARKED SNIFF VIOLATIONS AUTOMATICALLY ---------------------------------------------------------------------------------- Time: 213ms; Memory: 4Mb This is the output of the same command on the same file, but the cyrillic 'ы' is replaced by latin 'a'. The sniffs are all the same so i omitted that part. 
Processing test.php *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => $foo Process token [2]: T_WHITESPACE => В· Process token 3 : T_EQUAL => = Process token [4]: T_WHITESPACE => В· Process token [5]: T_CONSTANT_ENCAPSED_STRING => 'a' Process token 6 : T_SEMICOLON => ; Process token [7]: T_WHITESPACE => \r\n *** END PHP TOKENIZING *** *** START TOKEN MAP *** *** END TOKEN MAP *** *** START SCOPE MAP *** *** END SCOPE MAP *** *** START LEVEL MAP *** Process token 0 on line 1 [col:1;len:5;lvl:0;]: T_OPEN_TAG => $foo Process token 2 on line 2 [col:5;len:1;lvl:0;]: T_WHITESPACE => В· Process token 3 on line 2 [col:6;len:1;lvl:0;]: T_EQUAL => = Process token 4 on line 2 [col:7;len:1;lvl:0;]: T_WHITESPACE => В· Process token 5 on line 2 [col:8;len:3;lvl:0;]: T_CONSTANT_ENCAPSED_STRING => 'a' Process token 6 on line 2 [col:11;len:1;lvl:0;]: T_SEMICOLON => ; Process token 7 on line 2 [col:12;len:0;lvl:0;]: T_WHITESPACE => \r\n *** END LEVEL MAP *** *** START ADDITIONAL PHP PROCESSING *** *** END ADDITIONAL PHP PROCESSING *** [PHP => 8 tokens in 2 lines]... 
DONE in 4ms (2 errors, 0 warnings) FILE: /vagrant/test.php ---------------------------------------------------------------------------------- FOUND 2 ERRORS AFFECTING 2 LINES ---------------------------------------------------------------------------------- 1 | ERROR | [x] End of line character is invalid; expected "\n" but found "\r\n" 2 | ERROR | [ ] Missing file doc comment ---------------------------------------------------------------------------------- PHPCBF CAN FIX THE 1 MARKED SNIFF VIOLATIONS AUTOMATICALLY ---------------------------------------------------------------------------------- Time: 220ms; Memory: 4Mb This is the output with phpcs 2.8.1., there are no errors in PHPSTorm with this version: Processing ruleset /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/ruleset.xml Adding sniff files from "/.../PEAR/Sniffs/" directory => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/Classes/ClassDeclarationSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/Commenting/ClassCommentSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/Commenting/FileCommentSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/Commenting/FunctionCommentSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/Commenting/InlineCommentSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/ControlStructures/ControlSignatureSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/ControlStructures/MultiLineConditionSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/Files/IncludingFileSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/Formatting/MultiLineAssignmentSniff.php => 
/vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/Functions/FunctionCallSignatureSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/Functions/FunctionDeclarationSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/Functions/ValidDefaultValueSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/NamingConventions/ValidClassNameSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/NamingConventions/ValidFunctionNameSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/NamingConventions/ValidVariableNameSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/WhiteSpace/ObjectOperatorIndentSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/WhiteSpace/ScopeClosingBraceSniff.php => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/PEAR/Sniffs/WhiteSpace/ScopeIndentSniff.php Processing rule "Generic.Functions.FunctionCallArgumentSpacing" => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/Generic/Sniffs/Functions/FunctionCallArgumentSpacingSniff.php Processing rule "Generic.NamingConventions.UpperCaseConstantName" => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/Generic/Sniffs/NamingConventions/UpperCaseConstantNameSniff.php Processing rule "Generic.PHP.LowerCaseConstant" => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/Generic/Sniffs/PHP/LowerCaseConstantSniff.php Processing rule "Generic.PHP.DisallowShortOpenTag" => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/Generic/Sniffs/PHP/DisallowShortOpenTagSniff.php Processing rule "Generic.WhiteSpace.DisallowTabIndent" => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/Generic/Sniffs/WhiteSpace/DisallowTabIndentSniff.php Processing rule 
"Generic.Commenting.DocComment" => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/Generic/Sniffs/Commenting/DocCommentSniff.php Processing rule "Generic.Files.LineLength" => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/Generic/Sniffs/Files/LineLengthSniff.php => property "lineLimit" set to "85" => property "absoluteLineLimit" set to "0" Processing rule "Generic.Files.LineEndings" => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/Generic/Sniffs/Files/LineEndingsSniff.php => property "eolChar" set to "\n" Processing rule "Generic.Functions.FunctionCallArgumentSpacing.TooMuchSpaceAfterComma" => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/Generic/Sniffs/Functions/FunctionCallArgumentSpacingSniff.php => severity set to 0 Processing rule "Generic.ControlStructures.InlineControlStructure" => /vagrant/vendor/squizlabs/php_codesniffer/CodeSniffer/Standards/Generic/Sniffs/ControlStructures/InlineControlStructureSniff.php => property "error" set to "false" => Ruleset processing complete; included 27 sniffs and excluded 0 *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => do Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_CURLY_BRACKET => { Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token [2]: T_WHILE => while Process token [3]: T_WHITESPACE => В· Process token 4 : T_OPEN_PARENTHESIS => ( Process token [5]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => ; Process token [2]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => while Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_PARENTHESIS => ( Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET 
=> { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => for Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_PARENTHESIS => ( Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET => { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => if Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_PARENTHESIS => ( Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET => { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => foreach Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_PARENTHESIS => ( Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET => { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => } Process token [2]: T_WHITESPACE => В· Process token [3]: T_ELSE => else Process token [4]: T_WHITESPACE => В· Process token [5]: T_IF => if Process token [6]: T_WHITESPACE => В· Process token 7 : T_OPEN_PARENTHESIS => ( Process token [8]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET => { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => } Process token [2]: T_WHITESPACE => В· Process token [3]: T_ELSEIF => elseif Process token [4]: T_WHITESPACE => В· Process token 5 : T_OPEN_PARENTHESIS => ( Process token [6]: T_CLOSE_TAG => ?> 
*** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => В· Process token 2 : T_OPEN_CURLY_BRACKET => { Process token [3]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => } Process token [2]: T_WHITESPACE => В· Process token [3]: T_ELSE => else Process token [4]: T_WHITESPACE => В· Process token 5 : T_OPEN_CURLY_BRACKET => { Process token [6]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => do Process token [2]: T_WHITESPACE => В· Process token 3 : T_OPEN_CURLY_BRACKET => { Process token [4]: T_CLOSE_TAG => ?> *** END PHP TOKENIZING *** Creating file list... DONE (1 files in queue) Changing into directory /vagrant Processing test.php *** START PHP TOKENIZING *** Process token [0]: T_OPEN_TAG => $foo Process token [2]: T_WHITESPACE => В· Process token 3 : T_EQUAL => = Process token [4]: T_WHITESPACE => В· Process token [5]: T_CONSTANT_ENCAPSED_STRING => 'С‹' Process token 6 : T_SEMICOLON => ; Process token [7]: T_WHITESPACE => \r\n *** END PHP TOKENIZING *** *** START TOKEN MAP *** *** END TOKEN MAP *** *** START SCOPE MAP *** *** END SCOPE MAP *** *** START LEVEL MAP *** Process token 0 on line 1 [col:1;len:5;lvl:0;]: T_OPEN_TAG => $foo Process token 2 on line 2 [col:5;len:1;lvl:0;]: T_WHITESPACE => В· Process token 3 on line 2 [col:6;len:1;lvl:0;]: T_EQUAL => = Process token 4 on line 2 [col:7;len:1;lvl:0;]: T_WHITESPACE => В· Process token 5 on line 2 [col:8;len:4;lvl:0;]: T_CONSTANT_ENCAPSED_STRING => 'С‹' Process token 6 on line 2 [col:12;len:1;lvl:0;]: T_SEMICOLON => ; Process token 7 on line 2 [col:13;len:0;lvl:0;]: T_WHITESPACE => \r\n *** END LEVEL MAP *** *** START ADDITIONAL PHP PROCESSING *** *** END ADDITIONAL PHP PROCESSING *** [PHP => 8 tokens in 2 lines]... 
DONE in 10ms (2 errors, 0 warnings) FILE: /vagrant/test.php ---------------------------------------------------------------------- FOUND 2 ERRORS AFFECTING 2 LINES ---------------------------------------------------------------------- 1 | ERROR | [x] End of line character is invalid; expected "\n" but | | found "\r\n" 2 | ERROR | [ ] Missing file doc comment ---------------------------------------------------------------------- PHPCBF CAN FIX THE 1 MARKED SNIFF VIOLATIONS AUTOMATICALLY ---------------------------------------------------------------------- Time: 162ms; Memory: 4Mb |
So you don't see any PHP errors when using PHPCS on the command line? But you also don't see the content correctly in the output? (I do see the content correctly in my debug output) Can you try running It would also be good if you could paste the output of |
Yes, there are no errors when running from command line. I tried running with the When running with the |
I'm not really sure what is going on then. I didn't make any big changes to that code in 3.0 except for changing the default encoding to utf-8. If you'd like to do some debugging, the easiest thing to do is drop in an echo before line 193 in Tokenizer.php. The code in that area looks like this:
// There are no tabs in this content, or we aren't replacing them.
if ($checkEncoding === true) {
    // Not using the default encoding, so take a bit more care.
    $length = @iconv_strlen($this->tokens[$i]['content'], $this->config->encoding);
    if ($length === false) {
        // String contained invalid characters, so revert to default.
        $length = strlen($this->tokens[$i]['content']);
    }
} else {
    $length = strlen($this->tokens[$i]['content']);
}
Line 193 is that call to iconv_strlen. With the echo added before it:
// There are no tabs in this content, or we aren't replacing them.
if ($checkEncoding === true) {
    // Not using the default encoding, so take a bit more care.
    echo 'Trying to get length for "'.$this->tokens[$i]['content'].'" using encoding "'.$this->config->encoding.'"'.PHP_EOL;
    $length = @iconv_strlen($this->tokens[$i]['content'], $this->config->encoding);
    if ($length === false) {
        // String contained invalid characters, so revert to default.
        $length = strlen($this->tokens[$i]['content']);
    }
} else {
    $length = strlen($this->tokens[$i]['content']);
}
Running PHPCS over a file should then give you output like this:
If nothing else, it would hopefully show us where the error is, although you might need to run it via PHPStorm and hope it shows all output. |
I am having the same issue in PHPStorm on 3.0.1. Since I didn't have much time to investigate, I'm just going to give a heads up on what I found. This is happening not only for Cyrillic characters but for many others, and the issue seems to be in the way iconv/iconv_strlen works. Run the following sample code and you'll see the issue happening regardless:
Source: http://php.net/manual/en/function.iconv-strlen.php#62320 Result:
In my case, I get the same error as reported: "iconv_strlen(): Detected an illegal character in input string in /vendor/squizlabs/php_codesniffer/src/Tokenizers/Tokenizer.php on line 193". Sorry for not having time to properly write a test case or create a PR to fix this, but I did a test simply using "mb_strlen" instead of "iconv_strlen" on squizlabs/php_codesniffer/src/Tokenizers/Tokenizer.php:193 and it seemed to work fine. I know this is not the actual fix, but at least it may lead to a solution. Here are some other references just in case:
Also, I strongly suggest not suppressing errors with "@", since it makes it harder to identify the issue. I hope this helps. |
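The behaviour described in this comment can be reproduced with a short script. This is a minimal sketch, not the commenter's original sample (which did not survive in the thread); the byte string is a made-up input, chosen because the byte 0xFF can never appear in well-formed UTF-8.

```php
<?php
// Hypothetical input: "abc", one invalid byte, then "def".
$invalid = "abc\xFFdef";

// iconv_strlen() reports "Detected an illegal character in input string"
// and returns false when the bytes are not valid in the given encoding.
$iconvLength = iconv_strlen($invalid, 'UTF-8');
var_dump($iconvLength);

// mb_strlen() needs the optional mbstring extension. It returns an integer
// without raising an error, even though the input is malformed.
if (function_exists('mb_strlen')) {
    var_dump(mb_strlen($invalid, 'UTF-8'));
}

// strlen() counts bytes and never fails.
var_dump(strlen($invalid));
```

Run without any muting, the first call prints the same message that PHPStorm surfaces, which is what makes the two extensions look so different from the outside.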
It's also good to mention that when this issue happens, the code sniffer does not continue to check the rest of the file, so it just shows a warning on <?php tag then nothing else is evaluated, at least on PHPStorm. |
I am having the same issue on PHPStorm on 3.0.1. =( |
Exactly the same issue |
Added this to the 3.0.1 milestone, but still can't replicate it, so it's more just to revisit it when I am working on that version. If anyone is able to replicate while passing the correct encoding to PHPCS, please let me know what content is causing the error. If you are able to add the debug code I provided above, that would be very helpful as well. |
Using CLI this is not visible. Here's the test:
Again, as I've mentioned on my previous comment, while using mb_strlen this does not happen and seems to not break other stuff. |
This problem also occurs in 3.0.2 across many files, even if --encoding=utf-8 is passed. Since mb_strlen() seems to fix the problem, be great if this fix could be rolled in. In my case, I have several 100s of files where phpcs just dies with the iconv message. Trying to find the one offending character to fix... to... get past the code bailing would be a monumental task. |
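For anyone trying to locate the offending characters across many files, something like the following might help. It is only a sketch: the function name findInvalidUtf8Lines is hypothetical, it assumes the files are meant to be UTF-8, and it relies on preg_match() returning false for a subject that is not valid UTF-8 when the /u modifier is used.

```php
<?php
/**
 * Hypothetical helper, not part of PHP_CodeSniffer.
 * Returns the 1-based line numbers in $path that are not valid UTF-8.
 */
function findInvalidUtf8Lines(string $path): array
{
    $bad   = [];
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        // Unreadable file: nothing to report.
        return $bad;
    }

    foreach ($lines as $index => $line) {
        // An empty pattern with /u matches any valid UTF-8 string;
        // preg_match() returns false if the subject is malformed.
        if (preg_match('//u', $line) !== 1) {
            $bad[] = $index + 1;
        }
    }

    return $bad;
}

// Usage: php find-invalid.php file1.php file2.php ...
foreach (array_slice($argv ?? [], 1) as $path) {
    if (is_file($path) === false) {
        continue;
    }

    foreach (findInvalidUtf8Lines($path) as $lineNumber) {
        echo $path.':'.$lineNumber.PHP_EOL;
    }
}
```

This only narrows the search to a line; it does not say which encoding the line was actually saved in.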
Making the mb_strlen change suggested by @mourawaldson works like a charm. Be great if this minor code change (4 lines of code) could be rolled into the Tokenizer.php file + a new version of PHP_CodeSniffer released. Also, please do remove the "@" shutup operator as @mourawaldson suggested also. Thanks for your consideration of this fix. |
PR #1611 will fix this issue. I have not found any additional issues with mb_strlen(). |
I've been looking into this more and I think that using mb_strlen doesn't really fix anything. Yes, it will run without error if there is an encoding mismatch, but it still won't produce the correct length in this case. You may as well just call strlen and use what it has because both values will be wrong. The reason the So my current thinking is that I'll just change the error reporting settings while calling iconv_strlen so that it reverts back to the previous behaviour where the value would ultimately come from strlen but no error would be shown. I still have no idea why PHPStorm is causing this issue while a CLI run is not. It feels like the content is either being saved in the wrong encoding or the wrong encoding is being passed to PHPCS (or no encoding is being passed). It's quite likely this has always failed in PHPStorm but the error was just being suppressed properly by the use of the @ operator. |
The error from iconv_strlen when a string contains invalid chars (based on the encoding) was no longer being muted due to the new error handler in the Runner class. This commit replaces the mute operator with an error_reporting change to properly mute that error again and allow files to be checked even with mixed encoding.
I've pushed the change I described in the previous comment. This should restore the previous behaviour from version 2.x where the iconv_strlen error is muted. I'll leave this in feedback for a little while in case anyone has some time to test it. I've mentioned this in a few places, but not here, so: I'm not going to switch to using mb_strlen because that would mean a serious BC break for PHPCS due to new requirements. I would only consider a change like that in a major version (version 4) and only if it performed significantly better as iconv is a default extension and mb is not. |
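The approach described here (swap the "@" operator for a temporary error_reporting change, then fall back to strlen) can be sketched roughly as follows. This is an illustration under assumptions, not the committed code; the function name tokenLength is hypothetical.

```php
<?php
/**
 * Hypothetical helper, not part of PHP_CodeSniffer.
 * Returns the character length of $content in $encoding, or the byte
 * length when the content is not valid in that encoding.
 */
function tokenLength(string $content, string $encoding): int
{
    // Turn error reporting off around the call instead of using "@".
    // error_reporting() returns the previous level so it can be restored.
    $oldLevel = error_reporting(0);
    $length   = iconv_strlen($content, $encoding);
    error_reporting($oldLevel);

    if ($length === false) {
        // String contained invalid characters, so use the byte count.
        $length = strlen($content);
    }

    return $length;
}

// Cyrillic "ы" written as its two UTF-8 bytes.
echo tokenLength("\xD1\x8B", 'utf-8').PHP_EOL;

// Not valid UTF-8, so the byte count is used instead.
echo tokenLength("abc\xFF", 'utf-8').PHP_EOL;
```

If a custom error handler is installed, it would presumably also need to check the current error_reporting() level itself, since PHP invokes user handlers regardless of that setting.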
Changing iconv_strlen to mb_strlen fixes all these errors for me. I've taken to manually patching Tokenizer.php every time it updates. This is the best way to fix 100s + sometimes 1000s of phpcs bailouts, where processing stops + no reports are generated. You can't really just mute the iconv_strlen related problems, because whenever one of these problems is hit, phpcs processing stops + errors out. Wrapping iconv_strlen in an eval + ignoring exceptions raised will likely work. |
Hi @gsherwood, I understand your concern, but when you say "but it still wont produce the correct length in this case", what is the reason? Do you have sample code that gives different results? I'll try to take some time to test this out. |
If the encoding you specify doesn't match the encoding of the string, you won't get a correct count. The iconv extension handles this by throwing an error. The mbstring extension handles this without errors (I can't remember if it ignores chars, or counts bytes, or both) but it can't actually give the correct result. Here is some sample code with a string I was using for testing this:
<?php
$str = 'А а, Б б, В в';
echo 'strlen: '.strlen($str).PHP_EOL;
echo 'mb_strlen(utf-8): '.mb_strlen($str, 'utf-8').PHP_EOL;
echo 'mb_strlen(utf-16): '.mb_strlen($str, 'utf-16').PHP_EOL;
echo 'mb_strlen(windows-1252): '.mb_strlen($str, 'windows-1252').PHP_EOL;
echo 'iconv_strlen(utf-8): '.iconv_strlen($str, 'utf-8').PHP_EOL;
echo 'iconv_strlen(utf-16): '.iconv_strlen($str, 'utf-16').PHP_EOL;
echo 'iconv_strlen(windows-1252): '.iconv_strlen($str, 'windows-1252').PHP_EOL;
Which outputs:
The correct string length is 13 characters, which is what you get when you've passed in UTF-8. If I just subbed in mb_strlen for iconv_strlen, I'd still be running with incorrect values if you have passed in the incorrect encoding. That's obviously what's happening here with PHPStorm because the CLI is working fine but iconv is still finding invalid chars when run via PHPStorm. The passed encoding must be wrong. |
It works because mb_strlen won't produce errors. It is likely not producing the correct result though, unless you've configured it to always use the one encoding that you use, which PHPCS obviously can't do.
You might be better off testing the fix I committed first as this restores the PHPCS 2.x behaviour, which presumably worked for you.
You should read my previous comment about why it was muted, why muting broke in version 3, and how I was going to fix it. |
Thanks for sharing your test @gsherwood! I wasn't aware that encodings other than UTF-8 were giving different results. Anyway, I gave it a try here with your change and it is working fine in PHPStorm. I also quickly looked into some benchmarks comparing iconv_strlen vs mb_strlen, and mb performs way better, but as you said it is not a simple change; the impact is actually big. For me this is considered fixed by your changes. Thanks again. |
If I pull a git copy of the project right now, let me know if your fix will be available in the pull. |
It's been committed, so it will be there if you pull master. |
Thanks! |
What versions are affected?
This bug appeared when I switched from 2.8.1 to 3.0.0.
PHP version 7.1.4.
What causes the bug?
It happens when the analyzed source code contains Cyrillic strings.
Example code causing the bug
<?php $arr = [ 'ы' => 1 ];
<?php const FOO = 'ы';
The error message is:
iconv_strlen(): Detected an illegal character in input string in /vendor/squizlabs/php_codesniffer/src/Tokenizers/Tokenizer.php on line 193