Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalid character handling #74

Merged
merged 2 commits into from
Aug 4, 2023
Merged

Conversation

arthurscchan
Copy link
Contributor

@arthurscchan arthurscchan commented Aug 3, 2023

This fixes an unwrapped ArrayIndexOutOfBoundException in src/org/joni/ScannerSupport.

In the Lexar class, it uses the Encoding.length() method to retrieve the encoding characters length. According to https://github.com/jruby/jcodings/blob/master/src/org/jcodings/specific/UTF8Encoding.java, the length() method could return CHAR_INVALID (equals to -1) if the given character is invalid. The logic in Lexer does not check this return value and use it directly to change the index p. This could result in negative p value and affect future code which use p as array index like the code in https://github.com/jruby/joni/blob/master/src/org/joni/ScannerSupport.java#L136. As a result, an unexpected ArrayIndexOutOfBoundException is thrown.

This PR fixes the bug by adding a conditional check for the return result of the Encoding.length() method before using it. An IllegalArgumentException is thrown to indicate there is invalid character in the provided regex which is more "informative" for the user instead of the general ArrayIndexOutOfBoundException.

We found this bug using fuzzing by way of OSS-Fuzz, where we recently integrated joni (google/oss-fuzz#10680). OSS-Fuzz is a free service run by Google for fuzzing important open source software. If you'd like to know more about this then I'm happy to go into detail and also set up things so you can receive emails and detailed reports when bugs are found.

Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
@enebo
Copy link
Member

enebo commented Aug 4, 2023

good find. I think you are missing an import for Encoding in jcodings?

Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
@arthurscchan
Copy link
Contributor Author

Thanks for your reply. I have fixed the import statement, sorry for missing that.

@enebo enebo merged commit 4fdf926 into jruby:master Aug 4, 2023
1 check passed
@enebo
Copy link
Member

enebo commented Aug 4, 2023

@arthurscchan np. Thanks for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants