Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tokenizing Unicode escape sequence in string literal #826

Merged
merged 4 commits into from
Oct 9, 2020

Conversation

jevancc
Copy link
Contributor

@jevancc jevancc commented Oct 8, 2020

This Pull Request fixes/closes #808 .

It changes the following:

  • Add InnerIter::peek_iter() and store the peeked char in InnerIter instance instead of Cursor
  • Fix invalid Unicode code point in lexer test

The issue is caused by a bug in Cursor::fill_bytes. When Cursor::fill_bytes was called after Cursor::peek, the iter would be incremented to peek the next char but the peeked char would not be filled to the buffer. This PR introduces a new method InnerIter::peek_char and stores the peeked char in InnerIter so that it can fill the peeked char to input buffer when invoking InnerIter::fill_bytes correctly.

The test syntax::lexer::tests::codepoint_with_no_braces is updated in this PR. Since \uD83D is a surrogate code point, the test will panic when calling decode_utf16 and trying to decode it. This bug should be fixed in another issue/PR to handle the invalid code point.

@codecov
Copy link

codecov bot commented Oct 8, 2020

Codecov Report

Merging #826 into master will increase coverage by 0.02%.
The diff coverage is 81.25%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #826      +/-   ##
==========================================
+ Coverage   59.23%   59.26%   +0.02%     
==========================================
  Files         157      157              
  Lines       10034    10035       +1     
==========================================
+ Hits         5944     5947       +3     
+ Misses       4090     4088       -2     
Impacted Files Coverage Δ
boa/src/syntax/lexer/cursor.rs 62.71% <81.25%> (+2.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 470dbb4...a12e8dc. Read the comment docs.

@jevancc jevancc marked this pull request as ready for review October 8, 2020 19:19
Copy link
Member

@HalidOdat HalidOdat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! check my comment on how we might improve this :)

Comment on lines 297 to 299
let mut buf = [0u8; 4];
chr.encode_utf8(&mut buf);
Ok(Some(buf[0]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut buf = [0u8; 4];
chr.encode_utf8(&mut buf);
Ok(Some(buf[0]))
Ok(Some(chr as u8))

Couldn't we just cast to u8 since we check if its ascii, this should be faster than calling encode_utf8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! I've made this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, I can look into the invalid code point issue. Is there already an issue/PR for it?

Copy link
Member

@HalidOdat HalidOdat Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, I can look into the invalid code point issue. Is there already an issue/PR for it?

There is an issue #778, but not a PR to fix (also nobody is assigned if you would like to take it)

@HalidOdat HalidOdat added bug Something isn't working lexer Issues surrounding the lexer labels Oct 8, 2020
@HalidOdat HalidOdat added this to the v0.11.0 milestone Oct 8, 2020
Copy link
Member

@HalidOdat HalidOdat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! :)

Copy link
Member

@Razican Razican left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there are no regressions in the benchmarks, so this looks pretty good! Thanks!

@Razican Razican merged commit 01dbf8b into boa-dev:master Oct 9, 2020
@jevancc jevancc deleted the fix/808_unicode_escape_str branch October 9, 2020 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working lexer Issues surrounding the lexer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A string containing only a Unicode escape sequence causes an error
3 participants