-
-
Notifications
You must be signed in to change notification settings - Fork 715
fix(linter/plugins): handle utf16 characters within comment spans #14768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(linter/plugins): handle utf16 characters within comment spans #14768
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. |
CodSpeed Performance ReportMerging #14768 will not alter performanceComparing Summary
Footnotes
|
ad44a8d to
baf7192
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes UTF-16 character handling within comment spans for the linter plugin system. When source code contains multi-byte UTF-16 characters (like emojis or non-Latin scripts), comment span offsets need to be correctly converted from UTF-8 to UTF-16 indices to ensure accurate positioning.
Key Changes:
- Added UTF-16 conversion for comment spans in the linter
- Introduced comprehensive test coverage for Unicode characters in comments
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| crates/oxc_linter/src/lib.rs | Added call to convert comment spans to UTF-16 offsets |
| apps/oxlint/test/fixtures/unicode-comments/plugin.ts | New test plugin that reports all comments with their types and values |
| apps/oxlint/test/fixtures/unicode-comments/output.snap.md | Expected snapshot output showing correctly extracted comments with Unicode characters |
| apps/oxlint/test/fixtures/unicode-comments/files/unicode-comments.js | Test fixture file containing various Unicode characters in comments (emojis, Chinese, Greek, Hebrew, etc.) |
| apps/oxlint/test/fixtures/unicode-comments/.oxlintrc.json | Configuration file enabling the unicode-comments test plugin |
| apps/oxlint/test/e2e.test.ts | Added end-to-end test case for UTF-16 character handling in comments |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff!
All looks correct, so merging. But I'd suggest as follow-up to expand the tests:
- Add
startandendof each comment to snaphot. - Add
rangeto snapshot (or assertcomment.range[0] === comment.start && comment.range[1] === comment.endfor all comments). - Add
locto snapshot for each comment, and eyeball that they look correct.
It might also be good to call context.report() for each comment individually, passing the Comment as node property. That'd:
- Test the translation of offsets back to UTF-8 when reporting errors (that happens on Rust side).
- Make the snapshot more readable and easier to eyeball for errors.
|
@overlookmotel I left out testing |
Part of #14564. Corrects start and end offsets to accommodate two byte characters. Conversion of UTF-8 indices to UTF-16 takes place on the rust side for performance.