@lilnasy lilnasy commented Oct 19, 2025

Part of #14564. Corrects start and end offsets to accommodate multi-byte characters. Conversion of UTF-8 indices to UTF-16 takes place on the Rust side for performance.
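To illustrate the arithmetic behind such a conversion, here is a minimal TypeScript sketch. It is not the PR's code: the actual conversion happens in Rust inside oxc, and the function name `utf8ToUtf16Offset` is hypothetical. It only shows how a UTF-8 byte offset maps to a UTF-16 code unit offset.

```typescript
// Illustrative sketch only; `utf8ToUtf16Offset` is a hypothetical helper,
// not part of oxc. It walks the string and counts both encodings in step.
function utf8ToUtf16Offset(source: string, byteOffset: number): number {
  let bytes = 0; // UTF-8 bytes consumed so far
  let units = 0; // UTF-16 code units consumed so far
  while (units < source.length && bytes < byteOffset) {
    const unit = source.charCodeAt(units);
    const next = units + 1 < source.length ? source.charCodeAt(units + 1) : 0;
    const isPair =
      unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
    if (isPair) {
      // A surrogate pair is 2 UTF-16 code units and 4 UTF-8 bytes.
      bytes += 4;
      units += 2;
    } else {
      // 1 byte for ASCII, 2 bytes below U+0800, otherwise 3 bytes.
      bytes += unit < 0x80 ? 1 : unit < 0x800 ? 2 : 3;
      units += 1;
    }
  }
  return units;
}

// The block comment below ends at UTF-8 byte 10 but at UTF-16 index 8.
const example = "/* 😀 */ x";
const endUtf16 = utf8ToUtf16Offset(example, 10);
const sliced = example.slice(0, endUtf16);
```

A single pass like this is linear per lookup; converting many spans would normally use a precomputed table, which this sketch does not attempt.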

@github-actions github-actions bot added A-linter Area - Linter A-cli Area - CLI C-bug Category - Bug labels Oct 19, 2025
codspeed-hq bot commented Oct 19, 2025

CodSpeed Performance Report

Merging #14768 will not alter performance

Comparing lilnasy:fix/linter/plugins/utf16 (baf7192) with main (c6395c7) [1]

Summary

✅ 4 untouched
⏩ 33 skipped [2]

Footnotes

  1. No successful run was found on main (cd266b4) during the generation of this report, so c6395c7 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 33 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@lilnasy lilnasy force-pushed the fix/linter/plugins/utf16 branch from ad44a8d to baf7192 Compare October 19, 2025 04:20
@lilnasy lilnasy marked this pull request as ready for review October 19, 2025 04:29
@lilnasy lilnasy requested a review from camc314 as a code owner October 19, 2025 04:29
Copilot AI review requested due to automatic review settings October 19, 2025 04:29
Copilot AI left a comment

Pull Request Overview

This PR fixes offset handling for comment spans in the linter plugin system. When source code contains characters that take more than one byte in UTF-8 (such as emojis or non-Latin scripts), comment span offsets must be converted from UTF-8 byte indices to UTF-16 indices so that positions are accurate.

Key Changes:

  • Added UTF-16 conversion for comment spans in the linter
  • Introduced comprehensive test coverage for Unicode characters in comments

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| crates/oxc_linter/src/lib.rs | Added call to convert comment spans to UTF-16 offsets |
| apps/oxlint/test/fixtures/unicode-comments/plugin.ts | New test plugin that reports all comments with their types and values |
| apps/oxlint/test/fixtures/unicode-comments/output.snap.md | Expected snapshot output showing correctly extracted comments with Unicode characters |
| apps/oxlint/test/fixtures/unicode-comments/files/unicode-comments.js | Test fixture file containing various Unicode characters in comments (emojis, Chinese, Greek, Hebrew, etc.) |
| apps/oxlint/test/fixtures/unicode-comments/.oxlintrc.json | Configuration file enabling the unicode-comments test plugin |
| apps/oxlint/test/e2e.test.ts | Added end-to-end test case for UTF-16 character handling in comments |

@overlookmotel overlookmotel left a comment

Great stuff!

All looks correct, so merging. But as a follow-up, I'd suggest expanding the tests:

  • Add start and end of each comment to snapshot.
  • Add range to snapshot (or assert comment.range[0] === comment.start && comment.range[1] === comment.end for all comments).
  • Add loc to snapshot for each comment, and eyeball that they look correct.
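The checks suggested in the list above could be sketched roughly as follows. The `CommentLike` shape and both helper names are assumptions made for illustration; the real plugin fixture and the comment objects oxlint passes to plugins may differ.

```typescript
// Hypothetical shape of a comment as seen by a plugin (an assumption,
// based only on the properties named in the review).
interface CommentLike {
  type: "Line" | "Block";
  value: string;
  start: number;
  end: number;
  range: [number, number];
}

// True when `range` agrees with `start` and `end`, as the review proposes asserting.
function rangeMatchesOffsets(comment: CommentLike): boolean {
  return comment.range[0] === comment.start && comment.range[1] === comment.end;
}

// One snapshot line per comment, including its offsets so they can be eyeballed.
function snapshotLine(comment: CommentLike): string {
  return `${comment.type} [${comment.start}, ${comment.end}): ${JSON.stringify(comment.value)}`;
}

const sampleComment: CommentLike = {
  type: "Line",
  value: " αβγ",
  start: 0,
  end: 6,
  range: [0, 6],
};
const sampleLine = snapshotLine(sampleComment);
```

Putting the offsets in the snapshot text makes a regression visible in a diff, while the boolean check catches a `range` that drifts from `start` and `end`.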

It might also be good to call context.report() for each comment individually, passing the Comment as the node property. That'd:

  1. Test the translation of offsets back to UTF-8 when reporting errors (that happens on Rust side).
  2. Make the snapshot more readable and easier to eyeball for errors.
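A rule along these lines might look like the sketch below. It assumes an ESLint-style rule shape (`create`, a `Program` visitor, `context.sourceCode.getAllComments()`, and `context.report({ message, node })`); whether oxlint's plugin API matches this exactly is an assumption, and the interfaces and the stand-in context are hypothetical.

```typescript
// Hypothetical minimal shapes, modelled on the ESLint-style rule API.
interface CommentLike {
  type: string;
  value: string;
  start: number;
  end: number;
}

interface ReportDescriptor {
  message: string;
  node: CommentLike;
}

interface ContextLike {
  sourceCode: { getAllComments(): CommentLike[] };
  report(descriptor: ReportDescriptor): void;
}

// Reports every comment individually, passing the comment itself as `node`.
const reportEachComment = {
  create(context: ContextLike) {
    return {
      Program(): void {
        context.sourceCode.getAllComments().forEach((comment) => {
          context.report({
            message: `${comment.type} comment: ${JSON.stringify(comment.value)}`,
            node: comment,
          });
        });
      },
    };
  },
};

// Stand-in context for trying the rule outside a linter.
const collected: ReportDescriptor[] = [];
const fakeContext: ContextLike = {
  sourceCode: {
    getAllComments: () => [
      { type: "Block", value: " 😀 ", start: 0, end: 8 },
      { type: "Line", value: " αβγ", start: 16, end: 22 },
    ],
  },
  report: (descriptor) => {
    collected.push(descriptor);
  },
};

reportEachComment.create(fakeContext).Program();
```

In a real run, each report would also exercise the offset translation back to UTF-8 that the review mentions, which this stand-in context cannot show.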

@overlookmotel overlookmotel merged commit 78ee7b8 into oxc-project:main Oct 19, 2025
21 checks passed
lilnasy commented Oct 19, 2025

@overlookmotel I left out testing start and end explicitly for brevity. The comments' values being correct implies that the offsets were correct when used to slice sourceText. I can add the assertions and loc right away!
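The reasoning here can be sketched as follows, under the assumption that a comment's value is obtained by slicing the source text between start and end and dropping the delimiters. The helper `commentValue` and the sample offsets are hypothetical.

```typescript
// Assumption: a comment's value is the source text between its delimiters.
function commentValue(sourceText: string, start: number, end: number): string {
  const raw = sourceText.slice(start, end);
  // Line comments drop the leading `//`; block comments drop `/*` and `*/`.
  return raw.slice(0, 2) === "//" ? raw.slice(2) : raw.slice(2, -2);
}

const sourceText = "/* 😀 */ let a; // αβγ";

// UTF-16 offsets: the block comment spans [0, 8), the line comment [16, 22).
const blockValue = commentValue(sourceText, 0, 8);
const lineValue = commentValue(sourceText, 16, 22);

// UTF-8 byte offsets for the same line comment would be [18, 27).
// Used as string indices, they no longer select the comment text.
const lineValueFromBytes = commentValue(sourceText, 18, 27);
```

A correct value does suggest correct offsets in this model, though asserting start and end directly would still catch cases where the value is produced some other way.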

@lilnasy lilnasy deleted the fix/linter/plugins/utf16 branch October 19, 2025 11:59