fix(prisma-fmt): use UTF-16 offset in the response for the schema that contains multi-byte characters #4815

key-moon · 2024-04-06T12:07:40Z

Prisma's LSP currently cannot correctly handle schemas containing multibyte characters.

This is because the offsets returned by prisma-fmt are byte offsets into the UTF-8 encoded text. In the LSP protocol, text offsets should be expressed in UTF-16 code units unless otherwise specified.
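To illustrate the mismatch, here is a minimal sketch of converting a UTF-8 byte offset into a UTF-16 code-unit offset. The helper name `utf8_to_utf16_offset` is hypothetical and is not the function added in this PR; it only assumes that the byte offset falls on a character boundary.

```rust
/// Convert a UTF-8 byte offset into a UTF-16 code-unit offset.
/// Hypothetical helper for illustration only; not the code from this PR.
/// Assumes `byte_offset` lies on a character boundary of `text`.
fn utf8_to_utf16_offset(text: &str, byte_offset: usize) -> usize {
    text.char_indices()
        // Keep every character that starts before the requested byte offset.
        .take_while(|(idx, _)| *idx < byte_offset)
        // Each character occupies 1 or 2 UTF-16 code units.
        .map(|(_, ch)| ch.len_utf16())
        .sum()
}
```

For the schema text `"// 日本語のコメント\nmodel User {}"`, the keyword `model` starts at byte offset 28, but its UTF-16 offset is 12, which is the value an LSP client expects.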

This pull request includes changes to the offset_to_position and position_to_offset functions, as well as the implementation of the offset_to_lsp_offset function and its usage within lint::run.

This fixes the above issue.

fixes prisma/language-tools#1308

CLAassistant · 2024-04-06T12:07:45Z

All committers have signed the CLA.

codspeed-hq · 2024-04-08T14:47:31Z

CodSpeed Performance Report

Merging #4815 will not alter performance

Comparing key-moon:fix-multibyte (3932eaa) with main (27c0eb3)

Summary

✅ 11 untouched benchmarks

key-moon · 2024-04-08T14:54:44Z

There are dead_code warnings for many functions in offsets.rs. This is because most functions in offsets.rs are only used in lib.rs. I have allowed dead_code in the whole file for now, but if you have a better solution, I would appreciate suggestions.

aqrln · 2024-04-08T16:51:18Z

@key-moon this happens because the module is included twice: once in the library crate and once in the binary crate. I see that some existing modules in this package follow this pattern too, but it's not really the best way to do it. Aside from leading to problems like the one you encountered, it will also lead to compiling the same code twice and potentially bloating the binary (for us maybe there's no binary size impact because -Os collapses identical functions, but it will affect the compilation speed regardless).

The way it's normally done in Rust is you only include such modules in the library crate (i.e. in lib.rs) and export what you need to use in the binary crate. In other words, if you turn mod offsets; in lib.rs into pub mod offsets; and remove mod offsets; in main.rs, you can use the prisma_fmt::offsets module anywhere in the binary crate (note that we refer to prisma_fmt as an external crate here even though both crates are in the same package with the same cargo manifest and name) and you won't have any dead or duplicate code.

key-moon · 2024-04-08T17:28:58Z

Understood. I am new to Rust, so your help with the basics was very helpful.

While applying the suggested fix, I ran into a problem caused by the fact that lint uses offsets. Since lint is included by main and lib, two different ways of referencing the function, crate::offsets::offset_to_lsp_offset; and prisma_fmt::offsets::offset_to_lsp_offset; conflict.

This could be solved by not including lint in main and publishing the lint module as well. However, as this would be a relatively large change, I'm hesitant to make it.

Edit: I just pushed the fix that was described above. I'm not sure whether this fix was the right way to do it.

key-moon · 2024-04-10T13:29:40Z

The error caused by the conflict has been resolved. It took some time to resolve, but now this PR is back to a problem-free state.
I discovered one thing in the process of fixing the error: the MiniError in lint::run does not seem to return information about which file the warning was emitted from. I believe this is unintended behavior.

key-moon · 2024-04-15T07:57:50Z

Is there anything blocking the merge? If so, I will work on resolving it.

key-moon · 2024-05-25T09:41:46Z

Any updates?

key-moon · 2024-06-16T08:31:46Z

I would like to highlight the importance of this pull request, as its significance might not be fully understood. While it may appear to address an odd bug involving unusual Prisma files with emojis, it is actually a highly troublesome bug for those of us who use languages written with multibyte characters. It prevents us from using basic features like red underlines and quick fixes when we write comments in our native language.

Could you please consider reviewing this pull request? Simply merging this PR will greatly improve the development experience for thousands of developers in regions where multibyte characters are used.

aqrln

Hey @key-moon, thanks for the PR and sorry it wasn't reviewed yet. I'm not too familiar with this part of the codebase but to me the changes look good and make sense.

I left a few comments. First, and more critically, there are new changes on main that are lost here because the code was moved, and they need to be integrated into offsets.rs. Second, there's a suggestion for better performance / algorithmic complexity; take a look and see if it makes sense, but it's not critical.

prisma-fmt/src/offsets.rs

aqrln · 2024-06-16T12:51:03Z

prisma-fmt/src/lint.rs

+            start: offset_to_lsp_offset(err.span().start, db.source(err.span().file_id)),
+            end: offset_to_lsp_offset(err.span().end, db.source(err.span().file_id)),


Should we have a function that takes a span and returns a pair of LSP offsets to avoid traversing the document in O(n) twice?

I have implemented the span_to_range function. Additionally, I replaced a function with the same name and role in offsets.rs (see https://github.com/prisma/prisma-engines/pull/4815/files#diff-557db087b8d611a0049284a4b38c817f1868a1dde965a9fa904c2dcaad576eb0R198). Since the order of arguments was different, I modified the call sites accordingly. For consistency, I also adjusted the order of arguments for the range_after_span function. Since this is an internally used function, I believe there should be no issues.

aqrln · 2024-06-16T13:22:05Z

@Druue this is the fix for prisma/language-tools#1308

key-moon · 2024-06-16T15:48:07Z

I apologize for any urgency I may have caused. I really appreciate your prompt review. Additionally, I have replied to the comment that was made. Thank you very much.

Druue · 2024-06-17T13:53:56Z

Hey! Thank you so much for the PR and sorry for the delay. I've pulled this on and will see about finding some time to review this :)

One initial thought: could you revert the function argument orderings to what they were, to minimise the number of changes to read through?

key-moon · 2024-06-17T14:09:02Z

Thank you for reviewing!

Since the function signatures in offsets.rs follow a similar pattern, such as (offset, document), it makes sense to implement the span_to_range function in offsets.rs as span_to_range(span, document). The span_to_range function in code_actions.rs performs the exact same operation as the newly created span_to_range, except for the order of the arguments. Maintaining the original order of arguments would mean creating a wrapper span_to_range function to wrap the span_to_range function. To avoid this kind of code, I decided to change the order of the arguments.

There are only three places where these functions are used, so I don't think the review will be a significant effort.

key-moon · 2024-06-17T14:37:44Z

I realized that just implementing span_to_range does not solve the problem with offset_to_lsp_offset mentioned above, so I implemented span_to_lsp_offsets. Since these two functions use almost the same logic for their calculations, I created a common function, offset_to_position_and_next_offset, for the part where the calculation resumes.

Additionally, with the implementation of span_to_lsp_offsets, offset_to_lsp_offset is no longer used in the code. Although it is currently retained with #[allow(dead_code)], it can be removed if it is no longer needed.
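As a rough sketch of the single-pass idea discussed above (not the actual span_to_lsp_offsets implementation from this PR), the end offset can be computed by resuming the count from the start offset instead of walking the document from the beginning a second time. The names `ByteSpan`, `utf16_len_between`, and `span_to_utf16_offsets` are hypothetical, and the sketch assumes both span boundaries lie on character boundaries with start <= end <= text.len().

```rust
/// Hypothetical stand-in for a span expressed in UTF-8 byte offsets.
struct ByteSpan {
    start: usize,
    end: usize,
}

/// Count the UTF-16 code units in `text[from_byte..to_byte]`.
/// Panics if either bound is out of range or not on a character boundary.
fn utf16_len_between(text: &str, from_byte: usize, to_byte: usize) -> usize {
    text[from_byte..to_byte].chars().map(char::len_utf16).sum()
}

/// Convert both ends of a byte span to UTF-16 offsets in one pass:
/// the end offset resumes from the start offset, so the prefix
/// before `span.start` is only traversed once.
fn span_to_utf16_offsets(span: &ByteSpan, text: &str) -> (usize, usize) {
    let start = utf16_len_between(text, 0, span.start);
    let end = start + utf16_len_between(text, span.start, span.end);
    (start, end)
}
```

With this shape, converting a span costs one traversal up to `span.end` rather than two traversals from the start of the document.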

Druue · 2024-06-18T11:41:28Z

> Since the function signatures in offsets.rs follow a similar pattern, such as (offset, document), it makes sense to implement the span_to_range function in offsets.rs as span_to_range(span, document).

We generally prefer to include contextual args first (e.g. the schema / document) and spans after / at the end.

> The span_to_range function in code_actions.rs performs the exact same operation as the newly created span_to_range, except for the order of the arguments. Maintaining the original order of arguments would mean creating a wrapper span_to_range function to wrap the span_to_range function.

I genuinely don't know what you're talking about here. There is only one fn span_to_range, I don't see why a wrapper would be needed

I will say that this looks good though, thank you. I'm not going to block this on the above.

Screen.Recording.2024-06-18.at.13.36.09.mov

key-moon · 2024-06-18T11:50:10Z

Thank you for your review 🙇

I created the span_to_range function in offsets.rs and removed it from code_actions.rs. My point was that a wrapper would be needed if we wanted to retain both functions.

Once again, thank you for the review!

Change returned offset to UTF-16 offset

28cc7cd

key-moon requested a review from a team as a code owner April 6, 2024 12:07

key-moon requested review from jkomyno and removed request for a team April 6, 2024 12:07

key-moon added 2 commits April 6, 2024 22:28

fix compiling issue

8744045

format

2d8f45b

key-moon changed the title ~~fix(prisma-fmt): use UTF-16 offset in the response for the schema that contains multi-byte charactors~~ fix(prisma-fmt): use UTF-16 offset in the response for the schema that contains multi-byte characters Apr 6, 2024

key-moon added 4 commits April 8, 2024 22:59

Merge branch 'main' into fix-multibyte

ab1e3a6

fix merge commit

e689717

fix docs

d05a2f4

fix comment for the test

e043e36

allow unused code

095cade

key-moon added 6 commits April 9, 2024 02:30

stop using allow(dead_code)

3a8d6a8

format

eb10dac

remove lint from main.rs (see prisma#4818 )

eaa648a

Merge branch 'main' into fix-multibyte

829169b

remove unused use

a5e07ae

fix for multi-schema

4b96152

Merge branch 'main' into fix-multibyte

8401266

aqrln requested a review from Druue May 27, 2024 17:23

Merge branch 'main' into fix-multibyte

dd17bb2

aqrln reviewed Jun 16, 2024

aqrln mentioned this pull request Jun 16, 2024

fix(fmt/psl): multibyte characters #4920

Closed

key-moon added 2 commits June 16, 2024 23:37

added reverted fixes and tests

1b4d0f8

implement span_to_range

13295b4

Druue self-assigned this Jun 17, 2024

Druue added the "PR: Bug" (A PR That Fixes a bug) label Jun 17, 2024

Druue added this to the 5.16.0 milestone Jun 17, 2024

key-moon added 2 commits June 17, 2024 23:12

Merge branch 'main' into fix-multibyte

6a2e5d3

implement span_to_lsp_offsets

3932eaa

Druue approved these changes Jun 18, 2024

Druue merged commit 05b7e05 into prisma:main Jun 18, 2024
204 checks passed
