Black does not format correctly strings containing emojis #395
Comments
This may be related to the fact that in python Javascript recommends converting the string into an array to retrieve the number of characters (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length#strings_with_length_not_equal_to_the_number_of_characters). I don't know if it's easier to return the adjusted offsets from the python server by encoding the string into utf-16 (also taking the correct endianness into account) or to make the vscode position API unicode aware (probably the first).
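To illustrate the length mismatch described above, here is a minimal sketch in Python. The helper name `utf16_len` is hypothetical and not part of the extension; it assumes that counting UTF-16 code units matches what JavaScript's `String.length` reports.

```python
def utf16_len(text: str) -> int:
    """Count UTF-16 code units in text (hypothetical helper, not from the extension)."""
    # An explicit endianness ("utf-16-le") avoids the byte order mark that the
    # plain "utf-16" codec prepends, so every code unit is exactly 2 bytes.
    return len(text.encode("utf-16-le")) // 2


line = "s = '😊'"
print(len(line))        # number of Unicode code points, as Python counts them
print(utf16_len(line))  # number of UTF-16 code units, as an editor using UTF-16 counts them
```

The two numbers differ by one for this line because the emoji lies outside the Basic Multilingual Plane and needs two UTF-16 code units.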
@DavideCanton Can you share the logs you captured? Also, is the file encoding UTF-8?
@DavideCanton I think this has to be handled in python itself. Looks like the general specification is to handle it based on the We calculate text edits here: https://github.com/microsoft/vscode-black-formatter/blob/main/bundled/tool/lsp_edit_utils.py Tests can be added here: https://github.com/microsoft/vscode-black-formatter/blob/main/src/test/python_tests/test_edit_utils.py
Hi @karthiknadig, thanks for the response! I can confirm that the encoding of the document was set to UTF-8 in all the examples. I didn't capture any logs, but I can definitely do so if you give me a hint about what you need.
In my previous comment I tried to summarize the problem (you identified the core issue correctly). Basically, we need to use UTF-16 to calculate the range offsets but return UTF-8 strings. /cc @dbaeumer is that accurate?
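A rough sketch of what "use UTF-16 to calculate the range offsets" could mean on the python side is shown here. The function `to_utf16_column` is a hypothetical name for illustration and is not taken from `lsp_edit_utils.py`; it assumes the server has a column counted in code points and the client expects UTF-16 code units.

```python
def to_utf16_column(line: str, codepoint_column: int) -> int:
    """Convert a column counted in code points to one counted in UTF-16 code units.

    Hypothetical helper for illustration only.
    """
    # Everything before the column is re-measured in UTF-16 code units;
    # characters outside the Basic Multilingual Plane count twice.
    prefix = line[:codepoint_column]
    return len(prefix.encode("utf-16-le")) // 2


line = "s = '😊'"
# Column of the closing quote: index 6 in code points.
print(to_utf16_column(line, 6))
```

Columns before the emoji are unchanged by the conversion; only positions after it shift.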
What needs to be |
/cc @charliermarsh This is something to note for
@crwilcox yeah, it seems kinda the same
I was about to file a similar bug. The formatter introduces invalid strings when formatting lines with emojis. Here is a screencast of it in action in vscode: My initial thought was that it might be related to psf/black#1197
Have you tried running black on the source code via cli? It shouldn't happen
Just tried it and can confirm the issue does not exist when calling from the cli. Given this, as a workaround for VS Code until this bug is fixed we could use a keyboard shortcut to run black from the cli. (via this SO) Open Preferences: Open Keyboard Shortcuts (JSON) and paste in this:
I've experienced this emoji problem multiple times too, would really be helpful to get a fix for this
I have a potential fix for this, please give this a try: https://github.com/microsoft/vscode-black-formatter/actions/runs/7426466173/artifacts/1150857594
@karthiknadig it seems to be working fine now. Thanks!
Verification steps:
s = '😊'
So when will this fix be released to the extension marketplace?
@JokerQyou This is in pre-release version.
It seems that this extension does not correctly handle strings containing emojis when formatting from vscode. Formatting the file with the black command via cli works correctly.
Examples:
before
after
or
after
Investigating a bit, it seems that the positions sent back from the language server to the client count the emojis as a single character while the visual studio code editor doesn't, as for the first example the line edits relative to the line 0 are 0:4-0:5, new_text='"' and 0:6-0:7, new_text='"', but the second one seems to replace something other than the second '. Tampering with the server code and setting 0:7-0:8 to the second edit fixes the problem.
Tested on Windows 10.
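The effect of the two ranges can be reproduced with a small simulation. This is only a sketch: `apply_edits_utf16` is a hypothetical stand-in for a client that interprets columns as UTF-16 code units, and the input line `s = '😊'` is assumed from the verification steps in this thread.

```python
def apply_edits_utf16(line, edits):
    """Apply (start, end, new_text) edits whose columns are UTF-16 code units.

    Hypothetical stand-in for the editor side, for illustration only.
    """
    units = line.encode("utf-16-le")  # 2 bytes per code unit, no BOM
    # Apply from right to left so earlier offsets stay valid.
    for start, end, new_text in sorted(edits, reverse=True):
        units = units[: start * 2] + new_text.encode("utf-16-le") + units[end * 2 :]
    # errors="replace" keeps the demo from raising if an edit splits a surrogate pair.
    return units.decode("utf-16-le", errors="replace")


line = "s = '😊'"
reported = [(4, 5, '"'), (6, 7, '"')]  # ranges from the report (code point columns)
adjusted = [(4, 5, '"'), (7, 8, '"')]  # second range shifted to UTF-16 columns

print(apply_edits_utf16(line, reported))
print(apply_edits_utf16(line, adjusted))
```

With the reported ranges, the second edit overwrites the second half of the emoji's surrogate pair and leaves the closing ' in place; with the adjusted range, both quotes are replaced and the emoji is preserved.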