-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update wcwidth/CharWidth.ts #1707
Conversation
Tests were passing locally but looks like I didn't update the build.. Will fix. |
There's still one bug with |
I believe this fix is correct but would appreciate some review on it, especially since I had to update a few tests to make it consistently work. It looks like print had a bug wherein it would insert an extra space character when wrapping was involved... Latest commit has relevant changes. |
@dnfield Wow thx! You are kinda tackling two of the most annoying subtile core bugs in xterm.js at once.
I think we should not mix both issues into one PR, since each might have side effects on the rest of the code base (our test coverage isnt that good for some core parts, we are working on it). Imho your PR is a good starter for the unicode mess 👍 |
Nice, line wrapping seems to work in both macOS HS and Ubuntu 18.04 👍 these changes will end up closing off a bunch of issues 😍 |
src/Buffer.test.ts
Outdated
@@ -508,7 +508,7 @@ describe('Buffer', () => { | |||
// const s = terminal.buffer.contents(true).toArray()[0]; | |||
// assert.equal(input, s); | |||
for (let i = 10; i < input.length; ++i) { | |||
const bufferIndex = terminal.buffer.stringIndexToBufferIndex(0, i + 1); // TODO: remove +1 after fix | |||
const bufferIndex = terminal.buffer.stringIndexToBufferIndex(0, i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address this with a separate PR. The test case tested the wrong behavior by purpose to track it.
const s = terminal.buffer.iterator(true).next().content; | ||
assert.equal(input, s); | ||
assert.equal(getStringCellWidth(s), 12); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to #1685 you can temporarily fix your test case by placing a whitespace after every second emoji char.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test wasn't breaking, it was another existing one ("Lots of makes me ." - smiley just happend to occur after a break in the buffer, leaving an extra space)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yupp my bad, the line isnt wrapping (mixed cols and rows in my head).
src/InputHandler.ts
Outdated
@@ -407,6 +407,9 @@ export class InputHandler extends Disposable implements IInputHandler { | |||
// autowrap - DECAWM | |||
// automatically wraps to the beginning of the next line | |||
if (wraparoundMode) { | |||
if (buffer.x === cols - 1) { | |||
bufferRow.set(buffer.x, [curAttr, '', 0, undefined]); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said above, please move this to a separate PR.
[0x1f9c0, 0x1f9c2], // (nil) .. | ||
[0x1f9d0, 0x1f9ff], // (nil) .. | ||
[0x20000, 0x2fffd], // Cjk Unified Ideograph-20.. | ||
[0x30000, 0x3fffd] // (nil) .. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need some version selection thing on top of this and support both rule sets not to leave ppl with 3ys old systems behind.
Edit: This also will need a setting for the ambigious chars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure how to best make that work in this codebase. Am I right in thinking the concern is that if someone is running xterm.js with a font that hasn't been updated in three years it would cause problems? Or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the first place we have to make sure, the active spec is in line with the pty downwards (the shell/foreground app at the slave side + the pty itself), since they expect a certain string to fit into xy terminal cells. Doing otherwise weird things will happen to the cursor.
The rendering metrics are secondary, and with the canvas renderer we can force them fit to some degree even if the font is not monospace. The best case is the local app where both frontend and backend run on the same system. It gets really interesting for remote sessions with ssh, there is literally no way to tell which unicode version the backend runs. Therefore I think we need some option for that.
Imho the easiest way for now would be to put both definitions into some key-value storage and expose the key as the option.
Edit: Or even easier - both versions closured behind an access key. The option (key) would have to be managed by the InputHandler
, so if someone request the other version, the input handler could swap the implmentation. Something like that.
@dnfield Btw the |
@Tyriar We gonna need at least two other API options for this - one to request an unicode version (which loads to correct rule set) and another to set the current locale setting for the ambiguous chars. |
I've moved the requested commits to another branch (https://github.com/dnfield/xterm.js/tree/wrapping_unicode), but I may hold off to open a PR for it until this lands to avoid confusion (it would still show all the changes in this PR, minus the additional space I added to make the "Lots of yen" test pass). Let me know what you'd prefer there. |
@dnfield About the "other" issue - hmm maybe we can go with it fixing only the last cells for now, it surely will resolve a bunch of reported issues. Also I think it should be covered quit good by test cases (those you had to change in |
@dnfield Just lemme know if you get stuck or need some pointers. I kinda feel responsible for that stuff, have it way to long on the todo list. Many thx in advance! |
So should I un-revert those changes that (partially?) relate to the wrapping bug? |
Ah I'd still prefer them in a separate PR since they address a quite different issue. Its just those changes might be almost ready to go, I think you should not go for the whole problem behind #1685 as I kinda suggest above. |
Makes sense to me. As far as the toggle/configurability for unicode, I could use some pointers as to where that kind of configuration would be stored/used. It sounds like it wouldn't be a huge deal to implement, but I don't want to reinvent something you already have. |
My thinking is to keep it internal for now, the only place it is used is |
Added option - I think that's what you're asking for, but let me know. |
Yes almost - just one more thing. To be able to switch from legacy to v11 forth and back at runtime, we will need 2 function symbols. I'd say just remove the immediately invoked function construct and build the v11 version as the default Edit: Sorry - its easier to create both versions at startup, maybe |
@jerch if a separate pty is launched I suppose you could glean some information on the character width by printing
@jerch what would these options look like in the .d.ts file? My concern with your related PR (#1470) that ended up getting stale was that these APIs felt like they wouldn't ever be used and were there for completeness. What are the implications of shipping the latest version of unicode? |
Redid the exports as suggested. |
Yepp this should work with the pty itself and we should get the right version for the system that is running the pty. We could basically test this on the native pty without any program attached yet or with a small helper preattached (more reliable). It might need a small change in node-pty, not sure yet how to integrate. Imho we cannot do that without changes to node-pty since it attaches the requested program right away and this test string would end up as stdin input.
Well I hope that we dont have to expose the unicode version thing and can get away with the trick above. If we have to expose it this option would be a list with supported versions e.g. ['5.2', '11']. The locale setting is needed to get the ambiguous chars working correctly. It is not part of this PR (which is good to keep it small). This option would be basically some language code (maybe ISO 639-1) that we have to grab to resolve the ambiguous chars to the actual terminal width. Since nodejs has no default notion of locales (icu or locales are not bound to nodejs by default), we will have to deploy those maps ourselves, for different unicode versions. Thats the main reason I stopped working on #1470, it will need a good abstraction to get the unicode awkwardities and big data sets handled without busting xterm.js' code base (remember the graphemes PR #1478, the data part really scared me). Implications of shipping unicode v11: @Tyriar Will this width detection thing work with win-pty? How does it deal with different widths and the line ending? |
@dnfield Thx alot, LGTM 👍 |
@Tyriar About getting the default unicode settings, it might be sufficient to call $> icuinfo
<icuSystemParams type="icu4c">
<param name="copyright"> Copyright (C) 2013, International Business Machines Corporation and others. All Rights Reserved. </param>
<param name="product">icu4c</param>
<param name="product.full">International Components for Unicode for C/C++</param>
<param name="version">52.1</param>
<param name="version.unicode">6.3</param>
<param name="platform.number">4000</param>
<param name="platform.type">Linux</param>
<param name="locale.default">de_DE</param>
<param name="locale.default.bcp47">de-DE</param>
<param name="converter.default">UTF-8</param>
<param name="icudata.name">icudt52l</param>
<param name="icudata.path"></param>
<param name="cldr.version">24.0</param>
<param name="tz.version">2013g</param>
<param name="tz.default">Europe/Berlin</param>
<param name="cpu.bits">64</param>
<param name="cpu.big_endian">0</param>
<param name="os.wchar_width">4</param>
<param name="os.charset_family">0</param>
<param name="os.host">x86_64-pc-linux-gnu</param>
<param name="build.build">x86_64-pc-linux-gnu</param>
<param name="build.cc">gcc</param>
<param name="build.cxx">g++</param>
<param name="uconfig.internal_digitlist">1</param>
<param name="uconfig.have_parseallinput">1</param>
<param name="uconfig.format_fastpaths_49">1</param>
</icuSystemParams> It still would not be able track the unicode version of a certain program that has been linked against a non standard unicode version, but tbh - thats the problem of that program. @egmontkob Do we need to introduce an escape sequence, that gives programs the ability to report the unicode version/locale settings it emits to the terminal? This could also be used by ssh to change the emulator to the remote style. No more messed up emojis. Intriguing idea 😸 |
Nice idea! I don't think we'd implement something like this in VTE (currently we don't use an API that would provide us with different Unicode versions). I guess iTerm2 @gnachman might be more interested in this idea, it already has some config option to pick the Unicode version. I'm not sure what escape sequence would be the best, but I recommend using the Unicode major version directly (i.e. no mapping like 1 = Unicode 8, 2 = Unicode 9 etc., but rather values 8 and 9 for them) (not sure if there's any chance for a width change in minor Unicode versions though), I recommend to specify that the emulator is supposed to pick the closest it supports (that's the one that likely leads to the smallest mismatch, e.g. if an app asks for Unicode 12 but the emulator supports 10 and 11 then pick 11). Or / in addition to this maybe add an asynchronous escape sequence where the emulator responds with a list of versions it supports. Also, a special value like 0 to return to the emulator's default. On another note, take a look at the yellow paragraph added to Unicode 11.0's tr11, section 2 here. I'm not sure what it'll mean in practice. Looks to me that they release themselves from the responsibility of maintaining wcwidth and make it the realm of terminal emulators and apps, without a strict relation to the (various versions of the) Unicode database. Maybe something you should keep in mind while developing this extension, although I have no idea how. E.g. VTE has a config option (API) to toggle whether "ambiguous width" characters are narrow or wide, maybe this should fall to this category, and somehow handled under the same escape sequence? As far as I understand, this new note added to tr11 allows for further deviations along these lines. I guess we're all yet to see what it will actually mean. |
rant @egmontkob |
See https://iterm2.com/documentation-escape-codes.html, search for Unicode Version. I guess I need to add support for new emoji, too. I'd be glad to have a more standard way to define unicode version. I like the stack-based approach I used for this one because it recovers reasonably well from an ssh session terminating. |
@gnachman Thx, well you already have such an escape sequence, maybe we can adopt it to some degree. Can you elaborate on how this stack is supposed to work? Does this get reset automagically after a crashing ssh session or will the user have to call it explicitly?
@egmontkob |
The idea is you'd write a wrapper around ssh that does a push and pop. If ssh dies, the pop still happens. Since each stack entry has a unique identifier, a pop can remove any entries added while you were ssh'ed. So: ssh.sh:
|
@gnachman Thats a pretty nice feature, very useful. It requires some config work by the users but tbh, this is already a more advanced ssh usage so the audiance might already be aware of those problems. And the script/alias could easily be tweaked into a version, where the unicode settings looks like an argument to ssh. Nice. |
@jerch the trick above would be outside of xterm.js though as it depends on node access.
👍 in winpty:
If we could query and set that would be nice, this command doesn't work on macOS though. Wow this is a mess, would it make sense to just have a setting like this which defaults to 9?
I don't know if we'd want to use the proprietary escape sequences route, it would be nice if we just behaved correctly most of the time (which I think is wide emoji as this point). |
IMO we should definitely support setting the unicode version via a config (not sure if it can also work at runtime, but that would be best) - and then unicode detection can be handled by another module or addon (e.g. one that detects the iterm2 sequences), or even manually (for example, one could add a widget to the VSCode statusbar or add an action that allows to change the unicode version if the user detects weird behaviour). |
@mofux: Doing this at runtime should be no problem, what not will work is changing data that already landed in the terminal buffer (after it is in the buffer we lost track how it got in there, so no chance to revert this). @Tyriar: Yes the option could look like that. Im still unsure about the autodetection, it would need some "UnicodeGuesser" that acts on a lower level and yeah - outside of xterm.js. But once such an ominous unicode oracle got the version, xterm.js could set it correctly. The escape sequence offers much more power to people, that often face such a problem as shown by the stack implementation of @gnachman. Imho it will not hurt anyone else (they can just ignore it) and it is really easy to implement, once we settled something workable in #1709. |
Love this idea, we expose a setting and we want to eventually make it possible to support custom escape sequences 👍 How much would supporting the 2 unicode versions add to package weight? If it's not too bad then we should just add a setting and let consumers deal with it for now. This would be a much better place than where we're at now, VS Code for example could default to the latter version and expose a setting to tweak if you want. |
@Tyriar Currently we only would need an unicode aware wcwidth implementation, it basically adds the new table from @dnfield to the package if we can go with 2 versions. The tables could even be packed together, but this makes the code less readable. |
I suspect the main issue that people care about is the width of emoji, that's all I ever see people reporting. |
I have a setting for this but it causes more trouble than it's worth. It turns out ambiguous characters should usually be narrow. Some people in east asian locales might prefer them wide, but most apps (vim, etc.) are not aware of this and render improperly. The main source of problems is bash. It cares a lot about how wide your prompt is, and does horrible things if it guesses wrong.
You got that right. |
@gnachman Sorry for so many questions, your experience and insights will definitely help us to come up with something that will get the job done for most ppl out there. Thx alot. To not just chitchat and whine about the unicode mess (the latter goes on my account) I suggest to move the discussion over to #1709. Lets get the ball rolling. |
The current wcwidth implementation isn't up to date with the latest from Unicode. For many emojis, this means that they're treated as a single terminal cell when they really need to take up 2, resulting in lots of strange input/rendering problems for people who like moons, eggplans, and smiley faces.
This PR updates it. One thing I'm not 100% sure about is whether the
wideBMP
methods are still needed - it looks to me like they're meant to be a performance optimization, but I'm not 100% sure about what's going on there and didn't want to mess with it too much.This fixes #1059 as well as microsoft/vscode#51385 as well as #1502