the vast majority of our supra-BMP output on windows/msys is fubar #2117

Closed
dankamongmen opened this issue Aug 29, 2021 · 43 comments

@dankamongmen
Owner

I was kinda hoping this would resolve itself without my intervention, but it appears not. Run some notcurses code. Note that most UTF-8 doesn't show up, or is otherwise fucked up. I refuse to believe that Microsoft Windows in 2021 cannot display most unicode. I mean, we don't even seem to have quadrants, and yet I think I've seen quadrants generated by other programs.

Figure out what we're doing wrong, and rectify it.

@dankamongmen dankamongmen added the bug (Something isn't working) and mswindows (microsoft windows) labels Aug 29, 2021
@dankamongmen dankamongmen added this to the 3.0.0 milestone Aug 29, 2021
@dankamongmen dankamongmen self-assigned this Aug 29, 2021
@dankamongmen
Owner Author

@j4james, how would you describe Unicode capabilities in Windows Terminal?

@j4james

j4james commented Sep 1, 2021

Here's a screenshot from the notcurses uniblock demo running under WSL. Most of the BMP blocks should be showing content like this, so if most UTF-8 is not working for you, then something is seriously wrong.

image

Once you get to the higher planes things become a little more messy, mostly in terms of the width calculations being wrong, so you'll have text "leaking" out of the area you were expecting it to fit and screwing up the rest of the page. I think this is because Windows stores Unicode in 16-bit "wide characters", and code points from the supplementary planes require surrogate pairs, so they don't fit in a single cell. Don't quote me on that, though. I just know that it's a known issue that they're still working on.
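To make the surrogate-pair point concrete, here is a minimal sketch of how UTF-16 splits a code point above the BMP into two 16-bit units. The function names are hypothetical, and this only illustrates the encoding itself; it says nothing about how Windows Terminal lays out its text buffer.

```c
#include <stdint.h>

/* Number of 16-bit code units UTF-16 needs for cp: 1 inside the BMP,
 * 2 (a surrogate pair) above it, 0 if cp is not a valid scalar value. */
int utf16_units(uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    return cp < 0x10000 ? 1 : 2;
}

/* High (leading) surrogate. Only meaningful when utf16_units(cp) == 2. */
uint16_t high_surrogate(uint32_t cp) {
    return (uint16_t)(0xD800 | ((cp - 0x10000) >> 10));
}

/* Low (trailing) surrogate. Only meaningful when utf16_units(cp) == 2. */
uint16_t low_surrogate(uint32_t cp) {
    return (uint16_t)(0xDC00 | ((cp - 0x10000) & 0x3FF));
}
```

U+2598 (a quadrant) fits in one unit, while U+1FBF0 (a segmented digit) becomes the pair D83E DFF0. A buffer that allots one cell per 16-bit unit would then plausibly give a narrow astral character two cells.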

It's also worth mentioning that there are additional limitations in the old conhost console. It uses a GDI renderer that doesn't do font fallback so if your selected font doesn't support a particular code point, you're just going to get a � (or something of the sort). I think the GDI renderer also ignored anything outside the BMP, so none of the supplementary planes are likely to work. Both those issues have been worked on recently, but unless you're building from source, you won't have those fixes.

Then if you're testing on Mintty (or any other third-party terminal), there could be a number of other issues coming into play, so I'd suggest you stick to Windows Terminal to start with, until you've got something working reasonably.

@dankamongmen
Owner Author

if you could run notcurses-info.exe for me, and let me know if it's mostly empty on your side, i'd appreciate it. that looks great, holy crap!

@dankamongmen
Owner Author

i'm testing on both. right now windows terminal is getting close to usable. mintty i would not say so, due to input problems detailed in #2116. in neither do i get much in the way of good notcurses-info.exe output. here's a pretty ideal output (kitty):

2021-09-01-200830_1251x1417_scrot

@j4james

j4james commented Sep 2, 2021

OK, there are quite a few gaps on the notcurses-info page. So if that's what's worrying you, that's to be expected.

image

@j4james

j4james commented Sep 2, 2021

Just looking at a couple of the missing code points at random, they seem to be outside the BMP, but are meant to be narrow width, so this could be the surrogate pair issue I mentioned above. Otherwise it may just be that my system doesn't have an appropriate font for all of those code points, although then I would expect a bunch of � replacement characters instead. Whatever it is, I don't think it's your problem.

@j4james

j4james commented Sep 2, 2021

Now I'm beginning to have some doubts. I've just installed a font that is supposed to have the segmented digit characters. That didn't seem to help with font fallback, but if I select it as my primary font, I can display those characters in the shell, like this:

image

When I run notcurses-info, though, I'm still seeing those code points missing. I wouldn't have expected it to work perfectly, because I know the width calculations are going to be wrong, but I am surprised it's not showing anything at all.

@dankamongmen
Owner Author

oh hey you seem to be seeing a good bit more than i am.

2021-09-01-214801_2019x1308_scrot

@dankamongmen
Owner Author

2021-09-01-215020_2019x1308_scrot

@dankamongmen
Owner Author

you have uniblock running in a Windows Terminal? it exits with failure immediately for me =[

@dankamongmen
Owner Author

ohhhhhhhhhhhh you have WSL there not Windows Terminal. that's a whole different thing, no? that's using UNIX interfaces.

@j4james

j4james commented Sep 2, 2021

Yeah. I haven't tried getting the native Windows build running. I was just trying to show you what Unicode you should be able to see in Windows Terminal.

Also I can redirect the notcurses-info output to a file, and then type that file from a cmd.exe shell, and as long as I've set the UTF8 codepage first, I still get the same result (more or less). So a native Windows shell should definitely be doing a better job than what you're seeing.
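Setting the UTF-8 code page from inside a program, rather than with chcp 65001 in the shell, might look like the sketch below. SetConsoleOutputCP, GetConsoleOutputCP and CP_UTF8 are real Win32 names; the wrapper function is hypothetical, and this is not necessarily how notcurses does it.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 if the console output code page is UTF-8 after the call,
 * 0 otherwise. On non-Windows platforms there is no code page to set,
 * so this simply reports success. */
int use_utf8_console_output(void) {
#ifdef _WIN32
    if (!SetConsoleOutputCP(CP_UTF8)) {
        return 0;   /* e.g. no console attached */
    }
    return GetConsoleOutputCP() == CP_UTF8;
#else
    return 1;
#endif
}
```

This only affects how the console interprets output bytes. It does not change the C runtime's locale, which is a separate setting.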

@dankamongmen
Owner Author

i am setting the code page in Windows Terminal for sure i'm pretty sure

@j4james

j4james commented Sep 2, 2021

Actually, that's probably an easy way for you to test the Windows build. Just redirect to a file and compare the output to what you're seeing from a Linux build. I would think they would be more or less the same if things were working correctly. Then you don't have to worry about whether or not the terminal is rendering it correctly.

@dankamongmen
Owner Author

i wouldn't be shocked if this is due to 16-bit wchar_t...but i primarily deal with utf-8. it's possible though.

@dankamongmen dankamongmen changed the title from "the vast majority of our supraASCII output on windows/msys is fubar" to "the vast majority of our supra-BMP output on windows/msys is fubar" Sep 6, 2021
@dankamongmen
Owner Author

yeah, i'm becoming pretty convinced that this has to do with being beyond the BMP and thus outside the range of wchar_t without surrogate pairs.

@dankamongmen
Owner Author

_setmode(_O_U8TEXT) might be relevant, unsure.

"Set the console code page to cp65001 (UTF-8) doesn’t improve Unicode support, it is the opposite: non-ASCII are not rendered correctly and type non-ASCII characters (e.g. using the keyboard) doesn’t work correctly, especially using raster fonts."

hrmmm.

@dankamongmen
Owner Author

"Set the console code page to cp65001 (UTF-8) doesn’t improve Unicode support, it is the opposite: non-ASCII are not rendered correctly and type non-ASCII characters (e.g. using the keyboard) doesn’t work correctly, especially using raster fonts."

removing this definitely did not help

@j4james

j4james commented Sep 6, 2021

FYI, I figured out why there were so many gaps in the notcurses-info output from my WSL build. It seems that on my system those characters aren't supported by wcwidth, so it returns -1. I wrote a little test app like this:

wchar_t wc = 0x1fbf0;
int cols = wcwidth(wc);
printf("%d\n", cols);

And the output I get is -1.

I don't unix, so I don't know whether that means I need to upgrade the compiler or the libraries, or there's something else I'm doing wrong, but at least it suggests it's not your problem, and it likely isn't a terminal problem either.
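For anyone repeating that experiment, a slightly fuller sketch follows. It assumes a POSIX system with a 32-bit wchar_t (Linux or WSL; wcwidth is not part of the Windows C runtime), and the helper names are hypothetical. One thing worth ruling out: wcwidth consults the current locale, so a program that never calls setlocale may get -1 for non-ASCII code points regardless of how recent the library's Unicode tables are.

```c
#define _XOPEN_SOURCE 700   /* asks glibc to declare wcwidth() */
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Column width the C library reports for code point cp in the current
 * locale, or -1 if it does not consider cp printable. Assumes a 32-bit
 * wchar_t; with a 16-bit wchar_t the cast would truncate cp. */
int width_of(unsigned long cp) {
    return wcwidth((wchar_t)cp);
}

/* Prints the width in the startup locale, then again after adopting the
 * locale named by the environment (LANG, LC_ALL, ...). */
void report_width(unsigned long cp) {
    printf("U+%04lX in startup locale: %d\n", cp, width_of(cp));
    if (setlocale(LC_ALL, "") != NULL) {
        printf("U+%04lX after setlocale: %d\n", cp, width_of(cp));
    }
}
```

Calling report_width(0x1fbf0) from main shows both values. If the second is still -1 under a UTF-8 locale, the library's width tables likely predate the code point.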

@dankamongmen
Owner Author

i notice that Braille works just fine for us in notcurses-demo k, but it doesn't show up in notcurses-info. very interesting.

@dankamongmen
Owner Author

alright, coming back around to take a look at this. i think the thing to do is to write a small testing tool that allows us to explore these characters in windows. they're definitely usable, we're just doing something wrong. i'll look into this this weekend.

@dankamongmen
Owner Author

once we get this resolved, we're going to pretty much be right on windows, so let's put some effort in here soon.

@j4james

j4james commented Nov 13, 2021

Before you get too stressed about this, note that the narrow width characters in the astral planes are expected not to work correctly in Windows Terminal. They're going to take up twice as many cells as expected, which completely screws up the notcurses-info output. This is what it looks like at the moment when running from WSL:

image

It actually looked better before you fixed the wcwidth problem and the characters were just dropped, but I'm definitely not suggesting you revert that. This is something that Windows Terminal needs to fix.

@dankamongmen
Owner Author

"It actually looked better before you fixed the wcwidth problem and the characters were just dropped, but I'm definitely not suggesting you revert that. This is something that Windows Terminal needs to fix."

do you know of any upstream bug i can track and/or comment on?

@dankamongmen
Owner Author

ok, it's all clear now:

RAST 00000020 [ ] to 45/0 cols: 1 40ffffff40191970
RAST 000000e2 [▒] to 45/1 cols: 1 40ffffff40191b70
RAST 00000096 [▒] to 45/2 cols: 1 40ffffff40191e71
RAST 00000098 [▒] to 45/3 cols: 1 40ffffff40192071
RAST 000000e2 [▒] to 45/4 cols: 1 40ffffff40192372
RAST 00000096 [▒] to 45/5 cols: 1 40ffffff40192572
RAST 0000009d [▒] to 45/6 cols: 1 40ffffff40192872
RAST 000000e2 [▒] to 45/7 cols: 1 40ffffff40192a73
RAST 00000096 [▒] to 45/8 cols: 1 40ffffff40192c73
RAST 00000080 [▒] to 45/9 cols: 1 40ffffff40192f73
RAST 000000e2 [▒] to 45/10 cols: 1 40ffffff40193174
RAST 00000096 [▒] to 45/11 cols: 1 40ffffff40193474
RAST 00000096 [▒] to 45/12 cols: 1 40ffffff40193675
RAST 000000e2 [▒] to 45/13 cols: 1 40ffffff40193875
RAST 00000096 [▒] to 45/14 cols: 1 40ffffff40193b75
RAST 0000008c [▒] to 45/15 cols: 1 40ffffff40193d76
RAST 000000e2 [▒] to 45/16 cols: 1 40ffffff40194076
RAST 00000096 [▒] to 45/17 cols: 1 40ffffff40194276
RAST 0000009e [▒] to 45/18 cols: 1 40ffffff40194577

this is from the quadrants output in unicodedumper() from notcurses-info. look at e.g. 45/1--45/3. we're emitting 0xe2, 0x96, and 0x98 as three columns. that's the UTF8 for U+2598 QUADRANT UPPER LEFT, which is what we want to see. but we ought be seeing all three bytes in a single cell.
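The arithmetic behind that claim can be checked with a small hand-rolled decoder. This is a sketch with hypothetical function names, not the notcurses code path, and it deliberately skips some validation (overlong three- and four-byte forms, encoded surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte b, or 0
 * if b cannot start a sequence (continuation byte or invalid lead). */
size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Decode the first sequence in s, of which n bytes are available.
 * Returns the code point, or UINT32_MAX if the input is truncated or a
 * continuation byte is malformed. */
uint32_t utf8_decode(const unsigned char *s, size_t n) {
    size_t len = n ? utf8_seq_len(s[0]) : 0;
    if (len == 0 || len > n) return UINT32_MAX;
    if (len == 1) return s[0];
    uint32_t cp = s[0] & (0xFFu >> (len + 1));   /* payload bits of lead */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return UINT32_MAX;
        cp = (cp << 6) | (s[i] & 0x3Fu);         /* 6 bits per trailer */
    }
    return cp;
}
```

utf8_seq_len(0xe2) is 3, and decoding e2 96 98 yields 0x2598, so those three RAST lines describe one character that should have occupied one cell.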

also, right above this, we have:

▘▝▀▖▌▞▛▗▚▐▜▄▙▟█⎧ 49

output directly to stderr. so yeah, it's all a matter of our UTF8 being broken up into cells. find that, and we've got this resolved.

@dankamongmen
Owner Author

it looks like mbrtowc() is always returning 1?

@dankamongmen
Owner Author

i think we have a ridiculously low MB_CUR_MAX when we compile...

@dankamongmen
Owner Author

yep

@dankamongmen
Owner Author

we're getting somewhere!

image

@j4james

j4james commented Nov 17, 2021

I can't comment on mbrtowc or anything that you may be doing right or wrong in notcurses. All I'm saying is that no matter how perfect your code is, the output is going to look broken in Windows Terminal (and the conhost console for that matter). Don't assume that broken output is your fault.

If you want an issue to track, the root of the problem is probably microsoft/terminal#8000 - essentially the text buffer implementation needs to be rewritten. But if you want to comment on this specific manifestation, something like microsoft/terminal#11694 might be more appropriate.

@dankamongmen
Owner Author

aye, but we've just made massive progress! we now have quadrants!

@dankamongmen
Owner Author

so it's not that mbrtowc() always returns 1, it's that you have to set the locale up properly for Windows. in UNIX land, we usually want a setlocale(LC_ALL, "") to pull from LANG. not so much on windows. furthermore, just setting the encoding to UTF8 doesn't get us all the way there; we appear to require a setlocale(LC_ALL, ".UTF8"). but at that point, we start getting real results from even lowly old mbrtowc(). yay! tally ho!
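A minimal sketch of that locale setup and its effect on mbrtowc() follows. The ".UTF8" name comes from the comment above; the non-Windows fallback names and both function names are assumptions added for illustration, and whether ".UTF8" is accepted depends on the C runtime in use.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Try to select a UTF-8 locale. Returns 1 if one was accepted and
 * MB_CUR_MAX is large enough for multibyte UTF-8, 0 otherwise. */
int select_utf8_locale(void) {
#ifdef _WIN32
    static const char *const names[] = { ".UTF8" };
#else
    /* Assumed fallbacks for testing the same code elsewhere. */
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "" };
#endif
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (setlocale(LC_ALL, names[i]) != NULL && MB_CUR_MAX >= 3) {
            return 1;
        }
    }
    return 0;
}

/* Bytes mbrtowc() consumes for the first character of s in the current
 * locale, or 0 for an empty string or an invalid/incomplete sequence. */
size_t first_char_bytes(const char *s) {
    mbstate_t st;
    wchar_t wc;
    memset(&st, 0, sizeof st);
    size_t r = mbrtowc(&wc, s, strlen(s), &st);
    return (r == (size_t)-1 || r == (size_t)-2) ? 0 : r;
}
```

Once select_utf8_locale() succeeds, first_char_bytes("\xe2\x96\x98") should report 3 rather than 1, which is the difference between one cell per character and one cell per byte.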

@dankamongmen
Owner Author

highlighting covers the expected area, perhaps for the first time!

image

@dankamongmen
Owner Author

looking good in actual Windows Terminal as opposed to MinTTY, too!

image

@dankamongmen
Owner Author

the [luigi] demo now looks PERFECT, with none of the weird bugs we were seeing before, yay ya yay

image

@dankamongmen
Owner Author

[intro] now looks PERFECT

image

@dankamongmen
Owner Author

https://www.youtube.com/watch?v=lO6mNbJGWHI oh yeaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaah

@dankamongmen
Owner Author

ahhh this is a glorious day indeed, w00t w00t w00t, all good things come to he who hacks

@dankamongmen
Owner Author

as hoped, this has also fixed various demos which were failing, including [uniblock] and [normal]

@dankamongmen
Owner Author

we actually have multiple demos working one-after-another now, with a working braille FPS plot, tremendous improvement.

image

@dankamongmen
Owner Author

alright, whatever problems still exist, we no longer have the vast majority of our supra-BMP output on windows/msys fubar. we'll create focused issues for remaining problems, but this showstopper is resolved.

@DHowett

DHowett commented Jan 25, 2023

microsoft/terminal#14640 and microsoft/terminal#13626 probably put a significant dent in this issue; they were just released as part of v1.17.1023.

The infrastructure they lay will be available in newer[1] versions of ConPTY and therefore other terminal emulators on Windows at some future point.

[1] we have plans about how we can update ConPTY outside of the Windows update cadence :)

@dankamongmen
Owner Author

awesome! are you a fellow Friend of Redmond? feel free to hit me up at niblack on teams =]

3 participants