Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In Windows Terminal, prompt.spacing can "eat" lines that contain Unicode surrogate pairs. #574

Closed
bw1faeh0 opened this issue Mar 12, 2024 · 7 comments
Labels
external The issue is due to external causes outside of Clink

Comments

@bw1faeh0
Copy link

Hi,

I'm not sure if it is a clink or a flex-prompt issue, but take a look at this:

grafik

Or at this:

grafik

In contrast:

grafik

grafik

This does not happen in all directories. In D:\Temp\ for instance (where I can find a lot more files (34 files, 39 dirs)) I can not reproduce this issue.

@chrisant996
Copy link
Owner

chrisant996 commented Mar 12, 2024

Have you tried it in CMD without Clink? If it's indeed that the prompt prints before output is complete, then I would expect it to behave identically in plain CMD. Clink shows a prompt when CMD tries to show a prompt -- Clink literally intercepts the attempt to print the prompt.

UPDATE: It was not printing the prompt before output was complete. I updated the title accordingly.

Do you have aliases for dirx and eza? If so, what are the aliases defined as? Or batch scripts for them? If so what are the contents of the batch scripts?

The behavior in the screen shots would be expected if dirx or eza were started but CMD regained control before the spawned processes completed. There are a variety of ways that can be done, the easiest being use of start.

(This is likely external to both Clink and flexprompt.)

What terminal program are you using? (And what version of it?)

@chrisant996
Copy link
Owner

Also, can you narrow down specific repro steps? Maybe --git or --icons or some other flag(s) might be relevant.

@bw1faeh0
Copy link
Author

Have you tried it in CMD without Clink? If it's indeed that the prompt prints before output is complete, then I would expect it to behave identically in plain CMD. Clink shows a prompt when CMD tries to show a prompt -- Clink literally intercepts the attempt to print the prompt.

Now, I have tried in plain cmd:
grafik

grafik

So, I would say it is related to clink and/or flex-prompt.

Do you have aliases for dirx and eza? If so, what are the aliases defined as? Or batch scripts for them? If so what are the contents of the batch scripts?

Yes, I have. for both lines you can see in the screenshots I normally use aliases (doskey). The behavior of output is the same, that is why I used the long command line. So you can see all command line arguments.

-- Alias funcion
local function alias(short, long)
    os.execute('doskey ' .. short .. '=' .. long)
end

alias('ll',      'eza -lgah --group-directories-first --git --icons --time-style long-iso --no-permissions --classify=always')
alias('dd',      'dirx -c -og -i -Q-s-v --hyperlinks -h -Z --classify --pad-icons=2 -f "| F | Sm [(C?)] | D |" $1')

The behavior in the screen shots would be expected if dirx or eza were started but CMD regained control before the spawned processes completed. There are a variety of ways that can be done, the easiest being use of start.

start or something similar is not used.

(This is likely external to both Clink and flexprompt.)

What terminal program are you using? (And what version of it?)

Windows-Terminal Preview
Version: 1.20.10572.0

@bw1faeh0
Copy link
Author

Also, can you narrow down specific repro steps? Maybe --git or --icons or some other flag(s) might be relevant.

It looks like if I remove the -s parameter to suppress the summary the output is ok:
grafik

If I disable the icons by removing -i, -Q-s-v is working fine again ok:
grafik

In the case of eza the problem disappears if i remove the --icons parameter, too:
grafik

Without clink enabled, the use of icons make no difference:
grafik

So maybe it is related to the display of the icons?

@chrisant996
Copy link
Owner

chrisant996 commented Mar 14, 2024

@bw1faeh0 Good news -- I can reproduce the behavior. But only in Windows Terminal (all versions; 1.19.10573.0, Preview 1.20.10572.0, and Canary 1.20.240124001-llm). Not in legacy conhost, not in ConEmu, not in ConsoleZ.

And thank you for including the full sample output, so that I could construct a directory containing files that produce the exact same output; which I then redirected and embedded in a separate C++ program.

The following C++ program reproduces the problem, without the need for any special directories or files being present. Interestingly, adding other lines makes it not reproduce the problem. So there's something about these lines.

I haven't determined yet whether this is an issue in Windows Terminal, or an issue in Clink, or an issue in clink-flex-prompt. I'm working on narrowing that down further.

#include <stdio.h>
#include <windows.h>

static const WCHAR* const c_strings[]
{
L"-a--- 17k 13 Mar 14:38 \U000f15c0 DevCE_Conn_Configuration.xml\r\n",
L"-a--- 11k 13 Mar 14:44 \U000f15c0 DevCE_Conn_Configuration_fm5.xml\r\n",
L"-a--- 10k 13 Mar 14:43 \U000f15c0 DevCE_Conn_Configuration_fm5_future.xml\r\n",
L"-a--- 17k 13 Mar 14:39 \U000f15c0 DevCE_Conn_Engineering.xml\r\n",
L"-a--- 11k 13 Mar 14:44 \U000f15c0 DevCE_Conn_Engineering_fm5.xml\r\n",
};

int _cdecl main(int argc, char const** argv)
{
    argc--, argv++;

    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written;

    for (const WCHAR* s : c_strings)
    {
        WriteConsoleW(h, s, DWORD(wcslen(s)), &written, nullptr);
    }

    return 0;
}

@chrisant996 chrisant996 added the investigation needed Deeper investigation is needed label Mar 14, 2024
@chrisant996
Copy link
Owner

chrisant996 commented Mar 14, 2024

I tracked it down.

It's due to a change in behavior in Windows Terminal:

The problem occurs when prompt.spacing is set to compact or sparse (which is a variant of compact) AND the last line(s) of output from a program contain one or more "surrogate pairs", i.e. a Unicode codepoint that doesn't fit into 16 bits and thus gets represented by a pair of 16-bit characters (in UTF16, which is the Unicode encoding used natively in Windows).

The problem will only appear in directories where the final line(s) of output from a program (dirx, eza, or any other program) contains one or more surrogate pairs. Some icons require a surrogate pair, and some icons don't. The icon for .xml files uses a surrogate pair. The icon for .cpp files does not use a surrogate pair. And so on.

This is the function that handles compact and sparse:

static void adjust_prompt_spacing()
{
assert(g_printer);
static_assert(MAX == prompt_spacing::MAX, "ambiguous symbol");
const int32 _spacing = s_prompt_spacing.get();
const prompt_spacing spacing = (_spacing < normal || _spacing >= MAX) ? normal : prompt_spacing(_spacing);
if (spacing != normal)
{
// Consume blank lines before the prompt. The prompt.spacing concept
// was inspired by Powerlevel10k, which doesn't consume blank lines,
// but CMD causes blank lines more often than zsh does. So to achieve
// a similar effect it's necessary to actively consume blank lines.
CONSOLE_SCREEN_BUFFER_INFO csbi;
HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
if (GetConsoleScreenBufferInfo(h, &csbi) && csbi.dwCursorPosition.X == 0)
{
str<> text;
SHORT y = csbi.dwCursorPosition.Y;
while (y > 0)
{
if (!g_printer->get_line_text(y - 1, text))
break;
if (!text.empty())
break;

To get the output line from the terminal's screen buffer, the get_line_text() call ends up here:

if (!ReadConsoleOutputCharacterW(m_handle, m_chars, csbi.dwSize.X, coord, &len))

The Windows Terminal implementation for that console API behaves strangely when a line contains a surrogate pair. It returns "success", with len == 0. My guess is that the WT API correctly recognizes that the surrogate pair means it can't return the full content of the line (the actual content contains more WCHARs than the width of the screen), and gives up but for some reason still returns success.

In contrast, the legacy conhost implementation for that console API returns "success", with len == 0x78 (since 0x78 is my screen width) and with the following content. The legacy implementation reports that the length is the width of the screen, and returns as much of the line as can fit into a buffer whose size is the width of the screen. Since each surrogate pair takes up TWO characters instead of ONE, the returned result isn't fully precisely accurate, but it's how the API worked until WT, and it's typically a good approximation -- usually what gets lost is just spaces at the end of the screen line.

0x000001C154D44038  2d 00 61 00 2d 00 2d 00 2d 00 20 00 31 00 31 00 6b 00 20 00 31 00 33 00 20 00 4d 00  -.a.-.-.-. .1.1.k. .1.3. .M.
0x000001C154D44054  61 00 72 00 20 00 31 00 34 00 3a 00 34 00 34 00 20 00 81 db c0 dd 20 00 44 00 65 00  a.r. .1.4.:.4.4. ..ÛÀÝ .D.e.
0x000001C154D44070  76 00 43 00 45 00 5f 00 43 00 6f 00 6e 00 6e 00 5f 00 45 00 6e 00 67 00 69 00 6e 00  v.C.E._.C.o.n.n._.E.n.g.i.n.
0x000001C154D4408C  65 00 65 00 72 00 69 00 6e 00 67 00 5f 00 66 00 6d 00 35 00 2e 00 78 00 6d 00 6c 00  e.e.r.i.n.g._.f.m.5...x.m.l.
0x000001C154D440A8  20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00   . . . . . . . . . . . . . .

The root cause is a difference in behavior that was introduced in Windows Terminal. The API no longer behaves the same as it did before Windows Terminal, so I would say it's a regression. To me, it seems worth fixing in Windows Terminal. I'll open an issue in the WT repo in a few days, with a tiny program that reproduces the problem without Clink or eza or dirx being involved.

I'm marking this "external" because the root cause is in Windows Terminal.

But, I'll also hack Clink to work around the problem, by considering any line of 0 length to be a line that has content which Windows Terminal fails to return, so that Clink stops walking upwards past a seemingly blank line. Anything else in Clink or scripts that tries to read output from the terminal will still fail to read any output from those lines, because Windows Terminal returns a 0 length string (e.g. the complete-numbers command won't see any numbers present in lines that contain one or more surrogate pairs).

@chrisant996 chrisant996 added external The issue is due to external causes outside of Clink and removed investigation needed Deeper investigation is needed labels Mar 14, 2024
@chrisant996 chrisant996 changed the title prompt returns before command output is completed In Windows Terminal, prompt.spacing can "eat" lines that contain Unicode surrogate pairs. Mar 14, 2024
@chrisant996
Copy link
Owner

@bw1faeh0 in the meantime, until the workaround is published, you can run clink set prompt.spacing normal to avoid encountering the problem.

chrisant996 added a commit that referenced this issue Mar 14, 2024
See issue #574.  An API in Windows Terminal behaves differently than in
legacy conhost.

1. It fails, but should succeed.
2. It fails, but returns that it succeeded.
3. It claims to have succeeded, but returns that the length was 0.

This detects the failure based on the 0 length and treats it as having
failed, despite the API claiming to have succeeded.

So, at least Clink and Lua scripts can be aware that it failed, instead
of thinking it succeeded.

Since Clink's Lua API strips trailing spaces, callers have no way to
deduce failure, since the returned string is empty, which is consistent
with a line that was full of spaces.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external The issue is due to external causes outside of Clink
Projects
None yet
Development

No branches or pull requests

2 participants