-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure WM_CHAR
functions properly when _UNICODE
is defined.
#260
Ensure WM_CHAR
functions properly when _UNICODE
is defined.
#260
Conversation
this case, we'll convert them to a UTF8 byte sequence and enqueue the characters individually.
Hmmm... I think I see what you mean (took me a little while; I was slow on the uptake here.) Hit, say, € (Euro sign), and Not entirely sure how to handle this. One possibility would be to have some logic along the following lines: Let's say you call So we should remove the U+20AC from the front of the queue and replace it with 82 and AC, and return E2. The next two calls to One small catch : those next two calls may say, "huh, 82 (or AC) is non-ASCII; let's UTF8-ize them." Shouldn't be too tough to keep track of how many decoded bytes are at the start of the queue, though. That's the product of a few minute's thinking after finally realizing what you were trying to fix here, though. I reserve the right to disavow/revise the above upon further thinking (or having my errors pointed out to me.) |
CI fails with
So you may want to add somewhere: #ifndef _UNICODE
#define _UNICODE 0
#endif or check for its definition (whatever is done otherwise) |
Before going too deeply into fixing this patch, I'd warn that I don't expect to use it. It's not quite finished yet, but this patch is (intended to) fix the problem on all platforms, not just WinCon or WinGUI. The idea is that the various platforms continue to queue up the "actual" characters, and It works correctly at present in wide-character builds; I need to shuffle things slightly for 8-bit character builds. Shouldn't be difficult; I've just been short on time lately... it'll also mean that |
Ah, yeah, I sorta forgot about WinCon. Given that my patch is incomplete and there is another upstream patch in progress, I think we can close this ticket. |
I am facing something similar. I am using Ubuntu to cross-compile for windows using mingw32. Duplicating the issue is very easy. Creating a minimum setup for this is easy:
sudo -E apt -qq update && sudo apt upgrade -y
sudo -E apt -qq install -y autoconf automake autopoint gcc mingw-w64 gettext git groff make pkg-config texinfo p7zip-full
git clone https://github.com/Bill-Gray/PDCursesMod.git
cd PDCursesMod/wingui
make WIDE=Y UTF8=Y _w64=Y demos
#include <curses.h>
int quit;
wint_t c;
wchar_t wbuff[2];
int main()
{
initscr();
noecho();
keypad( stdscr, TRUE);
while( !quit)
{
get_wch(&c);
if( c == KEY_F(1) || c == 27){
quit = 1;
}else{
wbuff[0] = c;
wbuff[1] = '\0';
mvaddwstr( 0,1, wbuff);
mvprintw( 1, 1, "U+%08x ", c);
refresh();
}
}
endwin();
return 0;
}
x86_64-w64-mingw32-gcc -Wall -Wextra -pedantic -O3 -I.. -DPDC_FORCE_UTF8 -mwindows -outf8.exe ./utf8.c pdcurses.a -lgdi32 -lcomdlg32 -lwinmm
Just type any char and the program will reprint it and write the unicode. It works up to a certain point. If you introduce an emoji like this 😀 with unicode U+0001F600 — using an unicode above U+0000FFFF — PDCursesMod fails. It fails to detect the code, and obviously, it can´t print it properly. An easy way to type emojis directly in Windows 11 is by pressing With a more extended code you can see that PDCursesMod is actually delivering two codes in the buffer 0xd83d 0xde00 corresponding to the UTF-16 encoding of U+0001F600. I hope this clarifies the issue with the PDCursesMod support of UTF8 in Windows. I have tested the same behavior using the |
Probably this code can illustrate better the problem with wide char management in PDCursesMod. Just use the steps mentioned in my previous comment. but in step 3 use this code: #include <curses.h>
int i=0, quit=0;
wint_t c, wbuff[3];
unsigned char *address;
int main()
{
initscr();
noecho();
keypad( stdscr, TRUE);
wbuff[1] = ' ';
while( !quit)
{
get_wch(&c);
if( c == KEY_F(1) || c == 27){
quit = 1;
}else{
wbuff[0]=wbuff[1];
wbuff[1] = c;
wbuff[2] = '\0';
mvaddwstr( 0,1, wbuff);
mvprintw( 1, 1, "U+%08x U+%08x ", wbuff[0], wbuff[1]);
mvprintw( 3, 1, "Size of wchar_t is %d bytes", sizeof(wchar_t));
mvprintw( 4, 1, "Size of Wint_t is %d bytes", sizeof(wint_t));
mvprintw( 5, 1, "Size of int is %d bytes", sizeof(int));
mvprintw( 7, 1, "Memory:");
address=&wbuff[0];
for (i = 0; i < 3*sizeof(wint_t); i++){
mvprintw( 8, 1+3*i, "%.2x ", address[i]);
}
refresh();
}
}
endwin();
return 0;
} If you use a font capable of displaying emojis such as
If you type the emoji using the Windows unicode tool ( Read and write chars are not consistent. affecting how strings are stored. |
@okibcn Did you gave that changed version a try? In any case: thanks for the additional test code and scenario. |
@GitMensch I am afraid that the code still fails. I am not attaching another screenshot as it is exactly the same as before. The problem is wider as it affects not only |
@GitMensch, I want to come back to the UTF-8 entry support for all the PDCursesMod flavors. The patch you mentioned has two main problems:
I have got very good results replacing the function in that patch by this one: /* deliver a byte sequence in UTF-8 or UTF-16 encoding.*/
/* Supports supplentary unicode planes up to 20 bits (UTF-16). */
int wgetch(WINDOW *win)
{
static union
{
wchar_t word[2];
char byte[4];
} buff;
static size_t n_buff = 0;
int rval = 0;
if (!n_buff) /* Empty buffer, no pending bytes */
{
rval = raw_wgetch(win);
do
{
if (rval < 0)
rval = ERR;
if( rval == ERR || (rval < 0x80) || PDC_is_function_key(rval))
return rval;
buff.word[0] = rval;
if ( (rval & 0xF800) != 0xD800) /* single-unit UTF-16 unicode codepoint */
n_buff = 2;
else /* Potential multiunit UTF-16 code */
{
rval = raw_wgetch(win); /* get another UTF-16 unit from buffer*/
if ( (rval & 0xFC00) != 0xDC00) /* mask check of second UTF-16 unit */
{
unget_wch((wchar_t)rval);
return ERR;
}
buff.word[1]=rval;
n_buff = 4;
rval = (((int)(buff.word[0] & 0x3FF) << 10) | (rval & 0x3FF)) + 0x10000; /* rval is the full unicode */
}
} while (!n_buff);
#ifdef PDC_FORCE_UTF8
n_buff = PDC_wc_to_utf8(buff.byte, rval); /* rval has the raw unicode codepoint */
#endif
}
size_t i;
rval = buff.byte[0] & 0xFF;
n_buff--;
for (i = 0; i < n_buff; i++)
buff.byte[i] = buff.byte[i + 1];
return (rval);
}
int get_wch(wint_t *wch)
{
PDC_LOG(("get_wch() - called\n"));
return wget_wch(stdscr, wch);
} Obviously, you will need to add the prototype This solves the UTF-8 input for all the PDCursesMod flavors, but there is still another problem. PDCursesMod is not capable to print all the supplemental pages beyond unicode codepoint 64K. Basically you need both to properly use it. @Bill-Gray and @GitMensch, I am modifying other functions to desabiguate wc (widechar) from UC (unicode). In your code it seems that you are using both as wchar_t, but a codepoint sometimes requires more than one wchar_t. |
@Bill-Gray - just to recheck: did you worked on your unfinished patch to fix it on all platforms, possibly with including the function replacement suggested by @okibcn? For now I've created a bug issue for this and - as asked by @clangen - close this PR in the hope for the patch being finished and checked in. |
…g characters past 256 should be expanded into multi-byte strings.
I refer to the fix in commit bd50b2b as 'partial' because in non-wide mode, |
This should fix it completely. Already tested and working |
Note: I think that this patch in its current form may not be something you'd want to merge; it may not even be something you want to merge at all, but I wanted to bring it up...
My app that uses
PDCursesMod
is compiled with theUse Unicode Character Set
option with MSVC++, which will ensure allWin32
API calls use the wide-character variants, all of which use UTF16 encoded characters and strings regardless of the user's selected locale.That means that when
_UNICODE
is defined the character sent to theWM_CHAR
message is UTF16 encoded -- andPDCursesMod
adds it to key queue directly viaadd_key_to_queue((int) wParam);
My app definesPDC_FORCE_UTF8
, so I was assuming that internally this character would be converted to a sequence of UTF8 bytes to be received bygetch()
calls.However, this was not the case, and parsing some non-ascii characters was problematic. To work around this, I added a patch that converts the UTF16 character received by
WM_CHAR
into a sequence of UTF8 bytes, and adds them to the key queue one at a time.