Patch: new method get_wch for ncurses bindings: accept wide characters (unicode) #51004

inigoserna · 2009-08-21T11:43:30Z

BPO	6755
Nosy	@akuchling, @jcea, @cben, @pitrou, @vstinner, @bitdancer
Files	curses.get_wch.patch: Patch for the documentation test_get_wch.py: Test example _cursesmodule.get_wch.patch: Patch against Python 2.6.2 _cursesmodule.c _cursesmodule.311.get_wch.patch: Patch against Python 3.1.1 _cursesmodule.c test_ucs2w.py: Several implementations of wcwidth() and wcswidth() ucs2w.c: C extension implementation of wcwidth() and wcswidth()

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2012-03-08.01:18:55.099>
created_at = <Date 2009-08-21.11:43:30.490>
labels = ['extension-modules', 'type-feature']
title = 'Patch: new method get_wch for ncurses bindings: accept wide characters (unicode)'
updated_at = <Date 2012-03-08.01:18:55.098>
user = 'https://bugs.python.org/inigoserna'

bugs.python.org fields:

activity = <Date 2012-03-08.01:18:55.098>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2012-03-08.01:18:55.099>
closer = 'vstinner'
components = ['Extension Modules']
creation = <Date 2009-08-21.11:43:30.490>
creator = 'inigoserna'
dependencies = []
files = ['14753', '14754', '14755', '14756', '14782', '14783']
hgrepos = []
issue_num = 6755
keywords = ['patch']
message_count = 30.0
messages = ['91816', '91817', '91818', '91819', '91821', '91867', '91868', '91956', '91973', '91974', '91975', '91982', '108249', '139333', '139335', '140371', '140374', '140378', '140403', '140404', '140408', '140409', '140410', '140412', '140640', '143798', '155052', '155054', '155119', '155146']
nosy_count = 13.0
nosy_names = ['akuchling', 'jcea', 'cben', 'pitrou', 'vstinner', 'gpolo', 'r.david.murray', 'inigoserna', 'phep', 'zeha', 'schodet', 'python-dev', 'Nicholas.Cole']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue6755'
versions = ['Python 3.3']

inigoserna · 2009-08-21T11:43:30Z

Currently,there is no a simple way in curses bindings to get the code
associated with a key press of non ascii keystroke (f.e. ç) in terminals
configured with UTF-8 encoding.

getch returns the code for a wide character byte a byte.
But ncurses library has a proper function to do it: get_wch.

Patch against Python v2.6.2 to provide this missing get_wch method
in the ncurses bindings.

Include a test example and a patch to the documentation as well.

More info and a partial solution without patching python curses module
on this thread:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/67dce30f0a2742a6?fwc=2

inigoserna · 2009-08-21T11:44:33Z

Added patch for the documentation

inigoserna · 2009-08-21T11:46:20Z

Added test example

inigoserna · 2009-08-21T11:53:25Z

Added missing file: patch against Python v2.6.2

inigoserna · 2009-08-21T12:14:02Z

Added patch against Python v3.1.1.
NOT TESTED!

gpolo · 2009-08-22T18:13:31Z

Have you looked into bpo-700921 already ? It seems a lot of discussion
was generated there, but no patches.

inigoserna · 2009-08-22T18:23:23Z

Thanks for the pointer, haven't seen anything when I searched for get_wch.

The patch provided here only adds this get_wch function, because as A.M.
Kuchling explained in bpo-700921, it's possible to use wide chars now,
the only feature missing is get_wch.

cben · 2009-08-25T19:26:31Z

Nice. 2 questions:

Why not change getch() to always use get_wch()?
I think you also want fix getkey() / introduce get_wkey().

inigoserna · 2009-08-26T15:47:54Z

Q. Why not change getch() to always use get_wch()?

This could break backwards compatibility.
There are some code out there that may use getch() to get the bytes
stream one by one and build the wide char.
In fact I'm using this trick to get unicode chars by now.
Look the thread link in first comment to find the implementation I've
developed for my app. Other people are using similar approaches too.

Q. I think you also want fix getkey() / introduce get_wkey().

In my own experience get_wkey isn't be as useful when dealing with wide
chars. But, of course, that's only my use cases.

inigoserna · 2009-08-26T16:13:20Z

Btw, I don't know if this is the best place to comment it but as it is
somehow related with ncurses...

Other functions I miss a lot are wcwidth() and wcswidth().

These functions return the real width (read, cells length in screen) for
unicode strings.

An example to clarify the issue: one simple Chinese character could need
2 cells on screen, thus len(chinese_unicode_string) won't return the
real screen width needed to show the string.

i.e., len(chinese_unicode_string) != wcswidth(chinese_unicode_string)

Those functions are included into not so old glibc versions (2.2+?), at
least on my Linux systems.

Sadly enough, python doesn't bind them, afaik.
I've tried ctypes but don't work for me (don't know the reason), so I've
written some replacements.

Please look at these files:

test_ucs2w.py: benchmarks to different implementations. Most of them,
pure python. Please consider only ucs2w_1x, other are only experiments.
ucs2w.c: C extension implementation

I think Python could benefit from having these functions in the standard
library. Surely, most simple way should be to bind glibc functions, but
don't know if they exist on other platforms such as MacOS X or Windows.

Neither do I know where they fit... perhaps in unicodedata module.

What do you think? who is the person to convince? (please, don't ask me
to write a PEP, my English is not good enough).

bitdancer · 2009-08-26T17:03:43Z

For the title concern of this patch I'm adding akuchling as nosy.

Judging by your post your English probably is good enough to write a PEP
(the PEP editors should help with fine tuning it, at least in theory).
However, I doubt a PEP would be necessary.

As for where to raise the (new) issue...given that these functions are
independent of curses, I'd say open a new issue. If you do that, I'd
suggest making lemburg nosy on the issue, as he is the original author
of the unicodedata module and many other unicode things in python.

I'm setting the stage to test needed since your test case isn't a unit
test. I have no idea (never having worked with curses) how hard a unit
test would be to write. Nor have I reviewed the patch, for the same reason.

vstinner · 2009-08-26T22:14:24Z

inigoserna>> Other functions I miss a lot are wcwidth() and wcswidth()

I wrote a patch to implement unicode.width() method:
http://bugs.python.org/file13357/unicode_width.patch

It's part of the issue bpo-2382 (SyntaxError cursor shifted if multibyte
character is in line)

zeha · 2010-06-20T21:22:35Z

Will this be part of 3.2 and possibly 2.7?

Without these patches wide character input using curses is basically impossible (on at least some platforms).

NicholasCole · 2011-06-27T22:52:11Z

Is there any hope that something like this patch will make it into a future version? As far as I can see, entering accented characters is currently impossible on the latest release versions of python...or am I missing something?

vstinner · 2011-06-27T23:03:55Z

Can someone update the patch for Python 3.3? Python 2.7 and 3.2 don't accept new features.

NicholasCole · 2011-07-14T20:10:31Z

The bug is marked "Test Needed".

I am very keen to see this issue fixed, and would be very willing to help, but I don't really know what is still required. As far as I can see there is a patch waiting - what is the hold up?

python-dev · 2011-07-14T21:08:42Z

New changeset dec10ad41b2a by Victor Stinner in branch 'default':
Close bpo-6755: Add get_wch() method to curses.window class
http://hg.python.org/cpython/rev/dec10ad41b2a

vstinner · 2011-07-14T22:45:56Z

I don't really know what is still required

_cursesmodule.311.get_wch.patch doesn't apply correctly on Python 3.3 and use PyInt_FromLong() function, function removed from Python 3.0. Indeed, Iñigo wrote that the patch was not tested.

what is the hold up?

Nobody wanted to take the responsability of the choice for get_wch(): add a new method or patch getch() ;-)

--

I commited Iñigo's patch to add window.get_wch() method with minor changes:

add :versionadded: 3.3 in the doc
document the new method What's new in Python 3.3 document
fix an error message: getch => get_wch
change error message (if ch==ERROR): "get_wch failed" => "no input" (message copied from the getch function)

--

I think that the Unicode support of curses in Python 3 is just completly broken: I opened a new issue for that, issue bpo-12567.

I also create the issue bpo-12568 to add a function to get the width of a character.

NicholasCole · 2011-07-15T13:21:57Z

Nobody wanted to take the responsability of the choice for get_wch(): add a new method or patch getch() ;-)

I suspect that a new method is the right way to go, here.

I see it has been moved to "committed/rejected" status - does that mean that it might still go in, or that it is rejected?

I think that the Unicode support of curses in Python 3 is just completly broken

It certainly is less than ideal. ;-)

vstinner · 2011-07-15T13:28:50Z

I see it has been moved to "committed/rejected"
status - does that mean that it might still go in, or that
it is rejected?

I commited the new method, did you see my commit dec10ad41b2a?

I propose to continue the discussion on issue bpo-12567 (for example, to decide if we need unget_wch or not).

pitrou · 2011-07-15T13:58:29Z

/home/antoine/cpython/default/Modules/_cursesmodule.c: In function ‘PyCursesWindow_Get_WCh’:
/home/antoine/cpython/default/Modules/_cursesmodule.c:919:9: attention : implicit declaration of function ‘wget_wch’
/home/antoine/cpython/default/Modules/_cursesmodule.c:926:9: attention : implicit declaration of function ‘mvwget_wch’
gcc -pthread -shared build/temp.linux-x86_64-3.3-pydebug/home/antoine/cpython/default/Modules/_cursesmodule.o -L/usr/local/lib -lncurses -o build/lib.linux-x86_64-3.3-pydebug/_curses.cpython-33dm.so
*** WARNING: renaming "_curses" since importing it failed: build/lib.linux-x86_64-3.3-pydebug/_curses.cpython-33dm.so: undefined symbol: mvwget_wch
*** WARNING: importing extension "_curses_panel" failed with <class 'SystemError'>: initialization of _curses_panel raised unreported exception

Failed to build these modules:
_curses _curses_panel

pitrou · 2011-07-15T14:15:01Z

Also compilation warnings on some buildbots:

/var/lib/buildslave/3.x.murray-gentoo-wide/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_Get_WCh':
/var/lib/buildslave/3.x.murray-gentoo-wide/build/Modules/_cursesmodule.c:919: warning: implicit declaration of function 'wget_wch'
/var/lib/buildslave/3.x.murray-gentoo-wide/build/Modules/_cursesmodule.c:926: warning: implicit declaration of function 'mvwget_wch'

vstinner · 2011-07-15T14:36:35Z

implicit declaration of function ‘wget_wch’

Oh oh, I expected such error: it means that your ncurses library don't have the wide character API. The compiler command confirm that: "gcc ... -lncurses ...". You use libncurses and not libncursesw.

Antoine told me that libncursesw is available on its OS, but Python chose libncurses. I suppose that it's because readline is linked to libncurses (and not libncursesw) => see issue bpo-7384.

Antoine setup is not rare: many Linux distro link readline to libncurses, and so Python cannot use libncursesw.

For this issue, it's not a problem: we can just add a test to check if get_wch is available or not, and only define the Python function if the C function does exist. But for bpo-12567, it's a bigger problem because it means that we cannot always use the wide character functions if the argument is Unicode (character/string).

vstinner · 2011-07-15T14:43:40Z

... I suppose that it's because readline is linked to libncurses
(and not libncursesw) => see issue bpo-7384.

See also the issue bpo-9408.

vstinner · 2011-07-19T00:30:11Z

implicit declaration of function ‘wget_wch’

curses_unicode.patch of issue bpo-12567 adds a HAVE_NCURSESW define to only use wide character functions if _curses is linked to libncursesw.

This define can be used to fix this bug (use wget_ch whereas it is not available).

jcea · 2011-09-09T19:55:15Z

I have compiled ncurses myself, supporting wide characters. I get this warnings in the buildbots:

"""
/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Modules/_cursesmodule.c:920: warning: implicit declaration of function 'wget_wch'
/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Modules/_cursesmodule.c:927: warning: implicit declaration of function 'mvwget_wch'
/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Modules/_cursesmodule.c:2760: warning: implicit declaration of function 'unget_wch'
"""

Studying the "ncurses.h", I see the definition of "wget_wch" and others. But these definitions are created only if "_XOPEN_SOURCE_EXTENDED" is defined.

Something to be explored?.

NicholasCole · 2012-03-07T07:22:21Z

I hope that this is the right bug to file this on (I'm getting lost in all of the curses bugs!).

I'm testing out the 3.3a1, and I've run into the following issue. On previous releases addch() could accept curses.ACS_HLINE and similar.

Attempting to use the same code now raises the exception:

OverflowError: byte doesn't fit in chtype.

I'm sure this is related to the new code that uses addwstr(), but currently code that used to work will crash.

I can't work out a fix myself, because I don't fully understand the problem, but I'm happy to provide sample code if it will help.

Nicholas

bitdancer · 2012-03-07T07:40:27Z

Since this bug is about adding a new feature, it is unlikely to be the correct bug for this to be against.

Given that you've identified a regression, I suggest you open a new bug with a reproducer, and we'll set it to release blocker.

NicholasCole · 2012-03-07T20:47:08Z

On Wed, Mar 7, 2012 at 7:40 AM, R. David Murray <report@bugs.python.org> wrote:

R. David Murray <rdmurray@bitdance.com> added the comment:

Since this bug is about adding a new feature, it is unlikely to be the correct bug for this to be against.

Given that you've identified a regression, I suggest you open a new bug with a reproducer, and we'll set it to release blocker.

I've created bpo-14223. I hope I've done so correctly.

Best wishes,

Nicholas

vstinner · 2012-03-08T01:18:42Z

Antoine's issue has been fixed:
"Modules/_cursesmodule.c:919:9: attention : implicit declaration of function ‘wget_wch’"

It looks like Jesús's issue is specific to Solaris (or is already fixed?), and so I added a comment to the issue bpo-13552: "Modules/_cursesmodule.c:920: warning: implicit declaration of function 'wget_wch'".

inigoserna mannequin added extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Aug 21, 2009

python-dev mannequin closed this as completed Jul 14, 2011

pitrou reopened this Jul 15, 2011

vstinner closed this as completed Mar 8, 2012

ezio-melotti transferred this issue from another repository Apr 10, 2022

jquast mentioned this issue Jan 6, 2024

What would wcwidth look like if it were built-in to Python? jquast/wcwidth#94

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Patch: new method get_wch for ncurses bindings: accept wide characters (unicode) #51004

Patch: new method get_wch for ncurses bindings: accept wide characters (unicode) #51004

inigoserna mannequin commented Aug 21, 2009

inigoserna mannequin commented Aug 21, 2009

inigoserna mannequin commented Aug 21, 2009

inigoserna mannequin commented Aug 21, 2009

inigoserna mannequin commented Aug 21, 2009

inigoserna mannequin commented Aug 21, 2009

gpolo mannequin commented Aug 22, 2009

inigoserna mannequin commented Aug 22, 2009

cben mannequin commented Aug 25, 2009

inigoserna mannequin commented Aug 26, 2009

inigoserna mannequin commented Aug 26, 2009

bitdancer commented Aug 26, 2009

vstinner commented Aug 26, 2009

zeha mannequin commented Jun 20, 2010

NicholasCole mannequin commented Jun 27, 2011

vstinner commented Jun 27, 2011

NicholasCole mannequin commented Jul 14, 2011

python-dev mannequin commented Jul 14, 2011

vstinner commented Jul 14, 2011

NicholasCole mannequin commented Jul 15, 2011

vstinner commented Jul 15, 2011

pitrou commented Jul 15, 2011

pitrou commented Jul 15, 2011

vstinner commented Jul 15, 2011

vstinner commented Jul 15, 2011

vstinner commented Jul 19, 2011

jcea commented Sep 9, 2011

NicholasCole mannequin commented Mar 7, 2012

bitdancer commented Mar 7, 2012

NicholasCole mannequin commented Mar 7, 2012

vstinner commented Mar 8, 2012

Patch: new method get_wch for ncurses bindings: accept wide characters (unicode) #51004

Patch: new method get_wch for ncurses bindings: accept wide characters (unicode) #51004

Comments

inigoserna mannequin commented Aug 21, 2009

inigoserna mannequin commented Aug 21, 2009

inigoserna mannequin commented Aug 21, 2009

inigoserna mannequin commented Aug 21, 2009

inigoserna mannequin commented Aug 21, 2009

inigoserna mannequin commented Aug 21, 2009

gpolo mannequin commented Aug 22, 2009

inigoserna mannequin commented Aug 22, 2009

cben mannequin commented Aug 25, 2009

inigoserna mannequin commented Aug 26, 2009

inigoserna mannequin commented Aug 26, 2009

bitdancer commented Aug 26, 2009

vstinner commented Aug 26, 2009

zeha mannequin commented Jun 20, 2010

NicholasCole mannequin commented Jun 27, 2011

vstinner commented Jun 27, 2011

NicholasCole mannequin commented Jul 14, 2011

python-dev mannequin commented Jul 14, 2011

vstinner commented Jul 14, 2011

NicholasCole mannequin commented Jul 15, 2011

vstinner commented Jul 15, 2011

pitrou commented Jul 15, 2011

pitrou commented Jul 15, 2011

vstinner commented Jul 15, 2011

vstinner commented Jul 15, 2011

vstinner commented Jul 19, 2011

jcea commented Sep 9, 2011

NicholasCole mannequin commented Mar 7, 2012

bitdancer commented Mar 7, 2012

NicholasCole mannequin commented Mar 7, 2012

vstinner commented Mar 8, 2012