diff --git a/README.rst b/README.rst index 4ab24b9..972151d 100644 --- a/README.rst +++ b/README.rst @@ -19,34 +19,40 @@ Introduction ============ This API is mainly for Terminal Emulator implementors -- any python program -that attempts to determine the printable width of a string on a Terminal. +that attempts to determine the printable width of a string on a Terminal. It +is implemented in python (no C library calls) and has no 3rd-party dependencies. -It is certainly possible to use your Operating System's ``wcwidth()`` and -``wcswidth()`` calls if it is POSIX-conforming, but this would not be possible +It is certainly possible to use your Operating System's ``wcwidth(3)`` and +``wcswidth(3)`` calls if it is POSIX-conforming, but this would not be possible on non-POSIX platforms, such as Windows, or for alternative Python -implementations, such as jython. - -Furthermore, testing (`wcwidth-libc-comparator.py`_) has shown that libc -wcwidth() is particularly out of date on most operating systems, reporting -1 -for a great many characters that are actually a displayable width of 1 or 2. +implementations, such as jython. It is also commonly many releases older +than the most current Unicode Standard release files, which this project +aims to track. The most current release of this API is based from Unicode Standard release -_7.0.0_, dated 2014-02-28, 23:15:00 GMT [KW, LI] +*7.0.0*, dated *2014-02-28, 23:15:00 GMT [KW, LI]* for table generated by +file ``EastAsianWidth-7.0.0.txt`` and *2014-02-07, 18:42:08 GMT [MD]* for +``DerivedCombiningClass-7.0.0.txt``. + +Installation +------------ + +The stable version of this package is maintained on pypi, install using pip:: + + pip install wcwidth Problem ------- You may have noticed some characters especially Chinese, Japanese, and Korean (collectively known as the *CJK Unified Ideographs*) consume more -than 1 terminal cell. 
- -In python, if you ask for the length of the string, ``u'コンニチハ'`` -(Japanese: Hello), it is correctly determined to be a length of **5**. +than 1 terminal cell. If you ask for the length of the string, ``u'コンニチハ'`` +(Japanese: Hello), it is correctly determined to be a length of **5** using +the ``len()`` built-in. However, if you were to print this to a Terminal Emulator, such as xterm, -urxvt, Terminal.app, or PuTTY, it would consume **10** *cells* (columns) -- -two for each symbol. - +urxvt, Terminal.app, PuTTY, or iTerm2, it would consume **10** *cells* (columns). +This causes problems for many of the text-alignment functions, such as ``rjust()``. On an 80-wide terminal, the following would wrap along the margin, instead of displaying it right-aligned as desired:: @@ -65,17 +71,10 @@ that the length of ``wcwidth(u'コ')`` is reported as ``2``, and This allows one to determine the printable effects of displaying *CJK* characters on a terminal emulator. -Installation ------------- - -The stable version of this package is maintained on pypi, install using pip:: - - pip install wcwidth - wcwidth, wcswidth ----------------- -Use ``wcwidth`` to determine the length of a single character, -and ``wcswidth`` to determine the length of a string of characters. +Use ``wcwidth`` to determine the length of a *single character*, +and ``wcswidth`` to determine the length of a *string of characters*. To Display ``u'コンニチハ'`` right-adjusted on screen of 80 columns:: @@ -88,9 +87,9 @@ To Display ``u'コンニチハ'`` right-adjusted on screen of 80 columns:: Values ------ -See the docstring for ``wcwidth()``, general overview of return values: +A general overview of return values: - - ``-1``: indeterminate, such as combining_ characters. + - ``-1``: indeterminate (see Todo_). - ``0``: do not advance the cursor, such as NULL. @@ -99,12 +98,37 @@ See the docstring for ``wcwidth()``, general overview of return values: - ``1``: all others. 
``wcswidth()`` simply returns the sum of all values along a string, or -``-1`` if it has occurred for any value returned by ``wcwidth()``. +``-1`` if it has occurred for any value returned by ``wcwidth()``. A more +exacting list of conditions and return values may be found in the docstring +for ``wcwidth()``. + +Discrepancies +------------- + +There may be discrepancies with the determined printable width of characters +by *wcwidth* and the results of any given terminal emulator -- most commonly, +emulators are using your Operating System's ``wcwidth(3)`` implementation which +is often based on tables much older than the most current Unicode Specification. +Python's determination of non-zero combining_ characters may also be based on an +older specification. + +You may determine an exacting list of these discrepancies using files +`wcwidth-libc-comparator.py`_ and `wcwidth-combining-comparator.py`_. + +.. _`wcwidth-libc-comparator.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-libc-comparator.py +.. _`wcwidth-combining-comparator.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-combining-comparator.py + ========== Developing ========== +Execute the command ``python setup.py develop`` to prepare an environment +for running tests (``python setup.py test``), updating tables ( +``python setup.py update``) or using any of the scripts in the ``bin/`` +sub-folder. These files are only made available in the source repository. + + Updating Tables --------------- @@ -113,7 +137,10 @@ The command ``python setup.py update`` will fetch the following resources: - http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt - http://www.unicode.org/Public/UNIDATA/extracted/DerivedCombiningClass.txt -Generating the table files `wcwidth/table_wide.py`_ and `wcwidth/table_comb.py`_. +And generate the table files `wcwidth/table_wide.py`_ and `wcwidth/table_comb.py`_. + +.. 
_`wcwidth/table_wide.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_wide.py +.. _`wcwidth/table_comb.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_comb.py wcwidth.c --------- @@ -122,9 +149,8 @@ This code was originally derived directly from C code of the same name, whose latest version is available at: `wcwidth.c`_ And is authored by Markus Kuhn -- 2007-05-26 (Unicode 5.0) -Any subsequent changes were done by directly testing against the various libc -implementations of POSIX-compliant Operating Systems, such as Mac OSX, Linux, -and OpenSolaris. +.. _`wcwidth.c`: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c + Examples -------- @@ -133,17 +159,24 @@ This library is used in: - `jquast/blessed`_, a simplified wrapper around curses. -- `jonathanslenders/python-prompt-toolkit`_, a Library for building powerful interactive command lines in Python. +- `jonathanslenders/python-prompt-toolkit`_, a Library for building powerful + interactive command lines in Python. Additional tools for displaying and testing wcwidth is found in the ``bin/`` folder of this project (github link: `wcwidth/bin`_). They are not distributed as a script or part of the module. +.. _`jquast/blessed`: https://github.com/jquast/blessed +.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit +.. _`wcwidth/bin`: https://github.com/jquast/wcwidth/tree/master/bin + Todo ---- -It is my wish that `combining`_ characters are understood. Currently, -any string containing combining characters will always return ``-1``. +Though some of the most common ("zero-width") `combining`_ characters +are understood by wcswidth, there are still many edge cases that need +to be covered, especially certain kinds of sequences such as those +containing Control-Sequence-Inducer (CSI). 
License @@ -181,31 +214,33 @@ an OSI-approved license that appears most-alike has been chosen, the MIT license OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -.. _`jquast/blessed`: https://github.com/jquast/blessed -.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit -.. _`wcwidth/bin`: https://github.com/jquast/wcwidth/tree/master/bin -.. _`wcwidth-libc-comparator.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-libc-comparator.py -.. _`wcwidth/table_wide.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_wide.py -.. _`wcwidth/table_comb.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_comb.py -.. _`combining`: https://en.wikipedia.org/wiki/Combining_character -.. _`wcwidth.c`: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c - Changes ------- 0.1.4 + * **Feature**: ``wcswidth()`` now determines printable length + for (most) combining characters. The developer's tool + `bin/wcwidth-browser.py`_ is improved to display combining_ + characters when provided the ``--combining`` option + (`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_). * added static analysis (prospector_) to testing framework. 0.1.3 - * *Bugfix*: 2nd parameter of wcswidth was not honored. - (`thomasballinger`_ PR #4). + * **Bugfix**: 2nd parameter of wcswidth was not honored. + (`Thomas Ballinger`_, `PR #4`_). 0.1.2 - * Updated tables to Unicode Specification 7.0.0 - (`thomasballinger`_ PR #3). + * **Updated** tables to Unicode Specification 7.0.0. + (`Thomas Ballinger`_, `PR #3`_). 0.1.1 * Initial release to pypi, Based on Unicode Specification 6.3.0 -.. _`thomasballinger`: https://github.com/thomasballinger .. _`prospector`: https://github.com/landscapeio/prospector +.. _`combining`: https://en.wikipedia.org/wiki/Combining_character +.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py +.. 
_`Thomas Ballinger`: https://github.com/thomasballinger +.. _`Leta Montopoli`: https://github.com/lmontopo +.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3 +.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4 +.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5 diff --git a/bin/wcwidth-browser.py b/bin/wcwidth-browser.py index 9dcd4ca..541cd1c 100755 --- a/bin/wcwidth-browser.py +++ b/bin/wcwidth-browser.py @@ -5,6 +5,18 @@ This displays the full range of unicode points for 1 or 2-character wide ideograms, with pipes ('|') that should always align for any terminal that supports utf-8. + +Usage: + ./bin/wcwidth-browser.py [--wide=] + [--alignment=] + [--combining] + [--help] + +Options: + --wide= Browser 1 or 2 character-wide cells. + --alignment= Chose left or right alignment. [default: left] + --combining Use combining character generator. [default: 2] + --help Display usage """ # pylint: disable=C0103 # Invalid module name "wcwidth-browser" @@ -18,10 +30,11 @@ import signal # local imports -from wcwidth import wcwidth +from wcwidth import wcwidth, table_comb # 3rd party imports from blessed import Terminal +from docopt import docopt # BEGIN, python 2.6 through 3.4 compatibilities, @@ -126,6 +139,51 @@ def __next__(self): next = __next__ +class WcCombinedCharacterGenerator(object): + + """ Generator yields unicode characters with combining. """ + + # pylint: disable=R0903 + # Too few public methods (0/2) + + def __init__(self, width=1): + """ + Class constructor. + + :param width: generate characters of given width. + :type width: int + """ + self.characters = [] + letters_o = (u'o' * width) + for boundaries in table_comb.NONZERO_COMBINING: + for val in [_val for _val in + range(boundaries[0], boundaries[1] + 1) + if _val <= LIMIT_UCS]: + self.characters.append(letters_o[:1] + + unichr(val) + + letters_o[1:]) + self.characters.reverse() + + def __iter__(self): + """ Special method called by iter(). 
""" + return self + + def __next__(self): + """ Special method called by next(). """ + while True: + if not self.characters: + raise StopIteration + ucs = self.characters.pop() + try: + name = string.capwords(unicodedata.name(ucs[1])) + except ValueError: + continue + return (ucs, name) + + # python 2.6 - 3.3 compatibility + next = __next__ + + class Style(object): """ Styling decorator class instance for terminal output. """ @@ -138,10 +196,8 @@ class Style(object): continuation = u' $' header_hint = u'-' header_fill = u'=' - name_len = 0 + name_len = 10 alignment = 'right' - msg_loading = '[please wait]' - msg_fill = '[drawing ...]' def __init__(self, **kwargs): """ @@ -158,8 +214,7 @@ class Screen(object): """ Represents terminal style, data dimensions, and drawables. """ - intro_msg_fmt = (u'Characters {wide} terminal cells wide. ' - u'Delimiters ({delim}) should align.') + intro_msg_fmt = u'Delimiters ({delim}) should align.' def __init__(self, term, style, wide=2): """ Class constructor. """ @@ -202,9 +257,8 @@ def head_item(self): def msg_intro(self): """ Introductory message disabled above heading. """ delim = self.style.attr_minor(self.style.delimiter) - wide = self.style.attr_major('{}'.format(self.wide)) - return self.term.wrap(self.intro_msg_fmt.format( - wide=wide, delim=delim)) + txt = self.intro_msg_fmt.format(delim=delim).rstrip() + return self.term.center(txt) @property def row_ends(self): @@ -226,7 +280,7 @@ def num_rows(self): @property def row_begins(self): """ Top row displayed for content. """ - return len(self.msg_intro) + 1 + return 2 @property def page_size(self): @@ -258,14 +312,14 @@ def __init__(self, term, screen, character_factory): self.character_generator = self.character_factory(self.screen.wide) self.dirty = self.STATE_REFRESH self.last_page = 0 - - self._page_data = self.initialize_page_data() - self._set_lastpage() + self._page_data = list() def on_resize(self, *args): """ Signal handler callback for SIGWINCH. 
""" # pylint: disable=W0613 # Unused argument 'args' + self.screen.style.name_len = min(self.screen.style.name_len, + self.term.width - 15) assert self.term.width >= self.screen.hint_width, ( 'Screen to small {}, must be at least {}'.format( self.term.width, self.screen.hint_width)) @@ -387,6 +441,8 @@ def run(self, writer, reader): instance of blessed.keyboard.Keystroke. :type reader: callable """ + self._page_data = self.initialize_page_data() + self._set_lastpage() if not self.term.is_a_tty: self._run_notty(writer) else: @@ -483,13 +539,12 @@ def draw(self, writer, idx, offset): # our self.dirty flag can become re-toggled; because we are # not re-flowing our pagination, we must begin over again. while self.dirty: - if not self.draw_heading(writer): - self.draw_loading(writer, idx) + self.draw_heading(writer) self.dirty = self.STATE_CLEAN (idx, offset), data = self.page_data(idx, offset) for txt in self.page_view(data): writer(txt) - self.draw_status(writer, idx, offset) + self.draw_status(writer, idx) flushout() return idx, offset @@ -506,54 +561,33 @@ def draw_heading(self, writer): if self.dirty == self.STATE_REFRESH: writer(u''.join( (self.term.home, self.term.clear, - '\n'.join(self.screen.msg_intro), - '\n', self.screen.header, '\n',))) + self.screen.msg_intro, '\n', + self.screen.header, '\n',))) return True - def draw_loading(self, writer, idx): - """ - Conditionally draw 'loading' status when output terminal is a tty. - - :param writer: callable writes to output stream, receiving unicode. - :type writer: callable - :param idx: current page position index. 
- :type idx: int - """ - if self.term.is_a_tty: - writer(self.term.show_cursor()) - style = self.screen.style - if idx not in self._page_data: - txt = style.attr_major(self.screen.style.msg_loading) - else: - txt = style.attr_minor(self.screen.style.msg_fill) - writer(u' {0}'.format(txt)) - flushout() - - def draw_status(self, writer, idx, offset): + def draw_status(self, writer, idx): """ Conditionally draw status bar when output terminal is a tty. :param writer: callable writes to output stream, receiving unicode. :param idx: current page position index. :type idx: int - :param offset: scrolling region offset of current page. - :type offset: int """ if self.term.is_a_tty: writer(self.term.hide_cursor()) style = self.screen.style writer(self.term.move(self.term.height - 1)) if idx == self.last_page: - last_end = u' (END)' + last_end = u'(END)' else: last_end = u'/{0}'.format(self.last_page) - writer(u'Page {idx}(:{offset}){last_end}). ' + txt = (u'Page {idx}{last_end} - ' u'{q} to quit, [keys: {keyset}]' .format(idx=style.attr_minor(u'{0}'.format(idx)), - offset=style.attr_minor(u'{0}'.format(offset)), last_end=style.attr_major(last_end), keyset=style.attr_major('kjfb12-='), q=style.attr_minor(u'q'))) + writer(self.term.center(txt).rstrip()) def page_view(self, data): """ @@ -605,24 +639,56 @@ def text_entry(self, ucs, name): idx = max(0, style.name_len - len(style.continuation)) name = u''.join((name[:idx], style.continuation if idx else u'')) if style.alignment == 'right': - fmt = u' '.join(('0x{value:0>{ucs_len}x}', + fmt = u' '.join(('0x{val:0>{ucs_printlen}x}', '{name:<{name_len}s}', '{delimiter}{ucs}{delimiter}' )) else: fmt = u' '.join(('{delimiter}{ucs}{delimiter}', - '0x{value:0>{ucs_len}x}', + '0x{val:0>{ucs_printlen}x}', '{name:<{name_len}s}')) delimiter = style.attr_minor(style.delimiter) + if len(ucs) != 1: + # determine display of combining characters + val = ord(next((_ucs for _ucs in ucs + if wcwidth(_ucs) == -1))) + # a combining character 
displayed of any fg color + # will reset the foreground character of the cell + # combined with (iTerm2, OSX). + disp_ucs = style.attr_major(ucs[0:2]) + if len(ucs) > 2: + disp_ucs += ucs[2] + else: + # non-combining + val = ord(ucs) + disp_ucs = style.attr_major(ucs) + return fmt.format(name_len=style.name_len, - ucs_len=UCS_PRINTLEN, + ucs_printlen=UCS_PRINTLEN, delimiter=delimiter, name=name, - ucs=style.attr_major(ucs), - value=ord(ucs)) + ucs=disp_ucs, + val=val) + + +def validate_args(opts): + """ Validate and return options provided by docopt parsing. """ + if opts['--wide'] is None: + opts['--wide'] = 2 + else: + assert opts['--wide'] in ("1", "2"), opts['--wide'] + if opts['--alignment'] is None: + opts['--alignment'] = 'left' + else: + assert opts['--alignment'] in ('left', 'right'), opts['--alignment'] + opts['--wide'] = int(opts['--wide']) + opts['character_factory'] = WcWideCharacterGenerator + if opts['--combining']: + opts['character_factory'] = WcCombinedCharacterGenerator + return opts -def main(): +def main(opts): """ Program entry point. """ term = Terminal() style = Style() @@ -630,17 +696,18 @@ def main(): # if the terminal supports colors, use a Style instance with some # standout colors (magenta, cyan). 
if term.number_of_colors: - style = Style(attr_major=term.magenta, attr_minor=term.bright_cyan) - if not term.is_a_tty: - # use a fixed 1-column length of ~80 characters - style.name_len = 80 - 15 + style = Style(attr_major=term.magenta, + attr_minor=term.bright_cyan, + alignment=opts['--alignment']) + style.name_len = term.width - 15 - screen = Screen(term, style) - character_factory = WcWideCharacterGenerator - pager = Pager(term, screen, character_factory) + screen = Screen(term, style, wide=opts['--wide']) + pager = Pager(term, screen, opts['character_factory']) - with term.location(), term.cbreak(), term.fullscreen(): + with term.location(), term.cbreak(), \ + term.fullscreen(), term.hidden_cursor(): pager.run(writer=echo, reader=term.inkey) + return 0 if __name__ == '__main__': - main() + exit(main(validate_args(docopt(__doc__)))) diff --git a/setup.py b/setup.py index b633723..1dc33e5 100755 --- a/setup.py +++ b/setup.py @@ -227,7 +227,7 @@ def run(self): ('dodgy', 'frosted', 'mccabe', 'pep257', 'pep8', 'pylint', 'pyroma', 'vulture',)]) self.spawn(('pip', 'install', '-U', - 'blessed', 'requests', 'tox', + 'blessed', 'requests', 'tox', 'docopt', 'prospector[{0}]'.format(with_prospector))) diff --git a/tox.ini b/tox.ini index d14c9cd..2172ee6 100644 --- a/tox.ini +++ b/tox.ini @@ -23,7 +23,7 @@ commands = {envbindir}/py.test \ [testenv:prospector] usedevelop = True -deps = prospector[with_dodgy,with_frosted,with_mccabe,with_pep257,with_pep8,with_pylint,with_pyroma,with_vulture] +deps = prospector[with_dodgy,with_frosted,with_mccabe,with_pep257,with_pep8,with_pylint,with_pyroma] commands = prospector [testenv:vulture] diff --git a/wcwidth/tests/test_core.py b/wcwidth/tests/test_core.py index c527cef..cfff611 100755 --- a/wcwidth/tests/test_core.py +++ b/wcwidth/tests/test_core.py @@ -83,12 +83,12 @@ def test_control_c0_width_negative_1(): def test_combining_width_negative_1(): """ - Simple test combining reports width -1. 
+ Simple test combining reports total width of 4. """ # given, phrase = u'--\u05bf--' expect_length_each = (1, 1, -1, 1, 1) - expect_length_phrase = -1 + expect_length_phrase = 4 # exercise, length_each = tuple(map(wcwidth.wcwidth, phrase)) diff --git a/wcwidth/wcwidth.py b/wcwidth/wcwidth.py index b563ff2..802316a 100644 --- a/wcwidth/wcwidth.py +++ b/wcwidth/wcwidth.py @@ -199,6 +199,9 @@ def wcswidth(pwcs, n=None): for char in pwcs[idx]: wcw = wcwidth(char) if wcw < 0: + ucs = ord(char) + if _bisearch(ucs, NONZERO_COMBINING): + continue return -1 else: width += wcw
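
Reviewer note: the final hunk changes ``wcswidth()`` so that a character reporting ``-1`` is skipped, rather than aborting the sum, when its code point is found in ``NONZERO_COMBINING``. The following is a minimal standalone sketch of that control flow, not the project's implementation. It assumes a tiny hypothetical table ``SAMPLE_COMBINING`` in place of the generated ``table_comb.NONZERO_COMBINING``, and the names ``bisearch``, ``sketch_wcwidth`` and ``sketch_wcswidth`` are illustrative stand-ins that use only the standard library.

```python
import unicodedata

# Hypothetical, heavily truncated stand-in for table_comb.NONZERO_COMBINING:
# sorted, non-overlapping (first, last) code point ranges.
SAMPLE_COMBINING = (
    (0x0300, 0x034e),  # a run of combining diacritical marks
    (0x05bf, 0x05bf),  # Hebrew Point Rafe, the character used in test_core.py
)


def bisearch(ucs, table):
    """Binary search: True if code point ``ucs`` falls within any range."""
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if ucs > table[mid][1]:
            lo = mid + 1
        elif ucs < table[mid][0]:
            hi = mid - 1
        else:
            return True
    return False


def sketch_wcwidth(char):
    """Toy per-character width: 0 for NULL, -1 for combining or control,
    2 for East Asian Wide/Fullwidth, 1 otherwise."""
    ucs = ord(char)
    if ucs == 0:
        return 0
    if bisearch(ucs, SAMPLE_COMBINING) or ucs < 32 or 0x7f <= ucs < 0xa0:
        return -1
    if unicodedata.east_asian_width(char) in ('W', 'F'):
        return 2
    return 1


def sketch_wcswidth(text):
    """Sum the widths, skipping combining characters instead of failing."""
    width = 0
    for char in text:
        wcw = sketch_wcwidth(char)
        if wcw < 0:
            if bisearch(ord(char), SAMPLE_COMBINING):
                continue  # combining: occupies no cell of its own
            return -1     # any other indeterminate character
        width += wcw
    return width
```

With this sketch, the phrase from the updated test, ``u'--\u05bf--'``, still reports ``-1`` for the combining character on its own, while the string as a whole sums to 4; a string containing a control character such as ``\x1b`` still yields ``-1``.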