andjc
  • Melbourne, Australia

Pinned

  1. enabling-languages/python-i18n Public

    Random notes on Python internationalisation

    Jupyter Notebook 18

  2. enabling-languages/library-i18n Public

    Exploration of internationalisation issues for libraries.

    Jupyter Notebook 1

  3. Grapheme tokenisation in Python

    # Grapheme tokenisation in Python

    When working with tokenisation and break iterators, it is sometimes necessary to work at the character, syllable, line, or sentence level. Character-level tokenisation is an interesting case. By character, I mean a user-perceived unit of text, which the Unicode standard would refer to as a grapheme. The usual way I see developers handle character-level tokenisation of English is with a list comprehension or by casting a string to a list.

  4. enabling-languages/dinka Public

    Dinka language resources

    JavaScript 1

  5. enabling-languages/nuer Public

    Nuer language resources

    Rich Text Format 1

  6. enabling-languages/australian_indigenous Public

    Keyboard layouts and web support for Aboriginal and Torres Strait Islander languages

    4
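
The "Grapheme tokenisation in Python" preview pinned above mentions two common ways of splitting English text into characters: a list comprehension and casting a string to a list. The gist's own code is cut off in the preview, so what follows is only a minimal sketch of those two idioms, not the author's code. The function names `chars_by_cast` and `chars_by_comprehension` are hypothetical, chosen for illustration. The last lines show that both idioms split on code points, so a letter followed by a combining mark comes back as two items even though a reader perceives one character.

```python
import unicodedata


def chars_by_cast(text: str) -> list[str]:
    """Split a string into code points by casting it to a list."""
    return list(text)


def chars_by_comprehension(text: str) -> list[str]:
    """Split a string into code points with a list comprehension."""
    return [ch for ch in text]


# For plain English text, both idioms give one item per visible character.
english = "token"
print(chars_by_cast(english))           # ['t', 'o', 'k', 'e', 'n']
print(chars_by_comprehension(english))  # ['t', 'o', 'k', 'e', 'n']

# 'e' followed by U+0301 is displayed as a single accented letter,
# but a Python str iterates over code points, so the list has two items.
decomposed = "e\u0301"
print(len(chars_by_cast(decomposed)))                # 2
print([unicodedata.name(ch) for ch in decomposed])
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
```

Splitting on user-perceived characters instead requires grapheme cluster segmentation, which Python's standard library does not provide out of the box; a third-party library would be needed for that step.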