-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
3e0ac5c
commit 81dcf90
Showing
3 changed files
with
96 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
====================== | ||
Python String Encoding | ||
====================== | ||
|
||
The Python developer community has published a great article that covers the | ||
details of unicode character processing. | ||
|
||
- Python 3: https://docs.python.org/3/howto/unicode.html | ||
- Python 2: https://docs.python.org/2/howto/unicode.html | ||
|
||
The following notes are intended to help answer some common questions and issues | ||
that developers frequently encounter while learning to properly work with different | ||
character encodings in Python. | ||
|
||
Does ChatterBot handle non-ascii characters? | ||
============================================ | ||
|
||
ChatterBot is able to handle unicode values correctly. You can pass it | ||
non-encoded data and it should be able to process it properly | ||
(you will need to make sure that you decode the output that is returned). | ||
|
||
Bellow is one of ChatterBot's tests from `tests/test_chatbot.py`_, | ||
this is just a simple check that a unicode response can be processed. | ||
|
||
.. code-block:: python | ||
def test_get_response_unicode(self): | ||
""" | ||
Test the case that a unicode string is passed in. | ||
""" | ||
response = self.chatbot.get_response(u'سلام') | ||
self.assertGreater(len(response.text), 0) | ||
This test passes in both Python 2.7 and 3.x. It also verifies that | ||
ChatterBot *can* take unicode input without issue. | ||
|
||
Fixing encoding errors | ||
====================== | ||
|
||
When working with string type data in Python, it is possible to encounter errors | ||
such as the following. | ||
|
||
.. code-block:: text | ||
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 48: invalid start byte | ||
Depending on what your code looks like, there are a few things that you can do | ||
to prevent errors like this. | ||
|
||
Unicode header | ||
-------------- | ||
|
||
.. code-block:: python | ||
# -*- coding: utf-8 -*- | ||
When to use | ||
+++++++++++ | ||
|
||
If your strings use escaped unicode characters (they looks like :code:`u'\u00b0C'`) then | ||
you do not need add the header. If your strings like :code:`'ØÆÅ'` then you are required | ||
to use the header. | ||
|
||
If you are using this header it must be the first line in your Python file. | ||
|
||
Unicode escape characters | ||
------------------------- | ||
|
||
.. code-block:: text | ||
>>> print u'\u0420\u043e\u0441\u0441\u0438\u044f' | ||
Россия | ||
When to use | ||
+++++++++++ | ||
|
||
Prefix your strings with the unicode escape character :code:`u'...'` when you are | ||
using excaped unicode characters. | ||
|
||
Import unicode literals from future | ||
----------------------------------- | ||
|
||
.. code-block:: python | ||
from __future__ import unicode_literals | ||
When to use | ||
+++++++++++ | ||
|
||
Use this when you need to make sure that Python 3 code also works in Python 2. | ||
|
||
A good article on this can be found here: http://python-future.org/unicode_literals.html | ||
|
||
.. _`tests/test_chatbot.py`: https://github.com/gunthercox/ChatterBot/blob/master/tests/test_chatbot.py |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -70,6 +70,7 @@ Contents: | |
utils | ||
django/index | ||
testing | ||
encoding | ||
|
||
Report an Issue | ||
=============== | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters