yaml.load does not support encodings other than the current system encoding; could you add support for this? #123

Closed
Felix-neko opened this issue Jan 29, 2018 · 8 comments

@Felix-neko
Felix-neko commented Jan 29, 2018

Hi folks!

We are trying to use PyYAML on Windows with UTF-8 YAML files. Alas, yaml.load raises an error: it does not support encodings other than the system one (on our Windows machines, CP-1251). Could you add a feature to manually set the encoding of the YAML file?

The traceback, if needed:

Traceback (most recent call last):
  File "D:/Projects/bricks2/main.py", line 45, in <module>
    main_wnd.load_components()
  File "D:\Projects\bricks2\bricks\gui\main_wnd.py", line 286, in load_components
    self.registry.load()
  File "D:\Projects\bricks_cli\bricks_cli\registry.py", line 38, in load
    self._load_config(root_node, config)
  File "D:\Projects\bricks_cli\bricks_cli\registry.py", line 44, in _load_config
    config_obj = yaml.load(open(config, 'r'))
  File "C:\Python35\lib\site-packages\yaml\__init__.py", line 73, in load
    loader = Loader(stream)
  File "C:\Python35\lib\site-packages\yaml\loader.py", line 24, in __init__
    Reader.__init__(self, stream)
  File "C:\Python35\lib\site-packages\yaml\reader.py", line 85, in __init__
    self.determine_encoding()
  File "C:\Python35\lib\site-packages\yaml\reader.py", line 124, in determine_encoding
    self.update_raw()
  File "C:\Python35\lib\site-packages\yaml\reader.py", line 178, in update_raw
    data = self.stream.read(size)
  File "C:\Python35\lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 2574: character maps to <undefined>
@TormodLandet
yaml.load() takes an open file object, so you must set the encoding when you open the file. This has nothing to do with PyYAML. Your code contains

config_obj = yaml.load(open(config, 'r'))

I would suggest changing this to

# Decode the file as UTF-8 regardless of the Windows locale encoding.
with open(config, 'rt', encoding='utf8') as yml:
    config_obj = yaml.load(yml, Loader=yaml.SafeLoader)

PS: I did not test this code, but it (or something close to it) should work on Python 3. If you are still on Python 2, you can import codecs and use codecs.open.

I suggest closing this issue.
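A minimal, self-contained sketch of the suggestion above, assuming Python 3 and a recent PyYAML. PyYAML 6.0 and later require an explicit Loader for yaml.load, so yaml.safe_load is used here. The file name and its contents are made up for illustration. The second option relies on documented PyYAML behaviour: when given a binary stream, PyYAML checks for a UTF-16 byte order mark and otherwise assumes UTF-8, so the system locale never gets involved.

```python
import os
import tempfile

import yaml

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative UTF-8 YAML file containing Cyrillic text.
    path = os.path.join(tmp, "config.yaml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("title: Привет\n")

    # Option 1: decode explicitly as UTF-8, regardless of the system locale
    # (e.g. CP-1251 on a Russian-locale Windows machine).
    with open(path, "r", encoding="utf-8") as yml:
        config_text_mode = yaml.safe_load(yml)

    # Option 2: open in binary mode and let PyYAML pick the encoding:
    # it looks for a UTF-16 byte order mark and otherwise assumes UTF-8.
    with open(path, "rb") as yml:
        config_binary_mode = yaml.safe_load(yml)

print(config_text_mode)  # {'title': 'Привет'}
print(config_text_mode == config_binary_mode)  # True
```

Either way, the decoding is decided by the caller or by PyYAML's own detection, never by the Windows code page.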

lawvs added a commit to lawvs/rssbot that referenced this issue Jan 21, 2019
@aliceinwire
aliceinwire commented Jan 30, 2019

The 'rt' mode does not need to be specified explicitly, since 'r' (read) and 't' (text) are the defaults.
https://docs.python.org/3/library/functions.html#open

@perlpunk
Member

@Felix-neko, if @TormodLandet's answer does not resolve your question, please reopen.

@NostraDavid
In case anyone finds this thread thinking PyYAML is the problem:

Run Python with the -X utf8 option: python -X utf8 .\script.py should do the trick.

In my case it was just Windows being poopy: I even used encoding='utf8' in my open(), but stupid Windows kept using cp1252.py, which caused a UnicodeEncodeError :/
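A small diagnostic sketch, assuming Python 3.7 or later (where UTF-8 Mode from PEP 540 exists). It reports whether UTF-8 Mode is on and which encodings open() and print() fall back to when no encoding is given. UTF-8 Mode is enabled with python -X utf8 script.py or by setting the environment variable PYTHONUTF8=1. The helper name encoding_report is hypothetical.

```python
import locale
import sys


def encoding_report():
    """Summarize the default encodings this interpreter will use."""
    return {
        # 1 when started with -X utf8 or PYTHONUTF8=1, otherwise 0.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding open() uses when no encoding= argument is passed.
        # In UTF-8 Mode this is UTF-8; otherwise it follows the locale
        # (for example cp1252 or cp1251 on many Windows setups).
        "default_open_encoding": locale.getpreferredencoding(False),
        # Encoding used when printing; a mismatch here is a common source
        # of UnicodeEncodeError on Windows.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```

Running it once with and once without -X utf8 shows whether the flag actually changes the defaults on a given machine.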

@mohamadmansourX
I would suggest to change this to

with open(config, 'rt', encoding='utf8') as yml:
    config_obj = yaml.load(yml)

If the YAML file contains !!python/tuple, I can't apply the UTF-8 approach anymore:

~\anaconda3\lib\site-packages\yaml\constructor.py in construct_undefined(self, node)
    425 
    426     def construct_undefined(self, node):
--> 427         raise ConstructorError(None, None,
    428                 "could not determine a constructor for the tag %r" % node.tag,
    429                 node.start_mark)

ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/tuple'
  in "tmp.yaml", line 4, column 5

Any suggestions?
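The ConstructorError above is not caused by the encoding. It most likely appears because the file is read with the safe loader (yaml.safe_load or yaml.SafeLoader), which deliberately has no constructor for Python-specific tags such as !!python/tuple. A minimal sketch of one workaround, assuming you want to keep the safe loader: register a tuple constructor on a SafeLoader subclass. The names TupleSafeLoader and construct_python_tuple are hypothetical, and the YAML snippet is made up.

```python
import yaml


class TupleSafeLoader(yaml.SafeLoader):
    """SafeLoader that additionally understands !!python/tuple."""


def construct_python_tuple(loader, node):
    # A !!python/tuple node is a YAML sequence; build a tuple from its items.
    return tuple(loader.construct_sequence(node))


# add_constructor on the subclass leaves yaml.SafeLoader itself unchanged.
TupleSafeLoader.add_constructor(
    "tag:yaml.org,2002:python/tuple", construct_python_tuple
)

text = "point: !!python/tuple [1, 2]\n"
data = yaml.load(text, Loader=TupleSafeLoader)
print(data)  # {'point': (1, 2)}
```

The same loader works with a file opened as open(path, encoding='utf-8'). Passing Loader=yaml.UnsafeLoader would also accept the tag, but it can construct arbitrary Python objects, so it should only be used with trusted files.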

@joaotextor
joaotextor commented Apr 25, 2023

In case anyone finds this thread, thinking PyYaml is the problem:

Run python with the -X utf8 option. python -X utf8 .\script.py should do the trick.

It's just Windows being poopy, in my case, as I even used encoding='utf8' in my open(). Stupid Windows kept using cp1252.py, which caused a UnicodeEncodeError :/

Thanks, that solved my problem with accented characters coming out garbled :-)

@giuliohome
giuliohome commented May 20, 2024

This is not the correct answer, however.
Python on Windows does use UTF-8 if you open the file with that encoding.
The issue arises when the file itself uses an encoding other than UTF-8. The correct question is

@giuliohome
giuliohome commented May 20, 2024

The correct answer is that the YAML specification itself does not support encodings such as CP-1252 or CP-1251 (it requires processors to read Unicode encodings: UTF-8 and UTF-16, plus UTF-32 in YAML 1.2), so this is not an issue with PyYAML.
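If a YAML file really was saved in a legacy code page, one workaround consistent with this answer is to decode the bytes yourself and pass PyYAML a str. A minimal sketch, assuming the file is known to be CP-1251; the bytes are produced in memory here instead of being read from disk, and the content is made up.

```python
import yaml

# Illustrative data: YAML text saved in CP-1251 (Windows Cyrillic) instead of UTF-8.
legacy_bytes = "name: Привет\n".encode("cp1251")

# Decode with the file's real code page first, then hand PyYAML a str.
data = yaml.safe_load(legacy_bytes.decode("cp1251"))
print(data)  # {'name': 'Привет'}

# Given raw bytes without a BOM, PyYAML assumes UTF-8, so these bytes are rejected.
try:
    yaml.safe_load(legacy_bytes)
    raw_bytes_rejected = False
except yaml.YAMLError:  # the decode failure surfaces as yaml.reader.ReaderError
    raw_bytes_rejected = True
print(raw_bytes_rejected)  # True
```

Re-saving the file as UTF-8 once is usually simpler than decoding it on every load.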

What PyYAML could do is implement a custom check for invalid string delimiters like curly quotes, which are valid UTF-8 characters but not valid YAML string delimiters. This issue, highlighted in #800, can result in exceptions like UnicodeDecodeError when the YAML file is not opened with UTF-8 encoding on Windows. However, in some contexts the exception may be preferable to silently accepting YAML content that contains these erroneous curly quotes.
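The proposed check is not part of PyYAML, but its idea can be sketched as a standalone pre-load scan. This is a heuristic under stated assumptions: the helper find_curly_delimiters is hypothetical, and it only flags a curly quote that opens a value (right after "key:" or a "- " list marker), where it most likely was meant as a string delimiter. Curly quotes inside ordinary text are left alone.

```python
import re

# Curly (typographic) quotes often pasted in by word processors: ‘ ’ “ ”
CURLY_QUOTES = "\u2018\u2019\u201c\u201d"

# A curly quote directly after "key: " or a "- " list marker.
SUSPICIOUS = re.compile(r"(?:^\s*-\s+|:\s+)([" + CURLY_QUOTES + r"])")


def find_curly_delimiters(text):
    """Return 1-based (line, column) pairs of curly quotes used like delimiters."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SUSPICIOUS.finditer(line):
            hits.append((lineno, match.start(1) + 1))
    return hits


sample = "title: “Hello”\nnote: it’s fine\n- ‘item’\n"
print(find_curly_delimiters(sample))  # [(1, 8), (3, 3)]
```

A caller could run this on the decoded text and refuse to load (or warn) when it returns anything, which matches the trade-off described above of preferring an error over silently wrong content.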
