This repository was archived by the owner on Sep 14, 2018. It is now read-only.

ElementTree does not handle UTF-8 encoding #1127

@ironpythonbot

>>> import tempfile
>>> from xml.etree.ElementTree import ElementTree
>>> xml = '\n\n'
>>> with tempfile.TemporaryFile() as f:
...     f.write(bytes(xml, 'utf-8'))  # use xml.encode('utf-8') in CPython 2.7
...     f.flush()
...     f.seek(0)
...     tree = ElementTree(file=f)
...     name = next(tree.iter()).get('name')
...     print(repr(name))
...     assert name == unichr(169)
...
u'\xc2\xa9'
Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
AssertionError

unichr(169) is the copyright sign "©", which is encoded in UTF-8 as b'\xc2\xa9'. ElementTree does not decode this two-byte sequence as a single character; each byte is interpreted as a separate character, so the attribute value comes back as u'\xc2\xa9' instead of u'\xa9'.
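
For comparison, here is a minimal sketch of the expected behaviour, written for CPython 3 (an assumption; the report itself targets IronPython 2.7). The XML document is a hypothetical stand-in with a single `name` attribute, since the literal in the session above did not survive intact, and `io.BytesIO` replaces the temporary file purely for brevity. The last lines reproduce the reported symptom by hand: reading each UTF-8 byte as its own character.

```python
import io
from xml.etree.ElementTree import ElementTree

# Hypothetical document: not the reporter's original XML, just a stand-in
# whose root element carries the copyright sign in a `name` attribute.
xml = '<?xml version="1.0" encoding="UTF-8"?>\n<root name="\u00a9"/>\n'
data = xml.encode('utf-8')

# Parse from a binary file-like object, as the report does with a temp file.
tree = ElementTree(file=io.BytesIO(data))
name = next(tree.iter()).get('name')

expected = chr(169)                      # unichr(169) on Python 2
utf8_bytes = expected.encode('utf-8')    # the two bytes 0xC2 0xA9

# What the buggy result looks like: each byte treated as a separate
# character (equivalent to decoding the UTF-8 bytes as Latin-1).
mojibake = utf8_bytes.decode('latin-1')

print(repr(name), repr(mojibake))
```

A correct parser returns a one-character string for `name`, while the mis-decoded value has two characters, which is what the failing assertion in the report detects.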

Work Item Details

Original CodePlex Issue: Issue 35635
Status: Proposed
Reason Closed: Unassigned
Assigned to: Unassigned
Reported on: Oct 21 at 4:23 AM
Reported by: ysitu
Updated on: Nov 7 at 2:43 PM
Updated by: tcalmant
