ElementTree does not handle UTF-8 encoding

    >>> import tempfile
    >>> from xml.etree.ElementTree import ElementTree
    >>> xml = '<?xml version="1.0" encoding="UTF-8"?>\n<test name="' + unichr(169) + '"/>\n'
    >>> with tempfile.TemporaryFile() as f:
    ...     f.write(bytes(xml, 'utf-8'))  # use xml.encode('utf-8') in CPython 2.7
    ...     f.flush()
    ...     f.seek(0)
    ...     tree = ElementTree(file=f)
    ...     name = next(tree.iter()).get('name')
    ...     print(repr(name))
    ...     assert name == unichr(169)
    ...
    u'\xc2\xa9'
    Traceback (most recent call last):
      File "<stdin>", line 8, in <module>
    AssertionError
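
For comparison, here is a minimal sketch of the same check on an interpreter whose `ElementTree` honours the declared encoding. It assumes CPython 3, so `chr` replaces `unichr`, and `io.BytesIO` stands in for the temporary file; neither substitution comes from the original report.

```python
import io
from xml.etree.ElementTree import ElementTree

copyright_sign = chr(169)  # U+00A9; Python 3 spelling of unichr(169)
xml = ('<?xml version="1.0" encoding="UTF-8"?>\n<test name="'
       + copyright_sign + '"/>\n')

# Assumption: an in-memory byte stream replaces the report's temporary file.
tree = ElementTree(file=io.BytesIO(xml.encode('utf-8')))
name = tree.getroot().get('name')

# The attribute should round-trip as a single character.
assert name == copyright_sign
```

Here the assertion is expected to pass, which is the behaviour the report asks for.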

`unichr(169)` is the copyright sign "©", which is encoded in UTF-8 as `b'\xc2\xa9'`. `ElementTree` ignores the two-byte encoding and interprets each byte as a separate character, so `name` comes back as `u'\xc2\xa9'` instead of `u'\xa9'`.

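
The observed value can be reproduced without any XML parsing. The sketch below assumes the bytes are being decoded one at a time as if they were Latin-1; the report does not say which decoding is actually applied, so Latin-1 is used only because it maps each byte to the code point with the same value.

```python
copyright_sign = u'\u00a9'  # same character as unichr(169)

# UTF-8 encodes U+00A9 as two bytes.
encoded = copyright_sign.encode('utf-8')
assert encoded == b'\xc2\xa9'

# Assumption: decoding byte-by-byte (Latin-1 here) yields two characters,
# matching the u'\xc2\xa9' printed in the report.
misdecoded = encoded.decode('latin-1')
assert misdecoded == u'\xc2\xa9'
assert len(misdecoded) == 2

# Decoding with the declared encoding recovers the original character.
assert encoded.decode('utf-8') == copyright_sign
```
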
## Work Item Details

**Original CodePlex Issue:** [Issue 35635](http://ironpython.codeplex.com/workitem/35635)
**Status:** Proposed
**Reason Closed:** Unassigned
**Assigned to:** Unassigned
**Reported on:** Oct 21 at 4:23 AM
**Reported by:** ysitu
**Updated on:** Nov 7 at 2:43 PM
**Updated by:** tcalmant

