Commit
Fix encoding bug in RSC image character handling - fixes #8
mcs07 committed Oct 13, 2016
1 parent 9210065 commit 020cc21
Showing 3 changed files with 453 additions and 2 deletions.
4 changes: 2 additions & 2 deletions chemdataextractor/scrape/pub/rsc.py
@@ -273,9 +273,9 @@ def replace_rsc_img_chars(document):
 if not u2 and u1 in RSC_IMG_CHARS:
 rep = RSC_IMG_CHARS[u1]
 else:
-rep = (b'\u%s' % u1).decode('unicode-escape')
+rep = (b'\u%s' % u1.encode()).decode('unicode-escape')
 if u2:
-rep += (b'\u%s' % u2).decode('unicode-escape')
+rep += (b'\u%s' % u2.encode()).decode('unicode-escape')
 if img.tail is not None:
 rep += img.tail # Make sure we don't remove any tail text
 parent = img.getparent()
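
The change above can be reproduced in isolation. The following is a minimal sketch, assuming `u1` and `u2` are `str` values holding four hex digits each (the diff does not show where they come from). On Python 3, `%`-formatting into a bytes pattern only accepts bytes-like arguments, so passing a `str` raises `TypeError`, which is presumably the failure this commit addresses. The helper names `hex_to_char` and `hex_to_char_alt` are hypothetical and are not part of chemdataextractor.

```python
def hex_to_char(hex_code):
    """Turn a 4-digit hex string such as '03b1' into the character it names.

    Hypothetical helper mirroring the expression used in the commit.
    """
    # Encode the str first: bytes %-formatting (Python 3.5+) rejects str arguments.
    escaped = b'\\u%s' % hex_code.encode('ascii')
    # escaped is now the six bytes \u03b1, which unicode-escape decodes to one character.
    return escaped.decode('unicode-escape')


def hex_to_char_alt(hex_code):
    """Alternative that avoids bytes entirely (not what the commit does)."""
    return chr(int(hex_code, 16))


# The unpatched form fails on Python 3 because '03b1' is str, not bytes.
try:
    b'\\u%s' % '03b1'
    unpatched_failed = False
except TypeError:
    unpatched_failed = True

assert unpatched_failed
assert hex_to_char('03b1') == '\u03b1'  # GREEK SMALL LETTER ALPHA
assert hex_to_char_alt('03b1') == hex_to_char('03b1')
```

The `chr(int(..., 16))` variant gives the same result for a single code point and does not depend on the bytes and str distinction at all, so it may be a simpler option if the input format is known to be plain hex.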