UnicodeDecodeError #36

Closed
tw4l opened this issue May 9, 2018 · 8 comments

Comments

@tw4l
Owner

tw4l commented May 9, 2018

brunnhilde_error

Need to add error handling around Unicode decoding.

@bdietz

bdietz commented Jan 24, 2022

Hi Tessa, I believe I ran into a related (or the same) error today. Here's the output:

Traceback (most recent call last):
  File "/Users/[user]/anaconda3/bin/brunnhilde.py", line 1387, in <module>
    main()
  File "/Users/[user]/anaconda3/bin/brunnhilde.py", line 1362, in main
    args, source, cursor, conn, html, siegfried_version, use_hash, ssn_mode,
  File "/Users/[user]/anaconda3/bin/brunnhilde.py", line 949, in process_content
    use_hash = import_csv(cursor, conn, use_hash)
  File "/Users/[user]/anaconda3/bin/brunnhilde.py", line 295, in import_csv
    for row in reader:
  File "/Users/[user]/anaconda3/bin/brunnhilde.py", line 285, in <genexpr>
    x.replace("\0", "") for x in f
  File "/Users/[user]/anaconda3/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 6398: invalid start byte

I ran the error past Dianne D., who said, "My hunch is that it's failing again in the except block," and pointed to these lines:

except UnicodeDecodeError:

x.replace("\0", "") for x in f

But she also said she wasn't entirely sure.

We did test brunnhilde against another directory of files and it worked as expected.

We're using OSX 12.1 and managing brunnhilde via pip.

Thanks!
Brian
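
For anyone hitting the same traceback, here is a minimal sketch of why the error surfaces at `for row in reader:` rather than where the file is opened. It is an illustration under assumptions, not brunnhilde's actual code: `read_rows` is a hypothetical stand-in for `import_csv`, and the CSV contents are invented. A text-mode file decodes lazily, so a generator expression such as `x.replace("\0", "") for x in f` does no decoding until the CSV reader pulls rows from it.

```python
import csv
import os
import tempfile

# Write a small CSV containing a byte (0x83) that is not valid UTF-8.
# The contents are made up for this example.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"filename,format\ngood.txt,Plain Text\nbad\x83.txt,Plain Text\n")
    path = tmp.name


def read_rows(csv_path):
    """Hypothetical stand-in for import_csv; not brunnhilde's real code."""
    with open(csv_path, "r", encoding="utf-8") as f:
        # Building the generator does not read or decode anything yet.
        cleaned = (x.replace("\0", "") for x in f)
        reader = csv.reader(cleaned)
        # Decoding happens here, as the reader pulls lines from the file.
        return [row for row in reader]


try:
    rows = read_rows(path)
    error = None
except UnicodeDecodeError as exc:
    rows = None
    error = exc
finally:
    os.remove(path)

print(type(error).__name__, error)
```

Because the decode step runs during iteration, any recovery code that rebuilds the same generator over a file opened with strict UTF-8 decoding would raise the same exception again on the next pass.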

@tw4l
Owner Author

tw4l commented Feb 1, 2022

Thanks @bdietz! I think Dianne's right that the except block there isn't working as intended. Will take a look shortly! If you could share the offending CSV file by email, that would be helpful!

@bdietz

bdietz commented Feb 1, 2022

Yeah, for sure. Thank you @tw4l. I'm sorry to ask, but do you want the siegfried.csv file? IIRC, that's the point at which brunnhilde was failing, after siegfried wrapped up.

@tw4l
Owner Author

tw4l commented Feb 1, 2022

@bdietz yes exactly, and thanks for clarifying!

@tw4l
Owner Author

tw4l commented Feb 6, 2022

Hey Brian, I have a fix in the development branch that uses Python's errors argument with the "ignore" setting, which will silently skip characters that would otherwise throw a UnicodeDecodeError when reading the Siegfried CSV. I've tested that this resolves the issue with the Siegfried CSV file you emailed me. I'm just running the tests now and then will cut a 1.9.4 patch release with the fix and push it to PyPI this week :)
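
As a rough sketch of what the `errors` argument does (the exact change in the development branch may look different), the example below decodes invented bytes that contain a stray 0x83 and a NUL. `io.TextIOWrapper` takes the same `encoding` and `errors` arguments as the built-in `open()`, so it stands in for opening the Siegfried CSV from disk; the sample row is an assumption made for illustration.

```python
import csv
import io

# Made-up bytes resembling a CSV row, with an invalid 0x83 byte and a NUL.
raw = b"filename,format\nbad\x83na\x00me.txt,Plain Text\n"

# errors="ignore" silently drops bytes that cannot be decoded as UTF-8
# instead of raising UnicodeDecodeError.
text_stream = io.TextIOWrapper(
    io.BytesIO(raw), encoding="utf-8", errors="ignore", newline=""
)

# NUL characters are still stripped separately before CSV parsing.
rows = list(csv.reader(x.replace("\0", "") for x in text_stream))
print(rows)

# For comparison, errors="replace" keeps a visible marker (U+FFFD)
# where the undecodable byte was.
replaced = raw.decode("utf-8", errors="replace")
print("\ufffd" in replaced)
```

The trade-off with `"ignore"` is that the dropped bytes leave no trace, so a filename in the report can differ slightly from the name on disk; `"replace"` makes the loss visible at the cost of a substitute character in the output.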

@tw4l
Owner Author

tw4l commented Feb 6, 2022

Thanks for the detailed bug report and the nudge to get this fixed!

@bdietz

bdietz commented Feb 7, 2022

Awesome. Thank you!

@tw4l
Owner Author

tw4l commented Feb 7, 2022

Fixed in commit 09defef. The fix is now released to PyPI so sudo pip3 install --upgrade brunnhilde should fix it for you!

tw4l closed this as completed Feb 7, 2022