utf8 encoding bug in Finding Places in Text #2783
Comments
Thank you for letting us know about this, @srappel! I will test this. If I encounter the same error, I'll check whether the workaround you suggest solves it on my side too. Many thanks,
I am able to work through the steps of the lesson on Google Colab, including step 6.1, and can successfully load and print the gazetteer there. Here's the link to the notebook I worked in. I also tested these steps on my Mac, running macOS Big Sur and Python 3.10. I wrote my code in Atom and ran it from the command line. I was able to tokenise the sentence Berlin ist eine Stadt in Deutschland, but then ran into a few unexpected problems at step 6.1 too.
I tried your suggestion of adjusting the line
I wonder if @hawc2 might be able to advise? One of the authors is Andy Janco, whom we might be able to reach out to with this question. -- Notes to myself: This Stack Overflow post could be useful, but I don't understand it well enough to implement it at the moment. This comment I found on a Python community GitHub repository could also be relevant. Looking at the traceback errors I got, I think this could be the key line:
@anisa-hawes the proposed solution by @srappel makes sense to me. It's pretty standard practice to specify the encoding when calling the read function. I believe the error you're running into, Anisa, just relates to the gazetteer.txt file not being locatable by the Python script you're running, and is probably not something to worry about for updating the tutorial. It sounds like a quick fix to me: just add the utf-8 encoding to the line of code
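For anyone hitting the file-location problem described above, here is a minimal sketch of a loader that fails with a clearer message when the file is not where the script looks for it. The function name `load_gazetteer` and its `base_dir` parameter are hypothetical and not part of the lesson; the sketch assumes the file is expected in the current working directory unless told otherwise.

```python
from pathlib import Path


def load_gazetteer(filename="gazetteer.txt", base_dir=None):
    """Read a gazetteer file as UTF-8, reporting where it was looked for.

    `load_gazetteer` and `base_dir` are illustrative names, not lesson code.
    """
    # A relative filename is resolved against the current working directory,
    # which is not necessarily the folder that contains the script.
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = base / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"Could not find {path} (current working directory: {Path.cwd()})"
        )
    # Name the encoding explicitly instead of relying on the platform default.
    return path.read_text(encoding="utf-8")
```

Calling `load_gazetteer()` from the folder that holds gazetteer.txt should behave like the lesson's original line, with the encoding made explicit.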
I'm reporting a bug and potential workaround in Finding Places in Text with the World Historical Gazetteer.
The specific part of the lesson is Section 6.1
The following line of Python code from the lesson produces a UnicodeDecodeError for me:
gazetteer = Path("gazetteer.txt").read_text()
But by simply specifying UTF-8 encoding, as in:
gazetteer = Path("gazetteer.txt").read_text('utf8')
the error is then resolved.
I'm using a Windows PC and Python 3.9 (Anaconda).
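A likely explanation is that `Path.read_text()` without an `encoding` argument falls back to the platform's default locale encoding, which on many Windows setups is not UTF-8. The sketch below reproduces that situation on any operating system. It assumes the default in question is cp1252, which is common on Windows but not confirmed for this machine, and it uses a few made-up place names in a temporary file instead of the lesson's real gazetteer.txt.

```python
import tempfile
from pathlib import Path

# Hypothetical sample data standing in for the lesson's gazetteer.txt.
SAMPLE_PLACES = "Köln\nMünchen\nŁódź\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "gazetteer.txt"
    path.write_text(SAMPLE_PLACES, encoding="utf-8")

    # Simulate a non-UTF-8 platform default (assumed here to be cp1252).
    try:
        path.read_text(encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Naming the encoding gives the same result on every platform.
    gazetteer = path.read_text(encoding="utf-8")

print("cp1252 decoding failed:", cp1252_failed)
print(gazetteer)
```

Passing `'utf8'` positionally, as in the workaround, is equivalent, since `encoding` is the first parameter of `read_text`; the keyword form just makes the intent easier to see.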