Error in Downloading Web Pages in Python #1611

Closed
svmelton opened this issue Jan 7, 2020 · 16 comments
@svmelton
Contributor

svmelton commented Jan 7, 2020

A reader reports that there's an error in this function:

f = open('obo-t17800628-33.html', 'w')

He notes that this will fix the error:
f = open('obo-t17800628-33.html', 'wb')
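
The thread does not quote the error message, so the following is only a sketch of one likely cause. It assumes the lesson writes the `bytes` object returned by `urllib.request.urlopen(url).read()`; the `web_content` value here is a hypothetical stand-in so the example runs without a network connection.

```python
import os
import tempfile

# Hypothetical stand-in for the bytes that urlopen(url).read() would return.
web_content = b"<html><body>Old Bailey trial text</body></html>\n"

# Write into a temporary directory so the example leaves no stray files behind.
path = os.path.join(tempfile.mkdtemp(), "obo-t17800628-33.html")

# Text mode ('w') expects str, so writing bytes raises TypeError in Python 3.
text_mode_error = None
try:
    with open(path, "w") as f:
        f.write(web_content)
except TypeError as exc:
    text_mode_error = exc

# Binary mode ('wb') accepts bytes and writes them unchanged.
with open(path, "wb") as f:
    f.write(web_content)

with open(path, "rb") as f:
    saved = f.read()

print("text-mode write failed with:", type(text_mode_error).__name__)
print("binary round trip intact:", saved == web_content)
```

Using `with` also closes the file automatically, which avoids a forgotten `f.close()`.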

@acrymble

acrymble commented Jan 7, 2020

@mdlincoln
Contributor

Correct, it has to do with how Windows handles line endings, which is why this lesson might work just fine on macOS/Linux but would cause problems on Windows.
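
A minimal sketch of the line-ending behaviour mentioned above. In text mode, Python translates each `\n` on write to the platform's line separator, which is `\r\n` on Windows. Passing `newline="\r\n"` here is an assumption made for illustration: it forces that translation on any operating system so the effect can be seen without a Windows machine. The file name and sample text are hypothetical.

```python
import os
import tempfile

text = "line one\nline two\n"
path = os.path.join(tempfile.mkdtemp(), "helloworld.txt")

# Text mode with newline="\r\n" mimics what Windows does by default:
# every "\n" written is stored on disk as "\r\n".
with open(path, "w", newline="\r\n") as f:
    f.write(text)
with open(path, "rb") as f:
    raw_text_mode = f.read()

# Binary mode performs no translation: the bytes land on disk exactly as given.
with open(path, "wb") as f:
    f.write(text.encode("utf-8"))
with open(path, "rb") as f:
    raw_binary_mode = f.read()

print(raw_text_mode)    # shows the inserted carriage returns
print(raw_binary_mode)  # shows the original line feeds only
```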

@acrymble

acrymble commented Jan 8, 2020

@mdlincoln do you know if it will work fine on Linux and Mac if we add wb to all of them? Or is it something we'll have to keep separate?

@mdlincoln
Contributor

It should work fine.

@acrymble

acrymble commented Jan 13, 2020

This will have implications for a number of lessons that use the same functions:

https://programminghistorian.org/en/lessons/working-with-web-pages
https://programminghistorian.org/en/lessons/working-with-text-files
https://programminghistorian.org/en/lessons/output-data-as-html-file
https://programminghistorian.org/en/lessons/from-html-to-list-of-words-2
https://programminghistorian.org/en/lessons/creating-and-viewing-html-files-with-python
All of the obo.py syncing files used in the Python lessons

All Spanish / French translations of the above
Possibly also translations in progress (haven't checked)

@acrymble

@rivaquiroga @spapastamkou I have not checked the French or Spanish translations to see if this problem affects your lessons, but I would encourage you to do so.

acrymble assigned spapastamkou and rivaquiroga and unassigned acrymble Jan 20, 2020

@rivaquiroga
Member

@acrymble, I'll check them during the week.

@acrymble

The pull request fixing the English lessons is now complete and has been made live.

@spapastamkou
Contributor

@fdlaramee Can I count on your help for this? It is about this lesson: https://programminghistorian.org/fr/lecons/travailler-avec-des-fichiers-texte I will make any modification needed but I'll need some guidance, if you have the availability.

@fdlaramee
Contributor

@spapastamkou Sure; I will compare the English lesson's current version with our translation ASAP. Shouldn't take much more than a few minutes of work to bring the translation up to date.

@fdlaramee
Contributor

@spapastamkou OK, I compared the two lessons and the changes that should be performed are minimal:

  1. Remove the warning about Python 2.7 in the lesson header as the code has been updated to Python 3.
  2. The line of code f = open('helloworld.txt','w') should be replaced with f = open('helloworld.txt','wb'); that is, the 'w' parameter needs to become 'wb' to be compatible with Windows.
  3. A bit lower in the text, in "Le paramètre ‘w‘ spécifie...", you need to replace w with wb as well.

That's it. However...

Technically, a file opened with the 'wb' parameter is no longer a text file in Python but a binary data file. Which means that, also technically, it should be read not with an open statement that contains a 'r' parameter but with an 'rb'. And of course, we are also breaking the contract with the user by saying that we are teaching them to use text files and then writing, reading and appending binary files instead.

That may be splitting hairs and I haven't the foggiest idea of what current practice in Pythonworld is. As far as I know, Windows is the only system that still really treats text and binary differently under the hood. And FWIW, I have been using binary files to encode text since my days programming in BASIC on the Apple II in the mid 1980s so I don't think it's an important distinction at all, but some people might. What do you think @mdlincoln ? @ZoeLeBlanc ? Anyone else on the tech team?
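
The text/binary distinction raised above can be sketched as follows. This is an illustration under stated assumptions, not the lessons' actual code: the sample string and temporary path are hypothetical, and UTF-8 is assumed as the encoding. In Python 3 a file opened with `'wb'` accepts only `bytes`, so a `str` has to be encoded before writing. Reading with `'rb'` returns `bytes`, while reading with `'r'` returns `str`.

```python
import os
import tempfile

message = "Bonjour à tous\n"
path = os.path.join(tempfile.mkdtemp(), "helloworld.txt")

# Binary write: the str must be encoded to bytes first.
with open(path, "wb") as f:
    f.write(message.encode("utf-8"))

# Binary read: returns bytes, which must be decoded to get text back.
with open(path, "rb") as f:
    as_bytes = f.read()

# Text read: returns str. newline="" disables line-ending translation,
# and the explicit encoding avoids relying on the platform default.
with open(path, "r", encoding="utf-8", newline="") as f:
    as_text = f.read()

print(type(as_bytes).__name__, type(as_text).__name__)
print(as_bytes.decode("utf-8") == as_text == message)
```

An alternative that keeps the file a text file on every platform would be `open(path, "w", encoding="utf-8", newline="\n")`, which writes `str` directly and suppresses the Windows line-ending translation.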

@spapastamkou
Contributor

Thank you very much, @fdlaramee, for helping with this. I'll wait in case there are further comments on your remarks and then I'll make the modifications.

@rivaquiroga
Member

FYI, the pull request fixing the Spanish lessons has been merged.

@acrymble

@spapastamkou this fix should probably be implemented now, as it means we have a broken French lesson published.

@spapastamkou
Contributor

OK, I was not sure whether I could proceed. I'll work on this next week.

@spapastamkou
Contributor

Done. Thanks again for the help @fdlaramee.
