I'm trying to parse an Azerbaijani forum:
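```python
from grab import Grab

url = "http://www.disput.az/index.php?app=forums&module=forums&controller=topic&id=1051606"
g = Grab()
g.go(url)
# Select every <p> that is a direct child of an element with data-role="commentContent"
messages = g.doc.pyquery('[data-role="commentContent"]>p')
for message in messages:
    print(message.text_content())
```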
This code prints lots of garbled characters (the encoding is broken). Setting Grab's charset/document_charset has no effect.
I've found this fix:
```python
for message in messages:
    # Turn the mis-decoded text back into bytes, then decode those bytes as UTF-8
    print(str(message.text_content()).encode("iso-8859-1").decode())
```
But it's rather strange.
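The workaround is less strange than it looks if the page's UTF-8 bytes were decoded as ISO-8859-1 somewhere along the way (an assumption, not confirmed in this thread). ISO-8859-1 maps each of the 256 byte values to exactly one code point, so encoding the garbled text back to ISO-8859-1 recovers the original bytes, which can then be decoded as UTF-8. A minimal stdlib sketch; `garble` and `repair` are hypothetical helper names, not part of Grab:

```python
# Sample Azerbaijani text, including letters outside ISO-8859-1 ("ə", "ş", "ğ").
ORIGINAL = "Salam, dünya! Əla, şəhər, ağac."


def garble(text: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes wrongly decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(text: str) -> str:
    """Undo that mis-decoding: re-encode to the original bytes, then decode as UTF-8."""
    return text.encode("iso-8859-1").decode("utf-8")


garbled = garble(ORIGINAL)
print(garbled)          # mojibake, e.g. "Ã¼" in place of "ü"
print(repair(garbled))  # Salam, dünya! Əla, şəhər, ağac.
```

This repair only works when the text was mis-decoded as exactly ISO-8859-1. If it contains characters above U+00FF (as happens with a cp1252 mis-decoding, which produces characters such as "€"), `encode("iso-8859-1")` raises `UnicodeEncodeError`.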
If I do the same thing with requests, everything is OK:
```python
import requests

r = requests.get(url)
print(r.text)
```
It prints clean Azerbaijani text (HTML).
Is there a proper solution to this problem?
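One way to narrow this down is to decode the raw response body yourself, using the charset the server declares in its `Content-Type` header, and only then hand text to a parser. The sketch below uses only the standard library (`email.message.Message.get_content_charset`); `charset_from_content_type` and `decode_body` are hypothetical names, and the UTF-8 fallback is an assumption about this site rather than something stated here:

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Return the lower-cased charset parameter of a Content-Type value, or `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_body(body: bytes, content_type: str) -> str:
    """Decode a raw response body with the charset the server declared."""
    return body.decode(charset_from_content_type(content_type))


raw = "Azərbaycan dili".encode("utf-8")  # stands in for the raw HTTP body
print(decode_body(raw, "text/html; charset=UTF-8"))  # Azərbaycan dili
print(raw.decode("iso-8859-1"))  # garbled: what a wrong charset guess produces
```

An unknown charset name in the header would make `bytes.decode` raise `LookupError`, so real code may want to catch that and fall back.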
Info:
- Ubuntu 16.04
- Grab 0.6.38 (current)
- Python 3.5.2
Related commits:
- ee9b33a: Merge pull request #308 from lorien/issue_285_pyquery
- 8a68282: Fix #285: pyquery extension parses html incorrectly
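The fix referenced above concerns how the pyquery extension parsed the HTML. Independently of Grab, one way to keep a parser from guessing the charset is to give it text that has already been decoded. The sketch below approximates the selector `[data-role="commentContent"]>p` from the original snippet with the standard library's `html.parser`. `CommentTextExtractor` and `extract_comments` are hypothetical names, and the sketch assumes reasonably well-formed markup (every `<p>` explicitly closed):

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CommentTextExtractor(HTMLParser):
    """Collect the text of <p> elements that are direct children of an element
    with data-role="commentContent"."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []       # open elements as (tag, is_comment_container)
        self.messages = []    # finished paragraph texts
        self._parts = None    # text chunks of the <p> currently being collected
        self._p_depth = None  # stack depth of that <p>

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements are never closed, so keep them off the stack
        parent_is_container = bool(self.stack) and self.stack[-1][1]
        is_container = dict(attrs).get("data-role") == "commentContent"
        self.stack.append((tag, is_container))
        if tag == "p" and parent_is_container and self._parts is None:
            self._parts = []
            self._p_depth = len(self.stack)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        if tag == "p" and self._parts is not None and len(self.stack) == self._p_depth:
            self.messages.append("".join(self._parts).strip())
            self._parts = None
            self._p_depth = None
        self.stack.pop()

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)


def extract_comments(html_text: str) -> list:
    """Return the comment paragraphs found in already-decoded HTML text."""
    parser = CommentTextExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.messages
```

Because `HTMLParser.feed` accepts only `str`, any decoding has to happen before parsing, which makes charset mistakes easier to spot than when a parser receives raw bytes.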