
Fix: added functionality to make sure escaped characters stay escaped #912

Merged


ahmad-alkadri
Contributor

This PR is linked to issue #908, which shows that Whoogle renders HTML characters in search results unescaped, so they show up as actual HTML elements. Here's the screenshot referenced in the issue:

[screenshot from issue #908]

After investigating, I found the following:

  • the characters inside the <div> content tag of the search results (getbody.text in search.py) are already escaped, with "<" and ">" converted into "&lt;" and "&gt;", respectively
  • however, because getbody.text is then passed through bsoup (BeautifulSoup) several times, the escaped tag characters end up unescaped.
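To illustrate the mechanism described above, here is a minimal sketch that uses Python's standard-library html.parser as a stand-in for bsoup (BeautifulSoup's .text decodes entities in text nodes the same way). The TextCollector class and parse helper are hypothetical illustrations, not code from Whoogle.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical stand-in for bsoup: collects text nodes and start-tag names."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &lt; and &gt; in text
        super().__init__(convert_charrefs=True)
        self.text_parts: list[str] = []
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text_parts.append(data)


def parse(markup: str) -> TextCollector:
    """Feed markup through the parser and return the collected results."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser


# A result snippet whose text is already escaped, as on the search page.
first = parse("<div>Use &lt;b&gt; for bold</div>")
text = "".join(first.text_parts)  # entities are decoded at this point
print(text)                        # Use <b> for bold

# Wrapping that text in markup and parsing it again turns it into a real tag.
second = parse(f"<div>{text}</div>")
print(second.tags)                 # ['div', 'b']
```

The first pass is harmless on its own; the problem appears once the decoded text is treated as markup again, which is what happens when the response is loaded into bsoup repeatedly.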

To prevent this, I replaced "&lt;" and "&gt;" with "andlt;" and "andgt;", respectively. Because these placeholders contain no "&", bsoup does not treat them as entities, so they survive each time the 'response' object is loaded into bsoup (which happens several times between search.py and routes.py). Finally, just before the response object is passed to render_template in routes.py, I replace "andlt;" and "andgt;" back with "&lt;" and "&gt;".
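The placeholder round trip can be sketched as follows. This is a minimal illustration assuming hypothetical helpers protect_escapes and restore_escapes (not the actual function names in search.py or routes.py), with html.unescape standing in for the entity decoding that happens when bsoup parses the response.

```python
import html

# Placeholder tokens as described in this PR. They contain no "&",
# so an HTML parser finds no entity to decode in them.
_PLACEHOLDERS = {"&lt;": "andlt;", "&gt;": "andgt;"}


def protect_escapes(text: str) -> str:
    """Hypothetical helper: swap escaped angle brackets for inert placeholders."""
    for entity, placeholder in _PLACEHOLDERS.items():
        text = text.replace(entity, placeholder)
    return text


def restore_escapes(text: str) -> str:
    """Hypothetical helper: turn placeholders back into entities before rendering."""
    for entity, placeholder in _PLACEHOLDERS.items():
        text = text.replace(placeholder, entity)
    return text


if __name__ == "__main__":
    snippet = "Use &lt;div&gt; for layout"
    # Without protection, decoding turns the entities into real brackets.
    print(html.unescape(snippet))                     # Use <div> for layout
    protected = protect_escapes(snippet)
    # With protection, the placeholders pass through decoding untouched.
    print(html.unescape(protected))                   # Use andlt;divandgt; for layout
    print(restore_escapes(html.unescape(protected)))  # Use &lt;div&gt; for layout
```

One trade-off of this design: any result text that already contains "andlt;" or "andgt;" would be converted to "&lt;" or "&gt;" on the way out, although such strings are unlikely in normal search results.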

Here's a screenshot of a Whoogle search result after this fix:

[screenshot: Whoogle on localhost:5000, 2022-12-23 22:58:09]

Moved the cleaner functions to app/utils/escaper.py

Fixes

Removed unused import 're'

Moved the cleaner functionality into "search.py" and "routes.py"

Make sure escaped chars stay escaped during processing

benbusby linked an issue on Dec 29, 2022 that may be closed by this pull request
Owner

benbusby left a comment


Thanks!

benbusby merged commit 3dda8b2 into benbusby:main on Dec 29, 2022
ahmad-alkadri
Contributor Author

Thanks!

You're welcome! It's a pleasure.

ahmad-alkadri deleted the fix/908-html-element-need-escape branch on December 31, 2022 23:44
Development

Successfully merging this pull request may close these issues.

[BUG] Results renders HTML elements
2 participants