Fix: added a functionality to make sure escaped characters stay escaped. #912

ahmad-alkadri · 2022-12-23T22:10:58Z

This PR addresses issue #908, which shows that Whoogle results render HTML characters unescaped. Here's a screenshot as referenced in the issue:

After checking, I found the following:

the characters inside the <div> content tag from the search results (getbody.text in search.py) are already escaped, with "<" and ">" characters converted into "&lt;" and "&gt;", respectively
however, because getbody.text is then passed through several bsoup calls, the escaped tag characters become unescaped.

To prevent this, I replaced "&lt;" and "&gt;" with "andlt;" and "andgt;", respectively. This way, when the 'response' object gets loaded into bsoup (which happens several times throughout the process between search.py and routes.py), bsoup will not unescape them. Finally, just before the response object is sent to render_template in routes.py, I replace "andlt;" and "andgt;" back with "&lt;" and "&gt;".
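The idea can be sketched in a few lines. This is only an illustration, not the code from this PR: the helper names `protect_entities`, `restore_entities`, and `decode_pass` are hypothetical, and `html.unescape` from the standard library stands in for whichever parsing step decodes the entities.

```python
import html

# Hypothetical names for illustration; the PR's actual helpers may differ.
# Each pair maps an HTML entity to a placeholder token without "&",
# so an entity-decoding step leaves the token alone.
PLACEHOLDERS = (("&lt;", "andlt;"), ("&gt;", "andgt;"))


def protect_entities(markup: str) -> str:
    """Swap escaped angle brackets for placeholder tokens."""
    for entity, token in PLACEHOLDERS:
        markup = markup.replace(entity, token)
    return markup


def restore_entities(markup: str) -> str:
    """Swap the placeholder tokens back to the original entities."""
    for entity, token in PLACEHOLDERS:
        markup = markup.replace(token, entity)
    return markup


def decode_pass(markup: str) -> str:
    """Assumed stand-in for a parsing step that decodes character references."""
    return html.unescape(markup)


snippet = "Use the &lt;b&gt; tag for bold text"

# Without protection, the entities are decoded into real angle brackets.
unprotected = decode_pass(snippet)

# With protection, the entities survive the decoding step unchanged.
protected = restore_entities(decode_pass(protect_entities(snippet)))

print(unprotected)
print(protected)
```

One limitation of this kind of substitution is that a result which literally contains the text "andlt;" or "andgt;" would also be rewritten when the placeholders are restored.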

Here's the screenshot from the search result on Whoogle following this fix:

Moved the cleaner functions to app/utils/escaper.py
Fixes
Removed unused import 're'
Moved the cleaner functionalities to the "search.py" and "routes.py"
Making sure escaped chars stay escaped during process
Replaced "&lt;" and "&gt;" with "andlt;" and "andgt;", respectively. This way, when the 'response' object gets loaded into bsoup (which happens several times throughout the process between search.py and routes.py), bsoup will not unescape them.

benbusby

Thanks!

ahmad-alkadri · 2022-12-31T23:44:31Z

Thanks!

You're welcome! It's a pleasure.

benbusby linked an issue Dec 29, 2022 that may be closed by this pull request

[BUG] Results renders HTML elements #908

Closed

benbusby approved these changes Dec 29, 2022


benbusby merged commit 3dda8b2 into benbusby:main Dec 29, 2022

ahmad-alkadri deleted the fix/908-html-element-need-escape branch December 31, 2022 23:44
