[TwitterBridge] Fully decode item #926
Fully decode item. Some incidences of &quot; appear in the RSS output.
Fix line length
Thanks for the PR!
Find below a few comments. Could you also provide a sample query for testing?
bridges/TwitterBridge.php
Outdated
@@ -148,7 +148,7 @@ public function collectData(){
 // extract fullname (pseudonym)
 $item['fullname'] = $tweet->getAttribute('data-name');
 // get author
-$item['author'] = $item['fullname'] . ' (@' . $item['username'] . ')';
+$item['author'] = htmlspecialchars_decode($item['fullname'] . ' (@' . $item['username'] . ')', ENT_QUOTES);
That doesn't make sense. $item['fullname'] and $item['username'] may still contain special chars, so they should be decoded beforehand.
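A minimal sketch of what this comment seems to ask for: decode each field on its own, then build the author string from the decoded parts. The literal attribute values below are hypothetical stand-ins for what the scraped page might contain; only the array keys and htmlspecialchars_decode(..., ENT_QUOTES) come from the diff.

```php
<?php
// Hypothetical raw attribute values, standing in for the scraped HTML.
$item = array();
$item['fullname'] = htmlspecialchars_decode('Tom &amp; &quot;Jerry&quot;', ENT_QUOTES);
$item['username'] = htmlspecialchars_decode('tom_jerry', ENT_QUOTES);

// Build the author from the already-decoded parts, so all three
// fields are consistent.
$item['author'] = $item['fullname'] . ' (@' . $item['username'] . ')';

echo $item['author'], PHP_EOL; // Tom & "Jerry" (@tom_jerry)
```

Decoding only the concatenated string would give the same author value, but it would leave the separate fullname and username fields encoded.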
bridges/TwitterBridge.php
Outdated
@@ -158,7 +158,8 @@ public function collectData(){
 // extract tweet timestamp
 $item['timestamp'] = $tweet->find('span.js-short-timestamp', 0)->getAttribute('data-time');
 // generate the title
-$item['title'] = strip_tags($this->fixAnchorSpacing($tweet->find('p.js-tweet-text', 0), '<a>'));
+$item['title'] = htmlspecialchars_decode(
+    strip_tags($this->fixAnchorSpacing($tweet->find('p.js-tweet-text', 0), '<a>')), ENT_QUOTES);
htmlspecialchars_decode should be called directly on $tweet->find('p.js-tweet-text', 0) (before calling $this->fixAnchorSpacing(...) and strip_tags(...)).
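To illustrate why the order of the calls matters, here is a small sketch using a plain string in place of the DOM node returned by $tweet->find(...). The input text is hypothetical, and fixAnchorSpacing is left out because its behaviour is not shown in this thread.

```php
<?php
// Hypothetical tweet text in which markup and quotes arrive entity-encoded.
$raw = '&lt;b&gt;Breaking&lt;/b&gt; &quot;news&quot;';

// Decode first: the entities become real tags, which strip_tags() then removes.
$decodeFirst = strip_tags(htmlspecialchars_decode($raw, ENT_QUOTES));

// Strip first: nothing looks like a tag yet, so the tags appear only
// after decoding and end up in the title.
$stripFirst = htmlspecialchars_decode(strip_tags($raw), ENT_QUOTES);

echo $decodeFirst, PHP_EOL; // Breaking "news"
echo $stripFirst, PHP_EOL;  // <b>Breaking</b> "news"
```

Which result is wanted depends on whether encoded markup in the tweet should count as markup or as literal text. The sketch only shows that the two orders are not equivalent.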
Sample query containing
Ah okay, you use the CLI. Notice that HTML contents must be encoded inside XML, otherwise parsers don't know where XML ends and HTML starts. This is also clearly defined in the output data (i.e. If you want to access the "raw" text, I suggest you opt for either
Atom
<content type="html"><div style="display: inline-block; vertical-align: top;">
<a href="https://twitter.com/rachelparris">
<img
style="align:top; width:75px; border:1px solid black;"
alt="rachelparris"
src="https://pbs.twimg.com/profile_images/947121157307854848/7HzYN27O_bigger.jpg"
title="Rachel Parris" />
</a>
</div>
<div style="display: inline-block; vertical-align: top;">
<blockquote>Hey folks! Watch this if you like Earth a bit! <a href="https://twitter.com/hashtag/TheMashReport?src=hash" dir="ltr" ><s>#</s><b>TheMashReport</b></a> <a href="https://twitter.com/hashtag/climatechange?src=hash" dir="ltr" ><s>#</s><b>climatechange</b></a> <a href="https://twitter.com/BBCTwo/status/1060110060410601472" dir="ltr" ><span class="tco-ellipsis"></span><span class="js-display-url">twitter.com/BBCTwo/status/</span><span class="tco-ellipsis">…</span></a></blockquote>
</div>
<div style="display: block; vertical-align: top;">
<blockquote></blockquote>
</div>
<hr>
<div style="display: inline-block; vertical-align: top;">
<blockquote>With just 12 years left to save the planet, here&#39;s <span class="twitter-atreply pretty-link js-nav" dir="ltr" data-mentioned-user-id="23759767" ><s>@</s><b>RachelParris</b></span> on why we CAN&#39;T let the world go floppy! <img class="Emoji Emoji--forText" src="https://abs.twimg.com/emoji/v2/72x72/1f30d.png" draggable="false" alt="🌍" title="Europa-Afrika auf dem Globus" aria-label="Emoji: Europa-Afrika auf dem Globus" style=" height: 1em;"> <span data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr" ><s>#</s><b>TheMashReport</b></span> <span class="twitter-timeline-link u-hidden" data-pre-embedded="true" dir="ltr" >pic.twitter.com/RyoI19u2Ed</span></blockquote>
</div>
<div style="display: block; vertical-align: top;">
<blockquote><a href="https://pbs.twimg.com/amplify_video_thumb/1060104074606059520/img/WnBmUi13811Y3r1C.jpg:orig">
<img
style="align:top; max-width:558px; border:1px solid black;"
src="https://pbs.twimg.com/amplify_video_thumb/1060104074606059520/img/WnBmUi13811Y3r1C.jpg:thumb" />
</a></blockquote>
</div></content>
JSON
"content": "<div style=\"display: inline-block; vertical-align: top;\">\n\t<a href=\"https:\/\/twitter.com\/rachelparris\">\n<img\n\tstyle=\"align:top; width:75px; border:1px solid black;\"\n\talt=\"rachelparris\"\n\tsrc=\"https:\/\/pbs.twimg.com\/profile_images\/947121157307854848\/7HzYN27O_bigger.jpg\"\n\ttitle=\"Rachel Parris\" \/>\n<\/a>\n<\/div>\n<div style=\"display: inline-block; vertical-align: top;\">\n\t<blockquote>Hey folks! Watch this if you like Earth a bit! <a href=\"https:\/\/twitter.com\/hashtag\/TheMashReport?src=hash\" dir=\"ltr\" ><s>#<\/s><b>TheMashReport<\/b><\/a> <a href=\"https:\/\/twitter.com\/hashtag\/climatechange?src=hash\" dir=\"ltr\" ><s>#<\/s><b>climatechange<\/b><\/a> <a href=\"https:\/\/twitter.com\/BBCTwo\/status\/1060110060410601472\" dir=\"ltr\" ><span class=\"tco-ellipsis\"><\/span><span class=\"js-display-url\">twitter.com\/BBCTwo\/status\/<\/span><span class=\"tco-ellipsis\">\u2026<\/span><\/a><\/blockquote>\n<\/div>\n<div style=\"display: block; vertical-align: top;\">\n\t<blockquote><\/blockquote>\n<\/div>\n<hr>\n<div style=\"display: inline-block; vertical-align: top;\">\n\t<blockquote>With just 12 years left to save the planet, here's <span class=\"twitter-atreply pretty-link js-nav\" dir=\"ltr\" data-mentioned-user-id=\"23759767\" ><s>@<\/s><b>RachelParris<\/b><\/span> on why we CAN'T let the world go floppy! <img class=\"Emoji Emoji--forText\" src=\"https:\/\/abs.twimg.com\/emoji\/v2\/72x72\/1f30d.png\" draggable=\"false\" alt=\"\ud83c\udf0d\" title=\"Europa-Afrika auf dem Globus\" aria-label=\"Emoji: Europa-Afrika auf dem Globus\" style=\" height: 1em;\"> <span data-query-source=\"hashtag_click\" class=\"twitter-hashtag pretty-link js-nav\" dir=\"ltr\" ><s>#<\/s><b>TheMashReport<\/b><\/span> <span class=\"twitter-timeline-link u-hidden\" data-pre-embedded=\"true\" dir=\"ltr\" >pic.twitter.com\/RyoI19u2Ed<\/span><\/blockquote>\n<\/div>\n<div style=\"display: block; vertical-align: top;\">\n\t<blockquote><a href=\"https:\/\/pbs.twimg.com\/amplify_video_thumb\/1060104074606059520\/img\/WnBmUi13811Y3r1C.jpg:orig\">\n<img\n\tstyle=\"align:top; max-width:558px; border:1px solid black;\"\n\tsrc=\"https:\/\/pbs.twimg.com\/amplify_video_thumb\/1060104074606059520\/img\/WnBmUi13811Y3r1C.jpg:thumb\" \/>\n<\/a><\/blockquote>\n<\/div>"
Edit: JSON is formatted that way due to my browser, should return regular text on the CLI
Plaintext
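The point about encoding HTML inside XML can be sketched as a round trip. This is only an illustration under stated assumptions: the sample HTML string is hypothetical, and htmlspecialchars() is used here as a generic way to escape text for XML, not necessarily what the project's Atom formatter does.

```php
<?php
// Hypothetical item content: HTML that itself already contains an entity.
$html = '<b>say &quot;hi&quot;</b>';

// Encode once so the HTML can sit inside an XML element as plain text.
// The existing &quot; becomes &amp;quot; on purpose.
$xmlText = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
$element = '<content type="html">' . $xmlText . '</content>';

// A consumer that decodes the text exactly once recovers the original HTML.
$recovered = htmlspecialchars_decode($xmlText, ENT_QUOTES);

echo $element, PHP_EOL;
echo $recovered, PHP_EOL;
```

Seen as raw XML on the CLI, the element contains &amp;quot;, which can look like a double-encoding bug even though a feed reader that decodes once gets the intended HTML back.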
Please check this tweet: https://twitter.com/rachelparris/status/1063121390856007685
Merged. Thanks for the fix 👍
Removes duplicate encoding like &quot; (should be ").
Fully decode item. Some incidences of &quot; in the RSS output.