Twitter Profile Images Not Scraped #2012

Closed
Ge0rg3 opened this issue Jan 11, 2021 · 6 comments · Fixed by #2013
Labels: 3. to review (Waiting for reviews), bug (Something isn't working)

Comments

@Ge0rg3

Ge0rg3 commented Jan 11, 2021

Describe the bug

When trying to use the "Get from twitter" button to assign a profile photo from a Twitter link, the user is shown an "Avatar download failed" error message, and the logs show the following:

Exception: Argument 1 passed to OC\Http\Client\Client::get() must be of the type string, null given, called in /var/www/nextcloud/apps/contacts/lib/Service/SocialApiService.php on line 213

This is because twitter requires a valid user agent header to be supplied. For example, try the following:

curl https://mobile.twitter.com/BarackObama

The response shows "This browser is no longer supported" (which is what is being presented to nextcloud currently).

However, if we provide the following instead...

curl -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" https://mobile.twitter.com/BarackObama

...the correct response is returned.

TLDR; We need to include a browser header in the request in order to correctly scrape the twitter page due to changes in twitter's browser handling.
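
As a rough illustration of the suggestion above, the sketch below builds a request that carries a browser-like User-Agent header. It is written in Python rather than the app's PHP, and `build_profile_request` and `BROWSER_UA` are hypothetical names that do not exist in the Contacts app. The request is only constructed here, not sent.

```python
from urllib.request import Request

# Browser-like User-Agent string taken from the curl example above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
)


def build_profile_request(handle: str, user_agent: str = BROWSER_UA) -> Request:
    """Build (but do not send) a GET request for a Twitter profile page.

    Hypothetical helper, for illustration only.
    """
    url = f"https://mobile.twitter.com/{handle}"
    return Request(url, headers={"User-Agent": user_agent})


req = build_profile_request("BarackObama")
# urllib stores header names via str.capitalize(), hence "User-agent".
print(req.full_url)
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual fetch; whether Twitter accepts a given User-Agent string may change over time.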

To Reproduce
Steps to reproduce the behavior:

  1. Create a contact and give it a twitter handle under "Social".
  2. Click on the profile icon and "Get from twitter"
  3. Error will appear

Expected behavior
The Twitter profile picture should be scraped correctly.

Actual behavior
The previously described error appears.

Screenshots
N/A

Server configuration

Operating system: Ubuntu

Web server: Nginx

Database: Postgres

PHP version: 7.4

Nextcloud version: 19.0.6

Contacts version: 3.4.3

Updated from an older Nextcloud or fresh install: Updated from older

Nextcloud log

data/nextcloud.log

{"reqId":"xoZkRSRacsHvBFYVRkM2","level":3,"time":"2021-01-11T10:17:31+00:00","remoteAddr":"REDACTED","user":"REDACTED","app":"index","method":"PUT","url":"/apps/contacts/api/v1/social/avatar/twitter/contacts/REDACTED","message":{"Exception":"Exception","Message":"Argument 1 passed to OC\\Http\\Client\\Client::get() must be of the type string, null given, called in /var/www/nextcloud/apps/contacts/lib/Service/SocialApiService.php on line 213","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/AppFramework/App.php","line":137,"function":"dispatch","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Routing/RouteActionHandler.php","line":47,"function":"main","class":"OC\\AppFramework\\App","type":"::"},{"function":"__invoke","class":"OC\\AppFramework\\Routing\\RouteActionHandler","type":"->"},{"file":"/var/www/nextcloud/lib/private/Route/Router.php","line":297,"function":"call_user_func"},{"file":"/var/www/nextcloud/lib/base.php","line":1010,"function":"match","class":"OC\\Route\\Router","type":"->"},{"file":"/var/www/nextcloud/index.php","line":37,"function":"handleRequest","class":"OC","type":"::"}],"File":"/var/www/nextcloud/lib/private/AppFramework/Http/Dispatcher.php","Line":110,"Previous":{"Exception":"TypeError","Message":"Argument 1 passed to OC\\Http\\Client\\Client::get() must be of the type string, null given, called in /var/www/nextcloud/apps/contacts/lib/Service/SocialApiService.php on line 213","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/contacts/lib/Service/SocialApiService.php","line":213,"function":"get","class":"OC\\Http\\Client\\Client","type":"->"},{"file":"/var/www/nextcloud/apps/contacts/lib/Controller/SocialApiController.php","line":141,"function":"updateContact","class":"OCA\\Contacts\\Service\\SocialApiService","type":"->","args":["*** sensitive parameters replaced 
***"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Http/Dispatcher.php","line":170,"function":"updateContact","class":"OCA\\Contacts\\Controller\\SocialApiController","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Http/Dispatcher.php","line":100,"function":"executeController","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/App.php","line":137,"function":"dispatch","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Routing/RouteActionHandler.php","line":47,"function":"main","class":"OC\\AppFramework\\App","type":"::"},{"function":"__invoke","class":"OC\\AppFramework\\Routing\\RouteActionHandler","type":"->"},{"file":"/var/www/nextcloud/lib/private/Route/Router.php","line":297,"function":"call_user_func"},{"file":"/var/www/nextcloud/lib/base.php","line":1010,"function":"match","class":"OC\\Route\\Router","type":"->"},{"file":"/var/www/nextcloud/index.php","line":37,"function":"handleRequest","class":"OC","type":"::"}],"File":"/var/www/nextcloud/lib/private/Http/Client/Client.php","Line":226},"CustomMessage":"--"},"userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36","version":"19.0.6.2","id":"5ffc274127f55"}
@Ge0rg3 Ge0rg3 added 0. Needs triage bug Something isn't working labels Jan 11, 2021
@skjnldsv
Member

> TLDR; We need to include a browser header in the request in order to correctly scrape the twitter page due to changes in twitter's browser handling.

That seems like a fair compromise :)
Thanks for the debugging!

@skjnldsv skjnldsv added 1. to develop Accepted and waiting to be taken care of and removed 0. Needs triage labels Jan 11, 2021
@skjnldsv
Member

The issue is getting the raw profile picture URL.
Using the Googlebot/2.1 user agent seems to make Twitter display the page without any JavaScript requirements, so we can now grep the raw img profile picture URL from the source code.
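
To make the "grep the URL from the source" idea concrete, here is a minimal Python sketch, not the code from the actual fix. It assumes the avatar appears as an `<img>` tag whose `src` points at `pbs.twimg.com/profile_images/`; the real markup may differ. The names `AVATAR_RE`, `extract_avatar_url` and `require_avatar_url` are hypothetical, and the sample pages are made up. The explicit `None` check reflects the reported TypeError, which presumably arose because no URL was found and a null value was passed on to the HTTP client.

```python
import re
from typing import Optional

# Assumed pattern: an <img> tag whose src points at Twitter's image CDN.
# The real page markup may differ; this is illustrative only.
AVATAR_RE = re.compile(
    r'<img[^>]+src="(https://pbs\.twimg\.com/profile_images/[^"]+)"'
)


def extract_avatar_url(html: str) -> Optional[str]:
    """Return the first profile image URL found in the HTML, or None."""
    match = AVATAR_RE.search(html)
    return match.group(1) if match else None


def require_avatar_url(html: str) -> str:
    """Fail with a clear error instead of passing None to an HTTP client."""
    url = extract_avatar_url(html)
    if url is None:
        raise ValueError("no profile image URL found in page source")
    return url


# Made-up sample pages, not real Twitter responses.
static_page = (
    '<img class="avatar" '
    'src="https://pbs.twimg.com/profile_images/123/photo.jpg">'
)
blocked_page = "<p>This browser is no longer supported.</p>"

print(extract_avatar_url(static_page))
print(extract_avatar_url(blocked_page))
```

A regular expression is brittle against markup changes, so a check like `require_avatar_url` at least turns a failed scrape into a readable error rather than a type error further down the call chain.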

@skjnldsv skjnldsv added 3. to review Waiting for reviews and removed 1. to develop Accepted and waiting to be taken care of labels Jan 11, 2021
@Ge0rg3
Author

Ge0rg3 commented Jan 12, 2021

That was super quick, thanks! :)

@Ge0rg3
Author

Ge0rg3 commented Jan 13, 2021

Hey @skjnldsv, I don't have time to investigate right now, but it looks like Instagram is broken too? May be worth looking into. :)

@call-me-matt
Member

call-me-matt commented Jan 13, 2021

I can confirm Instagram is broken. When I took a look, I saw a 429 (too many requests) and thought it was because my background sync job was (too) active. Maybe worth a second opinion.
@Ge0rg3 do you have the background sync of profile pictures activated, too?

@Ge0rg3
Author

Ge0rg3 commented Jan 14, 2021

@call-me-matt I don't think it's a 429 for me, but a login-page blocker instead :( I've just created #2016 to discuss.
