This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Twitter returns a 404 for t.co links if "bot" is included in the user agent header #13120

Open
roughnecks opened this issue Jun 27, 2022 · 9 comments
Labels
  • A-URL-Preview: Issues related to generating server-side previews of remote URLs
  • S-Tolerable: Minor significance, cosmetic issues, low or no impact to users.
  • T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@roughnecks

Description

Hello,
I have an RSS feed posting into a Matrix room, and for the past few days some Twitter links have not shown any preview.

Steps to reproduce

  • Have some Twitter links posted to a room
  • Wait for the link previews, which never appear

Homeserver

woodpeckersnest.space

Synapse Version

{"server":{"name":"Synapse","version":"1.61.0"}}

Installation Method

Other (please mention below)

Platform

I'm using the matrix-docker-ansible-deploy playbook on Debian 11 VPS

Relevant log output

Jun 27 14:51:26 pandora.woodpeckersnest.space matrix-synapse[1877154]: 2022-06-27 12:51:26,558 - synapse.http.client - 730 - WARNING - GET-5725 - Got 404 when downloading https://t.co/2rf0lRAc1Y
Jun 27 14:51:26 pandora.woodpeckersnest.space matrix-synapse[1877154]: 2022-06-27 12:51:26,890 - synapse.http.client - 730 - WARNING - GET-5726 - Got 404 when downloading http://pic.twitter.com/2rf0lRAc1Y

Anything else that would be useful to know?

No response

@DMRobertson
Contributor

some Twitter links won't show any preview.

Is this all Twitter links or just some? If it's just some, can you give us an example of a link that doesn't form a preview?

@DMRobertson
Contributor

Oh, from the log sample: https://twitter.com/CM_Memorabili/status/1541402380972879872


@roughnecks
Author

Oh, from the log sample: https://twitter.com/CM_Memorabili/status/1541402380972879872

But it's only the short URL (t.co) that has problems. If I paste the long URL (twitter.com), it works.

@DMRobertson
Contributor

> GET /2rf0lRAc1Y HTTP/2
> Host: t.co
> user-agent: curl/7.82.0
> accept: */*
> 

< HTTP/2 301 
< date: Tue, 28 Jun 2022 10:32:33 GMT
< vary: Origin
< server: tsa_f
< expires: Tue, 28 Jun 2022 10:37:34 GMT
< location: https://twitter.com/CM_Memorabili/status/1541402380972879872/photo/1
< set-cookie: muc=eb3cdf7c-7ddd-4da7-aed6-3087d511a4fd; Max-Age=34214400; Expires=Sat, 29 Jul 2023 10:32:34 GMT; Domain=t.co; Secure; SameSite=None
< cache-control: private,max-age=300
< content-length: 0
< strict-transport-security: max-age=0
< x-response-time: 107
< x-connection-hash: d3bbab17fbe0ac3fd0d071c92608787baf272467a4c4c565a1eb6a7f393ce45c

I wonder if we're not processing the 301 redirect somehow.
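
To check the same thing from Python rather than curl, one option is to issue a single GET without following redirects and look at the status code and Location header. The sketch below uses only the standard library; `probe` and `_NoRedirect` are hypothetical names introduced here, and this is not Synapse's HTTP client (Synapse uses Twisted), so it only shows what the remote server returns for a given user agent.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect; the 3xx
        # response is then raised as an HTTPError by the default handler.
        return None


def probe(url, user_agent, timeout=10.0):
    """Send one GET and return (status code, Location header or None)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        # Unfollowed 3xx responses and 4xx/5xx responses all end up here.
        status, location = err.code, err.headers.get("Location")
        err.close()
        return status, location
```

For example, `probe("https://t.co/2rf0lRAc1Y", "curl/7.82.0")` would be the Python counterpart of the curl request above. What it returns depends on how t.co behaves at the time it is run.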

@anoadragon453
Member

The problem appears to go away if "bot" is not included in the user agent. If "bot" is included, Twitter does not return a redirect response with a Location header (the 301 seen in the curl output above). It simply returns a 404.

Attempting this with a local homeserver and setting a breakpoint on this line, I found the following headers being exchanged when Synapse queries t.co:

Request headers:

{b'User-Agent': ['Synapse (bot; +https://github.com/matrix-org/synapse)'], b'Accept-Language': ['en']}

Response headers:

b'Date': [b'Mon, 04 Jul 2022 17:13:40 GMT']
b'Vary': [b'Origin']
b'Server': [b'tsa_f']
b'Content-Type': [b'text/html;charset=utf-8']
b'Cache-Control': [b'no-cache,no-store,must-revalidate']
b'X-XSS-Protection': [b'0']
b'Content-Security-Policy': [b"default-src 'none'; img-src https://abs.twimg.com; script-src https://abs.twimg.com about:; style-src https://abs.twimg.com 'unsafe-inline'; font-src https://abs.twimg.com https://twitter.com; connect-src 'none'; object-src 'none'; media-src 'none'; frame-src 'none'; report-uri https://twitter.com/i/csp_report?a=ORTGK%3D%3D%3D&ro=false"]
b'Strict-Transport-Security': [b'max-age=0']
b'X-Response-Time': [b'105']
b'X-Connection-Hash': [b'ff7b9177c8443a7c6cb907cfce4732f6c6d3ec7b191d6e0ec178d60dddbc780f']

Changing "bot" to "not" on this line allowed the URL preview to work:

b"User-Agent": [
"Synapse (bot; +https://github.com/matrix-org/synapse)"
],

@anoadragon453 added the S-Tolerable and T-Defect labels on Jul 4, 2022
@anoadragon453 changed the title from "Got 404 for some links preview" to 'Twitter returns a 404 for t.co links if "bot" is included in the user agent header' on Jul 4, 2022
@roughnecks
Author

Sorry, but this is all too technical for me. Is there something I can do, or should I just wait for a fix?

@DMRobertson
Contributor

Sorry, but this is all too technical for me. Is there something I can do, or should I just wait for a fix?

Wait for a fix. (The comments above will help us understand how to fix the issue.)

@clokep
Member

clokep commented Jul 5, 2022

Ironically (?), "bot" was added to the user agent to fix previewing of Twitter URLs (see #11985).

I have no idea of a solution here without special-casing t.co. 😢
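
For illustration only, special-casing t.co could look roughly like the sketch below: pick the user agent per request based on the URL's host. Everything here is hypothetical. `user_agent_for`, `HOSTS_REJECTING_BOT`, and the non-bot user agent string are made-up names and values, not Synapse code or configuration. Only the default string is the one quoted earlier in this thread.

```python
from urllib.parse import urlsplit

# The user agent quoted earlier in this thread.
DEFAULT_USER_AGENT = "Synapse (bot; +https://github.com/matrix-org/synapse)"

# Hypothetical alternative without the "bot" token (an assumption, not a real setting).
NON_BOT_USER_AGENT = "Synapse (+https://github.com/matrix-org/synapse)"

# Hosts assumed, based on this report, to return 404 when the user agent contains "bot".
HOSTS_REJECTING_BOT = frozenset({"t.co"})


def user_agent_for(url: str) -> str:
    """Return the user agent to send when fetching `url` for a preview."""
    # urlsplit() lower-cases the hostname; it is None for URLs without a host.
    host = urlsplit(url).hostname or ""
    if host in HOSTS_REJECTING_BOT:
        return NON_BOT_USER_AGENT
    return DEFAULT_USER_AGENT
```

The match is on the exact host, so twitter.com and pic.twitter.com would keep the default user agent that #11985 introduced, and only t.co links would get the alternative.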

@MadLittleMods MadLittleMods added the A-URL-Preview Issues related to generating server-side previews of remote URLs label Apr 25, 2023
@clokep
Member

clokep commented May 26, 2023

Changing the user-agent no longer worked for me... 😢
