bring back Facebook by scraping m.facebook.com #886
Comments
A way to tell if https://mbasic.facebook.com is https://m.facebook.com: `ping`, `traceroute`
I think they are separate services hitting a single shared load balancer. Perhaps they are subtly different, or only occasionally and imperceptibly different.
DNS records would be another way to go, but the hosts could sit behind a PaaS ingress/egress router, so the result can be opaque.
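
A minimal sketch of the DNS comparison suggested above, assuming Python's standard `socket` module. The helper names `resolve` and `shared_addresses` are hypothetical, and, as noted in this thread, overlapping addresses only hint at a shared front end; they do not show that the two hosts are the same service.

```python
import socket


def resolve(host, port=443):
    """Return the set of IP addresses that `host` currently resolves to."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def shared_addresses(host_a, host_b, resolver=resolve):
    """Return the addresses both hosts resolve to (empty set if none overlap)."""
    return resolver(host_a) & resolver(host_b)
```

Calling `shared_addresses("m.facebook.com", "mbasic.facebook.com")` performs live lookups; the `resolver` parameter exists only so the comparison can be exercised without network access.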
You could fake the user agent and the full browser request signature, but surveillance capitalists have recently been checking for subtle timing differences between implementations.
right. having worked on infrastructure, networking, and end user applications at another big tech company, i can confirm that none of those are really conclusive in any direction. fortunately it didn't really matter in this case.
So no net new information. Why ask me to comment or help if you are so sure you know better?
https://m.facebook.com/ is a little-known "lite" version of Facebook's full webapp with no JS and fairly simple HTML. it requires login, specifically `c_user` and `xs` cookies, but it's eminently scrapeable. https://facebook-atom.appspot.com/ already scrapes it to generate Atom feeds. apart from how distasteful it is to scrape with login cookies, we could scrape it like Instagram to bring back Facebook backfeed!

...sadly, FB's blocking is better than IG's. i actually implemented the scraping and extracted posts, comments, and likes/reactions, but i haven't been able to fetch users' timelines consistently. after one or two requests, FB consistently starts redirecting requests to `/login.php`, even with all cookies that m.facebook.com gives me, fully spoofed User-Agent, and fetching from the same IP I logged in from. maybe browser fingerprinting? got me. this is where i stop digging. scraping, ugh.

related:
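
For illustration, a minimal sketch of the request pattern described above, assuming only Python's standard library. This is not the implementation from this issue: the helper names (`cookie_header`, `is_login_redirect`, `fetch`) are hypothetical, and the cookie values and User-Agent are placeholders the caller would supply. It sends the `c_user` and `xs` cookies and reports a redirect to `/login.php` instead of silently following it.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlsplit

BASE_URL = "https://m.facebook.com/"
REDIRECT_CODES = (301, 302, 303, 307, 308)


def cookie_header(c_user, xs):
    """Build the Cookie header from the two login cookies named in the issue."""
    return f"c_user={c_user}; xs={xs}"


def is_login_redirect(status, location):
    """True if the response is a redirect whose target path is /login.php."""
    if status not in REDIRECT_CODES or not location:
        return False
    return urlsplit(location).path == "/login.php"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, so a 3xx surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch(path, c_user, xs, user_agent):
    """Fetch a page; raise RuntimeError if the session is bounced to login."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(
        urljoin(BASE_URL, path),
        headers={"Cookie": cookie_header(c_user, xs), "User-Agent": user_agent},
    )
    try:
        with opener.open(request, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if is_login_redirect(err.code, err.headers.get("Location")):
            raise RuntimeError("redirected to /login.php") from err
        raise
```

Detecting the redirect explicitly makes the failure described above visible as an error rather than as a login page parsed as if it were a timeline. It does nothing to prevent the blocking itself.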