bring back Facebook by scraping m.facebook.com #886

snarfed · 2019-09-10T22:18:04Z

https://m.facebook.com/ is a little-known "lite" version of Facebook's full webapp with no JS and fairly simple HTML. it requires login, specifically c_user and xs cookies, but it's eminently scrapeable. https://facebook-atom.appspot.com/ already scrapes it to generate Atom feeds. apart from how distasteful it is to scrape with login cookies, we could scrape it like Instagram to bring back Facebook backfeed!
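As a rough illustration of the cookie requirement described above, here is a minimal sketch that builds (but does not send) a request carrying the `c_user` and `xs` cookies. The helper name `build_request`, the default User-Agent string, and the path `example.profile` are assumptions for illustration only; this is not how granary or facebook-atom actually does it.

```python
import urllib.request

BASE_URL = "https://m.facebook.com/"  # host named in the issue text


def build_request(path, c_user, xs, user_agent="Mozilla/5.0"):
    """Return an unsent Request for m.facebook.com carrying the login cookies.

    Hypothetical helper: the issue only says the c_user and xs cookies are
    required; everything else here is an assumption.
    """
    cookie = f"c_user={c_user}; xs={xs}"
    return urllib.request.Request(
        BASE_URL + path.lstrip("/"),
        headers={"Cookie": cookie, "User-Agent": user_agent},
    )


# Placeholder values; nothing goes over the network until urlopen() is called.
req = build_request("/example.profile", c_user="123", xs="abc")
print(req.full_url)
print(req.get_header("Cookie"))
```

Keeping request construction separate from sending makes it easy to inspect exactly which headers would go out before touching the network.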

...sadly, FB's blocking is better than IG's. i actually implemented the scraping and extracted posts, comments, and likes/reactions, but i haven't been able to fetch users' timelines consistently. after one or two requests, FB consistently starts redirecting requests to /login.php, even with all cookies that m.facebook.com gives me, fully spoofed User-Agent, and fetching from the same IP I logged in from. maybe browser fingerprinting? got me. this is where i stop digging. scraping, ugh.
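To make the failure mode above concrete, here is a small sketch for spotting the redirect to /login.php from a response's final URL (with `urllib`, that is what `response.geturl()` returns after redirects are followed). The function names and the sample URLs are hypothetical; only the `/login.php` path comes from the description above.

```python
from urllib.parse import urlsplit


def is_login_redirect(final_url):
    """True if the final URL of a fetch landed on /login.php."""
    return urlsplit(final_url).path.rstrip("/") == "/login.php"


def requests_before_block(final_urls):
    """Count how many fetches succeeded before the first login redirect."""
    for i, url in enumerate(final_urls):
        if is_login_redirect(url):
            return i
    return len(final_urls)


# Made-up final URLs standing in for a sequence of real fetches.
sample = [
    "https://m.facebook.com/example.profile",
    "https://m.facebook.com/example.profile?page=2",
    "https://m.facebook.com/login.php?next=example",
]
print(requests_before_block(sample))
```

Logging this count per session would at least show whether the block really kicks in after "one or two requests" or varies with other factors.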

https://github.com/rugantio/fbcrawl, a more fleshed-out project that heavily scrapes https://mbasic.facebook.com/ (an alias for https://m.facebook.com/ afaict)
"Seems Facebook mentions have stopped again" #826 (comment), an earlier nod toward this idea from when bridgy FB originally died


Lewiscowles1986 · 2020-02-12T15:38:12Z

A way to tell whether https://mbasic.facebook.com is the same as https://m.facebook.com:

ping

lewiscowles@Lewiss-MacBook-Pro torrents % ping mbasic.facebook.com
PING star.c10r.facebook.com (157.240.221.18): 56 data bytes
64 bytes from 157.240.221.18: icmp_seq=0 ttl=55 time=15.373 ms
64 bytes from 157.240.221.18: icmp_seq=1 ttl=55 time=16.342 ms
64 bytes from 157.240.221.18: icmp_seq=2 ttl=55 time=13.923 ms
64 bytes from 157.240.221.18: icmp_seq=3 ttl=55 time=12.656 ms
64 bytes from 157.240.221.18: icmp_seq=4 ttl=55 time=14.350 ms
64 bytes from 157.240.221.18: icmp_seq=5 ttl=55 time=18.655 ms
^C
--- star.c10r.facebook.com ping statistics ---
6 packets transmitted, 6 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 12.656/15.216/18.655/1.919 ms
lewiscowles@Lewiss-MacBook-Pro torrents % ping m.facebook.com
PING star-mini.c10r.facebook.com (157.240.221.35): 56 data bytes
64 bytes from 157.240.221.35: icmp_seq=0 ttl=55 time=15.494 ms
64 bytes from 157.240.221.35: icmp_seq=1 ttl=55 time=15.000 ms
64 bytes from 157.240.221.35: icmp_seq=2 ttl=55 time=11.809 ms
64 bytes from 157.240.221.35: icmp_seq=3 ttl=55 time=11.920 ms
64 bytes from 157.240.221.35: icmp_seq=4 ttl=55 time=11.081 ms
64 bytes from 157.240.221.35: icmp_seq=5 ttl=55 time=20.812 ms
64 bytes from 157.240.221.35: icmp_seq=6 ttl=55 time=12.071 ms

traceroute

lewiscowles@Lewiss-MacBook-Pro torrents % traceroute m.facebook.com
traceroute to star-mini.c10r.facebook.com (157.240.221.35), 64 hops max, 52 byte packets
...
 6  ae13.pr04.lhr3.tfbnw.net (157.240.65.124)  13.349 ms  23.875 ms  39.767 ms
 7  po131.asw01.lhr3.tfbnw.net (129.134.45.52)  16.781 ms
    po141.asw01.lhr3.tfbnw.net (129.134.45.56)  15.748 ms
    po131.asw02.lhr3.tfbnw.net (129.134.45.54)  21.754 ms
 8  po223.psw01.lhr8.tfbnw.net (129.134.50.143)  13.670 ms
    po243.psw03.lhr8.tfbnw.net (129.134.50.105)  14.629 ms
    po233.psw03.lhr8.tfbnw.net (129.134.50.81)  15.957 ms
 9  157.240.38.215 (157.240.38.215)  15.529 ms
    157.240.38.209 (157.240.38.209)  14.165 ms
    157.240.38.143 (157.240.38.143)  16.325 ms
10  edge-star-mini-shv-01-lhr8.facebook.com (157.240.221.35)  13.596 ms  14.736 ms  19.857 ms

lewiscowles@Lewiss-MacBook-Pro torrents % traceroute mbasic.facebook.com
traceroute to star.c10r.facebook.com (157.240.221.18), 64 hops max, 52 byte packets
...
 6  ae4.pr02.lhr7.tfbnw.net (157.240.66.192)  16.236 ms  19.098 ms  15.590 ms
 7  po121.asw02.lhr3.tfbnw.net (129.134.44.194)  15.165 ms
    po121.asw01.lhr3.tfbnw.net (129.134.44.190)  17.864 ms  18.253 ms
 8  po231.psw04.lhr8.tfbnw.net (129.134.50.89)  18.365 ms
    po213.psw01.lhr8.tfbnw.net (129.134.50.31)  15.419 ms
    po243.psw04.lhr8.tfbnw.net (129.134.50.117)  17.272 ms
 9  157.240.38.125 (157.240.38.125)  19.586 ms
    173.252.67.29 (173.252.67.29)  32.396 ms
    157.240.38.97 (157.240.38.97)  16.277 ms
10  edge-star-shv-01-lhr8.facebook.com (157.240.221.18)  19.260 ms  16.162 ms  17.667 ms

I think they are separate services behind a single shared load balancer. Perhaps they differ subtly, or only occasionally and imperceptibly.

Lewiscowles1986 · 2020-02-12T15:38:26Z

DNS records would be another way to go, but the hosts could sit behind a PaaS ingress/egress router, so the records can be opaque.
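A minimal sketch of that DNS comparison, assuming the system resolver is good enough: resolve both hostnames and compare the address sets. The function names are hypothetical, and the resolver is injectable so the comparison can be tried offline. Disjoint or overlapping sets are only a hint, since a shared front end can hide whatever sits behind it.

```python
import socket


def resolve_ipv4(host):
    """Return the set of IPv4 addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(
        host, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0] for info in infos}


def compare_hosts(host_a, host_b, resolve=resolve_ipv4):
    """Compare the address sets of two hostnames."""
    a, b = resolve(host_a), resolve(host_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Offline stand-in for the resolver, using the addresses from the ping
# output earlier in this thread.
fake_dns = {
    "mbasic.facebook.com": {"157.240.221.18"},
    "m.facebook.com": {"157.240.221.35"},
}
result = compare_hosts("mbasic.facebook.com", "m.facebook.com", resolve=fake_dns.__getitem__)
print(result)
```

Calling `compare_hosts("mbasic.facebook.com", "m.facebook.com")` without the `resolve` argument performs live lookups instead.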

Lewiscowles1986 · 2020-02-12T15:43:45Z

You could fake the user agent and full browser request signature from the browser, but surveillance capitalists have recently been checking for subtle timing differences in implementations

https://www.gamingonlinux.com/articles/if-you-cant-login-to-world-of-warcraft-or-wow-classic-on-linux-heres-a-quick-fix-for-now.14967

snarfed · 2020-02-12T15:44:06Z

right. having worked on infrastructure, networking, and end user applications at another big tech company, i can confirm that none of those are really conclusive in any direction. fortunately it didn't really matter in this case.

Lewiscowles1986 · 2020-02-12T15:45:05Z

So no net new information. Why ask me to comment or help if you are so sure you know better?

snarfed closed this as completed Sep 10, 2019

snarfed added a commit that referenced this issue Sep 11, 2019

bring facebook back by using granary's support for scraping m.faceboo…

4ef9970

…k.com for #886. IN PROGRESS.
