FreshRSS plugin that provides Cloudflare puzzle solving via FlareSolverr
Some popular publishing sites, including Substack, use Cloudflare to provide content caching and DDoS protection.
If Cloudflare suspects that your machine is a bot, it presents a challenge. This is normally just a page that requires your browser to run some JavaScript, which filters out simple scrapers that don't evaluate scripts. As a result, FreshRSS sometimes fails to retrieve feeds protected by Cloudflare, because it isn't "smart" enough to pass these filters on its own.
This FreshRSS extension uses FlareSolverr to start a headless browser (essentially a full copy of Chrome or Firefox, but without a UI to look at) to solve these challenges and pass the contents of the RSS feed through to FreshRSS as normal.
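To make the mechanism more concrete, here is a minimal Python sketch of the kind of request a client sends to FlareSolverr's `/v1` endpoint. It is an illustration only and not the extension's own code. The endpoint address assumes FlareSolverr is reachable on `localhost` at port 8191, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: FlareSolverr is reachable locally on port 8191.
FLARESOLVERR_ENDPOINT = "http://localhost:8191/v1"


def build_payload(feed_url: str, max_timeout_ms: int = 60000) -> dict:
    """Build the JSON body for a FlareSolverr `request.get` command."""
    return {"cmd": "request.get", "url": feed_url, "maxTimeout": max_timeout_ms}


def extract_body(result: dict) -> str:
    """Return the page body from a FlareSolverr reply, or raise if it failed."""
    if result.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr error: {result.get('message', 'unknown')}")
    return result["solution"]["response"]


def fetch_via_flaresolverr(feed_url: str, endpoint: str = FLARESOLVERR_ENDPOINT) -> str:
    """POST the command to FlareSolverr and return the resolved page content."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(build_payload(feed_url)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=90) as reply:
        return extract_body(json.load(reply))
```

Calling `fetch_via_flaresolverr("https://nlpnewsletter.substack.com/feed")` against a running FlareSolverr instance should return the page content that the headless browser ended up with once any challenge was solved.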
I stumbled across ryancom16's solution and decided to have a go at hardening it into a proper FreshRSS plugin.
You will need an instance of FlareSolverr running somewhere that is accessible to your FreshRSS instance. If you are using Docker Compose to manage FreshRSS then you can add FlareSolverr to your compose file. An example setup is shown below:
version: "2.1"
services:
  flaresolverr:
    image: ghcr.io/flaresolverr/flaresolverr:latest
    restart: always
    environment:
      - LOG_LEVEL=info
    ports:
      - 8191:8191
  freshrss:
    image: lscr.io/linuxserver/freshrss:latest
    container_name: freshrss
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/London
    volumes:
      - ./data:/config
    ports:
      - 8080:80
    restart: unless-stopped
- Copy this whole directory to your FreshRSS extensions directory. The easiest option is probably to clone this repo:
cd /path/to/freshrss/extensions
git clone https://github.com/ravenscroftj/freshrss-flaresolverr-extension.git
- Paste in the URL of your FlareSolverr instance in the settings window
- Copy the feed URL in bold
Prepend the URL you copied to any feed protected by Cloudflare. For example, if your FreshRSS instance is at https://freshrss.example.com/ and you want to subscribe to Sebastian Ruder's excellent NLP newsletter at https://nlpnewsletter.substack.com/, you would take the full URL of the RSS feed, https://nlpnewsletter.substack.com/feed, and set the feed URL in FreshRSS to:
https://freshrss.example.com/api/cloudsolver.php?feed=https://nlpnewsletter.substack.com/feed
If the feed you want to subscribe to is actually XML embedded in an HTML document, which is often the case with WordPress feeds, you can ask the extension to parse it accordingly by setting the viahtml parameter to 1, like this:
https://freshrss.example.com/api/cloudsolver.php?feed=https://domain.ext/feed/&viahtml=1
It may help solve errors like this one:
A feed could not be found at `https://freshrss.example.com/api/cloudsolver.php?feed=https://domain.ext/feed/`; the status code is `200` and content-type is `application/xml` [https://freshrss.example.com/api/cloudsolver.php?feed=https://domain.ext/feed/]
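If you have several feeds to convert, the two URL forms above can be generated with a small helper. This is a rough Python sketch: the function name `build_proxied_feed_url` is hypothetical, and the feed URL is appended without percent-encoding so that the output matches the examples in this README. Feeds whose URLs carry their own query string may need encoding first; that case is an untested assumption.

```python
def build_proxied_feed_url(freshrss_base: str, feed_url: str, via_html: bool = False) -> str:
    """Wrap a feed URL in the extension's cloudsolver.php endpoint."""
    # Drop any trailing slash so the path is not doubled up.
    base = freshrss_base.rstrip("/")
    # The feed URL is appended as-is, mirroring the README examples.
    proxied = f"{base}/api/cloudsolver.php?feed={feed_url}"
    if via_html:
        # Ask the extension to treat the response as XML wrapped in HTML.
        proxied += "&viahtml=1"
    return proxied


print(build_proxied_feed_url("https://freshrss.example.com/", "https://nlpnewsletter.substack.com/feed"))
# https://freshrss.example.com/api/cloudsolver.php?feed=https://nlpnewsletter.substack.com/feed

print(build_proxied_feed_url("https://freshrss.example.com/", "https://domain.ext/feed/", via_html=True))
# https://freshrss.example.com/api/cloudsolver.php?feed=https://domain.ext/feed/&viahtml=1
```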
This plugin currently works only with exact RSS feed URLs. It can't be used for feed discovery, due to a limitation in the way Selenium works.
If you have suggestions or encounter problems, feel free to open an issue. If you'd like to make code changes, submit a pull request!