The CNN proxy uses Cloudflare, Nginx, and Node, hence the name CNN (no, it's not the Cable News Network. Has nothing to do with that.)
Cloudflare handles SSL and caching. Nginx handles all websocket requests and forwards every non-websocket request to the Node app. Finally, the Node app processes the headers exchanged between the server and the client.
This project is intended to fully proxy all requests, including websockets. Unlike some URL-based proxies out there (examples are Node Unblocker and mirrorrr; both great projects nevertheless), CNN uses subdomains to handle proxying, so the 'href="/assets/picture.png"' problem is solved easily.
Because this proxy uses subdomains, a simple String.replace call is enough to find every (sub)domain link like "https://example.com" and convert it to the proxying (sub)domain.
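As a rough illustration of that replacement, here is a minimal sketch. The proxy domain "proxysite.com" and the host encoding (dots become "--" so the target fits in one subdomain label) are assumptions made for this example and are not taken from the CNN source, which may encode hosts differently.

```javascript
// Hypothetical proxy domain; not taken from the CNN source.
const PROXY_DOMAIN = "proxysite.com";

// Hypothetical encoding: "." becomes "--" so the host fits in one subdomain label.
function toProxyHost(host) {
  return host.replace(/\./g, "--") + "." + PROXY_DOMAIN;
}

// Rewrite every absolute http(s) URL in a text body to its proxied form.
// Ports and protocol-relative URLs are not handled in this sketch.
function rewriteLinks(body) {
  return body.replace(
    /(https?:)\/\/([a-z0-9-]+(?:\.[a-z0-9-]+)+)/gi,
    (match, scheme, host) => `${scheme}//${toProxyHost(host.toLowerCase())}`
  );
}
```

Relative links such as href="/assets/picture.png" are left alone, since the browser already resolves them against the proxy subdomain.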
This proxy handles all the origin and referer headers, converting them to their original URLs before sending them to the server to avoid server rejection because of a differing origin. It also strips Cloudflare headers before sending to the server.
Nginx plays a big part in websocket handling. A request at wss://proxyurl will automatically be handled by Nginx. Other URLs like http(s):// are sent directly to the node app.
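The header handling described above might be sketched as follows. The proxy domain, the "--" host encoding, and the choice of which headers count as Cloudflare headers (anything starting with "cf-", plus "cdn-loop") are all assumptions for illustration; the actual list in the project may differ.

```javascript
// Hypothetical proxy domain; not taken from the CNN source.
const PROXY_DOMAIN = "proxysite.com";

// Reverse of a hypothetical host encoding in which "--" stands for ".".
function toOriginalUrl(proxyUrl) {
  const url = new URL(proxyUrl);
  const label = url.hostname.slice(0, -(PROXY_DOMAIN.length + 1));
  url.hostname = label.replace(/--/g, ".");
  return url;
}

// Build the header set that is sent on to the target server.
function prepareRequestHeaders(headers) {
  const out = {};
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase();
    // Assumption: Cloudflare's own headers are identified by these names.
    if (key.startsWith("cf-") || key === "cdn-loop") continue;
    if (key === "origin") out[key] = toOriginalUrl(value).origin;
    else if (key === "referer") out[key] = toOriginalUrl(value).href;
    else out[key] = value;
  }
  return out;
}
```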
Currently, this proxy can proxy:
- Google Search
- Google Maps
- YouTube
- Discord
- Any sites that are hosted on ports other than the defaults (443 for https and 80 for http)
- Any sites with logins
- Any sites with websockets
- And probably a bunch of other sites that I didn't test
Here are just some of the proxying perks and features:
- Client injection features
  - List of APIs that are wrapped (so when a site uses these APIs to make a request, the request will always go through CNN):
    - new WebSocket
    - new XMLHttpRequest
    - window.fetch
    - new Request
    - window.open
    - window.postMessage (doesn't always work, especially if the code uses window.parent.postMessage)
  - Changes in the DOM:
    - attribute changes are handled and converted to the proxy URL
    - elements appended to the DOM are handled, and each attribute is scanned and converted to the proxy URL
- Server header processing features
  - Strips Cloudflare's anti-DDoS headers before sending to the server
  - Converts the "origin" and "referer" to their original URLs before sending to the server
  - Converts the "location" header to the proxy URL before sending to the client
  - Sets CORS headers to the origin URL
  - Loops through each set-cookie header and changes the "domain" value to the proxy's
- Can block specific URLs, which is useful for adblocking
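The set-cookie step in the list above could look roughly like this. It is a sketch only: the proxy domain "proxysite.com" is hypothetical, and the exact value CNN writes into the "domain" attribute is not stated here.

```javascript
// Hypothetical proxy domain used only for this sketch.
const PROXY_DOMAIN = "proxysite.com";

// Rewrite the Domain attribute of a single Set-Cookie header value.
// Cookies without a Domain attribute are returned unchanged.
function rewriteCookieDomain(setCookie) {
  return setCookie.replace(/(;\s*domain=)[^;]*/i, `$1${PROXY_DOMAIN}`);
}

// Apply the rewrite to every Set-Cookie header of a response.
function rewriteSetCookies(setCookieHeaders) {
  return setCookieHeaders.map(rewriteCookieDomain);
}
```

Without a step like this, the browser would discard cookies whose domain does not match the proxy's host, which would break logins.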
The following limitations of this CNN proxy server are sadly unfixable (to the extent of my knowledge and laziness). However, if anyone wants to make a pull request, I will be more than happy to review the code and merge.
- Google logins
- Captchas and all those other "are you a bot" scripts. It may work, may not work
- All the window.location.href = 'https://no-way-youre-using-that-proxysite.com' in the code
There are a lot more requirements than for URL proxies (like Node Unblocker), since this is a proxying method based on subdomains.
Also, be warned that your site may receive a bunch of fraud alerts. I've received multiple fraud alerts from Freenom and Chrome (makes sense but whatever).
To give you a general overview of the networking:
1. A request is made at https://proxysite on Cloudflare's network
2. Cloudflare forwards the request to your server's nginx on port 80
3. From here, things can go two ways.
3a. If the request is a websocket, let nginx handle it and proxy the request
3b. If the request isn't a websocket, nginx will forward the request to the locally hosted node app on port 8888 (or another port configured under config.json)
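The decision in step 3 can be pictured with the small sketch below. In the real setup this choice is made inside nginx, not in JavaScript, and the assumption that a websocket is recognised by its Upgrade header is mine; the shipped nginx.conf may detect it differently. The return values are labels made up for this example.

```javascript
// Decide which component handles a request, mirroring steps 3a and 3b.
// `headers` uses lower-case keys, as Node's req.headers does.
function routeRequest(headers) {
  const upgrade = (headers["upgrade"] || "").toLowerCase();
  // Assumption: a websocket handshake is identified by "Upgrade: websocket".
  return upgrade === "websocket" ? "nginx-websocket-proxy" : "node-app";
}
```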
You will need the following:
- A domain that you fully own (at least be able to change the nameservers)
- A Cloudflare account
- A server/VPS/computer/*whatever you want to call it* on which you have sudo/admin privileges (for later installation)
- Ubuntu 18.04 (although this may work for a bunch of other Linux distros and probably even Windows, this guide is specifically for Ubuntu 18.04)
Before anything, always check that you've updated everything.
$ sudo apt-get update
$ sudo apt-get upgrade # this is recommended, but it's optional
If you have git installed already, then skip this step.
$ sudo apt-get install git
If you have node installed already, then skip this step.
# credit: https://github.com/nodesource/distributions
$ curl -sL https://deb.nodesource.com/setup_14.x | sudo -E bash -
$ sudo apt-get install -y nodejs
If you have nginx installed already, then skip this step.
$ sudo apt-get install nginx
If you have the Lua nginx module installed already, then skip this step.
$ sudo apt-get install lua-nginx-redis
For this tutorial, we are going to use "~/cnn-proxy" to store the code needed to run the node app.
Additionally, we are going to overwrite "/etc/nginx/nginx.conf", so if you previously modified the file, make sure to back it up by running a simple command like sudo cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.bak.
To start, we need to clone the repo.
$ git clone https://github.com/scheng123/cnn-proxy.git ~/cnn-proxy
Overwrite the existing nginx.conf, replacing it with CNN's nginx.conf.
$ sudo cp ~/cnn-proxy/nginx.conf /etc/nginx/nginx.conf
Now we need to restart nginx to apply the changes.
$ sudo service nginx restart
Note that if this command doesn't work, you probably have another app running on port 80 that conflicts with nginx.
Change directory to the cloned CNN proxy and install the required modules.
$ cd ~/cnn-proxy
$ npm install
Just run node index.js and you're good to go! (no you're not)
Optional: To make this script start at system startup, we will be using pm2.
# Install pm2
$ sudo npm install -g pm2
# Run the index.js script using pm2
$ pm2 start index.js
# Configure pm2 to launch at system startup
$ sudo pm2 startup
# Finally to save the run configuration
$ pm2 save
Oh, you thought we were done here? Not by a long shot!
As you saw in the requirements, you need a domain and a Cloudflare account.
Here are the steps:
- Create a Cloudflare account if you haven't already
- Click on "Add site" in the top right corner
- Paste your domain name into the input area and click on "Add site"
- Change your domain's nameservers to your assigned Cloudflare nameservers. (Look up how to change nameservers for your specific domain registrar)
- Add an A record, entering @ for the "name" and your machine's IP address for the "value." Must be IPv4
- Add another A record, entering * for the "name" and your machine's IP address for the "value." Must be IPv4
- Now click next, and if it tells you to configure SSL and security items, skip it for now.
- And just like that, you're good to go! It should be accessible through your domain if you set up everything correctly.
The default/recommended config should look something like this:
{
    "bindingAddress": "127.0.0.1",
    "port": 8888,
    "sessionPeriod": 604800000,
    "sessionCookieName": "CNN_PROXY_SESSION_ID",
    "defaultIPv6": false,
    "enableFilterJavascript": false,
    "obscureDomainChecking": false,
    "injectCode": true,
    "injectionCodePath": "injection.html",
    "removeCachingHeaders": false,
    "removeAllIntegrityAttributes": true,
    "publicFolder": "public_root/",
    "filterMimeTypes": [
        "/html",
        "/css",
        "/javascript",
        "/json"
    ],
    "associateMime": {
        "": "text/html",
        "html": "text/html",
        "js": "application/javascript",
        "css": "text/css"
    },
    "defaultMime": "application/octet-stream",
    "blacklistDomains": [
        "example\\.com"
    ],
    "whitelistDomains": []
}
Just to clarify, "filtering" simply means to process the file and replace all occurrences of "http(s)://domain.com" with "http(s)://proxysite.com" amongst other things.
bindingAddress
type: String
It is highly recommended to set this option to "127.0.0.1", since this app shouldn't be publicly reachable. It only needs to be reachable locally, where nginx accesses it.
defaultIPv6
type: boolean
Enabling this will force nodejs's requests to default to IPv6. Does not work if your machine doesn't have an IPv6 address.
This is super duper not recommended since requests to non-IPv6 websites will break.
port
type: Integer
If you are planning to change this value, be sure to update the port under upstream in the nginx.conf so that nginx can reach the node app.
Note that this can be overridden by specifying an argument like so: node index.js 8889
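The override could be resolved along these lines. This is only a sketch of the described behaviour; resolvePort is a hypothetical helper name, and the validation shown is an assumption rather than what index.js necessarily does.

```javascript
// Pick the listening port: a valid command-line argument wins over config.json.
// `argv` has the shape of process.argv, e.g. ["node", "index.js", "8889"].
function resolvePort(argv, config) {
  const fromArg = Number.parseInt(argv[2], 10);
  const valid = Number.isInteger(fromArg) && fromArg > 0 && fromArg < 65536;
  return valid ? fromArg : config.port;
}

// Typical call site: resolvePort(process.argv, require("./config.json"))
```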
sessionPeriod
type: Integer
Takes a value in ms. Every sessionPeriod ms, the server will generate a new session that users are required to set as their cookie. A value of 0 disables this.
sessionCookieName
type: String
Specifies the cookie name for the session ID.
enableFilterJavascript
type: boolean
Enabling this may break some pages. Disabling it prevents the filtering/processing of the <script> tag. Best to keep this off since all the javascript web requests are wrapped with code injection.
obscureDomainChecking
type: boolean
Replaces all occurrences of 'window.location.href' with '"https://originalurl"'. This is used against client-side scripts that read this value to detect whether the page is being served from its original site.
This option has a high probability of breaking pages. Also, this option is not affected by enableFilterJavascript.
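A minimal sketch of such a replacement is shown below. The function name is hypothetical and the real implementation may differ.

```javascript
// Replace every textual occurrence of window.location.href with the original URL.
function obscureDomainChecks(body, originalUrl) {
  // JSON.stringify yields a quoted, properly escaped JavaScript string literal.
  return body.split("window.location.href").join(JSON.stringify(originalUrl));
}
```

Because the substitution is purely textual, an assignment such as window.location.href = "..." would turn into an assignment to a string literal, which is one plausible way this option breaks pages.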
injectCode
type: boolean
This option enables the injection of javascript code that wraps websocket requests, amongst other things (like XMLHttpRequest requests).
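To give an idea of what "wrapping" means here, the sketch below wraps a fetch-like function so that string URLs are rewritten before the request is made. It is not the contents of injection.html; wrapFetch and the rewriteUrl callback are hypothetical names for illustration.

```javascript
// Wrap a fetch-like function so every string URL passes through rewriteUrl first.
function wrapFetch(originalFetch, rewriteUrl) {
  return function wrappedFetch(input, init) {
    // Only string URLs are rewritten in this sketch; Request objects pass through.
    const target = typeof input === "string" ? rewriteUrl(input) : input;
    return originalFetch.call(this, target, init);
  };
}

// In a browser, an injected script might install it like this:
// window.fetch = wrapFetch(window.fetch, toProxyUrl);
```

The same pattern (keep a reference to the original, replace it with a function that rewrites the URL and then delegates) applies to the other wrapped APIs such as WebSocket and XMLHttpRequest.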
injectionCodePath
type: String
This option only takes effect if injectCode is enabled. It specifies the path to the html file used for injecting code.
removeCachingHeaders
type: boolean
This option is best used for debugging and testing. All it does is set the expires header to a date in the 1990s.
removeAllIntegrityAttributes
type: boolean
This option is best used if you have enableFilterJavascript enabled. It should be enabled anyhow since it changes the integrity= attribute to nointegrity=, just in case the src= file it's referencing got changed by the proxy.
May break some pages.
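The renaming could be done with a single regular expression, as in this sketch. The function name and the exact pattern are assumptions; the project may match the attribute differently.

```javascript
// Rename integrity= to nointegrity= so the browser skips subresource integrity checks.
// The leading whitespace is captured so attributes like data-integrity= are left alone.
function disableIntegrity(html) {
  return html.replace(/(\s)integrity=/gi, "$1nointegrity=");
}
```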
publicFolder
type: String
This option specifies the folder to serve when a user visits the root domain (the root domain is "https://domain.com", not "https://subdomain.domain.com").
404 pages will use publicFolder/404.html.
Must include a slash at the end.
filterMimeTypes
type: Array[...String]
This option is used for determining whether to run the filtering function for a given mime (content-type).
Note that the test function is String.includes, so a value like ["/javascript"] would match both text/javascript and application/javascript.
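In other words, the check behaves roughly like the sketch below (shouldFilter is a hypothetical name):

```javascript
// Return true when the content-type contains any of the configured fragments.
function shouldFilter(contentType, filterMimeTypes) {
  return filterMimeTypes.some((fragment) => contentType.includes(fragment));
}
```

Because this is a substring test, a content-type with parameters such as "text/html; charset=utf-8" still matches "/html".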
associateMime
type: Object
This option is used for cases where the server doesn't send a content-type. In such cases, the mime is determined by the extension at the end of the URL, excluding query strings. For https://somesite.com/style.css?somestuff=somestuff, css is fed into the Object and the resultant mime is returned. More details in the later paragraphs.
An example use: the server sends an HTML file but doesn't send a content-type header. This option assigns the missing header, and the substitute mime is then compared with filterMimeTypes to determine whether the body content should be passed through the filtering function.
This option accepts an Object where the property name is the extension and the property value is the resultant mime for that extension.
Example:
{
    "": "text/html",                // assuming that the path is somewhat like https://blah.com/something/
    "html": "text/html",            // https://blah.com/something.html
    "js": "application/javascript", // https://blah.com/something/somescript.js
    "css": "text/css"               // https://blah.com/something/style.css
}
defaultMime
type: String
This option determines which mime to use as the content-type when no matching entry is found in associateMime.
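Put together, the lookup with its fallback could be sketched like this. resolveMime is a hypothetical name, and the way the extension is extracted (text after the last dot of the last path segment) is an assumption about what "the end of the URL" means.

```javascript
// Guess a mime type from the URL's extension when the server sent no content-type.
function resolveMime(url, associateMime, defaultMime) {
  const { pathname } = new URL(url); // pathname never includes the query string
  const lastSegment = pathname.slice(pathname.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  const extension = dot === -1 ? "" : lastSegment.slice(dot + 1);
  return Object.prototype.hasOwnProperty.call(associateMime, extension)
    ? associateMime[extension]
    : defaultMime;
}
```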
blacklistDomains
type: Array[..."regex"]
This option determines whether to block the domain or not.
Because JSON doesn't like regexes, each regex is written as a string and passed to new RegExp(stringRegex). Therefore, the test function is (new RegExp(stringRegex)).test(targetDomain).
An example of blocking everything would be [^]* and an example of blocking example.com would be example\\.com.
whitelistDomains
type: Array[..."regex"]
This option determines whether to allow the domain or not.
This is essentially the same as blacklistDomains, but it takes precedence. A use case would be to set a wildcard blacklistDomains entry [^]* to block everything and use this option to allow certain websites.
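The interaction between the two lists can be sketched as follows. The test expression is the one given above; the helper names are hypothetical.

```javascript
// True when any string regex in `patterns` matches the domain.
function matchesAny(patterns, domain) {
  return patterns.some((pattern) => new RegExp(pattern).test(domain));
}

// A domain is blocked when the blacklist matches and the whitelist does not.
function isBlocked(domain, blacklistDomains, whitelistDomains) {
  if (matchesAny(whitelistDomains, domain)) return false; // whitelist takes precedence
  return matchesAny(blacklistDomains, domain);
}
```

Note that the patterns are not anchored, so example\\.com also matches any domain that merely contains "example.com"; add ^ and $ if an exact match is wanted.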
This software is released under the GNU General Public License version 3. A copy of this license is available in the repository's root directory in a file named "LICENSE".
I may work on this project more, or I may not touch it for a couple of years. Either way, if you wish to contribute, make a pull request and I'll be more than happy to take a look at it.
Made with love,
Simon Cheng