diff --git a/README.md b/README.md
index ecc05ea..ff34b1e 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,169 @@
-# PyXIE
-A lightweight tracking pixel service written in Python
+## PyXIE
+### About
+A lightweight [Tracking Pixel](https://en.wikipedia.org/wiki/Tracking_Pixel?wprov=srpw1_0) service written in Python.
+
+## Installation
+### Quickstart using Docker
+#### Pull the image from Docker Hub
+```bash
+user@shell> docker pull devdull/pyxie:latest
+latest: Pulling from devdull/pyxie
+
+> snip <
+
+Status: Downloaded newer image for devdull/pyxie:latest
+docker.io/devdull/pyxie:latest
+```
+
+#### Create a directory to store PyXIE's data
+```bash
+user@shell> mkdir data
+```
+
+#### Create your configuration file
+When running PyXIE as a Docker image, set the `DATABASE_FILE` value in `config.yaml` to a path inside the mounted data directory so that data persists between container restarts. Below is a minimal example.
+
+`config.yaml`:
+```yaml
+DATABASE_FILE: /app/data/uadb.json
+API_KEYS:
+  - your-api-key-here
+  - a-different-api-key-here
+  - Another API key with spaces and a comma, but this might be hard to use later.
+```
+
+#### Run the image, mounting the data path and configuration file:
+```bash
+user@shell> docker run -d --mount type=bind,src="./config.yaml",dst="/app/config.yaml" --mount type=bind,src="./data",dst="/app/data" -p 5000:5000 devdull/pyxie:latest
+```
+
+#### Test the instance
+```bash
+user@shell> curl -X POST -H 'X-Api-Key: your-api-key-here' -d 'id=foo' 'http://localhost:5000/register'
+Success
+user@shell> ls -l data/ # Confirm the data file exists in the bound directory
+total 8
+-rw-r--r-- 1 user staff 2043 Jul 8 11:57 uadb.json
+```
+
+#### Stuff the average user can ignore
+The service inside the container is run using Gunicorn. To configure the bind IP and port, you can set the environment variables `LISTEN_IP` and `LISTEN_PORT`. These should not be confused with the configuration items of the same name used by Flask, which can be defined in `config.yaml`.
+
+### Manual install using Flask (or Gunicorn)
+#### Install the app requirements
+```bash
+user@shell> python3 -m venv .venv
+user@shell> source .venv/bin/activate
+user@shell> pip3 install -r requirements.txt
+```
+
+You should now be able to start PyXIE using Flask with the command `python3 pyxie.py` (listens on `127.0.0.1:5000`) or using Gunicorn with `gunicorn pyxie:pyxie` (listens on Gunicorn's default of `127.0.0.1:8000`).
+
+## Usage
+### Configuration
+Below is a minimal configuration file which lists API keys. These keys should be long and difficult to guess.
+
+`config.yaml`:
+```yaml
+API_KEYS:
+  - your-api-key-here
+  - a-different-api-key-here
+  - Another API key with spaces and a comma, but this might be hard to use later.
+```
+
+Below is a complete list of user-configurable settings:
+|Configuration item|Default value|Details|
+|---|---|---|
+|`LISTEN_IP`|`127.0.0.1`|The IP address to listen on when running with Flask (omit for Docker, Gunicorn)|
+|`LISTEN_PORT`|`5000`|The port number to listen on when running with Flask (omit for Docker, Gunicorn)|
+|`API_KEYS`|`[]` (empty list)|A list of API keys that should be considered valid by PyXIE|
+|`LOG_LEVEL`|`WARNING`|The logging level. Valid values are `CRITICAL`, `ERROR`, `WARNING`, `INFO`, and `DEBUG`|
+|`DATABASE_FILE`|`uadb.json`|The file that stores all pixel tracking data|
+|`RRD_MAX_SIZE`|`10000`|Planned to be deprecated! The maximum number of records to keep for each `id`|
+
+### Register a new `id`
+The purpose of an `id` is to enable the user to differentiate between the various places a tracking pixel has been embedded. For example, you would want one `id` for tracking whether a user opened an email and a different `id` for a pixel embedded in a specific webpage.
+
+Make a `POST` request to the `/register` endpoint with your new `id` as a form parameter and an API key from your configuration as the value of the `X-Api-Key` header. If successful, you should get a "Success" message and a status code of `201`.
+
+Here is an example that registers an `id` of `testing` for the service when it is running locally:
+```bash
+user@shell> curl -Ss -X POST -H 'X-Api-Key: your-api-key-here' -d 'id=testing' 'http://127.0.0.1:5000/register'
+Success
+```
+
+If no `Success` message appears, nothing was registered. Double-check your API key, your URL, and your port number.
+
+Using your registered `id` as a `GET` parameter, you should now be able to navigate to the tracking pixel in your browser. For the `id` of `testing` from the call above, the URL to the tracking pixel would be `http://127.0.0.1:5000/?id=testing`. Any unregistered `id` will result in a "Not Found" message and a `404` status code.
+
+### Embed your tracking pixel
+How you embed your pixel will depend on the document format, but here's an example for an HTML page:
+```html
+
+```
+
+Because the image is a transparent PNG a single pixel in size, it is unlikely to significantly interfere with the formatting of any website, but placing it at the bottom of a page should minimize any potential formatting issues. Specifying the width and height (like in the example or using CSS) should reduce the likelihood of a broken image icon appearing on your page if PyXIE goes offline or the `id` is unregistered.
+
+### View or collect stats
+Statistics are viewable only by individuals who have a valid API key, and can be accessed using the `/stats` endpoint. When successful, you should get valid JSON back as well as a status code of `200`.
+
+For example:
+```bash
+user@shell> curl -Ss -H 'X-Api-Key: your-api-key-here' 'http://127.0.0.1:5000/stats' | jq
+{
+  "browser_family_counts": {
+    "foo": {
+      "192.168.1.99": {
+        "Firefox": 1,
+        "curl": 1
+      }
+    },
+    "testing": {
+      "127.0.0.1": {
+        "Firefox": 3
+      }
+    }
+  },
+  "os_family_counts": {
+    "foo": {
+      "192.168.1.99": {
+        "Mac OS X": 1,
+        "Unknown": 1
+      }
+    },
+    "testing": {
+      "127.0.0.1": {
+        "Mac OS X": 3
+      }
+    }
+  },
+  "referrer_counts": {
+    "foo": {
+      "192.168.1.99": {
+        "Unknown": 2
+      }
+    },
+    "testing": {
+      "127.0.0.1": {
+        "Unknown": 3
+      }
+    }
+  }
+}
+```
+
+The data is structured in the following format (examples are from the first block in the above):
+- Name of the data (e.g. `browser_family_counts`)
+  - An `id` you registered (e.g. `foo`)
+    - The IP address of the individual who viewed the tracking pixel (e.g. `192.168.1.99`)
+      - The value of the viewer data and the number of times that value has been seen (`Firefox` has been seen `1` time and `curl` has been seen `1` time)
+
+To put all of that together: one or more users at the IP address `192.168.1.99` saw a tracking pixel with an `id` of `foo`, once with a "browser family" of `Firefox` and once with `curl`.
+
+### Unregister an `id`
+Note that unregistering an `id` is destructive: all data for that `id` will be lost. If you wish to retain the data, make a copy of your data file (e.g. `uadb.json`) first. If successful, you should get a "Success" message and a status code of `204`.
+
+```bash
+user@shell> curl -Ss -X DELETE -H 'X-Api-Key: your-api-key-here' 'http://127.0.0.1:5000/unregister?id=testing'
+Success
+```
diff --git a/constfig.py b/constfig.py
index cccd8b4..a98b5bd 100644
--- a/constfig.py
+++ b/constfig.py
@@ -13,6 +13,8 @@ def __init__(self):
         self.LISTEN_PORT = 5000
         self.API_KEYS = []
         self.LOG_LEVEL = "WARNING"
+        self.DATABASE_FILE = "uadb.json"
+        self.RRD_MAX_SIZE = 10000 # Maximum number of records in the database
 
         # Load user config (override defaults above)
         self.load_config()
diff --git a/ddb.py b/ddb.py
index 9a38cac..eab9fe7 100644
--- a/ddb.py
+++ b/ddb.py
@@ -150,7 +150,7 @@ def _get_id(self):
         return request.args.get("id")
 
     def register(self):
-        id = self._get_id()
+        id = request.form.get("id")
         if id in self:
             raise KeyError(f"ID {id} already registered")
         super().__setitem__(id, _DDB(max_size=self._max_size))
@@ -181,13 +181,13 @@ def _cleanup(self):
         for v in self.values():
             v._cleanup()
 
-    def dump(self, filename="uadb.json"):
+    def dump(self, filename=C.DATABASE_FILE):
         with open(filename, "w") as fout:
             json.dump(self, fout, indent=2)
             fout.flush()
             fout.truncate()
 
-    def load(self, filename="uadb.json"):
+    def load(self, filename=C.DATABASE_FILE):
         try:
             with open(filename, "r") as fin:
                 data = json.load(fin)
diff --git a/pyxie.py b/pyxie.py
index 7e63f7a..f4e4951 100644
--- a/pyxie.py
+++ b/pyxie.py
@@ -10,9 +10,7 @@
 
 def _validate_api_key():
     api_key = request.headers.get(C.HTTP_HEADER_X_API_KEY)
-    if api_key in C.API_KEYS:
-        return True
-    return False
+    return api_key in C.API_KEYS
 
 
 @pyxie.route("/register", methods=[C.HTTP_METHOD_POST])
@@ -52,7 +50,11 @@ def metrics():
 
 @pyxie.route("/", methods=[C.HTTP_METHOD_GET])
 def root():
-    _data()
+    try:
+        _data()
+    except KeyError as e:
+        return "Not Found", 404
+
     return Response(C.ONE_BY_ONE, mimetype=C.HTTP_MIME_TYPE_PNG)
 
 
diff --git a/run.sh b/run.sh
index 321d983..6dc4fbb 100755
--- a/run.sh
+++ b/run.sh
@@ -5,7 +5,7 @@ if [ -z "$LISTEN_IP" ]; then
 fi
 
 if [ -z "$LISTEN_PORT" ]; then
-    export LISTEN_PORT=8000
+    export LISTEN_PORT=5000 # Set to 5000 to match Flask's default and avoid confusion in the docs
 fi
 
 gunicorn --bind $LISTEN_IP:$LISTEN_PORT pyxie:pyxie
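
Note on the `/stats` payload documented in the README hunk: the nesting described there (data name, then `id`, then IP address, then value and count) lends itself to a small client-side roll-up. Below is a minimal sketch, assuming the response has exactly the shape shown in the README example. The function name `views_per_id` and the `sample` dictionary are hypothetical and are not part of PyXIE.

```python
from collections import defaultdict


def views_per_id(stats, metric="browser_family_counts"):
    """Sum the per-value counts under one metric, grouped by registered id."""
    totals = defaultdict(int)
    # Layout assumed from the README example: metric -> id -> IP -> value -> count
    for pixel_id, by_ip in stats.get(metric, {}).items():
        for counts in by_ip.values():
            totals[pixel_id] += sum(counts.values())
    return dict(totals)


# Hypothetical sample copied from the first block of the README's /stats example
sample = {
    "browser_family_counts": {
        "foo": {"192.168.1.99": {"Firefox": 1, "curl": 1}},
        "testing": {"127.0.0.1": {"Firefox": 3}},
    }
}

print(views_per_id(sample))  # {'foo': 2, 'testing': 3}
```

Because every metric in the example records one value per view, summing any single metric should give the same per-`id` totals; `browser_family_counts` is used as the default only for illustration.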