Create access_controls.md
Docs for access controls system #15
ikreymer authored Mar 4, 2018
1 parent ba38998 commit cac2262
Showing 1 changed file with 124 additions and 0 deletions.
124 changes: 124 additions & 0 deletions docs/access_controls.md
## Access Control System

The access control system provides a flexible configuration of rules to allow, block, or exclude access to individual URLs,
matched by longest prefix.

### Access Control Files (.aclj)

Access controls are set in one or more access control JSON files (.aclj), with rules sorted in reverse alphabetical order.
To determine the best match, a binary search (similar to a CDXJ lookup) locates the starting position, and the best match is then found by scanning forward.

An .aclj file may look as follows:

```
org,httpbin)/anything/something - {"access": "allow", "url": "http://httpbin.org/anything/something"}
org,httpbin)/anything - {"access": "exclude", "url": "http://httpbin.org/anything"}
org,httpbin)/ - {"access": "block", "url": "httpbin.org/"}
com, - {"access": "allow", "url": "com,"}
```

Each JSON entry contains an `access` field and the original `url` from which the SURT key was derived (if any).

The prefix consists of a SURT key and a `-` (currently reserved for a timestamp/date range field to be added later).

Given these rules, a user would:
- be allowed to visit `http://httpbin.org/anything/something` (allow)
- receive an 'access blocked' error message when viewing `http://httpbin.org/` (block)
- receive a 404 not found error when viewing `http://httpbin.org/anything` (exclude)
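
The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the pywb implementation: the helper names `parse_aclj`, `first_candidate` and `find_access` are hypothetical, and the input is assumed to be already converted to a SURT key.

```python
import json

# The example rules from above, already sorted in reverse alphabetical order.
ACLJ_TEXT = """\
org,httpbin)/anything/something - {"access": "allow", "url": "http://httpbin.org/anything/something"}
org,httpbin)/anything - {"access": "exclude", "url": "http://httpbin.org/anything"}
org,httpbin)/ - {"access": "block", "url": "httpbin.org/"}
com, - {"access": "allow", "url": "com,"}
"""


def parse_aclj(text):
    """Parse lines of the form '<surt key> - <json>' into (key, entry) pairs."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The middle field is the reserved '-' placeholder.
        key, _reserved, payload = line.split(" ", 2)
        rules.append((key, json.loads(payload)))
    return rules


def first_candidate(rules, surt_key):
    """Binary search a reverse-sorted list for the first key <= surt_key.

    Keys that sort after surt_key cannot be prefixes of it, so they are skipped.
    """
    lo, hi = 0, len(rules)
    while lo < hi:
        mid = (lo + hi) // 2
        if rules[mid][0] > surt_key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_access(rules, surt_key, default_access="allow"):
    """Scan forward from the binary search position for the first prefix match.

    Because of the reverse sort, the first prefix found is also the longest.
    """
    for key, entry in rules[first_candidate(rules, surt_key):]:
        if surt_key.startswith(key):
            return entry["access"]
    return default_access


rules = parse_aclj(ACLJ_TEXT)
print(find_access(rules, "org,httpbin)/anything/something"))
print(find_access(rules, "org,httpbin)/anything"))
print(find_access(rules, "org,httpbin)/"))
```

The reverse sort is what makes the forward scan work: among all rules that are prefixes of the requested key, the longest one sorts first.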

#### Access Types: `allow`, `block`, `exclude`

The available access types are as follows:

- `exclude` - when matched, results are excluded from the index, as if they do not exist. The user will receive a 404.
- `block` - when matched, results are not excluded from the index and are marked with `access: block`, but access to the actual resource is blocked. The user will see a 451.
- `allow` - full access to the index and the resource.

The difference between `exclude` and `block` is that when blocked, the user can be notified that access is blocked, while
with exclude, no trace of the resource is presented to the user.

Use `allow` to provide access to more specific resources within a broader block/exclude rule.
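
As a rough illustration of how the three access types map to what the user sees, the sketch below uses a hypothetical `status_for` helper. The 404 and 451 codes come from the descriptions above; the `resource_status` parameter is an assumption standing in for whatever status the archived resource itself would return.

```python
# Status codes taken from the access type descriptions; "allow" has no fixed
# status because the response comes from the archived resource itself.
STATUS_FOR_ACCESS = {"exclude": 404, "block": 451}


def status_for(access, resource_status=200):
    """Return the HTTP status a user would see for a matched access type."""
    if access not in ("allow", "block", "exclude"):
        raise ValueError("unknown access type: %r" % (access,))
    # For "allow", fall through to the status of the resource itself.
    return STATUS_FOR_ACCESS.get(access, resource_status)
```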


### Managing Access Lists

The .aclj files need not ever be edited manually by the user.

The pywb `wb-manager` utility has been extended to provide tools for adding, removing and checking access control rules.

For example, to add an `exclude` rule for `http://httpbin.org/anything/something` to an ACL file `access.aclj`, one could run:

```
wb-manager acl add ./access.aclj http://httpbin.org/anything/something exclude
```

The URL supplied can be a URL or a SURT prefix. If a SURT is supplied, it is used as is:

```
wb-manager acl add ./access.aclj com, allow
```
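
To make the relationship between a URL and its SURT key concrete, here is a deliberately simplified conversion. It is a sketch only: the function name `to_surt_prefix` is hypothetical, the test for "already a SURT" is an assumed heuristic, and real SURT canonicalization applies further normalization that is omitted here.

```python
from urllib.parse import urlsplit


def to_surt_prefix(value):
    """Convert a URL to a SURT-style key; return SURT input unchanged.

    Simplified: only the host labels are reversed and lowercased.
    """
    head = value.split("/", 1)[0]
    # Assumed heuristic: no scheme plus a ',' or ')' in the host part means
    # the value is already a SURT and is used as is.
    if "://" not in value and ("," in head or ")" in head):
        return value
    if "://" not in value:
        value = "http://" + value
    parts = urlsplit(value)
    host = (parts.hostname or "").lower()
    # Reverse the host labels: httpbin.org -> org,httpbin
    key = ",".join(reversed(host.split("."))) + ")" + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key


print(to_surt_prefix("http://httpbin.org/anything/something"))
print(to_surt_prefix("com,"))
```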

To remove a rule, one can run:

```
wb-manager acl remove ./access.aclj http://httpbin.org/anything/something
```

To import rules in bulk, such as from an OpenWayback-style excludes.txt and mark them as `exclude`:

```
wb-manager acl importtxt ./access.aclj ./excludes.txt exclude
```

See `wb-manager acl -h` for a list of additional commands such as for validating rules files and running a match against
an existing rule set.

### Configuring Access Controls

For manually configured collections, access controls can be specified explicitly using the `acl_paths` key:

Single ACLJ:
```
collections:
  ukwa:
    acl_paths: ./path/to/file.aclj
    default_access: block
```

Multiple ACLJ:
```
collections:
  ukwa:
    acl_paths:
      - ./path/to/allows.aclj
      - ./path/to/blocks.aclj
      - ./path/to/other.aclj
      - ./path/to/directory
    default_access: block
```

The `acl_paths` can be a single entry or a list, and can also include directories. If a directory is specified, all `.aclj` files
in the directory are checked.
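
The way a single entry, a list, and a directory could all resolve to a flat list of files can be sketched as follows. The helper `expand_acl_paths` is hypothetical, and the non-recursive directory scan is an assumption rather than documented behavior.

```python
from pathlib import Path


def expand_acl_paths(acl_paths):
    """Resolve an `acl_paths` value (string or list) to a list of file paths."""
    if isinstance(acl_paths, str):
        acl_paths = [acl_paths]
    files = []
    for entry in acl_paths:
        path = Path(entry)
        if path.is_dir():
            # Assumption: only .aclj files directly inside the directory count.
            files.extend(sorted(path.glob("*.aclj")))
        else:
            files.append(path)
    return files
```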

When finding the best rule from multiple `.aclj` files, each file is binary searched and the results
are merge-sorted to find the best match (very similar to the CDXJ index lookup).
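
A minimal sketch of the merge step, assuming each file has already been parsed into a reverse-sorted list of `(surt_key, access)` pairs. The function `best_match` is hypothetical, and for brevity it merges whole lists instead of binary searching each file first.

```python
import heapq


def best_match(rule_lists, surt_key, default_access="allow"):
    """Merge reverse-sorted rule lists and return the longest-prefix match."""
    # heapq.merge with reverse=True expects each input sorted largest first
    # and yields a single stream in the same order.
    merged = heapq.merge(*rule_lists, key=lambda rule: rule[0], reverse=True)
    for key, access in merged:
        if surt_key.startswith(key):
            return access
    # No rule matched: fall back to the default access setting.
    return default_access


allows = [("org,httpbin)/anything/something", "allow")]
blocks = [("org,httpbin)/anything", "exclude"), ("org,httpbin)/", "block")]

print(best_match([allows, blocks], "org,httpbin)/anything/something"))
print(best_match([allows, blocks], "net,example)/", default_access="block"))
```

Since the merged stream keeps the reverse sort, it does not matter which file a rule came from: the most specific matching rule is still encountered first.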

Note: It might make sense to separate `allows.aclj` and `blocks.aclj` into individual files for organizational reasons,
but there is no difference for the system and no specific need to keep different rule types separate.

#### Default Access

An additional `default_access` setting can be added to specify the default rule if no other rules match.
If omitted, it defaults to `default_access: allow`.

Setting `default_access: block` and providing a list of `allow` rules provides a flexible way to allow access
to only a limited set of resources, and block access to anything out of scope by default.

### Implementation

The access system is currently implemented in the [ukwa fork of pywb](https://github.com/ukwa/pywb) and is intended to be included
in a future release of pywb.

The fork contains unit tests for this system, and additional tests are part of the [Integration Test Suite](https://github.com/ukwa/ukwa-pywb/tree/master/integration-test).
