Docs for access controls system #15
## Access Control System

The access control system provides flexible rules to allow, block or exclude access to individual URLs by
longest-prefix match.

### Access Control Files (.aclj)

Access controls are set in one or more access control JSON files (.aclj), whose entries are sorted in reverse alphabetical order.
To determine the best match, a binary search (similar to a CDXJ lookup) locates the starting position, and the best match is then found by scanning forward.

An .aclj file may look as follows:

```
org,httpbin)/anything/something - {"access": "allow", "url": "http://httpbin.org/anything/something"}
org,httpbin)/anything - {"access": "exclude", "url": "http://httpbin.org/anything"}
org,httpbin)/ - {"access": "block", "url": "httpbin.org/"}
com, - {"access": "allow", "url": "com,"}
```

Each JSON entry contains an `access` field and the original `url` that was converted to the SURT (if any).

Each line begins with a SURT key followed by a `-` (currently reserved for a timestamp/date range field to be added later).

Given these rules, a user would:
- be allowed to visit `http://httpbin.org/anything/something` (allow)
- receive an 'access blocked' error message when viewing `http://httpbin.org/` (block)
- receive a 404 not found error when viewing `http://httpbin.org/anything` (exclude)

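The lookup described above can be sketched in a few lines of Python. This is only an illustration of binary search plus a forward scan over a reverse-sorted rule list, not the actual pywb code; `parse_aclj`, `find_start` and `find_rule` are hypothetical names, and the lookup key is assumed to be a SURT already.

```python
import json

# Sample rules copied from the example file above (reverse alphabetical order).
ACLJ_TEXT = """\
org,httpbin)/anything/something - {"access": "allow", "url": "http://httpbin.org/anything/something"}
org,httpbin)/anything - {"access": "exclude", "url": "http://httpbin.org/anything"}
org,httpbin)/ - {"access": "block", "url": "httpbin.org/"}
com, - {"access": "allow", "url": "com,"}
"""


def parse_aclj(text):
    """Parse lines of the form '<surt> - <json>' into (surt, rule) pairs."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue
        surt, _reserved, payload = line.split(" ", 2)
        rules.append((surt, json.loads(payload)))
    return rules


def find_start(keys, target):
    """Binary search: index of the first key <= target in a reverse-sorted list."""
    lo, hi = 0, len(keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if keys[mid] > target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_rule(rules, key):
    """Scan forward from the binary-search position; the first prefix hit is the longest."""
    keys = [surt for surt, _ in rules]
    for surt, rule in rules[find_start(keys, key):]:
        if key.startswith(surt):
            return rule
        if surt < key[:1]:
            break  # remaining entries sort below any possible prefix of key
    return None


RULES = parse_aclj(ACLJ_TEXT)
```

Because longer prefixes sort before shorter ones in reverse order, the first prefix found while scanning forward is also the most specific one. `find_rule` returns `None` when nothing matches, which is where a default access setting would apply.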
#### Access Types: `allow`, `block`, `exclude`

The available access types are as follows:

- `exclude` - when matched, results are excluded from the index, as if they do not exist. The user will receive a 404.
- `block` - when matched, results remain in the index, marked with `access: block`, but access to the actual resource is blocked. The user will see a 451.
- `allow` - full access to the index and the resource.

The difference between `exclude` and `block` is that when blocked, the user can be notified that access is blocked, while
with exclude, no trace of the resource is presented to the user.

`allow` is useful for providing access to more specific resources within a broader block/exclude rule.

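As a compact summary of the three access types, the behaviour described above could be modelled as a small lookup table. This is a sketch only; `AccessOutcome`, `OUTCOMES` and `outcome_for` are hypothetical names and do not correspond to anything in pywb.

```python
from typing import NamedTuple, Optional


class AccessOutcome(NamedTuple):
    listed_in_index: bool        # does the capture still appear in index results?
    replayable: bool             # can the archived resource itself be viewed?
    error_status: Optional[int]  # HTTP status shown instead of the resource, if any


# Mapping taken from the descriptions in the list above.
OUTCOMES = {
    "allow": AccessOutcome(listed_in_index=True, replayable=True, error_status=None),
    "block": AccessOutcome(listed_in_index=True, replayable=False, error_status=451),
    "exclude": AccessOutcome(listed_in_index=False, replayable=False, error_status=404),
}


def outcome_for(access):
    """Return the outcome for an access type, rejecting unknown values."""
    try:
        return OUTCOMES[access]
    except KeyError:
        raise ValueError(f"unknown access type: {access!r}") from None
```

The table makes the key distinction explicit: `block` keeps the capture visible in the index, while `exclude` hides it entirely.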
### Managing Access Lists

The .aclj files never need to be edited manually.

The pywb `wb-manager` utility has been extended to provide tools for adding, removing and checking access control rules.

For example, to add the first line to an ACL file `access.aclj`, one could run:

```
wb-manager acl add ./access.aclj http://httpbin.org/anything/something allow
```

The value supplied can be either a URL or a SURT prefix. If a SURT is supplied, it is used as is:

```
wb-manager acl add ./access.aclj com, allow
```

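To illustrate how a URL relates to the SURT keys shown in the example file, here is a deliberately simplified conversion. It is not the canonicalization pywb actually performs: ports, query strings, `www` stripping and other rules are ignored, the check used to recognise an existing SURT is an assumption, and `to_surt_key` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def to_surt_key(value):
    """Return a simplified SURT key for a URL, or the value itself if it looks like a SURT."""
    # Assumption: anything without a scheme that contains ',' or ')' is already a SURT.
    if "://" not in value and ("," in value or ")" in value):
        return value

    # Bare hosts such as 'httpbin.org/' get a scheme so urlsplit can find the host.
    url = value if "://" in value else "http://" + value
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"cannot determine host for {value!r}")

    # Reverse the host labels: 'httpbin.org' becomes 'org,httpbin'.
    reversed_host = ",".join(reversed(parts.hostname.split(".")))
    return reversed_host + ")" + (parts.path or "/")
```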
To remove a rule, one can run:

```
wb-manager acl remove ./access.aclj http://httpbin.org/anything/something
```

To import rules in bulk, such as from an OpenWayback-style `excludes.txt`, and mark them all as `exclude`, one can run:

```
wb-manager acl importtxt ./access.aclj ./excludes.txt exclude
```

See `wb-manager acl -h` for a list of additional commands, such as for validating rules files and running a match against
an existing rule set.

### Configuring Access Controls

For manually configured collections, access controls can be specified explicitly using the `acl_paths` key:

Single ACLJ:
```
collections:
  ukwa:
    acl_paths: ./path/to/file.aclj
    default_access: block
```

Multiple ACLJ:
```
collections:
  ukwa:
    acl_paths:
      - ./path/to/allows.aclj
      - ./path/to/blocks.aclj
      - ./path/to/other.aclj
      - ./path/to/directory
    default_access: block
```

`acl_paths` can be a single entry or a list, and can also include directories. If a directory is specified, all `.aclj` files
in the directory are checked.

When finding the best rule from multiple `.aclj` files, each file is binary searched and the result
sets are merge-sorted to find the best match (very similar to the CDXJ index lookup).

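A rough sketch of the multi-file lookup, assuming each file has already been loaded into a reverse-sorted list of `(surt, access)` tuples. The real implementation works on files and may differ in detail; `first_at_or_below` and `best_match` are hypothetical names, and the two sample lists simply split the rules from the earlier example.

```python
import heapq
from itertools import islice


def first_at_or_below(rules, key):
    """Binary search: index of the first rule whose SURT is <= key (reverse-sorted input)."""
    lo, hi = 0, len(rules)
    while lo < hi:
        mid = (lo + hi) // 2
        if rules[mid][0] > key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def best_match(rule_sets, key):
    """Merge the tail of every rule set in reverse order and return the first prefix match."""
    tails = [islice(rules, first_at_or_below(rules, key), None) for rules in rule_sets]
    for surt, access in heapq.merge(*tails, key=lambda rule: rule[0], reverse=True):
        if key.startswith(surt):
            return surt, access
    return None


# Hypothetical contents of allows.aclj and blocks.aclj, each in reverse order.
allows = [("org,httpbin)/anything/something", "allow"), ("com,", "allow")]
blocks = [("org,httpbin)/anything", "exclude"), ("org,httpbin)/", "block")]
```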
Note: It might make sense to keep `allows.aclj` and `blocks.aclj` as separate files for organizational reasons,
but the system treats them identically, and there is no need to keep different rule types separate.

#### Default Access

An additional `default_access` setting can be added to specify the access type to apply if no rule matches.
If omitted, the default is `default_access: allow`.

Setting `default_access: block` and providing a list of `allow` rules is a flexible way to allow access
to only a limited set of resources and block access to anything out of scope by default.

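The allow-list pattern can be pictured with a tiny sketch. It uses a plain prefix check rather than the real lookup, and `access_for` and `ALLOW_RULES` are hypothetical names chosen for illustration.

```python
# Hypothetical allow-only rule set, expressed as SURT prefixes.
ALLOW_RULES = ["org,httpbin)/anything/something"]


def access_for(key, allow_prefixes, default_access="allow"):
    """Return 'allow' if any allow rule is a prefix of key, else the default access."""
    if any(key.startswith(prefix) for prefix in allow_prefixes):
        return "allow"
    return default_access
```

With `default_access="block"`, only keys under the listed prefixes resolve to `allow`; everything else is blocked. With the default left at `allow`, the same rule set has no restricting effect.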
### Implementation

The implementation of the access system is part of the [ukwa fork of pywb](https://github.com/ukwa/pywb) and is intended to be added
to a future release of pywb.

The fork contains unit tests for this system, and additional tests are part of the [Integration Test Suite](https://github.com/ukwa/ukwa-pywb/tree/master/integration-test).