Backend
Due to browser security restrictions, several advanced features of WebScrapBook require a collaborating backend server to be running. The backend server can be set up using our PyWebScrapBook, with basic instructions in Basic.
PyWebScrapBook supports many configurations that can be adjusted by editing config.ini. You can run wsb help config in the CLI to see the available configs. Below are some useful configurations:
The data structure of legacy ScrapBook X stores captured pages under data/ and metadata under tree/, while the default data structure of PyWebScrapBook stores captured pages under the root directory and metadata under .wsb/tree/, so that all metadata is under .wsb/ and is easier to manage.
To use the default data structure of PyWebScrapBook, simply omit wsb config -ba before running wsb serve for a directory. If it has already been run, you can edit .wsb/config.ini and comment out the [book ""] section like this:
; [book ""]
; name = scrapbook
; top_dir =
; data_dir = data
; tree_dir = tree
; index = tree/map.html
; no_tree = false
Or reassign values for them, like:
[book ""]
name = scrapbook
top_dir =
data_dir =
tree_dir = .wsb/tree
index = .wsb/tree/map.html
no_tree = false
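To make the difference between the two layouts concrete, here is a minimal Python sketch. It assumes that data_dir, tree_dir, and index are resolved relative to top_dir, which is itself relative to the server root. The helper book_paths is hypothetical and is not part of PyWebScrapBook.

```python
import posixpath


def book_paths(top_dir, data_dir, tree_dir, index):
    """Resolve a book's settings to paths relative to the server root.

    Assumption: data_dir, tree_dir and index are joined onto top_dir.
    """
    def resolve(relative):
        return posixpath.normpath(posixpath.join(top_dir, relative))

    return {
        "data": resolve(data_dir),
        "tree": resolve(tree_dir),
        "index": resolve(index),
    }


# Legacy ScrapBook X layout: pages under data/, metadata under tree/
legacy = book_paths("", "data", "tree", "tree/map.html")
# Default PyWebScrapBook layout: pages under the root, metadata under .wsb/tree/
default = book_paths("", "", ".wsb/tree", ".wsb/tree/map.html")

print(legacy)
print(default)
```

In the result, a data path of "." stands for the root directory itself.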
By default the backend server is allowed to access the special .wsb directory. For better security and error-proofing, the root directory of the backend server can be configured to not include .wsb.
For example, for the following directory structure:
C:\Users\MyUserName\ScrapBooks
C:\Users\MyUserName\ScrapBooks\.wsb
C:\Users\MyUserName\ScrapBooks\public
C:\Users\MyUserName\ScrapBooks\public\tree
C:\Users\MyUserName\ScrapBooks\public\data
the configuration below uses C:\Users\MyUserName\ScrapBooks\public as the root directory of the backend server, so C:\Users\MyUserName\ScrapBooks\.wsb is never accessible:
[app]
root = public
[book ""]
name = scrapbook
top_dir =
data_dir = data
tree_dir = tree
index = tree/map.html
no_tree = false
This will make the backup directory (whose default path is .wsb/backup) inaccessible from the web interface of the backend server. If this is not desired, configure app.backup_dir to a path under app.root, such as public/backup.
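Whether a directory is reachable from the web interface comes down to whether it lies under the server root. The following Python sketch illustrates that check for the example paths above. It assumes both settings are resolved against C:\Users\MyUserName\ScrapBooks, and the helper is_under_root is hypothetical, not a PyWebScrapBook function.

```python
from pathlib import PureWindowsPath


def is_under_root(base, root, target):
    """Return True if target lies inside root; both are resolved against base."""
    base_path = PureWindowsPath(base)
    root_path = base_path / root
    target_path = base_path / target
    try:
        # relative_to raises ValueError when target is outside root
        target_path.relative_to(root_path)
    except ValueError:
        return False
    return True


BASE = r"C:\Users\MyUserName\ScrapBooks"

# Default backup directory: outside the "public" root, so not served
print(is_under_root(BASE, "public", ".wsb/backup"))
# Backup directory moved under the root: served
print(is_under_root(BASE, "public", "public/backup"))
```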
For example, to host C:\Users\MyUserName\ScrapBooks as the root directory of the backend server, with three scrapbooks, using the following directory structure:
C:\Users\MyUserName\ScrapBooks
C:\Users\MyUserName\ScrapBooks\scrapbook1
C:\Users\MyUserName\ScrapBooks\scrapbook2
C:\Users\MyUserName\ScrapBooks\scrapbook3
Enter the CLI, change the working directory to C:\Users\MyUserName\ScrapBooks, and run wsb config -ba to generate the config files. Then open C:\Users\MyUserName\ScrapBooks\.wsb\config.ini in a text editor. (These steps can be simplified to running wsb --root "C:\Users\MyUserName\ScrapBooks" config -bae.) Finally, modify [book ""] to the following:
[book ""]
name = Book1
top_dir = scrapbook1
data_dir =
tree_dir = .wsb/tree
index = .wsb/tree/map.html
no_tree = false
[book "scrapbook2"]
name = Book2
top_dir = scrapbook2
data_dir =
tree_dir = .wsb/tree
index = .wsb/tree/map.html
no_tree = false
[book "scrapbook3"]
name = Book3
top_dir = scrapbook3
data_dir =
tree_dir = .wsb/tree
index = .wsb/tree/map.html
no_tree = false
Then run C:\Users\MyUserName\ScrapBooks\.wsb\serve.py to start the backend server. You can switch between scrapbooks from the sidebar (their display names are determined by the name values above).
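As a quick sanity check of such a config, the book sections can be listed with Python's standard configparser. This is only an illustrative sketch, not the parser PyWebScrapBook itself uses. The function list_books and the shortened CONFIG_TEXT are made up for the example.

```python
import configparser
import re

# Shortened copy of the example config; only the keys needed here are kept.
CONFIG_TEXT = '''
[book ""]
name = Book1
top_dir = scrapbook1

[book "scrapbook2"]
name = Book2
top_dir = scrapbook2

[book "scrapbook3"]
name = Book3
top_dir = scrapbook3
'''


def list_books(text):
    """Map each book ID (the quoted part of the section name) to its display name."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    books = {}
    for section in parser.sections():
        match = re.fullmatch(r'book\s+"(.*)"', section)
        if match:
            books[match.group(1)] = parser[section]["name"]
    return books


print(list_books(CONFIG_TEXT))
```

The empty ID belongs to the [book ""] section, and the other IDs come from the quoted part of each section name.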
Items can be transferred between scrapbooks. Select the items to transfer and use the Copy to... command to copy them to another scrapbook. Or open a manage window using the Manage command, select the items to transfer, and drag them into the other window.
Alternatively, items can be exported into an archive file, and then imported into another scrapbook.
When the server is run behind a reverse proxy, it may require information from the X-Forwarded-For, X-Forwarded-Host, and X-Forwarded-Prefix headers set by the reverse proxy to work correctly.
These headers are ignored by default, and the corresponding configs must be set to consume them. Note that not all proxies set all of these headers, and a config should not be set unless a proxy really sets the corresponding header. Otherwise a client could supply a forged header, which is a security issue.
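The risk of trusting a header that no proxy sets can be illustrated with a small Python sketch. It assumes the setting for X-Forwarded-For counts how many proxy hops to trust, reading from the right end of the header, as common proxy-fix middleware does. The function trusted_client_ip is hypothetical and is not PyWebScrapBook code.

```python
def trusted_client_ip(remote_addr, x_forwarded_for, allowed_x_for):
    """Pick the client address, trusting only the last `allowed_x_for` hops."""
    if allowed_x_for <= 0 or not x_forwarded_for:
        # Header ignored: fall back to the address of the direct peer
        return remote_addr
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    if len(hops) < allowed_x_for:
        return remote_addr
    # Count from the right: values further left may be forged by the client
    return hops[-allowed_x_for]


# Behind one proxy that appends the real client address to the header
print(trusted_client_ip("127.0.0.1", "198.51.100.9, 203.0.113.7", 1))
# No proxy, header ignored: a forged header has no effect
print(trusted_client_ip("203.0.113.7", "198.51.100.9", 0))
# No proxy, but the header is trusted anyway: the forged value wins
print(trusted_client_ip("203.0.113.7", "198.51.100.9", 1))
```

The last call shows the problem case: with no proxy in front, a trusted header lets the client choose the address the server sees.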
If https://example.com/ is served by an nginx server that passes https://scrapbooks.example.com/ to http://127.0.0.1:8000/ served by PyWebScrapBook, the nginx server can set the following headers:
location / {
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Port $server_port;
}
And the PyWebScrapBook application can be configured like this:
[app]
...
allowed_x_for = 1
allowed_x_proto = 1
allowed_x_host = 1
allowed_x_port = 1
allowed_x_prefix = 0
If https://example.com/ is served by an nginx server that passes https://example.com/scrapbooks/ to http://127.0.0.1:8000/ served by PyWebScrapBook, the nginx server can set the following headers:
location /scrapbooks/ {
    rewrite ^/scrapbooks/(.*)$ /$1 break;
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Port $server_port;
    proxy_set_header X-Forwarded-Prefix /scrapbooks;
}
And the PyWebScrapBook application can be configured like this:
[app]
...
allowed_x_for = 1
allowed_x_proto = 1
allowed_x_host = 1
allowed_x_port = 1
allowed_x_prefix = 1
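The prefix header matters because nginx strips /scrapbooks before the request reaches the application, so the application cannot otherwise know the public path. The Python sketch below shows, under that assumption, how a public URL could be rebuilt from the forwarded values. The function external_url is hypothetical and only illustrates the idea.

```python
def external_url(path, proto, host, prefix=""):
    """Rebuild the public URL of an internal path from forwarded header values."""
    cleaned = prefix.strip("/")
    public_prefix = "/" + cleaned if cleaned else ""
    return f"{proto}://{host}{public_prefix}/{path.lstrip('/')}"


# Prefix consumed: links point back under /scrapbooks/
print(external_url("/tree/map.html", "https", "example.com", "/scrapbooks"))
# Prefix ignored: links lose the /scrapbooks part and miss the proxy location
print(external_url("/tree/map.html", "https", "example.com"))
```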