Skip to content

tbz-pariv/tbzuploader

Repository files navigation

https://travis-ci.org/tbz-pariv/tbzuploader.svg?branch=master

tbzuploader - Generic HTTP Uploading

A lot of daily work is based on regular files.

tbzuploader is a command line tool which detects uploadable files and posts them via HTTP while conforming to the standardized HTTP Status Codes.

Upload Protocol

tbzuploader conforms to the generally accepted upload protocol.

201 Created

If the HTTP upload is successful, the server responds with 201 Created. The files will then be moved to a done directory.

400 Bad Request

If the HTTP upload is not successful and it is a client error (such as wrong files or corrupted files), the server responds with 400 Bad Request. The files will then be moved to a failed directory.

In case you want to inform an admin, specify an email address which gets notified in that case, because failed files won't be retried.

500 Internal Server Error and others

If the HTTP upload was not successful (such as an login page, outage, programming error or overload), the server responds with other status codes (such as 500 Internal Server Error). tbzuploader will then retry to post the files next time.

Features

  • pairs of arbitrary size (tuples, triplets, etc.)
    • For example you have four files: a.pdf, a.xml, b.pdf, b.xml
    • The first upload should take a.pdf and a.xml, and the second upload b.pdf and b.xml.
    • See the docs for --patterns.
  • mail to admin if broken files are uploaded

Use Case

Imagine you provide a modern solution (ReST/HTTP/SaaS) with a nice API and many many manhours invested into it. Unfortunately, many of your customers don't have any programming skills. The only thing they can do is providing files such as PDF documents, Excel workbooks, CSV tables, XML files etc. In the past, these files were imported using protocols like ftp, scp, windows shares (smb) and others.

The main problem with these dated protocols is the missing data validation on the receiving side (on your side!).

tbzuploader helps overcome this obstacle:

  1. First, you write a simple HTTP service which validates the uploaded files. If the data is valid, return 201 Success.
  2. Second, you tell the customer to use tbzuploader. It is a simple command line tool which works everywhere (on Linux, Windows, Mac, ...)

If the data of the customer is valid, the data will be imported.

If the data of the customer is not valid, the issue will stay where it belongs: on the sending side (on the client's side!).

Example

user@host> tbzuploader my-local-dir https://user:password@myhost/upload-files

This will upload files from directory my-local-dir to the specified URL.

If the upload was successful (server returned HTTP status 201 Created), then the local files in my-local-dir get moved to my-local-dir/done.

If the upload failed because the server rejects the files (400 Bad Request), then the local files in my-local-dir get moved to my-local-dir/failed.

If there was another error (network timeout, server overload, ...), the files stay in the current location and the next call of the command line tool will try to upload the files again.

Usage

>>> bin/tbzuploader --help
usage: tbzuploader [-h] [--patterns= LIST_OF_PATTERNS]
                   [--min-age-seconds MIN_AGE_SECONDS]
                   [--done-directory DONE_DIRECTORY]
                   [--failed-directory FAILED_DIRECTORY]
                   [--smtp-server SMTP_SERVER] [--mail-from MAIL_FROM]
                   [--mail-to MAIL_TO] [--all-files-in-one-request]
                   [--all-files-in-n-requests] [--no-ssl-cert-verification]
                   [--ca-bundle CA_BUNDLE] [--dry-run]
                   local_directory url

positional arguments:
  local_directory
  url                   URL can contain http-basic-auth like this:
                        https://apiuser:mypwd@example.com/input-process-
                        output/

optional arguments:
  -h, --help            show this help message and exit
  --patterns= LIST_OF_PATTERNS
                        List of file endings which should get uploaded
                        together. Example: --patterns="*.pdf *.xml" The pairs
                        (a.pdf, a.xml) and (b.pdf, b.xml) get uploaded
                        together
  --min-age-seconds MIN_AGE_SECONDS
                        Skip files which are too young. Default: 60
  --done-directory DONE_DIRECTORY
                        files get moved to this directory after successful
                        upload. Defaults to {local_directory}/done
  --failed-directory FAILED_DIRECTORY
                        files get moved to this directory after failed upload
                        due to broken files. Defaults to
                        {local_directory}/failed
  --smtp-server SMTP_SERVER
                        SMTP server which sends mails in case broken files
                        were tried to be uploaded.
  --mail-from MAIL_FROM
                        Sender of mails in case broken files were tried to be
                        uploaded.
  --mail-to MAIL_TO     Recipient of mails in case broken files were tried to
                        be uploaded.
  --all-files-in-one-request
                        Upload all files in one request (if you give not
                        --pattern). Upload all matching files in one request
                        (if you give --pattern)
  --all-files-in-n-requests
                        Upload all files in N requests (if you give not
                        --pattern). Upload all matching files in N requests
                        (if you give --pattern)
  --no-ssl-cert-verification
  --ca-bundle CA_BUNDLE
  --dry-run             Do not upload. Just print the pair of files which
                        would get uploaded together

Install

Install for usage from pypi:

pip install tbzuploader

Development Install on Python2

Install tbzuploader for development on Python2:

virtualenv tbzuploader-env
cd tbzuploader-env
. ./bin/activate
pip install -e git+https://github.com/guettli/tbzuploader.git#egg=tbzuploader

Development Install on Python3

Install tbzuploader for development on Python3:

python3 -m venv tbzuploader-py3env
cd tbzuploader-py3env
. ./bin/activate
pip install --upgrade pip
pip install -e git+https://github.com/guettli/tbzuploader.git#egg=tbzuploader

Development Testing

Testing:

pip install -r src/tbzuploader/requirements.txt
cd src/tbzuploader
pytest # all test ok?
pyCharm src/tbzuploader/...
pytest # all test still ok?
.... I am waiting for your pull request :-)

Protocol for resumable uploads

Unfortunately, tbzuploader does not support resumable uploads up to now.

There is already a spec for it.

It would very cool if tbzuploader could support this spec: https://tus.io/

Pull requests are welcome.

Trivia: Why 201?

Why using 201 Created instead of 200 Success?

In the beginning, we used 200 Success for "successful upload". A server misconfiguration caused a redirect to the login page, thus ignoring the uploaded files and returning a 200 Success. Since the upload was "successful", the files were moved into done erroneously.

That's why 201 Created is used to so signify "successful upload".

Releases

No releases published

Packages

No packages published

Languages