
Implement your own modules

Plowshare is designed with modularity in mind, so it should be easy for other programmers to add new modules. Study the code of any of the existing modules (e.g. 2shared) and create your own.

Some hosters export a public API (a formalized way of downloading or uploading). If one is available, calling this API can save you lots of time compared to simulating a web browser. For example: HotFile.

Script template

Each module implements services for one sharing site:

  • anonymous download
  • free/premium account download
  • anonymous upload (if allowed from host)
  • free/premium account upload
  • free/premium account remote upload (if available from host)
  • delete or kill url (anonymous or not)
  • shared folder (and sub-folders) list (if available from host)

The module must declare the following global variables:

MODULE_XXX_REGEXP_URL

Depending on the module features, some additional variables should also be declared:

MODULE_XXX_DOWNLOAD_OPTIONS
MODULE_XXX_DOWNLOAD_RESUME
MODULE_XXX_DOWNLOAD_FINAL_LINK_NEEDS_COOKIE
MODULE_XXX_DOWNLOAD_SUCCESSIVE_INTERVAL
# Rarely used: gives additional curl options
MODULE_XXX_DOWNLOAD_FINAL_LINK_NEEDS_EXTRA=()

MODULE_XXX_UPLOAD_OPTIONS
MODULE_XXX_UPLOAD_REMOTE_SUPPORT

MODULE_XXX_DELETE_OPTIONS

MODULE_XXX_LIST_OPTIONS
MODULE_XXX_LIST_HAS_SUBFOLDERS

MODULE_XXX_PROBE_OPTIONS

Where XXX is the name of the module (uppercase). No other global variable declarations are allowed.

A module must export one to five entry points:

  • xxx_download()
  • xxx_upload()
  • xxx_delete()
  • xxx_list()
  • xxx_probe()
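
For illustration, here is a minimal module skeleton; the hoster name "myhoster" and its URL pattern are hypothetical:

# src/modules/myhoster.sh (hypothetical hoster, download service only)
MODULE_MYHOSTER_REGEXP_URL='https\?://\(www\.\)\?myhoster\.com/'

MODULE_MYHOSTER_DOWNLOAD_OPTIONS=""
MODULE_MYHOSTER_DOWNLOAD_RESUME=no
MODULE_MYHOSTER_DOWNLOAD_FINAL_LINK_NEEDS_COOKIE=no

myhoster_download() {
    local -r COOKIE_FILE=$1
    local -r URL=$2
    # site-specific logic goes here (see the sections below)
}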

Downloading function

Prototype is:

xxx_download() {
    local -r COOKIE_FILE=$1
    local -r URL=$2

    ...
}

Notes:

  • xxx is the name of the plugin: src/modules/xxx.sh.
  • xxx must not contain dots; use underscores instead.
  • Never call the curl_with_log function here, use curl.

Arguments:

  • $1: cookie file (empty content at start, use it with curl)
  • $2: URL string (for example http://x7.to/fwupja)

Warning: If the function does not need a cookie file, do not delete the cookie file provided as argument; plowdown will take care of it.

When a link is correct, the function should return 0 and echo one or two lines, corresponding to the file URL and filename:

echo "$FILE_URL"
echo "$FILENAME"

$FILENAME can be empty, or even not echoed at all. If so, plowdown will guess the filename from the provided $FILE_URL.

If a cookie file is required for the final download, MODULE_XXX_DOWNLOAD_FINAL_LINK_NEEDS_COOKIE must be set to yes.

The file URL must be the final link (that is, a link that returns a 200 HTTP code, without redirection). Use curl -I and grep_http_header_location when necessary.

Note: $FILE_URL will be encoded afterwards, so don't worry about weird characters. For example: space characters will be translated to %20 for you.
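
Putting this together, a minimal anonymous download function could look like this (a sketch only; the page markers and regexps are hypothetical and must be adapted to the real site):

myhoster_download() {
    local -r COOKIE_FILE=$1
    local -r URL=$2
    local PAGE FILE_URL FILE_NAME

    PAGE=$(curl -c "$COOKIE_FILE" "$URL") || return

    # Hypothetical dead-link marker
    match 'File Not Found' "$PAGE" && return $ERR_LINK_DEAD

    # Hypothetical markers: adapt parsing to the real page content
    FILE_URL=$(echo "$PAGE" | parse_attr 'download_button' 'href') || return
    FILE_NAME=$(echo "$PAGE" | parse_tag 'file_name' 'h1') || return

    echo "$FILE_URL"
    echo "$FILE_NAME"
}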

Possible return values

Module can return the following codes:

  • 0: Everything is ok (arguments have to be echoed, see above).
  • $ERR_FATAL: Unexpected result (upstream site updated, etc).
  • $ERR_LOGIN_FAILED: Correct login/password argument is required.
  • $ERR_LINK_TEMP_UNAVAILABLE: Link alive but temporarily unavailable.
  • $ERR_LINK_PASSWORD_REQUIRED: Link alive but requires a password (password protected link).
  • $ERR_LINK_NEED_PERMISSIONS: Link alive but requires some authentication (private or premium link).
  • $ERR_LINK_DEAD: Link is dead (we must be sure of that). Each download function should return this value at least once.
  • $ERR_SIZE_LIMIT_EXCEEDED: Can't download link because file is too big (need permissions, probably need to be premium).
  • $ERR_EXPIRED_SESSION: When cache is used. See storage_get, storage_set and storage_reset.

Additional error codes (returned by plowdown only, module download function should not return these):

  • $ERR_NOMODULE: No module available for provided link. Hoster is not supported yet!
  • $ERR_NETWORK: Specific network error (socket reset, curl, etc).
  • $ERR_SYSTEM: System failure (missing executable, local filesystem, wrong behavior, etc).
  • $ERR_CAPTCHA: Captcha solving failure.
  • $ERR_MAX_WAIT_REACHED: Countdown timeout (see -t/--timeout command line option).
  • $ERR_MAX_TRIES_REACHED: Max tries reached (see -r/--max-retries command line option).
  • $ERR_BAD_COMMAND_LINE: Unknown command line parameter or incompatible options.

Guidelines

  • If the hoster asks to try again later (and you don't know how long to wait): the download function must return $ERR_LINK_TEMP_UNAVAILABLE.
  • If the hoster asks to try again later (and you do know how long to wait): the download function must echo the wait time (in seconds) and return $ERR_LINK_TEMP_UNAVAILABLE (see the sketch after this list).
  • Respect wait times even if the download seems to work without them. Don't hammer the website!
  • Try to force English language in the website (usually using a cookie) if you are going to parse human messages (it's better to parse HTML nodes, though).
  • If you provide premium download, a bad login must lead to an error ($ERR_LOGIN_FAILED). No fallback to anonymous download must be made (even if the remote website accepts it).
  • The MODULE_XXX_DOWNLOAD_SUCCESSIVE_INTERVAL global variable contains the delay (in seconds) used between two successive downloads (links of the same hoster). Some hosters may behave nastily (force the user to wait, declare the link as dead, or sometimes worse) when a bunch of links is downloaded in a row.
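
For example, a sketch of the "known wait time" case (the message text and parse pattern are hypothetical):

# Hoster asks us to wait; echo the delay (in seconds) and report
# temporary unavailability. Message and regexp are hypothetical.
if match 'You have to wait' "$PAGE"; then
    local WAIT_TIME
    WAIT_TIME=$(echo "$PAGE" | parse 'You have to wait' \
        'wait \([[:digit:]]\+\) minutes') || return
    echo $((WAIT_TIME * 60))
    return $ERR_LINK_TEMP_UNAVAILABLE
fi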

Uploading function

Prototype is:

xxx_upload() {
    local -r COOKIE_FILE=$1
    local -r FILE=$2
    local -r DESTFILE=$3

    ...

    PAGE=$(curl_with_log ...) || return

    ...
}

Notes:

  • xxx is the name of the plugin: src/modules/xxx.sh.
  • xxx must not contain points, use underscores instead.
  • Use the curl_with_log function only once, for the file upload itself (it's quite convenient to see progress); otherwise simply use curl.

Arguments:

  • $1: cookie file (empty content at start, use it with curl)
  • $2: local filename (with full path) to upload or (remote) URL
  • $3: remote filename (no path)

Warning: If the function does not need a cookie file, do not delete the cookie file provided as argument; plowup will take care of it.

When the requested file has been successfully uploaded, the function should return 0 and echo one to three lines:

echo "$DL_URL"
echo "$DEL_URL"
echo "$ADMIN_URL_OR_CODE"

$DEL_URL and $ADMIN_URL_OR_CODE are optional (can be empty or not echoed at all).

Example 1 (seen in the depositfiles module):

echo "$DL_LINK"
echo "$DEL_LINK"

Example 2 (seen in the 2shared module):

echo "$FILE_URL"
echo
echo "$FILE_ADMIN"

Possible return values

Module can return the following codes:

  • 0: Success. File successfully uploaded.
  • $ERR_FATAL: Unexpected result (upstream site updated, etc).
  • $ERR_LINK_NEED_PERMISSIONS: Authentication required (for example: anonymous users can't do remote upload).
  • $ERR_LINK_TEMP_UNAVAILABLE: Upload service seems temporarily unavailable upstream. Note: This status does not affect the retry counter (see -r/--max-retries command line option) but does count towards the timeout if specified (see -t/--timeout command line option).
  • $ERR_SIZE_LIMIT_EXCEEDED: Can't upload too big file (need permissions, probably need to be premium).
  • $ERR_LOGIN_FAILED: Correct login/password argument is required.
  • $ERR_ASYNC_REQUEST: Asynchronous remote upload started.
  • $ERR_EXPIRED_SESSION: When cache is used. See storage_get, storage_set and storage_reset.

Additional error codes (returned by plowup only, module upload function should not return these):

  • $ERR_NOMODULE: Specified module does not exist or is not supported.
  • $ERR_NETWORK: Specific network error (socket reset, curl, etc).
  • $ERR_SYSTEM: System failure (missing executable, local filesystem, wrong behavior, etc).
  • $ERR_MAX_WAIT_REACHED: Countdown timeout (see -t/--timeout command line option).
  • $ERR_MAX_TRIES_REACHED: Max tries reached (see -r/--max-retries command line option).
  • $ERR_BAD_COMMAND_LINE: Unknown command line parameter or incompatible options.

Guidelines

  • Remember that $2 can also be a remote file. It should be checked with match_remote_url. Most of the time, the remote upload feature is only available for premium users. If the module does not support it, put at the top of the file: MODULE_XXX_UPLOAD_REMOTE_SUPPORT=no.
  • Upload file size is usually limited (and can be quite low for anonymous upload). Dealing with it is nice for the user! For example:
MAX_SIZE=... # hardcoded value, or parsed from the HTML page (if possible)
SIZE=$(get_filesize "$FILE")
if [ $SIZE -gt $MAX_SIZE ]; then
    log_debug "file is bigger than $MAX_SIZE"
    return $ERR_SIZE_LIMIT_EXCEEDED
fi

Deleting function

Prototype is:

xxx_delete() {
    local -r COOKIE_FILE=$1
    local -r URL=$2

    ...
}

Notes:

  • xxx is the name of the plugin: src/modules/xxx.sh.
  • xxx must not contain dots; use underscores instead.
  • Never call the curl_with_log function here, use curl.

Argument:

  • $1: cookie file (empty content at start, use it with curl)
  • $2: kill/admin URL string

Warning: If the function does not need a cookie file, do not delete the cookie file provided as argument; plowdel will take care of it.

There is no output for this function. When the file has been successfully deleted, the function should return 0.
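
A minimal deletion function might look like this (a sketch; the page markers are hypothetical):

myhoster_delete() {
    local -r COOKIE_FILE=$1
    local -r URL=$2
    local PAGE

    PAGE=$(curl -c "$COOKIE_FILE" "$URL") || return

    # Hypothetical markers: adapt to the real page content
    match 'No such file' "$PAGE" && return $ERR_LINK_DEAD
    match 'successfully deleted' "$PAGE" && return 0

    return $ERR_FATAL
}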

Possible return values

Module can return the following codes:

  • 0: Success. File successfully deleted.
  • $ERR_FATAL: Unexpected result (upstream site updated, etc).
  • $ERR_LOGIN_FAILED: Authentication failed (bad login/password).
  • $ERR_LINK_NEED_PERMISSIONS: Authentication required (anonymous users can't delete files).
  • $ERR_LINK_PASSWORD_REQUIRED: Link requires an admin or removal code.
  • $ERR_LINK_DEAD: Link is dead. File has been previously deleted.

Additional error codes (returned by plowdel only, module delete function should not return these):

  • $ERR_NOMODULE: No module available for provided link.
  • $ERR_NETWORK: Specific network error (socket reset, curl, etc).
  • $ERR_BAD_COMMAND_LINE: Unknown command line parameter or incompatible options.

Guidelines

  • On a successful operation (return 0), don't print a message; plowdel will call log_notice for you.

Listing function

Prototype is:

xxx_list() {
    local -r URL=$1
    local -r RECURSE=${2:-0}

    ...
}

Notes:

  • xxx is the name of the plugin: src/modules/xxx.sh.
  • xxx must not contain dots; use underscores instead.
  • Never call the curl_with_log function here, use curl.

Arguments:

  • $1: list URL (aka root folder URL)
  • $2: list links and recurse into subfolders (if any). If $2 is an empty string, the option has not been selected.

As a result, the function should return 0 and echo pairs of lines:

echo "$FILE_URL"
echo "$FILENAME"

$FILENAME can be empty, but the echo must still be done. You usually have more than one link in a folder, so echoing pairs of lines in a while loop can get complex. To simplify this, use the list_submit() helper.

Example (seen in depositfiles module):

PAGE=$(curl "$URL") || return

LINKS=$(echo "$PAGE" | parse_all_attr_quiet 'class="dl" align="center' href)
NAMES=$(echo "$PAGE" | parse_all_attr_quiet 'class="dl" align="center' title)

list_submit "$LINKS" "$NAMES" || return

list_submit() can also accept an optional third argument: a link prefix (string) to prepend to file link. This is useful when the parsed links are relative.

Example (seen in mediafire module):

...
NAMES=$(echo "$DATA" | parse_all_tag filename)
LINKS=$(echo "$DATA" | parse_all_tag quickkey)

list_submit "$LINKS" "$NAMES" 'http://www.mediafire.com/?' || return

list_submit() can even accept an optional fourth argument: a link suffix (string) to append to the file link, for example a file extension.

Example (seen in turbobit module):

...
NAMES=$(parse_all ...
LINKS=$(parse_json 'id' 'split' <<< "$JSON")

list_submit "$LINKS" "$NAMES" 'http://turbobit.net/' '.html' || return

Possible return values

Module can return the following codes:

  • 0: Success. Folder contains one or several files.
  • $ERR_FATAL: Unexpected content (not a folder, parsing error, etc).
  • $ERR_LINK_TEMP_UNAVAILABLE: Links are temporarily unavailable (cannot be listed yet). This is used by mirroring/multi-upload services (uploads are still being processed).
  • $ERR_LINK_PASSWORD_REQUIRED: Folder is password protected.
  • $ERR_LINK_DEAD: Folder has been deleted, does not exist, or is empty.

Additional error codes (returned by plowlist only, module list function should not return these):

  • $ERR_NOMODULE: No module available for provided link.
  • $ERR_NETWORK: Specific network error (socket reset, curl, etc).
  • $ERR_BAD_COMMAND_LINE: Unknown command line parameter or incompatible options.

Guidelines

  • If the hoster supports subfolders, declare at the top of the module source: MODULE_XXX_LIST_HAS_SUBFOLDERS=yes.
  • If the hoster has no subfolder capability (this includes mirroring/multi-upload services), declare at the top of the module source: MODULE_XXX_LIST_HAS_SUBFOLDERS=no.
  • You should emit a log_error message if the module (so, on plowshare's side) doesn't implement the recursive subfolders option. For example, in the zalaa module:
test "$2" && log_error 'Recursive flag not implemented, ignoring'
  • When recursing into subfolders, don't echo folder URLs (but you can log_debug them).
  • When the recurse subfolders option is enabled: $ERR_LINK_DEAD means that there is no file in any folder.
  • When the recurse subfolders option is disabled: $ERR_LINK_DEAD means that there is no file in the root folder. There might be files in subfolders.

Probing function

Prototype is:

xxx_probe() {
    local -r COOKIE_FILE=$1
    local -r URL=$2
    local -r REQ_IN=$3

    local REQ_OUT

    ...
}

Notes:

  • xxx is the name of the plugin: src/modules/xxx.sh.
  • xxx must not contain dots; use underscores instead.
  • Never call the curl_with_log function here, use curl.

Arguments:

  • $1: cookie file (empty content at start, use it with curl)
  • $2: download URL to check
  • $3: capability list. One character is one feature.

Warning: If the function does not need a cookie file, do not delete the cookie file provided as argument; plowprobe will take care of it.

Capabilities

  • c: link is alive (usually: 0 for alive or $ERR_LINK_DEAD for dead, see below for details)
  • f: file name
  • i: file id (usually included in the URL)
  • s: file size (in bytes, no prefix/suffix). Use the translate_size helper function for converting if necessary.
  • h: file hash (md5, sha1, ... hex-string format). If several algorithms are available, always use the longest digest (for example: sha1 is preferred to md5).
  • t: file timestamp (unspecified time format)
  • v: refactored file URL (can be different from the input URL, for example a short hostname or an https redirection)

Of course, depending on the hoster, it is not always possible to access all of this information.

When a link is correct, the function should return 0 and echo the check-link character:

echo 'c'
return 0

If you can parse the filename, you can return this way:

echo "$FILE_NAME"
echo 'cf'
return 0

Even better, if you can parse the filename and the filesize, you can return this way:

echo "$FILE_NAME"
echo "$FILE_SIZE"
echo 'cfs'
return 0

OR

echo "$FILE_SIZE"
echo "$FILE_NAME"
echo "csf"
return 0

The order is given by the last echoed line (a variable usually called REQ_OUT).
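
A typical probe body builds $REQ_OUT incrementally, testing each requested capability. A sketch (the dead-link marker and parse pattern are hypothetical):

myhoster_probe() {
    local -r COOKIE_FILE=$1
    local -r URL=$2
    local -r REQ_IN=$3
    local PAGE REQ_OUT FILE_NAME

    PAGE=$(curl -c "$COOKIE_FILE" "$URL") || return
    match 'File not found' "$PAGE" && return $ERR_LINK_DEAD

    REQ_OUT=c

    if [[ $REQ_IN = *f* ]]; then
        FILE_NAME=$(echo "$PAGE" | parse_tag 'file_name' 'h2') && \
            echo "$FILE_NAME" && REQ_OUT="${REQ_OUT}f"
    fi

    echo $REQ_OUT
}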

Possible return values

Module can return the following codes:

  • 0: Success. Link is alive (arguments have to be echoed, see above).
  • $ERR_FATAL: Unexpected content (upstream updated, parsing error, etc).
  • $ERR_LINK_DEAD: Link is dead, no more information can be returned.

Additional error codes (returned by plowprobe only, the module probe function should not return these):

  • $ERR_NOMODULE: No module available for provided link.
  • $ERR_NETWORK: Specific network error (socket reset, curl, etc).
  • $ERR_BAD_COMMAND_LINE: Unknown command line parameter or incompatible options.

Hash policy

Some hosters are able to return more than one hash (for example: md5 and sha1). In that case %h must return the strongest algorithm. A module option can be added to change the %h behaviour (like --md5).

Guidelines

  • Probe function should be fast and efficient. One single curl request is advised.
  • Using javascript is strongly discouraged.

Output debug messages (stderr)

Do not use echo, which is reserved for function return value(s). Use log_debug() or log_error(). You can use the -vN command line switch to change debug verbosity.

Note: An intermediate verbosity level exists: log_notice(). It is reserved for core functions; do not use it inside modules.
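
For example (the page marker is illustrative):

log_debug 'starting download process'
PAGE=$(curl "$URL") || return

if ! match 'Download' "$PAGE"; then
    log_error 'unexpected page content'
    return $ERR_FATAL
fi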

Function: curl

This is probably the most important command in the plowshare API set. This wrapper function calls the real curl binary (let's call it true-curl).

Arguments:

  • $1 ... $n: true-curl command-line arguments

Result:

  • $?: 0 for success, or $ERR_NETWORK, $ERR_SYSTEM

Note: curl_with_log calls curl but forces verbose level 3. This is for specific use in the module upload function (should be called only once).

It's a good habit to always append || return for error handling.

Examples:

PAGE1=$(curl "http://www.google.com") || return

# Get remote content and take cookies (if any)
PAGE2=$(curl -c "$COOKIE_FILE" "$URL") || return

# Get remote content, providing and appending cookie entries
PAGE3=$(curl -c "$COOKIE_FILE" -b 'lang=en' "$URL") || return
PAGE4=$(curl -c "$COOKIE_FILE" -b "$COOKIE_FILE" "$URL") || return

PAGE5=$(curl "${URL}?param=1") || return
# or
PAGE5=$(curl --get --data 'param=1' "$URL") || return

Notes:

  • curl will add a valid User-Agent for you.
  • curl exit codes are mapped to plowshare error codes. Human-readable debug messages have been added too.
  • curl implicitly maps plowdown (or plowup) command-line switches (--interface, --max-rate, ...)

Using cookies correctly

true-curl can handle one --cookie-jar/-c option and one --cookie/-b option:

PAGE=$(curl -c "$COOKIE_FILE_1" -b "$COOKIE_FILE_2"  http://...) || return

$COOKIE_FILE_2: entries will be read from file and set in the HTTP request header:

Cookie: key=value... 

$COOKIE_FILE_1: entries will be returned from HTTP server and written to file:

Set-Cookie: key=value... 

$COOKIE_FILE_1 and $COOKIE_FILE_2 can be the same filename.

true-curl does not handle multiple --cookie/-b switches of the same kind: you can have only one string (key=value) argument and one file argument. These are source entries (read only) given to the HTTP protocol (Cookie: header).

Example 1 (only the last -b file switch will be used):

curl -b "$COOKIE_FILE_1" -b "$COOKIE_FILE_2" http://...
// $COOKIE_FILE_1 will be ignored

Example 2 (only the last -b string switch will be used):

curl -b 'lang=english' -b 'user=foo' http://...
// 'lang' cookie entry will be ignored

Example 3:

curl -b 'lang=english' -b "$COOKIE_FILE" http://...
// correct example

curl -b "$COOKIE_FILE" -b 'lang=english' -b 'user=foo' http://...
// 'lang' cookie entry will be ignored

Temporary files are deleted in case of error

First example, using -D/--dump-header:

HEADERS=$(create_tempfile) || return
HTML=$(curl -D "$HEADERS" http://...) || return
rm -f "$HEADERS"

If something goes wrong in curl (network issue or anything else), $HEADERS will be deleted for you.

Remember, it's only if an error occurs. On curl's success nothing is deleted (as expected).

Another classic example is using -o/--output:

CAPTCHA_URL='http://...'
CAPTCHA_IMG=$(create_tempfile '.png') || return
curl -o "$CAPTCHA_IMG" "$CAPTCHA_URL" || return
...
rm -f "$CAPTCHA_IMG"

If something goes wrong when retrieving the captcha image, curl will delete the temporary file for you.

Split long data string

Here is a first case with a POST request and content type application/x-www-form-urlencoded.

DATA="action=validate&uid=123456&recaptcha_challenge_field=$CHALLENGE&recaptcha_response_field=$WORD"
RESULT=$(curl -b "$COOKIE_FILE" --data "$DATA" "$URL") || return

Consider passing several -d/--data arguments instead of one (order is not important).

RESULT=$(curl -b "$COOKIE_FILE" -d 'action=validate' \
    -d "uid=123456" \
    -d "recaptcha_challenge_field=$CHALLENGE" \
    -d "recaptcha_response_field=$WORD" \
    "$URL") || return

It is better for maintenance.

Second example with a GET request:

URL='http://ab19.hostmyfile.net/upload'
RESULT=$(curl "$URL?X-Progress-ID=12345&premium=1") || return

Can be written in a better way:

URL='http://ab19.hostmyfile.net/upload'
RESULT=$(curl --get -d 'X-Progress-ID=12345' -d 'premium=1' "$URL") || return

Auxiliary functions

You can see a full list of the plowshare public API here.

The core.sh script provides the usual auxiliary functions.

Do not use                  But use
basename                    basename_file
grep -o "^http://[^/]*"     basename_url
sleep                       wait (must always be ORed with the return keyword)
grep, grep -i, grep -q      match and matchi
sed, awk, perl              parse_* or replace_all, replace
head -n1, tail -n1          first_line, last_line
mktemp, tempfile            create_tempfile
tr '[A-Z]' '[a-z]'          lowercase
tr '[a-z]' '[A-Z]'          uppercase
sed ...                     strip (delete leading and trailing spaces, tabs), delete_last_line
js                          detect_javascript and javascript
stat -c %s                  get_filesize
$RANDOM or $$               random
md5sum                      md5 or md5_file
wget                        curl

The goal here is to avoid calling non-portable commands in modules.
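
For example, two common substitutions (a sketch):

# Instead of: if echo "$PAGE" | grep -q 'not found'; then ...
if match 'not found' "$PAGE"; then
    return $ERR_LINK_DEAD
fi

# Instead of: FIRST=$(echo "$LINKS" | head -n1)
FIRST=$(echo "$LINKS" | first_line) || return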

Function first_line

Arguments:

  • $1 (optional): how many head lines to take (default is 1). This must be a strictly positive integer.
  • stdin: input data (multiline text)

Results:

  • $?: 0 on success or $ERR_FATAL (bad argument)
  • stdout: result

Examples:

$ echo "$BUFFER1"
line a
line b
line c
line d

$ echo "$BUFFER1" | first_line
line a

$ echo "$BUFFER1" | first_line 3
line a
line b
line c

Function delete_first_line

Arguments:

  • $1 (optional): how many head lines to delete (default is 1). This must be a strictly positive integer.
  • stdin: input data (multiline text)

Results:

  • $?: 0 on success or $ERR_FATAL (bad argument)
  • stdout: result

Examples:

$ echo "$BUFFER1"
line a
line b
line c
line d

$ echo "$BUFFER1" | delete_first_line
line b
line c
line d

$ echo "$BUFFER1" | delete_first_line 2
line c
line d

Function: post_login

This function is useful for registered accounts because ID information is stored inside the cookie. It will submit the HTML login form for you. It takes 4 or 5 arguments.

Arguments:

  • $1: authentication string 'username:password' (password can contain semicolons)
  • $2: cookie file (system existing file)
  • $3: string to post (can contain keywords: $USER and $PASSWORD)
  • $4: URL
  • $5..$n (optional): Additional curl arguments

Example:

# comes from command line
AUTH="mylogin:mypassword"

# important: note the single quotes, $USER and $PASSWORD must not be interpreted here.
LOGIN_DATA='login=1&redir=1&username=$USER&password=$PASSWORD'
LOGIN_URL="https://xxx.com/login.php"

# or simply use $(create_tempfile)
COOKIES=/tmp/my_cookie_file

post_login "$AUTH" "COOKIES" "$LOGIN_DATA" "$LOGIN_URL" >/dev/null

Results:

  • $?: 0 for success; $ERR_NETWORK or $ERR_LOGIN_FAILED on error (no cookie returned)
  • stdout: HTML result of POST request

A common usage is (snippet taken from the fileserve module):

LOGIN_RESULT=$(post_login "$AUTH" "$COOKIE_FILE" "$LOGIN_DATA" \
    'http://www.fileserve.com/login.php') || return

If no password is provided, post_login will prompt for one.

Warning: Having $?=0 does not mean that your account is valid; it just means that the request (from an HTTP protocol point of view) has been successful. To detect a bad login/password, you'll have to parse the returned HTML content or sometimes the cookie file.

Note: Sometimes, parsing LOGIN_RESULT can be useful to distinguish a free account from a premium account. Sometimes parsing the cookie (looking for a specific entry in it) can help too.

Use case 1 (seen in the netload.in module)

An empty $LOGIN_RESULT is not necessarily an error. You can get, for example, an HTTP redirection. You can follow this redirection by giving the -L option to curl:

LOGIN_RESULT=$(post_login "$AUTH" "$COOKIE_FILE" "$LOGIN_DATA" \
    "$BASEURL/login.php" -L) || return

Use case 2 (seen in mediafire module)

You already have valid entries in $COOKIEFILE (language, for example) and you want to keep them.

LOGIN_RESULT=$(post_login "$AUTH_FREE" "$COOKIEFILE" "$LOGIN_DATA" \
    "$BASE_URL/dynamic/login.php?popup=1" -b "$COOKIEFILE") || return

Without this additional -b "$COOKIEFILE" given to curl, cookie file would be overwritten.

Functions: match and matchi

Arguments:

  • $1: match regexp (like grep)
  • $2: input data (text)

Results:

  • $?: 0 for success; non-zero on any error
  • stdout: nothing!

The final 'i' letter stands for case-insensitive match.

Regexps are basic POSIX (BRE syntax). Reserved characters (to escape) are: . * [ ] $ ^ \.

Coding convention is to use the shortest form:

match 'foo' "$HTML_PAGE" && ...        // right
$(match 'foo' "$HTML_PAGE") && ...     // wrong (useless subshell creation)
match '\(foo\)' "$HTML_PAGE" && ...    // wrong (useless parenthesis)

if (! match 'You are ' "$HTML"); then  // wrong (useless subshell creation)
    ...
fi

Typical use:

if ! match '/js/myfiles\.php/' "$PAGE"; then
    log_error "not a folder"
    return $ERR_FATAL
fi
if match '<h1>Delete File?</h1>' "$PAGE"; then
   ...
fi
if match '/error\.php?code=25[14]' "$LOCATION"; then
    return $ERR_LINK_DEAD
fi

Simple examples:

match '[0-9][0-9]\+' 'Wait 19 seconds'       // true
match '[0-9][0-9]\+' 'Wait 9 seconds'        // false
match 'times\?' 'One time ago'               // true
match 's/n' 'yes/no'                         // true
match '(euros)' '3.5 (euros)'                // true
match '\[euros\]' '3.5 [euros]'              // true

More examples (seen in modules):

match '^http://download' "$LOCATION"   // ^ matches beginning of line
match 'errno=999$' "$LOCATION"         // $ matches end of line
match '.*/#!index|' "$URL"             // . means any character
match 'File \(deleted\|not found\|ID invalid\)' "$ERROR"

// Character classes can be used too (see POSIX bracket expressions)
match 'Password:[[:space:]]*<input' "$HTML"

Functions: parse, parse_all, parse_quiet, parse_all_quiet

The first function returns the first match, the second one returns all matches (multiline result). The sed command is used internally here.

Arguments:

  • $1: filter regexp (selects candidate lines; use . or an empty string to match every line)
  • $2: parse regexp (enclose with \( \) to retrieve the match)
  • $3 (optional): number of lines to skip (default is 0)
  • stdin: input data (text)

Results:

  • $?: 0 on success or $ERR_FATAL (non matching or empty result)
  • stdout: parsed content (non null string)

Regexps are basic POSIX (BRE syntax). Reserved characters (to escape) are: . * [ ] $ ^ \.

Note: Remember that Bash can interpret some symbols in double-quoted strings. The following characters must be escaped: $ (dollar sign), " (double quote) and the backtick character. Also, ! (exclamation mark) must be escaped if Bash history expansion is enabled. Use single-quoted strings, it's easier!

Examples:

ID=$(echo "$HTML_PAGE" | parse 'name="freeaccountid"' 'value="\([[:digit:]]*\)"')
HOSTERS=$(echo "$FORM" | parse_all 'checked' '">\([^<]*\)<br')
MSG=$(echo "$RESPONSE" | parse_quiet "ERROR:" "ERROR:[[:space:]]*\(.*\)")

Example using $ (end-of-line) meta-character:

# Parse: [key]='7be8933035d221026ff2245be258c763'; 
# Notes:
# - Don't forget to escape `[` in the match regexp.
# - [:cntrl:] is used here to match `\r` because the answer comes from a Windows server.
# - `$` matches end of line.
HASH=$(echo "$PAGE" | 'Array\.downloads\[' "\]='\([[:xdigit:]]\+\)';[[:cntrl:]]$")

Always keep in mind that parsing is greedy, so within a line the last occurrence is taken. For example:

# Usual greedy behavior. Result: 789
echo 'value=123, value=456, value=789' | parse . '=\([^,]\+\)'
# Modify regex to get second value. Result: 456
echo 'value=123, value=456, value=789' | parse . '=\([^,]\+\),'
# Modify regex to get first value. Result: 123
echo 'value=123, value=456, value=789' | parse . '^value=\([^,]\+\)'


Should I use parse or parse_quiet?

Use xxx_quiet functions when parsing failure is a normal behavior, for example, parsing an optional value.

Typical use:

OPT_RESULT=$(echo "$HTML_PAGE" | parse_quiet 'id="completed"' '">\([^<]*\)<\/font>')

If you actually require a result, do not use the xxx_quiet variants. This way you'll get a sed error message if the parse fails, i.e. when your parse regexp did not capture anything.

Typical use:

WAIT_TIME=$(echo "$HTML_PAGE" | parse '^[[:space:]]*count=' "count=\([[:digit:]]\+\);") || return

Note: Don't use these functions for HTML parsing. Consider using the parse_tag and parse_attr function families (see below: Parsing HTML markers and Parsing HTML attributes).

Non zero offset

Use the offset whenever the filter regexp and the parse regexp are not on the same line. A positive value will skip ahead the specified number of lines, while a negative value will apply your parse regexp to a line before the one that matched your filter regexp. See the following examples:

<div class="dl_filename">
FooBar.tar.bz2</div>

We can get the right line by filtering on dl_filename and applying the filename regexp to the second line (the line after). This gives:

echo "$PAGE" | parse 'dl_filename' '\([^<]*\)' 1

Example 2:

function js_fff() {
    R4z5sjkNo = "http://...";
    DelayTime = 60;
...

Get URL with:

DL_LINK=$(echo "$PAGE" | parse 'js_fff' '"\([^"]\+\)";' 1) || return

Get counter value with:

COUNT=$(echo "$PAGE" | parse 'js_fff' '=[[:space:]]*\([[:digit:]]\+\)' 2) || return

Example 3 (negative offset):

<TD><input type="checkbox" name="file_id" value="123456"></TD>
<TD align=left><a href="http://...">FooBar.tar.bz2</a></TD>

To get the file ID that belongs to a known URL you can use:

FILE_ID=$(echo "$PAGE" | parse "$URL" '^\(.*\)$' -1 | parse_form_input_by_name 'file_id') || return

First retrieve the whole line that is directly before the one containing the known URL. Then parse the file ID with one of plowshare's form parsing functions (see below Parsing HTML forms).

Function: basename_url

Get the basename (scheme and host) of a URL.

Argument:

  • $1: string (URL)

Result:

  • $?: always 0
  • stdout: basename of the URL (if possible) or the input argument unchanged

A=$(basename_url 'http://code.google.com/p/plowshare/wiki/NewModules')
# result: http://code.google.com
B=$(basename_url 'http://code.google.com/')
# result: http://code.google.com
C=$(basename_url 'abc')
# result: abc

Supported protocols: http, https, ftp, ftps, file.

Function: match_remote_url

Check if URL is suitable for remote upload.

Argument:

  • $1: string (URL)
  • $2..$n (optional): additional URI scheme names to match

Result:

  • $?: 0 on success or $ERR_FATAL (not an accepted remote URL)

When called with a single argument, http and https are accepted.

URL='http://www.foo/bar'
if match_remote_url "$URL"; then
  ...
fi

If you want to accept more schemes, add them to the argument list.

URL='ftp://www.foo/bar'
if match_remote_url "$URL" 'ftp'; then
  ...
fi

Functions: grep_http_header_location, grep_http_header_location_quiet

Argument:

  • stdin: data (HTTP headers)

Result:

  • $?: 0 on success or $ERR_FATAL (non matching or empty string)
  • stdout: parsed header (non null string)

If you think you have reached the final download URL (let's call it $FINAL_URL), and when you curl it (with the -I/--head option) you get an HTTP answer like this:

HTTP/1.1 301 Moved Permanently
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: /download/123/5687/final_filename.xyz
Content-type: text/html
Content-Length: 0
Connection: close
Date: Sun, 17 Jan 2010 14:34:47 GMT
Server: Apache

Use grep_http_header_location to deal with this redirection. Have a look at sendspace module:

HOST=$(basename_url "$FINAL_URL")
FILE_PATH=$(curl -I "$FINAL_URL" | grep_http_header_location) || return
echo "${HOST}${FILE_PATH}"

Another example with an absolute URI (from euroshare.eu):

HTTP/1.1 302 Found
Date: Sat, 10 Mar 2012 11:14:31 GMT
Server: Apache/2.2.16 (Debian)
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Set-Cookie: sid=61bu6nt3kkh9nsk92mg7otg501; expires=Sun, 11-Mar-2012 11:14:31 GMT; path=/
Location: http://s1.euroshare.eu/download/3598184/aXa2YWy3ytUhu3uVUsAQEgUzUDUseje3/5344113/myfile.zip
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: x-requested-with
Access-Control-Allow-Headers: x-file-name
Access-Control-Allow-Headers: content-type
Vary: Accept-Encoding
Content-Type: text/html

FILE_URL=$(curl -I "$FINAL_URL" | grep_http_header_location) || return
echo "$FILE_URL"

Note: Like other *_quiet functions, grep_http_header_location_quiet is silent and always returns 0. Use it only in dedicated cases. For example:

FILE_URL=$(echo "$HTML_PAGE" | grep_http_header_location_quiet) || return
if [ -z "$FILE_URL" ]; then
    ... # not premium

Function: grep_http_header_content_disposition

Argument:

  • stdin: data (HTTP headers)

Result:

  • $?: 0 on success or $ERR_FATAL (non matching or empty string)
  • stdout: parsed filename (non null string)

Sharing websites often return their files as an attachment. curl doesn't care about the Content-Disposition: header: it will not parse it and keeps the URL as name reference (see the -O option documentation).

$ curl http://p123.share-site.com/download/dl.php?id=123456456
# saved filename will be: "dl.php?id=123456456"

The reason for that is that a link can have multiple attachments. Note: This is a difference between curl and wget.

Note: This is not true anymore. Since curl 7.20.0, the -J/--remote-header-name option has been added (you must combine it with -O/--remote-name). Plowshare does not use it for now.

Have a look at divshare module:

FILE_NAME=$(curl -I "$FILE_URL" | grep_http_header_content_disposition) || return

Before the plowdown core script makes the final HTTP GET request, the module does an HTTP HEAD request in order to parse the attachment header and get the filename.

$ curl -I http://p123.share-site.com/download/dl.php?id=123456456
HTTP/1.0 200 OK
Date: Sun, 28 Feb 2010 11:41:50 GMT
Server: Apache
Last-Modified: Mon, 12 Oct 2009 10:04:20 GMT
ETag: 9852859-16341905311255341860
Cache-Control: max-age=30
Content-Disposition: attachment; filename="kop_standard.pdf"
Accept-Ranges: bytes
Content-Length: 412848
Vary: User-Agent
Keep-Alive: timeout=300, max=100
Connection: keep-alive
Content-Type: application/octet-stream

Notice that some sharing sites do not allow HTTP HEAD requests (perhaps restricting the web server is seen as a security measure).

There is a possible workaround: the HTTP/1.1 protocol allows making an HTTP GET request that specifies a byte range.

FILE_NAME=$(curl -i -r 0-99 "$FILE_URL" | grep_http_header_content_disposition) || return

This is not very classy, but it can work, except if the sharing site only allows one (and only one) HTTP request to that final URL (uploaded.to for example). In that case you cannot get the attachment filename.

Other functions: grep_http_header_content_*

Retrieve a specific HTTP header.

Argument:

  • stdin: data (HTTP headers)

Result:

  • $?: 0 on success or $ERR_FATAL (non matching or empty string)
  • stdout: parsed content (non null string)

Name                                HTTP header
grep_http_header_content_length     Content-Length
grep_http_header_content_location   Content-Location
grep_http_header_content_type       Content-Type

$ curl --head http://share-site.net/wm8tbV6gZCp
HTTP/1.1 200 OK
Content-Disposition: attachment; filename="foobar"
Content-length: 5156
Content-Type: application/octet-stream
Date: Mon, 30 Sep 2013 06:29:14 GMT
ETag: "bc7f4762443939bd7dccb42370f0d932"
Last-Modified: Mon, 30 Sep 2013 06:28:44 GMT
Server: Apache
Vary: User-Agent
Connection: keep-alive

Functions: parse_cookie, parse_cookie_quiet

Arguments:

  • $1: entry name
  • stdin: data (netscape/mozilla cookie file format)

Result:

  • $?: 0 on success or $ERR_FATAL (non matching or empty string)
  • stdout: parsed content (non null string)

This is often used to get account settings. Sometimes, for premium accounts, the remote site adds an extra key in the cookie file, so it can be convenient for distinguishing a free account from a premium one.

LOGIN_ID=$(parse_cookie 'Login' < "$COOKIEFILE") || return
PASS_HASH=$(parse_cookie 'Password' < "$COOKIEFILE") || return
# At this point you are sure that $LOGIN_ID and $PASS_HASH are valid (non empty)

Note: Like other *_quiet functions, parse_cookie_quiet is silent and always returns 0. Use it only in dedicated cases. For example:

USERNAME=$(parse_cookie_quiet 'login' < "$COOKIEFILE")
if [ -z "$USERNAME" ]; then

    ... # invalid account

    return $ERR_LOGIN_FAILED
fi

Parsing HTML markers

Arguments:

  • $1 (optional): filtering regexp.
  • $2: tag name. This is case sensitive.
  • stdin: data (HTML, XML)

Result:

  • $?: 0 on success or $ERR_FATAL (non matching or empty marker)
  • stdout: parsed content (non null string)

Name                  Usage example
parse_tag             T=$(echo "$LINE" | parse_tag a)
parse_tag_quiet       Same as parse_tag but doesn't print on parsing error
parse_all_tag         n/a
parse_all_tag_quiet   Same as parse_all_tag but doesn't print on parsing error

The _all functions are for multiline content, one tag is parsed per line.

Important: If you have several matching tags on the same line, the first one is taken.

Remember that this is line-oriented: if the opening and closing tags are not on the same line, it won't work. It's not perfect, but for now it covers all our needs.

Examples:

LINE='... <a href="link1">Link number 1</a> <a href="javascript:;">Link number 2</a>'
LINK1=$(echo "$LINE" | parse_tag a) || return       # First link returned
LINE='... <b></b> ...'
CONTENT=$(echo "$LINE" | parse_tag b) || return     # Error: <b> content is empty
# Nested elements: take the deepest one!
WAIT_MSG='<span id="foo">Wait <span id="bar">30</span> seconds</span>'
WAIT_TIME=$(echo "$WAIT_MSG" | parse_tag span) || return  # 30

Note: parse_tag b is equivalent to parse_tag . b and parse_tag b b.

Parsing HTML attributes

Arguments:

  • $1 (optional): filtering regexp.
  • $2: attribute name
  • stdin: data (HTML, XML)

Result:

  • $?: 0 on success or $ERR_FATAL (non matching or empty attribute)
  • stdout: parsed content (non null string)

Name                   Usage example
parse_attr             LINK=$(echo "$IMG" | parse_attr href)
parse_attr_quiet       Same as parse_attr but doesn't print on parsing error
parse_all_attr         LINKS=$(echo "$PAGE" | parse_all_attr href)
parse_all_attr_quiet   Same as parse_all_attr but doesn't print on parsing error

The _all functions are for multiline content, one attribute is parsed per line.

Quoting is handled according to HTML5 standard:

<div class="foo">
<div class = "foo" >
<div class='foo'>
<div class=foo>
<div class = foo >

Note: In XHTML, all attribute values must be quoted using double quote marks.

Important: If you have several matching attributes on the same line, the last one is taken (parsing is greedy).

Examples:

IMG='<img href="http://foo.com/bar.jpg" alt="">'
CONTENT=$(echo "$IMG" | parse_attr img alt) || return   # Error: 'alt' content is empty
PAGE='<a href="http://...">click here to download</a>'
LINK=$(echo "$PAGE" | parse_attr 'download' 'href') || return
log_debug "[$LINK]"          # [http://...]
IMG='<img href="http://foo.com/bar.jpg" id = image_id>'
ID=$(echo "$IMG" | parse_attr 'id') || return

Note: parse_attr b is equivalent to parse_attr . b and parse_attr b b.

Some websites return the page as one single big line of HTML (without any EOL). As the parse_xxx functions are line-oriented, proper parsing can be difficult. Two functions exist to split single-line HTML: break_html_lines and break_html_lines_alt (more aggressive).
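
For example (a sketch, assuming the remote page comes back as one single line):

# Split single-line HTML into multiple lines before per-line parsing
PAGE=$(curl "$URL" | break_html_lines) || return
FILE_NAME=$(echo "$PAGE" | parse_tag title) || return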

Extract HTML forms: grep_form_by_*

There are 3 helper functions.

Arguments:

  • $1: input (X)HTML data
  • $2: 1-based index or string

Result:

  • $?: 0 on success or $ERR_FATAL (non matching or no such form)
  • stdout: parsed content (non null string)
Name                 Usage example
grep_form_by_order   FORM_HTML=$(grep_form_by_order "$PAGE" 2)
grep_form_by_name    FORM_HTML=$(grep_form_by_name "$PAGE" 'named_form')
grep_form_by_id      FORM_HTML=$(grep_form_by_id "$PAGE" 'id_form')

You are strongly encouraged to append regular || return error handling.

Note: grep_form_by_order can take a negative index (as argument $2). Get the last form of the page with -1. Giving 0 or a null string will default to 1.

Tip: On some websites, HTML data contains commented HTML or JS code. It can sometimes be useful to strip HTML comments. There is a function for this named strip_html_comments (input data on stdin, filtered data on stdout).

Parsing HTML forms: parse_form_*

As with other parse functions, the input is passed through stdin.

Name                       Usage example
parse_form_action          ACTION=$(echo "$FORM_HTML" | parse_form_action)
parse_form_input_by_id     VALUE=$(echo "$FORM_HTML" | parse_form_input_by_id 'foo')
parse_form_input_by_name   VALUE=$(echo "$FORM_HTML" | parse_form_input_by_name 'foo')
parse_form_input_by_type   VALUE=$(echo "$FORM_HTML" | parse_form_input_by_type 'hidden')

Example:

FORM_URL=$(grep_form_by_order "$HTML_PAGE" 1 | parse_form_action) || return
# We are sure here, that $HTML_PAGE has a form with an action attribute
# We can safely use $FORM_URL now

Note: parse_form_input_by_id_quiet, parse_form_input_by_name_quiet and parse_form_input_by_type_quiet are available.

Like other *_quiet functions, they print no error message and always return 0. You generally use them when you want to parse an HTML form field with a possibly empty value. For example:

FORM_SID=$(echo "$FORM_HTML" | parse_form_input_by_id_quiet 'sid')
# $FORM_SID can be empty for anonymous users and non empty
# (session id defined) for account users.

Captcha helper functions

The core.sh script provides several helper functions.

Function: captcha_process

Captchas are solved according to the --captchamethod command line option (in plowdown, plowup and plowdel). If not defined, the method is autodetected (look for an image viewer and prompt for the answer).

Arguments:

  • $1: local image file (any format) or URL (which doesn't require cookies)
  • $2: captcha type or hint
  • $3 (optional): minimum length
  • $4 (optional): maximum length

Current captcha types:

  • recaptcha (better use recaptcha_process() to get reload feature)
  • solvemedia (better use solvemedia_captcha_process() to get reload feature)
  • digits
  • letters

Results:

  • stdout (2 lines) : captcha answer (ascii text) / transaction id
  • $?: 0 for success, or $ERR_CAPTCHA, $ERR_FATAL, $ERR_NETWORK

Typical usage ($CAPTCHA_IMG is a valid image file):

local WI WORD ID
WI=$(captcha_process "$CAPTCHA_IMG" ocr_digit) || return
{ read WORD; read ID; } <<< "$WI"
rm -f "$CAPTCHA_IMG"

Note: If something goes wrong ($? is not 0), the image file given as argument is deleted.

Function: recaptcha_process

Argument:

  • $1: site key

Results:

  • stdout (3 lines) : captcha answer (ascii text) / recaptcha challenge / transaction id
  • $?: 0 for success, or $ERR_CAPTCHA, $ERR_FATAL, $ERR_NETWORK

Typical usage:

local PUBKEY WCI CHALLENGE WORD ID
PUBKEY='6Lftl70SAAABAItWJueKIVvyG5QfLgmAgtKgVbDT'
WCI=$(recaptcha_process $PUBKEY) || return
{ read WORD; read CHALLENGE; read ID; } <<< "$WCI"

Function: solvemedia_captcha_process

Argument:

  • $1: site key

Results:

  • stdout (2 lines) : verified challenge / transaction id
  • $?: 0 for success, or $ERR_CAPTCHA, $ERR_FATAL, $ERR_NETWORK
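
Typical usage, mirroring recaptcha_process (the site key below is a made-up placeholder):

local PUBKEY WCI CHALLENGE ID
PUBKEY='aBcDeFgHiJkLmNoP'
WCI=$(solvemedia_captcha_process $PUBKEY) || return
{ read CHALLENGE; read ID; } <<< "$WCI"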

Positive or negative acknowledgment

Each time you call captcha_process or recaptcha_process, you get a transaction id as a result. Once the captcha result has been submitted, the module function must positively or negatively acknowledge the captcha transaction (some captcha-solving services can refund credits on a wrong answer).

Validating the captcha answer is done through two functions: captcha_ack and captcha_nack.

Argument:

  • $1: transaction id

Typical usage:

if match ... wrong captcha ...; then
    captcha_nack $ID
    log_error "Wrong captcha"
    return $ERR_CAPTCHA
fi

captcha_ack $ID
log_debug "correct captcha"

Note: A module must not loop in case of a wrong captcha; just captcha_nack and return $ERR_CAPTCHA. The retry mechanism is handled at a higher level by the plowdown -r policy.

JSON parsing

JSON stands for JavaScript Object Notation. The official format standard is RFC 4627.

If you know nothing about JSON, try this:

curl http://twitter.com/users/bob.json | python -mjson.tool

Functions: parse_json, parse_json_quiet

Simple and limited JSON parsing. The sed command is used internally here. This is really a poor line-oriented parser (instead of being tree-oriented).

Arguments:

  • $1: variable name (string)
  • $2 (optional): preprocess option. Accepted values are: join and split.
  • stdin: input JSON data

Results:

  • $?: 0 on success or $ERR_FATAL (non matching or empty result)
  • stdout: parsed content (non null string)

Important notes:

  • Single-line parsing oriented (you should strip newlines first): no tree model
  • Array and Object types: basic poor support (depth 1, without complex types)
  • String type: no support for escaped unicode characters (\uXXXX), but two-character escape sequences are handled (for example: \t)
  • No handling of non-standard C/C++ comments (like in JSONP)
  • If several entries exist on the same line: the last occurrence is taken (like parse_attr), but consider the precedence (order of priority): number, boolean/empty, string.
  • If several entries exist on different lines: all are returned (it acts as a parse_all_json)

Simple usage:

FILE_URL=$(echo "$JSON" | parse_json 'downloadUrl') || return
JSON='{"name":"foo","attr":["size":123,"type":"f","url":"http:\/\/www.bar.org\/4c0476"]}'

# ARR='["size":123,"type":"f"]'
ARR=$(parse_json 'attr' <<< "$JSON")

# URL='http://www.bar.org/4c0476' 
# (as you can see, it does not care about hierarchy)
URL=$(parse_json 'url' <<< "$JSON")

Function match_json_true

Arguments:

  • $1: name (string)
  • $2: input data (json data)

Results:

  • $?: 0 for success; not null any error
  • stdout: nothing!

This literally matches the true boolean token; a "true" string token or any number is considered false.

# Assuming that a curl request can result one of two $JSON content:
# JSON='{"err":"Entered digits are incorrect."}'
# JSON='{"ok":true,"dllink":"http:\/\/www.share-me.com\/..."}'

if ! match_json_true 'ok' "$JSON"; then
    ERR=$(echo "$JSON" | parse_json_quiet err)
    test "$ERR" && log_error "Remote error: $ERR"
    return $ERR_FATAL
fi
log_debug "ok answer..."

Function javascript

Arguments:

  • stdin: input JavaScript code

Results:

  • $?: 0 on success or $ERR_FATAL (js error)
  • stdout: result

Example:

JS='print("Hello World!");'
RESULT=$(javascript <<< "$JS") || return
log_debug "result: '$RESULT'"

Modules using the javascript function need to add this line at the top of the module function (for example zippyshare_download):

detect_javascript || return

Important note: Don't use classes that are not in the JavaScript core engine. For example:

var strJson = '{"City":"Paris", "Country":"France"}';
var objJson = JSON.parse(strJson);
var dump = JSON.stringify(objJson, null, 2);
print(dump);

The rhino interpreter will know the JSON object, but spidermonkey will not:

ReferenceError: JSON is not defined

Module command-line switches

When entering a module function, dedicated module arguments will be processed according to these module variables:

  • MODULE_XXX_DOWNLOAD_OPTIONS
  • MODULE_XXX_UPLOAD_OPTIONS
  • MODULE_XXX_DELETE_OPTIONS
  • MODULE_XXX_LIST_OPTIONS
  • MODULE_XXX_PROBE_OPTIONS

Example 1

Assuming module source contains:

MODULE_XXX_DELETE_OPTIONS="
AUTH,a,auth,a=USER:PASSWORD,User account"

Assuming the user invokes plowdel with an account:

$ plowdel -a 'user:password' 'http://www.sharing-site.com/?delete=12D45G5'

xxx_delete will be called with this environment variable defined:

AUTH='user:password'

Common authentication options

AUTH,a,auth,a=USER:PASSWORD,Premium account
AUTH_FREE,b,auth-free,a=USER:PASSWORD,Free account

Most of the time, when a module can deal with both free and premium accounts, we will see a single option:

AUTH,a,auth,a=USER:PASSWORD,User account

For deletion, it's quite usual that authentication is mandatory, so you'll see:

AUTH,a,auth,a=USER:PASSWORD,User account (mandatory)
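
A typical pattern inside a module function then looks like this (a sketch; the form fields and login path are hypothetical):

if test "$AUTH"; then
    # Single quotes: $USER and $PASSWORD are substituted by post_login itself
    LOGIN_DATA='username=$USER&password=$PASSWORD'
    post_login "$AUTH" "$COOKIE_FILE" "$LOGIN_DATA" \
        "$BASE_URL/login.php" >/dev/null || return
fi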

Usual download options

LINK_PASSWORD,p,link-password,S=PASSWORD,Used in password-protected files
NOMD5,,nomd5,,Disable md5 authentication (use plain text)

Ask for password if not supplied:

log_debug "File is password protected"
if [ -z "$LINK_PASSWORD" ]; then
    LINK_PASSWORD=$(prompt_for_password) || return
fi

Usual upload options

LINK_PASSWORD,p,link-password,S=PASSWORD,Protect a link with a password
DESCRIPTION,d,description,S=DESCRIPTION,Set file description
TOEMAIL,,email-to,e=EMAIL,<To> field for notification email
FROMEMAIL,,email-from,e=EMAIL,<From> field for notification email
INCLUDE,,include,l=LIST,Provide list of host site (comma separated)
COUNT,,count,n=COUNT,Take COUNT hosters from the available list. Default is 5.
PRIVATE_FILE,,private,,Do not allow others to download the file
FOLDER,,folder,s=FOLDER,Folder to upload files into (account only)
ADMIN_CODE,,admin-code,s=ADMIN_CODE,Admin code (used for file deletion)

Argument types

Name   Description
a      Authentication string (user:password or user)
n      Positive integer (>0)
N      Positive integer or zero (>=0)
s      Non empty string
S      Any string
t      Non empty string, multiple command-line switches allowed
e      Email address string
l      Comma-separated list, strips leading & trailing spaces
f      Filename (with read access)

Reserved argument types (should not be used in modules):

Name   Description
c      Choice list (argument must match a string)
C      Same as c type, but an empty string is allowed
r      Speed rate. Allowed suffixes: Ki, K, k, Mi, M, m.
R      Disk size. Allowed suffixes: Mi, m, M, MB, Gi, G, GB.
F      Executable (searched in $PATH and $HOME/.config/plowshare/exec)
D      Directory (with write access)

Example 2

Assuming module source contains:

MODULE_XXX_UPLOAD_OPTIONS="
INCLUDE,,include,l=LIST,Provide list of host site (comma separated)"

Assuming the user invokes plowup this way:

$ plowup xxx --include 'first, second,thir d' myfile.foo

xxx_upload will be called with this environment variable defined:

# This is an array
INCLUDE=( 'first' 'second' 'thir d')

Guidelines

  • Consider module option variables (AUTH, LINK_PASSWORD, ...) as read-only; don't reassign them.
  • Because of command-line parsing, module options with the same name must have the same argument type; this is important. For example: if module1 has an option --useapi with type s (non empty string), module2 can't have an option --useapi with type n (positive integer).

Coding rules

Bash pitfall: quote a variable when it contains several lines

WAIT_TIME=$(echo $WAIT_HTML | parse 'foo' '.. \(...\) ..')

This won't give you the expected answer if $WAIT_HTML is multiline (which is the case most of the time). You should write instead:

WAIT_TIME=$(echo "$WAIT_HTML" | parse 'foo' '.. \(...\) ..')

Consider this example for understanding:

$ MYS=$(seq 3)
$ echo "$MYS"
1
2
3
$ echo $MYS
1 2 3
$ echo $MYS | xxd
0000000: 3120 3220 330a                           1 2 3.

More information about word splitting.

Bash pitfall: no local keyword with || return

Unfortunately, this is not correct:

local HTML_PAGE=$(curl "$URL") || return

If the curl function returns an error, it won't be caught by || return because of the local keyword.

local HTML_PAGE
...
HTML_PAGE=$(curl "$URL") || return

is correct.

Bash pitfall: avoid single statement "&& ||" test

$ set -- test
$ [ -z "$1" ] && echo empty || echo nonempty
nonempty
$ set --
$ [ -z "$1" ] && echo empty || echo nonempty
empty
$ set -- test
$ [ -z "$1" ] || echo nonempty && echo empty
nonempty
empty
$ set --
$ [ -z "$1" ] || echo nonempty && echo empty
empty

Looks like "&& ||" is better than "|| &&". But imagine that echo empty does not return $?=0:

$ set --
$ [ -z "$1" ] && { echo empty; false; } || echo nonempty
empty
nonempty

Finally, the classic if/then/else/fi is not so bad!

if [ -z "$1" ]; then
    echo empty
else
    echo nonempty
fi

See also shellcheck.net note.

Bash pitfall: conditional test at the end of a function

Don't put an '&&' test as the last statement of a function. For example:

myhoster_upload() {
  ...

  echo "$DL_URL"
  [ -n "$PUBLIC_FILE" ] && echo "$DEL_URL"
}

If $PUBLIC_FILE is not empty, myhoster_upload() will return $?=0. This is good. But if $PUBLIC_FILE is empty, the echo is not performed (as wished) and myhoster_upload() will return $?=1. plowup will assume this is an $ERR_FATAL module return. This is not what we want! We only wanted to display the download link and omit the delete link (because it's not available).

So prefer this:

myhoster_upload() {
  ...

  echo "$DL_URL"
  [ -z "$PUBLIC_FILE" ] || echo "$DEL_URL"
}

A paranoid version:

myhoster_upload() {
  ...

  echo "$DL_URL"
  [ -z "$PUBLIC_FILE" ] || echo "$DEL_URL"
  return 0
}

Portability

Plowshare runs on lots of unix/linux systems. There are always several ways to write bash code. We try to keep compatibility with the busybox shell.

Things to take care of or avoid in your module functions:

  • no awk invocation
  • no xargs invocation
  • no grep -v (invert match) invocation
  • no wc invocation (wc -c can be easily replaced with bash internal string manipulation)
  • no infinite loops like while true; or while :;.
  • no tr -d, try using bash internal replacement. For example: ${MYSTRING//$'\n'} or replace_all for multiline content.

Bash specific construct to avoid:

  • no bash regexp: [[ =~ ]] (requires bash >= 3.0). Not using it is a historical choice. Behavior has changed across versions (see E14 in the bash FAQ).
  • no += string (or array) concatenation operator (requires bash >= 3.1)
  • no for loop sequence expansion: for i in {1..10} ; do ... ; done (requires bash >= 3.0). You can use seq instead.
  • no printf -v (requires bash >= 3.1).

BSD specific pitfalls:

  • base64 -d is a GNU coreutils only short option. Use base64 --decode instead.
  • BSD sed has fewer features than GNU sed (it can't use \? or \r, for example). Try to use parse_* functions instead.
  • stat -c is only available on GNU. Use ls instead.
  • readlink -f is only available on GNU.

Busybox specific pitfalls:

  • grep -o and grep -w (word-regexp) are not supported by old versions of busybox. Do not use them.
  • sleep with s/m suffixes or even fractional arguments (example: sleep 1m). BusyBox may not be compiled with the CONFIG_FEATURE_FANCY_SLEEP option.
  • tr with classes (such as [:upper:]). BusyBox may not be compiled with the CONFIG_FEATURE_TR_CLASSES option.
  • sed does not support \xNN escape sequences. Tested on BusyBox 1.13, 1.18 and 1.19.3.
  • sed does not support the \r escape sequence before version 1.19. Don't use it, find another way!
  • sed does support \s, \S, \w, \W (these are GNU extensions). But prefer using the equivalents: [[:space:]], [^[:space:]], [[:alnum:]_], [^[:alnum:]_].

Try to stay compliant with bash 3.x.

Miscellaneous remarks

  • Do not create temporary files unless necessary, and don't forget to delete them if you used one.
  • curl calls should not be invoked with the --silent option. The curl wrapper function takes care of the verbosity level.

Why is this so complicated and constrained?

It's because we want to be as portable as possible. We lose flexibility, but plowshare can run on slow, old embedded hardware; this was the original starting point of the project. But maybe a plowshare with bash 4.0 as minimum requirement will pop up one day...

Coding style

General Rules

  • GPL-compatible license.
  • No tabs, use 4 spaces. Also use 4 spaces to indent split (\) continuation lines.
  • Line lengths should stay within 80 columns.
  • Comments (# style, as in Ruby) are written in English. No extra empty line before function declarations. No boxes or ASCII-art stuff.
  • Always declare (with the local keyword) the variables you are using.

Naming Rules

  • Uppercase variables: this is a historical choice, let's keep traditions. We suggest using underscores in them. For example: MARY_POPPINS (instead of MARYPOPPINS). This is optional but recommended (especially for names with more than 7 characters). For example, APIURL, DESTFILE and FILEURL are accepted. COOKIEFILE is accepted too (but COOKIE_FILE is preferred).
  • Use appropriate names to ease maintainability. For example: FILE_URL (instead of MARY_POPPINS). Don't use overly long variable names: for example, UPLOADED_FILE_JSON_DATA is too descriptive, JSON_DATA or JSON is enough.
  • For form parsing, usual names are: FILE_ID, FILE_NAME, FILE_URL, BASE_URL, FORM_HTML, FORM_URL (action parameter), FORM_xxx (input field name in uppercase), ADMIN_URL, DELETE_ID, WAIT_TIME.
  • Usual names for curl results are HTML, PAGE, RESPONSE, JSON, STATUS.

Remark: The choice of uppercase is historical and you may disagree with this approach. The usual convention is lowercase for internal or temporary variables and uppercase for environment or global variables; that convention avoids accidentally overriding environment variables.

Strongly recommended guidelines

  1. if/then constructs and while/do are on the same line.

  2. Restrict usage of curly braces:

test "$FILE_URL" || { log_error "location not found"; return $ERR_FATAL; }

should be written:

if ! test "$FILE_URL"; then
    log_error "location not found"
    return $ERR_FATAL
fi

  3a. In comments, insert a space character after the # symbol:

#get id of file                  (wrong)
# Get id of file                 (right)

  3b. Avoid meaningless comments:

# wait 15 seconds
wait 15 seconds || return

  4. Proper indentation on continued lines:

HTML=$(curl -b "$COOKIE_FILE" 'http://www.foo.bar/long...url...') \
|| return

should be written:

HTML=$(curl -b "$COOKIE_FILE" \
    'http://www.foo.bar/long...url...') || return

  5. Single-quote strings as much as possible (if there is no variable referencing, of course!):

local BASE_URL="http://shareme.com"        # wrong
local BASE_URL='http://shareme.com'        # right

  6. Don't quote unless required:

return "$ERR_LINK_TEMP_UNAVAILABLE"         # wrong
return $ERR_LINK_TEMP_UNAVAILABLE           # right

Testing

Test and retest your module. A little checklist of possible cases:

  • File not found
  • File temporarily unavailable
  • File unavailable (server busy), come back in X minutes
  • Download (quota) limit reached
  • Your IP address is already downloading a file
  • Password protected link
  • Premium link download only
  • etc.

Other concerns:

  • Check geolocation-aware sites; the URL TLD can differ depending on your location.
  • Don't send incomplete scripts or nearly-working stuff.
  • Don't use illegal or patented content; if you want to run some tests, use the material here.
