Modules
Plowshare is designed with modularity in mind, so it should be easy for other programmers to add new modules. Study the code of any of the existing modules (e.g. 2shared) and create your own.
Some hosters export a public API (a formalized way of downloading or uploading). If one is available, calling this API can save you a lot of time compared to simulating a web browser. For example: HotFile.
Table of contents:
- Script template
- Downloading function
- Uploading function
- Deleting function
- Listing function
- Probing function
- Output debug messages (stderr)
- curl API
- Auxiliary APIs
- Module command-line switches
- Coding rules
- Coding style
- Testing
- External documentation
Script template
Each module implements services for one sharing site:
- anonymous download
- free/premium account download
- anonymous upload (if allowed by the host)
- free/premium account upload
- free/premium account remote upload (if available from the host)
- delete or kill URL (anonymous or not)
- shared folder (and subfolder) listing (if available from the host)
The module must declare the following global variables:
MODULE_XXX_REGEXP_URL
Depending on module features, some additional variables should also be declared:
MODULE_XXX_DOWNLOAD_OPTIONS
MODULE_XXX_DOWNLOAD_RESUME
MODULE_XXX_DOWNLOAD_FINAL_LINK_NEEDS_COOKIE
MODULE_XXX_DOWNLOAD_SUCCESSIVE_INTERVAL
# Rare use, give additional curl options
MODULE_XXX_DOWNLOAD_FINAL_LINK_NEEDS_EXTRA=()
MODULE_XXX_UPLOAD_OPTIONS
MODULE_XXX_UPLOAD_REMOTE_SUPPORT
MODULE_XXX_DELETE_OPTIONS
MODULE_XXX_LIST_OPTIONS
MODULE_XXX_LIST_HAS_SUBFOLDERS
MODULE_XXX_PROBE_OPTIONS
Where XXX is the name of the module (uppercase). No other global variable declaration is allowed.
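For illustration, here is what the top of a hypothetical module src/modules/myhoster.sh could declare (all names and values below are placeholders, not an existing module):
# Hypothetical module: src/modules/myhoster.sh
MODULE_MYHOSTER_REGEXP_URL='http://\(www\.\)\?myhoster\.example/'

MODULE_MYHOSTER_DOWNLOAD_OPTIONS=""
MODULE_MYHOSTER_DOWNLOAD_RESUME=no
MODULE_MYHOSTER_DOWNLOAD_FINAL_LINK_NEEDS_COOKIE=no
MODULE_MYHOSTER_DOWNLOAD_SUCCESSIVE_INTERVAL=

MODULE_MYHOSTER_PROBE_OPTIONS=""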
A module must export one to five entry points:
xxx_download()
xxx_upload()
xxx_delete()
xxx_list()
xxx_probe()
Downloading function
Prototype is:
xxx_download() {
local -r COOKIE_FILE=$1
local -r URL=$2
...
}
Notes:
- xxx is the name of the plugin: src/modules/xxx.sh. xxx must not contain dots; use underscores instead.
- Never call the curl_with_log function here, use curl.
Arguments:
- $1: cookie file (empty content at start, use it with curl)
- $2: URL string (for example http://x7.to/fwupja)
Warning: If your function does not need a cookie file, do not delete the cookie file provided as argument; plowdown will take care of this.
When a link is correct, the function should return 0 and echo one or two lines, corresponding to the file URL and filename:
echo "$FILE_URL"
echo "$FILENAME"
$FILENAME can be empty, or even not echoed at all. If so, plowdown will guess the filename from the provided $FILE_URL.
If a cookie file is required for the final download, MODULE_XXX_DOWNLOAD_FINAL_LINK_NEEDS_COOKIE must be set to yes.
The file URL must be the final link (that is, a link that returns a 200 HTTP code, without redirection). Use curl -I and grep_http_header_location when necessary.
Note: $FILE_URL will be encoded right afterwards, so don't worry about weird characters. For example: space characters will be translated to %20 for you.
Module can return the following codes:
- 0: Everything is OK (arguments have to be echoed, see above).
- $ERR_FATAL: Unexpected result (upstream site updated, etc).
- $ERR_LOGIN_FAILED: Correct login/password argument is required.
- $ERR_LINK_TEMP_UNAVAILABLE: Link alive but temporarily unavailable.
- $ERR_LINK_PASSWORD_REQUIRED: Link alive but requires a password (password-protected link).
- $ERR_LINK_NEED_PERMISSIONS: Link alive but requires some authentication (private or premium link).
- $ERR_LINK_DEAD: Link is dead (we must be sure of that). Each download function should return this value in at least one place.
- $ERR_SIZE_LIMIT_EXCEEDED: Can't download link because the file is too big (need permissions, probably need to be premium).
- $ERR_EXPIRED_SESSION: When cache is used. See storage_get, storage_set and storage_reset.
Additional error codes (returned by plowdown only; a module download function should not return these):
- $ERR_NOMODULE: No module available for the provided link. Hoster is not supported yet!
- $ERR_NETWORK: Specific network error (socket reset, curl, etc).
- $ERR_SYSTEM: System failure (missing executable, local filesystem, wrong behavior, etc).
- $ERR_CAPTCHA: Captcha solving failure.
- $ERR_MAX_WAIT_REACHED: Countdown timeout (see -t/--timeout command line option).
- $ERR_MAX_TRIES_REACHED: Max tries reached (see -r/--max-retries command line option).
- $ERR_BAD_COMMAND_LINE: Unknown command line parameter or incompatible options.
- If the hoster asks to try again later (and you don't know how long to wait): the download function must return $ERR_LINK_TEMP_UNAVAILABLE.
- If the hoster asks to try again later (and you do know how long to wait): the download function must echo the wait time (in seconds) and return $ERR_LINK_TEMP_UNAVAILABLE. See the snippet after this list.
- Respect wait times even if the download seems to work without them. Don't hammer the website!
- Try to force English language in the website (usually using a cookie) if you are going to parse human messages (it's better to parse HTML nodes, though).
- If you provide premium download, a bad login must lead to an error ($ERR_LOGIN_FAILED). No fallback to anonymous download must be made (even if the remote website accepts it).
- The MODULE_XXX_DOWNLOAD_SUCCESSIVE_INTERVAL global variable contains a delay value (in seconds) used when two successive downloads (links of the same hoster) are performed. Some hosters may behave nastily (force the user to wait, declare the link as dead, or sometimes worse) when a bunch of links is downloaded in succession.
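For the second case above, a page announcing a delay could be handled like this (the message and parse expression are hypothetical placeholders):
# Hoster message: "please try again in NN seconds"
WAIT_TIME=$(echo "$PAGE" | parse 'try again' 'in \([[:digit:]]\+\) seconds') || return
echo $WAIT_TIME
return $ERR_LINK_TEMP_UNAVAILABLE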
Uploading function
Prototype is:
xxx_upload() {
local -r COOKIE_FILE=$1
local -r FILE=$2
local -r DESTFILE=$3
...
PAGE=$(curl_with_log ...) || return
...
}
Notes:
- xxx is the name of the plugin: src/modules/xxx.sh. xxx must not contain dots; use underscores instead.
- Use the curl_with_log function only once, for the file upload itself (it's quite convenient to see progress); otherwise simply use curl.
Arguments:
- $1: cookie file (empty content at start, use it with curl)
- $2: local filename (with full path) to upload or (remote) URL
- $3: remote filename (no path)
Warning: If your function does not need a cookie file, do not delete the cookie file provided as argument; plowup will take care of this.
When the requested file has been successfully uploaded, the function should return 0 and echo one to three lines:
echo "$DL_URL"
echo "$DEL_URL"
echo "$ADMIN_URL_OR_CODE"
$DEL_URL and $ADMIN_URL_OR_CODE are optional (can be empty or not echoed at all).
Example 1 (seen in the depositfiles module):
echo "$DL_LINK"
echo "$DEL_LINK"
Example 2 (seen in the 2shared module):
echo "$FILE_URL"
echo
echo "$FILE_ADMIN"
Module can return the following codes:
- 0: Success. File successfully uploaded.
- $ERR_FATAL: Unexpected result (upstream site updated, etc).
- $ERR_LINK_NEED_PERMISSIONS: Authentication required (for example: anonymous users can't do remote upload).
- $ERR_LINK_TEMP_UNAVAILABLE: Upload service seems temporarily unavailable upstream. Note: this status does not affect the retry count (see -r/--max-retries command line option) but does affect the timeout if specified (see -t/--timeout command line option).
- $ERR_SIZE_LIMIT_EXCEEDED: Can't upload a file this big (need permissions, probably need to be premium).
- $ERR_LOGIN_FAILED: Correct login/password argument is required.
- $ERR_ASYNC_REQUEST: Asynchronous remote upload started.
- $ERR_EXPIRED_SESSION: When cache is used. See storage_get, storage_set and storage_reset.
Additional error codes (returned by plowup only; a module upload function should not return these):
- $ERR_NOMODULE: Specified module does not exist or is not supported.
- $ERR_NETWORK: Specific network error (socket reset, curl, etc).
- $ERR_SYSTEM: System failure (missing executable, local filesystem, wrong behavior, etc).
- $ERR_MAX_WAIT_REACHED: Countdown timeout (see -t/--timeout command line option).
- $ERR_MAX_TRIES_REACHED: Max tries reached (see -r/--max-retries command line option).
- $ERR_BAD_COMMAND_LINE: Unknown command line parameter or incompatible options.
- Remember that $2 can also be a remote file. It should be checked with match_remote_url (see the snippet after this list). Most of the time, the remote upload feature is only available for premium users. If the module does not support it, put this at the top of the file: MODULE_xxx_UPLOAD_REMOTE_SUPPORT=no.
- Upload file size is usually limited (and can be quite low for anonymous upload). Dealing with it is nice for the user! For example:
MAX_SIZE=... # hardcoded value or parsed from an HTML page (if possible)
SIZE=$(get_filesize "$FILE")
if [ "$SIZE" -gt "$MAX_SIZE" ]; then
    log_debug "file is bigger than $MAX_SIZE"
    return $ERR_SIZE_LIMIT_EXCEEDED
fi
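As mentioned in the first note above, a typical guard at the top of an upload function could look like this sketch (assuming, hypothetically, that this hoster requires an account for remote upload):
if match_remote_url "$FILE"; then
    # Hypothetical policy: remote upload needs an account
    if [ -z "$AUTH" ]; then
        log_error 'Remote upload requires an account'
        return $ERR_LINK_NEED_PERMISSIONS
    fi
fi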
Deleting function
Prototype is:
xxx_delete() {
local -r COOKIE_FILE=$1
local -r URL=$2
...
}
Notes:
- xxx is the name of the plugin: src/modules/xxx.sh. xxx must not contain dots; use underscores instead.
- Never call the curl_with_log function here, use curl.
Arguments:
- $1: cookie file (empty content at start, use it with curl)
- $2: kill/admin URL string
Warning: If your function does not need a cookie file, do not delete the cookie file provided as argument; plowdel will take care of this.
There is no output for this function. When the file has been successfully deleted, the function should return 0.
Module can return the following codes:
- 0: Success. File successfully deleted.
- $ERR_FATAL: Unexpected result (upstream site updated, etc).
- $ERR_LOGIN_FAILED: Authentication failed (bad login/password).
- $ERR_LINK_NEED_PERMISSIONS: Authentication required (anonymous users can't delete files).
- $ERR_LINK_PASSWORD_REQUIRED: Link requires an admin or removal code.
- $ERR_LINK_DEAD: Link is dead. File has been previously deleted.
Additional error codes (returned by plowdel only; a module delete function should not return these):
- $ERR_NOMODULE: No module available for the provided link.
- $ERR_NETWORK: Specific network error (socket reset, curl, etc).
- $ERR_BAD_COMMAND_LINE: Unknown command line parameter or incompatible options.
- On successful operation (return 0), don't print a message; plowdel will log_notice for you.
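A minimal delete function could look like this sketch; the hoster name and both page markers are hypothetical placeholders:
myhoster_delete() {
    local -r COOKIE_FILE=$1
    local -r URL=$2
    local PAGE

    PAGE=$(curl -c "$COOKIE_FILE" "$URL") || return

    # Hypothetical page markers
    match 'File not found' "$PAGE" && return $ERR_LINK_DEAD
    if match 'successfully deleted' "$PAGE"; then
        return 0
    fi

    return $ERR_FATAL
}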
Listing function
Prototype is:
xxx_list() {
local -r URL=$1
local -r RECURSE=${2:-0}
...
}
Notes:
- xxx is the name of the plugin: src/modules/xxx.sh. xxx must not contain dots; use underscores instead.
- Never call the curl_with_log function here, use curl.
Arguments:
- $1: list URL (aka root folder URL)
- $2: list links and recurse subfolders (if any). If $2 is an empty string, the option has not been selected.
As a result, the function should return 0 and echo a list of line pairs:
echo "$FILE_URL"
echo "$FILENAME"
$FILENAME can be empty, but the echo must be done. You usually have more than one link in a folder, so it can be cumbersome to echo pairs of lines in a while loop. To simplify the process, you should use the list_submit() API.
Example (seen in the depositfiles module):
PAGE=$(curl "$URL") || return
LINKS=$(echo "$PAGE" | parse_all_attr_quiet 'class="dl" align="center' href)
NAMES=$(echo "$PAGE" | parse_all_attr_quiet 'class="dl" align="center' title)
list_submit "$LINKS" "$NAMES" || return
list_submit() can also accept an optional third argument: a link prefix (string) to prepend to each file link. This is useful when the parsed links are relative.
Example (seen in the mediafire module):
...
NAMES=$(echo "$DATA" | parse_all_tag filename)
LINKS=$(echo "$DATA" | parse_all_tag quickkey)
list_submit "$LINKS" "$NAMES" 'http://www.mediafire.com/?' || return
list_submit() can even accept an optional fourth argument: a link suffix (string) to append to each file link.
Example (seen in the turbobit module):
...
NAMES=$(parse_all ...
LINKS=$(parse_json 'id' 'split' <<< "$JSON")
list_submit "$LINKS" "$NAMES" 'http://turbobit.net/' '.html' || return
Module can return the following codes:
- 0: Success. Folder contains one or several files.
- $ERR_FATAL: Unexpected content (not a folder, parsing error, etc).
- $ERR_LINK_TEMP_UNAVAILABLE: Links are temporarily unavailable (can't be listed at the moment). This is used by mirroring/multi-upload services (uploads are still being processed).
- $ERR_LINK_PASSWORD_REQUIRED: Folder is password protected.
- $ERR_LINK_DEAD: Folder has been deleted, does not exist, or is empty.
Additional error codes (returned by plowlist only; a module list function should not return these):
- $ERR_NOMODULE: No module available for the provided link.
- $ERR_NETWORK: Specific network error (socket reset, curl, etc).
- $ERR_BAD_COMMAND_LINE: Unknown command line parameter or incompatible options.
- If the hoster supports subfolders, declare at the top of the module source: MODULE_xxx_LIST_HAS_SUBFOLDERS=yes.
- If the hoster doesn't have subfolder capability (this includes mirroring/multi-upload services), declare at the top of the module source: MODULE_xxx_LIST_HAS_SUBFOLDERS=no.
- You should notify with a log_error message if the module (so, on plowshare's side) doesn't implement the recursive subfolders option. For example, in the zalaa module:
test "$2" && log_error 'Recursive flag not implemented, ignoring'
- When recursing subfolders, don't echo folder URLs (but you can log_debug them).
- When the recurse subfolders option is enabled: $ERR_LINK_DEAD means that there is no file in any folder.
- When the recurse subfolders option is disabled: $ERR_LINK_DEAD means that there is no file in the root folder. There might be files in subfolders.
Probing function
Prototype is:
xxx_probe() {
local -r COOKIE_FILE=$1
local -r URL=$2
local -r REQ_IN=$3
local REQ_OUT
...
}
Notes:
- xxx is the name of the plugin: src/modules/xxx.sh. xxx must not contain dots; use underscores instead.
- Never call the curl_with_log function here, use curl.
Arguments:
- $1: cookie file (empty content at start, use it with curl)
- $2: download URL to check
- $3: capability list. One character is one feature.
Warning: If your function does not need a cookie file, do not delete the cookie file provided as argument; plowprobe will take care of this.
Capability characters are:
- c: link is alive (usually 0 for OK or $ERR_LINK_DEAD for KO, see below for details)
- f: file name
- i: file id (usually included in the URL)
- s: file size (in bytes, no prefix/suffix). Use the translate_size helper function for converting if necessary.
- h: file hash (md5, sha1, ... hexstring format). If several algorithms are available, always use the longest digest (for example: sha1 is preferred to md5).
- t: file timestamp (unspecified time format)
- v: refactored file URL (can be different from the input URL, for example a short hostname or an https redirection)
Of course, depending on the hoster, it is not always possible to get access to all of this information.
When a link is correct, the function should return 0 and echo the check-link character:
echo 'c'
return 0
If you can parse the filename, you can return this way:
echo "$FILE_NAME"
echo 'cf'
return 0
Even better, if you can parse the filename and file size, you can return this way:
echo "$FILE_NAME"
echo "$FILE_SIZE"
echo 'cfs'
return 0
OR
echo "$FILE_SIZE"
echo "$FILE_NAME"
echo "csf"
return 0
The order is given by the last echoed line (a variable usually called REQ_OUT).
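A typical probe builds REQ_OUT incrementally, echoing data only for requested capabilities. The sketch below is hypothetical (hoster name, page markers and parse expressions are placeholders), but the REQ_IN/REQ_OUT pattern follows the convention described above:
myhoster_probe() {
    local -r COOKIE_FILE=$1
    local -r URL=$2
    local -r REQ_IN=$3
    local PAGE REQ_OUT

    PAGE=$(curl "$URL") || return

    # Hypothetical dead-link marker
    match 'File not found' "$PAGE" && return $ERR_LINK_DEAD

    REQ_OUT=c

    if [[ $REQ_IN = *f* ]]; then
        # Hypothetical parse expression; echoed value feeds plowprobe
        echo "$PAGE" | parse_tag 'filename' 'h1' && REQ_OUT="${REQ_OUT}f"
    fi

    if [[ $REQ_IN = *s* ]]; then
        echo "$PAGE" | parse 'Size' '(\([[:digit:]]\+\) bytes)' && \
            REQ_OUT="${REQ_OUT}s"
    fi

    echo $REQ_OUT
}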
Module can return the following codes:
- 0: Success. Link is alive (arguments have to be echoed, see above).
- $ERR_FATAL: Unexpected content (upstream updated, parsing error, etc).
- $ERR_LINK_DEAD: Link is dead; no more information can be returned.
Additional error codes (returned by plowprobe only; a module probe function should not return these):
- $ERR_NOMODULE: No module available for the provided link.
- $ERR_NETWORK: Specific network error (socket reset, curl, etc).
- $ERR_BAD_COMMAND_LINE: Unknown command line parameter or incompatible options.
Some hosters are able to return more than one hash (for example: md5 and sha1). In that case, %h must return the strongest algorithm. A module option can be added to change the %h behaviour (like --md5).
- The probe function should be fast and efficient. A single curl request is advised.
- Using javascript is strongly discouraged.
Output debug messages (stderr)
Do not use echo, which is reserved for function return value(s). Use log_debug() or log_error(). You can use the -v N command line switch to change debug verbosity.
Note: An intermediate verbosity level exists: log_notice(). It is reserved for core functions; do not use it inside modules.
curl API
This is probably the most important command in the plowshare API set. This wrapper function calls the real curl binary (let's call it true-curl).
Arguments:
- $1...$n: true-curl command-line arguments
- $?: 0 for success, or $ERR_NETWORK, $ERR_SYSTEM
Note: curl_with_log calls curl but forces the verbose level to 3. It is for specific use in a module upload function (it should be called only once).
It's a good habit to always append || return for error handling.
Examples:
PAGE1=$(curl "http://www.google.com") || return
# Get remote content and save cookies (if any)
PAGE2=$(curl -c "$COOKIE_FILE" "$URL") || return
# Get remote content, provide a cookie and append received cookie entries
PAGE3=$(curl -c "$COOKIE_FILE" -b 'lang=en' "$URL") || return
PAGE4=$(curl -c "$COOKIE_FILE" -b "$COOKIE_FILE" "$URL") || return
PAGE5=$(curl "${URL}?param=1") || return
# or
PAGE5=$(curl --get --data 'param=1' "$URL") || return
Notes:
- curl will add a valid User-Agent for you.
- curl exit codes are mapped to plowshare error codes. Human-readable debug messages have been added too.
- curl implicitly maps plowdown (or plowup) command-line switches (--interface, --max-rate, ...).
true-curl can handle one --cookie-jar/-c option and one --cookie/-b option:
PAGE=$(curl -c "$COOKIE_FILE_1" -b "$COOKIE_FILE_2" http://...) || return
$COOKIE_FILE_2: entries will be read from the file and set in the HTTP request header:
Cookie: key=value...
$COOKIE_FILE_1: entries returned by the HTTP server will be written to the file:
Set-Cookie: key=value...
$COOKIE_FILE_1 and $COOKIE_FILE_2 can be the same filename.
true-curl does not handle multiple --cookie/-b switches: you can only have one string (key=value) and one file argument. These are source entries (read only) given to the HTTP protocol (Cookie: header).
Example 1 (only the last -b switch will be used):
curl -b "$COOKIE_FILE_1" -b "$COOKIE_FILE_2" http://...
# $COOKIE_FILE_1 will be ignored
Example 2 (only the last -b switch will be used):
curl -b 'lang=english' -b 'user=foo' http://...
# the 'lang' cookie entry will be ignored
Example 3:
curl -b 'lang=english' -b "$COOKIE_FILE" http://...
# correct example
curl -b "$COOKIE_FILE" -b 'lang=english' -b 'user=foo' http://...
# the 'lang' cookie entry will be ignored
First example, using -D/--dump-header:
HEADERS=$(create_tempfile) || return
HTML=$(curl -D "$HEADERS" http://...) || return
rm -f "$HEADERS"
If something goes wrong in curl (network issue or anything else), $HEADERS will be deleted for you. Remember, this happens only if an error occurs; on curl's success nothing is deleted (as expected).
Another classic example, using -o/--output:
CAPTCHA_URL='http://...'
CAPTCHA_IMG=$(create_tempfile '.png') || return
curl -o "$CAPTCHA_IMG" "$CAPTCHA_URL" || return
...
rm -f "$CAPTCHA_IMG"
If something goes wrong when retrieving the captcha image, curl will delete the temporary file for you.
Here is a first case, with a POST request and content type application/x-www-form-urlencoded:
DATA="action=validate&uid=123456&recaptcha_challenge_field=$CHALLENGE&recaptcha_response_field=$WORD"
RESULT=$(curl -b "$COOKIE_FILE" --data "$DATA" "$URL") || return
Consider passing several -d/--data arguments instead of one (order is not important):
RESULT=$(curl -b "$COOKIE_FILE" -d 'action=validate' \
-d "uid=123456" \
-d "recaptcha_challenge_field=$CHALLENGE" \
-d "recaptcha_response_field=$WORD" \
"$URL") || return
It is better for maintenance.
Second example with a GET request:
URL='http://ab19.hostmyfile.net/upload'
RESULT=$(curl "$URL?X-Progress-ID=12345&premium=1") || return
Can be written in a better way:
URL='http://ab19.hostmyfile.net/upload'
RESULT=$(curl --get -d 'X-Progress-ID=12345' -d 'premium=1' "$URL") || return
You can see a full list of the plowshare public API here.
Auxiliary APIs
The core.sh script provides the usual auxiliary functions.
Do not use | But use
---|---
basename | basename_file
grep -o "^http://[^/]*" | basename_url
sleep | wait (must always be ORed with the return keyword)
grep, grep -i, grep -q | match and matchi
sed, awk, perl | parse_* or replace_all, replace
head -n1, tail -n1 | first_line, last_line
mktemp, tempfile | create_tempfile
tr '[A-Z]' '[a-z]' | lowercase
tr '[a-z]' '[A-Z]' | uppercase
sed ... | strip (delete leading and trailing spaces, tabs), delete_last_line
js | detect_javascript and javascript
stat -c %s | get_filesize
$RANDOM or $$ | random
md5sum | md5 or md5_file
wget | curl
The goal here is to avoid calling non-portable commands in modules.
first_line
Arguments:
- $1 (optional): how many head lines to take (default is 1). This must be a strictly positive integer.
- stdin: input data (multiline text)
Results:
- $?: 0 on success or $ERR_FATAL (bad argument)
- stdout: result
Examples:
$ echo "$BUFFER1"
line a
line b
line c
line d
$ echo "$BUFFER1" | first_line
line a
$ echo "$BUFFER1" | first_line 3
line a
line b
line c
delete_first_line
Arguments:
- $1 (optional): how many head lines to delete (default is 1). This must be a strictly positive integer.
- stdin: input data (multiline text)
Results:
- $?: 0 on success or $ERR_FATAL (bad argument)
- stdout: result
Examples:
$ echo "$BUFFER1"
line a
line b
line c
line d
$ echo "$BUFFER1" | delete_first_line
line b
line c
line d
$ echo "$BUFFER1" | delete_first_line 2
line c
line d
post_login
This is a useful function for registered accounts because ID information is stored inside the cookie. This function will send the HTML form for you. It takes 4 or 5 arguments.
Arguments:
- $1: authentication string 'username:password' (password can contain semicolons)
- $2: cookie file (existing file)
- $3: string to post (can contain the keywords $USER and $PASSWORD)
- $4: URL
- $5..$n (optional): additional curl arguments
- stdin: input data (text)
Example:
# comes from command line
AUTH="mylogin:mypassword"
# important: note the single quotes; $USER and $PASSWORD must not be interpreted
LOGIN_DATA='login=1&redir=1&username=$USER&password=$PASSWORD'
LOGIN_URL="https://xxx.com/login.php"
# or simply use $(create_tempfile)
COOKIES=/tmp/my_cookie_file
post_login "$AUTH" "COOKIES" "$LOGIN_DATA" "$LOGIN_URL" >/dev/null
Results:
- $?: 0 for success; $ERR_NETWORK or $ERR_LOGIN_FAILED on error (no cookie returned)
- stdout: HTML result of the POST request
A common usage is (snippet taken from the filesonic module):
LOGIN_RESULT=$(post_login "$AUTH" "$COOKIE_FILE" "$LOGIN_DATA" \
    'http://www.fileserve.com/login.php') || return
If no password is provided, post_login will prompt for one.
Warning: Having $?=0 does not mean that your account is valid; it just means that the request (from an HTTP protocol point of view) has been successful. To detect a bad login/password, you'll have to parse the returned HTML content, or sometimes the cookie file.
Note: Sometimes, parsing LOGIN_RESULT can be useful to distinguish a free account from a premium account. Sometimes parsing the cookie (looking for a specific entry in it) can help too.
An empty $LOGIN_RESULT is not necessarily an error. You can get, for example, an HTTP redirection. You could follow this redirection by giving the '-L' option to curl:
LOGIN_RESULT=$(post_login "$AUTH" "$COOKIE_FILE" "$LOGIN_DATA" \
"$BASEURL/login.php" -L) || return
You may already have valid entries in $COOKIEFILE (language, for example) that you want to keep:
LOGIN_RESULT=$(post_login "$AUTH_FREE" "$COOKIEFILE" "$LOGIN_DATA" \
"$BASE_URL/dynamic/login.php?popup=1" -b "$COOKIEFILE") || return
Without this additional -b "$COOKIEFILE" given to curl, the cookie file would be overwritten.
match / matchi
Arguments:
- $1: match regexp (like grep)
- $2: input data (text)
Results:
- $?: 0 for success; non-zero on any error
- stdout: nothing!
The 'i' letter stands for case-insensitive match.
Regexps are basic POSIX (BRE syntax). Reserved characters (to escape) are: . * [ ] $ ^ \
Coding convention is to use the shortest form:
match 'foo' "$HTML_PAGE" && ...        # right
$(match 'foo' "$HTML_PAGE") && ...     # wrong (useless subshell creation)
match '\(foo\)' "$HTML_PAGE" && ...    # wrong (useless parentheses)
if (! match 'You are ' "$HTML"); then  # wrong (useless subshell creation)
...
fi
Typical use:
if ! match '/js/myfiles\.php/' "$PAGE"; then
log_error "not a folder"
return $ERR_FATAL
fi
if match '<h1>Delete File?</h1>' "$PAGE"; then
...
fi
if match '/error\.php?code=25[14]' "$LOCATION"; then
return $ERR_LINK_DEAD
fi
Simple examples:
match '[0-9][0-9]\+' 'Wait 19 seconds'  # true
match '[0-9][0-9]\+' 'Wait 9 seconds'   # false
match 'times\?' 'One time ago'          # true
match 's/n' 'yes/no'                    # true
match '(euros)' '3.5 (euros)'           # true
match '\[euros\]' '3.5 [euros]'         # true
More examples (seen in modules):
match '^http://download' "$LOCATION"    # ^ matches beginning of line
match 'errno=999$' "$LOCATION"          # $ matches end of line
match '.*/#!index|' "$URL"              # . means any character
match 'File \(deleted\|not found\|ID invalid\)' "$ERROR"
# Character classes can be used too (see POSIX bracket expressions)
match 'Password:[[:space:]]*<input' "$HTML"
parse / parse_all
The first function returns the first match, the second one returns all matches (multiline result). The sed command is used internally here.
Arguments:
- $1: filter regexp (line to select; . or an empty string to match every line)
- $2: parse regexp (enclose with \( \) to retrieve the match)
- $3 (optional): number of lines to skip (default is 0)
- stdin: input data (text)
Results:
- $?: 0 on success or $ERR_FATAL (non-matching or empty result)
- stdout: parsed content (non-null string)
Regexps are basic POSIX (BRE syntax). Reserved characters (to escape) are: . * [ ] $ ^ \
Note: Remember that Bash interprets some symbols in double-quoted strings. The following characters must be escaped: $ (dollar sign), " (double quote) and backticks. Also, ! (exclamation mark) must be escaped if Bash history expansion is enabled. Use single-quoted strings, it's easier!
Examples:
ID=$(echo "$HTML_PAGE" | parse 'name="freeaccountid"' 'value="\([[:digit:]]*\)"')
HOSTERS=$(echo "$FORM" | parse_all 'checked' '">\([^<]*\)<br')
MSG=$(echo "$RESPONSE" | parse_quiet "ERROR:" "ERROR:[[:space:]]*\(.*\)")
Example using the $ (end-of-line) meta-character:
# Parse: [key]='7be8933035d221026ff2245be258c763';
# Notes:
# - Don't forget to escape `[` in the match regexp.
# - [:cntrl:] is used here to match `\r` because the answer comes from a Windows server.
# - `$` matches end of line.
HASH=$(echo "$PAGE" | parse 'Array\.downloads\[' "\]='\([[:xdigit:]]\+\)';[[:cntrl:]]$")
Always keep in mind that parsing is greedy, so within a line the last occurrence is taken. For example:
# Usual greedy behavior. Result: 789
echo 'value=123, value=456, value=789' | parse . '=\([^,]\+\)'
# Modify regex to get second value. Result: 456
echo 'value=123, value=456, value=789' | parse . '=\([^,]\+\),'
# Modify regex to get first value. Result: 123
echo 'value=123, value=456, value=789' | parse . '^value=\([^,]\+\)'
Use the xxx_quiet functions when a parsing failure is normal behavior, for example when parsing an optional value.
Typical use:
OPT_RESULT=$(echo "$HTML_PAGE" | parse_quiet 'id="completed"' '">\([^<]*\)<\/font>')
If you actually require a result, do not use xxx_quiet. This way you'll get a sed error message if the parse fails, i.e. when your parse regexp did not capture anything.
Typical use:
WAIT_TIME=$(echo "$HTML_PAGE" | parse '^[[:space:]]*count=' "count=\([[:digit:]]\+\);") || return
Note: Don't use these functions for HTML parsing. Consider using the parse_tag and parse_attr function families (see below: Parsing HTML markers and Parsing HTML attributes).
Use the offset whenever the filter regexp and the parse regexp are not on the same line. A positive value will skip ahead the specified number of lines, while a negative value will apply your parse regexp to a line before the one that matched your filter regexp. See the following examples:
<div class="dl_filename">
FooBar.tar.bz2</div>
We can get the right line by filtering on dl_filename and applying the filename regexp to the second line (the line after). This gives:
echo "$PAGE" | parse 'dl_filename' '\([^<]*\)' 1
Example 2:
function js_fff() {
R4z5sjkNo = "http://...";
DelayTime = 60;
...
Get URL with:
DL_LINK=$(echo "$PAGE" | parse 'js_fff' '"\([^"]\+\)";' 1) || return
Get counter value with:
COUNT=$(echo "$PAGE" | parse 'js_fff' '=[[:space:]]*\([[:digit:]]\+\)' 2) || return
Example 3 (negative offset):
<TD><input type="checkbox" name="file_id" value="123456"></TD>
<TD align=left><a href="http://...">FooBar.tar.bz2</a></TD>
To get the file ID that belongs to a known URL you can use:
FILE_ID=$(echo "$PAGE" | parse "$URL" '^\(.*\)$' -1 | parse_form_input_by_name 'file_id') || return
First retrieve the whole line directly before the one containing the known URL, then parse the file ID with one of plowshare's form parsing functions (see below: Parsing HTML forms).
basename_url
Get the basename (hostname) of a URL.
Argument:
- $1: string (URL)
Result:
- $?: always 0
- stdout: basename of the URL (if possible) or the input argument unchanged
A=$(basename_url 'http://code.google.com/p/plowshare/wiki/NewModules')
# result: http://code.google.com
B=$(basename_url 'http://code.google.com/')
# result: http://code.google.com
C=$(basename_url 'abc')
# result: abc
Supported protocols: http, https, ftp, ftps, file.
match_remote_url
Check if a URL is suitable for remote upload.
Arguments:
- $1: string (URL)
- $2..$n (optional): additional URI scheme names to match
Result:
- $?: 0 on success or $ERR_FATAL (not an accepted remote URL)
Called with one single argument, http and https are accepted.
URL='http://www.foo/bar'
if match_remote_url "$URL"; then
...
fi
If you want to accept more schemes, add them to the argument list.
URL='ftp://www.foo/bar'
if match_remote_url "$URL" 'ftp'; then
...
fi
grep_http_header_location
Argument:
- stdin: data (HTTP headers)
Result:
- $?: 0 on success or $ERR_FATAL (non-matching or empty string)
- stdout: parsed header (non-null string)
If you think you have reached the final URL for download (let's call it $FINAL_URL), and when you curl it (with the -I/--head option) you get an HTTP answer like this:
HTTP/1.1 301 Moved Permanently
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: /download/123/5687/final_filename.xyz
Content-type: text/html
Content-Length: 0
Connection: close
Date: Sun, 17 Jan 2010 14:34:47 GMT
Server: Apache
Use grep_http_header_location to deal with this redirection. Have a look at the sendspace module:
HOST=$(basename_url "$FINAL_URL")
PATH=$(curl -I "$FINAL_URL" | grep_http_header_location) || return
echo "${HOST}${PATH}"
Another example, with an absolute URI (taken from euroshare.eu):
HTTP/1.1 302 Found
Date: Sat, 10 Mar 2012 11:14:31 GMT
Server: Apache/2.2.16 (Debian)
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Set-Cookie: sid=61bu6nt3kkh9nsk92mg7otg501; expires=Sun, 11-Mar-2012 11:14:31 GMT; path=/
Location: http://s1.euroshare.eu/download/3598184/aXa2YWy3ytUhu3uVUsAQEgUzUDUseje3/5344113/myfile.zip
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: x-requested-with
Access-Control-Allow-Headers: x-file-name
Access-Control-Allow-Headers: content-type
Vary: Accept-Encoding
Content-Type: text/html
FILE_URL=$(curl -I "$FINAL_URL" | grep_http_header_location) || return
echo "$FILE_URL"
Note: Like other *_quiet functions, grep_http_header_location_quiet is silent and always returns 0. Use it only in dedicated cases. For example:
FILE_URL=$(echo "$HTML_PAGE" | grep_http_header_location_quiet)
if [ -z "$FILE_URL" ]; then
    ... # not premium
grep_http_header_content_disposition
Argument:
- stdin: data (HTTP headers)
Result:
- $?: 0 on success or $ERR_FATAL (non-matching or empty string)
- stdout: parsed filename (non-null string)
Sharing websites often return their files as attachments. curl doesn't care about the Content-Disposition: header; it does not parse it but keeps the URL as name reference (see the -O option documentation).
$ curl http://p123.share-site.com/download/dl.php?id=123456456
# saved filename will be: "dl.php?id=123456456"
The reason is that a link can have multiple attachments. Note: This is a difference between curl and wget.
Note: This is not true anymore. Since curl 7.20.0, the -J/--remote-header-name option exists (you must combine it with -O/--remote-name). Plowshare does not use it for now.
Have a look at the divshare module:
FILE_NAME=$(curl -I "$FILE_URL" | grep_http_header_content_disposition) || return
Before the plowdown core script makes the final HTTP GET request, the module does an HTTP HEAD request in order to parse the attachment header and get the filename.
$ curl -I http://p123.share-site.com/download/dl.php?id=123456456
HTTP/1.0 200 OK
Date: Sun, 28 Feb 2010 11:41:50 GMT
Server: Apache
Last-Modified: Mon, 12 Oct 2009 10:04:20 GMT
ETag: 9852859-16341905311255341860
Cache-Control: max-age=30
Content-Disposition: attachment; filename="kop_standard.pdf"
Accept-Ranges: bytes
Content-Length: 412848
Vary: User-Agent
Keep-Alive: timeout=300, max=100
Connection: keep-alive
Content-Type: application/octet-stream
Notice that some sharing sites do not allow HTTP HEAD requests. Restricting the web server is maybe a security concern?
There is a possible workaround: the HTTP/1.1 protocol allows making an HTTP GET request that specifies a byte range.
FILE_NAME=$(curl -i -r 0-99 "$FILE_URL" | grep_http_header_content_disposition) || return
This is not very classy, but it can work, except if the sharing site only allows one (and only one) HTTP request to that final URL (uploaded.to, for example). In that case you can't get the attachment filename.
Retrieve a specific HTTP header.
Argument:
- stdin: data (HTTP headers)
Result:
- $?: 0 on success or $ERR_FATAL (non-matching or empty string)
- stdout: parsed content (non-null string)
Name | HTTP header
---|---
grep_http_header_content_length | Content-Length
grep_http_header_content_location | Content-Location
grep_http_header_content_type | Content-Type
$ curl --head http://share-site.net/wm8tbV6gZCp
HTTP/1.1 200 OK
Content-Disposition: attachment; filename="foobar"
Content-length: 5156
Content-Type: application/octet-stream
Date: Mon, 30 Sep 2013 06:29:14 GMT
ETag: "bc7f4762443939bd7dccb42370f0d932"
Last-Modified: Mon, 30 Sep 2013 06:28:44 GMT
Server: Apache
Vary: User-Agent
Connection: keep-alive
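For example, to get the file size from the answer above:
FILE_SIZE=$(curl --head "$URL" | grep_http_header_content_length) || return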
parse_cookie
Arguments:
- $1: entry name
- stdin: data (netscape/mozilla cookie file format)
Result:
- $?: 0 on success or $ERR_FATAL (non-matching or empty string)
- stdout: parsed content (non-null string)
This is often used to get account settings. Sometimes, for premium accounts, the remote site adds an extra key to the cookie file, so this can be a convenient way to distinguish a free account from a premium account.
LOGIN_ID=$(parse_cookie 'Login' < "$COOKIEFILE") || return
PASS_HASH=$(parse_cookie 'Password' < "$COOKIEFILE") || return
# At this point you are sure that $LOGIN_ID and $PASS_HASH are valid (non empty)
Note: Like other *_quiet functions, parse_cookie_quiet is silent and always returns 0. Use it only in dedicated cases. For example:
USERNAME=$(parse_cookie_quiet 'login' < "$COOKIEFILE")
if [ -z "$USERNAME" ]; then
... # invalid account
return $ERR_LOGIN_FAILED
fi
Parsing HTML markers
Arguments:
- $1 (optional): filtering regexp
- $2: tag name (case sensitive)
- stdin: data (HTML, XML)
Result:
- $?: 0 on success or $ERR_FATAL (non-matching or empty marker)
- stdout: parsed content (non-null string)
Name | Usage example
---|---
parse_tag | T=$(echo "$LINE" | parse_tag a) || return
parse_tag_quiet | Same as parse_tag but doesn't print on parsing error
parse_all_tag | n/a
parse_all_tag_quiet | Same as parse_all_tag but doesn't print on parsing error
The _all functions are for multiline content; one tag is parsed per line.
Important: If you have several matching tags on the same line, the first one is taken.
Remember that this is line oriented: if the opening tag and the closing tag are not on the same line, it won't work. It's not perfect, but for now it covers all our needs.
Examples:
LINE='... <a href="link1">Link number 1</a> <a href="javascript:;">Link number 2</a>'
LINK1=$(echo "$LINE" | parse_tag a) || return # First link returned
LINE='... <b></b> ...'
CONTENT=$(echo "$LINE" | parse_tag b) || return # Error: <b> content is empty
# Nested elements: take the deepest one!
WAIT_MSG='<span id="foo">Wait <span id="bar">30</span> seconds</span>'
WAIT_TIME=$(echo "$WAIT_MSG" | parse_tag span) || return # 30
Note: parse_tag b is equivalent to parse_tag . b and parse_tag b b.
Parsing HTML attributes
Arguments:
- $1 (optional): filtering regexp
- $2: attribute name
- stdin: data (HTML, XML)
Result:
- $?: 0 on success or $ERR_FATAL (non-matching or empty attribute)
- stdout: parsed content (non-null string)
Name | Usage example
---|---
parse_attr | LINK=$(echo "$IMG" | parse_attr 'href') || return
parse_attr_quiet | Same as parse_attr but doesn't print on parsing error
parse_all_attr | LINKS=$(echo "$PAGE" | parse_all_attr 'class="dl"' 'href') || return
parse_all_attr_quiet | Same as parse_all_attr but doesn't print on parsing error
The _all functions are for multiline content; one attribute is parsed per line.
Quoting is handled according to the HTML5 standard:
<div class="foo">
<div class = "foo" >
<div class='foo'>
<div class=foo>
<div class = foo >
Note: In XHTML, all attribute values must be quoted using double quote marks.
Important: If you have several matching attributes on the same line, the last one is taken (parsing is greedy).
Examples:
IMG='<img href="http://foo.com/bar.jpg" alt="">'
CONTENT=$(echo "$IMG" | parse_attr img alt) || return # Error: 'alt' content is empty
PAGE='<a href="http://...">click here to download</a>'
LINK=$(echo "$PAGE" | parse_attr 'download' 'href') || return
log_debug "[$LINK]" # [http://...]
IMG='<img href="http://foo.com/bar.jpg" id = image_id>'
ID=$(echo "$IMG" | parse_attr 'id') || return
Note: parse_attr b is equivalent to parse_attr . b and parse_attr b b.
Some websites return the page as a single big line of HTML (without any EOL). As the parse_xxx functions are line oriented, proper parsing can be difficult. Two functions exist to split single-line HTML: break_html_lines and break_html_lines_alt (more aggressive).
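Typical usage, right after fetching the page:
PAGE=$(curl "$URL" | break_html_lines) || return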
Parsing HTML forms
There are 3 helper functions.
Arguments:
- $1: input (X)HTML data
- $2: 1-based index or string
Result:
- $?: 0 on success or $ERR_FATAL (non-matching or no such form)
- stdout: parsed content (non-null string)
Name | Usage example
---|---
grep_form_by_order | FORM_HTML=$(grep_form_by_order "$PAGE" 2)
grep_form_by_name | FORM_HTML=$(grep_form_by_name "$PAGE" 'named_form')
grep_form_by_id | FORM_HTML=$(grep_form_by_id "$PAGE" 'id_form')
You are strongly encouraged to append the regular || return error handling.
Note: grep_form_by_order can take a negative index (as argument $2). Get the last form of the page with -1. Giving 0 or a null string will default to 1.
Tip: On some websites, HTML data contains commented HTML or JS code. It can sometimes be useful to strip HTML comments. There is a function for this named strip_html_comments (input data on stdin, filtered data on stdout).
As with other parse functions, the input argument is passed through stdin.
Name | Usage example
---|---
parse_form_action | ACTION=$(echo "$FORM_HTML" | parse_form_action) || return
parse_form_input_by_id | VALUE=$(echo "$FORM_HTML" | parse_form_input_by_id 'file_id') || return
parse_form_input_by_name | VALUE=$(echo "$FORM_HTML" | parse_form_input_by_name 'file_id') || return
parse_form_input_by_type | VALUE=$(echo "$FORM_HTML" | parse_form_input_by_type 'hidden') || return
Example:
FORM_URL=$(grep_form_by_order "$HTML_PAGE" 1 | parse_form_action) || return
# We are sure here, that $HTML_PAGE has a form with an action attribute
# We can safely use $FORM_URL now
Note: parse_form_input_by_id_quiet, parse_form_input_by_name_quiet and parse_form_input_by_type_quiet are also available. Like other *_quiet functions, they print no error message and always return 0.
You generally use them when you want to parse an HTML form field with a possibly empty value. For example:
FORM_SID=$(echo "$FORM_HTML" | parse_form_input_by_id_quiet 'sid')
# $FORM_SID can be empty for anonymous users and it can be defined
# (non empty: session id defined) for account user.
Captcha solving
The core.sh script provides some functions. Captchas are solved according to the --captchamethod command line option (in plowdown, plowup and plowdel). If it is not defined, the method is autodetected (look for an image viewer and prompt for the answer).
captcha_process
Arguments:
- $1: local image file (any format) or URL (which doesn't require cookies)
- $2: captcha type or hint
- $3 (optional): minimum length
- $4 (optional): maximum length
Current captcha types:
- recaptcha (better use recaptcha_process() to get the reload feature)
- solvemedia (better use solvemedia_captcha_process() to get the reload feature)
- digits
- letters
Results:
- stdout (2 lines): captcha answer (ascii text) / transaction id
- $?: 0 for success, or $ERR_CAPTCHA, $ERR_FATAL, $ERR_NETWORK
Typical usage ($CAPTCHA_IMG
is a valid image file):
local WI WORD ID
WI=$(captcha_process "$CAPTCHA_IMG" ocr_digit) || return
{ read WORD; read ID; } <<< "$WI"
rm -f "$CAPTCHA_IMG"
Note: If something goes wrong ($? is not 0), the argument image file is deleted.
recaptcha_process
Argument:
- $1: site key
Results:
- stdout (3 lines): captcha answer (ascii text) / recaptcha challenge / transaction id
- $?: 0 for success, or $ERR_CAPTCHA, $ERR_FATAL, $ERR_NETWORK
Typical usage:
local PUBKEY WCI CHALLENGE WORD ID
PUBKEY='6Lftl70SAAABAItWJueKIVvyG5QfLgmAgtKgVbDT'
WCI=$(recaptcha_process $PUBKEY) || return
{ read WORD; read CHALLENGE; read ID; } <<< "$WCI"
solvemedia_captcha_process
Argument:
- $1: site key
Results:
- stdout (2 lines): verified challenge / transaction id
- $?: 0 for success, or $ERR_CAPTCHA, $ERR_FATAL, $ERR_NETWORK
captcha_ack / captcha_nack
Each time you call captcha_process or recaptcha_process, you get a transaction id as a result. Once the captcha answer has been submitted, the module function must acknowledge (or not acknowledge) the captcha transaction reply (some captcha solving services can refund credits on a wrong answer).
Validation of the captcha answer is made through two functions: captcha_ack and captcha_nack.
Argument:
- $1: transaction id
Typical usage:
if match ... wrong captcha ...; then
captcha_nack $ID
log_error "Wrong captcha"
return $ERR_CAPTCHA
fi
captcha_ack $ID
log_debug "correct captcha"
Note: A module must not loop on a wrong captcha; just captcha_nack and return $ERR_CAPTCHA. The retry mechanism is implemented at an upper level, with the plowdown -r policy.
JSON parsing
JSON stands for JavaScript Object Notation. The official format standard is RFC 4627. If you know nothing about JSON, try this:
curl http://twitter.com/users/bob.json | python -mjson.tool
parse_json provides simple and limited JSON parsing. The sed command is used internally. This is really a poor line-oriented parser (instead of being tree oriented).
Arguments:
- $1: variable name (string)
- $2 (optional): preprocess option. Accepted values are join and split.
- stdin: input JSON data
Results:
- $?: 0 on success or $ERR_FATAL (non-matching or empty result)
- stdout: parsed content (non-null string)
Important notes:
- Single-line parsing oriented (the user should strip newlines first): no tree model
- Array and Object types: poor basic support (depth 1, without complex types)
- String type: no support for escaped unicode characters (\uXXXX), but two-character escape sequences are handled (for example: \t)
- No handling of non-standard C/C++ comments (like in JSONP)
- If several entries exist on the same line: the last occurrence is taken (like parse_attr), but consider precedence (order of priority): number, boolean/empty, string.
- If several entries exist on different lines: all are returned (it acts as a parse_all_json)
Simple usage:
FILE_URL=$(echo "$JSON" | parse_json 'downloadUrl') || return
JSON='{"name":"foo","attr":["size":123,"type":"f","url":"http:\/\/www.bar.org\/4c0476"]}'
# ARR='["size":123,"type":"f"]'
ARR=$(parse_json 'attr' <<< "$JSON")
# URL='http://www.bar.org/4c0476'
# (as you can see, it does not care about hierarchy)
URL=$(parse_json 'url' <<< "$JSON")
match_json_true
Arguments:
- $1: name (string)
- $2: input data (JSON)
Results:
- $?: 0 for success; non-zero on any error
- stdout: nothing!
This will literally match the true boolean token; a "true" string token or any number will be considered false.
# Assuming that a curl request can result one of two $JSON content:
# JSON='{"err":"Entered digits are incorrect."}'
# JSON='{"ok":true,"dllink":"http:\/\/www.share-me.com\/..."}'
if ! match_json_true 'ok' "$JSON"; then
ERR=$(echo "$JSON" | parse_json_quiet err)
test "$ERR" && log_error "Remote error: $ERR"
return $ERR_FATAL
fi
log_debug "ok answer..."
javascript
Arguments:
- stdin: input JavaScript code
Results:
- $?: 0 on success or $ERR_FATAL (js error)
- stdout: result
Example:
JS='print("Hello World!");'
RESULT=$(javascript <<< "$JS") || return
log_debug "result: '$RESULT'"
Modules using the javascript function need to add this line at the top of the module function (for example zippyshare_download):
detect_javascript || return
Important note: Don't use classes that are not in the JavaScript core engine. For example:
var strJson = '{"City":"Paris", "Country":"France"}';
var objJson = JSON.parse(strJson);
var dump = JSON.stringify(objJson, null, 2);
print(dump);
The rhino interpreter will know the JSON object, but spidermonkey will not:
ReferenceError: JSON is not defined
Module command-line switches
When entering a module function, dedicated module arguments are processed according to the module variables:
MODULE_XXX_DOWNLOAD_OPTIONS
MODULE_XXX_UPLOAD_OPTIONS
MODULE_XXX_DELETE_OPTIONS
MODULE_XXX_LIST_OPTIONS
MODULE_XXX_PROBE_OPTIONS
Assuming the module source contains:
MODULE_XXX_DELETE_OPTIONS="
AUTH,a,auth,a=USER:PASSWORD,User account"
Assuming the user is invoking plowdel with an account:
$ plowdel -a 'user:password' 'http://www.sharing-site.com/?delete=12D45G5'
xxx_delete will be called with the environment variable defined:
AUTH='user:password'
Typical authentication options, for a module distinguishing premium and free accounts:
AUTH,a,auth,a=USER:PASSWORD,Premium account
AUTH_FREE,b,auth-free,a=USER:PASSWORD,Free account
Most of the time, when a module can deal with both free and premium accounts, you will see a single option:
AUTH,a,auth,a=USER:PASSWORD,User account
For delete, it's quite usual that authentication is mandatory for deleting files, so you'll see:
AUTH,a,auth,a=USER:PASSWORD,User account (mandatory)
Other common options:
LINK_PASSWORD,p,link-password,S=PASSWORD,Used in password-protected files
NOMD5,,nomd5,,Disable md5 authentication (use plain text)
Ask for the password if not supplied:
log_debug "File is password protected"
if [ -z "$LINK_PASSWORD" ]; then
LINK_PASSWORD=$(prompt_for_password) || return
fi
More option examples, seen in upload modules:
LINK_PASSWORD,p,link-password,S=PASSWORD,Protect a link with a password
DESCRIPTION,d,description,S=DESCRIPTION,Set file description
TOEMAIL,,email-to,e=EMAIL,<To> field for notification email
FROMEMAIL,,email-from,e=EMAIL,<From> field for notification email
INCLUDE,,include,l=LIST,Provide list of host site (comma separated)
COUNT,,count,n=COUNT,Take COUNT hosters from the available list. Default is 5.
PRIVATE_FILE,,private,,Do not allow others to download the file
FOLDER,,folder,s=FOLDER,Folder to upload files into (account only)
ADMIN_CODE,,admin-code,s=ADMIN_CODE,Admin code (used for file deletion)
Name | Description
---|---
a | Authentication string (user:password or user)
n | Positive integer (>0)
N | Positive integer or zero (>=0)
s | Non-empty string
S | Any string
t | Non-empty string, multiple command-line switches allowed
e | Email address string
l | Comma-separated list, strip leading & trailing spaces
f | Filename (with read access)
Reserved argument types (should not be used in modules):
Name | Description
---|---
c | Choice list (argument must match a string)
C | Same as c type, but empty string is allowed
r | Speed rate. Allowed suffixes: Ki, K, k, Mi, M, m.
R | Disk size. Allowed suffixes: Mi, m, M, MB, Gi, G, GB.
F | Executable (searched in $PATH and $HOME/.config/plowshare/exec)
D | Directory (with write access)
Assuming the module source contains:
MODULE_XXX_UPLOAD_OPTIONS="
INCLUDE,,include,l=LIST,Provide list of host site (comma separated)"
Assuming the user is invoking plowup this way:
$ plowup xxx --include 'first, second,thir d' myfile.foo
xxx_upload will be called with the environment variable defined:
# This is an array
INCLUDE=( 'first' 'second' 'thir d' )
- Consider module option variables (AUTH, LINK_PASSWORD, ...) as read only; don't reassign them.
- Because of command-line parsing, module options with the same name must have the same argument type; this is important. For example: if module1 has an option --useapi with type s (non-empty string), module2 can't have an option --useapi with type n (positive integer).
Coding rules
Quote your variables! This:
WAIT_TIME=$(echo $WAIT_HTML | parse 'foo' '.. \(...\) ..')
won't give you the expected answer if $WAIT_HTML is multiline (which is most of the time the case). You should write instead:
WAIT_TIME=$(echo "$WAIT_HTML" | parse 'foo' '.. \(...\) ..')
Consider this example for understanding:
$ MYS=$(seq 3)
$ echo "$MYS"
1
2
3
$ echo $MYS
1 2 3
$ echo $MYS | xxd
0000000: 3120 3220 330a 1 2 3.
More information about word splitting.
Another pitfall concerns the local keyword. Unfortunately, this is not correct:
local HTML_PAGE=$(curl "$URL") || return
If the curl function returns an error, it won't be caught by || return because of the local keyword.
local HTML_PAGE
...
HTML_PAGE=$(curl "$URL") || return
is correct.
$ set -- test
$ [ -z "$1" ] && echo empty || echo nonempty
nonempty
$ set --
$ [ -z "$1" ] && echo empty || echo nonempty
empty
$ set -- test
$ [ -z "$1" ] || echo nonempty && echo empty
nonempty
empty
$ set --
$ [ -z "$1" ] || echo nonempty && echo empty
empty
It looks like "&& ||" is better than "|| &&". But imagine that echo empty does not return $?=0:
$ set --
$ [ -z "$1" ] && echo empty; false || echo nonempty
empty
nonempty
Finally, the classic if/then/else/fi is not so bad!
if [ -z "$1" ]; then
echo empty
else
echo nonempty
fi
See also the shellcheck.net note on this topic.
Don't put an '&&' test as the last statement of a function. For example:
myhoster_upload() {
...
echo "$DL_URL"
[ -n "$PUBLIC_FILE" ] && echo "$DEL_URL"
}
If $PUBLIC_FILE is not empty, myhoster_upload() will return $?=0. This is good.
But if $PUBLIC_FILE is empty, echo is not performed (as wished) and myhoster_upload() will return $?=1. Plowup will assume this is an $ERR_FATAL module return. This is not what we want! We only want to display the download link and not the delete link (because it's not available).
So prefer this:
myhoster_upload() {
...
echo "$DL_URL"
[ -z "$PUBLIC_FILE" ] || echo "$DEL_URL"
}
A paranoid version:
myhoster_upload() {
...
echo "$DL_URL"
[ -z "$PUBLIC_FILE" ] || echo "$DEL_URL"
return 0
}
Plowshare runs on lots of unix/linux systems. There are always several ways to write bash code; we try to keep compatibility with the busybox shell.
Things to take care of or avoid in your module functions:
- no awk invocation
- no xargs invocation
- no grep -v (invert match) invocation
- no wc invocation (wc -c can easily be replaced with bash internal string manipulation, see the example just after this list)
- no infinite loops like while true; or while :;
- no tr -d; try using bash internal replacement. For example: ${MYSTRING//$'\n'} or replace_all for multiline content.
Bash-specific constructs to avoid:
- no bash regexp: [[ =~ ]] (requires bash >= 3.0). This is a historic choice; the behavior has changed across versions (see E14 in the bash FAQ).
- no += string (or array) concatenation operator (requires bash >= 3.1)
- no brace-expansion for loops: for i in {1..10}; do ... ; done (requires bash >= 3.0). You can use seq instead.
- no printf -v (requires bash >= 3.1)
BSD-specific pitfalls:
- base64 -d is a GNU coreutils-only short option. Use base64 --decode instead.
- BSD sed has fewer features than GNU sed (it can't use \? or \r, for example). Try to use the parse_* functions instead.
- stat -c is only available on GNU. Use ls instead.
- readlink -f is only available on GNU.
Busybox-specific pitfalls:
- grep -o and grep -w (word-regexp) are not supported by old versions of busybox. Do not use them.
- sleep with s/m suffixes or even a fractional argument (example: sleep 1m). BusyBox may not be compiled with the CONFIG_FEATURE_FANCY_SLEEP option.
- tr with classes (such as [:upper:]). Busybox may not be compiled with the CONFIG_FEATURE_TR_CLASSES option.
- sed does not support \xNN escape sequences. Tested on Busybox 1.13, 1.18 and 1.19.3.
- sed does not support the \r escape sequence before a 1.19 commit. Don't use it, find another way!
- sed does support \s, \S, \w, \W (these are GNU extensions). But prefer the equivalents: [[:space:]], [^[:space:]], [[:alnum:]_], [^[:alnum:]_].
Try to stay compliant with bash 3.x.
More rules:
- Do not create temporary files unless necessary; don't forget to delete them if you used one.
- curl calls should not be invoked with the --silent option. The curl wrapper function takes care of the verbose level.
Why all these restrictions? Because we want to be as portable as possible. We lose flexibility, but plowshare can run on slow and old embedded hardware; this was the original starting point of the project. But maybe a plowshare with bash 4.0 as minimum requirement will pop up one day...
Coding style
- GPL-compatible license.
- No tabs; use 4 spaces. Also use 4 spaces after split \ lines.
- Line lengths should stay within 80 columns.
- Comments (# style, like Ruby) are written in English. No extra empty line before function declarations. No boxes or ascii-art stuff.
- Always declare (with the local keyword) the variables you use.
- Uppercase variables: this is a historical choice, let's keep traditions. We suggest using underscores in names, for example MARY_POPPINS (instead of MARYPOPPINS). This is optional but recommended (especially for names with more than 7 characters). For example, APIURL, DESTFILE and FILEURL are accepted. COOKIEFILE is accepted too (but COOKIE_FILE is preferred).
- Use appropriate names to ease maintainability. For example: FILE_URL (instead of MARY_POPPINS). Don't use variable names that are too long: for example, UPLOADED_FILE_JSON_DATA is too descriptive; JSON_DATA or JSON is enough.
- For form parsing, usual names are: FILE_ID, FILE_NAME, FILE_URL, BASE_URL, FORM_HTML, FORM_URL (action parameter), FORM_xxx (input field name in uppercase), ADMIN_URL, DELETE_ID, WAIT_TIME.
- Usual names for curl results are HTML, PAGE, RESPONSE, JSON, STATUS.
Remark: The choice of uppercase is historical and you may disagree with this approach. The usual convention is lowercase for internal or temporary variables and uppercase for environment or global variables; that convention avoids accidentally overriding environment variables.
- if/then constructs and while/do are on the same line.
- Restrict usage of curly braces:
test "$FILE_URL" || { log_error "location not found"; return $ERR_FATAL; }
should be written:
if ! test "$FILE_URL"; then
    log_error "location not found"
    return $ERR_FATAL
fi
- In comments, insert a space character after the # symbol:
#get id of file (wrong)
# Get id of file (right)
- Avoid meaningless comments:
# wait 15 seconds
wait 15 seconds || return
- Proper indentation on continued lines
HTML=$(curl -b "$COOKIE_FILE" 'http://www.foo.bar/long...url...') \
|| return
should be written:
HTML=$(curl -b "$COOKIE_FILE" \
'http://www.foo.bar/long...url...') || return
- Single-quote strings as much as possible (if there is no variable referencing, of course!):
local BASE_URL="http://shareme.com" # wrong
local BASE_URL='http://shareme.com' # right
- Don't quote unless required
return "$ERR_LINK_TEMP_UNAVAILABLE" # wrong
return $ERR_LINK_TEMP_UNAVAILABLE # right
Testing
Test and retest your module. A little check-list of possible cases:
- File not found
- File temporarily unavailable
- File unavailable (server busy), come back in X minutes
- Download (quota) limit reached
- Your IP address is already downloading a file
- Password protected link
- Premium link download only
- etc.
Other concerns:
- Check for geographically aware sites; your location can affect the URL TLD.
- Don't send incomplete scripts or nearly-working stuff.
- Don't use illegal or patented content; if you want to run some tests, use the test material here.
External documentation
- Advanced Bash-Scripting Guide (the bible)
- Bash Hackers wiki (very interesting pages about bash version features)
- Greg's Wiki (very interesting pages about bash pitfalls)
- Interesting mediawiki website (Freddy Vulto)
- Blog about shell scripting