pastebin-mirror

Mirror Pastebin.com to a local SQLite database or flat text files. Archives all new and trending pastes in real-time.

Getting Started

Archive Trending

Pastebin no longer provides this functionality. The --trending option still exists in this tool, but it will always return zero trending pastes.

To archive only the 18 trending pastes each hour, you need an API key from Pastebin. The key is free but requires a Pastebin account. Once you've registered, you can find your API key here.

Archive All New

Archiving all pastes in real time requires a PRO LIFETIME account ($50 one-time payment). After purchasing one, you must whitelist your public IP address (strangely, you do not need an API key to archive all new pastes). Once that is done, you should be free to use this tool for life or, more realistically, for as long as Pastebin is still kickin'.

Download

# download
git clone https://github.com/brannondorsey/pastebin-mirror
cd pastebin-mirror

# run in "full" mode, archiving all new AND trending pastes to pastebin.db 
python3 pastebin-mirror --output pastebin.db --trending --api-key <YOUR_API_KEY>

Download Only Trending Pastes

You can download only trending pastes without a PRO LIFETIME account like so:

python3 pastebin-mirror --output pastebin.db --no-mirror --trending --api-key <YOUR_API_KEY>

Download Only New Pastes

Conversely, omitting the --trending flag downloads only new pastes (--mirror is enabled by default). In this mode you may omit the --api-key flag:

python3 pastebin-mirror --output pastebin.db
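
This README does not document the layout of the resulting database, so a cautious first step is to ask SQLite itself which tables exist. The sketch below uses only Python's standard sqlite3 module and makes no assumption about table or column names. The function name list_tables is illustrative and is not part of pastebin-mirror.

```python
import sqlite3


def list_tables(db_path):
    """Return (name, CREATE statement) pairs for every table in an SQLite file."""
    # Open the file read-only so that inspecting the archive cannot modify it.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return rows
```

Calling list_tables("pastebin.db") returns the table names together with the SQL that created them, which is enough to write queries against whatever schema your version of the tool produces.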

Saving Pastes as Flat Text Files

Pastes can optionally be saved as raw text files instead of in an SQLite database. To do this, run pastebin-mirror with the --output-format flat-file option. With this output format, --output is interpreted as a directory path instead of a database file. Each paste is saved in the output directory as PASTE_ID.txt, where PASTE_ID is the unique ID assigned to the paste by Pastebin (e.g. output_directory/0eBX2nS3.txt).

With flat-file output, a metadata/ folder is created in the output directory. It holds information about each paste, in a file with the same basename as the raw paste content in the output directory.

Contents of output_directory/metadata/0eBX2nS3.txt:

key: 0eBX2nS3
timestamp: 1499615402
size: 5079
expires: 0
title: Doom-Mates: Indigestion
syntax: text
user: Protom
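
A metadata file like the one above can be read with a few lines of Python. This is a minimal sketch, assuming every line has the form "field: value" as in the sample. The function name parse_metadata is illustrative, and treating timestamp as Unix epoch seconds is an assumption based on its magnitude, not something this README states.

```python
from datetime import datetime, timezone


def parse_metadata(text):
    """Parse 'field: value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, since titles may contain colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


sample = """key: 0eBX2nS3
timestamp: 1499615402
size: 5079
expires: 0
title: Doom-Mates: Indigestion
syntax: text
user: Protom
"""

meta = parse_metadata(sample)
print(meta["title"])  # Doom-Mates: Indigestion

# Assumption: timestamp is seconds since the Unix epoch.
created = datetime.fromtimestamp(int(meta["timestamp"]), tz=timezone.utc)
print(created.isoformat())
```

Splitting on the first colon matters here: a naive split(":") would cut the sample title in two.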

If pastebin-mirror is called with the --trending option, trending pastes will be saved in output_directory/trending. Information about trending pastes is also included in output_directory/metadata.

Output to stdout and stderr

pastebin-mirror writes only the paste IDs of successfully downloaded pastes to stdout. All other, noisier logging goes to stderr. This means that you can reliably use pastebin-mirror as a tool in a larger pipeline, triggering some event when pastes are downloaded.

Output stdout Only

$ python3 pastebin-mirror --output-format flat-file --output pastebin 2>/dev/null
mUzadurz
BiqtCmKW
SCG7eBRk
G758hYSR
pQTcXyNg
BUxnxESb
V7LTSvan
Geu3cuEH
XJGib81F
GkYJ9WvT
TgCpcF6S
PecJnApM
jt7Dym1j
rv9FMc7P
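
Because stdout carries one paste ID per line, a downstream script can turn each ID into the file paths described in the flat-file section. The sketch below is one way to do that. The helper names paste_paths and handle_ids are hypothetical, and it assumes the paste file has already been written by the time its ID is printed, which this README does not guarantee.

```python
from pathlib import Path


def paste_paths(paste_id, output_dir):
    """Return (content_path, metadata_path) for a paste ID in flat-file mode."""
    root = Path(output_dir)
    name = f"{paste_id}.txt"
    return root / name, root / "metadata" / name


def handle_ids(lines, output_dir):
    """Yield (paste_id, content_path, metadata_path) for each non-empty line."""
    for line in lines:
        paste_id = line.strip()
        if paste_id:
            yield (paste_id, *paste_paths(paste_id, output_dir))
```

To use it in a pipeline, pipe the stdout of pastebin-mirror into a script that iterates over handle_ids(sys.stdin, "pastebin"), where "pastebin" is the same directory passed to --output, and check content_path.exists() before reading each paste.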

Output stderr Only

$ python3 pastebin-mirror --output-format flat-file --output test 1>/dev/null
[*] Fetching 11 new pastes
[*] Waiting 30 seconds before next paste scrape
[*] Fetching 17 new pastes
[*] Waiting 30 seconds before next paste scrape
[*] Fetching 20 new pastes
[*] Waiting 30 seconds before next paste scrape
[!] Interrupted by user, exiting
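
If you capture stderr to a file, the log lines can be summarized after the fact. The following sketch totals the counts from "Fetching N new pastes" lines. The pattern is inferred from the sample output above only, so treat the exact wording as an assumption that may differ between versions. The name count_fetched is illustrative.

```python
import re

# Pattern inferred from the sample log above; "pastes?" also tolerates a
# possible singular form, which is a guess.
FETCH_RE = re.compile(r"^\[\*\] Fetching (\d+) new pastes?$")


def count_fetched(log_lines):
    """Sum the paste counts announced in 'Fetching N new pastes' lines."""
    total = 0
    for line in log_lines:
        match = FETCH_RE.match(line.strip())
        if match:
            total += int(match.group(1))
    return total


sample_log = """[*] Fetching 11 new pastes
[*] Waiting 30 seconds before next paste scrape
[*] Fetching 17 new pastes
[*] Waiting 30 seconds before next paste scrape
[*] Fetching 20 new pastes
[*] Waiting 30 seconds before next paste scrape
[!] Interrupted by user, exiting
"""

print(count_fetched(sample_log.splitlines()))  # 48
```

Note that this counts pastes the tool announced it would fetch. The IDs printed to stdout remain the authoritative record of what was actually downloaded.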

Quiet Mode

Additionally, with the --quiet option all text output except fatal errors will be suppressed.

Usage

usage: pastebin-mirror [-h] -o OUTPUT [-f {sqlite,flat-file}] [-r RATE] [-t]
                       [-m] [-n] [-k API_KEY] [-v] [-q]

Pastebin mirror tool. Save publicly uploaded pastes in real-time to an SQLite
database or as flat text files. Optionally archive trending pastes as well.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        output SQLite database file or directory name if
                        --output-format=flat-file
  -f {sqlite,flat-file}, --output-format {sqlite,flat-file}
                        output format
  -r RATE, --rate RATE  seconds between requests to the pastebin scrape API.
                        minimum 1 second.
  -t, --trending        archive trending pastes (runs once per hour)
  -m, --mirror          archive pastebin in real-time using the scrape API.
                        Requires a PRO LIFETIME account to whitelist your IP
                        address.
  -n, --no-mirror       do not archive pastebin using the scrape API.
  -k API_KEY, --api-key API_KEY
                        pastebin API key. only required with --trending option
  -v, --version         show program's version number and exit
  -q, --quiet           suppresses printing of non-essential UI output, 
                        including paste ids and stats. fatal errors will still 
                        be displayed. default is false (show everything).

License and Attribution

This software is free to use under the terms of the MIT license.

The original paste-mirror (version 0.0.1) was written by James Ward. Version 1.0.0 is a major overhaul authored by Brannon Dorsey. See the CHANGELOG for changes.