Python Pub Crawl

An enhanced Python directory archiver. It "stumbles" through a given path and its subdirectories, builds a persistent dictionary archive, and simultaneously keeps a database up to date with changes. Well suited to file servers with a web interface. Can link to download scripts (examples coming soon).
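Here is a minimal sketch of the crawl-and-archive idea. It assumes the archive is a dictionary keyed by file path that holds each file's size and modification time, persisted with the standard `pickle` module. The names `crawl`, `save_archive` and `load_archive` are hypothetical and are not taken from `pubcrawl.py`.

```python
import os
import pickle
import tempfile


def crawl(root):
    """Walk root and all of its subdirectories, returning {path: (size, mtime)}."""
    archive = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            archive[path] = (stat.st_size, stat.st_mtime)
    return archive


def save_archive(archive, pickle_path):
    """Persist the dictionary archive to disk with pickle."""
    with open(pickle_path, "wb") as fh:
        pickle.dump(archive, fh)


def load_archive(pickle_path):
    """Read a previously saved dictionary archive back from disk."""
    with open(pickle_path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # Build a tiny throwaway tree: one file at the top, one in a subdirectory.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        for rel in ("file1.ex", os.path.join("sub", "file2.ex")):
            with open(os.path.join(root, rel), "w") as fh:
                fh.write("data")
        archive = crawl(root)
        pickle_path = os.path.join(root, "archive.pickle")
        save_archive(archive, pickle_path)
        print(len(load_archive(pickle_path)))  # prints 2
```

The pickle file is written after the crawl finishes, so it does not appear in the archive it stores.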

### Usage

Help and Command Summary

```
$ python pubcrawl.py -h
usage: pubcrawl.py [-h] [-v] [-d] [-f] directory

py pub crawler, stumbles through a given directory and stores metadata for every file it finds.

positional arguments:
  directory             directory to start crawl
  config/settings.yaml  settings file location (optional)

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  verbose output from crawler
  -d, --dump     dumps and replaces existing dictionaries
  -f, --fake     crawl only, nothing stored to DB
```
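The help text above maps naturally onto `argparse`. The hypothetical `build_parser` below reproduces the four flags and the `directory` argument. It is a sketch, not the project's actual parser. It also leaves out the optional settings-file argument, because the usage line does not show it.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options listed in the help output."""
    parser = argparse.ArgumentParser(
        prog="pubcrawl.py",
        description="py pub crawler, stumbles through a given directory "
                    "and stores metadata for every file it finds.",
    )
    parser.add_argument("directory", help="directory to start crawl")
    # Each flag is a simple on/off switch, so store_true fits.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="verbose output from crawler")
    parser.add_argument("-d", "--dump", action="store_true",
                        help="dumps and replaces existing dictionaries")
    parser.add_argument("-f", "--fake", action="store_true",
                        help="crawl only, nothing stored to DB")
    return parser
```

For example, `build_parser().parse_args(["/srv/files", "-v", "-d"])` sets `verbose` and `dump` to `True` and leaves `fake` as `False`.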

Crawl: verbose and dump (create new archives)

```
$ python pubcrawl.py /directory/where/crawl/will/start/ -v -d
pickle found
Replacing existing dictionaries.
Continue? (q = quit)

Searching... /directory/where/crawl/will/start/
+ new add: /directory/where/crawl/will/start/file1.ex
+ new add: /directory/where/crawl/will/start/file2.ex
+ new add: /directory/where/crawl/will/start/file3.ex

Added:   3 new files to list.
Removed: 0 files from list.
Updated: 0 of 3 files in list.
Total:   3 entries in list.
```
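The summary lines compare the previous archive with the fresh scan. Below is a sketch of that comparison. It assumes both are `{path: (size, mtime)}` dictionaries and that a file counts as updated when its metadata changed. `diff_archives` and `summary` are hypothetical helpers, and the rule pubcrawl actually uses for "Updated" may differ.

```python
def diff_archives(old, new):
    """Compare two {path: (size, mtime)} dictionaries.

    Returns (added, removed, updated) as sets of paths.
    """
    old_paths, new_paths = set(old), set(new)
    added = new_paths - old_paths
    removed = old_paths - new_paths
    # A path present in both scans is "updated" if its metadata differs.
    updated = {p for p in old_paths & new_paths if old[p] != new[p]}
    return added, removed, updated


def summary(old, new):
    """Format the counts in the same shape as the crawler's report."""
    added, removed, updated = diff_archives(old, new)
    return "\n".join([
        f"Added:   {len(added)} new files to list.",
        f"Removed: {len(removed)} files from list.",
        f"Updated: {len(updated)} of {len(new)} files in list.",
        f"Total:   {len(new)} entries in list.",
    ])
```

Calling `summary({}, new)` with three newly scanned files yields the same four counts as the dump run above: 3 added, 0 removed, 0 of 3 updated, 3 total.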

Crawl: verbose (use existing archives)

```
$ python pubcrawl.py /directory/where/crawl/will/start/ -v
pickle found
Loading files...
Loading extensions...
Using existing dictionary...

Continue? (q = quit)

Searching... /directory/where/crawl/will/start/
--- file already found ---
--- file already found ---
--- file already found ---

Added:   0 new files to list.
Removed: 0 files from list.
Updated: 0 of 3 files in list.
Total:   3 entries in list.
```
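Without `-d`, the crawler reuses the pickled dictionary and only reports files it already knows about. The sketch below shows that reuse-or-replace decision. The helpers `load_or_reset` and `record` are hypothetical, their messages are modeled on the verbose output above, and the interactive "Continue?" prompt is left out.

```python
import os
import pickle


def load_or_reset(pickle_path, dump=False, verbose=False):
    """Return the archive dictionary to crawl against.

    Reuses an existing pickle unless dump is True, in which case the
    existing dictionary is discarded and an empty one is returned.
    """
    if not os.path.exists(pickle_path):
        return {}
    if verbose:
        print("pickle found")
    if dump:
        if verbose:
            print("Replacing existing dictionaries.")
        return {}
    with open(pickle_path, "rb") as fh:
        archive = pickle.load(fh)
    if verbose:
        print("Using existing dictionary...")
    return archive


def record(archive, path, meta, verbose=False):
    """Add or refresh one entry, reporting whether it was already archived."""
    if verbose:
        if path in archive:
            print("--- file already found ---")
        else:
            print("+ new add: " + path)
    archive[path] = meta
```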

### Dependencies

- SQLAlchemy: object-relational mapper; the interface to the database.
- MySQL-python: MySQL driver; best installed with pip.
- PyYAML: a YAML parser and emitter for the Python programming language.
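The database side tracks the same changes as the pickled archive. The project uses SQLAlchemy with MySQL. To keep this sketch self-contained, it swaps in the standard-library `sqlite3` module instead. The `files` table and the `sync_to_db` function are hypothetical. With `-f/--fake`, a step like this would be skipped, since nothing is stored to the DB.

```python
import sqlite3


def sync_to_db(conn, added_or_updated, removed):
    """Mirror archive changes into a hypothetical files table.

    added_or_updated: {path: (size, mtime)}; removed: iterable of paths.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime REAL)"
    )
    # INSERT OR REPLACE (SQLite syntax) covers both new and changed files.
    conn.executemany(
        "INSERT OR REPLACE INTO files (path, size, mtime) VALUES (?, ?, ?)",
        [(p, size, mtime) for p, (size, mtime) in added_or_updated.items()],
    )
    conn.executemany("DELETE FROM files WHERE path = ?", [(p,) for p in removed])
    conn.commit()
```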

frankV/pythonpubcrawl

Folders and files

Latest commit

History

Repository files navigation

Python Pub Crawl

###Usage

###Dependencies

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages