Skip to content
/ ddmp Public

Digital Database of Microbial Phenotypes. Like an online Bergey's Manual.

Notifications You must be signed in to change notification settings

rec3141/ddmp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This database is composed of phenotypic information on Bacteria and Archaea, as listed in the primary literature and Bergey's Manual of Systematic Bacteriology.
Contributors to date: R. Eric Collins

FILES:
volume2-tables.txt: Bergey's Manual Volume 2, The Proteobacteria
volume3-tables.txt: Bergey's Manual Volume 3, The Firmicutes
volume4-tables.txt: Bergey's Manual Volume 4, The Bacteroidetes, Spirochaetes, Tenericutes (Mollicutes), Acidobacteria, Fibrobacteres, Fusobacteria, Dictyoglomi, Gemmatimonadetes, Lentisphaerae, Verrucomicrobia, Chlamydiae, and Planctomycetes

NOTES:
Volume 1, the Archaea and Deeply-Branching Bacteria, is not available in digital format
Volume 5, the Actinobacteria, will be released in early 2012

PIPELINE:
 - get text out of PDFs
 -- pdftotext -layout -nopgbrk -enc UTF-8 <file>

 - get tables out of text
 -- grep -h -A 100 -B 1 -e 'TABLE' <file>

 - decode messed up Unicode in Volume 2
 -- unicode2not.pl <file>

 - fix whitespace
 -- tabit.pl <file>

 - fix formatting
 -- manually in Kate using Block Selection mode
 -- examples include cleaning up 1-column tables, multi-page tables
 -- multi-line captions were replaced using regex:
 --- Find: (TABLE.*)\n([a-z]{2,}.*)\s*
 --- Replace: \1 \2

About

Digital Database of Microbial Phenotypes. Like an online Bergey's Manual.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published