This project extracts the contents of PowerPoint .pptx files. It is built on ANTLR 4 (ANother Tool for Language Recognition); an ANTLR 3 branch is also available.
- This version does not preserve text formatting or slide layouts.
- This version ignores shapes drawn with PowerPoint (that's a complex little drawing language) and might not catch all pictures.
- The output is HTML formatted for an S6 slideshow.
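
To make the kind of extraction described above concrete, here is a minimal sketch. It is not this project's ANTLR-based implementation: it assumes only that a .pptx file is a ZIP archive whose slides live under ppt/slides/ and keep their text runs in DrawingML `<a:t>` elements, and it uses the JDK's built-in ZIP and DOM classes instead of ANTLR. The class name `PptxTextSketch` and the method `extractText` are hypothetical, and the sketch assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not part of this project's code.
public class PptxTextSketch {
    // DrawingML namespace; slide text runs are stored in <a:t> elements.
    static final String DRAWINGML_NS =
            "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Returns the text runs of every slide part, in archive order
    // (which is not necessarily the presentation's slide order).
    public static List<String> extractText(InputStream pptx) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> runs = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Skip layouts, masters, media and everything else.
                if (!entry.getName().matches("ppt/slides/slide\\d+\\.xml")) {
                    continue;
                }
                // Copy the entry into memory so the XML parser cannot
                // close the underlying ZIP stream.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                Document doc = builder.parse(
                        new ByteArrayInputStream(buffer.toByteArray()));
                NodeList nodes = doc.getElementsByTagNameNS(DRAWINGML_NS, "t");
                for (int i = 0; i < nodes.getLength(); i++) {
                    runs.add(nodes.item(i).getTextContent());
                }
            }
        }
        return runs;
    }
}
```

Like the limitations listed above, this sketch drops formatting, layout, shapes, and pictures; it only collects plain text runs.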
Install Maven and JDK 6 or later, then build using the standard Maven lifecycle phases (clean, compile, test, package), for example with mvn clean package.
- Other output templates (e.g. Markdown, Textile)
- Capture inline formatting
- Capture more of the layout options (titles, headers/footers, text block positioning, picture positioning)