Tim Colson edited this page Feb 12, 2021 · 4 revisions

Documentation and notes that don't need to be in the source directly will go into the wiki. 😄

Activity Log

Notes about progress, lessons learned, links to references.

2021-02-10 Get started!

TimC - Created a simple 3-slide PowerPoint for source material. Set up a basic Java project in IntelliJ IDEA and hacked up a Reader class to read the source1.pptx file, using code snippets from Baeldung (#3) to read and extract data. After that success, extended the code to modify slides and write out a revised presentation, using code snippets from the Tutorials Point reference (#2) with some changes due to API updates in POI 5.0.0 versus 3.x. (Note: the class is now ill-named as "Reader", but this is just temporary PoC code.)

I was able to list all slide layouts, modify the slide order (move #3 -> #2), add a slide (with a layout), and extract all Picture names (with byte[] data, discarded for now). I then tried a more complex example slide deck (45MB) and noted that attached audio files (.mp3 and .wav) were also listed in the "Pictures" list.
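The mixed "Pictures" list can be cross-checked without POI by looking inside the .pptx package itself, which is a zip archive. The sketch below is not the Java PoC code. It is a standard-library Python stand-in that assumes media files sit under ppt/media/ (where PowerPoint typically puts them). The function name list_media_parts and the IMAGE_EXTENSIONS set are made up for illustration.

```python
import zipfile
from pathlib import PurePosixPath

# Extensions treated as images in this sketch (an assumption, not POI's own list).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".emf", ".wmf", ".bmp"}


def list_media_parts(pptx_file):
    """Return (file name, kind) pairs for every part under ppt/media/ in a .pptx archive."""
    results = []
    with zipfile.ZipFile(pptx_file) as archive:
        for part in sorted(archive.namelist()):
            # Skip everything outside the media folder, and directory entries.
            if not part.startswith("ppt/media/") or part.endswith("/"):
                continue
            path = PurePosixPath(part)
            kind = "image" if path.suffix.lower() in IMAGE_EXTENSIONS else "other"
            results.append((path.name, kind))
    return results
```

Calling list_media_parts("source1.pptx") returns pairs such as ("image1.png", "image"). Audio attachments come back tagged "other", since they live in the same folder as the images, which may be related to why they show up together in the POI list.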

2021-02-11 : Plan for today:

  1. Extract a pre-determined list of fields
  2. Dive deeper into the slide "name" field -- the one that is not easily accessible from the PPT GUI (see python-pptx #671)
    • Updated the source deck XML manually to include a name element with UUID; then re-zipped
    • Embedded name UUID <p:cSld name="cSld-name-9C3C9787-305F-4251-9A7D-2B0E5B0DC7E6">
    • Successfully read in this source, manipulated it, and then read in the resulting "target1.pptx"
    • Takeaway: UUID name moved WITH the slide as predicted (into position 2)
Read in a PPTX file: ./target1.pptx
Reading: ./target1.pptx
slide = null; name=Slide1
slide = Originally Slide #3; name=cSld-name-9C3C9787-305F-4251-9A7D-2B0E5B0DC7E6
slide = Slide2; name=Slide3
slide = Click to edit Master title style; name=Slide4
picture name: image1.png
picture format: PNG
picture name: image2.svg
picture format: SVG
picture name: image3.png
picture format: PNG
picture name: image4.jpg
picture format: JPEG
picture name: media1.mp3
picture format: null
layout = Name: /ppt/slideLayouts/slideLayout1.xml - Content Type: application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml
layout = Name: /ppt/slideLayouts/slideLayout10.xml - Content Type: application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml
layout = Name: /ppt/slideLayouts/slideLayout11.xml - Content Type: application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml
... inclusive 2-9 ...
Wrote to target2.pptx!
  3. Experiment with python-pptx, especially around the name field.
  • Set up a Python virtualenv (pyenv virtualenv pyppt), updated pip, installed python-pptx.
  • Ran sample script successfully to create a basic PPTX file from code.
  • TIL that VS Code has a built-in Jupyter notebook with #%% denoting "cells" that can be run individually.
  • Opened source1.pptx and added a unique name to a slide: ppt.slides[1].name="TC-NewID" (note: slides is zero-indexed, so slides[1] is the second slide)
  • After unzipping the resulting source1-pyout.pptx file, saw <p:cSld name="TC-NewID"> in the slide XML! Yay!
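The "unzip and look at the XML" check used in both experiments above can be scripted. This is a minimal sketch using only the Python standard library, not python-pptx or POI. It assumes slides are stored as ppt/slides/slideN.xml and that the name lives on the p:cSld element, as seen in the XML above. The function name read_slide_names is made up for illustration.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# PresentationML main namespace, which the p: prefix maps to in slide XML.
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
SLIDE_PART = re.compile(r"ppt/slides/slide\d+\.xml")


def read_slide_names(pptx_file):
    """Map each slide part to the name attribute of its <p:cSld> element ('' if unset)."""
    names = {}
    with zipfile.ZipFile(pptx_file) as archive:
        for part in archive.namelist():
            if not SLIDE_PART.fullmatch(part):
                continue
            root = ET.fromstring(archive.read(part))
            c_sld = root.find(f"{{{P_NS}}}cSld")
            names[part] = c_sld.get("name", "") if c_sld is not None else ""
    return names
```

Running read_slide_names("source1-pyout.pptx") gives a dict keyed by part name. Part file names do not necessarily reflect the display order of slides, so a name that travels with a moved slide stays under the same part key.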

Resources

  1. Apache POI HowTo for Shapes
  2. TutorialsPoint : Apache POI PPT - Quick Guide
    • Dated 2017, but only a few methods so far seem to have changed.
  3. Baeldung : Creating a MS PowerPoint Presentation in Java
    • At the time of writing, POI 5.0.0 is current; the Baeldung example used 3.17.
    • Learned that the main classes for "modern" PowerPoint files (2007+ OOXML format, .pptx extension) are XMLSlideShow, XSLFSlide and XSLFTextShape.