This repo was originally intended to host several interrelated tools, all for parsing, reporting on, and editing MS Word files via python/lxml. Currently only one standalone product is in use: 'rsuite_validate'. The 'validator_isbncheck' tool is also used, as part of the egalleymaker toolchain. For more about the legacy products originally served via this repo, see 'Legacy products' at the bottom of this README.
For information on using the 'containerized' version of this tool, refer instead to this readme: ./docker_rsv/README_docker.md
- Python is required; versions 2.7.x and 3.9.x are supported.
- Three Python libraries are required; install them like so:
pip install lxml requests six
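To confirm the environment before running anything, a small check script can help. This is a sketch and not part of the repo: the helper name `missing_modules` is hypothetical, and it only verifies that each library imports, not which version is installed. It avoids Python 3-only syntax so it should run on both supported versions.

```python
import importlib
import sys

# The three libraries named in the install command above.
REQUIRED = ["lxml", "requests", "six"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python %d.%d" % sys.version_info[:2])
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing: %s (try: pip install %s)"
              % (", ".join(missing), " ".join(missing)))
    else:
        print("All required libraries are importable.")
```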
External (Macmillan) git repo RSuite_Word-template is added here as a submodule. It's checked out at a release tag, currently: v6.5.2
To initialize and update the submodule the first time after cloning or pulling the sectionstart_converter repo, run:
git submodule update --init --recursive
To update the submodule when pulling or switching branches (as needed), run:
git submodule update RSuite_Word-template
To peg the submodule HEAD to a new tag, first update it with the above command. Then cd into the submodule dir, checkout the new tag, and commit your changes.
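The pegging steps above could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the final `git add` plus `git commit` in the parent repo is one reading of "commit your changes". The new tag may also need to be fetched inside the submodule first if it is not yet available locally.

```python
import subprocess

SUBMODULE = "RSuite_Word-template"


def peg_submodule_steps(tag, submodule=SUBMODULE):
    """Return (cwd, argv) pairs, in order, for pegging the submodule to `tag`.

    Each cwd is relative to the root of the sectionstart_converter repo.
    """
    return [
        (".", ["git", "submodule", "update", submodule]),
        (submodule, ["git", "checkout", tag]),
        # Assumption: record the new submodule commit in the parent repo.
        (".", ["git", "add", submodule]),
        (".", ["git", "commit", "-m", "Peg %s to %s" % (submodule, tag)]),
    ]


def run_steps(steps):
    """Run each step in its directory, stopping at the first failure."""
    for cwd, argv in steps:
        subprocess.check_call(argv, cwd=cwd)
```

Calling `run_steps(peg_submodule_steps(new_tag))` from the repo root would execute the four commands in order. Printing the steps first is a cheap way to review them before anything is committed.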
This tool accepts Word manuscripts, validates them against a number of criteria, makes small edits not related to content or large errors, and returns a report and the edited document to the user, both in an outfolder and via email. Internal documentation is available here.
Dependencies for tests, local runs:
- Supplemental Python libraries are required; install them via pip like so:
pip install requests six
Additional dependencies for production or staging environment (unless running via Docker):
- git-repo: 'bookmaker_connectors' must be cloned locally as a sibling directory to this repo ('sectionstart_converter').
- git-repo: 'bookmaker_authkeys' must be cloned locally as a sibling directory to this repo ('sectionstart_converter'). This repo is private and will also require decryption.
To run this tool directly in the cmd line:
python /path/to/rsuitevalidate_main.py '/path/to/file.docx' 'direct' 'local'
- Running with the 'local' parameter above skips sending notification emails, skips posting final files to the OUTfolder via api, and preserves the tmpfolder contents for troubleshooting (working tmpfiles and dirs will be created in the same directory as the input .docx file).
- You can change loglevel from INFO to DEBUG etc. in xml_docx_stylechecks/cfg.py
- To run rsuite_validate with emails and api, the call looks like this instead:
python /path/to/rsuitevalidate_main.py /path/to/file.docx 'direct' 'user.email@domain.com' 'User Name'
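When calling the tool from another script, the two invocation forms above can be assembled as an argument list. This is a minimal sketch: `build_rsuite_validate_cmd` is a hypothetical helper, not part of the repo, and it only mirrors the argument order shown in the two commands above.

```python
def build_rsuite_validate_cmd(script_path, docx_path,
                              user_email=None, user_name=None):
    """Build the argv list for rsuitevalidate_main.py.

    With no email and no name, the 'local' form is produced (no emails,
    no api posting). With both, the email/api form is produced.
    """
    cmd = ["python", script_path, docx_path, "direct"]
    if user_email is None and user_name is None:
        cmd.append("local")
    elif user_email and user_name:
        cmd.extend([user_email, user_name])
    else:
        raise ValueError("user_email and user_name must be given together")
    return cmd
```

Passing the resulting list to `subprocess.call` avoids shell quoting problems with paths or user names that contain spaces.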
Unit and integration tests for rsuite_validate are documented in ./test/README_tests.txt
This tool is run as part of the egalleymaker process, to capture styled ISBNs, and to style and capture unstyled ISBNs where needed. It logs them to a JSON file where the rest of the egalleymaker process can use them.
This command takes two args: the manuscript to be edited and the existing logfile in use by bookmaker_validator, so we can append to it instead of writing our own.
python /path/to/validator_isbncheck.py /path/to/file.docx /path/to/existing/logfile.txt
The command takes two args: the .docx file and the output dir for the root of the unzipped docx:
python /path/to/unzipDOCX.py /path/to/file.docx /path/to/target/dir
This command takes two args: the root (parent folder) of the unzipped files, and path and name of the output .docx:
python /path/to/zipDOCX.py /path/to/unzip_root /path/to/new/file.docx
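For context, a .docx file is a zip archive of XML parts, so both operations can be approximated with the standard `zipfile` module. The sketch below illustrates the idea only; it is not the code in unzipDOCX.py or zipDOCX.py, and the function names are hypothetical.

```python
import os
import zipfile


def unzip_docx(docx_path, target_dir):
    """Extract every part of the .docx archive under target_dir."""
    with zipfile.ZipFile(docx_path) as zf:
        zf.extractall(target_dir)


def zip_docx(unzip_root, out_docx_path):
    """Zip the files under unzip_root into a new .docx.

    Archive names are stored relative to unzip_root, so the root folder
    itself does not appear inside the archive.
    """
    with zipfile.ZipFile(out_docx_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder, _dirs, files in os.walk(unzip_root):
            for name in files:
                full_path = os.path.join(folder, name)
                arcname = os.path.relpath(full_path, unzip_root)
                zf.write(full_path, arcname)
```

In this sketch the output .docx should be written outside `unzip_root`; otherwise the walk could pick up the partially written archive.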
This repo was initially intended to serve the following items as well. All are retired for now, but have not yet been refactored out of the code:
- xml_docx_stylechecks/converter_main.py - This tool is to update Microsoft Word documents that were styled with Macmillan styles prior to the release of our new Section Start styling. The document.xml file will be directly edited using python/lxml, in order to add and update Section Start styles to conform with updated bookmaker and egalleymaker toolchains.
- xml_docx_stylechecks/reporter_main.py - This tool is to run functions formerly handled in our VBA Stylecheck macro(s). It will output a 'Style Report', both as a txt file, and send an email to the submitter. The original manuscript is not edited.
- xml_docx_stylechecks/validator_main.py - This tool is to prepare a manuscript for egalley creation, as part of the bookmaker_validator toolchain; it fixes errors found in the 'Stylecheck' plus some other unique ones.