Convert office docs with LibreOffice/OpenOffice via Python, Commandline, or HTTP (including XMLRPC).
This package provides tools such as WSGI apps, cache managers, and commandline converters to ease access to LibreOffice/OpenOffice installations for Python programmers. Besides basic conversion, it provides 'document processors' for fine-tuning the generated documents (mainly HTML).
Out of the box, these processors support extracting CSS from HTML conversions, removing LibreOffice-specific tags, zipping, unzipping, etc.
If the given processors are not enough for you, or you want some special handling of results (say, signing generated docs cryptographically or adding watermarks), you can define your own document processors in your own packages by using the Python entry-point API. ulif.openoffice integrates them automatically during document processing and makes them available in the webservices, commandline clients, and Python API.
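The idea of chaining processors can be pictured with a small, purely conceptual sketch. Every name and signature below is invented for illustration and is not the ulif.openoffice processor API; the real base class and entry-point registration are described in the package documentation.

```python
# Conceptual sketch only: these functions are hypothetical and do NOT
# reflect the real ulif.openoffice processor classes or signatures.


def strip_marker(text, metadata):
    """Hypothetical processor: remove a made-up vendor marker."""
    return text.replace("<!-- vendor -->", ""), metadata


def add_watermark(text, metadata):
    """Hypothetical processor: append a watermark comment."""
    metadata = dict(metadata, watermarked=True)
    return text + "\n<!-- watermark: draft -->", metadata


def run_pipeline(text, processors):
    """Feed the output of each processor into the next one."""
    metadata = {"error": False}
    for processor in processors:
        text, metadata = processor(text, metadata)
        if metadata.get("error"):
            break  # stop at the first processor that reports a failure
    return text, metadata


html, meta = run_pipeline(
    "<p>Hello</p><!-- vendor -->", [strip_marker, add_watermark])
print(html)
print(meta)
```

Each step receives the previous step's output plus a metadata dict, which is one simple way for a custom step (such as a watermarker) to slot in between existing ones.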
Note
ulif.openoffice relies on unoconv to do the actual conversions, so the unoconv script must be installed on your system.
ulif.openoffice sources are hosted on
https://github.com/ulif/ulif.openoffice
The complete documentation can be found at
https://ulif-openoffice.readthedocs.io/en/latest/
A .doc to .html conversion via the Python API can be done like this:
>>> from pprint import pprint
>>> from ulif.openoffice.client import Client
>>> client = Client()
>>> result = client.convert('document.doc')
>>> pprint(result)
('.../document.html.zip', None, {'error': False, 'oocp_status': 0})
By default, the generated document is cleaned-up HTML with separate stylesheets and images, all packed into a single .zip file.
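A result like this could be checked and unpacked with the standard library alone. The following is a minimal sketch, assuming the result is a 3-tuple shaped like the one shown above; the helper name `unpack_result` and the file names inside the stand-in archive are made up for the example.

```python
import os
import tempfile
import zipfile


def unpack_result(result, target_dir):
    """Check a conversion result and extract the zipped HTML bundle.

    `result` is assumed to be a 3-tuple shaped like the one shown
    above: (path, second_item, metadata).
    """
    path, _unused, metadata = result
    if metadata.get("error"):
        raise RuntimeError("conversion failed: {0!r}".format(metadata))
    with zipfile.ZipFile(path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())


# Self-contained demo: build a stand-in archive instead of calling the
# converter, so the snippet runs without LibreOffice installed.
# The member names below are assumptions, not actual converter output.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "document.html.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("document.html", "<html></html>")
    archive.writestr("document.css", "p { margin: 0 }")

fake_result = (zip_path, None, {"error": False, "oocp_status": 0})
names = unpack_result(fake_result, os.path.join(workdir, "out"))
print(names)
```

With a real conversion, `fake_result` would simply be replaced by the value returned from `client.convert(...)`.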
You can configure the document conversion via various options. This way you can set the output type (at least PDF, HTML, XHTML and TXT are supported), tell whether separate CSS stylesheets should be extracted, choose which PDF format should be generated (1.3 aka PDF/A or 1.4), and much more.
We also provide a handy commandline tool to perform conversions:
$ oooclient document.doc
RESULT in /tmp/.../document.html.zip
As you can see, the result is put in a freshly created directory.
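If the commandline client is driven from a script, the result path can be picked out of its output. This is a rough sketch, assuming the client prints a `RESULT in <path>` line on standard output exactly as in the example above; the helper names are hypothetical.

```python
import re
import subprocess


def parse_result_path(output):
    """Return the path from a 'RESULT in <path>' line, or None."""
    match = re.search(r"^RESULT in (.+)$", output, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def convert_with_cli(src_path):
    """Run `oooclient` on src_path and return the reported result path.

    Assumes the `oooclient` script is on PATH and writes the RESULT
    line to standard output.
    """
    output = subprocess.check_output(["oooclient", src_path])
    return parse_result_path(output.decode("utf-8", "replace"))


# The parser can be tried without any conversion taking place:
print(parse_result_path("RESULT in /tmp/example/document.html.zip\n"))
```

Keeping the parsing separate from the subprocess call makes the fragile part (the assumed output format) easy to adjust if the client prints something different.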
The commandline client also provides help to display all supported options, document processors, etc.:
$ oooclient --help
will give you the comprehensive list.
ulif.openoffice comes with two WSGI applications that provide document conversion services to web clients. One is a RESTful document conversion service, the other is a WSGI-based XMLRPC server. With one of these applications running, you can send office documents to a server and receive the converted documents in return.
All WSGI document converters support (optional) local caching, which stores conversion results and delivers them (bypassing a new conversion) if the same document was already converted before.
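The caching idea can be illustrated with a toy in-memory cache. This is only a sketch of the general technique (a key derived from document content plus options); the class and function names are invented, and the package's own cache manager may work quite differently.

```python
import hashlib


class ConversionCache(object):
    """Toy in-memory cache keyed by document content and options."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def make_key(data, options):
        """Hash the document bytes together with the sorted options."""
        digest = hashlib.sha256(data)
        for name in sorted(options):
            item = "{0}={1};".format(name, options[name])
            digest.update(item.encode("utf-8"))
        return digest.hexdigest()

    def get_or_convert(self, data, options, convert):
        """Return a cached result or run `convert` on a cache miss."""
        key = self.make_key(data, options)
        if key not in self._store:
            self._store[key] = convert(data, options)
        return self._store[key]


calls = []


def fake_convert(data, options):
    """Stand-in for a real conversion; records each invocation."""
    calls.append(options)
    return data.upper()


cache = ConversionCache()
first = cache.get_or_convert(b"hello", {"fmt": "html"}, fake_convert)
second = cache.get_or_convert(b"hello", {"fmt": "html"}, fake_convert)
third = cache.get_or_convert(b"hello", {"fmt": "pdf"}, fake_convert)
print(len(calls))  # number of real conversions performed
```

Including the options in the key matters: the same source document converted to a different output type must not be served from a stale cache entry.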
The package comes with prepared configuration files to set up and start such a web-based document converter in minutes.
See the extended docs under
https://ulif-openoffice.readthedocs.io/en/latest/
for details.
ulif.openoffice can be installed via pip:
$ pip install ulif.openoffice
Afterwards all commandline tools should be available.
It is recommended to set up the sources in a virtual environment:
$ virtualenv py27 # Python 2.6, 2.7 are supported
$ source py27/bin/activate
(py27) $
Get the sources:
(py27) $ git clone https://github.com/ulif/ulif.openoffice.git
(py27) $ cd ulif.openoffice
Install packages for testing:
(py27) $ python setup.py dev
It is recommended to start the oooctl daemon before running tests:
(py27) $ oooctl start
This makes LibreOffice listen in the background and reduces the runtime of the tests significantly.
Running tests:
(py27) $ py.test
We also support tox to run tests for all supported Python versions:
(py27) $ pip install tox
(py27) $ tox
Of course you must have the respective Python versions installed (currently: Python 2.6, 2.7).
Running coverage detector:
(py27) $ py.test --cov=ulif.openoffice # for cmdline results
(py27) $ py.test --cov=ulif.openoffice --cov-report=html
The latter will generate HTML coverage reports in a subdirectory.
Install packages for Sphinx-based documentation:
(py27) $ python setup.py docs
(py27) $ cd doc
(py27) $ make html
This will generate the documentation in a subdirectory.
ulif.openoffice is covered by the GPL version 2.
By Uli Fouquet (uli at gnufix dot de). Please do not hesitate to contact me with wishes, requests, suggestions, or other questions.