Test framework for sphinx-simplepdf #83
Comments
I agree, a test framework would be great.
A quick search didn't turn up any promising solution for this. @ubmarco: as the PDF miner expert, do you have an idea how this could be achieved? |
One solution might be a pixel-by-pixel comparison against a golden sample that was checked manually once. There is a question on PyMuPDF that discusses this.
Technical concept (idea)
A test case contains:
Pytest fixtures to:
A helper function like
So in the end, each test case defines its own little project and therefore its own PDF. |
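To make the "each test case defines its own little project" idea a bit more concrete, here is a minimal sketch of such a helper. It is not taken from this repo: the names `write_project`, `build_command` and `build_pdf` are hypothetical, and it assumes that the extension is loaded as `sphinx_simplepdf`, that the builder is called `simplepdf`, and that the resulting PDF lands directly in the output directory.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: the extension module is "sphinx_simplepdf".
CONF_PY = 'project = "simplepdf-test"\nextensions = ["sphinx_simplepdf"]\n'


def write_project(root: Path, index_rst: str) -> Path:
    """Create a minimal Sphinx project below root and return its source dir."""
    src = root / "source"
    src.mkdir(parents=True, exist_ok=True)
    (src / "conf.py").write_text(CONF_PY, encoding="utf-8")
    (src / "index.rst").write_text(index_rst, encoding="utf-8")
    return src


def build_command(src: Path, out: Path) -> list:
    """Return the sphinx-build call (assumption: builder name "simplepdf")."""
    return [sys.executable, "-m", "sphinx", "-b", "simplepdf", str(src), str(out)]


def build_pdf(root: Path, index_rst: str) -> Path:
    """Build the project and return the first PDF found in the output dir."""
    src = write_project(root, index_rst)
    out = root / "build"
    subprocess.run(build_command(src, out), check=True)
    pdfs = sorted(out.glob("*.pdf"))
    if not pdfs:
        raise FileNotFoundError(f"no PDF produced in {out}")
    return pdfs[0]


# Possible use in a pytest test, with pytest's built-in tmp_path fixture:
def test_single_heading(tmp_path):
    pdf = build_pdf(tmp_path, "Title\n=====\n\nHello PDF\n")
    assert pdf.stat().st_size > 0
```

Since every test gets its own `tmp_path`, the projects stay isolated from each other, and a failing test points at exactly one small document.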
I think we should do both. Read back a PDF into a text representation; we could check
We could use libpdf for this (a pdfplumber and pdfminer wrapper). This kind of test directly targets where things went wrong. It can also detect whether tables wrapped. Keep in mind that PDFs have no understanding of words, sentences or tables. They just know letters, letter orientation, font and color. Tables are made of lines. Then we'll also need an image comparison to be sure the overall layout is still valid, colors match, and to test theme updates. Getting all the required programs (e.g. Pillow) installed on the GitHub runner that executes the tests might be a problem. |
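For the image comparison part, the core check could be as small as the sketch below. It deliberately uses only the standard library, given the concern about installing image tooling on the CI node. It assumes the golden page and the new page were already rendered to raw pixel buffers of the same size (for example, PyMuPDF pixmaps expose their bytes as `samples`); the rendering step is not shown, and `changed_pixel_ratio` is a hypothetical name.

```python
def changed_pixel_ratio(golden: bytes, candidate: bytes,
                        channels: int = 3, tolerance: int = 0) -> float:
    """Return the fraction of pixels that differ between two raw images.

    Both buffers must hold the same number of pixels with `channels` bytes
    per pixel (3 for RGB). A pixel counts as changed if any channel differs
    by more than `tolerance`.
    """
    if channels <= 0 or len(golden) % channels:
        raise ValueError("buffer length does not match the channel count")
    if len(golden) != len(candidate):
        raise ValueError("images differ in size, cannot compare pixel by pixel")
    pixels = len(golden) // channels
    if pixels == 0:
        return 0.0
    changed = 0
    for start in range(0, len(golden), channels):
        for offset in range(channels):
            if abs(golden[start + offset] - candidate[start + offset]) > tolerance:
                changed += 1
                break  # one differing channel is enough for this pixel
    return changed / pixels


# Two RGB pixels; the second one differs slightly in its green channel.
golden = bytes([0, 0, 0, 255, 255, 255])
candidate = bytes([0, 0, 0, 255, 250, 255])
print(changed_pixel_ratio(golden, candidate))
print(changed_pixel_ratio(golden, candidate, tolerance=5))
```

A small tolerance and a threshold on the ratio (instead of demanding identical bytes) would probably be needed in practice, because font rendering and anti-aliasing can differ between machines.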
The text solution would handle most of the test cases I have in mind. Maybe this handling could be used not only for sphinx-simplepdf's internal tests, but also for tests of the real documents produced during a build. A PDF test (one PDF per test) is also OK, but I am not sure if this is
Here is the question: the tests should not only run against different Sphinx versions, they should maybe also run against different WeasyPrint versions. This might also be tricky to handle |
The last point can easily be done with matrix tests, which are supported by GitHub Actions. One PDF per test has the advantage that the tests are isolated from each other and therefore normally easier to maintain. |
I have to start on a test framework for the PDFs generated by simplepdf in my current project. I saw that libpdf is a repository in your organisation ( https://github.com/useblocks/libpdf ). So I assume work on a test framework could start with this, as there is currently no other solution available? |
I think so, yes. It may be the easiest solution, as all other PDF libraries are more low-level. |
Integrating libpdf does not seem to be so easy in an environment with sphinx-simplepdf and WeasyPrint, due to Pillow dependencies. libpdf seems to have a (maybe outdated) dependency on an exact Pillow version, which conflicts with the WeasyPrint dependency. There seems to be a branch in libpdf that fixes this, but it is not merged into the main branch. |
And there seems to be a typo in the pyproject.toml in this branch |
After some hacking to get it running, it only runs with no_annotations and then gets stuck internally. Hacking steps:
It then fails internally:
|
After some "zero knowledge based hacking" on the libpdf source code I was able to extract some content. This helps me get further with my efforts for the "pdf" check. May the force be with you, if you want to integrate it 😄 |
I forked libpdf and applied the fixes in https://github.com/procitec/libpdf/tree/upgrade. |
With the PR in libpdf I am able to parse and test the PDF, e.g. chapters, headings, page numbering etc. Open questions currently:
|
I just released a new version 0.1.0 of libpdf. It now has a new element called |
Currently I think it's enough for testing. See useblocks/libpdf#36 for the integration of tests in libpdf and a PDF generated by sphinx-simplepdf/WeasyPrint. I would expect a member of useblocks to create the test framework, maybe with poetry/nox like the other repositories. libpdf is a little bit different here, so I do not know which Python test framework useblocks currently prefers |
You're right, we need to set up testing for this repo. I vote for simple tox and pytest, just like for libpdf. |
Is there a chance to implement a basic test framework? I don't know if I should or could take one over from the other useblocks repositories "as is".
The PDF output could be tested with one of the Python pdftotext modules available on PyPI, e.g. to count pages, or to get the text of individual pages and check whether some expected text appears.
Implementing a "basic" test would be good; I feel motivated to add more tests 😀
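As a rough illustration of the page-count and expected-text checks described above, here is a minimal sketch. It uses pdfminer.six for the extraction (one possible choice, imported lazily so the pure checks work without it); `extract_page_texts`, `normalize` and `find_missing` are hypothetical helper names, not part of any existing package.

```python
import re


def extract_page_texts(pdf_path: str) -> list:
    """Return one text string per page (assumes pdfminer.six is installed)."""
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    pages = []
    for layout in extract_pages(pdf_path):
        parts = [el.get_text() for el in layout if isinstance(el, LTTextContainer)]
        pages.append("".join(parts))
    return pages


def normalize(text: str) -> str:
    """Collapse whitespace, since PDF text extraction breaks lines arbitrarily."""
    return re.sub(r"\s+", " ", text).strip()


def find_missing(page_texts: list, expected: dict) -> list:
    """Return (page_no, snippet) pairs not found on the given 1-based pages."""
    missing = []
    for page_no, snippets in sorted(expected.items()):
        in_range = 1 <= page_no <= len(page_texts)
        haystack = normalize(page_texts[page_no - 1]) if in_range else ""
        for snippet in snippets:
            if normalize(snippet) not in haystack:
                missing.append((page_no, snippet))
    return missing


# Example with already extracted text; in a real test this would come from
# extract_page_texts("path/to/output.pdf").
pages = ["Title\n\nHello   world\n", "Chapter 2\nMore text"]
assert len(pages) == 2  # page count check
assert find_missing(pages, {1: ["Hello world"], 2: ["Chapter 2"]}) == []
```

Returning the list of missing snippets (rather than a plain True/False) gives a more useful failure message when an assertion on it fails in pytest.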