A set of EPUB test files, created to test an automated procedure that validates EPUBs against KB institutional policies. These policies require the following:
- Files must be valid EPUB (version 2 or 3)
- Files must not contain DRM or encryption (edge case: font obfuscation, also known as font mangling, which should be permitted)
- All resources in the container must fall within the Core Media Types
- No Digital Talking Book (DTB) content documents
As a result, most of the files in this repo deliberately violate one or more of the above requirements.
Some of the files were newly created (with a little help from Sigil), whereas others were taken or adapted from other openly licensed data sets.
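The encryption requirement and its font-obfuscation edge case could be checked roughly as follows. This is a minimal Python sketch, not part of this repo: the function names are hypothetical, and it assumes that an EPUB signals encryption through `META-INF/encryption.xml` and that font obfuscation is identified by the IDPF or Adobe algorithm URI in an `EncryptionMethod` element.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the W3C XML Encryption elements used inside encryption.xml.
ENC_NS = "{http://www.w3.org/2001/04/xmlenc#}"

# Assumption: these two algorithm URIs denote font obfuscation rather than DRM.
FONT_OBFUSCATION_ALGORITHMS = {
    "http://www.idpf.org/2008/embedding",  # IDPF font obfuscation
    "http://ns.adobe.com/pdf/enc#RC",      # Adobe font obfuscation
}


def encryption_algorithms(encryption_xml: bytes) -> set:
    """Return the set of Algorithm URIs declared in an encryption.xml document."""
    root = ET.fromstring(encryption_xml)
    return {m.get("Algorithm", "") for m in root.iter(ENC_NS + "EncryptionMethod")}


def violates_encryption_policy(epub_path) -> bool:
    """True if the EPUB declares encryption other than font obfuscation.

    Note: an encryption.xml without any EncryptionMethod element is not
    flagged by this sketch.
    """
    with zipfile.ZipFile(epub_path) as zf:
        if "META-INF/encryption.xml" not in zf.namelist():
            return False
        algorithms = encryption_algorithms(zf.read("META-INF/encryption.xml"))
    return bool(algorithms - FONT_OBFUSCATION_ALGORITHMS)
```

The check only looks at what the container declares; it does not test whether a resource is actually encrypted, so a file such as epub20_minimal_encryption.epub would still be flagged.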
- content - uncompressed contents of each test file (each subdirectory represents one epub)
- build - actual epub builds
- epubcheckout - epubcheck output
- pubresources - various resources (files) that were used for creating the epubs.
The script build.sh iterates over all subdirectories in the content folder and compresses the contents of each into a functional EPUB file in the build directory.
For an explanation of how the build process works, see here.
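The general idea of such a build can be sketched in Python. This is an illustration under stated assumptions, not a transcription of build.sh: the function names are hypothetical, and naming each output after its subdirectory with an `.epub` suffix is assumed. The one EPUB-specific detail is that the `mimetype` file has to be the first entry in the ZIP container and must be stored uncompressed.

```python
import zipfile
from pathlib import Path


def build_epub(src_dir: Path, epub_path: Path) -> None:
    """Zip the contents of src_dir into an EPUB container at epub_path."""
    with zipfile.ZipFile(epub_path, "w") as zf:
        # The mimetype file goes in first and is stored without compression.
        zf.write(src_dir / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        # Every other file is added with Deflate compression.
        for path in sorted(src_dir.rglob("*")):
            arcname = path.relative_to(src_dir).as_posix()
            if path.is_file() and arcname != "mimetype":
                zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)


def build_all(content_dir: Path, build_dir: Path) -> list:
    """Build one EPUB per subdirectory of content_dir; return the output paths."""
    build_dir.mkdir(parents=True, exist_ok=True)
    built = []
    for sub in sorted(p for p in content_dir.iterdir() if p.is_dir()):
        target = build_dir / (sub.name + ".epub")  # assumed naming scheme
        build_epub(sub, target)
        built.append(target)
    return built
```

Calling `build_all(Path("content"), Path("build"))` from the repo root would then mirror the directory layout described above.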
The script analyse.sh validates all EPUBs in the build directory with Epubcheck, using both the stable 3.0 version and version 4.0.1. You need to install both versions yourself, then update the file paths in epubcheck3Jar and epubcheck4Jar at the top of the script.
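A comparable validation loop might look like the Python sketch below. It is not the contents of analyse.sh: the function names and the output file naming are hypothetical, and it assumes only the basic `java -jar <epubcheck jar> <file>` invocation, capturing whatever Epubcheck prints instead of relying on any report options.

```python
import subprocess
from pathlib import Path


def epubcheck_command(jar: Path, epub: Path) -> list:
    """Command line for validating one EPUB with one Epubcheck jar."""
    return ["java", "-jar", str(jar), str(epub)]


def analyse(build_dir: Path, out_dir: Path, jars: dict) -> None:
    """Run every Epubcheck version in jars over every EPUB in build_dir.

    jars maps a label (for example "3" or "4") to the path of the jar.
    Output lands in <out_dir>/<epub name>_<label>.txt (assumed naming).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    for epub in sorted(build_dir.glob("*.epub")):
        for label, jar in jars.items():
            result = subprocess.run(
                epubcheck_command(jar, epub), capture_output=True, text=True
            )
            report = out_dir / (epub.stem + "_" + label + ".txt")
            # Keep both streams, since messages may appear on either one.
            report.write_text(result.stdout + result.stderr)
```

The `jars` dictionary plays the role of the two path variables at the top of the script, for example `{"3": Path("/path/to/epubcheck3.jar"), "4": Path("/path/to/epubcheck4.jar")}` with placeholder paths.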
File name | Epub version | Description | Epubcheck (3,4) output |
---|---|---|---|
epub20_minimal.epub | 2 | Basic file with one text resource and one image | 3,4 |
epub20_minimal_encryption.epub | 2 | Includes encryption.xml resource in META-INF, indicating that main text resource is encrypted (text resource is not actually encrypted, BTW) | 3,4 |
epub30_font_obfuscation.epub | 3 | Includes fonts that are obfuscated (which results in hasEncryption in epubcheck). Taken from EPUB 3 Sample Documents (wasteland with OTF fonts, obfuscated). | 3,4 |
epub20_foreign_resource_no_fallback.epub | 2 | Includes JP2 image, which is a format that is not on the list of Core Media Types; no fallback defined | 3,4 |
epub20_foreign_resource_with_fallback.epub | 2 | Includes JP2 image, which is a format that is not on the list of Core Media Types; fallback defined in manifest, identifier in content document | 3,4 |
epub20_foreign_resource_with_fallback_noID.epub | 2 | Includes JP2 image, which is a format that is not on the list of Core Media Types; fallback defined in manifest, no identifier in content document | 3,4 |
epub20_dtbook.epub | 2 | Includes Digital Talking Book content. Adapted from threepress, published under BSD 3 license. | 3,4 |
epub20_xpgt.epub | 2 | Includes style definitions as Adobe Page Template. | 3,4 |
epub20_missingfontresource.epub | 2 | CSS stylesheet contains reference to font resource that is not part of the package. | 3,4 |
epub20__invalid_entity | 2 | HTML contains illegal control character | 3,4 |
epub20_encryption_binary_content | 2 | Encrypted resource with illegal named entity | 3,4 |
epub20_crazy_fixed_layout | 2 | Uses CSS to place each line at fixed position on the page. This results in all sorts of problems after resizing the page and/or font. Valid (but dumb) EPUB. | 3,4 |
epub20_crazy_columns | 2 | Uses style tags to define columns. Another valid (but dumb) EPUB. | 3,4 |
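
The foreign-resource cases in the table could be detected from the package manifest along these lines. This Python sketch is an assumption-laden illustration: the function name is hypothetical, and `ALLOWED_MEDIA_TYPES` is only an illustrative subset, not the complete Core Media Types list of either EPUB version.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"

# Illustrative subset only; replace with the full list for the EPUB version at hand.
ALLOWED_MEDIA_TYPES = {
    "application/xhtml+xml",
    "application/x-dtbncx+xml",
    "text/css",
    "image/gif",
    "image/jpeg",
    "image/png",
    "image/svg+xml",
}


def foreign_resources(opf_xml: bytes, allowed=ALLOWED_MEDIA_TYPES) -> list:
    """List manifest items whose media type is not in allowed.

    Each entry is (href, media_type, has_fallback), where has_fallback
    reports whether the item carries a fallback attribute.
    """
    root = ET.fromstring(opf_xml)
    findings = []
    for item in root.iter(OPF_NS + "item"):
        media_type = item.get("media-type", "")
        if media_type not in allowed:
            has_fallback = item.get("fallback") is not None
            findings.append((item.get("href"), media_type, has_fallback))
    return findings
```

Whether a resource with a fallback is acceptable is a policy decision, so the sketch reports the fallback flag instead of filtering on it.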
- Add uncompressed directory structure to content folder
- Run build.sh to update the builds
- Add descriptive entry to table above
All files here are released under the Creative Commons 3.0 BY-SA license, unless stated otherwise.