- Replace `PdfFileReader` with `PdfReader` and pin PyPDF to `>=3.0.0`. #307 by Martin Thoma.
- Change extra requirements from `cv` to `base`. You can use `pip install "camelot-py[base]"` to install everything required to run camelot.
Improvements
- Add support for multiple image conversion backends. #198 and #253 by Vinayak Mehta.
- Add markdown export format. #222 by Lucas Cimon.
Documentation
- Add FAQ section. #216 by Stefano Fiorucci.
Bugfixes
- Fix use of resolution argument to generate image with ghostscript. #231 by Tiago Samaha Cordeiro.
- #15 Fix duplicate strings being assigned to the same cell. #206 by Eduardo Gonzalez Lopez de Murillas.
- Save plot when filename is specified. #121 by Jens Diemer.
- Close file streams explicitly. #202 by Martin Abente Lahaye.
- Use correct re.sub signature. #186 by pevisscher.
- #183 Fix UnicodeEncodeError when using Stream flavor by adding encoding kwarg to `to_html`. #188 by Stefano Fiorucci.
- #179 Fix `max() arg is an empty sequence` error on PDFs with blank pages. #189 by Vinayak Mehta.
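
The `re.sub` entry above does not say what was wrong with the call. One common mistake, shown here only as an assumption about the kind of bug involved, is passing a flag as the fourth positional argument, which `re.sub` interprets as `count`. The sample string is made up for illustration.

```python
import re
import warnings

text = "aAaAaA"

# Wrong: the fourth positional parameter of re.sub is `count`, not `flags`.
# re.IGNORECASE has the integer value 2, so this runs a case-sensitive
# substitution that stops after two replacements.
with warnings.catch_warnings():
    # Newer Python versions warn when `count` is passed positionally.
    warnings.simplefilter("ignore")
    wrong = re.sub("a", "-", text, re.IGNORECASE)

# Right: pass the flag by keyword so it is treated as a flag.
right = re.sub("a", "-", text, flags=re.IGNORECASE)

print(wrong)
print(right)
```

Passing `flags=` by keyword avoids the problem regardless of which flag is used.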
Improvements
- Add `line_overlap` and `boxes_flow` to `LAParams`. #219 by Arnie97.
- Add bug report template.
- Move from Travis to GitHub Actions.
- Update `.readthedocs.yml` and remove requirements.txt.
Documentation
- #193 Add better checks to confirm proper installation of ghostscript. #196 by jimhall.
- Update `advanced.rst` plotting examples. #119 by Jens Diemer.
- Revert the changes in `0.8.1`.
Bugfixes
Improvements
- Drop Python 2 support!
- Remove Python 2.7 and 3.5 support.
- Replace all instances of `.format` with f-strings.
- Remove all `__future__` imports.
- Fix HTTP 403 forbidden exception in read_pdf(url) and remove Python 2 urllib support.
- Fix test data.
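
As an illustration of the `.format` to f-string change listed above, the following sketch shows the two styles producing the same string. The variable names and message are invented for the example and are not taken from the Camelot code base.

```python
name = "foo.pdf"
pages = 3

# Old style: positional placeholders filled in by str.format.
old_style = "Processed {} ({} pages)".format(name, pages)

# New style: the expressions are embedded directly in the literal.
new_style = f"Processed {name} ({pages} pages)"

print(old_style == new_style)
```

f-strings require Python 3.6 or later, which fits with dropping Python 2.7 and 3.5 in the same release.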
Bugfixes
- Fix library discovery on Windows. #32 by KOLANICH.
- Fix calling convention of callback functions. #34 by KOLANICH.
Improvements
Bugfixes
- Fix Click.HelpFormatter monkey-patch. #5 by Dimiter Naydenov.
- Fix strip_text argument getting ignored. #4 by Dimiter Naydenov.
- #25 edge_tol skipped in read_pdf. #26 by Vinayak Mehta.
- Fix pytest deprecation warning. #2 by Vinayak Mehta.
- #293 Split text ignores all text to the right of last cut. #294 by Vinayak Mehta.
- #277 Sort TableList by order of tables in PDF. #283 by Sym Roe.
- #312 `table_regions` throws `ValueError` when `flavor='stream'`. #332 by Vinayak Mehta.
Bugfixes
Bugfixes
- Move ghostscript import to inside the function so Anaconda builds don't fail.
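
The deferred-import pattern described above can be sketched as follows. This is a minimal illustration, not Camelot's actual code: `convert_page` and the module name `hypothetical_pdf_backend` are placeholders chosen so the example behaves the same whether or not any real backend is installed.

```python
def convert_page(path):
    """Hypothetical conversion helper that needs an optional backend."""
    try:
        # Deferred import: evaluated only when the function is called, so
        # importing the enclosing module never fails if the backend is missing.
        import hypothetical_pdf_backend  # placeholder name, not a real package
    except ImportError as exc:
        raise RuntimeError("optional backend is not installed") from exc
    return hypothetical_pdf_backend.render(path)


# Defining the function succeeds even though the backend is absent;
# the failure only surfaces when the function is actually called.
try:
    convert_page("example.pdf")
except RuntimeError as err:
    print(f"conversion unavailable: {err}")
```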
Improvements
- #240 Add support to analyze only certain page regions to look for tables. #243 by Vinayak Mehta.
  - You can use `table_regions` in `read_pdf()` to specify approximate page regions which may contain tables.
  - Kwarg `line_size_scaling` is now called `line_scale`.
- #212 Add support to export as sqlite database. #244 by Vinayak Mehta.
- #239 Raise warning if PDF is image-based. #240 by Vinayak Mehta.
Documentation
Note: The Python wrapper to Ghostscript's C API is now vendorized under the `ext` module. This was done due to unavailability of the ghostscript package on Anaconda. The code should be removed after we submit a recipe for it to conda-forge. With this release, the user doesn't need to ensure that the Ghostscript executable is available on the PATH variable.
Improvements
- #91 Add support to read from url. #236 by Vinayak Mehta.
- #229, #230 and #233 New configuration parameters. #234 by Vinayak Mehta.
  - `strip_text`: To define characters that should be stripped from each string.
  - `edge_tol`: Tolerance parameter for extending textedges vertically.
  - `resolution`: Resolution used for PDF to PNG conversion.
  - Check out the advanced docs for usage details.
- #170 Add option to pass pdfminer layout kwargs. #232 by Vinayak Mehta.
  - Keyword arguments for pdfminer.layout.LAParams can now be passed using `layout_kwargs` in `read_pdf()`.
  - The `margins` keyword argument in `read_pdf()` is now deprecated.
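
To make the `strip_text` description above concrete, here is a rough sketch of what stripping a set of characters from each extracted string can look like. It is not Camelot's implementation: the helper `strip_chars` and the sample cell values are assumptions made for illustration.

```python
def strip_chars(text, strip=""):
    """Remove every character listed in `strip` from `text`."""
    if not strip:
        return text
    # str.maketrans with a third argument maps those characters to None.
    return text.translate(str.maketrans("", "", strip))


# Made-up cell contents, as they might come out of a table.
cells = [" 1,234.5\n", "$ 99 "]
cleaned = [strip_chars(cell, strip=" ,$\n") for cell in cells]
print(cleaned)
```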
Improvements
- #207 Add a plot type for Stream text edges and detected table areas. #224 by Vinayak Mehta.
- #204 `suppress_warnings` is now called `suppress_stdout`. #225 by Vinayak Mehta.
Bugfixes
Documentation
- Add pdfplumber comparison and update Tabula (stream) comparison. Check out the wiki page.
Bugfixes
- Add chardet to `install_requires` to fix #210. More details in pdfminer.six#213.
Improvements
- #102 Detect tables automatically when Stream is used. #206 Add implementation of Anssi Nurminen's table detection algorithm by Vinayak Mehta.
Improvements
- #186 Add `_bbox` attribute to table. #193 by Vinayak Mehta.
  - You can use `table._bbox` to get coordinates of the detected table.
Improvements
- Matplotlib is now an optional requirement. #190 by Vinayak Mehta.
  - You can install it using `$ pip install camelot-py[plot]`.
- #127 Add tests for plotting. Coverage is now at 87%! #179 by Suyash Behera.
Improvements
- #162 Add password keyword argument. #180 by rbares.
  - An encrypted PDF can now be decrypted by passing `password='<PASSWORD>'` to `read_pdf` or `--password <PASSWORD>` to the command-line interface. (Limited encryption algorithm support from PyPDF2.)
- #139 Add suppress_warnings keyword argument. #155 by Jonathan Lloyd.
  - Warnings raised by Camelot can now be suppressed by passing `suppress_warnings=True` to `read_pdf` or `--quiet` to the command-line interface.
- #154 The CLI can now be run using `python -m`. Try `python -m camelot --help`. #159 by Parth P Panchal.
- #165 Rename `table_area` to `table_areas`. #171 by Parth P Panchal.
Bugfixes
- Raise error if the ghostscript executable is not on the PATH variable. #166 by Vinayak Mehta.
- Convert filename to lowercase to check for PDF extension. #169 by Vinicius Mesel.
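
The case-insensitive extension check mentioned above can be sketched like this. The helper name `has_pdf_extension` is hypothetical and the sample filenames are invented.

```python
def has_pdf_extension(filename):
    """Return True if the name ends in .pdf, regardless of letter case."""
    # Lowercasing first means "REPORT.PDF" and "report.pdf" are treated alike.
    return filename.lower().endswith(".pdf")


for candidate in ("report.pdf", "REPORT.PDF", "report.pdf.txt"):
    print(candidate, has_pdf_extension(candidate))
```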
Files
- #114 Add Makefile and make codecov run only once. #132 by Vaibhav Mule.
- Add .editorconfig. #151 by KOLANICH.
- Downgrade numpy version from 1.15.2 to 1.13.3.
- Add requirements.txt for readthedocs.
Documentation
- Add "Using conda" section to installation instructions.
- Add readthedocs badge.
- Remove hard dependencies on requirements versions.
Bugfixes
- Move opencv-python to extra_requires. #134 by Vinayak Mehta.
Bugfixes
Improvements
- #123 Make PEP8 compatible. #125 by Oshawk.
- #110 Add more tests. Coverage is now at 84%!
  - Add tests for `__repr__`. #128 by Vaibhav Mule.
  - Add tests for CLI. #122 by Vaibhav Mule and #117 by Vinayak Mehta.
  - Add tests for errors/warnings. #113 by Vinayak Mehta.
  - Add tests for output formats and parser kwargs. #126 by Vinayak Mehta.
- Add Python 3.5 and 3.7 support. #119 by Vinayak Mehta.
- Add logging and warnings.
Documentation
- Copyedit all documentation. #112 by Christine Garcia.
- #115 Update issue labels in contributor's guide. #116 by Johnny Metz.
- Update installation instructions for Windows. #124 by Vinayak Mehta.
Note: This release also bumps the version for numpy from 1.13.3 to 1.15.2 and adds a MANIFEST.in. Also, openpyxl==2.5.8 is a new requirement and pytest-cov==2.6.0 is a new dev requirement.
Improvements
Improvements
- #85 Add Travis and Codecov.
Documentation
- Add documentation fixes.
- Rebirth!