I noticed that setting `boxes_flow` outside its documented range actually disables PDFMiner's advanced layout analysis.
We don't need the advanced analysis, since we have no hierarchy of text boxes and we order them ourselves, and skipping it is quite a performance gain.
I've filed an issue (and fix) to update the documentation and also allow `boxes_flow` to be passed as `None` to explicitly disable this: pdfminer/pdfminer.six#395
Once that's merged, we should either default or hard-code our `boxes_flow` layout analysis (LA) param to `None`. It feels like we should allow it to be overridden, but since we ignore the resulting analysis anyway, perhaps there's no point and we should hard-code it to `None`.
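
To make the two options concrete, here is a minimal sketch. It assumes the layout analysis params are held in a plain dict before being handed to PDFMiner, and that pdfminer/pdfminer.six#395 has been merged so that `boxes_flow=None` is accepted. The helper names `resolve_la_params` and `hard_coded_la_params` are hypothetical and only illustrate the choice.

```python
from typing import Any, Dict, Optional

# Assumption: with pdfminer/pdfminer.six#395 merged, boxes_flow=None
# explicitly disables the advanced layout analysis.
DEFAULT_LA_PARAMS: Dict[str, Any] = {"boxes_flow": None}


def resolve_la_params(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Option 1: default boxes_flow to None, but let callers override it."""
    params = dict(DEFAULT_LA_PARAMS)
    params.update(overrides or {})
    return params


def hard_coded_la_params(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Option 2: keep other overrides, but always force boxes_flow to None."""
    params = dict(overrides or {})
    params["boxes_flow"] = None  # hypothetical hard-coding; caller's value is ignored
    return params
```

Either result could then be unpacked into PDFMiner's `LAParams`. Option 2 silently drops a caller's `boxes_flow`, so if we go that way it may be worth documenting that the value is ignored.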