problem extracting text on a two columns layout #112
Comments
thanks for the pointers, since
Line 11 in 826552f
sorry, I cannot find it here: https://docs.juliahub.com/PDFIO/cmOJE/0.1.14/, and when I try to use it I get Stacktrace:
ok fair enough I guess...
I have opened a question here: https://discourse.julialang.org/t/how-to-extract-data-from-pdf-with-two-columns/108008
but maybe this is the right place...
from this slide
using this:
I get this:
gives a poor performance: ● SAM: The solution ○ EU incubators / accelerators: ~1200 market size ○ EU VC: ~500 ● SOM: ○ IT incubators / ● TAM:○ 1.35M tech startups accelerators: ~250, worldwide ○ IT VCs: ~60 ○ ~7000 incubators / accelerators worldwide, ○ proxies to enter and counting 5000 startups ○ ~2500 VC firms, half of them on growth giano.rocks
which more or less ignores the two-column layout. I am trying to work out from the API whether there is a way to extract all the objects on the page and then extract the text from each of them.
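To make the idea concrete, here is a minimal sketch of column-aware ordering, written in Python rather than Julia and not using PDFIO at all. It assumes positioned text fragments are already available, which is exactly the part the question is about. The `Fragment` type, the `reading_order` function, the `column_split` value and the coordinates are all hypothetical, and the sample strings are only loosely based on the slide text.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A piece of text with the position of its origin (hypothetical type)."""
    x: float
    y: float  # PDF convention: y grows upward from the bottom of the page
    text: str


def reading_order(fragments, column_split):
    """Return the text with the left column first, then the right column.

    column_split is an assumed x coordinate separating the two columns.
    """
    left = [f for f in fragments if f.x < column_split]
    right = [f for f in fragments if f.x >= column_split]
    ordered = []
    for column in (left, right):
        # Top to bottom (descending y), then left to right within a line
        ordered.extend(sorted(column, key=lambda f: (-f.y, f.x)))
    return "\n".join(f.text for f in ordered)


# Invented coordinates, for illustration only
fragments = [
    Fragment(300, 700, "SAM:"),
    Fragment(50, 700, "TAM:"),
    Fragment(300, 680, "EU VC: ~500"),
    Fragment(50, 680, "1.35M tech startups worldwide"),
]
print(reading_order(fragments, column_split=200))
```

A plain sort by vertical position alone would interleave the two columns, which is the behaviour seen in the output above; splitting on x first keeps each column together.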
What am I missing?