Issue: Unable to extract line spacing between paragraphs #295

Closed
kkarthikvk opened this issue Oct 20, 2020 · 1 comment
Labels
feature-request All feature requests receive this label initially, can be upgraded to "enhancement"

Comments

@kkarthikvk

When extracting text from a PDF with .extract_text(), the vertical spacing between paragraphs is not preserved. The paragraphs are merged into one block.

NOTE:
1. I have tried different x_tolerance and y_tolerance values, ranging from -10 to 10.
2. I also tried cropping the PDF page.

Nothing worked :(

Example of what happens (no need to read the text, just look at the structure):

Say the PDF contains:

The Middle Ages saw a huge rise in popularity of annual Shrovetide football matches throughout Europe, particularly in England. An early reference to a ball game played in Britain comes from the 9th century Historia Brittonum, which describes "a party of boys ... playing at ball".[30] References to a ball game played in northern

France known as La Soule or Choule, in which the ball was propelled by hands, feet, and sticks,[31] date from the 12th century.[32]he early forms of football played in England, sometimes referred to as "mob football", would be played in towns or between neighbouring villages, involving an unlimited number of players on opposing teams who would clash en masse,[33] struggling to move an item.

The first detailed description of what was almost certainly football in England was given by William FitzStephen in about 1174–1183. He described the activities of London youths during the annual festival of Shrove Tuesday:

Extracted as:

The Middle Ages saw a huge rise in popularity of annual Shrovetide football matches throughout Europe, particularly in England. An early reference to a ball game played in Britain comes from the 9th century Historia Brittonum, which describes "a party of boys ... playing at ball".[30] References to a ball game played in northern
France known as La Soule or Choule, in which the ball was propelled by hands, feet, and sticks,[31] date from the 12th century.[32]he early forms of football played in England, sometimes referred to as "mob football", would be played in towns or between neighbouring villages, involving an unlimited number of players on opposing teams who would clash en masse,[33] struggling to move an item.
The first detailed description of what was almost certainly football in England was given by William FitzStephen in about 1174–1183. He described the activities of London youths during the annual festival of Shrove Tuesday:
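
A minimal reproduction sketch of the attempts described in the notes above. The helper name `extract_with_tolerances`, the tolerance values, and the file name `sample.pdf` are assumptions for illustration. It calls `.extract_text()` with several `x_tolerance` / `y_tolerance` values and counts the blank lines in each result. As far as I can tell the tolerances act as distance thresholds, so negative values are unlikely to be meaningful and are left out here.

```python
def extract_with_tolerances(page, tolerances=(1, 3, 5, 10)):
    """Call extract_text with several tolerances and count blank lines.

    `page` is expected to be a pdfplumber page; any object with an
    extract_text(x_tolerance=..., y_tolerance=...) method will work.
    Returns a dict mapping each tolerance to the number of blank lines.
    """
    results = {}
    for tol in tolerances:
        text = page.extract_text(x_tolerance=tol, y_tolerance=tol) or ""
        # A preserved paragraph gap would show up as an empty line.
        results[tol] = sum(1 for line in text.splitlines() if not line.strip())
    return results


def run(path="sample.pdf"):  # hypothetical file name
    """Run the check on the first page of a real PDF."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(path) as pdf:
        return extract_with_tolerances(pdf.pages[0])
```

If every tolerance reports zero blank lines, that matches the behaviour described in this issue: the paragraph gaps are dropped regardless of the tolerance settings.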

@kkarthikvk added the feature-request label Oct 20, 2020
@jsvine
Owner

jsvine commented Oct 20, 2020

Thanks for your interest in the library, @kkarthikvk. Right now, .extract_text(...) does not add spacing other than to separate lines. This is definitely something we'd like to improve in the future. I have an idea for how to implement this, but have not yet found the time to properly test it. Closing this issue as a practical duplicate of #10 to keep the discussion / progress updates in one place.
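
Until that lands, one possible workaround is to rebuild the text from word positions. The sketch below is not the implementation mentioned above, only an illustration. It assumes word dicts shaped like the output of `page.extract_words()` (keys `text`, `x0`, `top`, `bottom`). The function name `words_to_paragraph_text` and the `line_tolerance` and `gap_factor` defaults are assumptions that would need tuning per document.

```python
def words_to_paragraph_text(words, line_tolerance=3, gap_factor=0.5):
    """Rebuild text from word boxes, adding a blank line at large vertical gaps.

    `words` is a list of dicts with "text", "x0", "top" and "bottom" keys,
    the shape returned by pdfplumber's page.extract_words().
    """
    # Group words into lines: a word joins the current line when its `top`
    # is within line_tolerance of the first word of that line.
    lines = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if lines and abs(word["top"] - lines[-1][0]["top"]) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])

    output = []
    prev_bottom = None
    for line in lines:
        line.sort(key=lambda w: w["x0"])  # left-to-right reading order
        top = min(w["top"] for w in line)
        bottom = max(w["bottom"] for w in line)
        height = bottom - top
        # Heuristic (an assumption): a vertical gap larger than
        # gap_factor * line height is treated as a paragraph break.
        if prev_bottom is not None and top - prev_bottom > gap_factor * height:
            output.append("")
        output.append(" ".join(w["text"] for w in line))
        prev_bottom = bottom
    return "\n".join(output)
```

With a pdfplumber page this would be called as `words_to_paragraph_text(page.extract_words())`. The heuristic only handles single-column, left-to-right layouts. Multi-column pages or mixed font sizes would need more care.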
