ENH: Addition of optional visitor-functions in extract_text() #1252

Merged on Sep 25, 2022 (28 commits)

Commits on Aug 18, 2022

  1. ENH: Added visitor-callbacks in PageObject.extract_text(...).

    You may use these callbacks to visit all operators and their arguments
    and to get the positions of the text objects.
    In some PDF files, this can be used to extract the rectangles of a table
    and the text in its cells.
    srogmann committed Aug 18, 2022
    76801d7
  2. TST: Test of visitor-callbacks in extract_text().

    It extracts the labels of the rectangles in Figure 2 of GeoBase_NHNC1_Data_Model_UML_EN.
    srogmann committed Aug 18, 2022
    39a9f08
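A minimal sketch of how the two kinds of callbacks from the commits above might be used together. It assumes the keyword arguments `visitor_operand_before(op, args, cm, tm)` and `visitor_text(text, cm, tm, font_dict, font_size)` as exposed by `extract_text()` in PyPDF2/pypdf releases that include this PR; the helper names `apply_matrix` and `collect_rects_and_texts` are hypothetical. Rectangles are taken from the `re` path operator, and a text position is approximated by mapping the text-matrix origin through the current transformation matrix (this ignores font size, rise and horizontal scaling).

```python
from typing import Any, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height in page space
Text = Tuple[float, float, str]           # x, y of the text origin, text


def apply_matrix(m: Sequence[float], x: float, y: float) -> Tuple[float, float]:
    """Map a point through a PDF matrix [a, b, c, d, e, f] (row-vector convention)."""
    return x * m[0] + y * m[2] + m[4], x * m[1] + y * m[3] + m[5]


def collect_rects_and_texts(page: Any) -> Tuple[List[Rect], List[Text]]:
    """Hypothetical helper: gather rectangles and positioned text from one page."""
    rects: List[Rect] = []
    texts: List[Text] = []

    def before(op: bytes, args: List[Any], cm: Sequence[float], tm: Sequence[float]) -> None:
        # "re" appends a rectangle "x y w h" (in user space) to the current path.
        if op == b"re" and len(args) == 4:
            x, y, w, h = (float(a) for a in args)
            # Map two opposite corners into page space and normalise the box.
            x0, y0 = apply_matrix(cm, x, y)
            x1, y1 = apply_matrix(cm, x + w, y + h)
            rects.append((min(x0, x1), min(y0, y1), abs(x1 - x0), abs(y1 - y0)))

    def on_text(text: str, cm: Sequence[float], tm: Sequence[float],
                font_dict: Any, font_size: float) -> None:
        # Skip the whitespace/newline fragments the extractor also reports.
        if text.strip():
            # Origin of the text matrix, expressed in page space (Tm x CTM).
            x, y = apply_matrix(cm, tm[4], tm[5])
            texts.append((x, y, text))

    page.extract_text(visitor_operand_before=before, visitor_text=on_text)
    return rects, texts
```

With pypdf installed, this would be called on a page object, e.g. `collect_rects_and_texts(PdfReader("file.pdf").pages[0])`. Because `page` is only duck-typed, the helper can also be exercised with a stand-in object that invokes the callbacks directly.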

Commits on Aug 19, 2022

  1. 92c0cf8
  2. c320ea8

Commits on Aug 20, 2022

  1. TST: Added function extractTable(...) to read text in cells of a table.

    The function extractTable(listTexts, listRects) uses the function
    extractTextAndRectangles(page, rectFilter), which calls extract_text
    with visitors, to extract the text in the cells of a table.
    srogmann committed Aug 20, 2022
    177fea2
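The table helper in this commit was added as test code. The following is not that code but a hypothetical `extract_table(texts, rects)` in the same spirit: each text fragment is assigned to the first cell rectangle that contains its origin, and cells are grouped into rows by their top edge. It assumes page-space inputs shaped like `(x, y, text)` and `(x, y, width, height)` tuples, and that only cell rectangles are passed in (the table frame already filtered out, which is presumably what a filter such as `rectFilter` is for).

```python
from typing import Dict, List, Optional, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height in page space
Text = Tuple[float, float, str]           # x, y of the text origin, text


def contains(rect: Rect, x: float, y: float) -> bool:
    rx, ry, w, h = rect
    return rx <= x <= rx + w and ry <= y <= ry + h


def extract_table(texts: Sequence[Text], rects: Sequence[Rect],
                  row_tol: float = 1.0) -> List[List[str]]:
    """Hypothetical helper: return the table as rows of cell strings, top row first."""
    # Assign every text fragment to the first cell that contains its origin.
    cells: Dict[int, List[str]] = {i: [] for i in range(len(rects))}
    for x, y, s in texts:
        for i, rect in enumerate(rects):
            if contains(rect, x, y):
                cells[i].append(s)
                break

    # PDF y grows upwards, so sort cells by descending top edge and start a new
    # row whenever the top edge drops by more than row_tol.
    order = sorted(range(len(rects)), key=lambda i: -(rects[i][1] + rects[i][3]))
    rows: List[List[int]] = []
    row_top: Optional[float] = None
    for i in order:
        top = rects[i][1] + rects[i][3]
        if row_top is None or row_top - top > row_tol:
            rows.append([])
            row_top = top
        rows[-1].append(i)

    # Within a row, order cells left to right and join their fragments.
    return [[" ".join(cells[i]) for i in sorted(row, key=lambda i: rects[i][0])]
            for row in rows]
```

Fragments inside one cell are joined in content-stream order, which is usually, but not necessarily, reading order.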

Commits on Aug 22, 2022

  1. 4389590
  2. ENH: Added visitor-callbacks in PageObject.extract_text(...).

    You may use these callbacks to visit all operators and their arguments
    and to get the positions of the text objects.
    In some PDF files, this can be used to extract the rectangles of a table
    and the text in its cells.
    srogmann committed Aug 22, 2022
    eccc779
  3. TST: Test of visitor-callbacks in extract_text().

    It extracts the labels of the rectangles in Figure 2 of GeoBase_NHNC1_Data_Model_UML_EN.
    srogmann committed Aug 22, 2022
    8297b13
  4. 165b686
  5. TST: Added function extractTable(...) to read text in cells of a table.

    The function extractTable(listTexts, listRects) uses the function
    extractTextAndRectangles(page, rectFilter), which calls extract_text
    with visitors, to extract the text in the cells of a table.
    srogmann committed Aug 22, 2022
    ed784e9
  6. 9922f1c
  7. ENH: visitor_text additionally gets font-dictionary and font-size.

    When extract_text(...) is executed, the optional visitor function visitor_text
    also receives the font dictionary and the font size.
    The font dictionary contains the font name and other font properties.
    srogmann committed Aug 22, 2022
    ae7c993
  8. 4afa052
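Since `visitor_text` also receives the font dictionary and the font size, text can be grouped by font, for example to pick out headings. A sketch under these assumptions: the font name is read from the `/BaseFont` entry, a missing or empty font dictionary is tolerated, and `font_size` is treated as the nominal size (the size actually rendered may also depend on the text matrix). `text_by_font` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def text_by_font(page: Any) -> Dict[Tuple[str, float], List[str]]:
    """Hypothetical helper: group non-blank text fragments by (font name, font size)."""
    groups: Dict[Tuple[str, float], List[str]] = defaultdict(list)

    def on_text(text: str, cm: Sequence[float], tm: Sequence[float],
                font_dict: Any, font_size: float) -> None:
        if not text.strip():
            return
        # /BaseFont holds the font's PostScript name, e.g. /Helvetica-Bold;
        # fall back to "?" when no font dictionary is available.
        name = str(font_dict.get("/BaseFont", "?")) if font_dict else "?"
        groups[(name, float(font_size))].append(text)

    page.extract_text(visitor_text=on_text)
    return dict(groups)
```

Headings could then be selected with something like `[t for (name, size), ts in groups.items() if size >= 14 for t in ts]`, where the threshold is an arbitrary example value.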

Commits on Aug 23, 2022

  1. f83ae31

Commits on Sep 14, 2022

  1. 18d2f4a
  2. 19003b3
  3. a5b8b44

Commits on Sep 17, 2022

  1. flake8 fixes

    MartinThoma authored Sep 17, 2022
    17f2d61

Commits on Sep 18, 2022

  1. Fix type annotations

    MartinThoma authored Sep 18, 2022
    72e51be

Commits on Sep 24, 2022

  1. fe11b54
  2. Missed a bracket

    MartinThoma authored Sep 24, 2022
    ab5d118
  3. another bracket

    MartinThoma authored Sep 24, 2022
    c5733f5
  4. Remove unused functions

    MartinThoma authored Sep 24, 2022
    9aad439
  5. Type annotations

    MartinThoma authored Sep 24, 2022
    3809522
  6. Fix type

    MartinThoma authored Sep 24, 2022
    5b87ecc
  7. Fix type:ignore comment

    MartinThoma authored Sep 24, 2022
    e47e16c
  8. fb7807c
  9. 1969c9f