tables in a page getting wrong order #336

Closed
gqh1995 opened this issue Jan 17, 2021 · 6 comments

@gqh1995

gqh1995 commented Jan 17, 2021

Hello,

First, thank you for this great library; it really helps me process PDF files quickly.

When I use page.extract_tables to extract the tables on a single page, I get the tables in the wrong order.

Why does this problem occur?

Thanks a lot for your answer.

samkit-jain self-assigned this Jan 17, 2021
@samkit-jain
Collaborator

Hi @gqh1995, I appreciate your interest in the library. Could you please provide the PDF and reproducible Python code, along with the expected and actual output?

samkit-jain added the awaiting-code-or-pdf and troubleshooting labels Jan 17, 2021
@gqh1995
Author

gqh1995 commented Jan 18, 2021

@samkit-jain
(screenshot of the PDF page)

code:

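import pdfplumber

# `path` is the location of the PDF file
with pdfplumber.open(path) as pdf:
    page = pdf.pages[-2]  # second-to-last page
    tables = page.extract_tables()  # one list of rows per detected table
    for table in tables:
        print(table)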

expected output:

[['公路技术\n等级', '车道数', '用地指标基准值', '编制条件', None], [None, None, None, '路段交通量Q(peu/d)', '大型车比例u(%)'], ['高速公路', '八', '2.5000', '60000≤Q<80000', '20<μ≤30'], [None, '六', '2.1333', '45000≤Q<60000', '20<μ≤30'], [None, '四', '1.6667', '25000≤Q<40000', '20<μ≤30'], ['一级公路', '六', '1.3333', '30000≤Q<55000', '20<μ≤30'], [None, '四', '0.6667', '15000≤Q<30000', '20<μ≤30'], ['二级公路', '二', '0.3333', 'Q<15000', '20<μ≤30']]
[['公路\n技术\n等级', '车道数', '路段交通量Q \n(pcu/d)', '大型车比例μ(%)', None, None, None, None], [None, None, None, 'μ≤10', '10<\nμ≤20', '20<\nμ≤30', '30<\nμ≤40', 'μ>40'], ['高速\n公路', '八', '80000≤Q<100000', '0.92', '1.02', '1.11', '1.19', '1.26'], [None, None, '60000≤Q<80000', '0.87', '0.93', '1.00', '1.06', '1.10'], [None, '六', '60000≤Q<80000', '0.97', '1.04', '1.12', '1.19', '1.25'], [None, None, '45000≤Q<60000', '0.82', '0.91', '1.00', '1.09', '1.16'], [None, '四', '40000≤Q<55000', '1.01', '1.11', '1.20', '1.30', '1.39'], [None, None, '25000≤Q<40000', '0.81', '0.92', '1.00', '1.08', '1.16'], ['一级\n公路', '六', '30000≤Q<55000', '0.80', '0.90', '1.00', '1.05', '1.10'], [None, '四', '15000≤Q<30000', '0.80', '0.90', '1.00', '1.10', '1.15'], ['二级\n公路', '二', 'Q<15000', '1.00', '1.00', '1.00', '1.00', '1.00']]
[['路段监控通信分中心', '路段监控通信站', '桥隧监控通信站'], ['1.7333', '0.8667', '0.5333']]

actual output:

[['公路技术\n等级', '车道数', '用地指标基准值', '编制条件', None], [None, None, None, '路段交通量Q(peu/d)', '大型车比例u(%)'], ['高速公路', '八', '2.5000', '60000≤Q<80000', '20<μ≤30'], [None, '六', '2.1333', '45000≤Q<60000', '20<μ≤30'], [None, '四', '1.6667', '25000≤Q<40000', '20<μ≤30'], ['一级公路', '六', '1.3333', '30000≤Q<55000', '20<μ≤30'], [None, '四', '0.6667', '15000≤Q<30000', '20<μ≤30'], ['二级公路', '二', '0.3333', 'Q<15000', '20<μ≤30']]
[['路段监控通信分中心', '路段监控通信站', '桥隧监控通信站'], ['1.7333', '0.8667', '0.5333']]
[['公路\n技术\n等级', '车道数', '路段交通量Q \n(pcu/d)', '大型车比例μ(%)', None, None, None, None], [None, None, None, 'μ≤10', '10<\nμ≤20', '20<\nμ≤30', '30<\nμ≤40', 'μ>40'], ['高速\n公路', '八', '80000≤Q<100000', '0.92', '1.02', '1.11', '1.19', '1.26'], [None, None, '60000≤Q<80000', '0.87', '0.93', '1.00', '1.06', '1.10'], [None, '六', '60000≤Q<80000', '0.97', '1.04', '1.12', '1.19', '1.25'], [None, None, '45000≤Q<60000', '0.82', '0.91', '1.00', '1.09', '1.16'], [None, '四', '40000≤Q<55000', '1.01', '1.11', '1.20', '1.30', '1.39'], [None, None, '25000≤Q<40000', '0.81', '0.92', '1.00', '1.08', '1.16'], ['一级\n公路', '六', '30000≤Q<55000', '0.80', '0.90', '1.00', '1.05', '1.10'], [None, '四', '15000≤Q<30000', '0.80', '0.90', '1.00', '1.10', '1.15'], ['二级\n公路', '二', 'Q<15000', '1.00', '1.00', '1.00', '1.00', '1.00']]

@jsvine
Owner

jsvine commented Jan 18, 2021

Hello, @gqh1995. Could you provide the PDF file rather than a screenshot of the PDF? Because pdfplumber operates only on PDFs and not image files, it will be difficult to debug your issue otherwise.

@gqh1995
Author

gqh1995 commented Jan 19, 2021

test1.pdf
@jsvine

samkit-jain added the bug label and removed the awaiting-code-or-pdf and troubleshooting labels Jan 19, 2021
samkit-jain added a commit that referenced this issue Jan 19, 2021
@samkit-jain
Collaborator

Thanks for sharing the PDF, @gqh1995. I have raised PR #338, which should resolve the issue.
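
For readers whose installed pdfplumber version still returns tables in an unexpected order, a possible workaround is to sort the tables by position before extracting them. The sketch below is not taken from PR #338 and may differ from how that PR fixes the problem. It assumes that page.find_tables() returns table objects with a bbox of (x0, top, x1, bottom) and an extract() method, as described in the pdfplumber documentation; the helper names reading_order_key and extract_tables_in_reading_order are hypothetical.

```python
from typing import Any, List, Optional, Tuple

Row = List[Optional[str]]


def reading_order_key(table: Any) -> Tuple[float, float]:
    # Assumption: table.bbox is (x0, top, x1, bottom), with `top`
    # measured downward from the top edge of the page.
    x0, top, _x1, _bottom = table.bbox
    # Sort top-to-bottom first, then left-to-right.
    return (top, x0)


def extract_tables_in_reading_order(page: Any) -> List[List[Row]]:
    # Hypothetical helper: find the tables, order them by position,
    # then extract the cell text of each one.
    tables = sorted(page.find_tables(), key=reading_order_key)
    return [table.extract() for table in tables]
```

With the code from the report, this would be called as extract_tables_in_reading_order(pdf.pages[-2]) in place of page.extract_tables(). Sorting on (top, x0) is a simple choice that suits tables stacked vertically; pages with side-by-side or multi-column layouts may need a different key.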

@gqh1995
Author

gqh1995 commented Jan 20, 2021

@samkit-jain Thanks a lot, the problem is solved.

samkit-jain added a commit that referenced this issue Jan 20, 2021
gqh1995 closed this as completed Jan 20, 2021
samkit-jain added a commit that referenced this issue Jan 20, 2021
samkit-jain added a commit that referenced this issue Jan 21, 2021
3 participants