Field names are not resolved properly #1860

Closed
TimoVFX opened this issue May 23, 2023 · 2 comments

Comments

@TimoVFX

TimoVFX commented May 23, 2023

Hey,
I'm trying to use pypdf to fill some templates. We have now moved to testing and noticed that some of the form fields are not filled properly because of a field naming issue. I dug a bit and found that field names containing a . cause the problem.

In our use case we have multiple categories, all of which include a 1_Name field. When inspecting the PDF in Acrobat, I saw that the fields are actually named 1_Name, 2.1_Name, and so on. get_form_text_fields only returns the first field, while get_fields returns two fields: one named 1_Name and a second one named just 2 (see the output further down).

When checking with pdftk, the field names are retrieved correctly.

Since I can't share the original PDF I was testing with, I created a small sample PDF with just two text fields named 1_Name and 2.1_Name.

To confirm that the . is indeed causing the problem, I renamed the 2.1_Name field to 2_1_Name, which works as expected.

As a side effect of this problem, all fields named x.1_Name are filled with the same value. When running update_page_form_field_values with "1_Name": "test", both fields are filled with test. I did not include this in the example, as I think it will be fixed once the names are resolved correctly.
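A minimal sketch of why one key can reach both fields, assuming a filler that compares the supplied key against each field's partial name (/T) instead of its fully qualified name. The sample's second field is stored as partial name 1_Name under a parent named 2 (as the /Parent entry in the output below shows). The fill and make_fields helpers are hypothetical, the plain dicts only stand in for PDF field dictionaries, and this is not pypdf's actual update_page_form_field_values code:

```python
def fully_qualified_name(field: dict) -> str:
    """Join the partial names (/T) of the field and its ancestors with periods."""
    parts = []
    node = field
    while node is not None:
        if "/T" in node:  # nodes without /T contribute nothing to the name
            parts.append(node["/T"])
        node = node.get("/Parent")
    return ".".join(reversed(parts))


def fill(fields: list, values: dict, by_qualified_name: bool) -> None:
    """Hypothetical filler: set /V on every field whose name matches a key."""
    for field in fields:
        name = fully_qualified_name(field) if by_qualified_name else field["/T"]
        if name in values:
            field["/V"] = values[name]


def make_fields() -> list:
    """Plain dicts mimicking the sample: 1_Name, plus 1_Name under parent 2."""
    group = {"/T": "2"}
    return [
        {"/T": "1_Name", "/FT": "/Tx"},
        {"/T": "1_Name", "/FT": "/Tx", "/Parent": group},
    ]


by_partial = make_fields()
fill(by_partial, {"1_Name": "test"}, by_qualified_name=False)
print([f.get("/V") for f in by_partial])  # ['test', 'test']: both are filled

by_qualified = make_fields()
fill(by_qualified, {"1_Name": "test"}, by_qualified_name=True)
print([f.get("/V") for f in by_qualified])  # ['test', None]: only 1_Name
```

Matching on the fully qualified name (here 2.1_Name for the second field) keeps the two fields distinct.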

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
macOS-13.1-arm64-arm-64bit

$ python -c "import pypdf;print(pypdf.__version__)"
3.1.0

I was initially running version 2.10; I updated to 3.1 with the same result.

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

template = "tests/field_sample.pdf"
reader = PdfReader(template)
form = reader.get_fields()  # all form fields: field name -> field attributes
textfields = reader.get_form_text_fields()  # text fields only: field name -> value
print("TextFields: ", textfields)
print("FormFields: ", form)

field_sample.pdf

Output

TextFields:  {'1_Name': None}
FormFields:  {'1_Name': {'/T': '1_Name', '/FT': '/Tx', '/Parent': {'/Kids': [IndirectObject(25, 0, 4342149280)], '/T': '2'}}, '2': {'/T': '2', '/Kids': [IndirectObject(25, 0, 4342149280)]}}

I would prefer to avoid renaming every field to remove the ., since we have quite a number of templates.
I hope this is an easy fix on your side. Could you let me know whether you are looking into this issue and what timeframe to expect for a possible fix?

Best,

@pubpub-zz
Collaborator

Field names (the partial names stored in /T) cannot contain a ".":
(extract from PDF Reference 1.7, page 1117)

If there is a ".", we replace it with "_" (cf. #1529).
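The period is reserved because, per the PDF specification, a field's fully qualified name is built by appending each partial name (/T) to its parent's fully qualified name, separated by a period; a field with no parent has identical partial and fully qualified names. A minimal sketch of that rule, using plain dicts as stand-ins for field dictionaries (the function names are hypothetical and this is not pypdf code):

```python
def fully_qualified_name(field: dict) -> str:
    """Walk the /Parent chain and join the partial names root-first with '.'."""
    parts = []
    node = field
    while node is not None:
        if "/T" in node:  # nodes without /T contribute nothing to the name
            parts.append(node["/T"])
        node = node.get("/Parent")
    return ".".join(reversed(parts))


def is_valid_partial_name(name: str) -> bool:
    """A partial name must not contain the period used as the separator."""
    return "." not in name


# Mirrors the sample: a top-level 1_Name, and a second 1_Name whose parent is 2.
first = {"/T": "1_Name", "/FT": "/Tx"}
group = {"/T": "2"}
second = {"/T": "1_Name", "/FT": "/Tx", "/Parent": group}

print(fully_qualified_name(first))        # 1_Name
print(fully_qualified_name(second))       # 2.1_Name
print(is_valid_partial_name("2.1_Name"))  # False
```

So "2.1_Name" as shown in Acrobat is a fully qualified name, not a partial name that could be stored in /T.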

pubpub-zz closed this as not planned on May 23, 2023
@pubpub-zz
Collaborator

Also, if you look at the PDF source, you will find that the first field is named "1_Name" and has no parent, while the second one, also named "1_Name", has a parent field (a sort of grouping) named "2", so the names are correct.
For get_form_text_fields, the second one hides/overwrites the first one.
If you want to see both, use
.get_form_text_fields(True)
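The overwrite happens whenever results are collected into a dict keyed by the partial name: two fields with the same /T map to one key, and the later one wins. A minimal sketch contrasting that with keying by fully qualified name; the text_fields helper and the "/V" values "first" and "second" are hypothetical, and this only illustrates the idea behind passing True, not pypdf's implementation:

```python
def fully_qualified_name(field: dict) -> str:
    """Join the partial names (/T) of the field and its ancestors with periods."""
    parts = []
    node = field
    while node is not None:
        if "/T" in node:
            parts.append(node["/T"])
        node = node.get("/Parent")
    return ".".join(reversed(parts))


def text_fields(fields: list, qualified: bool) -> dict:
    """Map name -> value (/V) for text fields, keyed by partial or qualified name."""
    result = {}
    for field in fields:
        if field.get("/FT") != "/Tx":
            continue
        key = fully_qualified_name(field) if qualified else field["/T"]
        result[key] = field.get("/V")  # a repeated key silently overwrites
    return result


group = {"/T": "2"}
fields = [
    {"/T": "1_Name", "/FT": "/Tx", "/V": "first"},
    {"/T": "1_Name", "/FT": "/Tx", "/V": "second", "/Parent": group},
]

print(text_fields(fields, qualified=False))  # {'1_Name': 'second'}
print(text_fields(fields, qualified=True))   # {'1_Name': 'first', '2.1_Name': 'second'}
```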
