Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pack the permissions (the /P entry) as unsigned long, fix #186 #352

Merged
merged 7 commits into from
Jan 7, 2020

Conversation

Recursing
Copy link
Contributor

@Recursing Recursing commented Jan 5, 2020

Description
Fixes #186, by treating the /P entry as a four byte unsigned integer

How Has This Been Tested?
Tested the pdf in the issue

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the README.md and other documentation, or I am sure that this is not necessary
  • I have added a consice human-readable description of the change to CHANGELOG.md
  • I have added docstrings to newly created methods and classes
  • I have optimized the code at least one time after creating the initial version

@Recursing Recursing changed the title Tread the permissions (the /P entry) as unsigned long, fix #186 Pack the permissions (the /P entry) as unsigned long, fix #186 Jan 5, 2020
@pietermarsman
Copy link
Member

I did some research and your solution looks spot-on to me.

However, the 5 tests for this PR are failing indicating that changing this breaks other this. Could you have a look and this?

Also, make sure to add a line to the CHANGELOG.md.

@pietermarsman
Copy link
Member

The cause of the errors is probably that self.p is negative. I've at least found one example in the tests that has a value of self.p = -12.

@Recursing
Copy link
Contributor Author

Yeah sorry, I really didn't think self.p could be negative, will look into it

@pietermarsman
Copy link
Member

Great! I am also trying to figure this out right now. I found a paragraph in the PDF reference that is quite helpful (right above Table 3.20):

Note: PDF integer objects are represented internally in signed twos-complement form. Since all the reserved high-order flag bits in the encryption dictionary’s P val- ue are required to be 1, the value must be specified as a negative integer. For exam- ple, assuming revision 2 of the security handler, the value -44 permits printing and copying but disallows modifying the contents and annotations.

@pietermarsman
Copy link
Member

I guess you need to convert the signed value (e.g. -12) to an unsigned value by using the two's complement.

I cannot find anything in pdfminer.six that already implements this algorithm, so I guess you should do it yourself. There is some pseudocode on the wikipedia.

@Recursing
Copy link
Contributor Author

Recursing commented Jan 6, 2020

You can just add 1<<32 if it's negative, and it works for all samples, but it's not the cleanest solution (should be parsed as unsigned in the first place)

Edit: I think we should just add 1<<32 and leave it like this, it seems to be coherent with the reference

I'm currently testing it on a few pdfs in my downloads folder, will push commits in an hour

@pietermarsman
Copy link
Member

It is preferable to implement the two's complement method instead of some approximation. I know it is more fuzz than you care about, but that way we follow the PDF reference exactly. And implementing things exactly like the PDF reference saves a lot of fuzz in the future.

I just noticed that the pseudo-code on wikipedia is actually Python. You can put this function in utils.py and apply it after interpreting P here:

self.p = int_value(self.param['P'])
.

@Recursing
Copy link
Contributor Author

It's not an approximation, it's how two's complement is defined, see the wikipedia article

@pietermarsman
Copy link
Member

Ah, I did not get that! Than it's fine.

pdfminer/pdfdocument.py Outdated Show resolved Hide resolved
@pietermarsman
Copy link
Member

@Recursing could you review it once more and see if I made some dumb mistake?

@Recursing
Copy link
Contributor Author

Looks fine to me, really thanks for everything!

In uint_value you don't check that the number is an int (and not a float), like int_value does, and you might want to standardize the import style ( from .pdftypes import PDFException, uint_value is the only multiple import in the same line). But they seem to be both very minor things

@pietermarsman pietermarsman merged commit 0b1741b into pdfminer:develop Jan 7, 2020
@pietermarsman
Copy link
Member

Thanks for all the effort!
(And explaining to me how this works, I've learned a lot!)

@harshdonga
Copy link

Hey folks, facing a similar issue, can anybody help?
Here are the logs

~/.local/lib/python3.7/site-packages/pdfminer/pdfdocument.py in __init__(self, parser, password, caching, fallback)
    742                     id_value = (b"", b"")
    743                 self.encryption = (id_value, dict_value(trailer["Encrypt"]))
--> 744                 self._initialize_password(password)
    745             if "Info" in trailer:
    746                 self.info.append(dict_value(trailer["Info"]))

~/.local/lib/python3.7/site-packages/pdfminer/pdfdocument.py in _initialize_password(self, password)
    769         if factory is None:
    770             raise PDFEncryptionError("Unknown algorithm: param=%r" % param)
--> 771         handler = factory(docid, param, password)
    772         self.decipher = handler.decrypt
    773         self.is_printable = handler.is_printable()

~/.local/lib/python3.7/site-packages/pdfminer/pdfdocument.py in __init__(self, docid, param, password)
    356         self.param = param
    357         self.password = password
--> 358         self.init()
    359         return
    360 

~/.local/lib/python3.7/site-packages/pdfminer/pdfdocument.py in init(self)
    364             error_msg = "Unsupported revision: param=%r" % self.param
    365             raise PDFEncryptionError(error_msg)
--> 366         self.init_key()
    367         return
    368 

~/.local/lib/python3.7/site-packages/pdfminer/pdfdocument.py in init_key(self)
    377 
    378     def init_key(self) -> None:
--> 379         self.key = self.authenticate(self.password)
    380         if self.key is None:
    381             raise PDFPasswordIncorrect

~/.local/lib/python3.7/site-packages/pdfminer/pdfdocument.py in authenticate(self, password)
    427     def authenticate(self, password: str) -> Optional[bytes]:
    428         password_bytes = password.encode("latin1")
--> 429         key = self.authenticate_user_password(password_bytes)
    430         if key is None:
    431             key = self.authenticate_owner_password(password_bytes)

~/.local/lib/python3.7/site-packages/pdfminer/pdfdocument.py in authenticate_user_password(self, password)
    433 
    434     def authenticate_user_password(self, password: bytes) -> Optional[bytes]:
--> 435         key = self.compute_encryption_key(password)
    436         if self.verify_encryption_key(key):
    437             return key

~/.local/lib/python3.7/site-packages/pdfminer/pdfdocument.py in compute_encryption_key(self, password)
    412         hash.update(self.o)  # 3
    413         # See https://github.com/pdfminer/pdfminer.six/issues/186
--> 414         hash.update(struct.pack("<L", self.p))  # 4
    415         hash.update(self.docid[0])  # 5
    416         if self.r >= 4:

error: 'L' format requires 0 <= number <= 4294967295

Here is the file

Python : 3.7.10
pdfminer.six: 20220524
pdfminer : 20191125

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

struct.error: 'l' format requires -2147483648 <= number <= 2147483647
3 participants