Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDF permission retrieval #2391

Closed
stefan6419846 opened this issue Jan 3, 2024 · 7 comments · Fixed by #2400
Closed

PDF permission retrieval #2391

stefan6419846 opened this issue Jan 3, 2024 · 7 comments · Fixed by #2400

Comments

@stefan6419846
Copy link
Collaborator

stefan6419846 commented Jan 3, 2024

When looking at #2389, I stumbled upon some oddities about how PDF permissions are handled/exposed (#2389 (comment)):

  • While we have PdfReader.decode_permissions (introduced in ENH: Extract document permissions #320), we do not expose the corresponding permissions publicly, but only through PdfReader._encryption.values.
  • It is a bit confusing to have both the PdfReader._encryption.values.Perms and the PdfReader._encryption.P attributes without any actual documentation.
    • Perms is the permissions dictionary with permission handlers. Our code at
      values.Perms = encryption_entry.get("/Perms", ByteStringObject()).original_bytes
      and
      Perms: bytes
      considers this as bytes instead? (PDF 1.7 section 8.7.3 "Permissions")
    • P is the actual permission integer value. (PDF 1.7 section 3.5.2 "Standard Security Handler")

I tend to argue that providing a human-readable public API for the P permissions might indeed make sense. Relying on PdfReader._encryption might work, but being an internal API, this is not guaranteed to reliably work in the future as well.

@pubpub-zz
Copy link
Collaborator

I agree : PdfReader.decode_permissions() should be extended accepting None parameter to automatically get the actual value. Also an PdfWriter.encrypt() should be extended to accept a dict compatible with decode_permissions() output

@stefan6419846
Copy link
Collaborator Author

I will prepare a corresponding PR on Monday which adds corresponding to_dict() and from_dict() methods to the UserAccessPermissions class and deprecate decode_permissions in favour of a new property user_access_permissions returning an UserAccessPermissions instance which ensures a cleaner interface.

While preparing the corresponding changes, I stumbled upon

ALL_DOCUMENT_PERMISSIONS = UserAccessPermissions((2**31 - 1) - 3)
as well, which does not seem to be correct in conjunction with
R32 = 2**31
as it will always omit the 32nd bit: ((2**31 - 1) - 3) & (2**31) == (2**31) maps to False as ALL_DOCUMENT_PERMISSIONS has to be ((2**32 - 1) - 3) instead. I am going to fix this in the same PR as well. This might turn up as a bug as this does not match the PDF 1.7 specification in table 3.20 if a reader strictly enforces the standard.

@MartinThoma
Copy link
Member

MartinThoma commented Jan 6, 2024

deprecate decode_permissions

That sounds like a good idea

in favour of a new property user_access_permissions

I think this is good as well 👍

We should use the UserAccessPermissions directly there. I think this is reasonably readable for the user

 >>> UserAccessPermissions(15)
<UserAccessPermissions.MODIFY|PRINT|R2|R1: 15>

However, we might want to re-name some of the properties. R1 and R2 are ... very technical. For example, the decode_permissions returns forms instead of R7. Maybe we can improve the UserAccessPermissions enum in some way?

@MartinThoma
Copy link
Member

MartinThoma commented Jan 6, 2024

@stefan6419846 I've created #2399 for that which represents this. I'm not sure if this issue is more than that change.

@stefan6419846
Copy link
Collaborator Author

stefan6419846 commented Jan 6, 2024

@MartinThoma Yes PdfReader.user_access_permissions should probably return an UserAccessPermissions instance. As mentioned above, I am planning to propose a corresponding PR on Monday when I am in the office again. My current draft is basically:

class UserAccessPermissions(IntFlag):
    """TABLE 3.20 User access permissions."""

    R1 = 1
    R2 = 2
    PRINT = 4
    MODIFY = 8
    EXTRACT = 16
    ADD_OR_MODIFY = 32
    R7 = 64
    R8 = 128
    FILL_FORM_FIELDS = 256
    EXTRACT_TEXT_AND_GRAPHICS = 512
    ASSEMBLE_DOC = 1024
    PRINT_TO_REPRESENTATION = 2048
    R13 = 2**12
    R14 = 2**13
    R15 = 2**14
    R16 = 2**15
    R17 = 2**16
    R18 = 2**17
    R19 = 2**18
    R20 = 2**19
    R21 = 2**20
    R22 = 2**21
    R23 = 2**22
    R24 = 2**23
    R25 = 2**24
    R26 = 2**25
    R27 = 2**26
    R28 = 2**27
    R29 = 2**28
    R30 = 2**29
    R31 = 2**30
    R32 = 2**31

    @classmethod
    def _is_reserved(cls, name: str) -> bool:
        """Check if the given name corresponds to a reserved flag entry."""
        return name.startswith("R") and name[1:].isdigit()

    @classmethod
    def _defaults_to_one(cls, name: str) -> bool:
        """Check if the given reserved name defaults to 1 = active."""
        return name not in {"R1", "R2"}

    def to_dict(self) -> Dict[str, bool]:
        """Convert the given flag value to a corresponding verbose name mapping."""
        result: Dict[str, bool] = {}
        for name, flag in UserAccessPermissions.__members__.items():
            if UserAccessPermissions._is_reserved(name):
                continue
            result[name.lower()] = (self & flag) == flag
        return result

    @classmethod
    def from_dict(cls, value: Dict[str, bool]) -> "UserAccessPermissions":
        """Convert the verbose name mapping to the corresponding flag value."""
        value_copy = value.copy()
        result = cls(0)
        for name, flag in cls.__members__.items():
            if cls._is_reserved(name):
                # Reserved names have a required value. Use it.
                if cls._defaults_to_one(name):
                    result |= flag
                continue
            is_active = value_copy.pop(name.lower(), False)
            if is_active:
                result |= flag
        assert not value_copy, value_copy
        return result

    @classmethod
    def all(cls) -> "UserAccessPermissions":
        return cls((2**32 - 1) - 3)

However, we might want to re-name some of the properties. R1 and R2 are ... very technical. For example, the decode_permissions returns forms instead of R7. Maybe we can improve the UserAccessPermissions enum in some way?

The R* values are reserved slots without a name, thus for "pretty printing" my example approach will hide them already.

The current forms key would be mapped to fill_form_fields (R7 seems to be some mistake on your side - the form bit is the 9th one, not the 7th).

@MartinThoma
Copy link
Member

Ok, I'll keep my feet still until your PR comes 😄

@MartinThoma
Copy link
Member

@pubpub-zz I suggest you also enjoy that stefan is taking care of it 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants