Creds serialization and deserialization #45

zarganum · 2024-07-13T16:04:29Z

Creds (de)serialization to/from the FILE ccache format (vesion 4) and KCM ccache protocol.

Use case: loading/storing of Creds objects in implementation-external cache memory, with TTL aligned to ticket validity.

MIT only:
krb5_marshal_credentials
krb5_unmarshal_credentials

Example usage:

bytes = krb5.marshal_credentials(ctx, creds)
uncreds = krb5.unmarshal_credentials(ctx, bytes)

zarganum · 2024-07-14T07:02:34Z

forgot to add one last line to test_creds_serialization():
assert creds.ticket == uncreds.ticket
for some reason gcc still warns on implicit declarations of krb5_marshal_credentials and krb5_unmarshal_credentials in mit tests... but each CI run costs 55 minutes, omg...

jborean93 · 2024-07-15T02:49:52Z

The bulk of the time in CI is compiling all the macOS wheels. In the past there was no arm macOS image to use so compiling was quite time consuming. I believe GHA now has arm images we can use so there's probably some work that can be done to improve those times. Still I'm not seeing 55 minutes, the summary from the latest run is 10 minutes 22 seconds. Are you saying that adding that assertion line causes CI to run for that time? If so you might have to investigate what that is doing and why it might be taking so long locally.

zarganum · 2024-07-15T10:08:24Z

Jordan, I appreciate your responsiveness and apologize for not being clear enough.
The last assertion I wanted to add confirms the ticket itself is properly deserialized and is just a binary block compare; it is expected to be submicro- or even nanoseconds.
This assertion is not a concern... the concern is a computational costs that Github bills you.

The CI duration is indeed shown as 10m 22s (astronomical time between start and end?).
But the usage is 55m 0s (~23m for MacOS wheels, ~26m for Ubuntu tests, plus ~5m for MacOS tests).

Aren't these 55 minutes representing the total computational cost billed by Github?
I am holding the "last line" mini-commit because don't want to ruin your budget.
Let me know if you are fine with that:)

Many thanks

zarganum · 2024-07-15T10:15:23Z

you'd probably want to cut away the unsupported Python 3.7 right now, and plan for 3.8 EOL after October 2024.
https://devguide.python.org/versions/

jborean93 · 2024-07-15T10:16:00Z

Ah no worries, thanks for your concern and thinking about this. I haven’t hit the limits, or received notifications that I’m close to them, for the free open source tier so commit away. I believe public repos don’t even have limits but I’m not 100% confident on that. I probably should look at making the build process a bit more efficient but for now it is what it is.

zarganum · 2024-07-15T14:54:54Z

I added some more test assertions. It appears that Python (3.11 in my case) is interning byte sequences returned by krb5 calls, at least the deserialized ticket byte object appears the same as it was before serialization. However I assume the initial copies still remain in the runtime process memory and will get freed eventually in __dealloc__. They are just not used by Python anymore.

jborean93

Thanks for implementing this and adding the tests. I only have two things that really need addressing;

Keeping the API aligned with the krb5_* function names
Looking at a different free/deallocation option for unmarshal_credentials

src/krb5/_creds_mit.pyi

src/krb5/_creds_mit.pyx

tests/test_creds.py

jborean93 · 2024-07-15T19:29:40Z

src/krb5/_creds_mit.pyx

+    err = krb5_unmarshal_credentials(context.raw, &data_raw, &creds_raw)
+
+    if creds_raw:
+        # creds_raw was calloc'ed for krb5_creds structure


I'm somewhat weary about using free here explicitly. The docs for krb5_unmarshal_credentials state that

Use krb5_free_creds() to free creds_out when it is no longer needed.

I can see the actual implementation essentially calls krb5_free_creds_content and then calls free. https://github.com/krb5/krb5/blob/354f176ba6d6cc544e1c15712a13f9c006ca605d/src/lib/krb5/krb/kfree.c#L213-L219

While this replicates that same logic that's more of an implementation detail and not a public API. We should try to avoid freeing data using free if we don't control how it was allocated.

Instead I'm wondering if we should add another cdef field on Creds that can be used to control how the struct is deallocated to work with this new scenario. For example (the field it actually defined in the pxd file, the comment is more for documentation)

cdef class Creds: # cdef Context ctx # cdef krb5_creds raw # cdef int needs_free # cdef int free_struct def __dealloc__(Creds self): if self.needs_free: if self.free_struct: krb5_free_creds(self.ctx.raw, &self.raw) else: krb5_free_cred_contents(self.ctx.raw, &self.raw) self.needs_free = 0

The krb5_free_creds would also need to be defined in the file but seems like Heimdal has it so it shouldn't be a problem.

The other benefit of this is we don't need to copy the data we can just do something like

creds = Creds(context) ... err = krb5_unmarshal_credentials(context.raw, &data_raw, &creds.raw) if err: raise Krb5Error(context, err) creds.free_struct = 1 creds.needs_free = 1 return creds

Do you see any problem with this approach?

I agree I stepped onto thin ice here... The point is a pointer-to-pointer used by krb5_unmarshal_credentials(), it unconditioanaly allocates the structure using its k5alloc() wrapper and writes the allocated pointer to creds_out. There is just no way to not allocate some heap memory, so it has to be free'd.

https://github.com/krb5/krb5/blob/354f176ba6d6cc544e1c15712a13f9c006ca605d/src/lib/krb5/ccache/ccmarshal.c#L557

There are might be two ways to avoid shortcuts like mine, right after krb5_unmarshal_credentials()

Deep copy the new creds to our self.raw and deallocate everything just returned with krb5_free_creds() as original API prescribes. Our existing raw will contain a deep copy to be free'd with krb5_free_cred_contents() in __deallocate__.

Shallow copy the new creds to our self.raw, save the alloc'ed pointer as _raw_ptr and indeed call krb5_free_creds instead of krb5_free_cred_contents in __deallocate__. Our existing raw will remain the shallow copy to keep other things working.

The problem with (1) is that deep copy function krb5_copy_creds() behaves the same as krb5_unmarshal_credentials() and allocates the new structure.

Are you ok with having additional _raw_ptr member in Creds class?
Or any better ideas?

Ah I see, sorry I didn't see it was a krb5_creds** argument. I thought it was just krb5_creds*, now the API makes more sense.

src/krb5/_creds_mit.pyx

jborean93

I think I might revisit this in the future to try and avoid the memcpy and just handle raw based on whether the underlying Creds class was created with the krb5_creds or krb5_creds* value. That's a pretty large change as it'll require changes to everything that calls creds.raw so best to leave it to another PR.

But that shouldn't hold off this PR as what you have works and passes in CI. Appreciate you working on this and contributing to this library.

zarganum added 2 commits July 13, 2024 15:52

creds de/serialization

2b21a07

linting

8519cab

test: add essential equality checks

eb38ca4

jborean93 reviewed Jul 15, 2024

View reviewed changes

zarganum added 5 commits July 15, 2024 21:08

match original naming convention

4677881

support for safe krb5_free_creds()

4514ccb

amend serialization test

8b084d1

removed cimport free

c1f86bc

fixed name convention

0bfb065

jborean93 approved these changes Jul 15, 2024

View reviewed changes

jborean93 merged commit 8f7016e into jborean93:main Jul 15, 2024
31 checks passed

zarganum deleted the serialization branch August 26, 2024 20:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Creds serialization and deserialization #45

Creds serialization and deserialization #45

zarganum commented Jul 13, 2024 •

edited

Loading

zarganum commented Jul 14, 2024 •

edited

Loading

jborean93 commented Jul 15, 2024

zarganum commented Jul 15, 2024 •

edited

Loading

zarganum commented Jul 15, 2024

jborean93 commented Jul 15, 2024

zarganum commented Jul 15, 2024 •

edited

Loading

jborean93 left a comment

jborean93 Jul 15, 2024

zarganum Jul 15, 2024 •

edited

Loading

jborean93 Jul 15, 2024

jborean93 left a comment •

edited

Loading

Creds serialization and deserialization #45

Creds serialization and deserialization #45

Conversation

zarganum commented Jul 13, 2024 • edited Loading

zarganum commented Jul 14, 2024 • edited Loading

jborean93 commented Jul 15, 2024

zarganum commented Jul 15, 2024 • edited Loading

zarganum commented Jul 15, 2024

jborean93 commented Jul 15, 2024

zarganum commented Jul 15, 2024 • edited Loading

jborean93 left a comment

Choose a reason for hiding this comment

jborean93 Jul 15, 2024

Choose a reason for hiding this comment

zarganum Jul 15, 2024 • edited Loading

Choose a reason for hiding this comment

jborean93 Jul 15, 2024

Choose a reason for hiding this comment

jborean93 left a comment • edited Loading

Choose a reason for hiding this comment

zarganum commented Jul 13, 2024 •

edited

Loading

zarganum commented Jul 14, 2024 •

edited

Loading

zarganum commented Jul 15, 2024 •

edited

Loading

zarganum commented Jul 15, 2024 •

edited

Loading

zarganum Jul 15, 2024 •

edited

Loading

jborean93 left a comment •

edited

Loading