Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creds serialization and deserialization #45

Merged
merged 8 commits into from
Jul 15, 2024

Conversation

zarganum
Copy link
Contributor

@zarganum zarganum commented Jul 13, 2024

Creds (de)serialization to/from the FILE ccache format (vesion 4) and KCM ccache protocol.

Use case: loading/storing of Creds objects in implementation-external cache memory, with TTL aligned to ticket validity.

MIT only:
krb5_marshal_credentials
krb5_unmarshal_credentials

Example usage:

bytes = krb5.marshal_credentials(ctx, creds)
uncreds = krb5.unmarshal_credentials(ctx, bytes)

@zarganum
Copy link
Contributor Author

zarganum commented Jul 14, 2024

  1. forgot to add one last line to test_creds_serialization():
    assert creds.ticket == uncreds.ticket
  2. for some reason gcc still warns on implicit declarations of krb5_marshal_credentials and krb5_unmarshal_credentials in mit tests... but each CI run costs 55 minutes, omg...

@jborean93
Copy link
Owner

The bulk of the time in CI is compiling all the macOS wheels. In the past there was no arm macOS image to use so compiling was quite time consuming. I believe GHA now has arm images we can use so there's probably some work that can be done to improve those times. Still I'm not seeing 55 minutes, the summary from the latest run is 10 minutes 22 seconds. Are you saying that adding that assertion line causes CI to run for that time? If so you might have to investigate what that is doing and why it might be taking so long locally.

@zarganum
Copy link
Contributor Author

zarganum commented Jul 15, 2024

Jordan, I appreciate your responsiveness and apologize for not being clear enough.
The last assertion I wanted to add confirms the ticket itself is properly deserialized and is just a binary block compare; it is expected to be submicro- or even nanoseconds.
This assertion is not a concern... the concern is a computational costs that Github bills you.

The CI duration is indeed shown as 10m 22s (astronomical time between start and end?).
But the usage is 55m 0s (~23m for MacOS wheels, ~26m for Ubuntu tests, plus ~5m for MacOS tests).

Aren't these 55 minutes representing the total computational cost billed by Github?
I am holding the "last line" mini-commit because don't want to ruin your budget.
Let me know if you are fine with that:)

Many thanks

@zarganum
Copy link
Contributor Author

you'd probably want to cut away the unsupported Python 3.7 right now, and plan for 3.8 EOL after October 2024.
https://devguide.python.org/versions/

@jborean93
Copy link
Owner

Ah no worries, thanks for your concern and thinking about this. I haven’t hit the limits, or received notifications that I’m close to them, for the free open source tier so commit away. I believe public repos don’t even have limits but I’m not 100% confident on that. I probably should look at making the build process a bit more efficient but for now it is what it is.

@zarganum
Copy link
Contributor Author

zarganum commented Jul 15, 2024

I added some more test assertions. It appears that Python (3.11 in my case) is interning byte sequences returned by krb5 calls, at least the deserialized ticket byte object appears the same as it was before serialization. However I assume the initial copies still remain in the runtime process memory and will get freed eventually in __dealloc__. They are just not used by Python anymore.

Copy link
Owner

@jborean93 jborean93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this and adding the tests. I only have two things that really need addressing;

  • Keeping the API aligned with the krb5_* function names
  • Looking at a different free/deallocation option for unmarshal_credentials

src/krb5/_creds_mit.pyi Show resolved Hide resolved
src/krb5/_creds_mit.pyx Outdated Show resolved Hide resolved
src/krb5/_creds_mit.pyx Outdated Show resolved Hide resolved
tests/test_creds.py Show resolved Hide resolved
tests/test_creds.py Show resolved Hide resolved
err = krb5_unmarshal_credentials(context.raw, &data_raw, &creds_raw)

if creds_raw:
# creds_raw was calloc'ed for krb5_creds structure
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat weary about using free here explicitly. The docs for krb5_unmarshal_credentials state that

Use krb5_free_creds() to free creds_out when it is no longer needed.

I can see the actual implementation essentially calls krb5_free_creds_content and then calls free. https://github.com/krb5/krb5/blob/354f176ba6d6cc544e1c15712a13f9c006ca605d/src/lib/krb5/krb/kfree.c#L213-L219

While this replicates that same logic that's more of an implementation detail and not a public API. We should try to avoid freeing data using free if we don't control how it was allocated.

Instead I'm wondering if we should add another cdef field on Creds that can be used to control how the struct is deallocated to work with this new scenario. For example (the field it actually defined in the pxd file, the comment is more for documentation)

cdef class Creds:
    # cdef Context ctx
    # cdef krb5_creds raw
    # cdef int needs_free
    # cdef int free_struct

    def __dealloc__(Creds self):
        if self.needs_free:
            if self.free_struct:
                krb5_free_creds(self.ctx.raw, &self.raw)
            else:
                krb5_free_cred_contents(self.ctx.raw, &self.raw)
            self.needs_free = 0

The krb5_free_creds would also need to be defined in the file but seems like Heimdal has it so it shouldn't be a problem.

The other benefit of this is we don't need to copy the data we can just do something like

creds = Creds(context)

...

err = krb5_unmarshal_credentials(context.raw, &data_raw, &creds.raw)
if err:
    raise Krb5Error(context, err)

creds.free_struct = 1
creds.needs_free = 1

return creds

Do you see any problem with this approach?

Copy link
Contributor Author

@zarganum zarganum Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree I stepped onto thin ice here... The point is a pointer-to-pointer used by krb5_unmarshal_credentials(), it unconditioanaly allocates the structure using its k5alloc() wrapper and writes the allocated pointer to creds_out. There is just no way to not allocate some heap memory, so it has to be free'd.

https://github.com/krb5/krb5/blob/354f176ba6d6cc544e1c15712a13f9c006ca605d/src/lib/krb5/ccache/ccmarshal.c#L557

There are might be two ways to avoid shortcuts like mine, right after krb5_unmarshal_credentials()

  1. Deep copy the new creds to our self.raw and deallocate everything just returned with krb5_free_creds() as original API prescribes. Our existing raw will contain a deep copy to be free'd with krb5_free_cred_contents() in __deallocate__.
  2. Shallow copy the new creds to our self.raw, save the alloc'ed pointer as _raw_ptr and indeed call krb5_free_creds instead of krb5_free_cred_contents in __deallocate__. Our existing raw will remain the shallow copy to keep other things working.

The problem with (1) is that deep copy function krb5_copy_creds() behaves the same as krb5_unmarshal_credentials() and allocates the new structure.

Are you ok with having additional _raw_ptr member in Creds class?
Or any better ideas?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, sorry I didn't see it was a krb5_creds** argument. I thought it was just krb5_creds*, now the API makes more sense.

src/krb5/_creds_mit.pyx Outdated Show resolved Hide resolved
Copy link
Owner

@jborean93 jborean93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might revisit this in the future to try and avoid the memcpy and just handle raw based on whether the underlying Creds class was created with the krb5_creds or krb5_creds* value. That's a pretty large change as it'll require changes to everything that calls creds.raw so best to leave it to another PR.

But that shouldn't hold off this PR as what you have works and passes in CI. Appreciate you working on this and contributing to this library.

@jborean93 jborean93 merged commit 8f7016e into jborean93:main Jul 15, 2024
31 checks passed
@zarganum zarganum deleted the serialization branch August 26, 2024 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants