Skip to content

Conversation

ofek
Copy link
Contributor

@ofek ofek commented Oct 1, 2025

There are 2 commits:

  1. This sets up an extensible way to support encoding/decoding of custom types. At import time, modules containing custom types can register functions which the global hooks will use when that type is requested. This improves the previous way of importing all possible types within the global hooks.
  2. This refactors the ChangeSet class to be a bit simpler to reason about and fixes the typing.

@ofek ofek marked this pull request as ready for review October 5, 2025 04:38
@ofek ofek requested a review from a team as a code owner October 5, 2025 04:38
@ofek ofek force-pushed the ofek/msgspec-hooks branch from d6e488f to da57d5c Compare October 5, 2025 17:11

def __init__(self, changes: dict[Path, ChangedFile]) -> None:
self.__changes = MappingProxyType(changes)
def __init__(self, changed_files: Iterable[ChangedFile]) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it so the caller can directly instantiate the class without first forming a mapping which also simplifies the disk storage format so there is no longer a redundant top-level key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, definitely makes it easier to use 👍

@property
def changes(self) -> MappingProxyType[Path, ChangedFile]:
return self.__changes
def paths(self) -> MappingProxyType[str, ChangedFile]:
Copy link
Contributor Author

@ofek ofek Oct 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This provides the previous interface of all changes being represented by a mapping of relative paths to the ChangedFile class and removes the requirement for pass-through dictionary methods. The keys are now strings because msgspec actually doesn't support any other types for dictionary keys which is why the decoding hook had to manually coerce the type.

A benefit to this is that callers possessing str instances no longer have to import the Path class for membership checks.

return cls(changes)

def __or__(self, other: Self) -> Self:
return type(self)(chain(self.files, other.files))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't really find a way to satisfy this other than having a classmethod (like the removed from_iter) that returns an instance of cls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems very clean to me like this anyway, good trick !

assert seen.path == expected_change.path
assert seen.type == expected_change.type
assert seen.patch == expected_change.patch
assert actual.paths == expected.paths
Copy link
Contributor Author

@ofek ofek Oct 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think relying on the mapping's equality check is equivalent, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the two are functionally equivalent - and probably faster.
However, individually comparing each file and each file's attribute allowed for easier debugging, since you get a lower amount of output on the stdout (only the actually differing part gets printed).

Copy link
Contributor

@Ishirui Ishirui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this new mechanism for registering hooks !

from collections.abc import Callable


class Hooks(Struct, frozen=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
class Hooks(Struct, frozen=True):
class Hook(Struct, frozen=True):

It feels very weird to me to have a plural class name, usually I prefer reserving the plural version to refer to collections of instances of that class.
Although here I guess we have an encode hook + a decode hook, ergo "hooks" - still I think it makes sense to consider the (encode hook, decode hook) pair as a single "Hook" object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to the singular version.

self.__changes = MappingProxyType(changes)
def __init__(self, changed_files: Iterable[ChangedFile]) -> None:
self.__changed = MappingProxyType({str(c.path): c for c in changed_files})
self.__files = tuple(self.__changed.values())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a tuple instead of a set ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to maintain the order.

return cls(changes)

def __or__(self, other: Self) -> Self:
return type(self)(chain(self.files, other.files))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems very clean to me like this anyway, good trick !

Comment on lines 162 to 184
@cached_property
def __change_types(
self,
) -> dict[Literal[ChangeType.ADDED, ChangeType.MODIFIED, ChangeType.DELETED], MappingProxyType[str, ChangedFile]]:
added: dict[str, ChangedFile] = {}
modified: dict[str, ChangedFile] = {}
deleted: dict[str, ChangedFile] = {}
for change in self.files:
if change.type == ChangeType.ADDED:
added[str(change.path)] = change
elif change.type == ChangeType.MODIFIED:
modified[str(change.path)] = change
elif change.type == ChangeType.DELETED:
deleted[str(change.path)] = change
else: # no cov
msg = f"Unexpected change type: {change.type}"
raise ValueError(msg)

return {
ChangeType.ADDED: MappingProxyType(added),
ChangeType.MODIFIED: MappingProxyType(modified),
ChangeType.DELETED: MappingProxyType(deleted),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not just sort into buckets dynamically ?

Suggested change
@cached_property
def __change_types(
self,
) -> dict[Literal[ChangeType.ADDED, ChangeType.MODIFIED, ChangeType.DELETED], MappingProxyType[str, ChangedFile]]:
added: dict[str, ChangedFile] = {}
modified: dict[str, ChangedFile] = {}
deleted: dict[str, ChangedFile] = {}
for change in self.files:
if change.type == ChangeType.ADDED:
added[str(change.path)] = change
elif change.type == ChangeType.MODIFIED:
modified[str(change.path)] = change
elif change.type == ChangeType.DELETED:
deleted[str(change.path)] = change
else: # no cov
msg = f"Unexpected change type: {change.type}"
raise ValueError(msg)
return {
ChangeType.ADDED: MappingProxyType(added),
ChangeType.MODIFIED: MappingProxyType(modified),
ChangeType.DELETED: MappingProxyType(deleted),
}
@cached_property
def __change_types(
self,
) -> dict[ChangeType, MappingProxyType[str, ChangedFile]]:
tmp: defaultdict[ChangeType, dict[str, ChangedFile]] = defaultdict(dict)
for change in self.files:
tmp[change.type][str(change.path)] = change
return {k: MappingProxyType(v) for k, v in tmp.items()}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, don't know what I was thinking. Thanks!


def __init__(self, changes: dict[Path, ChangedFile]) -> None:
self.__changes = MappingProxyType(changes)
def __init__(self, changed_files: Iterable[ChangedFile]) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, definitely makes it easier to use 👍

assert seen.path == expected_change.path
assert seen.type == expected_change.type
assert seen.patch == expected_change.patch
assert actual.paths == expected.paths
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the two are functionally equivalent - and probably faster.
However, individually comparing each file and each file's attribute allowed for easier debugging, since you get a lower amount of output on the stdout (only the actually differing part gets printed).

Comment on lines +80 to +83
@property
def modified(self) -> MappingProxyType[str, ChangedFile]:
"""Set of files that were modified."""
return self.__change_types[ChangeType.MODIFIED]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should convert our keys to Path objects within these methods, if the expectation is that our callers will mostly be using Paths instead of raw strs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so then they can always str(p) whereas membership checks for callers that only have str objects would require an import.

@ofek ofek merged commit 695d20c into main Oct 6, 2025
20 checks passed
@ofek ofek deleted the ofek/msgspec-hooks branch October 6, 2025 17:34
github-actions bot pushed a commit that referenced this pull request Oct 6, 2025
* Instate msgspec type registration mechanism

* Update ChangeSet API

* address review 695d20c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants