-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor!: revise point cloud plumbing and add unit tests #205
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - I like how it is structured. My only nit would be that I would subclass UMAP from Projectors. But I understand what you're going for and this might be the design decision.
I see you making changes so let me know when it's ready so I give a final look and approve
|
||
Identifier = TypeVar("Identifier", bound=Hashable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need for this generic? trying to understand what you mean by an identifier here. A doc-string could help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mostly a placeholder. I see that we have used uuid
(which is a str
) as an identifier for the prediction_id
, so it could be just a str
in the dict key, but it's a dynamic language, so it could work if the prediction_id
is a number (e.g. row number). Making it generic here is just to keep the typing flexible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it - I think we should probably consider not over abstracting but this is neat.
|
||
Identifier = TypeVar("Identifier", bound=Hashable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it - I think we should probably consider not over abstracting but this is neat.
|
||
Parameters | ||
---------- | ||
data : mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data : mapping | |
data : Mapping[Identifier, Vector] |
Does this not corralate with the type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. i was looking for examples from other code bases but haven't found any, so kind of just gave up. i wasn't sure if these types variables belong in a docstring since they're local constructs relevant only to the type checker
|
||
|
||
@dataclass | ||
class MockClustersFinder: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice dependency injection.
n_components, | ||
) | ||
assert projections.shape == (samp_size, n_components) | ||
assert abs(projections.mean()) < 1e-5 # is centered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is is deterministic? Kinda scary if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i opted for a randomized input here so we don't need to create fixture, so it's all random for now
@pytest.mark.parametrize("samp_size,n_features,n_components", [(10, 8, 5), (15, 7, 3)]) | ||
def test_umap(samp_size: int, n_features: int, n_components: int) -> None: | ||
a = np.random.rand(n_features, n_features) | ||
projections = Umap().project( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering if this is testing too much of UMAP itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it probably is, and we can see that from the runtime (about 8 seconds). But the other intent here is to have a piece of live code available for type checking and to run debugger on Umap itself so we can understand it better. So I kept it mostly for the that purpose, instead of having it be a real test per se. (The random matrix generator is also handy for other developments.) Eventually I don't see this test being useful in the long term.
resolves #169
This PR refactors
PointCloud
into a stand-alone process by inverting its dependencies, removing non-essential responsibilities and adding unit tests.Data filters and sampling will be addressed more comprehensively in a future PR