Notebook/fix vector search #448

Merged
merged 1 commit into from
Jul 18, 2023

Conversation

@blythed (Collaborator) commented Jul 15, 2023

Description

The vector-index notebooks broke after the large refactoring that converted all Component subclasses
to @dataclass. This PR makes the changes needed to get them working again.

Related Issue(s)

This points to the need to smoke-test at least some of the notebooks (#389).

@blythed blythed requested a review from thejumpman2323 July 15, 2023 13:51
@@ -16,6 +16,15 @@ class Component(Serializable):

variety: t.ClassVar[str]

def init(self, db):
Contributor

?

@dc.dataclass
-class Model(Component):
+class Model(PredictMixin, Component):
Contributor

Model(Component, PredictMixin)
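
For context on why the base-class order matters: Python resolves attributes along the method resolution order (MRO), which follows the order in which the bases are listed. The sketch below uses hypothetical stand-in classes, not the real superduperdb `Component` or `PredictMixin`, and assumes both define a method with the same name. Whether the real classes overlap this way is not shown in this PR.

```python
# Hypothetical stand-ins; the real Component and PredictMixin are not reproduced here.
class Component:
    def predict(self, x):
        return "Component.predict"


class PredictMixin:
    def predict(self, x):
        return "PredictMixin.predict"


class MixinFirst(PredictMixin, Component):
    """Mixin listed first: its methods win on name clashes."""


class ComponentFirst(Component, PredictMixin):
    """Component listed first: the mixin's methods are shadowed on name clashes."""


# The MRO makes the lookup order explicit.
mixin_first_mro = [c.__name__ for c in MixinFirst.__mro__]
component_first_mro = [c.__name__ for c in ComponentFirst.__mro__]

print(mixin_first_mro)
print(component_first_mro)
```

If the two classes share no attribute names, either order behaves the same; listing the mixin first is the usual convention when the mixin is meant to override the base.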

from superduperdb.encoders.torch.tensor import tensor

t = tensor(torch.float, shape=(64,))

Contributor

You can remove the extra blank lines throughout this file.

if isinstance(r, dict):
    out = []
    for k, v in r.items():
        if isinstance(v, dict) and 'file_id' in v:
Contributor

A nitpick: we could turn these hardcoded strings into constants.
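
A minimal sketch of what that could look like. The constant names and the helper `is_artifact_reference` are hypothetical and not part of the PR; only the `'file_id'` and `'serializer'` keys come from the diff.

```python
# Hypothetical module-level constants for the dictionary keys used in the diff.
FILE_ID_KEY = "file_id"
SERIALIZER_KEY = "serializer"


def is_artifact_reference(value):
    """Return True if `value` looks like a stored-artifact reference.

    Illustrative helper: it mirrors the check
    `isinstance(v, dict) and 'file_id' in v` from the diff.
    """
    return isinstance(value, dict) and FILE_ID_KEY in value


record = {"weights": {FILE_ID_KEY: "abc123", SERIALIZER_KEY: "pickle"}, "name": "model"}
references = [k for k, v in record.items() if is_artifact_reference(v)]
print(references)
```

Defining the keys once means a typo becomes a `NameError` at import time instead of a silent miss at lookup time.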

    return hash(self.artifact)
except TypeError as e:
    if isinstance(self.artifact, list):
        return hash(str(self.artifact[:100]))
Contributor

:100 magic number?

Collaborator Author

This is just to handle cases where something is not hashable.
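
One way to address the magic-number concern while keeping the fallback is to name the limit. This is a sketch only: `MAX_ITEMS_FOR_HASH` and `fallback_hash` are hypothetical names, and re-raising for non-list values is an assumption, since the diff does not show what the original does in that branch.

```python
# Hypothetical name for the slice limit that appears as `[:100]` in the diff.
MAX_ITEMS_FOR_HASH = 100


def fallback_hash(artifact):
    """Hash `artifact`, falling back to a string-based hash for unhashable lists."""
    try:
        return hash(artifact)
    except TypeError:
        if isinstance(artifact, list):
            # Only the first MAX_ITEMS_FOR_HASH items contribute, so two lists
            # that differ only after that point get the same hash.
            return hash(str(artifact[:MAX_ITEMS_FOR_HASH]))
        # Assumption: the original's handling of other unhashable types is not shown.
        raise


long_a = list(range(150))
long_b = list(range(100)) + [0] * 50
print(fallback_hash(long_a) == fallback_hash(long_b))
```

The truncation trades collision resistance for speed on large lists, which is acceptable for a hash as long as equality checks do not rely on it alone.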

db: 'BaseDatabase' = None, # type: ignore[name-defined]
select: t.Optional[Select] = None,
distributed: bool = False,
ids: t.Optional[t.List[str]] = None,
Contributor

t.Sequence

Collaborator Author

This seems to cause problems with mypy.
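
The thread does not say which mypy errors appeared. As a general illustration of the trade-off between the two annotations, here is a sketch with hypothetical function names: `t.Sequence[str]` is more permissive for callers, but it also admits a bare `str`, because `str` is itself a sequence of `str`.

```python
import typing as t


def count_ids_list(ids: t.Optional[t.List[str]] = None) -> int:
    """Accepts only lists; a type checker rejects tuples and bare strings."""
    return 0 if ids is None else len(ids)


def count_ids_sequence(ids: t.Optional[t.Sequence[str]] = None) -> int:
    """Accepts lists and tuples, but also a bare string."""
    return 0 if ids is None else len(ids)


# Type-checks under the Sequence annotation, yet each character is
# treated as a separate id, which is usually not what the caller meant.
print(count_ids_sequence("abc"))

# Tuples are accepted by the Sequence variant without conversion.
print(count_ids_sequence(("id-1", "id-2")))
```

Another common friction point is that a value typed as `t.Sequence[str]` cannot be passed on to code that is annotated as `t.List[str]` without a conversion such as `list(ids)`.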



def load_artifacts(d, getter, cache):
    if isinstance(d, dict) or isinstance(d, list):
Contributor

if isinstance(d, (dict, list))
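
A short sketch of the combined check. `isinstance` accepts a tuple of types as its second argument, so the two calls collapse into one; passing a list of types instead raises `TypeError`. The helper name `is_container` is hypothetical.

```python
def is_container(d):
    """Equivalent to `isinstance(d, dict) or isinstance(d, list)`."""
    return isinstance(d, (dict, list))


print(is_container({}), is_container([]), is_container("text"))

# A list of types is not a valid second argument to isinstance.
try:
    isinstance({}, [dict, list])
    list_form_raises = False
except TypeError:
    list_form_raises = True
print(list_form_raises)
```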

serializer=v['serializer'],
)
d[k] = cache[v['file_id']]
elif isinstance(v, dict) or isinstance(v, list):
Contributor

same here

@@ -36,6 +36,7 @@
# to the rest of the code.
# It should be moved to the Server's initialization code where it can be available to
# all threads.
from ...core.artifact import Artifact, get_artifacts, infer_artifacts
from ...core.job import FunctionJob, ComponentJob, Job
Contributor

I think we should be consistent with the import style.

superduperdb/datalayer/base/database.py (outdated, resolved)
@nenb nenb self-requested a review July 18, 2023 18:47
@blythed blythed merged commit 71a8825 into superduper-io:main Jul 18, 2023
@blythed blythed deleted the notebook/fix-vector-search branch July 25, 2023 19:06

3 participants