Refactor datatype with simpler design #2647

Merged

@blythed blythed commented Nov 26, 2024

The main changes are in superduper/components/datatype.py.

  • Remove the composable form datatype = Datatype('my-dt', encoder=func_1, decoder=func_2); datatypes are now defined by subclassing
  • Remove the complexity and logical nesting of Encodable
  • Move all bytes encoding to the Schema, which disables inline encoding without a Schema
  • Remove much of the complexity from the Document.encode output, since encodables are no longer needed
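
As a rough illustration of the first and third points, the sketch below defines a datatype by subclassing and lets a schema do the bytes encoding. SketchDataType, JSONType, SketchSchema, encode_data and decode_data are hypothetical names chosen for this example; they are not taken from the diff and may not match the actual classes or signatures in superduper.

```python
import json
from abc import ABC, abstractmethod


class SketchDataType(ABC):
    """Hypothetical base class: behaviour comes from subclassing,
    not from encoder/decoder functions passed to a constructor."""

    @abstractmethod
    def encode_data(self, item):
        """Convert a Python object to bytes."""

    @abstractmethod
    def decode_data(self, data):
        """Convert bytes back to a Python object."""


class JSONType(SketchDataType):
    """A concrete datatype defined purely by overriding two methods."""

    def encode_data(self, item):
        return json.dumps(item).encode("utf-8")

    def decode_data(self, data):
        return json.loads(data.decode("utf-8"))


class SketchSchema:
    """Hypothetical schema: the only place where bytes encoding happens."""

    def __init__(self, fields):
        # Maps a field name to the datatype responsible for that field.
        self.fields = fields

    def encode(self, record):
        # Fields without a registered datatype pass through unchanged.
        return {
            key: self.fields[key].encode_data(value) if key in self.fields else value
            for key, value in record.items()
        }

    def decode(self, record):
        return {
            key: self.fields[key].decode_data(value) if key in self.fields else value
            for key, value in record.items()
        }


schema = SketchSchema({"payload": JSONType()})
encoded = schema.encode({"id": 1, "payload": {"a": [1, 2]}})
decoded = schema.decode(encoded)
```

In this sketch the encoded record is a flat dictionary in which only the schema-registered field holds bytes, which is the kind of simplification the last bullet describes for Document.encode.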

@blythed blythed force-pushed the refactor/2634/create-inbuilt-datatype branch 2 times, most recently from 91c6329 to a667d6f Compare November 26, 2024 11:45
@blythed blythed changed the title Refactor types to use simpler design Refactor datatype with simpler design Nov 26, 2024
@blythed blythed force-pushed the refactor/2634/create-inbuilt-datatype branch 21 times, most recently from fa4881f to 2f516fe Compare November 26, 2024 18:09
@blythed blythed marked this pull request as ready for review November 26, 2024 18:11
@blythed blythed force-pushed the refactor/2634/create-inbuilt-datatype branch 2 times, most recently from c955598 to d8c1197 Compare November 26, 2024 18:17
Comment on lines 42 to 45
_fields = {
    'object': dill_serializer,
    'postprocess': dill_serializer,
    'preprocess': dill_serializer,
}
Collaborator

It should be t.ClassVar. There are similar cases in other areas as well.

Collaborator Author

It is ClassVar by default.
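
For context on this exchange: assuming the component is an ordinary dataclasses.dataclass (an assumption, since the diff does not show the decorator), an attribute assigned without a type annotation is never collected as a dataclass field, so it behaves as a class-level attribute with or without t.ClassVar. A minimal sketch, with hypothetical class names and string placeholders instead of dill_serializer:

```python
import dataclasses as dc
import typing as t


@dc.dataclass
class ImplicitClassAttr:
    identifier: str
    # No annotation: the dataclass machinery ignores this name entirely,
    # so it stays a plain class attribute shared by all instances.
    _fields = {'object': 'dill'}


@dc.dataclass
class ExplicitClassVar:
    identifier: str
    # Annotated with ClassVar: also excluded from the generated fields,
    # and the intent is visible to readers and type checkers.
    _fields: t.ClassVar[dict] = {'object': 'dill'}


implicit_names = [f.name for f in dc.fields(ImplicitClassAttr)]
explicit_names = [f.name for f in dc.fields(ExplicitClassVar)]
```

Both lists contain only identifier, so under that assumption the two spellings behave the same at runtime; the explicit annotation mainly helps static analysis.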

@blythed blythed force-pushed the refactor/2634/create-inbuilt-datatype branch 2 times, most recently from 98bd8d2 to 0302e5b Compare November 27, 2024 08:56
@@ -161,8 +161,7 @@ def _fit_with_dataloaders(
    self.append_metrics(all_metrics)
    self.log(fold='VALID', iteration=iteration, **all_metrics)
    if self.saving_criterion():
        db.replace(model, upsert=True)
        self.changed.update({'all_metrics', 'optimizer_state'})
Collaborator

Is this change intentional? @blythed

    )
    old_uuid = info['uuid']
except FileNotFoundError:
    pass
Collaborator

This should log a warning instead of passing silently.

Collaborator Author

Agreed.
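
One way the agreed change could look is sketched below. This is an assumption about the shape of the fix, not the code that was merged: read_old_uuid, the JSON file layout and the logger setup are all hypothetical, and only the 'uuid' key and the FileNotFoundError handling come from the snippet under review.

```python
import json
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def read_old_uuid(path):
    """Return the 'uuid' stored in a JSON info file, or None if the file is missing.

    Hypothetical helper: the file format and function name are assumptions.
    """
    old_uuid = None
    try:
        with open(path) as f:
            info = json.load(f)
        old_uuid = info['uuid']
    except FileNotFoundError:
        # Surface the missing file instead of swallowing it with `pass`.
        logger.warning('No existing info file found at %s; skipping old uuid', path)
    return old_uuid


# Usage: a path inside a fresh temporary directory is guaranteed not to exist yet.
workdir = tempfile.mkdtemp()
missing_path = os.path.join(workdir, 'info.json')
missing_result = read_old_uuid(missing_path)

# After writing the file, the stored uuid is returned.
with open(missing_path, 'w') as f:
    json.dump({'uuid': 'abc123'}, f)
found_result = read_old_uuid(missing_path)
```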

@@ -328,6 +366,9 @@ def create_jobs(

depends = [d for d in attr.depends if d in triggers]

# if attr_name == 'validate_in_db':
# import pdb; pdb.set_trace()
Collaborator

Remove this commented-out debugging code.


# :param shape: The shape of the vector
# :param identifier: The identifier of the vector
# """
Collaborator

There is a lot of commented-out code here.

# :param shape: The shape of the array.
# :param bytes_encoding: The bytes encoding to use.
# :param encodable: The encodable to use.
# """
Collaborator

More commented-out code.

@blythed blythed force-pushed the refactor/2634/create-inbuilt-datatype branch from 0302e5b to 534442c Compare November 27, 2024 10:03
@blythed blythed merged commit b39d5da into superduper-io:main Nov 27, 2024
15 checks passed
3 participants