Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement serialization and deserialization for file dict #2

Merged
merged 2 commits into from
Mar 1, 2023

Conversation

asikowitz
Copy link
Collaborator

Doing the large-pr in acryldata flow, but in your personal branch.

I opted to make serializer and deserializer required arguments, because adding defaults as json.dumps and json.loads respectively would add extra overhead for basic FileBackedDict[str] or FileBackedDict[int] objects, when the identify function would be sufficient. I don't think it's too onerous to have to specify your serializer and deserializer, even if it's just the identity function or json functions, and in general something the user should be thinking about when they're using FileBackedDict

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@asikowitz asikowitz requested a review from hsheth2 March 1, 2023 17:52
Comment on lines +125 to +129
n_deleted = self._conn.execute(
"DELETE FROM data WHERE key = ?", (key,)
).rowcount
if not in_cache and not n_deleted:
raise KeyError(key)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing self[key] will cause a deserialization call, so I reimplemented using the rowcount returned by the delete call

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

Comment on lines +159 to +160
def __del__(self) -> None:
self.close()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case we forget to close, do it on garbage collection. Have confirmed close() can be called multiple times

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool thanks

@asikowitz asikowitz merged commit a0c6d85 into hsheth2:file-dict Mar 1, 2023
@asikowitz asikowitz deleted the file-dict-serialization branch March 1, 2023 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants