AirbyteLib: Add len() support on SQL datasets and Mapping behaviors for ReadResult (#34763) #34763

Merged
aaronsteers merged 8 commits into master from aj/airbyte-lib/usage-feedback on Feb 2, 2024

Conversation

aaronsteers
Collaborator

  1. Make ReadResult inherit from Mapping[str, CachedDataset], including support for .keys() and len().
  2. Add len() support on datasets.
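
For readers unfamiliar with the Mapping pattern, here is a minimal sketch of what inheriting from Mapping provides. The class names SketchDataset and SketchReadResult are hypothetical stand-ins, not the airbyte-lib classes, and the real implementation may differ. Only __getitem__, __iter__, and __len__ are written by hand; .keys(), .values(), .items(), .get(), and the `in` operator come from the abstract base class.

```python
from collections.abc import Iterator, Mapping


class SketchDataset:
    """Hypothetical stand-in for a cached dataset (not the airbyte-lib class)."""

    def __init__(self, records: list[dict]) -> None:
        self._records = records

    def __iter__(self) -> Iterator[dict]:
        return iter(self._records)

    def __len__(self) -> int:
        return len(self._records)


class SketchReadResult(Mapping[str, SketchDataset]):
    """Hypothetical read result keyed by stream name."""

    def __init__(self, streams: dict[str, SketchDataset]) -> None:
        self._streams = streams

    # The three abstract methods that Mapping requires:
    def __getitem__(self, stream_name: str) -> SketchDataset:
        return self._streams[stream_name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._streams)

    def __len__(self) -> int:
        return len(self._streams)


result = SketchReadResult(
    {
        "users": SketchDataset([{"id": 1}, {"id": 2}]),
        "orders": SketchDataset([{"id": 10}]),
    }
)

stream_names = list(result.keys())  # keys() is inherited from Mapping
stream_count = len(result)          # number of streams
user_count = len(result["users"])   # number of records in one stream
```

Because the mixin methods are derived from those three, the result object behaves like a read-only dict without exposing mutation methods.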

@aaronsteers
Collaborator Author

aaronsteers commented Feb 2, 2024

@flash1293 - This responds to some beta user feedback:

  1. Users intuitively expected ReadResult to support a .keys() method. (It didn't, but now it does.)
  2. Counting records required the awkward len(list(result["users"])), with a nested list() call. Now __len__() is implemented, so len(result["users"]) works as expected.
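
To illustrate the second point, here is a rough sketch of a SQL-backed dataset using the standard-library sqlite3 module. SketchSQLDataset is a hypothetical class, and pushing the count down to a COUNT(*) query is an assumption made for illustration; this conversation does not state how the merged __len__() is implemented.

```python
import sqlite3
from collections.abc import Iterator


class SketchSQLDataset:
    """Hypothetical SQL-backed dataset (not the airbyte-lib implementation)."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self._conn = conn
        self._table = table  # assumed to be a trusted identifier

    def __iter__(self) -> Iterator[tuple]:
        # Streams rows from the table one at a time.
        return iter(self._conn.execute(f'SELECT * FROM "{self._table}"'))

    def __len__(self) -> int:
        # Assumption: the database does the counting.
        (count,) = self._conn.execute(
            f'SELECT COUNT(*) FROM "{self._table}"'
        ).fetchone()
        return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "ada"), (2, "grace"), (3, "edsger")],
)

users = SketchSQLDataset(conn, "users")
old_style = len(list(users))  # materializes every row just to count them
new_style = len(users)        # a single COUNT(*) query in this sketch
```

Both calls return the same number; the difference is that the first pulls every row into Python before counting.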

Contributor

@flash1293 flash1293 left a comment

Makes sense as convenience functions - I'm worried a little bit about len being called unexpectedly by something on a lazy dataset and taking a long time to finish, but it's probably OK as it's consistent.

Contributor

@flash1293 flash1293 left a comment

Actually, there are some problems with the implementation, left notes.

airbyte-lib/airbyte_lib/datasets/_sql.py (outdated)
airbyte-lib/airbyte_lib/datasets/_base.py (outdated)
@aaronsteers
Collaborator Author

> Makes sense as convenience functions - I'm worried a little bit about len being called unexpectedly by something on a lazy dataset and taking a long time to finish, but it's probably OK as it's consistent.

Yeah - I ran into the same concern upfront. But on further reflection, if anything is explicitly requesting the number of items, I think it's okay to count them. 🤷 There are plenty of small sources where this is a reasonable thing to do. And for larger sources, it seems the user would be aware of the issue.

The reason I'd revert is if we find there's some implicit or unexpected call to len() that the user would not anticipate.
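
As one concrete example of an implicit call worth watching for: in Python, truth-testing an object that defines __len__() but not __bool__() falls back to __len__(). The sketch below uses a hypothetical SketchLazyDataset (not an airbyte-lib class) whose __len__() has to read every record, with a len_calls counter added purely for illustration.

```python
from collections.abc import Callable, Iterable, Iterator


class SketchLazyDataset:
    """Hypothetical lazy dataset; counting requires reading every record."""

    def __init__(self, source: Callable[[], Iterable[dict]]) -> None:
        self._source = source
        self.len_calls = 0  # instrumentation for this sketch only

    def __iter__(self) -> Iterator[dict]:
        # Each iteration re-reads the underlying source.
        return iter(self._source())

    def __len__(self) -> int:
        self.len_calls += 1
        return sum(1 for _ in self)  # full pass over the source


dataset = SketchLazyDataset(lambda: ({"id": i} for i in range(1000)))

if dataset:  # no explicit len(), but truthiness falls back to __len__()
    pass

implicit_calls = dataset.len_calls  # counts the hidden call made by `if`
explicit_count = len(dataset)       # the explicit, expected call
```

If this ever became a problem, one option would be to define __bool__() separately so that truth tests do not trigger a full count. CPython's list() may also consult __len__() as a size hint, so list(dataset) is another place to check.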

@aaronsteers aaronsteers requested a review from flash1293 February 2, 2024 17:45
@aaronsteers aaronsteers changed the title AirbyteLib: Misc DX improvements from beta feedback AirbyteLib: Add len() support on SQL datasets and Mapping behaviors for ReadResult (#34763) Feb 2, 2024
@aaronsteers aaronsteers merged commit 2bbeb4e into master Feb 2, 2024
19 checks passed
@aaronsteers aaronsteers deleted the aj/airbyte-lib/usage-feedback branch February 2, 2024 18:13
xiaohansong pushed a commit that referenced this pull request Feb 13, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 21, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
xiaohansong pushed a commit that referenced this pull request Feb 27, 2024
2 participants