-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
always include sys signals #81
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM so far. I guess it makes sense to handle this only in DataChain
but then we should try to remove all special handling for sys columns from DatasetQuery
.
I don't like that _select()
and select()
are so similar - it would probably be better to add an undocumented flag, but feel free to clean-up however you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An question is inline. Please answer.
In general, that's very interesting idea but I need time to think a bit more about this. What i'm not sure about right now:
- id & rand are technical signals that are required for map and shuffle. I'm not sure we need to expose this to app level. it feels like it belongs to DB level.
- this approach requires downloading ID & RAND all the time. In some cases it triples amount of data to transfer. In DB level it can be easily optimized.
- An open question: can we make RAND optional? So, it's created only when user asks. It will require separate datasets and shuffled/randomized dataset.
It might be hard to do in db level, due to subqueries. We won't always know if
|
We do know! It should be added in 100% of the cases 🙂 |
This PR
DatasetQuery.results()
andDatasetQuery.to_records()
todb_results()
/to_db_records()
.select(...)
, and removes them onas_iterable
.TODO