Implement sys feature, and rename id/random columns #28
Conversation
It feels like the logic should be the opposite: the fields should be excluded by default everywhere, and the user can enable them in specific API calls, e.g. `from_storage(sys=True)` or `from_dataset(sys=True)`.
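The opt-in behaviour proposed above could be sketched roughly as follows. This is a hypothetical illustration, assuming `from_dataset` takes a `sys` flag and that the system columns are `id` and `random` (per the PR title); it is not the real API.

```python
# Sketch: system columns are hidden by default and only returned on request.
# `from_dataset`, `SYS_COLUMNS`, and the row shape are illustrative assumptions.
SYS_COLUMNS = ("id", "random")

def from_dataset(rows, sys=False):
    """Return rows as dicts, dropping system columns unless sys=True."""
    if sys:
        return [dict(r) for r in rows]
    return [{k: v for k, v in r.items() if k not in SYS_COLUMNS} for r in rows]

rows = [{"id": 1, "random": 42, "name": "cat.jpg"}]
print(from_dataset(rows))            # system columns excluded by default
print(from_dataset(rows, sys=True))  # caller opts in to see them
```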
We have a lot of internals that depend on these columns. E.g. all of the indexing queries, UDFs, and joins require them. Another thing is, we try to preserve data as much as possible: `DatasetQuery(name="test").save()` preserves all of the columns, including `id` and `random`.
Ideally, we need a notion of default columns, in which:

    DatasetQuery.select(C.id, C.random, *query.columns).results()  # contains `id` and `random` columns
    DatasetQuery.results()  # excludes `id` and `random` columns

But this is very hard to differentiate at the query level, as we might get columns from other queries or tables.
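The "default columns" idea above can be illustrated with a minimal sketch: a query drops system columns in `results()` unless they were selected explicitly. The `Query` class and `explicit` flag here are illustrative assumptions, and the sketch deliberately shows only the easy case; as noted above, distinguishing explicit from implicit columns is much harder once columns flow in from other queries or tables.

```python
# Sketch: results() excludes sys columns only when no explicit select() ran.
# `Query`, `select`, and `results` are stand-ins, not the real DatasetQuery API.
SYS_COLUMNS = {"id", "random"}

class Query:
    def __init__(self, columns):
        self.columns = list(columns)
        self.explicit = None  # None means no select() was applied

    def select(self, *cols):
        q = Query(self.columns)
        q.explicit = list(cols)  # user asked for these columns explicitly
        return q

    def results(self):
        if self.explicit is not None:
            return self.explicit  # honour the explicit selection as-is
        return [c for c in self.columns if c not in SYS_COLUMNS]

q = Query(["id", "random", "name"])
print(q.select("id", "random", "name").results())  # explicit: keeps sys columns
print(q.results())                                 # default: excludes them
```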
I'm talking about the user-facing API. Practically, it means the request should look like the `sys=True` calls above. Whether we also do that in the lower-level calls is optional until we can make it work in `map()`.
Looks good! Thank you.
    with super().select(*db_signals).as_iterable() as rows:
        yield from rows

    def results(
Shouldn't it be `collect_flatten()`?
What should we do for the existing `results` and `to_records`?
        self, row_factory: Optional[Callable] = None, **kwargs
    ) -> list[tuple[Any, ...]]:
        rows = self.iterate_flatten()
        if row_factory:
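A simplified, self-contained sketch of how a `row_factory` hook like the one in the diff can be used: each flattened row is passed through the factory before being collected. The `Rows` class and `iterate_flatten` body here are stand-ins assumed for illustration, not the real implementation.

```python
# Sketch: an optional row_factory transforms each flattened row tuple
# before collection. `Rows` is a stand-in for the real query class.
from typing import Any, Callable, Optional

class Rows:
    def __init__(self, data):
        self._data = data

    def iterate_flatten(self):
        # stand-in for the real flattening iterator
        yield from self._data

    def results(self, row_factory: Optional[Callable] = None) -> list[Any]:
        rows = self.iterate_flatten()
        if row_factory:
            # apply the factory lazily to each row
            rows = (row_factory(r) for r in rows)
        return list(rows)

r = Rows([(1, "a"), (2, "b")])
print(r.results())                                             # plain tuples
print(r.results(row_factory=lambda t: {"id": t[0], "name": t[1]}))
```

This also hints at why the hook makes `to_records()` easy to implement: it is just `results()` with a dict-building factory.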
Do we need this? It looks smart, but the usage is not clear. If it's needed, we should probably introduce it in other methods too.
It's just trying to imitate what we do in `DatasetQuery.results()`. It also makes it easier to implement `to_records()`.
@@ -1515,30 +1506,35 @@ def offset(self, offset: int) -> "Self":
        query.steps.append(SQLOffset(offset))
        return query

    def as_scalar(self) -> Any:
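A hedged sketch of what an `as_scalar()` helper conventionally does (as in SQLAlchemy's scalar results): execute the query and return the first column of the first row. The `Query` class below is a stand-in assumed for illustration; the real `as_scalar` in this PR may differ.

```python
# Sketch: as_scalar() returns the first column of the first result row.
# `Query` and its `results()` are illustrative stand-ins.
from typing import Any

class Query:
    def __init__(self, rows):
        self._rows = rows

    def results(self) -> list:
        return list(self._rows)

    def as_scalar(self) -> Any:
        # first column of the first row of the materialized results
        return self.results()[0][0]

print(Query([(7, "x")]).as_scalar())  # → 7
```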
Good idea.
Check individual commits.