-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: allow InMemory to take in non python based entries #3554
refactor: allow InMemory to take in non python based entries #3554
Conversation
CodSpeed Performance ReportMerging #3554 will degrade performances by 25.51%Comparing Summary
Benchmarks breakdown
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im a little confused in some areas. I think some docs could help
src/common/partitioning/src/lib.rs
Outdated
}; | ||
|
||
/// Common trait interface for dataset partitioning, defined in this shared crate to avoid circular dependencies. | ||
/// Acts as a forward reference for concrete partition implementations. _(Specifically the `MicroPartition` type defined in `daft-micropartition`)_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still a little bit confused why this is a trait instead of us just being able to use MicroPartition directly. Maybe explain this more? Or are we expecting to expand more. add to docs maybe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pretty much the equivalent of a forward declaration. In daft-logical-plan
, we don't have a concept of MicroPartition
yet, and we generally dont want to because 'daft-micropartition' is a pretty heavy crate that we don't want part of the logical plan. We did a similar pattern with Expr::Subquery
src/daft-connect/src/session.rs
Outdated
@@ -10,6 +11,9 @@ pub struct Session { | |||
|
|||
id: String, | |||
server_side_session_id: String, | |||
/// MicroPartitionSet associated with this session | |||
/// this will be filled up as the user runs queries | |||
pub(crate) pset: Arc<MicroPartitionSet>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iiuc MicroPartitionSet
should only represent the result set of a single query. Should we have a Map of HashMap<key, MicroPartitionSetRef>
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so i could be abusing the MicroPartitionSet
, but since MicroPartitionSet
is already a batch of partitions pub partitions: DashMap<PartitionId, Vec<Arc<MicroPartition>>>,
, it seemed redundant to have essentially HashMap<String, HashMap<String, Vec<MicroPartition>>
this is what I was actually doing at first, but found there was currently no need for the outer hashmap. We'll likely need to refactor once we support distributed, but we're still a way away from that for spark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so is it usually
GlobalHashMap<String, LocalHashMap<String, ...>>
in a distributed setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to get this in as soon as possible even if it is not perfect because a lot of my work depends on this.
will be glad to help fix things in future in smaller PRs if there are any issues we run into |
depends on #3554 [see here for proper diff](https://github.com/universalmind303/Daft/compare/refactor-lp-3...universalmind303:Daft:connect_show?expand=1)
No description provided.