refactor: allow InMemory to take in non python based entries #3554

universalmind303 · 2024-12-12T01:28:52Z

No description provided.

codspeed-hq · 2024-12-12T01:41:32Z

CodSpeed Performance Report

Merging #3554 will degrade performance by 25.51%

Comparing `universalmind303:refactor-lp-3` (84e96eb) with `main` (07752b8)

Summary

❌ 2 regressions
✅ 25 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| | Benchmark | `main` | `universalmind303:refactor-lp-3` | Change |
|---|---|---|---|---|
| ❌ | `test_count[1 Small File]` | 3.4 ms | 3.9 ms | -11.79% |
| ❌ | `test_iter_rows_first_row[100 Small Files]` | 159.5 ms | 214.2 ms | -25.51% |

codecov · 2024-12-12T18:16:59Z

Codecov Report

Attention: Patch coverage is 75.58685% with 104 lines in your changes missing coverage. Please review.

Project coverage is 77.93%. Comparing base (07752b8) to head (84e96eb).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/common/partitioning/src/lib.rs | 18.36% | 40 Missing ⚠️ |
| src/daft-micropartition/src/partitioning.rs | 67.64% | 33 Missing ⚠️ |
| src/daft-connect/src/translation/logical_plan.rs | 78.57% | 12 Missing ⚠️ |
| ...daft-connect/src/translation/logical_plan/range.rs | 77.77% | 8 Missing ⚠️ |
| ...-connect/src/translation/logical_plan/aggregate.rs | 91.42% | 3 Missing ⚠️ |
| .../daft-connect/src/translation/logical_plan/drop.rs | 91.66% | 2 Missing ⚠️ |
| ...aft-connect/src/translation/logical_plan/filter.rs | 80.00% | 2 Missing ⚠️ |
| ...ft-connect/src/translation/logical_plan/project.rs | 88.88% | 1 Missing ⚠️ |
| ...daft-connect/src/translation/logical_plan/to_df.rs | 94.11% | 1 Missing ⚠️ |
| ...nnect/src/translation/logical_plan/with_columns.rs | 95.00% | 1 Missing ⚠️ |

... and 1 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3554      +/-   ##
==========================================
+ Coverage   77.86%   77.93%   +0.06%     
==========================================
  Files         719      720       +1     
  Lines       88459    88478      +19     
==========================================
+ Hits        68877    68953      +76     
+ Misses      19582    19525      -57

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/daft-connect/src/lib.rs | `62.88% <ø> (ø)` | |
| src/daft-connect/src/op/execute/root.rs | `96.00% <100.00%> (+0.34%)` | ⬆️ |
| src/daft-connect/src/op/execute/write.rs | `78.04% <100.00%> (-0.27%)` | ⬇️ |
| src/daft-connect/src/session.rs | `100.00% <100.00%> (ø)` | |
| src/daft-connect/src/translation/datatype/codec.rs | `47.02% <100.00%> (-0.32%)` | ⬇️ |
| ...ect/src/translation/logical_plan/local_relation.rs | `91.42% <100.00%> (+29.76%)` | ⬆️ |
| .../daft-connect/src/translation/logical_plan/read.rs | `73.68% <100.00%> (ø)` | |
| src/daft-connect/src/translation/schema.rs | `100.00% <ø> (ø)` | |
| src/daft-local-execution/src/pipeline.rs | `88.78% <100.00%> (+0.08%)` | ⬆️ |
| src/daft-local-execution/src/run.rs | `89.69% <100.00%> (+0.16%)` | ⬆️ |

... and 17 more

... and 3 files with indirect coverage changes

andrewgazelka

I'm a little confused in some areas. I think some docs could help.

andrewgazelka · 2024-12-12T19:06:19Z

src/common/partitioning/src/lib.rs

+};
+
+/// Common trait interface for dataset partitioning, defined in this shared crate to avoid circular dependencies.
+/// Acts as a forward reference for concrete partition implementations. _(Specifically the `MicroPartition` type defined in `daft-micropartition`)_


I'm still a little confused about why this is a trait instead of us just being able to use `MicroPartition` directly. Could you explain this more? Or are we expecting to expand on it later? Maybe add that to the docs.

It's pretty much the equivalent of a forward declaration. In `daft-logical-plan`, we don't have a concept of `MicroPartition` yet, and we generally don't want one, because `daft-micropartition` is a pretty heavy crate that we don't want to be part of the logical plan. We used a similar pattern with `Expr::Subquery`.
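
For readers following the thread, the "forward declaration" idea can be sketched roughly as below. This is a minimal illustration under assumptions, not the code in this PR: the names `Partition`, `InMemorySource`, `ToyMicroPartition`, and `first_as_toy` are hypothetical stand-ins, and the three crates are collapsed into one file. The point is only that the plan layer holds trait objects and never names the concrete type.

```rust
use std::any::Any;
use std::sync::Arc;

// --- Shared, lightweight crate (hypothetical stand-in for the common partitioning crate) ---

/// Minimal interface the plan layer needs; it names no concrete partition type.
pub trait Partition: Any + Send + Sync {
    fn num_rows(&self) -> usize;
    /// Lets a downstream crate recover the concrete type behind the trait object.
    fn as_any(&self) -> &dyn Any;
}

// --- Logical-plan side: depends only on the trait above ---

/// Hypothetical in-memory source that stores type-erased partitions.
pub struct InMemorySource {
    pub partitions: Vec<Arc<dyn Partition>>,
}

impl InMemorySource {
    pub fn total_rows(&self) -> usize {
        self.partitions.iter().map(|p| p.num_rows()).sum()
    }
}

// --- "Heavy" crate side: provides the concrete type and implements the trait ---

/// Toy stand-in for a concrete partition type (not the real `MicroPartition`).
pub struct ToyMicroPartition {
    rows: Vec<i64>,
}

impl ToyMicroPartition {
    pub fn new(rows: Vec<i64>) -> Self {
        Self { rows }
    }

    pub fn rows(&self) -> &[i64] {
        &self.rows
    }
}

impl Partition for ToyMicroPartition {
    fn num_rows(&self) -> usize {
        self.rows.len()
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Execution-side helper: downcast the first partition back to the concrete type.
/// Returns `None` if the source is empty or holds a different implementation.
pub fn first_as_toy(source: &InMemorySource) -> Option<&ToyMicroPartition> {
    source
        .partitions
        .first()?
        .as_any()
        .downcast_ref::<ToyMicroPartition>()
}
```

With this shape, only the code that actually executes the plan needs to depend on the crate that defines the concrete partition type.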

samster25 · 2024-12-17T12:51:12Z

src/daft-connect/src/session.rs

@@ -10,6 +11,9 @@ pub struct Session {

    id: String,
    server_side_session_id: String,
+    /// MicroPartitionSet associated with this session
+    /// this will be filled up as the user runs queries
+    pub(crate) pset: Arc<MicroPartitionSet>,


IIUC, `MicroPartitionSet` should only represent the result set of a single query. Should we have a `HashMap<key, MicroPartitionSetRef>` instead?

So I could be abusing the `MicroPartitionSet`, but since it is already a batch of partitions (`pub partitions: DashMap<PartitionId, Vec<Arc<MicroPartition>>>`), it seemed redundant to have what is essentially a `HashMap<String, HashMap<String, Vec<MicroPartition>>>`. That is what I was doing at first, but I found there was currently no need for the outer hashmap. We'll likely need to refactor once we support distributed execution, but we're still a ways away from that for Spark.

So is it usually `GlobalHashMap<String, LocalHashMap<String, ...>>` in a distributed setting?
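
To make the single-level layout discussed in this thread concrete, here is a rough sketch under stated assumptions. It swaps the `DashMap` mentioned above for a standard-library `RwLock<HashMap<..>>` so that it compiles without external crates, assumes `PartitionId` is a `String`, and uses hypothetical names (`ToyPartition`, `ToyPartitionSet`, `ToySession`) rather than the real types.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Assumption: partition ids are plain strings.
type PartitionId = String;

/// Toy stand-in for a partition of data.
pub struct ToyPartition {
    pub rows: Vec<i64>,
}

/// One flat map from partition id to a batch of partitions.
/// `RwLock<HashMap>` is used here in place of a concurrent map.
#[derive(Default)]
pub struct ToyPartitionSet {
    partitions: RwLock<HashMap<PartitionId, Vec<Arc<ToyPartition>>>>,
}

impl ToyPartitionSet {
    /// Store (or overwrite) the partitions registered under `id`.
    pub fn insert(&self, id: &str, parts: Vec<Arc<ToyPartition>>) {
        self.partitions
            .write()
            .unwrap()
            .insert(id.to_string(), parts);
    }

    /// Return cheap `Arc` clones of the partitions registered under `id`.
    pub fn get(&self, id: &str) -> Option<Vec<Arc<ToyPartition>>> {
        self.partitions.read().unwrap().get(id).cloned()
    }

    /// Number of distinct ids currently stored.
    pub fn len(&self) -> usize {
        self.partitions.read().unwrap().len()
    }
}

/// The session owns a single shared set that fills up as queries run,
/// instead of an outer map holding one set per query.
pub struct ToySession {
    pub id: String,
    pub pset: Arc<ToyPartitionSet>,
}

impl ToySession {
    pub fn new(id: &str) -> Self {
        Self {
            id: id.to_string(),
            pset: Arc::new(ToyPartitionSet::default()),
        }
    }
}
```

In this sketch, results from different queries are distinguished only by their id inside the one set. An outer map, as raised in the question above, would wrap this structure if per-query or per-node isolation became necessary.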

andrewgazelka

I want to get this in as soon as possible even if it is not perfect because a lot of my work depends on this.

andrewgazelka · 2024-12-18T20:46:56Z

I will be glad to help fix things in future, smaller PRs if we run into any issues.

i think this works

3273edc

github-actions bot added the refactor label Dec 12, 2024

renaming

b9c92c2

universalmind303 requested review from andrewgazelka and samster25 December 12, 2024 01:36

universalmind303 added 3 commits December 12, 2024 10:28

fix spark-connect

ff6d0a1

fix spark-connect

6812bde

fix spark-connect

d9ca5bb

universalmind303 marked this pull request as ready for review December 12, 2024 17:37

universalmind303 added 2 commits December 12, 2024 11:38

remove unused code

dbeab06

machete

e15124d

universalmind303 mentioned this pull request Dec 12, 2024

feat(connect): df.show #3560

Merged

andrewgazelka reviewed Dec 12, 2024

pr feedback

6df5fe2

universalmind303 requested a review from andrewgazelka December 12, 2024 20:16

samster25 approved these changes Dec 17, 2024

universalmind303 added 11 commits December 17, 2024 10:01

code cleanup

623c838

code cleanup

73214a4

pr feedback

2d3e7a3

Merge branch 'main' of https://github.com/Eventual-Inc/Daft into refa…

12e0205

…ctor-lp-3

add a test

b11d00e

add a test

3eb0c90

add a test

421c1ce

more comments

c2f525c

fix typo

7b4e8fa

code cleanup

754d805

code cleanup

84e96eb

andrewgazelka approved these changes Dec 18, 2024

universalmind303 merged commit 6602502 into Eventual-Inc:main Dec 18, 2024
39 of 41 checks passed

universalmind303 added a commit that referenced this pull request Dec 18, 2024

feat(connect): df.show (#3560)

ca4d3f7

depends on #3554 [see here for proper diff](https://github.com/universalmind303/Daft/compare/refactor-lp-3...universalmind303:Daft:connect_show?expand=1)

universalmind303 deleted the refactor-lp-3 branch January 23, 2025 06:35
