Store TPCH results in separate table #1506
Conversation
client.register_plugin(TurnOnPandasCOW(), name="enable-cow")
with get_cluster_info(cluster), performance_report, benchmark_time:
    with benchmark_all(client):
        yield client
@hendrikmakait this is the place where we kick off the timing. In this PR I am replacing it with a benchmark_all that is similar to what we're using in the rest of the suite.
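For readers outside the repo, a minimal sketch of a timing context manager along these lines; the actual benchmark_all fixture composes several recorders (memory, task durations, etc.) and stores results rather than printing them:

import contextlib
import time


@contextlib.contextmanager
def benchmark_all(client):
    # Sketch only: time the enclosed block. The real fixture presumably
    # also records memory and task-level metrics for the given client.
    start = time.monotonic()
    try:
        yield
    finally:
        print(f"elapsed: {time.monotonic() - start:.2f}s")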
@@ -62,3 +62,59 @@ class TestRun(Base):
    # Artifacts
    performance_report_url = Column(String, nullable=True)  # Not yet collected
    cluster_dump_url = Column(String, nullable=True)


class TPCHRun(Base):
I found the information that is actually stored lacking for TPCH run comparisons, particularly:
- polars/spark/duckdb versions
- cluster spec information
- query
- scale
- local

and possibly more. Instead of squeezing this into the ordinary table, I introduced a new one (see the sketch below). This collides a little with our CI runs, the dashboard, the combine-db script, etc., but should be easy to fix. I haven't done so yet.
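A rough sketch of what such a dedicated table could look like as a SQLAlchemy model; the column names below are illustrative and not necessarily the exact schema of this PR:

from sqlalchemy import Boolean, Column, DateTime, Float, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class TPCHRun(Base):
    # Illustrative sketch of the dedicated TPCH results table.
    __tablename__ = "tpch_run"

    id = Column(Integer, primary_key=True)
    query = Column(Integer)   # TPC-H query number
    scale = Column(Integer)   # TPC-H scale factor
    local = Column(Boolean)   # local run vs. cloud cluster

    # Library versions, for cross-engine comparisons
    dask_version = Column(String, nullable=True)
    duckdb_version = Column(String, nullable=True)
    pyspark_version = Column(String, nullable=True)
    polars_version = Column(String, nullable=True)

    # Cluster spec information
    n_workers = Column(Integer, nullable=True)
    worker_vm_type = Column(String, nullable=True)

    start = Column(DateTime)
    end = Column(DateTime)
    duration = Column(Float)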
From what I see, this has been taken care of.
This reverts commit 03e87e4.
There is no particular reason for having two migrations other than me not wanting to go back and invalidate everything I have stored locally.
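For context, each such migration is a short Alembic script; a hedged sketch of what creating the new table could look like (revision identifiers and columns are placeholders, not the actual migrations from this PR):

import sqlalchemy as sa
from alembic import op

# Placeholder revision identifiers.
revision = "abc123"
down_revision = "def456"


def upgrade():
    # Create the dedicated TPCH results table (columns abbreviated).
    op.create_table(
        "tpch_run",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("query", sa.Integer),
        sa.Column("scale", sa.Integer),
        sa.Column("duration", sa.Float),
    )


def downgrade():
    op.drop_table("tpch_run")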
yield client


@pytest.fixture(scope="module")
def spark_setup(cluster, local):
This stuff just moved to test_pyspark since it isn't used anywhere else.
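As a rough illustration of what a module-scoped Spark fixture along these lines might do; how the remote session is really obtained in the suite is not shown, and cluster.get_spark() below is an assumption:

import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="module")
def spark_setup(cluster, local):
    # Sketch: build a SparkSession either locally or against the cluster.
    if local:
        spark = SparkSession.builder.master("local[*]").getOrCreate()
    else:
        # Assumed helper; the actual API may differ.
        spark = cluster.get_spark()
    yield spark
    spark.stop()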
tests/tpch/generate_plot.py
I think this entire file is very subjective. I haven't cleaned it up, but I think it can be used later on if one wants to generate other plots.
@hendrikmakait in a follow-up it may make sense to sync this with the colors, etc. that we ended up using.
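As a hedged illustration of the kind of plot such a script could produce from the new table (table and column names follow the model sketch above; the actual generate_plot.py may use a different plotting library and database path):

import matplotlib.pyplot as plt
import pandas as pd
import sqlalchemy

# Hypothetical: read the TPCH results table and plot mean duration per
# query; "benchmark.db" and the column names are assumptions.
engine = sqlalchemy.create_engine("sqlite:///benchmark.db")
df = pd.read_sql_table("tpch_run", engine)
df.groupby("query")["duration"].mean().plot.bar()
plt.xlabel("TPC-H query")
plt.ylabel("mean duration [s]")
plt.tight_layout()
plt.savefig("tpch_durations.png")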
Thanks, @fjetter
This cleans up a bit of our fixture infrastructure. A lot of this is likely subjective.
Most importantly, it additionally stores the TPCH results in a separate table that has a different schema and remembers more information about the test run. Most of this information is unnecessary for ordinary test runs.