Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store TPCH results in separate table #1506

Merged
merged 50 commits into from
May 27, 2024
Merged

Store TPCH results in separate table #1506

merged 50 commits into from
May 27, 2024

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Apr 17, 2024

This cleans up a bit about our fixture infrastructure. A lot of this is likely subjective.

Most importantly, this stores the TPCH results additionally in a separate table which has a different schema and remembers more information about the test run. Most of this information is unnecessary for the ordinary test run

client.register_plugin(TurnOnPandasCOW(), name="enable-cow")
with get_cluster_info(cluster), performance_report, benchmark_time:
with benchmark_all(client):
yield client
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hendrikmakait this is the place where we are kicking off the timing. In this PR I am replacing this with a benchmark_all that is similar to what we're using in the rest of the suite

@@ -62,3 +62,59 @@ class TestRun(Base):
# Artifacts
performance_report_url = Column(String, nullable=True) # Not yet collected
cluster_dump_url = Column(String, nullable=True)


class TPCHRun(Base):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the information that is actually stored lacking for TPCH run comparisons. Particularly

  • polars/spark/duckdb versions
  • cluster spec information
  • query
  • scale
  • local

possibly more. Instead of squeezing this into the ordinary table I introduced a new one. This collides a little with our CI runs, the dashboard, the combine-db script, etc. but should be easily fixed. Haven't done so, yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see, this has been taken care of.

@fjetter fjetter changed the title New db tpch Store TPCH results in separate table May 21, 2024
@fjetter fjetter marked this pull request as ready for review May 21, 2024 13:55
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no particular reason for having two migrations other than me not wanting to go back invalidating everything I have stored locally.

yield client


@pytest.fixture(scope="module")
def spark_setup(cluster, local):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stuff just moved to test_pyspark since it isn't used anywhere else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this entire file is very subjective. I haven't cleaned it up but I think it can be used later on if one wants to generate otehr plots.

@hendrikmakait in a follow up it may make sense to sync this with the colors, etc. that we ended up using

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @fjetter

@hendrikmakait hendrikmakait merged commit 4e3ec1a into main May 27, 2024
13 of 15 checks passed
@hendrikmakait hendrikmakait deleted the new_db_tpch branch May 27, 2024 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants