Add pg stats #2

preetansh · 2020-10-29T03:42:51Z

Welcome to the PR tracker for NoisePage! We're excited that you're interested in improving our system.

Before continuing with opening a PR, please read through the Pull Request Process page on our wiki. PRs that do not follow our guidelines will be immediately closed. In general, avoid opening a PR until you have tested locally and are reasonably confident that the tests will pass.

Please choose the appropriate labels on the GitHub panel and feel free to assign yourself. Additionally, if your PR solves an open issue, please link the issue on the GitHub panel. However, please DO NOT assign any reviewers to your PR. We will decide who is best suited to review it.

Heading

Please choose an appropriate heading for your PR, relevant to the changes you have made. For example, if your PR addresses the LIMIT clause on IndexScans, an appropriate PR name would be Index Scan Limit.

Description

Please describe the issue your PR solves and how you went about implementing your solution. An example from a PR by @thepinetree follows:

Limit clauses are currently not propagated to the IndexScanPlanNode in the optimizer, and as a result, the execution engine can't take advantage of the limit during operation. Instead, the limit is applied afterwards by a LimitPlanNode once the index scan has completed.

This PR adds functionality for the limit value to be pushed down to an index scan, and this pushdown is used in TPC-C. Limit values are pushed down to their child LogicalGet via a transformation rule and converted to values in the PhysicalIndexScan, which are then set in the IndexScanPlanNode. To act on the Limit value appropriately, we also add infrastructure for optional properties for a child to satisfy, which is tracked only in an Optimizer node. The PR also moves OrderByOrderingType from the optimizer to the catalog as a precursor to further changes that involve the sort direction of columns in creating/scanning an index.
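As a rough illustration of why pushing the limit into the scan helps, here is a toy C++ model, not NoisePage code: a sorted std::vector stands in for the index, and IndexRangeScan and ApplyLimit are hypothetical names. Both paths return the same rows, but the pushed-down version stops touching index entries as soon as the limit is reached.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for an ordered index: keys kept in sorted order.
using SortedIndex = std::vector<int>;

// Range scan over [lo, hi]. If `limit` is set (the pushed-down case), the scan
// stops once that many rows have been produced. `visited` counts index
// entries touched, to show the work saved.
std::vector<int> IndexRangeScan(const SortedIndex &index, int lo, int hi,
                                std::optional<std::size_t> limit,
                                std::size_t *visited) {
  std::vector<int> out;
  *visited = 0;
  // lower_bound jumps to the first key >= lo, as an ordered index would.
  for (auto it = std::lower_bound(index.begin(), index.end(), lo);
       it != index.end() && *it <= hi; ++it) {
    if (limit && out.size() >= *limit) break;  // pushed-down limit: stop early
    ++*visited;
    out.push_back(*it);
  }
  return out;
}

// The "in-post" alternative: a separate limit step truncates the full result.
std::vector<int> ApplyLimit(std::vector<int> rows, std::size_t limit) {
  if (rows.size() > limit) rows.resize(limit);
  return rows;
}
```

With keys 1..10, a range of [3, 9], and a limit of 2, both approaches yield {3, 4}, but the scan-then-limit path touches 7 entries while the pushed-down scan touches only 2.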

Remaining Tasks

Again, you should only create a PR once you are reasonably confident it is near completion. However, if some tasks still remain before the PR is ready to merge, please create a checklist to track progress. An example from a PR by @thepinetree follows:

📌 TODOs:

~~Stash limit in OptimizerContext for pushdown (INVALID)~~
Move ordering type to catalog
Add transformation rule for limit pushdown
Add optional property support
Fix memory leaks
Add GitHub issues for OrderingType investigation, physical prune stage, and TPL break statement

Performance

If your PR has the potential to greatly affect the performance of the system, please address this by benchmarking your changes against master or by profiling performance. You may do this in one of the following ways:

Inline a table outlining performance results. An example from a PR by @gonzalezjo follows:

Machine Type	Terminals	Scale Factor	Socket Type	Transactions / Second
Bare metal	10	10	UNIX	8346 (+34%)
Bare metal	10	10	INET	6214
Bare metal	1	1	UNIX	914 (+37%)
Bare metal	1	1	INET	668
VMware	10	10	UNIX	3728 (+24%)
VMware	10	10	INET	3005
VMware	1	1	UNIX	289 (+28%)
VMware	1	1	INET	226
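The bracketed percentages appear to be the UNIX-socket throughput gain over the INET row with the same machine type, terminals, and scale factor (an inference from the numbers; the table does not say so explicitly). The hypothetical helper below reproduces them, rounding to the nearest whole percent:

```cpp
#include <cassert>
#include <cmath>

// Percentage throughput gain of `candidate` over `baseline`, rounded to the
// nearest whole percent, matching the "+34%" style used in the table.
int PercentGain(double candidate, double baseline) {
  assert(baseline > 0);  // a zero baseline has no meaningful relative gain
  return static_cast<int>(std::lround((candidate / baseline - 1.0) * 100.0));
}
```

For example, PercentGain(8346, 6214) gives 34, and PercentGain(289, 226) gives 28, consistent with the bare-metal and VMware rows above.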

Create a Google Sheets document noting baseline performance and scalability, as is done here in an example from @mbutrovich
Create an SVG of profiling results per the EC2 profiling instructions, as is done here in an example from @thepinetree

Further Work

If your PR unlocked the potential for further improvement, please note them here and create additional issues! Do the same if you discovered bugs in the process of development. An example from a PR by @gonzalezjo follows:

Investigating loopback and TCP overhead

This is probably a dead end, but less of a dead end than libevent/epoll stuff.

I'd like to work on improving our INET socket overhead, but at this point, that might not be doable without being a kernel engineer. Nothing I tried measurably reduced loopback and/or INET overhead, and I tried a lot. Still, I think it's worth digging some more.

One question I have is how much of the speedup comes from avoiding TCP, and how much comes from using a glorified pipe instead of loopback. This would be interesting to know, but I have no idea how I'd measure, and I'm not convinced that I'd be able to make anything useful from the answer.

Here's an empty template to format yourself!

Heading

Description

Remaining Tasks

Foo
Bar
Baz

Performance

Further Work


gengkev and others added 14 commits April 28, 2020 03:34

Initial work to create pg_statistic catalog

f97f7b2

Add TODOs for future updates

2ad1dc7

Add extra field numrows

c3e283a

create index entry and set table pointer when creating pg_statistic catalog

78e5dc0

Add and delete from pg_statistic for table columns

451b08f

Probably needs to be tested

Fix copypaste bugs, call DeleteColumnStatistics

b9add47

Problem: dropping tables doesn't seem to work. Is this our fault?
* Fix more copypaste errors
* More copy paste bugs!!!!!!!!!!!

Move statistics code to bottom of database_catalog

1d179cc

Add template instantiation for statistic functions

628e5c0

Implement GetTableStats (?)

99f518e

This has been sitting around on my computer for a while, and I don't remember much so I can't write a meaningful commit message

Add sanity check to bwtree && fix another bug

cf13d54

Hopefully this is the final boss of copy-paste bugs. Sanity check added to bwtree_index, so that future variations of this bug can be caught more easily (when deferred deletes fail, they don't do so with a stack trace).
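A minimal sketch of the idea behind that sanity check, assuming a toy index where deletes are queued and applied later (ToyIndex and its methods are hypothetical, not NoisePage's bwtree_index API). Validating the entry when the delete is requested means a bad delete is reported while the caller is still on the stack, rather than silently failing when the deferred action eventually runs.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// Hypothetical toy index with deferred deletes (not NoisePage's BwTreeIndex).
class ToyIndex {
 public:
  void Insert(int key, int value) { entries_.emplace(key, value); }

  // Sanity check at request time: refuse to queue a delete for an entry that
  // is not present. A debug build could assert on the result instead.
  // Note: two Deletes of the same entry before RunDeferred both pass; this
  // is only a sanity check, not full validation.
  bool Delete(int key, int value) {
    if (!Contains(key, value)) return false;
    deferred_.push_back([this, key, value] { Erase(key, value); });
    return true;
  }

  // Applies queued deletes, e.g. when the deferred action framework runs.
  void RunDeferred() {
    for (auto &action : deferred_) action();
    deferred_.clear();
  }

  bool Contains(int key, int value) const {
    auto [lo, hi] = entries_.equal_range(key);
    for (auto it = lo; it != hi; ++it)
      if (it->second == value) return true;
    return false;
  }

 private:
  // Without the check above, a missing entry would just be a silent no-op here.
  void Erase(int key, int value) {
    auto [lo, hi] = entries_.equal_range(key);
    for (auto it = lo; it != hi; ++it) {
      if (it->second == value) {
        entries_.erase(it);
        return;
      }
    }
  }

  std::multimap<int, int> entries_;
  std::vector<std::function<void()>> deferred_;
};
```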

Add catalog test for pg_statistic

7bf72c7

Ensures that pg_statistic column is inserted and deleted as appropriate, and that GetTableStats() successfully uses the statistics stored in that row. Is this the right place to put such a test?
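To make the tested behavior concrete, here is a toy in-memory model of pg_statistic, assuming one row per (table, column) pair. Only the names GetTableStats, DeleteColumnStatistics, and the numrows field come from the commits above; StatisticCatalog, the other methods, and all signatures are hypothetical, and the real database_catalog code may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in types; not NoisePage's catalog API.
using table_oid_t = uint32_t;
using col_oid_t = uint32_t;

struct ColumnStatsRow {
  col_oid_t column;
  uint64_t num_rows = 0;  // mirrors the "numrows" field mentioned in the commits
};

struct TableStats {
  uint64_t num_rows = 0;
  std::vector<ColumnStatsRow> columns;
};

class StatisticCatalog {
 public:
  // One row per (table, column), created alongside the column.
  void CreateColumnStatistics(table_oid_t table, col_oid_t column) {
    rows_[{table, column}] = ColumnStatsRow{column};
  }

  void SetNumRows(table_oid_t table, col_oid_t column, uint64_t n) {
    rows_.at({table, column}).num_rows = n;
  }

  // Removes every row belonging to `table`, e.g. when the table is dropped.
  void DeleteColumnStatistics(table_oid_t table) {
    auto it = rows_.lower_bound({table, 0});
    while (it != rows_.end() && it->first.first == table) it = rows_.erase(it);
  }

  // Builds table-level stats from the stored per-column rows.
  std::optional<TableStats> GetTableStats(table_oid_t table) const {
    TableStats stats;
    for (auto it = rows_.lower_bound({table, 0});
         it != rows_.end() && it->first.first == table; ++it) {
      stats.columns.push_back(it->second);
      // Assumption: table row count is taken as the max across columns.
      stats.num_rows = std::max(stats.num_rows, it->second.num_rows);
    }
    if (stats.columns.empty()) return std::nullopt;
    return stats;
  }

 private:
  // Ordered by (table, column), so one table's rows are contiguous.
  std::map<std::pair<table_oid_t, col_oid_t>, ColumnStatsRow> rows_;
};
```

Keying the map by (table, column) keeps each table's rows adjacent, so both deleting and reading a table's statistics are a single range walk.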

Fix unused warning on release build

110f05a

Fix small issues with tests

4eef964

* Add pg_statistic hardcoding to recovery_manager
* Fix brace placement in stats_catalog_test
* Reduce copying in DeleteColumnStatistics

Merge branch 'master' into AddPgStats

e2a3203

preetansh closed this Oct 29, 2020
