advanced questions for join
tests
#18
Comments
We will also need a join on multiple columns (similar to multi-column group and sort).
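To make the multi-column case concrete, here is a minimal sketch of an inner hash join keyed on a tuple of columns. The helper `hash_join`, the toy tables, and the column names `id1`, `id2`, `v1`, `v2` are all hypothetical and only for illustration; this is not the benchmark's implementation.

```python
from collections import defaultdict

def hash_join(left, right, on):
    """Inner join two lists of dict rows on one or more key columns.

    Returns a list of (left_row, right_row) pairs.
    """
    # Index the right table by a tuple of its join-column values.
    index = defaultdict(list)
    for row in right:
        index[tuple(row[c] for c in on)].append(row)

    pairs = []
    for lrow in left:
        key = tuple(lrow[c] for c in on)
        # .get() avoids inserting empty entries for unmatched keys.
        for rrow in index.get(key, []):
            pairs.append((lrow, rrow))
    return pairs

# Hypothetical toy tables (not the benchmark data).
left = [
    {"id1": 1, "id2": "a", "v1": 10},
    {"id1": 1, "id2": "b", "v1": 20},
    {"id1": 2, "id2": "a", "v1": 30},
]
right = [
    {"id1": 1, "id2": "a", "v2": 0.5},
    {"id1": 2, "id2": "a", "v2": 1.5},
    {"id1": 2, "id2": "b", "v2": 2.5},
]

single = hash_join(left, right, on=["id1"])          # join on one column
multi = hash_join(left, right, on=["id1", "id2"])    # join on two columns
print(len(single), len(multi))
```

With these toy tables the single-column join matches more row pairs than the two-column join, because the second key column narrows the matches.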
I pushed a draft of the join questions.
The list did not cover cardinality/duplicates. At the moment, none of the fields used in the joins contain duplicates. We should consider adding questions for joining on fields that contain duplicates. The data is ready for that. Lines 26 to 34 in 00c8ae2
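As a rough illustration of why duplicates matter: for an inner equi-join, each key contributes (count on the left) x (count on the right) rows, so duplicated keys can blow up the result size. The sketch below uses made-up key lists and a hypothetical helper `inner_join_rows`, not the benchmark data.

```python
from collections import Counter

def inner_join_rows(left_keys, right_keys):
    """Number of rows an inner equi-join yields for the given key columns."""
    lc, rc = Counter(left_keys), Counter(right_keys)
    # A Counter returns 0 for a missing key, so unmatched keys add nothing.
    return sum(n * rc[k] for k, n in lc.items())

# Hypothetical key columns without duplicates (the current situation).
unique_left = [1, 2, 3, 4]
unique_right = [3, 4, 5, 6]

# Hypothetical key columns with duplicates on both sides.
dup_left = [1, 1, 2, 2, 2, 3]
dup_right = [2, 2, 3, 3, 4]

rows_unique = inner_join_rows(unique_left, unique_right)
rows_dup = inner_join_rows(dup_left, dup_right)
print(rows_unique, rows_dup)
```

With unique keys the result can never exceed the smaller table; with duplicates on both sides it can be larger than either input, which is the case the current questions do not exercise.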
From the 7 questions proposed above, 5 are going to be categorised as
Note to fix: Line 51 in 39fee2f should be removed when chk is amended.
The join task for the 5 basic questions has been implemented.
Add Datafusion solution
Presently, join tests are run on tables with 2 integer columns, of equal size, using an inner join on a single column. This is because it was difficult to generate good random numbers for the 1e10 datasets used before. Now that we won't go beyond 1e9, we can easily use another set of data. Based on the questions we want to answer in these tests, we will pick/generate the required datasets.
An initial list of queries we might want to test is given below. We need to choose those we want to include in the first iteration; the rest will be left for future extensions. My picks are as follows.
Types of queries:
update on join task #24)
Types of fields:
Sizes of datasets:
Using different datasets will heavily complicate the presentation of benchmark results (as this is another dimension to present in the report). We can think about how to overcome that.
We also need to choose the subset of queries/fields/sizes wisely: my current selection of 4*2*3 gives 24 different questions, and multiplying this by 3 (1e7, 1e8, 1e9) gives 72 tests, while the current groupby tests number only 5*3 = 15.
@mattdowle
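The test count above can be tallied with a small sketch. The actual query, field, and size lists are not reproduced in this thread, so the labels below are placeholders (assumptions); only the counts 4, 2, 3 and the data sizes 1e7, 1e8, 1e9 come from the text.

```python
from itertools import product

# Placeholder labels: the real lists are not given here, only their lengths.
query_types = [f"query_type_{i}" for i in range(1, 5)]   # 4 types of queries
field_types = [f"field_type_{i}" for i in range(1, 3)]   # 2 types of fields
size_options = [f"size_option_{i}" for i in range(1, 4)] # 3 sizes of datasets
data_sizes = ["1e7", "1e8", "1e9"]                       # from the text

# Every combination of query type, field type, and size option is a question.
questions = list(product(query_types, field_types, size_options))

# Each question is run once per data size.
join_tests = list(product(questions, data_sizes))

# Current groupby benchmark: 5 questions at each of the 3 data sizes.
groupby_tests = 5 * len(data_sizes)

print(len(questions), len(join_tests), groupby_tests)
```

Dropping a single option from any one dimension removes a whole slice of the grid, which is why trimming the selection has a large effect on the total.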