tpcds-run-tool

This is a very basic set of scripts that automate the following tasks:

Downloading and compiling the server (MariaDB, MySQL, and PostgreSQL are supported)
Generating the TPC-DS dataset and its read queries (that is, this is not the whole benchmark)
Loading the dataset and running the queries

Getting TPC-DS to run

MySQL and MariaDB cannot run all of the TPC-DS queries (see https://jira.mariadb.org/browse/MDEV-17802 for details). The scripts here patch tpc-ds-tool:

See

To keep the SUM(query_time_ms) values comparable, PostgreSQL also runs only the queries that I could get to work with MariaDB.

Look at (or run) setup_server/setup-os-ubuntu.sh to see what packages need to be installed.

Just run run-all-mariadb-10.4.sh to do all the steps with MariaDB 10.4.

Run run-all-pg.sh to do all the steps with PostgreSQL 11.

The results are saved in the my_tpcds_result table in the tpcds database. The table is defined as follows:

create table my_tpcds_result(
  query_name varchar(64),
  query_stream varchar(64),
  query_start datetime,
  query_time_ms bigint
);

Here:

query_stream is the number of the query stream that the query belongs to. TPC-DS generates N streams (we use 10), and each stream has instances of all 99 queries.
query_name is the name of the query template (queryNNN.tpl).
query_start is the datetime when the query started running. It is for diagnostics only.
query_time_ms is how long the query took, in milliseconds.

This way, the total time it took to run each query stream can be computed as follows:

select 
  query_stream, count(*), sum(query_time_ms) 
from
  my_tpcds_result
group by
  query_stream;

and one can also get details about individual query run times.
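
For example, the per-query details could be summarized with a query along the following lines. This is only a sketch: it uses just the columns of my_tpcds_result defined above, while the aliases (runs, min_ms, max_ms, avg_ms) and the limit of 10 rows are arbitrary choices made for illustration.

```sql
-- Run-time statistics for each query template across all streams,
-- showing the templates with the slowest single run first.
select
  query_name,
  count(*) as runs,
  min(query_time_ms) as min_ms,
  max(query_time_ms) as max_ms,
  avg(query_time_ms) as avg_ms
from
  my_tpcds_result
group by
  query_name
order by
  max_ms desc
limit 10;
```

A large gap between min_ms and max_ms for the same template suggests that its run time depends heavily on the parameter values substituted in each stream.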

The random number generator seed for the dataset seems to affect execution time a lot.

It is currently hard-coded in mariadb-tpcds-tooling2/01-generate-dataset-pg.sh and mariadb-tpcds-tooling2/01-generate-dataset-pg.sh.