Merge pull request ibis-project#3 from semelianova/ibis_test
Ibis build, tests, benches
gshimansky authored Jan 31, 2020
2 parents b3848f7 + 3447db8 commit a2d68e7
Showing 32 changed files with 1,135 additions and 646 deletions.
76 changes: 74 additions & 2 deletions README.md
@@ -1,8 +1,11 @@
# Benchmarking scripts used to run OmniSciDB benchmarks in an automated way in TeamCity and for performance analysis during the development cycle.

## Requirements
Scripts require the following python3 packages to be installed:
pymapd, braceexpand, mysql-connector-python. OmnisciDB server often
The scripts require the following to be installed:
* the following python3 packages: pymapd, braceexpand, mysql-connector-python;
* conda or miniconda for ibis tests and benchmarks.

OmnisciDB server often
requires a lot of open files, so it is a good idea to run it with
`ulimit -n 10000`.
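
Before launching the server it can be useful to confirm that the limit actually took effect. The following is a minimal sketch using Python's standard `resource` module (Unix only). The helper name `open_file_limit_ok` is hypothetical and not part of these scripts, and the 10000 threshold simply mirrors the value above.

```python
import resource


def open_file_limit_ok(required=10000):
    """Return True if the soft limit on open files is at least `required`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which always satisfies the requirement.
    return soft == resource.RLIM_INFINITY or soft >= required


if __name__ == "__main__":
    print("open file limit OK:", open_file_limit_ok())
```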

@@ -91,3 +94,72 @@ Sample script command line:
```
python3 taxi/taxibench_pandas.py -df 2 -i 5 -dp '/datadir/taxi/trips_*.csv.gz'
```

## Ibis script

Ibis build, tests and benchmarks are run through `run_ibis_tests.py`. It has three distinct
modes of operation:
* build and install ibis;
* run ibis tests using pytest;
* run benchmarks using Omnisci.

Parameters which can be used:

Switch | Default value | Meaning
------ | ------------- | -------
-t, --task | | Task to execute from the supported list [build, test, benchmark]. Use "," as the separator for multiple tasks.
-en, --env_name | ibis-tests | Conda env name.
-ec, --env_check | False | Check if env exists. If it exists don't recreate.
-s, --save_env | False | Save conda env after executing.
-r, --report_path | parent dir of omniscripts | Path to report file.
-ci, --ci_requirements | ci_requirements.yml | File with ci requirements for conda env.
-py, --python_version | 3.7 | Python version to use in the conda env (3.6 or 3.7).
-i, --ibis_path | | Path to ibis directory.
-e, --executable | | Path to omnisci_server executable.
-w, --workdir | | Path to omnisci working directory. By default parent directory of executable location is used. Data directory is used in this location.
-o, --omnisci_port | 6274 | TCP port number to run omnisci_server on.
-u, --user | admin | User name to use on omniscidb server.
-p, --password | HyperInteractive | User password to use on omniscidb server.
-n, --name | agent_test_ibis | Database name to use in omniscidb server.
-commit_omnisci | 123456... | Omnisci commit hash to use for tests.
-commit_ibis | 123456... | Ibis commit hash to use for tests.
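
As an illustration of how a comma-separated `--task` value can be turned into per-task flags, here is a small sketch. The function name `parse_tasks` is hypothetical. Unlike the script in this commit, which only fails when none of the requested tasks is recognized, this variant rejects any unknown task name.

```python
POSSIBLE_TASKS = ["build", "test", "benchmark"]


def parse_tasks(task_arg):
    """Map each supported task to True/False from a comma-separated string."""
    requested = [t.strip() for t in task_arg.split(",")]
    unknown = [t for t in requested if t not in POSSIBLE_TASKS]
    if unknown:
        raise ValueError(f"Unsupported tasks {unknown}; choose from {POSSIBLE_TASKS}")
    return {task: task in requested for task in POSSIBLE_TASKS}


print(parse_tasks("build,test"))
# {'build': True, 'test': True, 'benchmark': False}
```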

Parameters for the benchmark task and for recording its results in a MySQL database:

Switch | Default value | Meaning
------ | ------------- | -------
-bn, --bench_name | | Benchmark name from supported list [ny_taxi, santander]
-db-server | localhost | Host name of MySQL server.
-db-port | 3306 | Port number of MySQL server.
-db-user | | Username to use to connect to MySQL database. If user name is specified, script attempts to store results in MySQL database using other -db-* parameters.
-db-pass | omniscidb | Password to use to connect to MySQL database.
-db-name | omniscidb | MySQL database to use to store benchmark results.
-db-table | | Table to use to store results for this benchmark.
-df, --dfiles_num | 1 | Number of datafiles to input into database for processing.
-dp, --dpattern | | Wildcard pattern of datafiles that should be loaded.
-it, --iters | 5 | Number of iterations to run every query. Best result is selected.
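
The benchmark scripts themselves are not part of this diff, so how they time queries is not shown here. Assuming "best result" means the shortest wall-clock time across iterations, the idea behind `--iters` can be sketched as follows (`best_of` is a hypothetical helper):

```python
import time


def best_of(fn, iters=5):
    """Run `fn` `iters` times and return the shortest duration in seconds."""
    if iters < 1:
        raise ValueError("iters must be at least 1")
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    print(f"best of 5: {best_of(lambda: sum(range(100_000))):.6f} s")
```

Taking the minimum rather than the mean reduces the influence of one-off disturbances such as cold caches.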

The script automatically creates the conda environment if it doesn't exist or if you want to recreate it,
starts the omniscidb server, and creates and initializes the data directory if it doesn't exist or is not
initialized. All subsequent work is done in the created conda environment. The environment can be
removed or saved after execution.
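
The environment handling can be summarized as a small decision function. This is a sketch of the logic in `run_ibis_tests.py` as shown in this commit, not code from the script. The name `plan_env_actions` and the string labels are hypothetical.

```python
def plan_env_actions(env_exists, env_check):
    """Return the conda actions to perform, in order, as a list of strings."""
    if env_exists and env_check:
        return []                    # reuse the existing environment
    if env_exists:
        return ["remove", "create"]  # recreate the environment from scratch
    return ["create"]                # nothing to reuse, so create it


for exists in (True, False):
    for check in (True, False):
        print(exists, check, plan_env_actions(exists, check))
```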

Sample build ibis command line:
```
python3 run_ibis_tests.py --env_name ibis-test --env_check False --save_env True --python_version 3.7 --task build --name agent_test_ibis --ci_requirements /localdisk/username/omniscripts/ci_requirements.yml --ibis_path /localdisk/username/ibis/ --executable /localdisk/username/omniscidb/release/bin/omnisci_server
```

Sample run ibis tests command line:
```
python3 run_ibis_tests.py --env_name ibis-test --env_check True --save_env True --python_version 3.7 --task test --name agent_test_ibis --report /localdisk/username/ --ibis_path /localdisk/username/ibis/ --executable /localdisk/username/omniscidb/build/bin/omnisci_server --user admin --password HyperInteractive
```

Sample run taxi benchmark command line:
```
python3 run_ibis_tests.py --env_name ibis-test --env_check True --python_version 3.7 --task benchmark --ci_requirements /localdisk/username/omniscripts/ci_requirements.yml --save_env True --report /localdisk/username/ --ibis_path /localdisk/username/ibis/ --executable /localdisk/username/omniscidb/build/bin/omnisci_server -u admin -p HyperInteractive -n agent_test_ibis --bench_name ny_taxi --dfiles_num 20 --dpattern '/localdisk/username/benchmark_datasets/taxi/trips_xa{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t}.csv.gz' --iters 5 -db-server localhost -db-port 3306 -db-user user -db-pass omniscidb -db-name omniscidb -db-table taxibench_ibis
```
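
The `--dpattern` value above uses shell-style brace syntax. The benchmark scripts presumably expand it with the `braceexpand` package listed in the requirements. The sketch below is a simplified, standard-library stand-in that handles only a single non-nested group, and it assumes `--dfiles_num` keeps the first N expanded names. The function name and the shortened path are illustrative only.

```python
import re


def expand_single_brace(pattern):
    """Expand one non-nested {a,b,c} group; return [pattern] if there is none."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + option + tail for option in match.group(1).split(",")]


files = expand_single_brace("/data/taxi/trips_xa{a,b,c}.csv.gz")
print(files[:2])  # keep only the first two expanded names
# ['/data/taxi/trips_xaa.csv.gz', '/data/taxi/trips_xab.csv.gz']
```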

Sample run santander benchmark command line:
```
python3 run_ibis_tests.py --env_name ibis-test --env_check True --python_version 3.7 --task benchmark --ci_requirements /localdisk/username/omniscripts/ci_requirements.yml --save_env True --report /localdisk/username/ --ibis_path /localdisk/username/ibis/ --executable /localdisk/username/omniscidb/build/bin/omnisci_server -u admin -p HyperInteractive -n agent_test_ibis --bench_name santander --dpattern '/localdisk/benchmark_datasets/santander/train.csv.gz' --iters 5 -db-server localhost -db-port 3306 -db-user user -db-pass omniscidb -db-name omniscidb -db-table santander_ibis
```
5 changes: 5 additions & 0 deletions ci_requirements.yml
@@ -0,0 +1,5 @@
- pytest-html
- braceexpand
- mysql
- mysql-connector-python

1 change: 1 addition & 0 deletions omnisci.conf
@@ -5,6 +5,7 @@ data = "data"
read-only = false
verbose = false
enable-watchdog = false
allow-cpu-retry = true

[web]
port = 62073
1 change: 1 addition & 0 deletions report/__init__ .py
@@ -0,0 +1 @@
from .report import DbReport
2 changes: 1 addition & 1 deletion report/report.py
@@ -4,6 +4,7 @@
import platform
import subprocess


class DbReport:
    "Initialize and submit reports to MySQL database"

@@ -123,4 +124,3 @@ def submit(self, benchmark_specific_values):
        print("Executing statement", sql_statement)
        self.__database.cursor().execute(sql_statement)
        self.__database.commit()

279 changes: 279 additions & 0 deletions run_ibis_tests.py
@@ -0,0 +1,279 @@
import os
import sys
import argparse
from server import OmnisciServer
from server import execute_process


def str_arg_to_bool(v):
    if isinstance(v, bool):
        return v
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Cannot recognize boolean value.')


def add_conda_execution(cmdline):
    cmd_res = ['conda', 'run', '-n', args.env_name]
    cmd_res.extend(cmdline)
    return cmd_res


def combinate_requirements(ibis, ci, res):
    with open(res, "w") as f_res:
        with open(ibis) as f_ibis:
            for line in f_ibis:
                f_res.write(line)
        with open(ci) as f_ci:
            for line in f_ci:
                f_res.write(line)


omniscript_path = os.path.dirname(__file__)
omnisci_server = None
args = None

parser = argparse.ArgumentParser(description='Run internal tests from ibis project')
optional = parser._action_groups.pop()
required = parser.add_argument_group("required arguments")
parser._action_groups.append(optional)

possible_tasks = ['build', 'test', 'benchmark']
benchmarks = {'ny_taxi': os.path.join(omniscript_path, "taxi", "taxibench_ibis.py"),
              'santander': os.path.join(omniscript_path, "santander", "santander_ibis.py")}
# Task
required.add_argument("-t", "--task", dest="task", required=True,
                      help=f"Task to execute {possible_tasks}. Use ',' as separator for multiple tasks.")

# Environment
required.add_argument('-en', '--env_name', dest="env_name", default="ibis-tests",
                      help="Conda env name.")
optional.add_argument('-ec', '--env_check', dest="env_check", default=False, type=str_arg_to_bool,
                      help="Check if env exists. If it exists don't recreate.")
optional.add_argument('-s', '--save_env', dest="save_env", default=False, type=str_arg_to_bool,
                      help="Save conda env after executing.")
optional.add_argument('-r', '--report_path', dest="report_path",
                      default=os.path.join(omniscript_path, ".."), help="Path to report file.")
optional.add_argument('-ci', '--ci_requirements', dest="ci_requirements",
                      default=os.path.join(omniscript_path, "ci_requirements.yml"),
                      help="File with ci requirements for conda env.")
optional.add_argument('-py', '--python_version', dest="python_version", default="3.7",
                      help="Python version to use in the conda env (3.6 or 3.7).")
# Ibis
required.add_argument('-i', '--ibis_path', dest="ibis_path", required=True,
                      help="Path to ibis directory.")
# Benchmarks
optional.add_argument('-bn', '--bench_name', dest="bench_name", choices=list(benchmarks.keys()),
                      help="Benchmark name.")
optional.add_argument('-df', '--dfiles_num', dest="dfiles_num", default=1, type=int,
                      help="Number of datafiles to input into database for processing.")
optional.add_argument('-dp', '--dpattern', dest="dpattern",
                      help="Wildcard pattern of datafiles that should be loaded.")
optional.add_argument('-it', '--iters', default=5, type=int, dest="iters",
                      help="Number of iterations to run every query. Best result is selected.")
# MySQL database parameters
optional.add_argument('-db-server', dest="db_server", default="localhost",
                      help="Host name of MySQL server.")
optional.add_argument('-db-port', dest="db_port", default=3306, type=int,
                      help="Port number of MySQL server.")
optional.add_argument('-db-user', dest="db_user", default="",
                      help="Username to use to connect to MySQL database. "
                           "If user name is specified, script attempts to store results in MySQL "
                           "database using other -db-* parameters.")
optional.add_argument('-db-pass', dest="db_password", default="omniscidb",
                      help="Password to use to connect to MySQL database.")
optional.add_argument('-db-name', dest="db_name", default="omniscidb",
                      help="MySQL database to use to store benchmark results.")
optional.add_argument('-db-table', dest="db_table",
                      help="Table to use to store results for this benchmark.")
# Omnisci server parameters
optional.add_argument("-e", "--executable", dest="omnisci_executable", required=True,
                      help="Path to omnisci_server executable.")
optional.add_argument("-w", "--workdir", dest="omnisci_cwd",
                      help="Path to omnisci working directory. "
                           "By default parent directory of executable location is used. "
                           "Data directory is used in this location.")
optional.add_argument("-o", "--omnisci_port", dest="omnisci_port", default=6274, type=int,
                      help="TCP port number to run omnisci_server on.")
optional.add_argument("-u", "--user", dest="user", default="admin",
                      help="User name to use on omniscidb server.")
optional.add_argument("-p", "--password", dest="password", default="HyperInteractive",
                      help="User password to use on omniscidb server.")
optional.add_argument("-n", "--name", dest="name", default="agent_test_ibis", required=True,
                      help="Database name to use in omniscidb server.")

optional.add_argument("-commit_omnisci", dest="commit_omnisci",
                      default="1234567890123456789012345678901234567890",
                      help="Omnisci commit hash to use for tests.")
optional.add_argument("-commit_ibis", dest="commit_ibis",
                      default="1234567890123456789012345678901234567890",
                      help="Ibis commit hash to use for tests.")

# Defined up front so the finally block can test it even after an early exit.
remove_env_cmdline = None

try:
    args = parser.parse_args()

    os.environ["IBIS_TEST_OMNISCIDB_DATABASE"] = args.name
    os.environ["IBIS_TEST_DATA_DB"] = args.name

    required_tasks = args.task.split(',')
    tasks = {}
    task_checker = False
    for task in possible_tasks:
        if task in required_tasks:
            tasks[task] = True
            task_checker = True
        else:
            tasks[task] = False
    if not task_checker:
        print(f"Only {list(tasks.keys())} are supported, {required_tasks} cannot find possible tasks")
        sys.exit(1)

    if args.python_version not in ['3.7', '3.6']:
        print(f"Only 3.7 and 3.6 python versions are supported, {args.python_version} is not supported")
        sys.exit(1)
    ibis_requirements = os.path.join(args.ibis_path, "ci",
                                     f"requirements-{args.python_version}-dev.yml")
    ibis_data_script = os.path.join(args.ibis_path, "ci", "datamgr.py")

    requirements_file = "requirements.yml"
    report_file_name = f"report-{args.commit_ibis[:8]}-{args.commit_omnisci[:8]}.html"
    if not os.path.isdir(args.report_path):
        os.makedirs(args.report_path)
    report_file_path = os.path.join(args.report_path, report_file_name)

    install_ibis_cmdline = ['python3',
                            os.path.join('setup.py'),
                            'install',
                            '--user']

    check_env_cmdline = ['conda',
                         'env',
                         'list']

    create_env_cmdline = ['conda',
                          'env',
                          'create',
                          '--name', args.env_name,
                          '--file', requirements_file]

    remove_env_cmdline = ['conda',
                          'env',
                          'remove',
                          '--name', args.env_name]

    dataset_download_cmdline = ['python3',
                                ibis_data_script,
                                'download']

    dataset_import_cmdline = ['python3',
                              ibis_data_script,
                              'omniscidb',
                              '-P', str(args.omnisci_port),
                              '--database', args.name]

    ibis_tests_cmdline = ['pytest',
                          '-m', 'omniscidb',
                          '--disable-pytest-warnings',
                          f'--html={report_file_path}']

    if tasks['benchmark']:
        if not args.bench_name or args.bench_name not in benchmarks.keys():
            print(f"Benchmark {args.bench_name} is not supported, only {list(benchmarks.keys())} are supported")
            sys.exit(1)

        if not args.dpattern:
            print("Parameter --dpattern was received empty, but it is required for benchmarks")
            sys.exit(1)

        benchmarks_cmd = {}

        ny_taxi_bench_cmdline = ['python3',
                                 benchmarks[args.bench_name],
                                 '-e', args.omnisci_executable,
                                 '-port', str(args.omnisci_port),
                                 '-db-port', str(args.db_port),
                                 '-df', str(args.dfiles_num),
                                 '-dp', f"'{args.dpattern}'",
                                 '-i', str(args.iters),
                                 '-u', args.user,
                                 '-p', args.password,
                                 '-db-server', args.db_server,
                                 '-n', args.name,
                                 f'-db-user={args.db_user}',
                                 '-db-pass', args.db_password,
                                 '-db-name', args.db_name,
                                 '-db-table',
                                 args.db_table if args.db_table else 'taxibench_ibis',
                                 '-commit_omnisci', args.commit_omnisci,
                                 '-commit_ibis', args.commit_ibis]

        benchmarks_cmd['ny_taxi'] = ny_taxi_bench_cmdline

        santander_bench_cmdline = ['python3',
                                   benchmarks[args.bench_name],
                                   '-e', args.omnisci_executable,
                                   '-port', str(args.omnisci_port),
                                   '-db-port', str(args.db_port),
                                   '-dp', f"'{args.dpattern}'",
                                   '-i', str(args.iters),
                                   '-u', args.user,
                                   '-p', args.password,
                                   '-db-server', args.db_server,
                                   '-n', args.name,
                                   f'-db-user={args.db_user}',
                                   '-db-pass', args.db_password,
                                   '-db-name', args.db_name,
                                   '-db-table',
                                   args.db_table if args.db_table else 'santander_ibis',
                                   '-commit_omnisci', args.commit_omnisci,
                                   '-commit_ibis', args.commit_ibis]

        benchmarks_cmd['santander'] = santander_bench_cmdline

    print("PREPARING ENVIRONMENT")
    combinate_requirements(ibis_requirements, args.ci_requirements, requirements_file)
    _, envs = execute_process(check_env_cmdline)
    if args.env_name in envs:
        if args.env_check is False:
            execute_process(remove_env_cmdline)
            execute_process(create_env_cmdline, print_output=False)
    else:
        execute_process(create_env_cmdline, print_output=False)

    if tasks['build']:
        print("IBIS INSTALLATION")
        execute_process(add_conda_execution(install_ibis_cmdline), cwd=args.ibis_path,
                        print_output=False)

    if tasks['test']:
        print("STARTING OMNISCI SERVER")
        omnisci_server = OmnisciServer(omnisci_executable=args.omnisci_executable,
                                       omnisci_port=args.omnisci_port, database_name=args.name,
                                       omnisci_cwd=args.omnisci_cwd, user=args.user,
                                       password=args.password)
        omnisci_server.launch()

    if tasks['test']:
        print("PREPARING DATA")
        execute_process(add_conda_execution(dataset_download_cmdline))
        execute_process(add_conda_execution(dataset_import_cmdline))

        print("RUNNING TESTS")
        execute_process(add_conda_execution(ibis_tests_cmdline), cwd=args.ibis_path)

    if tasks['benchmark']:
        print(f"RUNNING BENCHMARK {args.bench_name}")
        execute_process(add_conda_execution(benchmarks_cmd[args.bench_name]))

except Exception as err:
    print("Failed", err)
    sys.exit(1)

finally:
    if omnisci_server:
        omnisci_server.terminate()
    # Skip removal if the script exited before the command was built.
    if args and args.save_env is False and remove_env_cmdline:
        execute_process(remove_env_cmdline)
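
The `str_arg_to_bool` helper in `run_ibis_tests.py` is what makes switches such as `--env_check False` behave as expected. With a plain `type=bool`, any non-empty string, including `'False'`, would be parsed as `True`. The sketch below reproduces the helper in a standalone parser to show the effect. The parser here is a reduced illustration, not the script's full argument set.

```python
import argparse


def str_arg_to_bool(v):
    """Convert common textual boolean spellings to a real bool."""
    if isinstance(v, bool):
        return v
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    if v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError('Cannot recognize boolean value.')


parser = argparse.ArgumentParser()
parser.add_argument('--save_env', default=False, type=str_arg_to_bool)

print(parser.parse_args(['--save_env', 'yes']).save_env)  # value goes through str_arg_to_bool
print(parser.parse_args([]).save_env)                     # default is used as-is
```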