Enabled hset and ft.add nyc ingestion tests on circleci (#46)
* [wip] wip on fixing CI

* [wip] wip on fixing CI github benchmarks

* [fix] Fixed ecommerce_inventory to include key position

* [fix] removed spurious comments on circleci/config.yml

* [add] enabled hset and ft.add nyc ingestion tests on circleci

* [add] enabled hset and ft.add nyc ingestion tests on circleci

* [add] enabled hset and ft.add nyc ingestion tests on github actions

* [add] updated circle config to make usage of matrix jobs

* [add] updated circle config to make usage of matrix jobs

* [fix] fixing circleci executor name issue

* [fix] removed latest from ci benchmarks

* [wip] wip on circle matrix and executors

* [wip] wip on circle matrix and executors

* [wip] wip on circle matrix and executors

* [wip] wip on circle matrix and executors

* [wip] wip on circle matrix and executors

* [fix] executor variable working as expected on circleci

* [fix] Fixed nyc_taxis generated datasets to check for search module instead of ft

* [wip] wip on docs refactoring

* [fix] fixed benchmark docs for nyc_taxis ft.add how-to

* [fix] upgrade redisbench-admin to version 0.1.11 to support multiple teardowns

* [fix] upgrade redisbench-admin to version 0.1.12 to support multiple teardowns

* [add] moving back to implicit job setting on circleci due to precedence

* [add] moving back to implicit job setting on circleci due to precedence

* [add] moving back to implicit job setting on circleci due to precedence
filipecosta90 authored Sep 15, 2020
1 parent 6ae21fe commit b971f68
Showing 16 changed files with 305 additions and 242 deletions.
136 changes: 56 additions & 80 deletions .circleci/config.yml
@@ -1,113 +1,77 @@
# Golang CircleCI 2.0 configuration file
#
# Check https://circleci.com/docs/2.0/language-go/ for more details
version: 2
jobs:
build-edge: # test with redisearch:edge
version: 2.1

executors:
edge:
docker:
- image: circleci/golang:1.13
- image: redislabs/redisearch:edge

working_directory: /go/src/github.com/RediSearch/ftsb
steps:
- checkout
- run: make test
- run: bash <(curl -s https://codecov.io/bash) -t ${CODECOV_TOKEN}

build-latest: # test with redisearch:latest
latest:
docker:
- image: circleci/golang:1.13
- image: redislabs/redisearch:latest

working_directory: /go/src/github.com/RediSearch/ftsb
steps:
- checkout
- run: make test

ci-benchmark-edge: # test nightly with redisearch:edge
docker:
- image: circleci/golang:1.13
- image: redislabs/redisearch:edge

working_directory: /go/src/github.com/RediSearch/ftsb
jobs:
ci-benchmark:
parameters:
redisearch_version:
type: executor
use_case:
type: string
executor: << parameters.redisearch_version >>
steps:
- checkout
- run: make
- run: sudo apt install python3.6 -y
- run: sudo apt install python3-pip -y
- run: python3 -m pip install wheel redisbench-admin==0.1.10
- run: python3 -m pip install wheel redisbench-admin==0.1.12
- run:
name: ecommerce-inventory use case
name: << parameters.use_case >> use case
command: |
redisbench-admin run \
--repetitions 7 \
--repetitions 3 \
--output-file-prefix circleci \
--upload-results-s3 \
--benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/ecommerce-inventory/ecommerce-inventory.redisearch.cfg.json
- run:
name: nyc_taxis CI use case with HSET
command: |
redisbench-admin run \
--repetitions 3 \
--output-file-prefix circleci \
--benchmark-requests 1000000 \
--upload-results-s3 \
--benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/nyc_taxis-hashes-CI/nyc_taxis-hashes-CI.redisearch.cfg.json
no_output_timeout: 30m
- run:
name: nyc_taxis CI use case with FT.ADD
command: |
redisbench-admin run \
--repetitions 3 \
--output-file-prefix circleci \
--benchmark-requests 1000000 \
--upload-results-s3 \
--benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/nyc_taxis-ftadd-CI/nyc_taxis-ftadd-CI.redisearch.cfg.json
no_output_timeout: 30m
--benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/<< parameters.use_case >>/<< parameters.use_case >>.redisearch.cfg.json
ci-benchmark-latest: # test nightly with redisearch:edge
build-edge: # test with redisearch:edge
docker:
- image: circleci/golang:1.13
- image: redislabs/redisearch:latest

working_directory: /go/src/github.com/RediSearch/ftsb
- image: redislabs/redisearch:edge
steps:
- checkout
- run: make
- run: sudo apt install python3.6 -y
- run: sudo apt install python3-pip -y
- run: python3 -m pip install wheel redisbench-admin==0.1.10
- run: |
redisbench-admin run \
--repetitions 7 \
--output-file-prefix circleci \
--upload-results-s3 \
--benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/ecommerce-inventory/ecommerce-inventory.redisearch.cfg.json
- run: make test
- run: bash <(curl -s https://codecov.io/bash) -t ${CODECOV_TOKEN}

build-latest: # test with redisearch:latest
docker:
- image: circleci/golang:1.13
- image: redislabs/redisearch:latest

build-multiarch-docker:
machine:
enabled: true
steps:
- checkout
- run: |
echo "$DOCKER_REDISBENCH_PWD" | base64 --decode | docker login --username $DOCKER_REDISBENCH_USER --password-stdin
- run:
name: Build
command: |
make docker-release
no_output_timeout: 20m
- run: make test

workflows:
version: 2
commit:
jobs:
- build-edge
- build-latest
- ci-benchmark-edge
- ci-benchmark-latest:
- ci-benchmark:
name: edge-ecommerce-inventory
redisearch_version: edge
use_case: "ecommerce-inventory"
- ci-benchmark:
name: edge-nyc_taxis-ft.add
redisearch_version: edge
use_case: "nyc_taxis-ft.add"
requires:
- ci-benchmark-edge
- edge-ecommerce-inventory
- ci-benchmark:
name: edge-nyc_taxis-hashes
redisearch_version: edge
use_case: "nyc_taxis-hashes"
requires:
- edge-nyc_taxis-ft.add

ci_benchmarks:
triggers:
@@ -118,7 +82,19 @@ workflows:
only:
- master
jobs:
- ci-benchmark-edge
- ci-benchmark-latest:
- ci-benchmark:
name: edge-ecommerce-inventory
redisearch_version: edge
use_case: "ecommerce-inventory"
- ci-benchmark:
name: edge-nyc_taxis-ft.add
redisearch_version: edge
use_case: "nyc_taxis-ft.add"
requires:
- edge-ecommerce-inventory
- ci-benchmark:
name: edge-nyc_taxis-hashes
redisearch_version: edge
use_case: "nyc_taxis-hashes"
requires:
- ci-benchmark-edge
- edge-nyc_taxis-ft.add
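
The parameterized `ci-benchmark` job above derives both the step name and the benchmark config URL from the `use_case` parameter, and the workflow chains the three invocations with `requires` so they run one after another. The following is a minimal Python sketch of that expansion, not part of the commit: the helper names (`benchmark_config_url`, `sequential_jobs`) are hypothetical, and the URL template is copied from the `--benchmark-config-file` line in the config. Whether each resulting URL actually resolves on S3 is not checked here.

```python
# Illustrative sketch only: mirrors how the CircleCI workflow in this diff
# expands the `use_case` parameter. Helper names are hypothetical.

BASE_URL = "https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets"


def benchmark_config_url(use_case: str) -> str:
    """Build the config URL the same way the job template does:
    .../datasets/<use_case>/<use_case>.redisearch.cfg.json"""
    return f"{BASE_URL}/{use_case}/{use_case}.redisearch.cfg.json"


def sequential_jobs(redisearch_version: str, use_cases: list) -> list:
    """Return one job description per use case, each requiring the previous
    one, which reproduces the `requires` chain in the workflow."""
    jobs = []
    previous_name = None
    for use_case in use_cases:
        name = f"{redisearch_version}-{use_case}"
        jobs.append(
            {
                "name": name,
                "redisearch_version": redisearch_version,
                "use_case": use_case,
                "config_url": benchmark_config_url(use_case),
                "requires": [previous_name] if previous_name else [],
            }
        )
        previous_name = name
    return jobs


if __name__ == "__main__":
    order = ["ecommerce-inventory", "nyc_taxis-ft.add", "nyc_taxis-hashes"]
    for job in sequential_jobs("edge", order):
        print(job["name"], "requires", job["requires"])
        print("  ", job["config_url"])
```

Chaining the jobs with `requires` serializes them, so adding a use case means adding one more workflow entry that requires the previous job name.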
8 changes: 0 additions & 8 deletions .dockerignore

This file was deleted.

15 changes: 7 additions & 8 deletions .github/workflows/ci-benchmarks.yml
@@ -12,13 +12,14 @@ jobs:
strategy:
matrix:
go: [ '1.14']
redisearch_version: ['edge','latest']
redisearch_version: ['edge']
use_case: ['ecommerce-inventory','nyc_taxis-ft.add','nyc_taxis-hashes']
services:
redis:
image: redislabs/redisearch:${{ matrix.redisearch_version }}
ports:
- 6379:6379
name: Benchmark redisearch:${{ matrix.redisearch_version }} with Go ${{ matrix.go }}
name: Benchmark ${{ matrix.use_case }} redisearch:${{ matrix.redisearch_version }} with Go ${{ matrix.go }}
steps:
- uses: actions/checkout@v2
- name: Build and Run Benchmark
@@ -35,18 +36,16 @@ jobs:
mkdir -p $GOPATH/src/github.com/$GITHUB_REPOSITORY
mv $(pwd)/* $GOPATH/src/github.com/$GITHUB_REPOSITORY
cd $GOPATH/src/github.com/$GITHUB_REPOSITORY
go get ./...
go test ./...
go install ./...
make test
sudo apt install python3.6 -y
sudo apt install python3-pip -y
sudo apt-get install python3-setuptools -y
cd $GOPATH/src/github.com/$GITHUB_REPOSITORY
sudo python3 -m pip install wheel
python3 -m pip install redisbench-admin==0.1.10
python3 -m pip install redisbench-admin==0.1.12
~/.local/bin/redisbench-admin run \
--repetitions 7 \
--repetitions 3 \
--output-file-prefix github-actions \
--upload-results-s3 \
--benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/ecommerce-inventory/ecommerce-inventory.redisearch.cfg.json
--benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/${{ matrix.use_case }}/${{ matrix.use_case }}.redisearch.cfg.json
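
GitHub Actions runs one job per combination of the `strategy.matrix` values, so the matrix in this workflow (one Go version, one RediSearch version, three use cases) yields three jobs. The sketch below, which is an illustration and not part of the commit, reproduces that Cartesian-product expansion and the job `name` template in Python. The function names `expand_matrix` and `job_title` are hypothetical.

```python
# Illustrative sketch only: enumerates the jobs implied by the workflow's
# strategy.matrix. Function names are hypothetical.
from itertools import product

# Matrix values copied from the updated ci-benchmarks.yml in this diff.
MATRIX = {
    "go": ["1.14"],
    "redisearch_version": ["edge"],
    "use_case": ["ecommerce-inventory", "nyc_taxis-ft.add", "nyc_taxis-hashes"],
}


def expand_matrix(matrix: dict) -> list:
    """Return one dict per combination of matrix values (Cartesian product)."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


def job_title(combo: dict) -> str:
    """Fill in the workflow's job `name` template for one combination."""
    return (
        f"Benchmark {combo['use_case']} "
        f"redisearch:{combo['redisearch_version']} with Go {combo['go']}"
    )


if __name__ == "__main__":
    for combo in expand_matrix(MATRIX):
        print(job_title(combo))
```

Re-adding `'latest'` to `redisearch_version` would double the job count to six, which is one way to reason about CI cost before changing the matrix.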
11 changes: 8 additions & 3 deletions .gitignore
@@ -1,14 +1,19 @@
# FTSB outputs #
################

cmd/ftsb_generate_redisearch/__pycache__/*
cmd/ftsb_generate_redisearch/nyc_taxis/tmp/*
cmd/ftsb_redisearch/ftsb_redisearch

###################
# Data generators #
###################
*.txt
*.csv
cmd/ftsb_generate_data/*.csv
*.pyc
*__pycache__*
scripts/datagen_redisearch/__pycache__/**
scripts/datagen_redisearch/nyc_taxis/tmp/*

###################
# Idea / others #
#################

24 changes: 0 additions & 24 deletions Dockerfile

This file was deleted.

25 changes: 23 additions & 2 deletions README.md
@@ -14,9 +14,7 @@ including RediSearch.
This code is based on a fork of work initially made public by TSBS
at https://github.com/timescale/tsbs.

Current databases supported:

+ RediSearch

## Overview
The Full-Text Search Benchmark (FTSB) is a collection of Python and Go programs that are used to generate datasets (Python) and then benchmark (Go) the read and write performance of various databases. The intent is to make FTSB extensible so that a variety of use cases (e.g., ecommerce, jsondata, logs, etc.), query types, and databases can be included and benchmarked.
@@ -27,6 +25,26 @@ To this end, we hope to help SAs, and prospective database administrators find t

FTSB is used to benchmark bulk load performance and query execution performance. To accomplish this in a fair way, the data to be inserted and the queries to run are always pre-generated and native Go clients are used wherever possible to connect to each database.

## Current databases supported

+ RediSearch

### Current use cases

Currently, FTSB supports three use cases:
- **nyc_taxis** [[details here](docs/nyc_taxis-benchmark/description.md)]. This benchmark focuses on write performance, using TLC Trip Record Data that covers the rides performed by yellow taxis in New York in 2015. In total, the benchmark loads >12M documents.


- **enwiki-abstract** [[details here](docs/enwiki-abstract-benchmark/description.md)], from English-language [Wikipedia:Database](https://en.wikipedia.org/wiki/Wikipedia:Database_download) page abstracts. This use case generates
3 TEXT fields per document and focuses on full-text query performance.


- **ecommerce-inventory** [[details here](docs/ecommerce-inventory-benchmark/description.md)], from a base dataset of [10K fashion products on Amazon.com](https://data.world/promptcloud/fashion-products-on-amazon-com/workspace/file?filename=amazon_co-ecommerce_sample.csv) which are then multiplexed by categories, sellers, and countries to produce larger datasets of > 1M docs. This benchmark focuses on update and aggregate performance, splitting the performance numbers into Reads (FT.AGGREGATE), Cursor Reads (FT.CURSOR), and Updates (FT.ADD).
The use case generates an index with 10 TAG fields (3 sortable and 1 non-indexed), and 16 NUMERIC sortable non-indexed fields per document.
The aggregate queries are designed to be extremely costly in both computation and network TX, given that each query aggregates and filters over a large portion of the dataset while additionally loading 21 fields.
Both the update and read rates can be adjusted.



### Installation

@@ -40,6 +58,9 @@ cd $GOPATH/src/github.com/RediSearch/ftsb
make
```




## How to use it?

Using FTSB for benchmarking involves 2 phases: data and query generation, and query execution.
14 changes: 0 additions & 14 deletions docker_entrypoint.sh

This file was deleted.
