Enabled hset and ft.add nyc ingestion tests on circleci (#46)

* [wip] wip on fixing CI
* [wip] wip on fixing CI github benchmarks
* [fix] Fixed ecommerce_inventory to include key position
* [fix] removed spurious comments on circleci/config.yml
* [add] enabled hset and ft.add nyc ingestion tests on circleci
* [add] enabled hset and ft.add nyc ingestion tests on circleci
* [add] enabled hset and ft.add nyc ingestion tests on github actions
* [add] updated circle config to make usage of matrix jobs
* [add] updated circle config to make usage of matrix jobs
* [fix] fixing circleci executor name issue
* [fix] removed latest from ci benchmarks
* [wip] wip on circle matrix and executors
* [wip] wip on circle matrix and executors
* [wip] wip on circle matrix and executors
* [wip] wip on circle matrix and executors
* [wip] wip on circle matrix and executors
* [fix] executor variable working as expected on circleci
* [fix] Fixed nyc_taxis generated datasets to check for search module instead of ft
* [wip] wip on docs refactoring
* [fix] fixed benchmark docs for nyc_taxis ft.add how-to
* [fix] upgrade redisbench-admin to version 0.1.11 to support multiple teardowns
* [fix] upgrade redisbench-admin to version 0.1.12 to support multiple teardowns
* [add] moving back to implicit job setting on circleci due to precedence
* [add] moving back to implicit job setting on circleci due to precedence
* [add] moving back to implicit job setting on circleci due to precedence
RediSearch · Sep 15, 2020 · b971f68
1 parent 6ae21fe
commit b971f68
Showing 16 changed files with 305 additions and 242 deletions.
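
The parameterized `ci-benchmark` job in the CircleCI config and the `use_case` matrix entry in the GitHub Actions workflow both build the benchmark config URL from a single use-case name. A minimal Python sketch of that substitution follows. The helper name `benchmark_config_url` and the constant `BASE_URL` are hypothetical and exist only for illustration; the URL layout and the three use-case names are taken from the diff. The sketch only builds the strings and does not check that the files exist on S3.

```python
# Hypothetical helper mirroring the URL template used by both CI configs:
#   <base>/<use_case>/<use_case>.redisearch.cfg.json
BASE_URL = "https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets"

# Use-case names as listed in the CI workflow definitions of this commit.
USE_CASES = ["ecommerce-inventory", "nyc_taxis-ft.add", "nyc_taxis-hashes"]


def benchmark_config_url(use_case: str) -> str:
    """Return the benchmark config URL for a given use-case name."""
    return f"{BASE_URL}/{use_case}/{use_case}.redisearch.cfg.json"


if __name__ == "__main__":
    # Print the URL each CI job would pass to --benchmark-config-file.
    for use_case in USE_CASES:
        print(benchmark_config_url(use_case))
```

Because the use-case name appears twice in the path, a new benchmark can be added to either CI system by adding one name, provided its config file follows the same naming layout.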
diff --git a/.circleci/config.yml b/.circleci/config.yml
@@ -1,113 +1,77 @@
-# Golang CircleCI 2.0 configuration file
-#
-# Check https://circleci.com/docs/2.0/language-go/ for more details
-version: 2
-jobs:
-  build-edge: # test with redisearch:edge
+version: 2.1
+
+executors:
+  edge:
     docker:
       - image: circleci/golang:1.13
       - image: redislabs/redisearch:edge
-
-    working_directory: /go/src/github.com/RediSearch/ftsb
-    steps:
-      - checkout
-      - run: make test
-      - run: bash <(curl -s https://codecov.io/bash) -t ${CODECOV_TOKEN}
-
-  build-latest: # test with redisearch:latest
+  latest:
     docker:
       - image: circleci/golang:1.13
       - image: redislabs/redisearch:latest
 
-    working_directory: /go/src/github.com/RediSearch/ftsb
-    steps:
-      - checkout
-      - run: make test
-
-  ci-benchmark-edge: # test nightly with redisearch:edge
-    docker:
-      - image: circleci/golang:1.13
-      - image: redislabs/redisearch:edge
-
-    working_directory: /go/src/github.com/RediSearch/ftsb
+jobs:
+  ci-benchmark:
+    parameters:
+      redisearch_version:
+        type: executor
+      use_case:
+        type: string
+    executor: << parameters.redisearch_version >>
     steps:
       - checkout
       - run: make
       - run: sudo apt install python3.6 -y
       - run: sudo apt install python3-pip -y
-      - run: python3 -m pip install wheel redisbench-admin==0.1.10
+      - run: python3 -m pip install wheel redisbench-admin==0.1.12
       - run:
-          name: ecommerce-inventory use case
+          name: << parameters.use_case >> use case
           command: |
             redisbench-admin run \
-             --repetitions 7 \
+             --repetitions 3 \
              --output-file-prefix circleci \
              --upload-results-s3 \
-             --benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/ecommerce-inventory/ecommerce-inventory.redisearch.cfg.json
-      - run:
-          name: nyc_taxis CI use case with HSET
-          command: |
-            redisbench-admin run \
-            --repetitions 3 \
-            --output-file-prefix circleci \
-            --benchmark-requests 1000000 \
-            --upload-results-s3 \
-            --benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/nyc_taxis-hashes-CI/nyc_taxis-hashes-CI.redisearch.cfg.json
-          no_output_timeout: 30m
-      - run:
-          name: nyc_taxis CI use case with FT.ADD
-          command: |
-            redisbench-admin run \
-            --repetitions 3 \
-            --output-file-prefix circleci \
-            --benchmark-requests 1000000 \
-            --upload-results-s3 \
-            --benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/nyc_taxis-ftadd-CI/nyc_taxis-ftadd-CI.redisearch.cfg.json
-          no_output_timeout: 30m
+             --benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/<< parameters.use_case >>/<< parameters.use_case >>.redisearch.cfg.json
 
-  ci-benchmark-latest: # test nightly with redisearch:edge
+  build-edge: # test with redisearch:edge
     docker:
       - image: circleci/golang:1.13
-      - image: redislabs/redisearch:latest
-
-    working_directory: /go/src/github.com/RediSearch/ftsb
+      - image: redislabs/redisearch:edge
     steps:
       - checkout
-      - run: make
-      - run: sudo apt install python3.6 -y
-      - run: sudo apt install python3-pip -y
-      - run: python3 -m pip install wheel redisbench-admin==0.1.10
-      - run: |
-          redisbench-admin run \
-          --repetitions 7 \
-          --output-file-prefix circleci \
-          --upload-results-s3 \
-          --benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/ecommerce-inventory/ecommerce-inventory.redisearch.cfg.json
+      - run: make test
+      - run: bash <(curl -s https://codecov.io/bash) -t ${CODECOV_TOKEN}
 
+  build-latest: # test with redisearch:latest
+    docker:
+      - image: circleci/golang:1.13
+      - image: redislabs/redisearch:latest
 
-  build-multiarch-docker:
-    machine:
-      enabled: true
     steps:
       - checkout
-      - run: |
-          echo "$DOCKER_REDISBENCH_PWD" | base64 --decode | docker login --username $DOCKER_REDISBENCH_USER --password-stdin
-      - run:
-          name: Build
-          command: |
-            make docker-release
-          no_output_timeout: 20m
+      - run: make test
 
 workflows:
-  version: 2
   commit:
     jobs:
       - build-edge
       - build-latest
-      - ci-benchmark-edge
-      - ci-benchmark-latest:
+      - ci-benchmark:
+          name: edge-ecommerce-inventory
+          redisearch_version: edge
+          use_case: "ecommerce-inventory"
+      - ci-benchmark:
+          name: edge-nyc_taxis-ft.add
+          redisearch_version: edge
+          use_case: "nyc_taxis-ft.add"
           requires:
-            - ci-benchmark-edge
+            - edge-ecommerce-inventory
+      - ci-benchmark:
+          name: edge-nyc_taxis-hashes
+          redisearch_version: edge
+          use_case: "nyc_taxis-hashes"
+          requires:
+            - edge-nyc_taxis-ft.add
 
   ci_benchmarks:
     triggers:
@@ -118,7 +82,19 @@ workflows:
               only:
                 - master
     jobs:
-      - ci-benchmark-edge
-      - ci-benchmark-latest:
+      - ci-benchmark:
+          name: edge-ecommerce-inventory
+          redisearch_version: edge
+          use_case: "ecommerce-inventory"
+      - ci-benchmark:
+          name: edge-nyc_taxis-ft.add
+          redisearch_version: edge
+          use_case: "nyc_taxis-ft.add"
+          requires:
+            - edge-ecommerce-inventory
+      - ci-benchmark:
+          name: edge-nyc_taxis-hashes
+          redisearch_version: edge
+          use_case: "nyc_taxis-hashes"
           requires:
-            - ci-benchmark-edge
+            - edge-nyc_taxis-ft.add
diff --git a/.dockerignore b/.dockerignore
diff --git a/.github/workflows/ci-benchmarks.yml b/.github/workflows/ci-benchmarks.yml
@@ -12,13 +12,14 @@ jobs:
     strategy:
       matrix:
         go: [ '1.14']
-        redisearch_version: ['edge','latest']
+        redisearch_version: ['edge']
+        use_case: ['ecommerce-inventory','nyc_taxis-ft.add','nyc_taxis-hashes']
     services:
       redis:
         image: redislabs/redisearch:${{ matrix.redisearch_version }}
         ports:
           - 6379:6379
-    name: Benchmark redisearch:${{ matrix.redisearch_version }} with Go ${{ matrix.go }}
+    name: Benchmark ${{ matrix.use_case }} redisearch:${{ matrix.redisearch_version }} with Go ${{ matrix.go }}
     steps:
       - uses: actions/checkout@v2
       - name: Build and Run Benchmark
@@ -35,18 +36,16 @@ jobs:
           mkdir -p $GOPATH/src/github.com/$GITHUB_REPOSITORY
           mv $(pwd)/* $GOPATH/src/github.com/$GITHUB_REPOSITORY
           cd $GOPATH/src/github.com/$GITHUB_REPOSITORY
-          go get ./...
-          go test ./...
-          go install ./...
+          make test
           sudo apt install python3.6 -y
           sudo apt install python3-pip -y
           sudo apt-get install python3-setuptools -y
           cd $GOPATH/src/github.com/$GITHUB_REPOSITORY
           sudo python3 -m pip install wheel
-          python3 -m pip install redisbench-admin==0.1.10
+          python3 -m pip install redisbench-admin==0.1.12
           ~/.local/bin/redisbench-admin run \
-               --repetitions 7 \
+               --repetitions 3 \
                --output-file-prefix github-actions \
                --upload-results-s3 \
-               --benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/ecommerce-inventory/ecommerce-inventory.redisearch.cfg.json
+               --benchmark-config-file https://s3.amazonaws.com/benchmarks.redislabs/redisearch/datasets/${{ matrix.use_case }}/${{ matrix.use_case }}.redisearch.cfg.json
 
diff --git a/.gitignore b/.gitignore
@@ -1,14 +1,19 @@
 # FTSB outputs #
 ################
 
-cmd/ftsb_generate_redisearch/__pycache__/*
-cmd/ftsb_generate_redisearch/nyc_taxis/tmp/*
 cmd/ftsb_redisearch/ftsb_redisearch
 
+###################
+# Data generators #
+###################
 *.txt
 *.csv
-cmd/ftsb_generate_data/*.csv
+*.pyc
+*__pycache__*
+scripts/datagen_redisearch/__pycache__/**
+scripts/datagen_redisearch/nyc_taxis/tmp/*
 
+###################
 # Idea / others #
 #################
 

diff --git a/Dockerfile b/Dockerfile
diff --git a/README.md b/README.md
@@ -14,9 +14,7 @@ including RediSearch.
 This code is based on a fork of work initially made public by TSBS
 at https://github.com/timescale/tsbs.
 
-Current databases supported:
 
-+ RediSearch
 
 ## Overview
 The Full-Text Search Benchmark (FTSB) is a collection of Python and Go programs that are used to generate datasets (Python) and then benchmark(Go) read and write performance of various databases. The intent is to make the FTSB extensible so that a variety of use cases (e.g., ecommerce, jsondata, logs, etc.), query types, and databases can be included and benchmarked.
@@ -27,6 +25,26 @@ To this end, we hope to help SAs, and prospective database administrators find t
 
 FTSB is used to benchmark bulk load performance and query execution performance. To accomplish this in a fair way, the data to be inserted and the queries to run are always pre-generated and native Go clients are used wherever possible to connect to each database.
 
+## Current databases supported
+
++ RediSearch
+
+### Current use cases
+
+Currently, FTSB supports three use cases: 
+- **nyc_taxis** [[details here](docs/nyc_taxis-benchmark/description.md)]. This benchmark focuses on write performance, using the TLC Trip Record Data, which contains the rides performed by yellow taxis in New York in 2015. In total, the benchmark loads >12M documents.
+
+
+- **enwiki-abstract** [[details here](docs/enwiki-abstract-benchmark/description.md)], from English-language [Wikipedia:Database](https://en.wikipedia.org/wiki/Wikipedia:Database_download) page abstracts. This use case generates
+3 TEXT fields per document and focuses on full-text query performance.
+
+
+- **ecommerce-inventory** [[details here](docs/ecommerce-inventory-benchmark/description.md)], from a base dataset of [10K fashion products on Amazon.com](https://data.world/promptcloud/fashion-products-on-amazon-com/workspace/file?filename=amazon_co-ecommerce_sample.csv) which are then multiplexed by categories, sellers, and countries to produce larger datasets > 1M docs. This benchmark focuses on update and aggregate performance, splitting the performance numbers into Reads (FT.AGGREGATE), Cursor Reads (FT.CURSOR), and Updates (FT.ADD).
+The use case generates an index with 10 TAG fields (3 sortable and 1 non indexed), and 16 NUMERIC sortable non indexed fields per document.
+The aggregate queries are designed to be extremely costly both on computation and network TX, given that on each query we're aggregating and filtering over a large portion of the dataset while additionally loading 21 fields. 
+Both the update and read rates can be adjusted.
+
+
 
 ### Installation
 
@@ -40,6 +58,9 @@ cd $GOPATH/src/github.com/RediSearch/ftsb
 make
 ```
 
+
+
+
 ## How to use it?
 
 Using FTSB for benchmarking involves 2 phases: data and query generation, and query execution.

diff --git a/docker_entrypoint.sh b/docker_entrypoint.sh