Reranking KeyError with quantization_type=none and --skip-reranking Option + num_concurrency Configuration Issue in Streamlit #398

jimin0 · 2024-11-04T05:35:19Z

Hi,

Bug Description

A reranking configuration appears to be required even when:

quantization_type is set to 'none'
the --skip-reranking CLI option is used

The code raises KeyError: 'reranking' regardless of these settings.
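
A KeyError like this usually means some code indexes a dict with a key that the selected configuration never populated. The sketch below only illustrates that failure mode and a defensive lookup. The dict contents and function names are hypothetical and are not taken from the VectorDBBench source.

```python
# Hypothetical search parameters for an index config without quantization.
# Note that no "reranking" key is present.
search_param = {"metric": "cosine", "probes": 1}


def uses_reranking_strict(params: dict) -> bool:
    # Direct indexing raises KeyError when the key is absent.
    return params["reranking"]


def uses_reranking_safe(params: dict) -> bool:
    # dict.get falls back to a default, so a missing key means "no reranking".
    return bool(params.get("reranking", False))


try:
    uses_reranking_strict(search_param)
except KeyError as exc:
    print(f"KeyError: {exc}")

print(uses_reranking_safe(search_param))
```

Whether the actual fix uses a default lookup or always populates the key is not stated in this thread.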

Error log

(.myenv03119) [root@localhost VectorDBBench]# vectordbbench pgvectorivfflat --config-file pgvector_config.yml --case-type Performance1536D50K --skip-reranking
2024-11-04 00:20:45,478 | INFO: Task:
TaskConfig(db=<DB.PgVector: 'PgVector'>, db_config=PgVectorConfig(db_label='pgVecTest', version='', note='', user_name=SecretStr('**********'), password=SecretStr('**********'), host='localhost', port=5432, db_name='mydatabase'), db_case_config=PgVectorIVFFlatConfig(metric_type=None, create_index_before_load=False, create_index_after_load=True, lists=10, probes=1, index=<IndexType.ES_IVFFlat: 'ivfflat'>, maintenance_work_mem=None, max_parallel_workers=None, quantization_type='none'), case_config=CaseConfig(case_id=<CaseType.Performance1536D50K: 50>, custom_case={}, k=100, concurrency_search_config=ConcurrencySearchConfig(num_concurrency=[1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50], concurrency_duration=30)), stages=['drop_old', 'load', 'search_serial', 'search_concurrent'])
 (cli.py:494) (234212)
2024-11-04 00:20:45,478 | INFO: generated uuid for the tasks: e4b437c3dd834c5b929f7310fc5f7c47 (interface.py:66) (234212)
2024-11-04 00:20:45,564 | INFO | DB             | CaseType     Dataset               Filter | task_label (task_runner.py:338)
2024-11-04 00:20:45,564 | INFO | -----------    | ------------ -------------------- ------- | -------    (task_runner.py:338)
2024-11-04 00:20:45,564 | INFO | PgVector-pgVecTest | Performance  OpenAI-SMALL-50K        None | e4b437c3dd834c5b929f7310fc5f7c47 (task_runner.py:338)
2024-11-04 00:20:45,564 | INFO: task submitted: id=e4b437c3dd834c5b929f7310fc5f7c47, e4b437c3dd834c5b929f7310fc5f7c47, case number: 1 (interface.py:231) (234212)
2024-11-04 00:20:46,098 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'OpenAI', 'size': 50000, 'dim': 1536, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'PgVector-pgVecTest'}, drop_old=True (interface.py:164) (234223)
2024-11-04 00:20:46,098 | INFO: Starting run (task_runner.py:100) (234223)
2024-11-04 00:20:46,202 | INFO: PgVector config values: {'host': 'localhost', 'port': 5432, 'dbname': 'mydatabase', 'user': 'postgres', 'password': 'imeidpaw'}
metric_type=<MetricType.COSINE: 'COSINE'> create_index_before_load=False create_index_after_load=True lists=10 probes=1 index=<IndexType.ES_IVFFlat: 'ivfflat'> maintenance_work_mem=None max_parallel_workers=None quantization_type='none' (pgvector.py:54) (234223)
2024-11-04 00:20:46,203 | INFO: PgVector client drop index : pgvector_index (pgvector.py:239) (234223)
2024-11-04 00:20:46,204 | INFO: PgVector client drop table : pg_vector_collection (pgvector.py:215) (234223)
2024-11-04 00:20:46,209 | INFO: PgVector client create table : pg_vector_collection (pgvector.py:382) (234223)
2024-11-04 00:20:47,195 | INFO: Read the entire file into memory: test.parquet (dataset.py:229) (234223)
2024-11-04 00:20:47,282 | INFO: Read the entire file into memory: neighbors.parquet (dataset.py:229) (234223)
2024-11-04 00:20:47,311 | INFO: Start performance case (task_runner.py:158) (234223)
2024-11-04 00:20:47,981 | WARNING: VectorDB load dataset error: 'reranking' (serial_runner.py:115) (234223)
2024-11-04 00:20:48,092 | WARNING: Failed to run performance case, reason = 'reranking' (task_runner.py:191) (234223)
Traceback (most recent call last):
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/task_runner.py", line 164, in _run_perf_case
    _, load_dur = self._load_train_data()
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/utils.py", line 41, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/task_runner.py", line 205, in _load_train_data
    raise e from None
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/task_runner.py", line 203, in _load_train_data
    runner.run()
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 147, in run
    count, dur = self._insert_all_batches()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/utils.py", line 41, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 116, in _insert_all_batches
    raise e from e
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 107, in _insert_all_batches
    count = future.result(timeout=self.timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
KeyError: 'reranking'
2024-11-04 00:20:48,094 | WARNING: [1/1] case {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'OpenAI', 'size': 50000, 'dim': 1536, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'PgVector-pgVecTest'} failed to run, reason='reranking' (interface.py:184) (234223)
Traceback (most recent call last):
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/interface.py", line 165, in _async_task_v2
    case_res.metrics = runner.run(drop_old)
                       ^^^^^^^^^^^^^^^^^^^^
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/task_runner.py", line 107, in run
    return self._run_perf_case(drop_old)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/task_runner.py", line 193, in _run_perf_case
    raise e from None
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/task_runner.py", line 164, in _run_perf_case
    _, load_dur = self._load_train_data()
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/utils.py", line 41, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/task_runner.py", line 205, in _load_train_data
    raise e from None
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/task_runner.py", line 203, in _load_train_data
    runner.run()
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 147, in run
    count, dur = self._insert_all_batches()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/utils.py", line 41, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 116, in _insert_all_batches
    raise e from e
  File "/home/jimin/vectordb/VectorDBBench/vectordb_bench/backend/runner/serial_runner.py", line 107, in _insert_all_batches
    count = future.result(timeout=self.timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
KeyError: 'reranking'
2024-11-04 00:20:48,095 | INFO |Task summary: run_id=e4b43, task_label=e4b437c3dd834c5b929f7310fc5f7c47 (models.py:355)
2024-11-04 00:20:48,095 | INFO |DB       | db_label     case                label                            | load_dur    qps        latency(p99)    recall        max_load_count | label (models.py:355)
2024-11-04 00:20:48,095 | INFO |-------- | ------------ ------------------- -------------------------------- | ----------- ---------- --------------- ------------- -------------- | ----- (models.py:355)
2024-11-04 00:20:48,095 | INFO |PgVector | pgVecTest    Performance1536D50K e4b437c3dd834c5b929f7310fc5f7c47 | 0.0         0.0        0.0             0.0           0              | x     (models.py:355)
2024-11-04 00:20:48,095 | INFO: write results to disk /home/jimin/vectordb/VectorDBBench/vectordb_bench/results/PgVector/result_20241104_e4b437c3dd834c5b929f7310fc5f7c47_pgvector.json (models.py:213) (234223)
2024-11-04 00:20:48,096 | INFO: Success to finish task: label=e4b437c3dd834c5b929f7310fc5f7c47, run_id=e4b437c3dd834c5b929f7310fc5f7c47 (interface.py:203) (234223)
(.myenv03119) [root@localhost VectorDBBench]#
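
Both tracebacks above end inside concurrent/futures/_base.py because Future.result() re-raises, in the caller, whatever exception the worker raised. That is why the frame that actually looked up 'reranking' is not visible in the log. A minimal sketch of the mechanism follows, using a hypothetical insert_batch worker. A thread pool is used only to keep the example short; the real runner may use a different executor.

```python
from concurrent.futures import ThreadPoolExecutor


def insert_batch(params: dict) -> int:
    # Hypothetical worker: fails on a missing key, as in the log above.
    return 1 if params["reranking"] else 0


def run_insert(params: dict) -> str:
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(insert_batch, params)
        try:
            # result() re-raises the worker's exception in the calling thread.
            future.result(timeout=5)
        except KeyError as exc:
            return f"worker failed with KeyError: {exc}"
    return "ok"


print(run_insert({"probes": 1}))
```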

Steps to Reproduce

pgvector_config.yml:

pgvectorivfflat:
  db_label: pgVecTest
  user_name: postgres
  db_name: mydatabase
  host: localhost
  password: xxx
  port: 5432
  probes: 1
  lists: 10
  maintenance_work_mem: 2GB
  max_parallel_workers: 7
  quantization_type: none
  num_concurrency: 1,5,10,15,20,25,30,35,40,45,50

Run command:

vectordbbench pgvectorivfflat --config-file pgvector_config.yml --case-type Performance1536D50K

or

vectordbbench pgvectorivfflat --config-file pgvector_config.yml --case-type Performance1536D50K --skip-reranking

Both commands fail with the same error: KeyError: 'reranking'

Additional Question:

Could the num_concurrency configuration be reviewed as well? The current issues include:

UI Control: Need Streamlit UI controls to adjust num_concurrency range.

Configuration Error: An error occurs when attempting to modify num_concurrency in VectorDBBench/vectordb_bench/__init__.py as follows:

NUM_CONCURRENCY = env.list("NUM_CONCURRENCY", [1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75], subcast=int)

After making this change, running Streamlit returns an error, which makes it difficult to adjust num_concurrency values dynamically.
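
The env.list call above suggests that the default may be overridable through a NUM_CONCURRENCY environment variable instead of a source edit. The sketch below shows, with the standard library only, how a comma-separated variable can become a list of integers. The helper read_int_list is hypothetical and is not the implementation behind env.list.

```python
import os


def read_int_list(name: str, default: list[int]) -> list[int]:
    """Read a comma-separated list of integers from an environment variable."""
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        # Variable unset or empty: fall back to a copy of the default.
        return list(default)
    # int() tolerates surrounding whitespace, so "1, 5" parses as well.
    return [int(part) for part in raw.split(",") if part.strip()]


os.environ["NUM_CONCURRENCY"] = "1,5,10,15,20"
print(read_int_list("NUM_CONCURRENCY", [1, 5, 10]))
```

If the package does read this variable at import time, exporting it before launching Streamlit would avoid editing __init__.py. That behavior is an assumption here and has not been confirmed in this thread.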

Thank you for your time and attention to this issue! 😊

alwayslove2013 · 2024-11-05T08:40:14Z

@jimin0 Good suggestion~ We will seriously consider exposing more parameter controls in the user interface.

UI Control: Need Streamlit UI controls to adjust num_concurrency range.

Need more details.

Configuration Error: An error occurs when attempting to modify num_concurrency in VectorDBBench/vectordb_bench/__init__.py as follows:

It seems that there is an issue with the pgvector client. @Sheharyar570 could you help resolve it~

KeyError: 'reranking'

Sheharyar570 · 2024-11-05T08:54:12Z

@alwayslove2013 yes, I can look into it.

alwayslove2013 · 2024-11-06T02:14:30Z

Thanks to @Sheharyar570 for the professional, prompt, and enthusiastic assistance. The reranking KeyError has been fixed in PR #401.

@jimin0 v0.0.16 has been released, and you can update the code to the latest version for testing. If you have any questions, please feel free to reach out to us.

jimin0 changed the title ~~reranking KeyError with quantization_type=none and --skip-reranking Option + num_concurrency Configuration Issue in Streamlit~~ Reranking KeyError with quantization_type=none and --skip-reranking Option + num_concurrency Configuration Issue in Streamlit Nov 4, 2024

alwayslove2013 assigned Sheharyar570 Nov 5, 2024

Sheharyar570 mentioned this issue Nov 5, 2024

Fix pgvectorivfflat reranking key bug #401

Merged
