
Failed to run performance case, reason = Performance case optimize timeout [PG Vector Scale] #369

Open
agandra30 opened this issue Sep 18, 2024 · 6 comments

agandra30 commented Sep 18, 2024

I ran multiple tests with PGVectorScale using DiskANN, and one of the biggest problems is that VectorDBBench simply exits the run with an optimize timeout. This is especially true for PGVectorScale and pgvectorRS.

I know we can increase the timeout in the scripts, but has anyone observed or can anyone recommend configuration settings that complete the run within the 5-hour default timeout for the 10M Cohere 768-dimension dataset? We want a cross-comparison without editing the default timeouts for large datasets. Did any database complete within that timeout? (Milvus, yes, but what about the other DBs?)

Error message:

```
2024-09-17 22:14:02,230 | WARNING: VectorDB optimize timeout in 18000 (task_runner.py:249) (3816719)
2024-09-17 22:14:02,274 | WARNING: Failed to run performance case, reason = Performance case optimize timeout (task_runner.py:191) (3816719)
Traceback (most recent call last):
  File "/root/vectordbbench_runs/lib/python3.12/site-packages/vectordb_bench/backend/task_runner.py", line 247, in _optimize
    return future.result(timeout=self.ca.optimize_timeout)[1]
```

Query:

```sql
CREATE INDEX IF NOT EXISTS "pgvectorscale_index" ON public."pg_vectorscale_collection"
USING "diskann" (embedding "vector_cosine_ops")
WITH ("storage_layout" = "memory_optimized", "num_neighbors" = "50", "search_list_size" = "100",
      "max_alpha" = "1.2", "num_bits_per_dimension" = "2");
-- (pgvectorscale.py:200) (3935818)
```
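If the goal is simply to finish the build inside the default timeout, one untested option is to trade some recall for build speed by shrinking the graph parameters and giving the build more memory. A hedged sketch, reusing the parameters from the query above; the lowered values are illustrative assumptions, not settings validated on the 10M Cohere dataset:

```sql
-- Illustrative only: smaller graph parameters generally build faster,
-- at some cost in recall. The values below are assumptions, not recommendations.
SET maintenance_work_mem = '32GB';  -- standard Postgres setting; size it to the build

CREATE INDEX IF NOT EXISTS "pgvectorscale_index_fast"  -- hypothetical index name
ON public."pg_vectorscale_collection"
USING "diskann" (embedding "vector_cosine_ops")
WITH ("storage_layout" = "memory_optimized",
      "num_neighbors" = "32",       -- down from 50
      "search_list_size" = "75");   -- down from 100
```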

My Postgres server infrastructure configuration:

  1. Installed on bare-metal Ubuntu 22
  2. PostgreSQL 16.4 (untuned)
  3. Memory available:

```
# free -mh
        total   used   free    shared   buff/cache   available
Mem:    1.0Ti   13Gi   986Gi   152Mi    7.3Gi        988Gi
Swap:   0B      0B     0B
```

  4. Extensions used:

```
pgdiskann=# \dx;
                 List of installed extensions
    Name     | Version |   Schema   |                      Description
-------------+---------+------------+------------------------------------------------------
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector      | 0.7.4   | public     | vector data type and ivfflat and hnsw access methods
 vectorscale | 0.3.0   | public     | pgvectorscale: Advanced indexing for vector data
(3 rows)
```

alwayslove2013 (Collaborator) commented

@Sheharyar570 Could you please share any experiences or advice you have regarding the building performance of PGVectorScale's DiskANN? We would greatly appreciate it!

xiaofan-luan (Collaborator) commented

> @Sheharyar570 Could you please share any experiences or advice you have regarding the building performance of PGVectorScale's DiskANN? We would greatly appreciate it!

Maybe we just need to give users a timeout config?

xiaofan-luan (Collaborator) commented

> I ran multiple tests with PGVectorScale using DiskANN, and one of the biggest problems is that VectorDBBench simply exits the run with an optimize timeout. […]

BTW, I don't think it's a wise choice to run 10M or more vectors on pgvector; it is simply too slow.

KendrickChou commented

> @Sheharyar570 Could you please share any experiences or advice you have regarding the building performance of PGVectorScale's DiskANN? We would greatly appreciate it!

> Maybe we just need to give users a timeout config?

A timeout config would be useful for me too!

alwayslove2013 (Collaborator) commented Sep 24, 2024

@KendrickChou @xiaofan-luan I will consider adding a timeout setting in the UI.

> Maybe we just need to give users a timeout config?

> A timeout config would be useful for me too!

We set different default timeouts based on the size of the dataset. Currently, the timeout config can only be modified through code:

```python
CAPACITY_TIMEOUT_IN_SECONDS = 24 * 3600   # 24h
LOAD_TIMEOUT_DEFAULT = 2.5 * 3600         # 2.5h
LOAD_TIMEOUT_768D_1M = 2.5 * 3600         # 2.5h
LOAD_TIMEOUT_768D_10M = 25 * 3600         # 25h
LOAD_TIMEOUT_768D_100M = 250 * 3600       # 10.41d
LOAD_TIMEOUT_1536D_500K = 2.5 * 3600      # 2.5h
LOAD_TIMEOUT_1536D_5M = 25 * 3600         # 25h
OPTIMIZE_TIMEOUT_DEFAULT = 30 * 60        # 30min
OPTIMIZE_TIMEOUT_768D_1M = 30 * 60        # 30min
OPTIMIZE_TIMEOUT_768D_10M = 5 * 3600      # 5h
OPTIMIZE_TIMEOUT_768D_100M = 50 * 3600    # 50h
OPTIMIZE_TIMEOUT_1536D_500K = 15 * 60     # 15min
OPTIMIZE_TIMEOUT_1536D_5M = 2.5 * 3600    # 2.5h
```
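Note that the `18000` in the error log above is exactly `OPTIMIZE_TIMEOUT_768D_10M = 5 * 3600` seconds. Until a UI setting exists, one workaround is to override the relevant constant in code before starting a run. A hedged sketch; the exact module path for these constants may differ across VectorDBBench versions, so treat `vectordb_bench.config` as an assumption and check your installed copy:

```python
# Hypothetical sketch: raise the 768D/10M optimize timeout before launching a run.
# Assumes the constants above live on vectordb_bench's config object; verify the
# actual location in your installed version before relying on this.
import vectordb_bench

vectordb_bench.config.OPTIMIZE_TIMEOUT_768D_10M = 12 * 3600  # 5h -> 12h
```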

@Sheharyar570
Copy link
Contributor

@agandra30 Well, I haven't tried running 10M Cohere on PGVectorScale, so I can't suggest a specific configuration.

However, I would suggest making maintenance_work_mem larger than the index size; note that you would first need to create the index to find out how large it is (see the sketch below).

You may still need to increase the default timeout.
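A minimal sketch of that advice using standard Postgres built-ins; the index name is taken from the query above, and the memory value is illustrative:

```sql
-- pg_relation_size / pg_size_pretty are standard Postgres functions that
-- report the on-disk size of a relation or index.
SELECT pg_size_pretty(pg_relation_size('public.pgvectorscale_index'));

-- Before rebuilding, give the build at least that much memory.
-- The value below is illustrative; size it to the measured index.
SET maintenance_work_mem = '64GB';
```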
