Make sqlalchemy use connection pooling for catalog db #710
Conversation
For the test failures, it looks like you are missing an import:

```
    engine = create_async_engine(
>       uri, echo=echo, json_serializer=json_serializer, poolclass=AsyncAdaptedQueuePool
    )
E   NameError: name 'AsyncAdaptedQueuePool' is not defined
```

For the linter issues:

```shell
pip install pre-commit
pre-commit install  # set up git hooks
pre-commit run --all-files
git commit -am "Fix linter issues."
```
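For reference, the missing name lives in `sqlalchemy.pool`. A minimal sketch of the fixed call, with placeholder arguments rather than tiled's actual `from_uri` code:

```python
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.pool import AsyncAdaptedQueuePool  # the missing import


def make_engine(uri: str, echo: bool = False):
    # Explicitly request the async-adapted queue pool instead of
    # letting SQLAlchemy pick the default pool class for the dialect.
    return create_async_engine(uri, echo=echo, poolclass=AsyncAdaptedQueuePool)
```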
Sorry, silly blunder.
I see some pytest failures. They look related, but it is not immediately obvious to me what broke.
Correction: I had not realized that asv was running against the version of tiled installed in my environment, rather than against the local repository I was working in. Now that I am performing proper benchmarks, where the change between pooling / not pooling is actually captured, the speedup from pooling looks significant:
edit: this improvement scales, here is a run for
The first run was too slow for the default timeout, and I had to increase it:
The test failures arose from trying to use a pool with an in-memory (`:memory:`) database:

```diff
diff --git a/tiled/catalog/adapter.py b/tiled/catalog/adapter.py
index e1c2f1af..b297155d 100644
--- a/tiled/catalog/adapter.py
+++ b/tiled/catalog/adapter.py
@@ -1293,7 +1293,7 @@ def from_uri(
         uri = f"sqlite+aiosqlite:///{uri}"
     parsed_url = make_url(uri)
-    if (parsed_url.get_dialect().name == "sqlite") and (
+    if False and (parsed_url.get_dialect().name == "sqlite") and (
         parsed_url.database != ":memory:"
     ):
         # For file-backed SQLite databases, connection pooling offers a
```
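The condition being toggled in that diff can be checked in isolation. A small sketch (the URIs are illustrative, not tiled's defaults):

```python
from sqlalchemy.engine import make_url

# In-memory SQLite: the dialect name drops the "+aiosqlite" driver suffix,
# and the database component is the literal ":memory:".
mem = make_url("sqlite+aiosqlite:///:memory:")
assert mem.get_dialect().name == "sqlite"
assert mem.database == ":memory:"

# File-backed SQLite: same dialect, but the database is a filesystem path,
# so the `parsed_url.database != ":memory:"` branch applies.
file_backed = make_url("sqlite+aiosqlite:////tmp/catalog.db")
assert file_backed.database == "/tmp/catalog.db"
```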
This PR results from the investigation done in #663, as well as a group coding session to implement a new benchmark for the catalog db (especially with input from @genematx, @Kezzsim, @danielballan).
I spent some extra time experimenting with asv to understand better what it does before putting up this PR. The new benchmark can be singled out by running:
```shell
asv run --environment existing:same --bench "time_repeated_write"
```
While some initial benchmarks seemed to show a modest performance improvement from connection pooling, I found (after running additional benchmarks) that this improvement was really just variance in the benchmark results. Instead, the benchmarks show that performance is ~identical with or without connection pooling. This at least means there is no apparent regression, which was our primary concern.
asv has several parameters that can adjust how benchmarks are taken, with the ability to customize things like warmup time, rounds per benchmark, samples per round, benchmark timeouts, etc. I played with these settings to see if they had any impact on our results, and it seemed the answer is "no".
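Concretely, those knobs are set as class attributes on the benchmark. A sketch using asv's documented benchmark attributes — the benchmark body here is a stand-in, not tiled's actual `time_repeated_write`:

```python
class TimeRepeatedWrite:
    # asv benchmark tuning attributes (all optional):
    warmup_time = 1.0  # seconds of untimed warmup before sampling
    rounds = 4         # interleaved rounds per benchmark
    repeat = 10        # samples collected per round
    number = 1         # function calls per sample
    timeout = 300      # per-benchmark timeout, in seconds

    def setup(self):
        # Runs before each round; excluded from the timing.
        self.data = list(range(1000))

    def time_repeated_write(self):
        # Placeholder workload; asv times methods prefixed with "time_".
        sum(self.data)
```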
It is worth pointing out that asv defaults to `timeit.default_timer` (wall-clock time via `time.perf_counter`), and so it should capture time spent outside the main thread. I was worried about this, since aiosqlite drops into a thread (per db connection).
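A quick stdlib check of that behavior: `timeit.default_timer` is `time.perf_counter`, a wall-clock timer, so blocking on work done in another thread (standing in here for aiosqlite's per-connection thread) does show up in the measurement:

```python
import threading
import time
import timeit

# On Python 3, timeit.default_timer is time.perf_counter (wall clock).
assert timeit.default_timer is time.perf_counter

start = timeit.default_timer()
# Worker thread standing in for aiosqlite's per-connection thread.
worker = threading.Thread(target=time.sleep, args=(0.2,))
worker.start()
worker.join()  # block until the "background" work finishes
elapsed = timeit.default_timer() - start

# The wait on the worker thread is included in the measured time.
assert elapsed >= 0.2
```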