feat(persistence): experimental bulk inserter for spans #2808

RogerHYang · 2024-04-08T16:27:08Z

resolves #2806

This PR implements the queuing method described below.

Two strategies for write transactions:

1. Each request has a separate writer.
   - Pro: Returning a 200 response means the data is persisted.
   - Con: Lock contention/starvation, which is probably rare. Granular transactions are less efficient overall.
2. Queue the requests in a buffer and funnel them into a single bulk writer.
   - Pro: Bulk writing is more efficient, and having a single writer also means no lock contention.
   - Con: The buffer is volatile and can be lost if the server crashes, so returning a 200 response is not a genuine confirmation of data persistence. Also, transaction errors are not propagated back to the client.
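
For illustration only, here is a minimal sketch of the second strategy using a plain `asyncio.Queue`. It is not the code in this PR: `bulk_writer`, `store`, `max_batch`, and the `None` stop sentinel are hypothetical names, and appending to a list stands in for one bulk-insert transaction.

```python
import asyncio
from typing import List, Optional


async def bulk_writer(
    queue: "asyncio.Queue[Optional[str]]",
    store: List[List[str]],
    max_batch: int = 100,
) -> None:
    """Single consumer: drain whatever is queued and write it as one batch."""
    while True:
        item = await queue.get()  # block until at least one item arrives
        if item is None:  # hypothetical stop sentinel
            return
        batch = [item]
        stop = False
        # Opportunistically take whatever else is already waiting.
        while len(batch) < max_batch and not queue.empty():
            nxt = queue.get_nowait()
            if nxt is None:
                stop = True
                break
            batch.append(nxt)
        store.append(batch)  # stand-in for a single bulk-insert transaction
        if stop:
            return


async def demo() -> List[List[str]]:
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    store: List[List[str]] = []
    for i in range(5):
        # A request handler would return 200 right after this call,
        # before anything has been written.
        queue.put_nowait(f"span-{i}")
    queue.put_nowait(None)
    await bulk_writer(queue, store, max_batch=3)
    return store


batches = asyncio.run(demo())
print(batches)
```

The five queued items end up in two batches, which is where the efficiency gain comes from. Anything still sitting in the queue when the process dies is lost, which is the con listed above.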

src/phoenix/server/main.py

axiomofjoy · 2024-04-08T18:46:07Z

src/phoenix/server/app.py

@@ -181,7 +198,16 @@ def create_app(
    debug: bool = False,
    read_only: bool = False,
    enable_prometheus: bool = False,
+    initial_spans: Optional[Iterable[Union[Span, Tuple[Span, str]]]] = None,


Suggested change

initial_spans: Optional[Iterable[Union[Span, Tuple[Span, str]]]] = None,

initial_spans: Optional[Iterable[Tuple[Span, str]]] = None,

Suggestion: Simplify the input type. Did we add the union type for handling fixtures? If so, we could just convert the fixtures to use the default project.

Good point. On the other hand, I would have to do that in two places: main.py and session.py. So I opted to just do it in one place (here) instead.
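
A rough sketch of what normalizing the union "in one place" could look like. The `Span` dataclass, `with_project`, and `DEFAULT_PROJECT_NAME` below are placeholders for illustration and are not taken from the PR.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple, Union

DEFAULT_PROJECT_NAME = "default"  # assumed name for the default project


@dataclass(frozen=True)
class Span:
    """Placeholder for the real Span type."""

    span_id: str


def with_project(
    items: Iterable[Union[Span, Tuple[Span, str]]],
    default_project: str = DEFAULT_PROJECT_NAME,
) -> Iterator[Tuple[Span, str]]:
    """Yield (span, project_name) pairs, filling in the default for bare spans."""
    for item in items:
        if isinstance(item, Span):
            yield item, default_project
        else:
            yield item


pairs = list(with_project([Span("a"), (Span("b"), "evals")]))
print(pairs)
```

With a helper like this, callers such as main.py and session.py can keep passing bare spans, and everything downstream only sees `Tuple[Span, str]`.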

axiomofjoy · 2024-04-08T19:45:38Z

src/phoenix/db/bulk_inserter.py

+        return self._queue_span
+
+    async def __aexit__(self, *args: Any) -> None:
+        self._running = False


Set self._task to None?

I'll let the garbage collector do it later. It's harmless either way.
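
For reference, a sketch of the alternative raised in this thread: awaiting the background task on exit and then clearing the reference. `TinyInserter` and its attributes are hypothetical stand-ins rather than the class in bulk_inserter.py, and extending a list stands in for the database write.

```python
import asyncio
from typing import Any, Callable, List, Optional


class TinyInserter:
    """Hypothetical stand-in for a bulk inserter used as an async context manager."""

    def __init__(self) -> None:
        self._running = False
        self._pending: List[str] = []
        self.written: List[str] = []
        self._task: Optional[asyncio.Task] = None

    def _queue(self, item: str) -> None:
        self._pending.append(item)

    async def __aenter__(self) -> Callable[[str], None]:
        self._running = True
        self._task = asyncio.create_task(self._loop())
        return self._queue

    async def __aexit__(self, *args: Any) -> None:
        self._running = False
        if self._task is not None:
            await self._task  # let the loop flush what is still pending
            self._task = None  # drop the reference explicitly

    async def _loop(self) -> None:
        # Keep going while running, and afterwards until the buffer is empty.
        while self._running or self._pending:
            if self._pending:
                batch, self._pending = self._pending, []
                self.written.extend(batch)  # stand-in for a bulk insert
            await asyncio.sleep(0.01)


async def demo() -> TinyInserter:
    inserter = TinyInserter()
    async with inserter as queue_item:
        queue_item("a")
        queue_item("b")
    return inserter


inserter = asyncio.run(demo())
print(inserter.written)
```

Awaiting the task matters more than clearing the attribute: it is what lets the final flush finish before the context manager returns.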

axiomofjoy · 2024-04-08T19:51:01Z

src/phoenix/db/bulk_inserter.py

+    if await session.scalar(select(1).where(models.Span.span_id == span.context.span_id)):
+        # Span already exists
+        return


Is there not a setting to ignore inserts if the record already exists so we don't need to hit the database an extra time?

Yes, but it'll raise an IntegrityError, which is annoying. On the other hand, this operation is not expensive, because the B-tree is most likely already in the buffer pool.
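
As a point of comparison, both SQLite (3.24+) and Postgres accept an `ON CONFLICT ... DO NOTHING` clause that skips duplicates in one statement instead of raising. The sketch below uses the standard-library `sqlite3` module and a made-up `spans` table, not the project's models or its SQLAlchemy session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up schema for illustration; span_id carries the unique constraint.
conn.execute(
    "CREATE TABLE spans (id INTEGER PRIMARY KEY, span_id TEXT NOT NULL UNIQUE)"
)


def insert_span(conn: sqlite3.Connection, span_id: str) -> None:
    """Insert a span, silently skipping it if span_id already exists."""
    conn.execute(
        "INSERT INTO spans (span_id) VALUES (?) "
        "ON CONFLICT (span_id) DO NOTHING",
        (span_id,),
    )


insert_span(conn, "abc")
insert_span(conn, "abc")  # duplicate: no exception, no second row
count = conn.execute("SELECT COUNT(*) FROM spans").fetchone()[0]
print(count)
```

Whether this is worth adopting here depends on how the rest of the insert is structured (for example, the project lookup that follows the existence check), so treat it as an option rather than a recommendation.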

axiomofjoy · 2024-04-08T19:52:29Z

src/phoenix/db/bulk_inserter.py

+        # Span already exists
+        return
+    if not (
+        project_rowid := await session.scalar(


Should we start moving away from the rowid naming convention since we are trying to support Postgres in addition to SQLite?

Discussed offline: we currently don't have a good alternative name, but will reconsider later.

axiomofjoy

Just commenting so @mikeldking can take a look.

mikeldking · 2024-04-08T20:11:41Z

src/phoenix/db/bulk_inserter.py

@@ -0,0 +1,177 @@
+import asyncio


nit: the file location of this feels off if it's specific to spans

It could be used for bulk inserting evals too. It'll just need a second queue.

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 8, 2024

wip

24a67a6

RogerHYang force-pushed the bulk-insert branch from 8485e34 to 24a67a6 Compare April 8, 2024 16:37

RogerHYang changed the title ~~refactor(persistence): experimental bulk inserter for spans~~ feat(persistence): experimental bulk inserter for spans Apr 8, 2024

axiomofjoy reviewed Apr 8, 2024

mikeldking approved these changes Apr 8, 2024

RogerHYang added 4 commits April 8, 2024 14:01

clean up

99c9881

clean up

58ada84

Merge branch 'sql' into bulk-insert

00d10df

clean up

507ec3a

RogerHYang merged commit 9ce841e into sql Apr 8, 2024
11 checks passed

RogerHYang deleted the bulk-insert branch April 8, 2024 21:50

RogerHYang linked an issue Apr 8, 2024 that may be closed by this pull request

[persistence] initialize span fixtures for read-only deployment #2806

Closed

This was referenced Apr 9, 2024

2815 launch app with persist #2829

Closed

mikeldking/openapi #2886

Closed

fix: include migration files #2887

Merged

feat: project page metrics display last 7D #2896

Merged

github-actions bot mentioned this pull request May 9, 2024

chore(main): release arize-phoenix 5.0.0 #3134

Closed

mikeldking mentioned this pull request May 9, 2024

chore(main): release arize-phoenix 4.0.0 #3143

Merged
