Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(persistence): experimental bulk inserter for spans #2808

Merged
merged 5 commits into from
Apr 8, 2024
Merged

Conversation

RogerHYang
Copy link
Contributor

@RogerHYang RogerHYang commented Apr 8, 2024

resolves #2806

This PR implements the queuing method described below.

Two strategies for write transactions:

  1. Each request has a separate writer.
    • Pro: Returning a 200 response means data is persisted.
    • Con: Lock contention/starvation, which is probably rare. Granular transactions are less efficient overall.
  2. Queue the requests in a buffer and funnel them into a single bulk-writer.
    • Pro: Bulk writing is more efficient, and having a single writer also means no lock contention.
    • Con: Buffer is volatile an can be lost if crashed, so returning a 200 response is not a genuine confirmation of data persistence. Also, transaction errors are not propagated back the client.

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 8, 2024
@RogerHYang RogerHYang changed the title refactor(persistence): experimental bulk inserter for spans feat(persistence): experimental bulk inserter for spans Apr 8, 2024
src/phoenix/server/main.py Outdated Show resolved Hide resolved
src/phoenix/server/main.py Outdated Show resolved Hide resolved
@@ -181,7 +198,16 @@ def create_app(
debug: bool = False,
read_only: bool = False,
enable_prometheus: bool = False,
initial_spans: Optional[Iterable[Union[Span, Tuple[Span, str]]]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
initial_spans: Optional[Iterable[Union[Span, Tuple[Span, str]]]] = None,
initial_spans: Optional[Iterable[Tuple[Span, str]]] = None,

Suggestion: Simplify the input type. Did we add the union type for handling fixtures? If so, we could just convert the fixtures to use the default project.

Copy link
Contributor Author

@RogerHYang RogerHYang Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. On the other hand. i have to do that in two places: main.py and session.py. So i opted to just do it in one place (here) instead.

return self._queue_span

async def __aexit__(self, *args: Any) -> None:
self._running = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set self._task to None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll let the garbage collect do it later. it's harmless either way

Comment on lines +70 to +72
if await session.scalar(select(1).where(models.Span.span_id == span.context.span_id)):
# Span already exists
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there not a setting to ignore inserts if the record already exists so we don't need to hit the database an extra time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it'll raise a IntegrityError which is annoying. On the other hand, this operation here is not expensive, because the B-tree is most likely already in the buffer pool.

# Span already exists
return
if not (
project_rowid := await session.scalar(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we start moving away from the rowid naming convention since we are trying to support Postgres in addition to SQLite?

Copy link
Contributor Author

@RogerHYang RogerHYang Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline: currently don't have a good alternative name, but will reconsider later

Copy link
Contributor

@axiomofjoy axiomofjoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just commenting so @mikeldking can take a look.

@@ -0,0 +1,177 @@
import asyncio
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the file location of this feels off if it's specific to spans

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be used for bulk inserting evals too. it'll just need second queue

src/phoenix/db/bulk_inserter.py Outdated Show resolved Hide resolved
src/phoenix/db/bulk_inserter.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[persistence] initialize span fixtures for read-only deployment
3 participants