Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: sqlite async session for graphql api #2784

Merged
merged 6 commits into from
Apr 5, 2024

Conversation

RogerHYang
Copy link
Contributor

@RogerHYang RogerHYang commented Apr 5, 2024

resolves #2753

Future TODOs

  1. enable sort and filter on spans
  2. ingest fixture without networking (for read-only deployments)
  3. improve span insertion code
  4. add dataloader for span descendants or change the query pattern
  5. add mock for db in unit tests
  6. migrate latency percentile statistics to sql
  7. handle ingestions with high concurrency
  8. figure out why timezone is not working for insertion of timezone-aware datetime

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 5, 2024
src/phoenix/server/api/context.py Show resolved Hide resolved
Comment on lines 74 to 82
project_rowid := await session.scalar(
text("SELECT rowid FROM projects WHERE name = :name;"),
{"name": project_name},
)
):
project_rowid = await session.scalar(
text("INSERT INTO projects(name) VALUES(:name) RETURNING rowid;"),
{"name": project_name},
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes it so that you have to hit the database on every insert. I think this is where we can build a bit of optimization via dynamic programming where you can keep a list of seen projects. If you've seen it before you don't need to insert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, that makes sense. we may have to cache it outside the handler though

async def _insert_span(session: AsyncSession, span: Span, project_name: str) -> None:
if not (
project_rowid := await session.scalar(
text("SELECT rowid FROM projects WHERE name = :name;"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe to sql injection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. colon is one of the three ways for parameter binding like ?, but sqlalchemy only allows the colon syntax.

src/phoenix/server/api/routers/trace_handler.py Outdated Show resolved Hide resolved
Comment on lines 117 to 130
await session.execute(
text(
"""
SELECT
sum(cumulative_error_count),
sum(cumulative_llm_token_count_prompt),
sum(cumulative_llm_token_count_completion)
FROM spans
WHERE parent_span_id = :parent_span_id
"""
), # noqa E501
{"parent_span_id": span.context.span_id},
)
).first():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also feels a tad in-efficient - where a read needs to be performed before you insert. If anything you should be able to execute all precursor work on a single async gather call so that the db calls are parallelized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably fine assuming that the child spans are still in the buffer pool. On the other hand, parallelization doesn't really help sqlite because it locks the whole database

src/phoenix/server/api/routers/trace_handler.py Outdated Show resolved Hide resolved
src/phoenix/server/api/types/Trace.py Show resolved Hide resolved
src/phoenix/server/main.py Outdated Show resolved Hide resolved
tests/server/api/types/conftest.py Outdated Show resolved Hide resolved
@RogerHYang RogerHYang merged commit 30b2ab0 into sql Apr 5, 2024
13 checks passed
@RogerHYang RogerHYang deleted the sqlite-async-session-for-gql branch April 5, 2024 23:26
@RogerHYang RogerHYang linked an issue Apr 8, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[persistence] sqlAlchemy session mounting in graphql
2 participants