feat(persistence): span query DSL with SQL #2911

Merged 13 commits on Apr 19, 2024

RogerHYang
Contributor

@RogerHYang RogerHYang commented Apr 17, 2024

resolves #2081
resolves #2780
resolves #2782
resolves #2358

Notes

  1. The UI changes are a result of resolving #2081 (consistent partitioning and nesting by the . (dot) separator for span attributes), which gives us consistent JSON paths for attributes.
  2. Unfortunately, item 1 has made hiding the embedding vector quite difficult. Punting for now.
  3. latency_ms is now a calculated field in the DB. Unfortunately, SQLite doesn’t give the same values as Python or Postgres (not sure why), so SQLite values are slightly off and filtered results can differ slightly.
  4. Unfortunately, substring search (e.g. 'service' in output.value) in SQLite uses the LIKE operator, which is case-insensitive, so it gives different filtered results than Python or Postgres. It’s not easy to fix this. Punting for now. (Special characters such as % are also not escaped. Also punting.)
  5. Postgres is strongly typed, and JSON values are interpreted as strings by default. For custom attributes, users can now use str(), float(), int() to change the type. I have tried to infer the type as much as possible, but this is not always possible.
  6. I’ve added a notebook under integration-tests for testing. The results match those from the main branch. One caveat is that the main branch endpoint needs to be patched from GET to POST for this to work. I’ll need to get integration tests working on CI next.
  7. Also, note that JSONB doesn’t guarantee dict key ordering, so the ordering will appear different from that in SQLite, which stores the attributes as TEXT.
  8. The use_active_session_if_available option on px.Client is more difficult to support now that the DB is async, so I have nixed it. Now both thread and process sessions make network calls using px.Client.
  9. Also, I had to rename some of the fields in the database to make backward compatibility easier, e.g. parent_span_id is renamed to parent_id.
  10. sqlalchemy is bounded below by 2.0.4 in order to use inplace.expression for hybrid_property.
  11. openinference.span.kind is now included in the attributes. This shows the original string value in the event that it gets overwritten to UNKNOWN because it doesn't fit the enum. openinference.span.kind is also output in the exported spans dataframe.
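
To illustrate note 4, here is a minimal sketch using only Python's built-in sqlite3 module. It is not the code in this PR: the helper names (like_contains, instr_contains, escaped_like_contains) are made up for the example. It shows that LIKE ignores ASCII case by default in SQLite, that an unescaped % in the search term acts as a wildcard, and that instr() and the ESCAPE clause are two possible (untested against Phoenix) ways to narrow the gap.

```python
import sqlite3

# In-memory database; no tables are needed for these scalar expressions.
conn = sqlite3.connect(":memory:")


def like_contains(needle: str, haystack: str) -> bool:
    """Substring test via LIKE, with no escaping of % or _ (hypothetical helper)."""
    row = conn.execute(
        "SELECT ? LIKE ('%' || ? || '%')", (haystack, needle)
    ).fetchone()
    return bool(row[0])


def instr_contains(needle: str, haystack: str) -> bool:
    """Substring test via instr(), which compares case-sensitively."""
    row = conn.execute("SELECT instr(?, ?) > 0", (haystack, needle)).fetchone()
    return bool(row[0])


def escaped_like_contains(needle: str, haystack: str) -> bool:
    """LIKE with %, _ and the escape character itself escaped in the search term."""
    escaped = (
        needle.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    row = conn.execute(
        "SELECT ? LIKE ('%' || ? || '%') ESCAPE '\\'", (haystack, escaped)
    ).fetchone()
    return bool(row[0])


# LIKE matches regardless of ASCII case, unlike Python's `in` operator.
print(like_contains("service", "My Service"), "service" in "My Service")
# instr() agrees with Python here.
print(instr_contains("service", "My Service"))
# An unescaped % behaves as a wildcard, so "100%" matches "1000 items".
print(like_contains("100%", "1000 items"))
# With escaping, only a literal "100%" matches.
print(escaped_like_contains("100%", "1000 items"))
```

Note that the escaped variant still ignores case, so escaping alone only addresses the special-character half of the problem.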

@dosubot dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Apr 17, 2024
@RogerHYang RogerHYang linked an issue Apr 17, 2024 that may be closed by this pull request
@Arize-ai Arize-ai deleted a comment from review-notebook-app bot Apr 17, 2024
Resolved review threads:
pyproject.toml (outdated)
pyproject.toml (outdated)
src/phoenix/utilities/attributes.py (outdated)
tests/utilities/test_attributes.py (outdated)
tests/trace/dsl/test_query.py (outdated)
src/phoenix/trace/dsl/query.py
Contributor

@axiomofjoy axiomofjoy left a comment

Really nice job, Roger. Stepping through the test cases helped me understand how the filters, query, and translator work.

@RogerHYang RogerHYang merged commit 7c01420 into sql Apr 19, 2024
9 checks passed
@RogerHYang RogerHYang deleted the span-query-dsl-with-sql branch April 19, 2024 16:00