🗺️ Persistence #1689
I think at least for the second need, you can see here: https://docs.arize.com/phoenix/integrations/llamaindex#traces
I am investigating using Parquet as the file format. Here's a snippet to add custom metadata to a Parquet file:

```python
"""
Snippet to write custom metadata to a single Parquet file.

NB: "Pyarrow maps the file-wide metadata to a field in the table's schema
named metadata. Regrettably there is not (yet) documentation on this."
From https://stackoverflow.com/questions/52122674/how-to-write-parquet-metadata-with-pyarrow
"""
import json

import pandas as pd
import pyarrow
from pyarrow import parquet

dataframe = pd.DataFrame(
    {
        "field0": [1, 2, 3],
        "eval0": ["a", "b", "c"],
    }
)

# Schema metadata keys and values are stored as bytes.
OPENINFERENCE_METADATA_KEY = b"openinference"
openinference_metadata = {
    "version": "v0",
    "evaluation_ids": ["eval0", "eval1"],
}

original_table = pyarrow.Table.from_pandas(dataframe)
print("Metadata:")
print("=========")
print(original_table.schema.metadata)
print()

# replace_schema_metadata discards any existing metadata (e.g. the entry
# pandas attaches), so merge the original metadata back in alongside the
# new key.
updated_write_table = original_table.replace_schema_metadata(
    {
        OPENINFERENCE_METADATA_KEY: json.dumps(openinference_metadata).encode("utf-8"),
        **original_table.schema.metadata,
    }
)
parquet.write_table(updated_write_table, "test.parquet")

updated_read_table = parquet.read_table("test.parquet")
print("Metadata:")
print("=========")
print(updated_read_table.schema.metadata)
print()

# Removing the custom key recovers the original metadata exactly.
updated_metadata = updated_read_table.schema.metadata
updated_metadata.pop(OPENINFERENCE_METADATA_KEY)
assert updated_metadata == original_table.schema.metadata
```
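One nice property of this approach (a sketch under the same assumptions as the snippet above, reusing its `test.parquet` file and `openinference` key): because the custom metadata lives in the schema stored in the Parquet footer, it can be inspected with `parquet.read_schema` without loading any row data.

```python
import json

from pyarrow import parquet

# Read only the schema from the Parquet footer -- no row data is loaded.
schema = parquet.read_schema("test.parquet")

# Schema metadata keys and values come back as bytes.
metadata = json.loads(schema.metadata[b"openinference"])
print(metadata)  # {'version': 'v0', 'evaluation_ids': ['eval0', 'eval1']}
```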
Notes on Parquet and PyArrow:
@axiomofjoy what kind of backends will you target? I see some code related to file backends, but why not SQL databases (given the …
@mikeldking Can you provide an update for @stdweird?
@stdweird - good point - I think we see some limitations with SQL backends, so we are currently benchmarking different backends. In general we will probably have a storage interface so you will be able to choose your storage mechanism, but for now we are keeping the backend pretty lean and figuring out the interface as we go.
🥳 |
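To make the "storage interface" idea above concrete, here is a minimal sketch of what a pluggable span store could look like. The `SpanStore` / `ParquetSpanStore` names and method signatures are hypothetical illustrations, not Phoenix's actual API.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List

import pandas as pd
import pyarrow
from pyarrow import parquet


class SpanStore(ABC):
    """Hypothetical storage interface: each backend (file-based, SQL, ...)
    implements the same read/write contract, so the server can stay
    agnostic about the storage mechanism."""

    @abstractmethod
    def save_spans(self, spans: Iterable[dict]) -> None:
        """Persist a batch of spans."""

    @abstractmethod
    def load_spans(self) -> List[dict]:
        """Load all persisted spans."""


class ParquetSpanStore(SpanStore):
    """Illustrative file-backed implementation using Parquet."""

    def __init__(self, path: str) -> None:
        self.path = path

    def save_spans(self, spans: Iterable[dict]) -> None:
        # Flatten the span dicts into a table and write a single file.
        table = pyarrow.Table.from_pandas(pd.DataFrame(list(spans)))
        parquet.write_table(table, self.path)

    def load_spans(self) -> List[dict]:
        # Read the file back and return one dict per span row.
        return parquet.read_table(self.path).to_pylist()


# Usage: the caller picks a backend; the rest of the code sees only SpanStore.
store: SpanStore = ParquetSpanStore("spans.parquet")
store.save_spans([{"span_id": "1", "name": "llm_call", "latency_ms": 12.3}])
print(store.load_spans())
```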
As a user of Phoenix, I would like a persistent backend - notably a way to …
Spikes
Server
- document_retrieval_metrics at the span level #2957
- document_retrieval_metrics for spans #2961
- document_index column to document_position #2913
UI
Metrics / Observability
Infra
Remote Session management
Performance
Notebook-Side Persistence
Docs
Breaking Changes
Testing
Open Questions