feat: evaluation ingestion (no user-facing feature is added) #1764
Merged
19 commits
d53b5b6 wip (RogerHYang)
55a4eb3 recompile proto (RogerHYang)
2baa268 Merge branch 'main' into evaluation-ingestion (RogerHYang)
313f64e Merge branch 'main' into evaluation-ingestion (RogerHYang)
d8ba773 simulate streaming (RogerHYang)
ea44b31 remove unused functions (RogerHYang)
2909a3a Merge branch 'main' into evaluation-ingestion (RogerHYang)
059e3ad fix typo (RogerHYang)
0514c6f clean up gql (RogerHYang)
196963e fix receiver (RogerHYang)
ae753d9 add back dropped param (RogerHYang)
fc48438 clean up notebook (RogerHYang)
7408f38 improve gql descriptions (RogerHYang)
1ab6acc clean up gql (RogerHYang)
4526ecc Merge branch 'main' into evaluation-ingestion (RogerHYang)
0b7fded fix typo (RogerHYang)
15f71f5 fix typo (RogerHYang)
ada921e handle missing values as result of interrupts (RogerHYang)
7357c6b fix format (RogerHYang)
@@ -0,0 +1,81 @@
import weakref
from collections import defaultdict
from queue import SimpleQueue
from threading import RLock, Thread
from types import MethodType
from typing import DefaultDict, Dict, List, Optional

from typing_extensions import TypeAlias

import phoenix.trace.v1 as pb
from phoenix.trace.schemas import SpanID

END_OF_QUEUE = None  # sentinel value for queue termination

EvaluationName: TypeAlias = str
DocumentPosition: TypeAlias = int


class Evals:
    def __init__(self) -> None:
        self._queue: "SimpleQueue[Optional[pb.Evaluation]]" = SimpleQueue()
        weakref.finalize(self, self._queue.put, END_OF_QUEUE)
        self._lock = RLock()
        self._start_consumer()
        self._span_evaluations_by_name: DefaultDict[
            EvaluationName, Dict[SpanID, pb.Evaluation]
        ] = defaultdict(dict)
        self._evaluations_by_span_id: DefaultDict[
            SpanID, Dict[EvaluationName, pb.Evaluation]
        ] = defaultdict(dict)
        self._document_evaluations_by_span_id: DefaultDict[
            SpanID, DefaultDict[EvaluationName, Dict[DocumentPosition, pb.Evaluation]]
        ] = defaultdict(lambda: defaultdict(dict))

    def put(self, evaluation: pb.Evaluation) -> None:
        self._queue.put(evaluation)

    def _start_consumer(self) -> None:
        Thread(
            target=MethodType(
                self.__class__._consume_evaluations,
                weakref.proxy(self),
            ),
            daemon=True,
        ).start()

    def _consume_evaluations(self) -> None:
        while (item := self._queue.get()) is not END_OF_QUEUE:
            with self._lock:
                self._process_evaluation(item)

    def _process_evaluation(self, evaluation: pb.Evaluation) -> None:
        subject_id = evaluation.subject_id
        name = evaluation.name
        subject_id_kind = subject_id.WhichOneof("kind")
        if subject_id_kind == "document_retrieval_id":
            document_retrieval_id = subject_id.document_retrieval_id
            span_id = SpanID(document_retrieval_id.span_id)
            document_position = document_retrieval_id.document_position
            self._document_evaluations_by_span_id[span_id][name][document_position] = evaluation
        elif subject_id_kind == "span_id":
            span_id = SpanID(subject_id.span_id)
            self._evaluations_by_span_id[span_id][name] = evaluation
            self._span_evaluations_by_name[name][span_id] = evaluation
        else:
            raise ValueError(f"unrecognized subject_id type: {type(subject_id_kind)}")

    def get_span_evaluation_names(self) -> List[EvaluationName]:
        with self._lock:
            return list(self._span_evaluations_by_name.keys())

    def get_evaluations_by_span_id(self, span_id: SpanID) -> List[pb.Evaluation]:
        with self._lock:
            return list(self._evaluations_by_span_id[span_id].values())

    def get_document_evaluations_by_span_id(self, span_id: SpanID) -> List[pb.Evaluation]:
        all_evaluations: List[pb.Evaluation] = []
        with self._lock:
            for evaluations in self._document_evaluations_by_span_id[span_id].values():
                all_evaluations.extend(evaluations.values())
        return all_evaluations
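
The `Evals` class above combines three pieces: a `SimpleQueue` that makes `put` non-blocking, a daemon consumer thread bound to a `weakref.proxy` of the instance, and a `weakref.finalize` callback that enqueues the sentinel when the instance is collected. The following is a minimal, standard-library-only sketch of that pattern. `ScoreStore` and `wait_until` are hypothetical names invented for illustration and are not part of this PR; the stored values are plain `(name, score)` tuples rather than `pb.Evaluation` messages.

```python
import time
import weakref
from queue import SimpleQueue
from threading import RLock, Thread
from types import MethodType
from typing import Callable, Dict, Optional, Tuple

END_OF_QUEUE = None  # sentinel, mirroring the diff above


class ScoreStore:
    """Hypothetical stand-in for Evals that stores (name, score) pairs."""

    def __init__(self) -> None:
        self._queue: "SimpleQueue[Optional[Tuple[str, float]]]" = SimpleQueue()
        self._lock = RLock()
        self._scores: Dict[str, float] = {}
        # When this object is garbage collected, enqueue the sentinel so the
        # consumer loop can exit instead of blocking on an orphaned queue.
        weakref.finalize(self, self._queue.put, END_OF_QUEUE)
        # Bind the consumer to a weak proxy so the thread itself does not
        # hold a strong reference that would keep the object alive.
        Thread(
            target=MethodType(ScoreStore._consume, weakref.proxy(self)),
            daemon=True,
        ).start()

    def put(self, name: str, score: float) -> None:
        self._queue.put((name, score))  # returns without waiting for the consumer

    def _consume(self) -> None:
        while (item := self._queue.get()) is not END_OF_QUEUE:
            name, score = item
            with self._lock:
                self._scores[name] = score

    def get(self, name: str) -> Optional[float]:
        with self._lock:
            return self._scores.get(name)


def wait_until(predicate: Callable[[], bool], timeout: float = 2.0) -> bool:
    """Poll until predicate() is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(0.01)
    return predicate()


store = ScoreStore()
store.put("relevance", 0.9)
# Writes are applied by the background thread, so a read issued right after
# put() may not see the value yet; poll instead of reading immediately.
print(wait_until(lambda: store.get("relevance") == 0.9))
```

One consequence of this design is that reads are only eventually consistent with writes: a caller that needs to observe a value it has just submitted has to wait for the consumer to drain the queue.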
@@ -0,0 +1,26 @@
syntax = "proto3";
package phoenix.proto.evaluation.v1;

import "google/protobuf/wrappers.proto";

message Evaluation {
  string name = 1;
  message SubjectId {
    message DocumentRetrievalId {
      string span_id = 1;
      int32 document_position = 2; // zero-based index
    }
    oneof kind {
      string trace_id = 1;
      string span_id = 2;
      DocumentRetrievalId document_retrieval_id = 3;
    }
  }
  SubjectId subject_id = 2;
  message Result {
    google.protobuf.DoubleValue score = 1;
    google.protobuf.StringValue label = 2;
    google.protobuf.StringValue explanation = 3;
  }
  Result result = 3;
}
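
The `oneof kind` in `SubjectId` means an evaluation targets exactly one of a trace, a span, or a retrieved document within a span, and readers dispatch on which member is set. The sketch below models that shape with plain dataclasses so it runs without the generated protobuf code. All class and function names here are hypothetical stand-ins, and one simplification is an assumption of the sketch: it rejects a subject with more than one member set, whereas a real protobuf `oneof` clears the previous member when another is assigned.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

KINDS = ("trace_id", "span_id", "document_retrieval_id")


@dataclass(frozen=True)
class DocumentRetrievalId:
    span_id: str
    document_position: int  # zero-based index, as in the proto comment


@dataclass(frozen=True)
class SubjectId:
    """Hypothetical stand-in for the generated SubjectId message."""

    trace_id: Optional[str] = None
    span_id: Optional[str] = None
    document_retrieval_id: Optional[DocumentRetrievalId] = None

    def __post_init__(self) -> None:
        # Simplification: fail fast instead of clearing the earlier member.
        if sum(getattr(self, kind) is not None for kind in KINDS) > 1:
            raise ValueError("at most one member of `kind` may be set")

    def which_kind(self) -> Optional[str]:
        """Return the name of the member that is set, or None if none is."""
        for kind in KINDS:
            if getattr(self, kind) is not None:
                return kind
        return None


def storage_key(subject: SubjectId) -> Tuple[str, ...]:
    """Build an illustrative lookup key from whichever member is set."""
    kind = subject.which_kind()
    if kind == "document_retrieval_id":
        doc = subject.document_retrieval_id
        return ("document", doc.span_id, str(doc.document_position))
    if kind == "span_id":
        return ("span", subject.span_id)
    if kind == "trace_id":
        return ("trace", subject.trace_id)
    raise ValueError(f"unrecognized subject kind: {kind!r}")


print(storage_key(SubjectId(span_id="abc")))
print(storage_key(SubjectId(document_retrieval_id=DocumentRetrievalId("abc", 0))))
```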
@@ -0,0 +1,71 @@
from typing import Optional

import strawberry

import phoenix.trace.v1 as pb
from phoenix.trace.schemas import SpanID


@strawberry.interface
class Evaluation:
    name: str = strawberry.field(
        description="Name of the evaluation, e.g. 'helpfulness' or 'relevance'."
    )
    score: Optional[float] = strawberry.field(
        description="Result of the evaluation in the form of a numeric score."
    )
    label: Optional[str] = strawberry.field(
        description="Result of the evaluation in the form of a string, e.g. "
        "'helpful' or 'not helpful'. Note that the label is not necessarily binary."
    )
    explanation: Optional[str] = strawberry.field(
        description="The evaluator's explanation for the evaluation result (i.e. "
        "score or label, or both) given to the subject."
    )


@strawberry.type
class SpanEvaluation(Evaluation):
    span_id: strawberry.Private[SpanID]

    @staticmethod
    def from_pb_evaluation(evaluation: pb.Evaluation) -> "SpanEvaluation":
        result = evaluation.result
        score = result.score.value if result.HasField("score") else None
        label = result.label.value if result.HasField("label") else None
        explanation = result.explanation.value if result.HasField("explanation") else None
        span_id = SpanID(evaluation.subject_id.span_id)
        return SpanEvaluation(
            name=evaluation.name,
            score=score,
            label=label,
            explanation=explanation,
            span_id=span_id,
        )


@strawberry.type
class DocumentEvaluation(Evaluation):
    span_id: strawberry.Private[SpanID]
    document_position: int = strawberry.field(
        description="The zero-based index among retrieved documents, which "
        "is collected as a list (even when ordering is not inherently meaningful)."
    )

    @staticmethod
    def from_pb_evaluation(evaluation: pb.Evaluation) -> "DocumentEvaluation":
        result = evaluation.result
        score = result.score.value if result.HasField("score") else None
        label = result.label.value if result.HasField("label") else None
        explanation = result.explanation.value if result.HasField("explanation") else None
        document_retrieval_id = evaluation.subject_id.document_retrieval_id
        document_position = document_retrieval_id.document_position
        span_id = SpanID(document_retrieval_id.span_id)
        return DocumentEvaluation(
            name=evaluation.name,
            score=score,
            label=label,
            explanation=explanation,
            document_position=document_position,
            span_id=span_id,
        )
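
Both `from_pb_evaluation` methods repeat the same three `HasField` checks to turn the wrapper-typed `score`, `label`, and `explanation` into `Optional` values, which is what lets a score of `0.0` be told apart from a missing score. The sketch below shows how that unwrapping could be factored into one helper. It is an illustration only: `Wrapper`, `FakeResult`, `unwrap`, and `unpack_result` are hypothetical names, and the two classes merely imitate the `.value` attribute and `HasField` method of the generated protobuf types so the example runs without them.

```python
from dataclasses import dataclass
from typing import Any, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class Wrapper(Generic[T]):
    """Stand-in for google.protobuf.DoubleValue / StringValue."""

    value: T


@dataclass
class FakeResult:
    """Stand-in for Evaluation.Result; None means the field is unset."""

    score: Optional[Wrapper[float]] = None
    label: Optional[Wrapper[str]] = None
    explanation: Optional[Wrapper[str]] = None

    def HasField(self, name: str) -> bool:
        return getattr(self, name) is not None


def unwrap(result: Any, name: str) -> Optional[Any]:
    """Return the wrapped value if the field is present, else None."""
    return getattr(result, name).value if result.HasField(name) else None


def unpack_result(result: Any) -> Tuple[Optional[float], Optional[str], Optional[str]]:
    """Unwrap score, label, and explanation in one call."""
    return (
        unwrap(result, "score"),
        unwrap(result, "label"),
        unwrap(result, "explanation"),
    )


# A present score of 0.0 survives as 0.0; absent fields come back as None.
print(unpack_result(FakeResult(score=Wrapper(0.0))))
print(unpack_result(FakeResult(label=Wrapper("relevant"))))
```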
docs: Adding descriptions for the fields would be useful
will do