
feat: plumb Dataset start/end times through graphQL #194

Merged (16 commits) Jan 27, 2023

Conversation

nate-mar (Contributor):
Resolves #172

@nate-mar nate-mar changed the title [feat] plumb Dataset start/end times through grpahQL [feat] plumb Dataset start/end times through graphQL Jan 26, 2023
src/phoenix/config.py (outdated, resolved)

- from phoenix.datasets import Dataset
- from phoenix.datasets.schema import EmbeddingFeatures
+ from phoenix.core.datasets import Dataset
+ from phoenix.core.datasets.schema import EmbeddingFeatures
Contributor:
I think I sort of wanted the code that the user interacts with to be unnested and not part of core, core being part of the application. I think voxel has a similar organization. It makes for easier discovery in the notebook, but we would dissuade people from importing core in the notebook.

nate-mar (Author):
Ah ok, gotcha -- so datasets is the API interface, so to speak, and users should never need to interact with anything in core.

Contributor:

Yeah, that was the idea: the user interacts with datasets and metrics, and phoenix serves core via an API.

@nate-mar nate-mar changed the title [feat] plumb Dataset start/end times through graphQL feat: plumb Dataset start/end times through graphQL Jan 26, 2023
Comment on lines 8 to 9
startTime: DateTime
endTime: DateTime
Contributor:
Suggested change:
- startTime: DateTime
- endTime: DateTime
+ startTime: DateTime!
+ endTime: DateTime!

Let's make these non-nullable.

Comment on lines 20 to 22
readonly endTime: any | null;
readonly name: string;
readonly startTime: any | null;
Contributor:

Update relay.config and these will become strings.

nate-mar (Author):

err, what do I update relay.config with? do I need to add something?

nate-mar (Author):

oh nm, got it.

@@ -7,5 +7,6 @@ module.exports = {
noFutureProofEnums: true,
customScalars: {
GlobalID: "String",
"DateTime": "string",
Contributor:

It's mapping to another base primitive, so it should be uppercase String.
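Applying that suggestion, the relevant fragment of relay.config.js would look like the following (a sketch of the final state, assuming no other custom scalars are configured):

```javascript
// relay.config.js (fragment) -- hypothetical final state after the review:
// map the custom DateTime scalar to the String primitive, matching the
// casing of the existing GlobalID mapping.
module.exports = {
  // ...
  noFutureProofEnums: true,
  customScalars: {
    GlobalID: "String",
    DateTime: "String",
  },
};
```

With this mapping, the generated TypeScript types for startTime/endTime become string instead of any.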

@property
def start_time(self) -> datetime:
"""Returns the datetime of the earliest inference in the dataset"""
ts_col_name: str = cast(str, self.schema.timestamp_column_name)
dt: datetime = self.__dataframe[ts_col_name].min()
Contributor:

Since this never changes you can use dynamic programming to compute once

nate-mar (Author):

I'm assuming you're referring to the caching aspect of dp? how about just using the @cached_property annotation? or do we have a preexisting convention for how we'd like to do this?

Contributor:

Oh that's great. Very cool
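The approach agreed on above can be sketched like this (a minimal sketch with a hypothetical constructor; the real implementation reads the timestamp column of the dataset's underlying dataframe):

```python
from datetime import datetime
from functools import cached_property


class Dataset:
    """Sketch of caching the dataset bookends with functools.cached_property.

    A plain list of timestamps stands in for the dataframe's timestamp
    column; each property is computed on first access, then cached on the
    instance so repeated lookups are free.
    """

    def __init__(self, timestamps: list[datetime]) -> None:
        self._timestamps = timestamps

    @cached_property
    def start_time(self) -> datetime:
        """Datetime of the earliest inference; computed once, then cached."""
        return min(self._timestamps)

    @cached_property
    def end_time(self) -> datetime:
        """Datetime of the latest inference; computed once, then cached."""
        return max(self._timestamps)
```

Unlike a bare @property, @cached_property stores the result in the instance's __dict__ after the first access, which is exactly the "compute once" behavior requested here since the dataframe never changes.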

@property
def end_time(self) -> datetime:
"""Returns the datetime of the latest inference in the dataset"""
ts_col_name: str = cast(str, self.schema.timestamp_column_name)
Contributor:

Prefer dynamic programming to compute once

"""Returns the datetime of the latest inference in the dataset"""
ts_col_name: str = cast(str, self.schema.timestamp_column_name)
dt: datetime = self.__dataframe[ts_col_name].max()
return dt
Contributor:

Been advocating for non-abbreviated variable names: https://youtu.be/-J3wNP6u5YU

nate-mar (Author):

👍 -- done

@nate-mar nate-mar marked this pull request as ready for review January 26, 2023 18:56
@nate-mar nate-mar merged commit dc6c88d into main Jan 27, 2023
@nate-mar nate-mar deleted the plumb-dataset-start-end branch January 27, 2023 00:16
Linked issue: Dataset bookends plumbed through to GQL