-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement __repr__ on user-facing classes #399
Conversation
@@ -46,6 +46,7 @@ dev = [ | |||
"pytest-lazy-fixture", | |||
"strawberry-graphql[debug-server]==0.155.3", | |||
"pre-commit", | |||
"mypy==0.991", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newer version of MyPy causing the MyPy daemon on VSCode to crash.
def __getitem__(self, key: str) -> Dataset: | ||
try: | ||
return cast(Dataset, getattr(self, key)) | ||
except AttributeError: | ||
raise KeyError(f"Invalid key: {key}") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sub-classing off of Dict
does not make this a dictionary. Adding this dunder so users can actually use indexing as you would expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@fjcasti1 I like your idea for displaying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Little unsure about the Viewable
class. Maybe we can sync offline about that
src/phoenix/datasets/dataset.py
Outdated
def __repr__(self) -> str: | ||
""" | ||
Return a string to display the dataset's name, dataframe, and schema. | ||
""" | ||
repr_string = ( | ||
"""Phoenix Dataset | ||
===============\n\n""" | ||
+ f"name: '{self.name}'\n\n" | ||
+ f"""dataframe: | ||
columns: {list(self.dataframe.columns)} | ||
shape: {self.dataframe.shape}\n\n""" | ||
+ f"schema: {self.schema}" | ||
) | ||
return repr_string | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the dunder methods need docstrings in general since their intentions are self-explanatory. I also think it's common practice to place these methods at the top. Could be wrong here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this StackOverflow post, there's no official recommendation on this topic. Let's have a conversation about what convention we want to use.
def __getitem__(self, key: str) -> Dataset: | ||
try: | ||
return cast(Dataset, getattr(self, key)) | ||
except AttributeError: | ||
raise KeyError(f"Invalid key: {key}") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
src/phoenix/datasets/fixtures.py
Outdated
def _format_dataset(dataset: Dataset) -> str: | ||
return f"""Dataset( | ||
dataframe=..., | ||
schema=..., | ||
name='{dataset.name}', | ||
)""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in love with the ...
but I think for now we can pass that until we get more feedback. Any other solution that comes to mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure about this as well. I haven't thought of anything better, but open to suggestions.
I'm not sure what that would look like for
For |
I don't have any thing better, so it's not blocking...just pointing out that this approach is unorthodox.
Output from pandas: >>> repr(pd.DataFrame({"x":[1,2,3]}))
' x\n0 1\n1 2\n2 3'
>>> print(pd.DataFrame({"x":[1,2,3]}))
x
0 1
1 2
2 3 |
|
overly complicated, closing in favor of #425 |
Implements
__repr__
on user-facing classes so users can inspect and understand the classes they are interacting with in their notebooks.The
__repr__
forSchema
andEmbeddingColumnNames
is implemented so that a user can copy and paste in order to instantiate an identical dataclass. I made an effort to handle invalid values nicely so the user can still inspect their dataclasses to see where they messed up.Example Output for
Schema
Example Output for
EmbeddingColumnNames
Example Output for
DatasetDict
Example Output for
Dataset