feat: implement __repr__ on user-facing classes #399

axiomofjoy · 2023-03-19T00:05:29Z

Implements __repr__ on user-facing classes so users can inspect and understand the classes they are interacting with in their notebooks.

The __repr__ for Schema and EmbeddingColumnNames is implemented so that a user can copy and paste the output to instantiate an identical dataclass. I made an effort to handle invalid values gracefully so the user can still inspect their dataclasses and see where something went wrong.

Example Output for `Schema`

Schema(
    prediction_id_column_name='prediction_id',
    timestamp_column_name='timestamp',
    feature_column_names=[
        'feature_1',
        'feature_2',
    ],
    embedding_feature_column_names={
        'embedding_feature': EmbeddingColumnNames(
            vector_column_name='embedding_vector',
            raw_data_column_name='raw_data',
        ),
    },
)

Schema(
    feature_column_names=   A  B
                         0  1  7
                         1  5  2
                         2  3  8,
)

Example Output for `EmbeddingColumnNames`

EmbeddingColumnNames(
    vector_column_name='embedding_vector',
)

EmbeddingColumnNames(
    vector_column_name='embedding_vector',
    raw_data_column_name='raw_data',
)
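A minimal sketch of how a copy-pasteable, multi-line `__repr__` like the one above could be written for a dataclass. `ColumnNamesSketch` is a hypothetical stand-in, not the code from this PR; it assumes that unset optional fields are `None` and should be omitted from the output.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True, repr=False)
class ColumnNamesSketch:
    """Hypothetical stand-in for EmbeddingColumnNames; not the PR's code."""

    vector_column_name: str
    raw_data_column_name: Optional[str] = None

    def __repr__(self) -> str:
        # One indented `name=value,` line per field that is set; None is skipped.
        lines = [
            f"    {field.name}={getattr(self, field.name)!r},"
            for field in fields(self)
            if getattr(self, field.name) is not None
        ]
        return "\n".join([f"{type(self).__name__}(", *lines, ")"])


names = ColumnNamesSketch(vector_column_name="embedding_vector")
print(repr(names))
```

Because every value is formatted with `!r`, strings keep their quotes and the printed text stays a valid constructor call.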

Example Output for `DatasetDict`

DatasetDict({
    'primary': Dataset(
        dataframe=...,
        schema=...,
        name='primary',
    ),
    'reference': Dataset(
        dataframe=...,
        schema=...,
        name='reference',
    ),
})

Example Output for `Dataset`

Phoenix Dataset
===============

name: 'example'

dataframe:
    columns: ['A', 'B', 'C', 'D', 'E', 'timestamp', 'prediction_id']
    shape: (10, 7)

schema: Schema(
    prediction_id_column_name='prediction_id',
    timestamp_column_name='timestamp',
    feature_column_names=[
        'A',
        'B',
        'C',
        'D',
        'E',
    ],
)

axiomofjoy · 2023-03-19T00:06:27Z

pyproject.toml

@@ -46,6 +46,7 @@ dev = [
  "pytest-lazy-fixture",
  "strawberry-graphql[debug-server]==0.155.3",
  "pre-commit",
+  "mypy==0.991",


Newer versions of mypy cause the mypy daemon in VS Code to crash, so this pins `mypy==0.991`.

axiomofjoy · 2023-03-19T00:07:06Z

src/phoenix/datasets/fixtures.py

+    def __getitem__(self, key: str) -> Dataset:
+        try:
+            return cast(Dataset, getattr(self, key))
+        except AttributeError:
+            raise KeyError(f"Invalid key: {key}")
+


Subclassing Dict does not by itself make this behave like a dictionary, because the values are stored as attributes. Adding this dunder so users can index into it as they would expect.
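To illustrate the point, here is a small sketch with hypothetical classes (`AttributeOnly` and `Indexable` are not from this PR). It assumes the container keeps its values as attributes and never populates the underlying dict, which is why indexing fails until `__getitem__` is added.

```python
from typing import Dict


class AttributeOnly(Dict[str, int]):
    """Hypothetical container that stores its values as attributes only."""

    def __init__(self, primary: int, reference: int) -> None:
        # dict.__init__ is never called, so the underlying dict stays empty.
        self.primary = primary
        self.reference = reference


class Indexable(AttributeOnly):
    """Same container, with indexing forwarded to attribute lookup."""

    def __getitem__(self, key: str) -> int:
        try:
            return getattr(self, key)
        except AttributeError:
            # `from None` hides the AttributeError from the traceback.
            raise KeyError(f"Invalid key: {key}") from None


plain = AttributeOnly(primary=1, reference=2)
print("primary" in plain)  # membership checks the empty dict, not attributes

fixed = Indexable(primary=1, reference=2)
print(fixed["reference"])
```

One caveat with the `getattr` approach: it also resolves inherited dict methods such as `keys`, so a stricter version might check the key against the known field names first.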

axiomofjoy · 2023-03-19T02:06:33Z

@fjcasti1 I like your idea for displaying Dataset instances long-term: https://arize-ai.slack.com/archives/C04QMRADE1L/p1679023435224509

RogerHYang · 2023-03-20T05:35:25Z

Not blocking, but this doesn't seem like the idiomatic use of __repr__. See discussion here.

From the docs:

[...] For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval() [...]
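A short sketch of the convention the docs describe, using only built-ins and a hypothetical `Opaque` class. It uses `ast.literal_eval`, a restricted form of `eval()` for literals, to show a repr that round-trips, and contrasts it with the default object repr, which does not.

```python
import ast

# Built-in containers produce a repr that evaluates back to an equal value.
value = {"feature_column_names": ["A", "B"], "shape": (10, 7)}
text = repr(value)
round_tripped = ast.literal_eval(text)
print(round_tripped == value)


class Opaque:
    """Hypothetical class that keeps the default object repr."""


# The default repr looks like <... object at 0x...>, which is not valid Python.
try:
    eval(repr(Opaque()))
    opaque_round_trips = True
except SyntaxError:
    opaque_round_trips = False
print(opaque_round_trips)
```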

fjcasti1

A little unsure about the Viewable class. Maybe we can sync offline about that.

fjcasti1 · 2023-03-20T05:26:04Z

src/phoenix/datasets/dataset.py

+    def __repr__(self) -> str:
+        """
+        Return a string to display the dataset's name, dataframe, and schema.
+        """
+        repr_string = (
+            """Phoenix Dataset
+===============\n\n"""
+            + f"name: '{self.name}'\n\n"
+            + f"""dataframe:
+    columns: {list(self.dataframe.columns)}
+    shape: {self.dataframe.shape}\n\n"""
+            + f"schema: {self.schema}"
+        )
+        return repr_string
+


I don't think the dunder methods need docstrings in general since their intentions are self-explanatory. I also think it's common practice to place these methods at the top of the class. Could be wrong here.

According to this StackOverflow post, there's no official recommendation on this topic. Let's have a conversation about what convention we want to use.

fjcasti1 · 2023-03-20T05:27:28Z

src/phoenix/datasets/fixtures.py

+    def __getitem__(self, key: str) -> Dataset:
+        try:
+            return cast(Dataset, getattr(self, key))
+        except AttributeError:
+            raise KeyError(f"Invalid key: {key}")
+


fjcasti1 · 2023-03-20T05:30:26Z

src/phoenix/datasets/fixtures.py

+    def _format_dataset(dataset: Dataset) -> str:
+        return f"""Dataset(
+        dataframe=...,
+        schema=...,
+        name='{dataset.name}',
+    )"""


Not in love with the ... but I think for now we can pass that until we get more feedback. Any other solution that comes to mind?

I was unsure about this as well. I haven't thought of anything better, but open to suggestions.

axiomofjoy · 2023-03-20T05:58:17Z

Not blocking, but this doesn't seem like the idiomatic use of __repr__. See discussion here.

From the docs:

[...] For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval() [...]

I'm not sure what that would look like for Dataset and DatasetDict. Do you have something in mind? If you call repr on a Pandas DataFrame, for example, it looks something like this:

For Schema and EmbeddingColumnNames, you are able to copy/paste the __repr__ output to instantiate an identical dataclass.

RogerHYang · 2023-03-20T07:15:53Z

Do you have something in mind? If you call repr on a Pandas DataFrame, for example, it looks something like this:

I don't have anything better, so it's not blocking... just pointing out that this approach is unorthodox.

rule of thumb: __repr__ is for developers, __str__ is for customers.

Output from pandas:

>>> repr(pd.DataFrame({"x":[1,2,3]}))
'   x\n0  1\n1  2\n2  3'
>>> print(pd.DataFrame({"x":[1,2,3]}))
   x
0  1
1  2
2  3

axiomofjoy · 2023-03-20T18:25:49Z

Do you have something in mind? If you call repr on a Pandas DataFrame, for example, it looks something like this:

I don't have anything better, so it's not blocking... just pointing out that this approach is unorthodox.

rule of thumb: __repr__ is for developers, __str__ is for customers.

Output from pandas:
>>> repr(pd.DataFrame({"x":[1,2,3]}))
'   x\n0  1\n1  2\n2  3'
>>> print(pd.DataFrame({"x":[1,2,3]}))
   x
0  1
1  2
2  3

>>> pd.DataFrame({"x":[1,2,3]})
   x
0  1
1  2
2  3

__repr__ is also called here, when the interactive prompt echoes the value of an expression.
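A small sketch of which method Python calls where, using a hypothetical `Temperature` class that defines both dunders. It only relies on built-in behavior: `repr()` is what the interactive prompt echoes, `print()` uses `str()`, and containers format their items with `repr()`.

```python
class Temperature:
    """Hypothetical example class with distinct __repr__ and __str__."""

    def __init__(self, celsius: float) -> None:
        self.celsius = celsius

    def __repr__(self) -> str:
        # Developer-facing: looks like the constructor call.
        return f"Temperature(celsius={self.celsius!r})"

    def __str__(self) -> str:
        # User-facing: a friendlier rendering.
        return f"{self.celsius} degrees C"


reading = Temperature(21.5)
as_repr = repr(reading)   # what the interactive prompt echoes
as_str = str(reading)     # what print(reading) writes
in_list = str([reading])  # containers format their items with repr()
print(as_repr, as_str, in_list, sep="\n")
```

If a class defines only `__repr__`, `str()` falls back to it, which is consistent with the pandas output quoted above showing the same table for `repr` and `print`.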

axiomofjoy · 2023-03-23T08:04:01Z

overly complicated, closing in favor of #425

axiomofjoy added 2 commits March 18, 2023 14:29

implemented viewable mixin with tests

d649e58

handle non-happy path

fbfea79

axiomofjoy commented Mar 19, 2023

axiomofjoy marked this pull request as ready for review March 19, 2023 00:11

axiomofjoy requested a review from fjcasti1 March 19, 2023 00:12

axiomofjoy added 2 commits March 18, 2023 17:57

repr for DatasetDict

6c8ef1e

make DatasetDict repr reflect Dataset init

d15f613

axiomofjoy changed the title ~~feat: make Schema and EmbeddingColumnNames inspectable~~ feat: make user-facing classes inspectable Mar 19, 2023

axiomofjoy changed the title ~~feat: make user-facing classes inspectable~~ feat: implement __repr__ on user-facing classes Mar 19, 2023

axiomofjoy added 2 commits March 18, 2023 18:17

small tweak to DatasetDict __repr__

205df66

repr for Dataset

1925cda

axiomofjoy added 4 commits March 18, 2023 20:31

change dataframe view in Dataset repr

bdc440e

display DatasetDict fields on separate lines

dc72aeb

fix broken test for DatasetDict repr

d8bea9a

add quotes to keys of DatasetDict

6d0fabd

axiomofjoy removed the request for review from fjcasti1 March 20, 2023 05:12

fjcasti1 reviewed Mar 20, 2023

axiomofjoy closed this Mar 20, 2023

axiomofjoy reopened this Mar 20, 2023

axiomofjoy self-assigned this Mar 22, 2023

axiomofjoy mentioned this pull request Mar 22, 2023

🗺️ Pre-launch cleanup #416

Closed

25 tasks

want to save this for later, but probably going to scrap

6fd9b3a

axiomofjoy closed this Mar 23, 2023
