Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: convert DatasetDict to dict and add __repr__ for Dataset #425

Merged
merged 9 commits into from
Mar 23, 2023
4 changes: 2 additions & 2 deletions .github/workflows/python-CI.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,10 +25,10 @@ jobs:
steps:
- name: Checkout Repository
uses: actions/checkout@v3
- name: Set up python 3.10
- name: Set up python 3.8
uses: actions/setup-python@v4
with:
python-version: "3.10"
python-version: "3.8"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run type checks and linting on same version of Python we're testing on.

Copy link
Contributor

@RogerHYang RogerHYang Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's probably ok to have the latest lint unless there are any specific problems. On the other hand, if we leave the versions different, people will always want to change it. haha

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to fix a typing issue I was having, but downgrading the version did not work. Changed it back.

- name: Install dependencies
run: |
python -m pip install --upgrade pip==${{ env.pip-version }}
Expand Down
3 changes: 3 additions & 0 deletions src/phoenix/datasets/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,9 @@ def __init__(
self.to_disc()
logger.info(f"""Dataset: {self.__name} initialized""")

def __repr__(self) -> str:
return f'<Dataset "{self.name}">'

@cached_property
def start_time(self) -> datetime:
"""Returns the datetime of the earliest inference in the dataset"""
Expand Down
9 changes: 5 additions & 4 deletions src/phoenix/datasets/fixtures.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
import logging
import os
from dataclasses import dataclass, replace
from typing import Dict, Tuple
from typing import Tuple

from pandas import read_parquet

Expand Down Expand Up @@ -325,9 +325,10 @@ def _download_and_persist_dataset_if_missing(
return dataset


@dataclass(frozen=True)
class DatasetDict(Dict[str, Dataset]):
"""A dictionary of datasets, split out by dataset type (primary, reference)."""
class DatasetDict(dict):
"""
A dictionary of datasets, split out by dataset type (primary, reference).
"""

primary: Dataset
reference: Dataset
Expand Down
2 changes: 1 addition & 1 deletion src/phoenix/metrics/mixins.py
Original file line number Diff line number Diff line change
Expand Up @@ -186,7 +186,7 @@ def calc(self, dataframe: pd.DataFrame) -> float:
how="outer",
).fillna(0)
# remove rows with all zeros
merged_counts = merged_counts.loc[(merged_counts > 0).any(axis=1)] # type: ignore
merged_counts = merged_counts.loc[(merged_counts > 0).any(axis=1)]
primary_histogram = merged_counts.primary_histogram
reference_histogram = merged_counts.reference_histogram
return self.divergence(
Expand Down