Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Deprecate datasets module, rename to inferences #2785

Merged
merged 13 commits into from
Apr 8, 2024

Conversation

anticorrelator
Copy link
Contributor

@anticorrelator anticorrelator commented Apr 5, 2024

Resolves #2732

  • Deprecates phoenix.datasets module, interfaces are available under phoenix.inferences
  • px.Dataset class renamed to px.Inference
  • px.ExampleDatasets class renmaed to px.ExampleInferences
  • Datasets.from_open_inference deprecated and removed from Inference

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 5, 2024
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@anticorrelator anticorrelator changed the title chore: Deprecate datasets module, rename to inferences feat: Deprecate datasets module, rename to inferences Apr 5, 2024
@@ -2,8 +2,8 @@

import pytest
from pandas import DataFrame, Timestamp
from phoenix.datasets.dataset import Dataset as InternalDataset
from phoenix.datasets.dataset import Schema
from phoenix.inferences.inference import Inference as InternalDataset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to InternalInference.

from phoenix.datasets.dataset import Dataset
from phoenix.datasets.schema import EmbeddingColumnNames, EmbeddingFeatures
from phoenix.inferences.inference import Inference
from phoenix.inferences.schema import EmbeddingColumnNames, EmbeddingFeatures

from .embedding_dimension import EmbeddingDimension


def _get_embedding_dimensions(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can rename these arguments for clarity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave renames alone I think, this is a big refactor and I'm already having a tough time keeping it all straight

def _is_dataset(obj: Optional[Dataset]) -> TypeGuard[Dataset]:
return type(obj) is Dataset
def _is_dataset(obj: Optional[Inference]) -> TypeGuard[Inference]:
return type(obj) is Inference
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These probably deserve to be renamed.

@@ -114,9 +114,9 @@ def _load_items(
trace_dataset_name: Optional[str] = None
simulate_streaming: Optional[bool] = None

primary_dataset: Dataset = EMPTY_DATASET
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably can be renamed.

_normalize_timestamps,
_parse_dataframe_and_schema,
)
from phoenix.datasets.errors import DatasetError
from phoenix.datasets.schema import (
from phoenix.inferences.schema import (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the test names in this file are still named with dataset names.

Comment on lines -37 to +38
primary_dataset: InternalDataset,
reference_dataset: InternalDataset,
primary_dataset: InternalInference,
reference_dataset: InternalInference,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this file still has a lot of dataset names

NAME_TO_FIXTURE = {fixture.name: fixture for fixture in FIXTURES}


def get_datasets(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be renamed.

no_internet: bool = False,
) -> Tuple[Inference, Optional[Inference], Optional[Inference]]:
"""
Downloads primary and reference datasets for a fixture if they are not found
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

SchemaLike: TypeAlias = Any


class Inference:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be plural because it represents multiple inferences?

Copy link
Contributor

@axiomofjoy axiomofjoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inference vs. Inferences?

Just checking the diff, I'm still seeing a large number of old dataset names.

Copy link
Contributor

@mikeldking mikeldking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be plural. Also can we cascade the notebook changes? We need to wait until the release. You will also have to pin the notebooks lowerbound

Copy link
Contributor

@mikeldking mikeldking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock

@anticorrelator anticorrelator merged commit 4987ea3 into main Apr 8, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT] rename Dataset to Inferences
4 participants