Project API #303

Closed
tuscland opened this issue Sep 5, 2024 · 4 comments · Fixed by #325
Labels: chore (internal enhancement work to be done), enhancement (new feature or request)

Comments

@tuscland
Member

tuscland commented Sep 5, 2024

Project Class

Methods

  1. put(key: str, value: Any) -> None
    • Adds an item to the project.
    • Parameters:
      • key: The identifier for the item.
      • value: The value to store. The appropriate Item subclass will be used based on the value type.
  2. put_item(key: str, item: Item) -> None
    • Adds a specific Item instance to the project.
    • Parameters:
      • key: The identifier for the item.
      • item: An instance of an Item subclass.
  3. get(key: str) -> Any
    • Retrieves the value of an item from the project.
    • Parameters:
      • key: The identifier of the item to retrieve.
    • Returns: The stored value.
  4. get_item(key: str) -> Item
    • Retrieves an Item instance from the project.
    • Parameters:
      • key: The identifier of the item to retrieve.
    • Returns: An Item instance.
  5. list_keys() -> List[str]
    • Lists all item keys in the project.
    • Returns: A list of item keys.
  6. delete_item(key: str) -> None
    • Removes an item from the project.
    • Parameters:
      • key: The identifier of the item to remove.

Notes

  1. The API focuses on storing data relevant to report building, including model results, visualizations, and performance metrics.
  2. Input data is limited to known types to ensure compatibility across different environments and programming languages.
  3. The put() method automatically selects the appropriate Item subclass based on the input value type.
  4. The get() method returns the value of the item, while get_item() returns the Item instance itself.
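To make the difference between get() and get_item() concrete, here is a minimal in-memory sketch of the methods listed above. It is not the mandr implementation: the dict-backed storage, the `value` attribute on Item, and the JSON-only put() are assumptions made purely for illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for the abstract Item base class."""

    value: Any


class JSONItem(Item):
    """Hypothetical item wrapping a JSON-serializable value."""


class Project:
    """In-memory sketch of the proposed methods (assumed storage: a dict)."""

    def __init__(self) -> None:
        self._items: dict[str, Item] = {}

    def put(self, key: str, value: Any) -> None:
        # This sketch only handles JSON-serializable values; the proposal
        # maps other value types to other Item subclasses.
        json.dumps(value)  # raises TypeError if the value is not serializable
        self.put_item(key, JSONItem(value))

    def put_item(self, key: str, item: Item) -> None:
        self._items[key] = item

    def get(self, key: str) -> Any:
        # Returns the stored value, not the wrapper.
        return self.get_item(key).value

    def get_item(self, key: str) -> Item:
        return self._items[key]  # raises KeyError for an unknown key

    def list_keys(self) -> list[str]:
        return list(self._items)

    def delete_item(self, key: str) -> None:
        del self._items[key]


project = Project()
project.put("answer", 42)
print(project.get("answer"))       # the stored value
print(project.get_item("answer"))  # the Item wrapping it
```

Routing put() through put_item() keeps a single write path, so any validation added later only has to live in one place.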

Item Class Hierarchy

Base Class: Item

Abstract base class for all item types.

Subclasses

  1. JSONItem
    • Constructor: JSON-serializable Python object
  2. DataFrameItem
    • Constructor: pandas or polars data frame
  3. NumpyArrayItem
    • Constructor: NumPy array
  4. MediaItem
    • Constructors:
      • Convenience:
        • Matplotlib Figure -> UTF-8 string + image/svg+xml
        • Pillow Image -> byte-array + image/png
        • Altair -> UTF-8 string + application/vnd.vega.v5+json
        • UTF-8 string + media-type -> byte-array + media-type
      • Primitive:
        • byte-array + media-type
    • Examples of basic content types:
      • HTML
      • Markdown (text/markdown)
      • Image (various formats)
      • Audio (various formats)
  5. ScikitLearnModelItem
    • Constructor: sklearn model serialized with skops (+ environment)
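The split between the primitive and convenience MediaItem constructors can be sketched as follows. This is an illustration under assumptions, not the project's class: the field names `media_bytes` and `media_type` and the helper `from_string` are hypothetical, and only the "UTF-8 string + media type" convenience path is shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MediaItem:
    """Hypothetical sketch: media content held as bytes plus a media type."""

    media_bytes: bytes
    media_type: str

    @classmethod
    def from_string(cls, text: str, media_type: str) -> MediaItem:
        # Convenience path: UTF-8 string + media type -> byte array + media type.
        return cls(text.encode("utf-8"), media_type)


# Primitive constructor: the caller supplies bytes and a media type directly.
raw = MediaItem(b"Some raw data", "application/octet-stream")

# Convenience constructor: the string is encoded to bytes on the way in.
markdown = MediaItem.from_string("# Title", "text/markdown")
```

Under this reading, every convenience constructor reduces its input to the primitive form, so storage only ever has to deal with bytes and a media type.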

Item Type Mapping

The following table shows how the input value types passed to the put() method correspond to specific Item types:

| Input Value Type | Corresponding Item Type |
| --- | --- |
| str | JSONItem |
| int | JSONItem |
| float | JSONItem |
| bool | JSONItem |
| list | JSONItem |
| dict | JSONItem |
| pandas.DataFrame | DataFrameItem |
| polars.DataFrame | DataFrameItem |
| numpy.ndarray | NumpyArrayItem |
| matplotlib.figure.Figure | MediaItem (as SVG) |
| PIL.Image.Image | MediaItem (as PNG) |
| altair.Chart | MediaItem (as Vega-Lite JSON) |
| bytes + media type | MediaItem |
| str + media type | MediaItem |
| Scikit-learn model | ScikitLearnModelItem |

Note: When adding a MediaItem from raw bytes or string data, you need to specify the media type explicitly, using the put_item() method instead of put().
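The type-based selection described in this section could be sketched as a lookup. Everything here is an assumption for illustration: the function name `item_type_name` is hypothetical, and matching on package and class names (rather than isinstance checks against the imported libraries) is a choice made only to keep the sketch free of third-party dependencies.

```python
from __future__ import annotations

from typing import Any

# (top-level package, class name) -> item type name, following the mapping above.
_THIRD_PARTY = {
    ("pandas", "DataFrame"): "DataFrameItem",
    ("polars", "DataFrame"): "DataFrameItem",
    ("numpy", "ndarray"): "NumpyArrayItem",
    ("matplotlib", "Figure"): "MediaItem",
    ("PIL", "Image"): "MediaItem",
    ("altair", "Chart"): "MediaItem",
    ("sklearn", "BaseEstimator"): "ScikitLearnModelItem",
}

_JSON_TYPES = (str, int, float, bool, list, dict)


def item_type_name(value: Any) -> str:
    """Return the name of the Item type that the mapping assigns to `value`."""
    if isinstance(value, _JSON_TYPES):
        return "JSONItem"
    # Walk the class hierarchy so subclasses (e.g. a concrete estimator) match.
    for cls in type(value).__mro__:
        package = cls.__module__.split(".")[0]
        name = _THIRD_PARTY.get((package, cls.__name__))
        if name is not None:
            return name
    raise NotImplementedError(f"Type {type(value).__name__} is not supported yet.")


print(item_type_name({"key": "value"}))  # JSONItem
```

Raw bytes and strings with a media type do not appear in the lookup because a bare value carries no media type, which is why those two cases go through put_item().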

Usage Examples

import mandr
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from PIL import Image

# Load a project
project = mandr.load("my_project")

# Add items of various types
project.put("string_item", "Hello, World!")  # JSONItem
project.put("int_item", 42)  # JSONItem
project.put("float_item", 3.14)  # JSONItem
project.put("bool_item", True)  # JSONItem
project.put("list_item", [1, 2, 3])  # JSONItem
project.put("dict_item", {"key": "value"})  # JSONItem

# Add a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
project.put("pandas_df", df)  # DataFrameItem

# Add a Numpy array
arr = np.array([1, 2, 3, 4, 5])
project.put("numpy_array", arr)  # NumpyArrayItem

# Add a Matplotlib figure
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4])
project.put("mpl_figure", fig)  # MediaItem (SVG)

# Add a PIL Image
pil_image = Image.new('RGB', (100, 100), color='red')
project.put("pil_image", pil_image)  # MediaItem (PNG)

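# Add raw bytes with an explicit media type
# (MediaItem must be imported first; its import path is not specified in this proposal)
raw_bytes = b"Some raw data"
project.put_item("raw_data", MediaItem(raw_bytes, "application/octet-stream"))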

# Add a scikit-learn model
model = RandomForestClassifier()
model.fit(np.array([[1, 2], [3, 4]]), [0, 1])
project.put("rf_model", model)  # ScikitLearnModelItem

# List all item keys
print(project.list_keys())
@tuscland tuscland added the chore Internal enhancement work to be done label Sep 5, 2024
@tuscland tuscland added this to the PyData Paris milestone Sep 5, 2024
@tuscland tuscland added the enhancement New feature or request label Sep 5, 2024
@tuscland tuscland changed the title [draft] Project API Project API Sep 6, 2024
@thomass-dev
Collaborator

thomass-dev commented Sep 10, 2024

remaining TODO:

  • fix frontend
  • add support of vega
  • add support of pillow
  • rollback directory to project name
  • move load to own class (pending future naming strategy)
  • reshape Item
  • simplify skops by keeping only bytes
  • rename modules and clean repository

@rouk1 rouk1 linked a pull request Sep 10, 2024 that will close this issue
@tuscland
Member Author

> simplify skops by keeping only bytes

This is the case for every kind of item: they are all eventually serialized as bytes, whether JSON, media, or scikit-learn models.
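For instance, a JSON value can be reduced to bytes and restored with the standard library alone. This only illustrates the general point and is not taken from the project's code; the payload is made up.

```python
import json

payload = {"accuracy": 0.93, "labels": ["a", "b"]}

# JSON value -> UTF-8 bytes, and back again.
as_bytes = json.dumps(payload).encode("utf-8")
restored = json.loads(as_bytes.decode("utf-8"))
```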

@thomass-dev
Collaborator

thomass-dev commented Sep 11, 2024

> they are all eventually serialized as bytes

@tuscland isn't this question (storing the content of an item as bytes or str) just an implementation detail?

@augustebaum
Contributor

Writing the docs for this, I am wondering two things:

  • delete_item should be called delete
  • Entering HTML is done this way:
project.put_item(
    "my_string",
    MediaItem.factory("<p><h1>Hello</h1></p>", media_type="text/html"),
)

But when I wrote the docs, I often accidentally wrote project.put("string", MediaItem(...)), which results in an error: NotImplementedError: Type MediaItem is not supported yet. It would be easier if Skore did the right thing when I put an Item, or at least gave me a more precise error message.

@tuscland What do you think? Should we do this before this gets merged?
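One way the suggestion could look, as a sketch only: put() checks whether it was handed an Item and delegates to put_item(). The classes below are hypothetical stand-ins rather than the project's code, and conversion of plain values is left out.

```python
from __future__ import annotations

from typing import Any


class Item:
    """Hypothetical stand-in for the Item base class."""


class MediaItem(Item):
    """Hypothetical stand-in; the real class has its own constructor and factory."""

    def __init__(self, content: bytes, media_type: str) -> None:
        self.content = content
        self.media_type = media_type


class Project:
    def __init__(self) -> None:
        self._items: dict[str, Item] = {}

    def put_item(self, key: str, item: Item) -> None:
        self._items[key] = item

    def put(self, key: str, value: Any) -> None:
        # Option 1: accept an Item directly and store it as-is.
        if isinstance(value, Item):
            self.put_item(key, value)
            return
        # Option 2 would be to raise for Items instead, with a message such as
        # "put() received an Item; use put_item() instead".
        # Conversion of plain values is omitted in this sketch.
        raise NotImplementedError(
            f"Type {type(value).__name__} is not supported in this sketch."
        )


project = Project()
project.put("my_string", MediaItem(b"<h1>Hello</h1>", "text/html"))
```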
