Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a report programmatically #334

Closed
tuscland opened this issue Sep 16, 2024 · 13 comments · Fixed by #374
Closed

Create a report programmatically #334

tuscland opened this issue Sep 16, 2024 · 13 comments · Fixed by #374
Assignees
Labels
enhancement New feature or request

Comments

@tuscland
Copy link
Member

tuscland commented Sep 16, 2024

Consequently to #303 refactoring, we want to separate reports from the hierarchal organization of items, and provide storage for reports.

New Project APIs:

In the following APIs, key has the same value as an Item key, i.e. the / character is conventionnaly used as a path separator.

Project.create_report(key)
Create a new report at key.

  • If a report at key already exists a warning message is logged.
  • If an item at key already exists, an error is raised.
  • The Report object is returned.

Project.get_report(key)
Returns the report at key. If the report does not exist, an error is raised. Otherwise, the Report object is returned.

Project.delete_report(key)
Deletes the report at key. If a report at key does not exists, a warning message is logged.

Project.list_reports()
Returns a list of all report keys.

Report API

Report.set_layout(layout)
Sets the report layout as a list of layout objects. A layout object is a dictionary with the following key/value pairs:

  • key:, the key of the item to display in the report
  • size:, the size of presented item with possible values "small", "medium", "large"

Report.get_layout()
Returns the layout of the report.

Report.to_html()
Export the report as a self contained HTML document.

@tuscland tuscland added the enhancement New feature or request label Sep 16, 2024
@tuscland tuscland changed the title [draft] Create a report programmatically Create a report programmatically Sep 16, 2024
@augustebaum
Copy link
Contributor

augustebaum commented Sep 17, 2024

  • What is the difference between a Report and a Layout? What does a Report have that a Layout doesn't?

  • Why would we want to control reports programatically if there is only one Project we can apply the layout to?

  • Really, the main question is this: currently, Project has these two methods:

    def put_report_layout(self, layout: Layout) -> None:
        """Add a report layout to the Project."""
    
    def get_report_layout(self) -> Layout:
        """Get the report layout from the Project."""

    Would the following be enough?

    def put_report_layout(self, key: str, layout: Layout) -> None:
        """Add a report layout to the Project associating it to key `key`."""
    
    def get_report_layout(self, key: str) -> Layout:
        """Get the report layout corresponding to key `key` from the Project."""
    
    def list_reports(self) -> list[Layout]: ...
    
    def to_html_report(self, layout: Layout) -> str:
        """Generate an HTML report from the Project with layout `layout`."""

    Note that Report.set_layout(layout) can be achieved just by re-running Project.put_report_layout, like re-running Project.put on an existing key will overwrite the value associated with the key.

@thomass-dev

@tuscland
Copy link
Member Author

Thank you for your thoughtful input. I'd like to propose some considerations to help us iterate on this design:

What is the difference between a Report and a Layout? What does a Report have that a Layout doesn't?

We should design with scalability in mind. How can we structure this to allow the Report feature to grow and evolve? Addressing only small bits of information at the Project level may limit our future options.

Besides layout, a Report might include metadata like creation date, author, description, or version. It could also have methods for data processing, rendering in different formats, or integration with other systems.

Why would we want to control reports programatically if there is only one Project we can apply the layout to?

A Project should have zero to many Reports.

Also, I would like to add that a Report is a higher-level concept with more domain value than just a Layout. I propose renaming Layout to Report, with layout being a field within the Report object. This better reflects the domain model we're working with.

@augustebaum
Copy link
Contributor

augustebaum commented Sep 18, 2024

Thanks for your response, that clears it up. A Report is clearly bigger than just a Layout.

To us, Report sounds like it should be a dataclass, because it is a domain object. It should not have access to the persistence layer like Project does.
Creating a Report object should be done simply by instantiating a Report, i.e. this:

class Report:
    layout: Layout

    def __init__(self, layout: Layout): ...

report = Report(layout=...)

and then we can get/put Reports from a Project like this:

class Project:
    ...

    def put_report(self, key: str, report: Report): ...

    def get_report(...): ...

    def delete_report(...): ...

    ...

And modifying the layout of a report should be done with something like:

report = project.get_report(key)

report.layout = new_layout

project.put_report(key, report)

or maybe like

project.put_report(key, Report(layout=new_layout))

We note that in the proposal, Project.create_report is meant to avoid for the user to have to initialize Reports themselves (thus they don't need to know about metadata and the rest); if we need to, we can add a Report.factory(layout) factory to keep this ease of use. For now though, since we have only the layout, an __init__ should be sufficient.

One other thing that doesn't really make sense to us is the intent to mix Items and Reports in the same key system. For instance, what should Project.list_keys() now return?

@tuscland
Copy link
Member Author

Note: If it is a domain object, then the user cannot access it behind a service. Also, domain objects do not have to be dataclasses (anemic domain model), they can and should have behaviors (rich domain model).

We can try your more consistent proposal, but I would like @MarieS-WiMLDS to chime her opinion.

One other thing that doesn't really make sense to us is the intent to mix Items and Reports in the same key system. For instance, what should Project.list_keys() now return?

The idea is that items, like reports are organized by the user in the same (experiments) space. Currently, the only means of organization we have is the convention of using the / character to indicate a path separation and use that in the UI to display a hierarchy of items.

Thus, Project.list_keys() may be renamed Project.list_items().

@MarieS-WiMLDS
Copy link
Contributor

If I understand correctly where you agree already:

  • a report belongs to 1 and only 1 project (*-1 relationship)
  • a report has a layout (1-1 relationship, even though at the beginning the layout is empty)
  • there is a differentiation between items and reports, so the method Project.list_items() outputs only items and not the reports.

Now about the discussion.

I like the layout aspect because it's reproducible.
At first I asked @augustebaum to have a method to add items on the fly to a report. However, it's not reproducible, plus it's difficult to visualize what kind of output it will render in the end, so it was not such a good idea. The layout, with its "all in 1" declaration, offers somehow a pre-visualization of the report.

About the classes & methods of reports and projects. I'd like to be able to interact directly with the report object, without having to go through the project, it's more consistent with how I represent these objects in my head. A report is an object of its own, and it's weird to me to have to go through another object which has to transfer the request.
Speaking with Auguste, I understand that we would prefer that only Project class has access to file system.

Therefore, here is a possible solution:

class Report:
    layout: Layout
    project: Project

    def __init__(self, layout: Layout, project: Project):     ...

    def update(self, layout):
        self.project.put_report() # or something similar

and then we can create/update/delete Reports from a Project like this:

class Project:
    ...

    def create_report(self, key: str): ...

    def get_report(...): ...

    def delete_report(...): ...

   # hidden method
   def put_report(self, key: str, report: Report): 

    ...

And modifying the layout of a report should be done with something like:

report = project.create_report(key)

report.update(layout = new_layout)

How do it seems?

@augustebaum
Copy link
Contributor

This is something I forgot to ask about initially: how does create_report work exactly? It creates an empty report, which the user then has to fill? Why not directly do

project.put_report(key, Report(layout=my_layout))

@MarieS-WiMLDS
Copy link
Contributor

It's completely something you can add in create_report(key, layout) indeed (or put_report), it allows to go a bit faster if you know what you want to do already. Good idea!

@augustebaum
Copy link
Contributor

But then, according to your proposed implementation, that last snippet wouldn't work: it would need to be

project.put_report(key, Report(layout=layout, project=project))

which is really weird

@augustebaum
Copy link
Contributor

Summary of conversation with @MarieS-WiMLDS

A Report object should be linked to a Project at some point, and we'd rather do this once, at the Report creation (project.create_report(key, layout: Layout | None)).

Having to call upon project again to update the Report is a bit awkward to Marie; she would expect the Report to know where it came from and how to update itself. In particular, the report should know its Project and its key.

Similarly, the idea that Report.set_layout(layout) doesn't actually update the Report until some further action is taken (e.g. project.put_report(key, report) in my examples) is counter-intuitive to Marie, kind of like the fact that in Pandas, modifying a dataframe needs to be done with the in_place=True argument, whereas she would expect all actions to mutate the original dataframe.

Example user flow:

project = Project()

# Put some items in the project
project.put(...)
project.put(...)

# Create a Report in the UI with some key
# This will rely on `project.create_report(key)`

# Get the report to modify it from code
report = project.get_report(key)

# Change the layout of the report from code
# This automatically refreshes the UI
report.set_layout(layout)

@tuscland
Copy link
Member Author

tuscland commented Sep 21, 2024

Given recent discussions (see 2024-09-20 workshop with Gaël), it would be good to discuss the semantic value of Reports and how it fits to avoid potential conflicts with future developments.

Some facts:

  • Reports are a result to communicate with other stakeholders.
  • Also, since the discussion, I see there is a need to display information as it is put in the project to offer more interactivity.
  • Finally, we want to introduce "managed" features, where you would call something like skore.evaluation.cross_validate and it would add a bunch of items automatically to your project.

So, what are our options? I can suggest several ones:

  1. Rename Report to View, and use it simply to present items. Views would update dynamically as items are updated. Views can eventually become Reports at the end of the project. Reports would not update, being snapshot at one point (designed to be shared with someone else).
  2. Keep Reports, but find an alternative UI/UX way to display items as they are generated to offer more interactivity.
  3. Other ideas?

This is great to have this discussion because it shows we are gradually bringing the gap between prototyping and reporting.

@tuscland tuscland pinned this issue Sep 21, 2024
@tuscland
Copy link
Member Author

tuscland commented Sep 23, 2024

After discussion, we decide to:

  • Rename Reports to View, keeping the recently designed UI/UX to let users display the items and see their updates
  • Adopt the functional approach proposed by Auguste.
  • Defer work on instant display of items. A first idea is to display an "activity feed" of items. Will follow-up in a new issue.

@augustebaum could you please draft the latest version of the API taking in account these new elements?

@thomass-dev
Copy link
Collaborator

thomass-dev commented Sep 24, 2024

Do not to address in this PR; reflection on API design

Just my feeling: @tuscland its a bit weird to me to mix items and views in the same API. I think user will be confused too.

There is get/put only for item (and put_item), but get_view/put_view for views.
I would have preferred the use of namespaces (project.item.get/put and project.view.get/put).

@tuscland
Copy link
Member Author

Right now I believe this is good, here's why.

We should prioritize user experience. The primary goal for skore is to help DS store their results and track them. Automating view creation is secondary. If we get negative feedback (confusing API), we will address this issue.

thomass-dev added a commit that referenced this issue Sep 24, 2024
Addresses the backend part of #334:

- Replace `Layout` by the new `View` object, a DTO containing a
`Layout`.
- Add `Project.put_view()`
- Add `Project.get_view()`
- Add `Project.delete_view()`
- Add `Project.list_view_keys()`

This PR does not change the skore-ui API; this will come in a future PR.

---------

Co-authored-by: Thomas S. <thomas@probabl.ai>
@tuscland tuscland unpinned this issue Sep 27, 2024
thomass-dev added a commit that referenced this issue Dec 2, 2024
Addresses the backend part of #334:

- Replace `Layout` by the new `View` object, a DTO containing a
`Layout`.
- Add `Project.put_view()`
- Add `Project.get_view()`
- Add `Project.delete_view()`
- Add `Project.list_view_keys()`

This PR does not change the skore-ui API; this will come in a future PR.

---------

Co-authored-by: Thomas S. <thomas@probabl.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants