
feat: dataset folders (backend) #32520

Draft: wants to merge 1 commit into master


@betodealmeida (Member) commented Mar 5, 2025

SUMMARY

Tentative implementation of the backend for #32351.

I ended up doing the simplest thing that works, based on these requirements:

  1. We want to reuse existing dataset APIs (there will be no new APIs to manage folders).
  2. Order of folders is important, as well as the order of elements within folders.
  3. Support for nested folders should be present from the start (new requirement).

I started the implementation with a Folder model and the following relationships:

  • Folder n:1 SqlaTable
  • SqlMetric n:1 Folder
  • TableColumn n:1 Folder

One problem with this approach is that order is important, so I had to keep track of the position of each folder and of each element inside a folder. This required additional columns and complex bookkeeping whenever a dataset was updated: did the positions change? Were metrics removed, columns added, folders renamed? Moving the last element in a folder to the first position, for example, would require updating the position of every element in the folder.
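To make the bookkeeping cost concrete, here is a minimal sketch of the rejected normalized design, using plain dataclasses rather than SQLAlchemy models. `FolderItem`, its `position` field, and `move_to_front` are hypothetical stand-ins for the extra columns and update logic described above, not code from the PR.

```python
from dataclasses import dataclass


@dataclass
class FolderItem:
    """Hypothetical row in a normalized folder-membership table."""

    uuid: str
    folder_uuid: str
    position: int


def move_to_front(items: list[FolderItem], uuid: str) -> list[FolderItem]:
    """Move one item to position 0 and return the rows whose position changed.

    In a database, every returned row would need its own UPDATE.
    """
    ordered = sorted(items, key=lambda item: item.position)
    target = next(item for item in ordered if item.uuid == uuid)
    ordered.remove(target)
    ordered.insert(0, target)

    changed = []
    for new_position, item in enumerate(ordered):
        if item.position != new_position:
            item.position = new_position
            changed.append(item)
    return changed


rows = [FolderItem(f"uuid{i}", "folder1", i) for i in range(4)]
changed = move_to_front(rows, "uuid3")
print(len(changed))  # 4: every row in the folder was renumbered
```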

Representing nested folders would also require tracking additional relationships.
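For illustration only, nesting in a normalized schema would typically mean a self-referencing parent column (an adjacency list) plus a pass to rebuild the tree on every read. `FolderRow` and `build_tree` below are hypothetical and only sketch that extra work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FolderRow:
    """Hypothetical normalized folder row with a self-referencing parent."""

    uuid: str
    name: str
    parent_uuid: Optional[str]
    position: int


def build_tree(rows: list[FolderRow]) -> list[dict]:
    """Rebuild the nested folder structure from flat rows."""
    children: dict[Optional[str], list[FolderRow]] = {}
    for row in rows:
        children.setdefault(row.parent_uuid, []).append(row)

    def walk(parent: Optional[str]) -> list[dict]:
        # Siblings are ordered by the stored position column.
        siblings = sorted(children.get(parent, []), key=lambda r: r.position)
        return [
            {"uuid": r.uuid, "name": r.name, "children": walk(r.uuid)}
            for r in siblings
        ]

    return walk(None)


rows = [
    FolderRow("uuid4", "Dimensions", "uuid3", 0),
    FolderRow("uuid3", "My columns", None, 1),
    FolderRow("uuid1", "My metrics", None, 0),
]
tree = build_tree(rows)
```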

I ditched that approach and instead opted to serialize the folder structure to a new JSON column called folders in the SqlaTable model. A GET request to /api/v1/dataset/ or /api/v1/explore/ now returns UUIDs for metrics and columns, plus the new folders attribute:

{
  ...
  "metrics": [
    {
      "metric_name": "count",
      "uuid": "uuid2",
      ...
    },
    ...
  ],
  "columns": [
    {
      "column_name": "country",
      "uuid": "uuid5",
      ...
    },
    {
      "column_name": "column-not-in-any-folder",
      "uuid": "uuid6",
      ...
    },
    ...
  ],
  "folders": [
    {
      "type": "folder",
      "uuid": "uuid1",
      "name": "My metrics",
      "children": [
        {
          "type": "metric",
          "uuid": "uuid2",
          "name": "count"
        }
      ]
    },
    {
      "type": "folder",
      "uuid": "uuid3",
      "name": "My columns",
      "children": [
        {
          "type": "folder",
          "uuid": "uuid4",
          "name": "Dimensions",
          "children": [
            {
              "type": "column",
              "uuid": "uuid5",
              "name": "country"
            }
          ]
        }
      ]
    }
  ]
}
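The response shape above can be described with a few type definitions. The sketch below is an assumption about how a server or client might check a folders payload against the dataset: every metric or column child must reference an existing UUID of the matching type. `FolderNode` and `find_invalid_references` are hypothetical names, not part of the PR.

```python
from typing import Literal, TypedDict


class _NodeBase(TypedDict):
    type: Literal["folder", "metric", "column"]
    uuid: str
    name: str


class FolderNode(_NodeBase, total=False):
    """A node in the folders tree; description and children are optional."""

    description: str
    children: list["FolderNode"]


def find_invalid_references(
    folders: list[FolderNode],
    metric_uuids: set[str],
    column_uuids: set[str],
) -> list[str]:
    """Return UUIDs of metric/column children that don't match the dataset.

    A child is invalid if its UUID is unknown, or if it is listed with the
    wrong type (e.g. a column UUID declared as a metric).
    """
    known = {"metric": metric_uuids, "column": column_uuids}
    invalid: list[str] = []

    def walk(nodes: list[FolderNode]) -> None:
        for node in nodes:
            if node["type"] == "folder":
                walk(node.get("children", []))
            elif node["uuid"] not in known[node["type"]]:
                invalid.append(node["uuid"])

    walk(folders)
    return invalid


folders: list[FolderNode] = [
    {"type": "folder", "uuid": "uuid1", "name": "My metrics",
     "children": [{"type": "metric", "uuid": "uuid2", "name": "count"}]},
    {"type": "folder", "uuid": "uuid3", "name": "My columns",
     "children": [{"type": "folder", "uuid": "uuid4", "name": "Dimensions",
                   "children": [{"type": "column", "uuid": "uuid5", "name": "country"}]}]},
]
print(find_invalid_references(folders, {"uuid2"}, {"uuid5", "uuid6"}))  # []
```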

Note that for metrics and columns, type and name are redundant, since they're still present in the response payload under metrics and columns, respectively. To make the API clearer I included them anyway, even though technically only the UUID is needed.

With this solution the frontend can easily build the UI from the response. Note that the payload only includes custom folders, and the metrics and columns attributes in the response are unmodified (meaning a metric that is present in a folder will still show up under metrics). It's up to the frontend to build the existing "Columns" and "Metrics" sections by removing any elements that are present in custom folders. This way we can build the feature progressively, first enhancing the API and later adding the custom UI.
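A minimal sketch of that filtering step, written in Python for illustration (the frontend would do the equivalent in TypeScript). `uuids_in_folders` and `default_sections` are hypothetical helpers; they collect every metric and column UUID placed in a custom folder and keep only the remaining items for the built-in sections.

```python
def uuids_in_folders(folders: list[dict]) -> set[str]:
    """Collect UUIDs of all metrics and columns placed in custom folders."""
    found: set[str] = set()
    stack = list(folders)
    while stack:
        node = stack.pop()
        if node["type"] == "folder":
            stack.extend(node.get("children", []))
        else:
            found.add(node["uuid"])
    return found


def default_sections(payload: dict) -> dict[str, list[dict]]:
    """Build the built-in "Metrics" and "Columns" sections from a dataset payload."""
    assigned = uuids_in_folders(payload.get("folders", []))
    return {
        "Metrics": [m for m in payload["metrics"] if m["uuid"] not in assigned],
        "Columns": [c for c in payload["columns"] if c["uuid"] not in assigned],
    }


payload = {
    "metrics": [{"metric_name": "count", "uuid": "uuid2"}],
    "columns": [
        {"column_name": "country", "uuid": "uuid5"},
        {"column_name": "column-not-in-any-folder", "uuid": "uuid6"},
    ],
    "folders": [
        {"type": "folder", "uuid": "uuid1", "name": "My metrics",
         "children": [{"type": "metric", "uuid": "uuid2", "name": "count"}]},
        {"type": "folder", "uuid": "uuid3", "name": "My columns",
         "children": [{"type": "folder", "uuid": "uuid4", "name": "Dimensions",
                       "children": [{"type": "column", "uuid": "uuid5", "name": "country"}]}]},
    ],
}
sections = default_sections(payload)
```

Here only `column-not-in-any-folder` is left for the default "Columns" section, and the default "Metrics" section is empty.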

To organize metrics and columns into folders, or to create new folders, the user must edit the dataset (dataset creation shows neither metrics nor columns). After creating folders and adding metrics and columns to them, the user saves the dataset, and the resulting PUT request can include the folders attribute in its payload.
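As an illustration, the sketch below builds (but does not send) such a PUT request with only the standard library. The host, dataset id, and bearer token are placeholders; `/api/v1/dataset/<pk>` is Superset's dataset update endpoint, but which other fields must accompany folders depends on the final schema, so treat the body as an assumption.

```python
import json
import urllib.request


def build_update_request(
    base_url: str, dataset_id: int, folders: list[dict], token: str
) -> urllib.request.Request:
    """Build a PUT request that sends a folders payload for one dataset."""
    body = json.dumps({"folders": folders}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/dataset/{dataset_id}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # placeholder auth scheme
        },
    )


folders = [
    {"type": "folder", "uuid": "uuid1", "name": "My metrics",
     "children": [{"type": "metric", "uuid": "uuid2", "name": "count"}]},
]
request = build_update_request("http://localhost:8088", 1, folders, "TOKEN")
# urllib.request.urlopen(request) would actually send it.
```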

There is no model for folders, since it offered little to no value. Instead, the client simply declares each folder's name, UUID, and optional description, as well as the tree structure. The UUID is used for external bookkeeping, for example by systems where semantics are defined outside of Superset and periodically synced via the API.
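Because folders are identified only by client-supplied UUIDs, an external system that syncs semantics periodically could derive stable UUIDs rather than store them. The sketch below uses `uuid.uuid5` over the folder's path; the namespace URL, the "/"-joined path convention, and `folder_node` are assumptions for illustration, not something the PR prescribes.

```python
import uuid
from typing import Optional

# Hypothetical namespace owned by the external semantic layer.
SEMANTIC_LAYER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://semantics.example.com")


def folder_node(
    path: list[str], children: list[dict], description: Optional[str] = None
) -> dict:
    """Build a folder node whose UUID is derived from its full path.

    Re-running a sync yields the same UUID for the same path, so the external
    system can match folders without its own mapping table. Names containing
    "/" could collide under this naive join.
    """
    node = {
        "type": "folder",
        "uuid": str(uuid.uuid5(SEMANTIC_LAYER_NAMESPACE, "/".join(path))),
        "name": path[-1],
        "children": children,
    }
    if description is not None:
        node["description"] = description  # description is optional
    return node


dims = folder_node(["My columns", "Dimensions"], [])
again = folder_node(["My columns", "Dimensions"], [])
```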

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Added tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API


korbit-ai bot commented Mar 5, 2025

Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment /korbit-review.

Your admin can change your review schedule in the Korbit Console

github-actions bot added the labels risk:db-migration (PRs that require a DB migration), api (Related to the REST API), and preset-io on Mar 5, 2025
pull-request-size bot added size/XL and removed size/L on Mar 5, 2025
@mistercrunch (Member)

Something related worth pointing out: the tree component is likely going to be composed of React nodes (not just the description). Today we have [from memory, component names may not match] MetricLabel, ColumnLabel, and CalculatedColumnLabel, and these require props beyond what's in folders (description to make an InfoBubbleTooltip, sql_expression, data type, ...). Assuming we want a rich tree with the full labels (at least on the left panel in Explore), this means we'll still have to look up the related objects from the API.
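A minimal sketch of that lookup, in Python for illustration (the real code would be TypeScript): index the payload's metrics and columns by UUID, then swap each folder child for the full object so the labels can use its description, expression, and type. `resolve_folders` is a hypothetical helper.

```python
def resolve_folders(payload: dict) -> list[dict]:
    """Return a copy of the folder tree with metric/column children resolved."""
    index = {
        "metric": {m["uuid"]: m for m in payload["metrics"]},
        "column": {c["uuid"]: c for c in payload["columns"]},
    }

    def resolve(node: dict) -> dict:
        if node["type"] == "folder":
            children = [resolve(child) for child in node.get("children", [])]
            return {**node, "children": children}
        # A dangling UUID raises KeyError instead of rendering an empty label.
        return {"type": node["type"], "object": index[node["type"]][node["uuid"]]}

    return [resolve(folder) for folder in payload.get("folders", [])]


payload = {
    "metrics": [{"metric_name": "count", "uuid": "uuid2",
                 "expression": "COUNT(*)", "description": "Number of rows"}],
    "columns": [{"column_name": "country", "uuid": "uuid5", "type": "VARCHAR"}],
    "folders": [
        {"type": "folder", "uuid": "uuid1", "name": "My metrics",
         "children": [{"type": "metric", "uuid": "uuid2", "name": "count"}]},
        {"type": "folder", "uuid": "uuid3", "name": "My columns",
         "children": [{"type": "folder", "uuid": "uuid4", "name": "Dimensions",
                       "children": [{"type": "column", "uuid": "uuid5", "name": "country"}]}]},
    ],
}
tree = resolve_folders(payload)
```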

Thinking about future frontend development, we'll need some sort of assembleTreeDataForComponent(apiPayload) method, and it'll have to build rich React nodes. On the other side, we'll need something that takes the AntdTree.treeData and prepares the POST payload required by the API; this one should be fairly simple, I think.
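The reverse direction can be sketched as a recursive mapping from tree nodes to the folders payload. This Python version only mirrors what a TypeScript helper might do, and it assumes each tree node carries `key` (the UUID), a plain-string `name` (since `title` may be a React node), and a hypothetical `nodeType` field that is not a standard Ant Design `treeData` property.

```python
def tree_to_folders(tree_data: list[dict]) -> list[dict]:
    """Convert UI tree nodes back into the folders payload for the API."""
    folders = []
    for node in tree_data:
        item = {"type": node["nodeType"], "uuid": node["key"], "name": node["name"]}
        if node["nodeType"] == "folder":
            if node.get("description"):
                item["description"] = node["description"]
            # Sibling order in the UI becomes list order in the payload.
            item["children"] = tree_to_folders(node.get("children", []))
        folders.append(item)
    return folders


tree_data = [
    {"key": "uuid1", "nodeType": "folder", "name": "My metrics",
     "children": [{"key": "uuid2", "nodeType": "metric", "name": "count"}]},
]
folders = tree_to_folders(tree_data)
```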

@betodealmeida (Member, Author)

/korbit-review

korbit-ai bot left a comment

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category | Issue
Design | Non-normalized JSON storage design
Performance | Unbounded Recursive Nesting
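One way to address the unbounded-nesting finding would be to reject payloads deeper than a fixed limit before saving them. This is only a sketch: `MAX_FOLDER_DEPTH`, its value, and both helpers are hypothetical. The walk is iterative, so a deeply nested payload cannot hit Python's recursion limit during validation.

```python
MAX_FOLDER_DEPTH = 5  # hypothetical limit


def folder_depth(nodes: list[dict]) -> int:
    """Return the maximum folder nesting depth (0 when there are no folders)."""
    deepest = 0
    stack = [(node, 1) for node in nodes if node["type"] == "folder"]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in node.get("children", []):
            if child["type"] == "folder":
                stack.append((child, depth + 1))
    return deepest


def check_depth(nodes: list[dict], limit: int = MAX_FOLDER_DEPTH) -> None:
    """Raise ValueError if folders are nested deeper than the limit."""
    if folder_depth(nodes) > limit:
        raise ValueError(f"Folders may be nested at most {limit} levels deep")
```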
Suppressed issues based on your team's Korbit activity:

  • This issue (lines 565:567): The DATASET_FOLDERS feature flag is disabled by default, which prevents users from using the newly implemented folder organization functionality.
    Is similar to: Default value of CATALOGS_SIMPLIFIED_MIGRATION negates intended performance improvement
    Because: Similar issues were not addressed in the past
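For local testing, the flag can be turned on through Superset's usual feature-flag mechanism, the `FEATURE_FLAGS` dictionary in `superset_config.py`, which Superset merges over its defaults. A minimal sketch:

```python
# superset_config.py
# Opt in to dataset folders; DATASET_FOLDERS is the flag named in the finding above.
FEATURE_FLAGS = {
    "DATASET_FOLDERS": True,
}
```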

When you react to issues (for example, an upvote or downvote) or you fix them, Korbit will tune future reviews based on these signals.

Files scanned:
superset/migrations/versions/2025-03-03_20-52_94e7a3499973_add_folder_table.py
superset/commands/dataset/update.py
superset/datasets/schemas.py
superset/datasets/api.py
superset/connectors/sqla/models.py
superset/config.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Need a new review? Comment /korbit-review on this PR and I'll review your latest changes.

Korbit Guide: Usage and Customization

Interacting with Korbit

  • You can manually ask Korbit to review your PR using the /korbit-review command in a comment at the root of your PR.
  • You can ask Korbit to generate a new PR description using the /korbit-generate-pr-description command in any comment on your PR.
  • Too many Korbit comments? I can resolve all my comment threads if you use the /korbit-resolve command in any comment on your PR.
  • On any given comment that Korbit raises on your pull request, you can have a discussion with Korbit by replying to the comment.
  • Help train Korbit to improve your reviews by giving a 👍 or 👎 on the comments Korbit posts.

Customizing Korbit

  • Check out our docs on how you can make Korbit work best for you and your team.
  • Customize Korbit for your organization through the Korbit Console.

Current Korbit Configuration

General Settings
  • Review Schedule: Automatic excluding drafts
  • Max Issue Count: 10
  • Automatic PR Descriptions
Issue Categories
  • Documentation
  • Logging
  • Error Handling
  • Readability
  • Design
  • Performance
  • Security
  • Functionality


Labels: api (Related to the REST API), preset-io, review:draft, risk:db-migration (PRs that require a DB migration), size/XL

3 participants