[Spike] Decide on versioning: keep or remove #4467

Closed
merelcht opened this issue Feb 7, 2025 · 1 comment

@merelcht
Member

merelcht commented Feb 7, 2025

Description

While we've done a lot of exploration into integrating Kedro with other tools to facilitate versioning, we haven't actually made the decision to keep or remove the native Kedro versioning feature.

The purpose of this spike is to collect information and insights into what it would mean if we removed the native versioning feature, so that we can make a decision.

  • Pros & cons of removal
  • Impact of removal: what does this mean for users? For other features?
  • Estimate of engineering effort of removal

Context

#4199 (comment)
#4199 (comment)
#4129 (comment)

@merelcht merelcht added this to the Dataset Versioning milestone Feb 7, 2025
@merelcht merelcht moved this to To Do in Kedro 🔶 Feb 10, 2025
@astrojuanlu astrojuanlu moved this from To Do to In Progress in Kedro 🔶 Feb 14, 2025
@astrojuanlu
Member

This was discussed at Tech Design on 2025-02-19.

@merelcht and @lrcouto laid out the pros and cons of keeping and removing versioning, as well as a "third way": moving versioning information to the catalog. From the discussion, it was clear that, even though the available quantitative evidence shows that versioning has low usage #4129 (comment), there is qualitative evidence that the feature is very much desired.

A comment from today:

Since I'm using other versioned datasets I didn't want to introduce a different tool to manage versioning.
I don't know much about Delta Lake and Iceberg, but they seem like overkill for most use cases, no? DVC seems promising.

@jccalvojackson in kedro-org/kedro-plugins#1010 (comment)

And on top of that, despite how promising Delta and Iceberg are, they are not yet fully mature. @deepyaman wanted us to have close integration with those, which is a good idea anyway.

@ElenaKhaustova made some clarifications on the "move to the catalog" idea (details to be filled) but in the end the decision was to keep versioning as is.

I'm closing this spike and making some updates in the other (scattered) issues we have around the topic. Thanks everyone!
