dvc and apache hudi integration #4937
Comments
@LuisMoralesAlonso Could you please elaborate?
Incremental data, might be related to #331
@LuisMoralesAlonso there are a few more questions...
answers:
any comment here?
@LuisMoralesAlonso sorry for the delay. I'm trying to understand where you already have data versioning and when it needs to be introduced. So far, it seems like DVC and Hudi have somewhat different purposes, and I'm trying to understand your scenario (and Hudi) better. Does Hudi have proper versioning? I'm not a Hudi expert, but it seems like it can efficiently support the latest version but not the whole history.
Are you building/deriving features for datascience-hub from the regular tables/datahubs or from some other sources/streaming? Do you have any versioning for regular data hubs/tables?
Would you like to create a version of a Hudi "table" on request?
It is usually done with real streaming. I thought that Hudi could not handle this level of latency, but I'm not an expert in Hudi. PS: It can be way more efficient to schedule a chat - please feel free to shoot me an email at my-first-name at iterative.ai or DM at https://twitter.com/fullstackml
Closing as stale. Please feel free to reopen.
Does this kind of integration make sense? We could rely on Hudi to manage versions (incremental ones this time, so with lower storage needs).
Looking forward to your comments,
Luis