Epic: move indexing to an application layer #39
We need to think about how to handle the additional tables that are created during indexing, such as buckets or partials. So this is not just a normal UDF that outputs some rows into a dataset table; it also needs to insert into …
I think we should start getting rid of partials. They are too complicated for the value they provide. The same goes for buckets / sources: I would consider dropping them as well. Each path that we pass to from_storage could create a versioned dataset. We can decide to reuse those datasets (as a way to cache things) with some expiration date, etc. What are the major things we lose by getting rid of buckets, sources, and partials?
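The "one versioned dataset per from_storage path, reused until it expires" idea can be sketched as a small cache. This is a minimal illustration, not DataChain code: `ListingCache`, `CachedListing`, the `listing:<uri>` naming scheme and the TTL policy are all hypothetical, and the actual storage listing is left out.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CachedListing:
    # Hypothetical metadata for one listing dataset version.
    dataset_name: str
    version: int
    created_at: float


@dataclass
class ListingCache:
    ttl_seconds: float
    clock: Callable[[], float] = time.time  # injectable for testing
    _entries: Dict[str, CachedListing] = field(default_factory=dict)

    def get_or_create(self, uri: str) -> CachedListing:
        now = self.clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry.created_at < self.ttl_seconds:
            # Still fresh: reuse the existing dataset version as a cache hit.
            return entry
        # Missing or expired: a real implementation would list the storage
        # here and save the rows; this sketch only records a new version.
        version = 1 if entry is None else entry.version + 1
        entry = CachedListing(f"listing:{uri}", version, now)
        self._entries[uri] = entry
        return entry
```

Calling `get_or_create` twice within the TTL returns the same version; once the TTL has passed, the same path gets version 2, so older listings stay addressable as ordinary versioned datasets.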
Partials are needed to index part of a bucket and to avoid re-indexing subdirectories. I have a feeling, though, that this can all be done on the fly, even without the partials table, but this needs to be investigated.
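One way the "on the fly" alternative to a partials table might work: before listing a subdirectory, check whether an existing listing of a parent prefix already covers it, and if so filter that listing instead of re-indexing. The sketch below assumes listings are identified by plain URI prefixes; `find_covering_listing` and `files_under` are hypothetical helpers, not existing APIs.

```python
from typing import Iterable, List, Optional


def _as_dir(prefix: str) -> str:
    # Normalize so "s3://bkt/a" cannot accidentally match "s3://bkt/ab/...".
    return prefix if prefix.endswith("/") else prefix + "/"


def find_covering_listing(path: str, listed_prefixes: Iterable[str]) -> Optional[str]:
    """Return the most specific already-listed prefix that contains `path`."""
    target = _as_dir(path)
    best: Optional[str] = None
    for prefix in listed_prefixes:
        p = _as_dir(prefix)
        if target.startswith(p) and (best is None or len(p) > len(best)):
            best = p
    return best


def files_under(path: str, listed_files: Iterable[str]) -> List[str]:
    """Derive a subdirectory 'partial' by filtering an existing listing."""
    target = _as_dir(path)
    return [f for f in listed_files if f.startswith(target)]
```

Choosing the longest covering prefix keeps the filtered listing as small as possible; a `None` result means the path has to be listed from storage.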
Both are good ideas! Let's try to simplify this as much as we can.
Right. We need to find a way to fit the buckets (as well as partials, if needed) into "just a normal UDF" and normal datasets. I hope these datasets won't be visible to users (by default).
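Fitting a bucket listing into "just a normal UDF" could look like a generator that turns one input (a source URI) into many rows with a fixed schema, which are then stored like any other dataset. This is a hedged sketch: `FileRow`, `list_bucket` and the in-memory `objects` mapping (key to size) stand in for a real schema and a real object store and are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass(frozen=True)
class FileRow:
    # Hypothetical output schema for one listed object.
    source: str
    path: str
    size: int


def list_bucket(source: str, objects: Dict[str, int], prefix: str = "") -> Iterator[FileRow]:
    """Generator-style UDF: one source URI in, one row per object out."""
    for key in sorted(objects):
        if key.startswith(prefix):
            yield FileRow(source=source, path=key, size=objects[key])
```

Materializing `list(list_bucket(...))` gives ordinary rows, so no separate buckets table is needed; the optional `prefix` shows how indexing part of a bucket could reuse the same generator.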
Prioritizing this. It's an epic; we need to add the first steps.
I can take this one over and make a plan / subtasks.
Description
I.e., base it on a feature schema and, if possible, on UDFs.
Subtasks
- `DataChain.from_records()` #246
- `.from_storage()` to use listing generator #266
- `.get_storage()` caching listing lazy #317
- `DataChain.from_storage()` #318
- `Catalog` storages methods to use new listing mechanism #329
- `DatasetQuery` #340
- `source` and `path` #447