Epic: move indexing to an application layer #39

skshetry · 2024-06-18T05:55:21Z

Description

I.e., make it based on a feature schema and, if possible, implement it with UDFs.

Subtasks


ilongin · 2024-07-22T15:47:01Z

We need to think about how to deal with the additional tables that are created during indexing, like buckets or partials. So this is not just a normal UDF whose output is some rows in a dataset table; it also needs to insert into the buckets and partials tables.
It's easy for us to implement this, but if we want users to implement their own indexing, maybe we need to provide a framework that does this implicitly (users should not have to care about those tables explicitly) ... WDYT?
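To make the question concrete, here is a minimal sketch of what "implicit" bookkeeping could look like. Everything in it is hypothetical and not taken from the codebase: `IndexingRunner`, `list_fake_storage`, and the in-memory lists merely stand in for the dataset table and the buckets/partials bookkeeping. The only point is that the user-supplied function yields rows and never touches the bookkeeping.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Lister = Callable[[str], Iterable[Row]]


class IndexingRunner:
    """Hypothetical wrapper that owns the bookkeeping so user code does not."""

    def __init__(self) -> None:
        # Stands in for the dataset table that receives the UDF output.
        self.rows: List[Row] = []
        # Stands in for whatever replaces the buckets/partials tables.
        self.indexed_prefixes: List[str] = []

    def run(self, uri: str, lister: Lister) -> int:
        """Run a user-supplied lister and record what was indexed."""
        count = 0
        for row in lister(uri):
            # Tag each row with the path it was listed from.
            self.rows.append({**row, "source": uri})
            count += 1
        # Bookkeeping happens here, outside the user's function.
        self.indexed_prefixes.append(uri)
        return count


def list_fake_storage(uri: str) -> Iterator[Row]:
    """Hypothetical user-defined indexing function: it only yields rows."""
    for name in ("a.jpg", "b.jpg"):
        yield {"path": name, "size": 0}


runner = IndexingRunner()
indexed = runner.run("s3://bucket/images/", list_fake_storage)
```

In this shape the user only writes something like `list_fake_storage`; whether the bookkeeping lives in dedicated tables or in ordinary datasets stays an internal decision.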

shcheklein · 2024-07-22T15:52:33Z

I think we should start getting rid of partials. They are too complicated for the value they provide. Same with buckets / sources: I would reconsider them and drop them as well.

Each path that we pass to from_storage can create a versioned dataset. We can decide to reuse those (as a way to cache things) with some expiration date, etc.
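A rough sketch of that caching idea, under assumptions that are not part of this thread: `ListingDataset` and `ListingCache` are hypothetical names, the registry is an in-memory dict rather than real dataset storage, and expiration is a simple time-to-live.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


@dataclass
class ListingDataset:
    """Hypothetical record for one indexed storage path."""
    uri: str
    version: int
    created_at: datetime


class ListingCache:
    """Hypothetical registry: one versioned dataset per listed path."""

    def __init__(self, ttl: timedelta) -> None:
        self.ttl = ttl
        self._latest: Dict[str, ListingDataset] = {}

    def get_or_create(
        self, uri: str, now: Optional[datetime] = None
    ) -> Tuple[ListingDataset, bool]:
        """Return (dataset, reused). Reuse the latest version if not expired."""
        now = now or datetime.now(timezone.utc)
        cached = self._latest.get(uri)
        if cached is not None and now - cached.created_at < self.ttl:
            return cached, True
        # Expired or never listed: a real implementation would re-list the
        # storage here; this sketch only bumps the version number.
        version = cached.version + 1 if cached is not None else 1
        fresh = ListingDataset(uri=uri, version=version, created_at=now)
        self._latest[uri] = fresh
        return fresh, False


t0 = datetime(2024, 7, 22, tzinfo=timezone.utc)
cache = ListingCache(ttl=timedelta(hours=1))
first, reused_first = cache.get_or_create("s3://bucket/images/", now=t0)
second, reused_second = cache.get_or_create(
    "s3://bucket/images/", now=t0 + timedelta(minutes=30)
)
third, reused_third = cache.get_or_create(
    "s3://bucket/images/", now=t0 + timedelta(hours=2)
)
```

Here the second call reuses version 1, while the third call falls outside the one-hour window and produces version 2.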

What are the major things we are losing by getting rid of buckets, sources, and partials?

ilongin · 2024-07-22T16:35:46Z

Partials are needed to be able to index part of a bucket and to avoid re-indexing subdirectories. I have a feeling though that this can all be done even without that partials table, just on the fly but this needs to be investigated.
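One way the "on the fly" check might look, as a sketch only: instead of consulting a partials table, compare the requested path against the prefixes that were already listed. `find_covering_listing` and the example URIs are hypothetical, and real storage paths may need more careful normalization than the trailing-slash handling shown here.

```python
from typing import Iterable, Optional


def _as_dir(uri: str) -> str:
    """Normalize to a trailing slash so 'a' does not match 'ab'."""
    return uri if uri.endswith("/") else uri + "/"


def find_covering_listing(requested: str, listed: Iterable[str]) -> Optional[str]:
    """Return the most specific already-listed prefix that contains
    `requested`, or None if the path still has to be indexed."""
    target = _as_dir(requested)
    candidates = [_as_dir(u) for u in listed if target.startswith(_as_dir(u))]
    # Prefer the longest prefix: fewer rows to filter afterwards.
    return max(candidates, key=len) if candidates else None


already_listed = ["s3://bucket/a", "s3://bucket/a/b/"]
covered = find_covering_listing("s3://bucket/a/b/c", already_listed)
not_covered = find_covering_listing("s3://bucket/ab", already_listed)
```

If a covering prefix exists, a subdirectory request can be served by filtering that listing instead of re-indexing; otherwise a new listing is needed.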

dmpetrov · 2024-07-30T18:32:57Z

I think we should start getting rid of partials.

and

that this can all be done even without that partials table, just on the fly but this needs to be investigated.

Both are good ideas! Let's try to simplify this as much as we can.

We need to think how to deal with additional tables that are created during indexing, like buckets or partials. So this is not just normal UDF that has an output of some rows in a dataset table

Right. We need to find a way to fit the buckets (as well as partials, if needed) into "just normal UDF" and normal datasets. I hope these datasets won't be visible to users (by default).

shcheklein · 2024-07-31T16:59:44Z

Prioritizing this. It's an epic. Need to add first steps.

ilongin · 2024-07-31T18:25:24Z

I can take over this one and make a plan / subtasks

skshetry added the bug label Jun 18, 2024

skshetry self-assigned this Jun 18, 2024

skshetry removed the bug label Jun 18, 2024

skshetry changed the title ~~Move indexing to application layer~~ Move indexing to an application layer Jun 18, 2024

dmpetrov mentioned this issue Jul 13, 2024

Epic: File IO in application level #33

Closed

skshetry removed their assignment Jul 8, 2024

dmpetrov transferred this issue from another repository Jul 13, 2024

shcheklein changed the title ~~Move indexing to an application layer~~ Epic: move indexing to an application layer Jul 31, 2024

ilongin self-assigned this Jul 31, 2024

This was referenced Aug 11, 2024

Clean up cache #271

Open

Flatten file signals are still working in filters #212

Closed

shcheklein closed this as completed Nov 20, 2024

skshetry commented Jun 18, 2024 (edited by ilongin)
