Support eager mode #2331

jieguangzhou · 2024-07-19T06:16:16Z

Description

Related Issues

Checklist

Is this code covered by new or existing unit tests or integration tests?
Did you run make unit_testing and make integration-testing successfully?
Do new classes, functions, methods and parameters all have docstrings?
Were existing docstrings updated, if necessary?
Was external documentation updated, if necessary?

Additional Notes or Comments

jieguangzhou · 2024-07-19T15:32:48Z

TODO: Add the documentation.

jieguangzhou · 2024-07-22T15:31:28Z

superduper/backends/mongodb/query.py

+        def replace_id(part):
+            find_params = part[1]
+            if not find_params:
+                find_params = ({}, {})
+
+            assert isinstance(find_params[1], dict)
+            find_params = list(find_params)
+            find_params[1]['_id'] = 1
+            return (part[0], tuple(find_params), part[2])


Only return _id
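
A minimal sketch of the projection rewrite in the diff above. It assumes each query part is a `(method, args, kwargs)` tuple and that `find` takes `(filter, projection)` as positional arguments; both are inferred from the diff. The name `force_id_projection` is hypothetical. Unlike the diff, the sketch copies the dictionaries instead of mutating them and pads a missing projection.

```python
def force_id_projection(part):
    """Return a copy of a find part whose projection includes ``_id``."""
    method, args, kwargs = part
    # Pad missing positional arguments so that args starts with (filter, projection).
    filter_ = dict(args[0]) if len(args) > 0 else {}
    projection = dict(args[1]) if len(args) > 1 else {}
    projection["_id"] = 1
    # Keep any further positional arguments untouched.
    rest = tuple(args[2:])
    return (method, (filter_, projection) + rest, kwargs)


part = ("find", ({"brand": "Nike"},), {})
print(force_id_projection(part))
# ('find', ({'brand': 'Nike'}, {'_id': 1}), {})
```

With an empty projection the result selects only `_id`; with an existing inclusion projection, `_id` is added to the fields already listed.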

jieguangzhou · 2024-07-22T15:32:45Z

superduper/backends/mongodb/query.py

+    def filter(self, filter):
+        """Return a query that filters the documents.
+
+        :param filter: The filter to apply.
+        """
+
+        def replace_function(part):
+            find_params = part[1]
+            if not find_params:
+                find_params = ({},)
+
+            assert isinstance(find_params[0], dict)
+
+            find_params[0].update(filter)
+            return (part[0], tuple(find_params), part[2])
+
+        return self._replace_part('find', replace_function)


Add the key of the filter to the list of returned fields.
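
One possible reading of this comment, as a sketch: merge the extra filter into the `find` filter and, when an inclusion projection is present, list the filtered keys in it so they are returned. The `(method, args, kwargs)` part layout is inferred from the diff, and `merge_filter` is a hypothetical name, not part of the PR.

```python
def merge_filter(part, extra_filter):
    """Return a copy of a find part with ``extra_filter`` merged into its filter."""
    method, args, kwargs = part
    merged = dict(args[0]) if args else {}
    merged.update(extra_filter)
    new_args = [merged, *args[1:]]

    # Assumption: a projection with a truthy non-_id value is an inclusion
    # projection, so filtered fields must be listed in it to be returned.
    if len(new_args) > 1 and isinstance(new_args[1], dict):
        projection = dict(new_args[1])
        if any(value for key, value in projection.items() if key != "_id"):
            # Skip operator keys such as "$or", which are not field names.
            projection.update(
                {key: 1 for key in extra_filter if not key.startswith("$")}
            )
        new_args[1] = projection
    return (method, tuple(new_args), kwargs)


part = ("find", ({"brand": "Nike"}, {"title": 1}), {})
print(merge_filter(part, {"price": 10}))
```

Copying the filter and projection keeps the original query parts unchanged, so the same base query can be filtered more than once.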

blythed

I see that you are using the attribute/method .datas() to fetch data eagerly.

Why not just use .execute() directly?

CFG.eager = True
r = table.select().execute()[0]

jieguangzhou · 2024-07-24T03:56:28Z

I think that with CFG.eager = True, eager mode could end up always enabled, so the returned data would not be the raw data.

A separate .datas() method allows users to opt into eager mode purposefully.

If we combine it directly with .execute(), we need to return the original data type (or write the related SuperDuperData records as hidden attributes onto the original data without modifying its type); otherwise, it will cause hidden bugs in downstream applications.

WDYT?

kartik4949 · 2024-07-23T10:20:25Z

superduper/backends/base/query.py

+
+        assert self.db is not None, 'No datalayer (db) provided'
+        query = self
+        if not len(query.parts):


This means the query is simply a table, right?

kartik4949 · 2024-07-23T10:23:43Z

superduper/components/model.py

+    def _eager_call__(self, *args, **kwargs):
+        from superduper.misc.eager import SuperDuperData, SuperDuperDataType, TrackData
+
+        have_sdd, graph = SuperDuperData.detect_and_get_graph(*args, **kwargs)


Hasn't this already been called at line 910?

Yes, the graph parameter is retrieved again because I don’t want the graph parameter to be passed from upstream. This function has very low overhead.

jieguangzhou · 2024-07-24T14:14:01Z

After discussing with @kartik4949, I changed it to datas = list(db["documents"].select().execute(eager_mode=True))

When eager_mode=False, the original data is returned without any additional information. Otherwise, the query enters eager mode: it returns SuperDuperData and begins tracking operations performed on the data.
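
The opt-in behaviour described here can be illustrated with a toy sketch. `TrackedValue` and this standalone `execute` function are hypothetical stand-ins written for illustration; they are not the real `SuperDuperData` class or the superduper query API.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class TrackedValue:
    """Hypothetical stand-in for an eager-mode wrapper (not SuperDuperData)."""

    value: Any
    history: List[str] = field(default_factory=list)

    def apply(self, func):
        """Apply ``func`` and record its name, returning a new wrapper."""
        return TrackedValue(func(self.value), self.history + [func.__name__])


def execute(rows, eager_mode=False):
    """Return raw rows, or wrapped rows when ``eager_mode`` is true."""
    if not eager_mode:
        return list(rows)
    return [TrackedValue(row) for row in rows]


rows = [{"x": 1}, {"x": 2}]
raw = execute(rows)                       # plain dicts, unchanged type
tracked = execute(rows, eager_mode=True)  # wrappers that record operations
result = tracked[0].apply(len)
print(type(raw[0]).__name__, result.value, result.history)
# dict 1 ['len']
```

Because the default path returns the original type, downstream code that expects plain records is unaffected unless the caller asks for eager mode explicitly.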

blythed · 2024-07-24T15:02:37Z

docs/content/tutorials/eager_mode.md

+When using `select.execute(eager_mode=True)`, all returned data will enter eager mode, which can be used for interactive model pipeline construction.
+
+```python
+datas = list(db["documents"].select().execute(eager_mode=True))


Interesting language thing: plural of "data" is "data", singular is "datum".

Hmm, got it. I changed all of them.

blythed · 2024-07-25T01:34:13Z

superduper/backends/mongodb/query.py

+            project.update({k: 1 for k in filter_mapping_base.keys()})
+
+        predict_ids_in_filter = [
+            key.replace("_outputs__", "") for key in filter_mapping_outptus.keys()


Why is _outputs__ used here?

Optimized the logic by using predict_id.

Additionally, since MongoMock does not support ‘.’ in the ‘as’ field for lookups, the key was converted to use the ‘_outputs__’ prefix.

After the join, the complete outputs data can be queried as _outputs__{predict_id}._outputs.{predict_id} : result.

Explanations have already been added in the code.
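
As a rough illustration of the key handling discussed above, the sketch below splits a filter into base keys and keys carrying the `_outputs__` prefix, then recovers the predict ids by stripping the prefix. The helper names and the sample filter are hypothetical; only the prefix and the stripping step come from the diff.

```python
OUTPUTS_ALIAS_PREFIX = "_outputs__"  # dot-free prefix, per the discussion above


def split_filter(filter_mapping):
    """Split a filter into (base, outputs) by the outputs prefix."""
    base = {
        key: value
        for key, value in filter_mapping.items()
        if not key.startswith(OUTPUTS_ALIAS_PREFIX)
    }
    outputs = {
        key: value
        for key, value in filter_mapping.items()
        if key.startswith(OUTPUTS_ALIAS_PREFIX)
    }
    return base, outputs


def predict_ids(outputs_filter):
    """Strip the prefix from each key to recover the predict ids."""
    return [key[len(OUTPUTS_ALIAS_PREFIX):] for key in outputs_filter]


base, outputs = split_filter({"brand": "Nike", "_outputs__abc123": "shoe"})
print(base, predict_ids(outputs))
# {'brand': 'Nike'} ['abc123']
```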

jieguangzhou force-pushed the feat/eager-mode branch 3 times, most recently from dbb1caf to b229454, July 19, 2024 15:32

jieguangzhou force-pushed the feat/eager-mode branch 4 times, most recently from a5ca324 to 00d77d3, July 22, 2024 15:29

jieguangzhou commented Jul 22, 2024

jieguangzhou requested review from kartik4949 and blythed July 22, 2024 15:57

blythed reviewed Jul 24, 2024

kartik4949 suggested changes Jul 24, 2024

jieguangzhou force-pushed the feat/eager-mode branch from 00d77d3 to 548814c, July 24, 2024 13:57

jieguangzhou requested a review from kartik4949 July 24, 2024 14:15

blythed reviewed Jul 24, 2024

blythed approved these changes Jul 24, 2024

blythed reviewed Jul 25, 2024

jieguangzhou added 3 commits July 25, 2024 10:46

Fix the bug of components with the same identifier but different type_ids during the application build_from_db.

af011fe

Support eager mode

627023f

Fixed ibis outputs

149af2c

jieguangzhou force-pushed the feat/eager-mode branch 3 times, most recently from 1981447 to eabcea8, July 25, 2024 03:53

Optimized the Eager mode.

eabcea8

- Support eager mode condition filter
- Add eager mode tutorial
- Use select.execute instead of select.datas to initiate eager mode.

kartik4949 approved these changes Jul 25, 2024

jieguangzhou merged commit 78f4ede into superduper-io:main Jul 25, 2024
3 checks passed

jieguangzhou linked an issue Jul 25, 2024 that may be closed by this pull request

[USER-EXP-LIST] [FEAT] Optimizing the building experience of data pipelines #2258

Closed

blythed mentioned this pull request Aug 8, 2024

Draft: Implement data observation properties (#2124) #2182

Closed

5 tasks
