Support eager mode #2331
dbb1caf to b229454
TODO: Need to add the documentation.
a5ca324 to 00d77d3
def replace_id(part):
    find_params = part[1]
    if not find_params:
        find_params = ({}, {})

    assert isinstance(find_params[1], dict)
    find_params = list(find_params)
    find_params[1]['_id'] = 1
    return (part[0], tuple(find_params), part[2])
Only return _id
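One reading of this comment, as a minimal self-contained sketch. It assumes a query part is a `(method_name, args, kwargs)` tuple and that the `find` arguments are `(filter, projection)`, as the snippet above suggests. `restrict_to_id` is a hypothetical name, not a function from the PR. Unlike the snippet above, it replaces the projection rather than adding `_id` to it, and it copies the filter instead of mutating it in place.

```python
def restrict_to_id(part):
    """Return a copy of a ('find', args, kwargs) part that projects only _id."""
    method, find_params, kwargs = part
    # Fall back to an empty filter when find() was called without arguments.
    filter_ = dict(find_params[0]) if find_params else {}
    # Replace any existing projection so that only _id is requested.
    return (method, (filter_, {"_id": 1}), kwargs)


# Hypothetical part, shaped like the tuples handled in the snippet above.
part = ("find", ({"label": "cat"}, {"text": 1}), {})
print(restrict_to_id(part))
```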
superduper/backends/mongodb/query.py
Outdated
def filter(self, filter):
    """Return a query that filters the documents.

    :param filter: The filter to apply.
    """

    def replace_function(part):
        find_params = part[1]
        if not find_params:
            find_params = ({},)

        assert isinstance(find_params[0], dict)

        find_params[0].update(filter)
        return (part[0], tuple(find_params), part[2])

    return self._replace_part('find', replace_function)
Add the key of the filter to the list of returned fields.
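A rough sketch of this suggestion, under the same assumption as the `filter` method above that a part is a `(method_name, args, kwargs)` tuple whose `find` arguments are `(filter, projection)`. The helper name `filter_and_project` is hypothetical, and the sketch assumes any existing projection is an inclusion projection (values of `1`), since MongoDB does not allow mixing inclusion and exclusion.

```python
def filter_and_project(part, filter_):
    """Merge filter_ into a find part and project the filtered fields too."""
    method, find_params, kwargs = part
    params = list(find_params) if find_params else [{}]
    # Build a new filter dict instead of updating the original in place.
    merged_filter = {**params[0], **filter_}
    if len(params) > 1 and params[1]:
        # An explicit projection exists: make sure the filtered keys are
        # returned as well. Operator keys such as "$or" are not field names.
        projection = dict(params[1])
        projection.update({key: 1 for key in filter_ if not key.startswith("$")})
        new_params = (merged_filter, projection)
    else:
        # No projection means every field is returned already.
        new_params = (merged_filter,)
    return (method, new_params, kwargs)


part = ("find", ({"split": "train"}, {"text": 1}), {})
print(filter_and_project(part, {"label": "cat"}))
```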
I see that you are using the attribute/method .datas() to fetch data eagerly. Why not just use .execute() directly?
CFG.eager = True
r = table.select().execute()[0]
I think eager mode could always be enabled, so the returned data is not the raw data. This approach lets users opt into eager mode deliberately. If we directly combine it with WDYT?
superduper/backends/base/query.py
Outdated
assert self.db is not None, 'No datalayer (db) provided'
query = self
if not len(query.parts):
This means the query is simply a table, right?
Yes
def _eager_call__(self, *args, **kwargs):
    from superduper.misc.eager import SuperDuperData, SuperDuperDataType, TrackData

    have_sdd, graph = SuperDuperData.detect_and_get_graph(*args, **kwargs)
Hasn't this already been called at line 910?
Yes, the graph parameter is retrieved again because I don’t want the graph parameter to be passed from upstream. This function has very low overhead.
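To illustrate why re-detecting the graph locally is cheap, here is a self-contained sketch. `Tracked` and `detect_tracked` are hypothetical stand-ins written for this example; they are not the `SuperDuperData.detect_and_get_graph` implementation, whose internals are not shown in this thread. The scan is a single pass over the call arguments.

```python
class Tracked:
    """Hypothetical stand-in for an eager-mode wrapper that carries a graph."""

    def __init__(self, value, graph):
        self.value = value
        self.graph = graph


def detect_tracked(*args, **kwargs):
    """Return (found, graph) for the first Tracked value among the arguments."""
    # One linear pass over positional and keyword arguments.
    for value in (*args, *kwargs.values()):
        if isinstance(value, Tracked):
            return True, value.graph
    return False, None


graph = {"nodes": []}
print(detect_tracked(1, x=Tracked("doc", graph)))
print(detect_tracked(1, 2))
```

Because the function depends only on the arguments it receives, a callee can recompute the result rather than accept it as an extra parameter from its caller.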
00d77d3 to 548814c
After discussing with @kartik4949, I changed to When
docs/content/tutorials/eager_mode.md
Outdated
When using `select.execute(eager_mode=True)`, all returned data will enter eager mode, which can be used for interactive model pipeline construction.

```python
datas = list(db["documents"].select().execute(eager_mode=True))
```
Interesting language thing: plural of "data" is "data", singular is "datum".
Hmm, got it. I changed all of them.
superduper/backends/mongodb/query.py
Outdated
project.update({k: 1 for k in filter_mapping_base.keys()})

predict_ids_in_filter = [
    key.replace("_outputs__", "") for key in filter_mapping_outptus.keys()
Why this _outputs__?
I optimized the logic by using predict_id.
Additionally, since MongoMock does not support ‘.’ in the ‘as’ field for lookups, the key was converted to ‘outputs_’.
After the join, the complete outputs data can be queried as _outputs__{predict_id}._outputs.{predict_id}: result.
Explanations have already been added in the code.
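The naming workaround described here could look roughly like the following sketch. It only builds plain dictionaries and strings and does not talk to a database. The helper names, the `from` collection name, and the join fields `_id` and `_source` are assumptions made for illustration; only the `_outputs__` prefix and the final path format come from this thread.

```python
PREFIX = "_outputs__"


def predict_id_from_key(key):
    """Recover the predict_id from a filter key, as in the diff above."""
    return key.replace(PREFIX, "")


def lookup_stage(predict_id, foreign_field="_source"):
    """Build a $lookup stage whose 'as' field contains no '.' characters."""
    name = f"{PREFIX}{predict_id}"
    return {
        "$lookup": {
            "from": name,  # assumed: one outputs collection per predict_id
            "localField": "_id",
            "foreignField": foreign_field,  # assumed join field
            "as": name,
        }
    }


def joined_result_path(predict_id):
    """Path at which the joined outputs can be read after the lookup."""
    return f"{PREFIX}{predict_id}._outputs.{predict_id}"


print(lookup_stage("my_model"))
print(joined_result_path("my_model"))
```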
…_ids during the application build_from_db.
1981447 to eabcea8
- Support eager mode condition filter
- Add eager mode tutorial
- Use select.execute instead of select.datas to initiate eager mode.
Description
Related Issues
Checklist
make unit_testing and make integration-testing successfully?
Additional Notes or Comments