Optimize memory usage during materialization #2073

Merged: 1 commit into feast-dev:master, Nov 19, 2021

judahrand
Member

@judahrand judahrand commented Nov 19, 2021

What this PR does / why we need it:
I have run into memory usage issues during the initial materialization of a large FeatureView with close to 10 million rows and 30 columns. Acquiring the Arrow table succeeds, but the conversion to Protobuf seems to require duplicating the whole dataset in memory, which is not practical.

This initial improvement splits the table returned by to_arrow into RecordBatches, which improves the situation. Looking forward, I think there should be the option (or requirement?) for RetrievalJobs to yield these RecordBatches directly to further reduce memory requirements.

This PR improves the situation from #2071, but I do not believe it is enough to close it.

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

None

@feast-ci-bot
Collaborator

Hi @judahrand. Thanks for your PR.

I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@judahrand
Member Author

/ok-to-test

@feast-ci-bot
Collaborator

@judahrand: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test


Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
@adchia
Collaborator

adchia commented Nov 19, 2021

/ok-to-test

Collaborator

@adchia adchia left a comment

/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, judahrand

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@judahrand
Member Author

Failures don't look related to this PR.

@adchia
Collaborator

adchia commented Nov 19, 2021

FYI @tsotnet, looks like an issue with the feature server docker images.

@codecov-commenter

Codecov Report

Merging #2073 (1d31495) into master (91b37e7) will increase coverage by 25.12%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           master    #2073       +/-   ##
===========================================
+ Coverage   58.19%   83.31%   +25.12%     
===========================================
  Files         100      100               
  Lines        8082     8084        +2     
===========================================
+ Hits         4703     6735     +2032     
+ Misses       3379     1349     -2030     
| Flag | Coverage Δ |
|---|---|
| integrationtests | 73.74% <100.00%> (?) |
| unittests | 58.20% <100.00%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| sdk/python/feast/infra/passthrough_provider.py | 100.00% <100.00%> (+11.11%) ⬆️ |
| sdk/python/feast/infra/provider.py | 91.73% <100.00%> (+4.95%) ⬆️ |
| sdk/python/feast/data_source.py | 61.29% <0.00%> (+0.40%) ⬆️ |
| sdk/python/feast/online_response.py | 87.71% <0.00%> (+1.75%) ⬆️ |
| sdk/python/feast/infra/online_stores/sqlite.py | 96.77% <0.00%> (+2.15%) ⬆️ |
| sdk/python/feast/feature.py | 72.72% <0.00%> (+3.03%) ⬆️ |
| sdk/python/feast/errors.py | 70.00% <0.00%> (+5.00%) ⬆️ |
| sdk/python/feast/infra/offline_stores/file.py | 96.80% <0.00%> (+8.00%) ⬆️ |
| sdk/python/feast/registry.py | 80.41% <0.00%> (+10.13%) ⬆️ |
... and 51 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 91b37e7...1d31495. Read the comment docs.

@feast-ci-bot feast-ci-bot merged commit 380cc62 into feast-dev:master Nov 19, 2021