[Benchmark] Remove ingest results collection #272
Conversation
Removes collecting ingest results from the benchmarking tool. On big data sets, this will prevent the process from going out of memory. Signed-off-by: John Mazanec <jmazane@amazon.com>
Codecov Report
@@            Coverage Diff            @@
##               main     #272   +/-   ##
=========================================
  Coverage     83.38%   83.38%
  Complexity      884      884
=========================================
  Files           127      127
  Lines          3833     3833
  Branches        361      361
=========================================
  Hits           3196     3196
  Misses          475      475
  Partials        162      162
=========================================

Continue to review full report at Codecov.
Nice finding. Do you know why those responses were collected in the first place? Was it an unimplemented idea, or leftover code from past changes?
  for i in range(0, self.doc_count, self.bulk_size):
      partition = self.dataset.read(self.bulk_size)
      if partition is None:
          break
      body = bulk_transform(partition, self.field_name, action, i)
-     result = bulk_index(self.opensearch, self.index_name, body)
-     index_responses.append(result)
+     bulk_index(self.opensearch, self.index_name, body)
Do you think it makes sense to document our decision with a code comment?
Yes, will add
thank you!
@martin-gaievski Initially, we collected the responses and parsed the took time per request after the loop. If we wanted to do this again, we would need to do the parsing inside the loop.
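If that per-request timing were ever needed again, a minimal sketch of the in-loop approach might look like this. This is a hypothetical illustration, not the tool's actual code; it only assumes that each bulk response is a dict with the standard `took` field (milliseconds) that OpenSearch bulk responses report.

```python
# Hypothetical sketch: keep only the scalar `took` value per bulk request
# instead of retaining each full response object in a list.
def total_took_ms(bulk_responses):
    """Sum the reported `took` time (ms) across bulk responses, one at a time."""
    total = 0
    for response in bulk_responses:
        # Extract the number we need; the full response dict becomes
        # garbage-collectable as soon as this iteration ends.
        total += response.get("took", 0)
    return total

# Example with mocked bulk responses:
mocked = ({"took": t, "errors": False, "items": []} for t in (12, 8, 15))
print(total_took_ms(mocked))  # 35
```

Because only an integer accumulates across iterations, memory stays flat no matter how many bulk requests the benchmark issues.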
Signed-off-by: John Mazanec <jmazane@amazon.com>
LGTM
Removes collecting ingest results from the benchmarking tool. On big data sets, this will prevent the process from going out of memory. Signed-off-by: John Mazanec <jmazane@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: John Mazanec jmazane@amazon.com
Description
When running a benchmark with one of the BIGANN data sets, the tool was killed due to an OOM during the ingest step:
Looking into the ingest step, I realized the problem was most likely that we were collecting the responses to the index requests. Because we don't use these responses (we only track total time), we can simply get rid of this collection.
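The failure mode can be reproduced in miniature. The sketch below is a stand-in, not the benchmarking tool's real code: `fake_bulk_index` imitates a bulk-index call whose response holds one entry per indexed document. Appending every response keeps all of them reachable for the whole ingest, so memory grows linearly with the number of batches, while discarding the return value keeps memory flat.

```python
# Toy illustration of the OOM pattern; `fake_bulk_index` is a hypothetical
# stand-in whose response grows with the batch size, like a real bulk response.
def fake_bulk_index(batch_size):
    return {"took": 5, "items": [{"index": {"status": 201}}] * batch_size}

# Before the fix: every response is retained, so memory grows with batch count.
index_responses = []
for _ in range(1000):
    index_responses.append(fake_bulk_index(500))

# After the fix: each response is dropped at the end of its iteration;
# only a counter (or a running total time) survives the loop.
batches_ingested = 0
for _ in range(1000):
    fake_bulk_index(500)
    batches_ingested += 1

print(len(index_responses), batches_ingested)  # 1000 1000
```

With the real BIGANN-scale ingest, the retained list is what eventually exhausts memory; the loop itself is otherwise identical in both versions.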
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.