[Benchmark] Remove ingest results collection #272

jmazanec15 · 2022-01-28T17:06:46Z

Signed-off-by: John Mazanec jmazane@amazon.com

Description

When running a benchmark with one of the BIGANN data sets, the tool was killed due to OOM during ingest step:

dmesg -T| grep -E -i -B100 'killed process'
...
[Fri Jan 28 02:00:36 2022] Out of memory: Killed process 8825 (python3) total-vm:72456280kB, anon-rss:70883120kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:139388kB oom_score_adj:0

Looking into the ingest step, I realized the problem was most likely that we were collecting the responses to the index requests. Because we dont use these responses (we just track total time), we can just get rid of this collection.

Check List

Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Removes collecting ingest results from the benchmarking tool. On big data sets, this will prevent the process from going out of memory. Signed-off-by: John Mazanec <jmazane@amazon.com>

codecov-commenter · 2022-01-28T17:23:16Z

Codecov Report

Merging #272 (788f09b) into main (7e9d4c5) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main     #272   +/-   ##
=========================================
  Coverage     83.38%   83.38%           
  Complexity      884      884           
=========================================
  Files           127      127           
  Lines          3833     3833           
  Branches        361      361           
=========================================
  Hits           3196     3196           
  Misses          475      475           
  Partials        162      162

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e9d4c5...788f09b. Read the comment docs.

martin-gaievski · 2022-01-28T17:32:07Z

Nice finding. Do you know why those responses were introduced in a first place - was it some un-implemented idea or it's a leftover code after some changes in a past?

martin-gaievski · 2022-01-28T17:34:38Z

benchmarks/perf-tool/okpt/test/steps/steps.py

        for i in range(0, self.doc_count, self.bulk_size):
            partition = self.dataset.read(self.bulk_size)
            if partition is None:
                break
            body = bulk_transform(partition, self.field_name, action, i)
-            result = bulk_index(self.opensearch, self.index_name, body)
-            index_responses.append(result)
+            bulk_index(self.opensearch, self.index_name, body)


Do you think it makes sense to document our decision with a code comment?

Yes will add

jmazanec15 · 2022-01-28T17:37:58Z

@martin-gaievski Initially, we collected the responses and parse the took time per request after the loop. If we wanted to do this again, we would need to do the parsing in the loop

Signed-off-by: John Mazanec <jmazane@amazon.com>

martin-gaievski

Lgtm

Removes collecting ingest results from the benchmarking tool. On big data sets, this will prevent the process from going out of memory. Signed-off-by: John Mazanec <jmazane@amazon.com>

Removes collecting ingest results from the benchmarking tool. On big data sets, this will prevent the process from going out of memory. Signed-off-by: John Mazanec <jmazane@amazon.com> Signed-off-by: Martin Gaievski <gaievski@amazon.com>

Removes collecting ingest results from the benchmarking tool. On big data sets, this will prevent the process from going out of memory. Signed-off-by: John Mazanec <jmazane@amazon.com>

[Benchmark] Remove ingest results collection

b83b02d

Removes collecting ingest results from the benchmarking tool. On big data sets, this will prevent the process from going out of memory. Signed-off-by: John Mazanec <jmazane@amazon.com>

jmazanec15 requested review from a team, martin-gaievski and VijayanB January 28, 2022 17:06

jmazanec15 added the Infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. label Jan 28, 2022

martin-gaievski reviewed Jan 28, 2022

View reviewed changes

Add descriptive comment

788f09b

Signed-off-by: John Mazanec <jmazane@amazon.com>

jmazanec15 requested a review from martin-gaievski January 28, 2022 17:48

martin-gaievski approved these changes Jan 28, 2022

View reviewed changes

VijayanB approved these changes Jan 28, 2022

View reviewed changes

jmazanec15 merged commit c46b3de into opensearch-project:main Jan 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Benchmark] Remove ingest results collection #272

[Benchmark] Remove ingest results collection #272

jmazanec15 commented Jan 28, 2022

codecov-commenter commented Jan 28, 2022 •

edited

Loading

martin-gaievski commented Jan 28, 2022

martin-gaievski Jan 28, 2022

jmazanec15 Jan 28, 2022

martin-gaievski Jan 28, 2022

jmazanec15 commented Jan 28, 2022

martin-gaievski left a comment

[Benchmark] Remove ingest results collection #272

[Benchmark] Remove ingest results collection #272

Conversation

jmazanec15 commented Jan 28, 2022

Description

Check List

codecov-commenter commented Jan 28, 2022 • edited Loading

Codecov Report

martin-gaievski commented Jan 28, 2022

martin-gaievski Jan 28, 2022

Choose a reason for hiding this comment

jmazanec15 Jan 28, 2022

Choose a reason for hiding this comment

martin-gaievski Jan 28, 2022

Choose a reason for hiding this comment

jmazanec15 commented Jan 28, 2022

martin-gaievski left a comment

Choose a reason for hiding this comment

codecov-commenter commented Jan 28, 2022 •

edited

Loading