
Rally benchmark aws.billing #8403

Merged 7 commits into elastic:main on Nov 23, 2023
Conversation

@aspacca (Contributor) commented Nov 6, 2023

Enhancement

Proposed commit message

Add artifacts for elastic-package rally benchmark

Checklist

  • [ ] I have reviewed tips for building integrations and this pull request is aligned with them.
  • [ ] I have verified that all data streams collect metrics or logs.
  • [ ] I have added an entry to my package's changelog.yml file.
  • [ ] I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • [ ]

How to test this PR locally

Check out the branch from elastic/elastic-package#1522, build elastic-package, and run the following from the aws package root (remember to bring up the elastic-package stack first):
./elastic-package benchmark rally --benchmark billing-benchmark -v

Related issues

Screenshots

--- Benchmark results for package: aws - START ---
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ info                                                                                             │
├────────────────────────┬─────────────────────────────────────────────────────────────────────────┤
│ benchmark              │                                                       metrics-benchmark │
│ description            │                                         Benchmark 20000 events ingested │
│ run ID                 │                                    dd19ebda-e59b-4927-9c75-6c76b191f248 │
│ package                │                                                                     aws │
│ start ts (s)           │                                                              1699247322 │
│ end ts (s)             │                                                              1699247354 │
│ duration               │                                                                     32s │
│ generated corpora file │ /Users/andreaspacca/.elastic-package/tmp/rally_corpus/corpus-1153363049 │
╰────────────────────────┴─────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────╮
│ parameters                                                            │
├─────────────────────────────────┬─────────────────────────────────────┤
│ package version                 │                               2.8.5 │
│ data_stream.name                │                             billing │
│ corpora.generator.total_events  │                               20000 │
│ corpora.generator.template.path │ ./metrics-benchmark/template.ndjson │
│ corpora.generator.template.raw  │                                     │
│ corpora.generator.template.type │                              gotext │
│ corpora.generator.config.path   │      ./metrics-benchmark/config.yml │
│ corpora.generator.config.raw    │                               map[] │
│ corpora.generator.fields.path   │      ./metrics-benchmark/fields.yml │
│ corpora.generator.fields.raw    │                               map[] │
╰─────────────────────────────────┴─────────────────────────────────────╯
╭───────────────────────╮
│ cluster info          │
├───────┬───────────────┤
│ name  │ elasticsearch │
│ nodes │             1 │
╰───────┴───────────────╯
╭─────────────────────────────────────────────────────╮
│ data stream stats                                   │
├────────────────────────────┬────────────────────────┤
│ data stream                │ metrics-aws.billing-ep │
│ approx total docs ingested │                  20000 │
│ backing indices            │                      1 │
│ store size bytes           │                4226827 │
│ maximum ts (ms)            │          1701767195992 │
╰────────────────────────────┴────────────────────────╯
╭───────────────────────────────────────╮
│ disk usage for index .ds-metrics-aws. │
│ billing-ep-2023.11.06-000001 (for all │
│ fields)                               │
├──────────────────────────────┬────────┤
│ total                        │ 3.5 MB │
│ inverted_index.total         │ 703 kB │
│ inverted_index.stored_fields │ 1.4 MB │
│ inverted_index.doc_values    │ 1.1 MB │
│ inverted_index.points        │ 348 kB │
│ inverted_index.norms         │    0 B │
│ inverted_index.term_vectors  │    0 B │
│ inverted_index.knn_vectors   │    0 B │
╰──────────────────────────────┴────────╯
╭───────────────────────────────────────────────────────────────────────────────╮
│ pipeline metrics-aws.billing-2.8.5 stats in node 7AYCd2EXQaCSOf-0fKxFBg       │
├───────────────────────────────────────┬───────────────────────────────────────┤
│ Totals                                │ Count: 20000 | Failed: 0 | Time: 27ms │
│ fingerprint ()                        │ Count: 20000 | Failed: 0 | Time: 13ms │
│ pipeline (metrics-aws.billing@custom) │  Count: 20000 | Failed: 0 | Time: 2ms │
╰───────────────────────────────────────┴───────────────────────────────────────╯
╭────────────────────────────────────────────────────────────────────────────────────────────╮
│ rally stats                                                                                │
├────────────────────────────────────────────────────────────────┬───────────────────────────┤
│ Cumulative indexing time of primary shards                     │    13.694183333333333 min │
│ Min cumulative indexing time across primary shards             │                     0 min │
│ Median cumulative indexing time across primary shards          │                0.2554 min │
│ Max cumulative indexing time across primary shards             │     2.162183333333333 min │
│ Cumulative indexing throttle time of primary shards            │                     0 min │
│ Min cumulative indexing throttle time across primary shards    │                     0 min │
│ Median cumulative indexing throttle time across primary shards │                     0 min │
│ Max cumulative indexing throttle time across primary shards    │                     0 min │
│ Cumulative merge time of primary shards                        │     3.796183333333333 min │
│ Cumulative merge count of primary shards                       │                      3085 │
│ Min cumulative merge time across primary shards                │                     0 min │
│ Median cumulative merge time across primary shards             │   0.05238333333333333 min │
│ Max cumulative merge time across primary shards                │                0.8367 min │
│ Cumulative merge throttle time of primary shards               │               0.34765 min │
│ Min cumulative merge throttle time across primary shards       │                     0 min │
│ Median cumulative merge throttle time across primary shards    │                     0 min │
│ Max cumulative merge throttle time across primary shards       │               0.34765 min │
│ Cumulative refresh time of primary shards                      │    1.6356166666666667 min │
│ Cumulative refresh count of primary shards                     │                     96550 │
│ Min cumulative refresh time across primary shards              │                     0 min │
│ Median cumulative refresh time across primary shards           │   0.07865000000000001 min │
│ Max cumulative refresh time across primary shards              │   0.26766666666666666 min │
│ Cumulative flush time of primary shards                        │     64.62176666666667 min │
│ Cumulative flush count of primary shards                       │                     95628 │
│ Min cumulative flush time across primary shards                │ 6.666666666666667e-05 min │
│ Median cumulative flush time across primary shards             │               3.42245 min │
│ Max cumulative flush time across primary shards                │     5.019633333333333 min │
│ Total Young Gen GC time                                        │                   0.027 s │
│ Total Young Gen GC count                                       │                         3 │
│ Total Old Gen GC time                                          │                       0 s │
│ Total Old Gen GC count                                         │                         0 │
│ Store size                                                     │    0.30456075351685286 GB │
│ Translog size                                                  │ 0.00012316182255744934 GB │
│ Heap used for segments                                         │                      0 MB │
│ Heap used for doc values                                       │                      0 MB │
│ Heap used for terms                                            │                      0 MB │
│ Heap used for norms                                            │                      0 MB │
│ Heap used for points                                           │                      0 MB │
│ Heap used for stored fields                                    │                      0 MB │
│ Segment count                                                  │                       457 │
│ Total Ingest Pipeline count                                    │                     20031 │
│ Total Ingest Pipeline time                                     │                    0.32 s │
│ Total Ingest Pipeline failed                                   │                         0 │
│ Min Throughput                                                 │           50861.41 docs/s │
│ Mean Throughput                                                │           50861.41 docs/s │
│ Median Throughput                                              │           50861.41 docs/s │
│ Max Throughput                                                 │           50861.41 docs/s │
│ 50th percentile latency                                        │      355.9317709999998 ms │
│ 100th percentile latency                                       │     411.11254200000144 ms │
│ 50th percentile service time                                   │      355.9317709999998 ms │
│ 100th percentile service time                                  │     411.11254200000144 ms │
│ error rate                                                     │                    0.00 % │
╰────────────────────────────────────────────────────────────────┴───────────────────────────╯

--- Benchmark results for package: aws - END   ---
Done

@aspacca aspacca requested a review from a team as a code owner November 6, 2023 05:33
@elasticmachine commented Nov 6, 2023

💚 Build Succeeded



Build stats

  • Start Time: 2023-11-15T04:04:11.234+0000

  • Duration: 83 min 27 sec

Test stats 🧪

Test Results
Failed 0
Passed 223
Skipped 3
Total 226

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@aspacca (Contributor, Author) commented Nov 8, 2023

/test

@aspacca aspacca requested a review from a team as a code owner November 8, 2023 01:01
@aspacca aspacca self-assigned this Nov 8, 2023
@elasticmachine commented Nov 8, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (17/17) 💚
Files 94.444% (17/18) 👎 -5.556
Classes 94.444% (17/18) 👎 -5.556
Methods 89.701% (270/301) 👎 -2.399
Lines 86.083% (7571/8795) 👎 -5.468
Conditionals 100.0% (0/0) 💚

@@ -0,0 +1,51 @@
- name: timestamp
Contributor:

@aspacca Could we also read this directly from the dataset itself? How exactly is this used in addition to the dataset template fields?

Happy to keep it for now if it is required.

Contributor Author:

We cannot use the dataset template fields, because they are schema-c: the fields might be different from what we have in schema-b, and some complex integrations might need extra fields to apply logic to the shape of the events (see https://github.com/elastic/elastic-integration-corpus-generator-tool/blob/main/assets/templates/aws.ec2_metrics/schema-b/gotext.tpl for an example).
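As a concrete illustration, the benchmark ships its own field definitions; the excerpt below is assembled from the entries visible in this PR's diff (the full list is longer, and attribute placement is assumed from those visible lines):

```yaml
# Excerpt assembled from the diff lines shown in this review. These entries
# tell the corpus generator which fields to emit for schema-b events,
# independently of the dataset's own fields.yml.
- name: timestamp
- name: metricset.period
  value: 86400
- name: aws.billing.group_definition.key
```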

Contributor:

Thanks for the reminder, makes sense. Two follow-up questions:

  • Do we need it for all fields, or could we, for example, assume "keyword" by default?
  • Could we add it directly to config.yml? I remember there was a reason it is separate, but I don't remember why.

Contributor Author:

  • Do we need it for all fields or could we for example assume "keyword" by default?

We need an entry in fields only if we want to generate a value for it (meaning there's no need for a dataset field when we can just write {"dataset":"aws.billing"} in the template). We could default the type to "keyword" when no type is defined, but I'd rather keep it explicit.

We don't need an entry in config for every entry in fields unless we want to configure special generation behavior (range, period, etc.).

  • Could we add it directly to config.yml? I remember there was a reason it is separate, but I don't remember why.

The historical reason is that there was no fields.yml in the beginning: the generator started with schema-c, and we already had the fields in the package, so only the config was required. We kept them separate once we moved to the other schemas.
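To make the split described above concrete, here is a minimal sketch of the two files. The field name and the exact keys are illustrative assumptions; consult the corpus generator tool's documentation for the precise schema:

```yaml
# fields.yml — declares each field the generator must emit, with an explicit type
- name: aws.billing.EstimatedCharges   # hypothetical field name
  type: double

# config.yml — only fields that need special generation behavior appear here
fields:
  - name: aws.billing.EstimatedCharges
    range:
      min: 0
      max: 1000
```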

Contributor:

The historical reason is that there was no fields.yml in the beginning: the generator started with schema-c, and we already had the fields in the package, so only the config was required. We kept them separate once we moved to the other schemas.

I'm tempted to recommend having only one file, as currently I'm jumping back and forth between config.yml and fields.yml, but it's not a top priority at the moment. If I remember correctly, we do a merge of the two files into one anyway? If yes, would having it in config.yml directly already work, thinking of the migration path?

Contributor:

I think having a required type field in config.yml would be better than two files.

Contributor Author:

If I remember correctly, we do a merge of the two files into one anyway?

Indeed no: we unmarshal fields.yml into a Field struct and config.yml into a ConfigField struct. At least one definition is present in both: value (in case you want to use a fixed value for a field, imagine the dataset).

It's not a problem to merge the two structs into a single one and remove the repetition of the definitions, but this will work only for non-schema-c data. If we still want to support schema-c data (I know that horde uses it), we end up with two different types of configuration: separate fields.yml and config.yml for schema-c, and a single yml file for the rest.
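Purely as an illustration of the single-file idea discussed in this thread (not something this PR implements), a merged layout could inline the generation config next to each field declaration. Field names and keys are hypothetical:

```yaml
# Hypothetical merged file: declaration and generation config together,
# removing the duplication between fields.yml and config.yml.
- name: aws.billing.EstimatedCharges   # hypothetical field name
  type: double
  range:
    min: 0
    max: 1000
- name: metricset.period
  value: 86400
```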

Contributor:

For schema-c, why do we need the fields.yml? Isn't this already defined by the dataset fields.yml?

Lots of connected things here. Let's get the tracks in and then iterate.

Contributor Author:

For schema-c, why do we need the fields.yml? Isn't this already defined by the dataset fields.yml?

We don't create a fields.yml for schema-c, since we use the one defined by the dataset, but it is still separate from config.yml: that's what I meant.

Contributor:

What would help here is a quick write-up in the docs on why we have each file. The files can be used in different scenarios, and anyone looking only at generation for schema-b will miss that there is a bigger picture. Having this write-up would also simplify any future refactoring and bring everyone up to speed.

- name: metricset.period
value: 86400
- name: aws.billing.group_definition.key
# NOTE: repeated values are needed to produce 10% cases with "" value
Contributor:

A weights option would really help here; we have lots of these kinds of enums.
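A sketch of the workaround the NOTE above describes (the enum key and the values are assumptions; the point is the ratio): listing a value once among ten entries makes it appear in roughly 10% of generated events.

```yaml
# Hypothetical: with no weights option, frequency is controlled by repetition.
# One "" among ten entries produces roughly 10% empty values.
- name: aws.billing.group_definition.key
  enum: ["AZ", "AZ", "AZ", "AZ", "AZ", "AZ", "AZ", "AZ", "AZ", ""]
```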

@aspacca (Contributor, Author) commented Nov 16, 2023

@tommyers-elastic all good here? :)

@aspacca (Contributor, Author) commented Nov 22, 2023

@elastic/ecosystem I'd need your CR

@jsoriano jsoriano requested a review from a team November 23, 2023 13:09
@aspacca aspacca merged commit becb92c into elastic:main Nov 23, 2023
6 participants