[META] Migrate Anomaly Detector plugin to work as an Extension #24

Closed
owaiskazi19 opened this issue Jun 20, 2022 · 7 comments
Labels: enhancement (New feature or request)

@owaiskazi19
Member

owaiskazi19 commented Jun 20, 2022

Is your feature request related to a problem?

To make sure AD works as an extension, all the required extension points for creating a detector should be covered.

  1. Build the AD and JS plugins with 3.0.0-SNAPSHOT and publish them to Maven Local so they can be consumed later by the SDK. - [Feature/extensions] Build AD with OpenSearch 3.0 anomaly-detection#645
  2. Remove the JS dependency from the AD plugin. - [FEATURE] Remove Job Scheduler code from AD Extension #113
  3. Remove the Common Utils dependency from the AD plugin. - [FEATURE] Remove Common Utils code from AD Extension #117
  4. Integrate the locally published SDK with AD's build.gradle. - [FEATURE] Integrate the local published SDK with AD #133
  5. Create a main class that uses the code imported from the SDK. - [FEATURE] Integrate the local published SDK with AD #133
  6. Replace opensearchplugin for Extensions. - [FEATURE] Replacement of opensearchplugin for Extensions #137

@saratvemulapalli
Member

saratvemulapalli commented Aug 8, 2022

@owaiskazi19 I've assigned this to you since you've already done some work on it and wanted to pick it up.
In the meantime, I'll pick up #80.

@owaiskazi19
Member Author

owaiskazi19 commented Aug 15, 2022

For the issue of integrating the SDK with the AD plugin, there are 2 approaches we can consider:

  1. Publish artifacts to Maven and use the SDK as a library. This lets us import just the code we need from the SDK rather than carrying the whole SDK package in the AD extension. Issue for this: [FEATURE] Publish Artifacts to Maven Local #98
  2. Integrate the SDK repo with the AD extension's feature/extensions branch.

The 1st approach lets us integrate the SDK in a more general way.

@dbwiddis
Member

There are 2 approaches we can look for

  1. Publish artifacts to Maven and use SDK as a library. This will help us to just import the code from SDK rather than having the whole SDK package in AD extension.

Strongly in favor of this. SDK will be consumed by multiple extensions so having it in a library will help us keep the "common" SDK code separated from extension-specific applications and force us to keep the concepts separate.

I think we're already moving in this direction with "Maven Local", but as the API becomes more stable and we are ready to collect more community input, we will want to publish SNAPSHOTs.

  2. Integrate SDK repo with AD extension's feature/extensions branch

Not a good primary plan. Everything will eventually need to move to a separate library so there is no reason to intentionally put things in one location.

Granted, I expect that some of what we write for AD will later turn out to be duplicated in another extension, and we might find a way to combine that code into an abstract SDK class both extensions can inherit from, but this should be the exception rather than the rule.

@owaiskazi19
Member Author

Strongly in favor of this. SDK will be consumed by multiple extensions so having it in a library will help us keep the "common" SDK code separated from extension-specific applications and force us to keep the concepts separate.

Thanks @dbwiddis for your input on this. We are moving forward with the 1st approach.

@owaiskazi19
Member Author

owaiskazi19 commented Aug 26, 2022

Upon further exploration, the AD plugin and Job Scheduler are building with OpenSearch 2.2.0 while the SDK is using 3.0.0-SNAPSHOT. This results in dependency conflicts.
The next steps are:

  1. Build the AD and JS plugins with 3.0.0-SNAPSHOT and publish them to Maven Local so they can be consumed later by the SDK. - [Feature/extensions] Build AD with OpenSearch 3.0 anomaly-detection#645
  2. Remove the JS dependency from the AD plugin. - [FEATURE] Remove Job Scheduler code from AD Extension #113
  3. Remove the Common Utils dependency from the AD plugin. - [FEATURE] Remove Common Utils code from AD Extension #117
  4. Integrate the locally published SDK with AD's build.gradle. - [FEATURE] Integrate the local published SDK with AD #133
  5. Create a main class that uses the code imported from the SDK. - [FEATURE] Integrate the local published SDK with AD #133
  6. Replace opensearchplugin for Extensions. - [FEATURE] Replacement of opensearchplugin for Extensions #137
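
To illustrate the version mismatch described in this comment, here is a minimal sketch that groups components by the OpenSearch version they build against and flags a conflict when more than one version is present. The component names are only labels chosen for this example, and the helper functions are hypothetical; this is not how Gradle resolves dependencies.

```python
from collections import defaultdict
from typing import Dict, List

# Versions as reported in the comment above. The keys are illustrative
# labels, not real artifact coordinates.
COMPONENT_VERSIONS: Dict[str, str] = {
    "anomaly-detection": "2.2.0",
    "job-scheduler": "2.2.0",
    "opensearch-sdk": "3.0.0-SNAPSHOT",
}


def group_by_version(components: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each OpenSearch version to the components that build against it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, version in components.items():
        groups[version].append(name)
    return dict(groups)


def has_conflict(components: Dict[str, str]) -> bool:
    """Return True when the components do not all target the same version."""
    return len(set(components.values())) > 1


if __name__ == "__main__":
    for version, names in group_by_version(COMPONENT_VERSIONS).items():
        print(f"{version}: {', '.join(names)}")
    print("conflict:", has_conflict(COMPONENT_VERSIONS))
```

Once the AD and JS plugins are built against 3.0.0-SNAPSHOT (step 1), every entry would share one version and the check would report no conflict.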

@vibrantvarun
Member

vibrantvarun commented Nov 9, 2022

Extension Architecture with OpenSearch Benchmark

[INFO] Downloading workload data (30.6 kB total size) [100.0%]
[INFO] Decompressing workload data from [/home/ec2-user/.benchmark/benchmarks/data/nyc_taxis/documents-1k.json.bz2] to [/home/ec2-user/.benchmark/benchmarks/data/nyc_taxis/documents-1k.json] ... [OK]
[INFO] Preparing file offset table for [/home/ec2-user/.benchmark/benchmarks/data/nyc_taxis/documents-1k.json] ... [OK]
[INFO] Executing test with workload [nyc_taxis], test_procedure [append-no-conflicts] and provision_config_instance ['external'] with version [3.0.0-SNAPSHOT].

Running delete-index [100% done]
Running create-index [100% done]
Running check-cluster-health [100% done]
Running index [100% done]
Running refresh-after-index [100% done]
Running force-merge [100% done]
Running refresh-after-force-merge [100% done]
Running wait-until-merges-finish [100% done]
Running default [100% done]
Running range [100% done]
Running distance_amount_agg [100% done]
Running autohisto_agg [100% done]
Running date_histogram_agg [100% done]


| Metric | Task | Value | Unit |
|---|---|---:|---|
| Cumulative indexing time of primary shards | | 0.0520667 | min |
| Min cumulative indexing time across primary shards | | 0.0520667 | min |
| Median cumulative indexing time across primary shards | | 0.0520667 | min |
| Max cumulative indexing time across primary shards | | 0.0520667 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 0 | min |
| Cumulative merge count of primary shards | | 0 | |
| Min cumulative merge time across primary shards | | 0 | min |
| Median cumulative merge time across primary shards | | 0 | min |
| Max cumulative merge time across primary shards | | 0 | min |
| Cumulative merge throttle time of primary shards | | 0 | min |
| Min cumulative merge throttle time across primary shards | | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0 | min |
| Max cumulative merge throttle time across primary shards | | 0 | min |
| Cumulative refresh time of primary shards | | 0.00358333 | min |
| Cumulative refresh count of primary shards | | 5 | |
| Min cumulative refresh time across primary shards | | 0.00358333 | min |
| Median cumulative refresh time across primary shards | | 0.00358333 | min |
| Max cumulative refresh time across primary shards | | 0.00358333 | min |
| Cumulative flush time of primary shards | | 0 | min |
| Cumulative flush count of primary shards | | 0 | |
| Min cumulative flush time across primary shards | | 0 | min |
| Median cumulative flush time across primary shards | | 0 | min |
| Max cumulative flush time across primary shards | | 0 | min |
| Total Young Gen GC time | | 0 | s |
| Total Young Gen GC count | | 0 | |
| Total Old Gen GC time | | 0 | s |
| Total Old Gen GC count | | 0 | |
| Store size | | 0.000246089 | GB |
| Translog size | | 5.12227e-08 | GB |
| Heap used for segments | | 0 | MB |
| Heap used for doc values | | 0 | MB |
| Heap used for terms | | 0 | MB |
| Heap used for norms | | 0 | MB |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 8 | |
| Min Throughput | index | 1891.56 | docs/s |
| Mean Throughput | index | 1891.56 | docs/s |
| Median Throughput | index | 1891.56 | docs/s |
| Max Throughput | index | 1891.56 | docs/s |
| 50th percentile latency | index | 511.239 | ms |
| 100th percentile latency | index | 551.657 | ms |
| 50th percentile service time | index | 511.239 | ms |
| 100th percentile service time | index | 551.657 | ms |
| error rate | index | 0 | % |
| Min Throughput | wait-until-merges-finish | 52.59 | ops/s |
| Mean Throughput | wait-until-merges-finish | 52.59 | ops/s |
| Median Throughput | wait-until-merges-finish | 52.59 | ops/s |
| Max Throughput | wait-until-merges-finish | 52.59 | ops/s |
| 100th percentile latency | wait-until-merges-finish | 18.3101 | ms |
| 100th percentile service time | wait-until-merges-finish | 18.3101 | ms |
| error rate | wait-until-merges-finish | 0 | % |
| Min Throughput | default | 11.16 | ops/s |
| Mean Throughput | default | 11.16 | ops/s |
| Median Throughput | default | 11.16 | ops/s |
| Max Throughput | default | 11.16 | ops/s |
| 100th percentile latency | default | 98.2469 | ms |
| 100th percentile service time | default | 7.98621 | ms |
| error rate | default | 0 | % |
| Min Throughput | range | 34.17 | ops/s |
| Mean Throughput | range | 34.17 | ops/s |
| Median Throughput | range | 34.17 | ops/s |
| Max Throughput | range | 34.17 | ops/s |
| 100th percentile latency | range | 37.2717 | ms |
| 100th percentile service time | range | 7.57088 | ms |
| error rate | range | 0 | % |
| Min Throughput | distance_amount_agg | 19.06 | ops/s |
| Mean Throughput | distance_amount_agg | 19.06 | ops/s |
| Median Throughput | distance_amount_agg | 19.06 | ops/s |
| Max Throughput | distance_amount_agg | 19.06 | ops/s |
| 100th percentile latency | distance_amount_agg | 59.6809 | ms |
| 100th percentile service time | distance_amount_agg | 6.75511 | ms |
| error rate | distance_amount_agg | 0 | % |
| Min Throughput | autohisto_agg | 24.74 | ops/s |
| Mean Throughput | autohisto_agg | 24.74 | ops/s |
| Median Throughput | autohisto_agg | 24.74 | ops/s |
| Max Throughput | autohisto_agg | 24.74 | ops/s |
| 100th percentile latency | autohisto_agg | 51.2514 | ms |
| 100th percentile service time | autohisto_agg | 10.3925 | ms |
| error rate | autohisto_agg | 0 | % |
| Min Throughput | date_histogram_agg | 48.84 | ops/s |
| Mean Throughput | date_histogram_agg | 48.84 | ops/s |
| Median Throughput | date_histogram_agg | 48.84 | ops/s |
| Max Throughput | date_histogram_agg | 48.84 | ops/s |
| 100th percentile latency | date_histogram_agg | 27.8313 | ms |
| 100th percentile service time | date_histogram_agg | 6.94377 | ms |
| error rate | date_histogram_agg | 0 | % |

[INFO] SUCCESS (took 14 seconds)
--------------------------------

@vibrantvarun
Member

vibrantvarun commented Nov 10, 2022

Plugin Architecture with OpenSearch Benchmark

[INFO] Executing test with workload [nyc_taxis], test_procedure [append-no-conflicts] and provision_config_instance ['external'] with version [3.0.0-SNAPSHOT].

Running delete-index [100% done]
Running create-index [100% done]
Running check-cluster-health [100% done]
Running index [100% done]
Running refresh-after-index [100% done]
Running force-merge [100% done]
Running refresh-after-force-merge [100% done]
Running wait-until-merges-finish [100% done]
Running default [100% done]
Running range [100% done]
Running distance_amount_agg [100% done]
Running autohisto_agg [100% done]
Running date_histogram_agg [100% done]


| Metric | Task | Value | Unit |
|---|---|---:|---|
| Cumulative indexing time of primary shards | | 0.0499333 | min |
| Min cumulative indexing time across primary shards | | 0.0499333 | min |
| Median cumulative indexing time across primary shards | | 0.0499333 | min |
| Max cumulative indexing time across primary shards | | 0.0499333 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 0 | min |
| Cumulative merge count of primary shards | | 0 | |
| Min cumulative merge time across primary shards | | 0 | min |
| Median cumulative merge time across primary shards | | 0 | min |
| Max cumulative merge time across primary shards | | 0 | min |
| Cumulative merge throttle time of primary shards | | 0 | min |
| Min cumulative merge throttle time across primary shards | | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0 | min |
| Max cumulative merge throttle time across primary shards | | 0 | min |
| Cumulative refresh time of primary shards | | 0.00358333 | min |
| Cumulative refresh count of primary shards | | 5 | |
| Min cumulative refresh time across primary shards | | 0.00358333 | min |
| Median cumulative refresh time across primary shards | | 0.00358333 | min |
| Max cumulative refresh time across primary shards | | 0.00358333 | min |
| Cumulative flush time of primary shards | | 0 | min |
| Cumulative flush count of primary shards | | 0 | |
| Min cumulative flush time across primary shards | | 0 | min |
| Median cumulative flush time across primary shards | | 0 | min |
| Max cumulative flush time across primary shards | | 0 | min |
| Total Young Gen GC time | | 0 | s |
| Total Young Gen GC count | | 0 | |
| Total Old Gen GC time | | 0 | s |
| Total Old Gen GC count | | 0 | |
| Store size | | 0.000229322 | GB |
| Translog size | | 5.12227e-08 | GB |
| Heap used for segments | | 0 | MB |
| Heap used for doc values | | 0 | MB |
| Heap used for terms | | 0 | MB |
| Heap used for norms | | 0 | MB |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 7 | |
| Min Throughput | index | 1788.38 | docs/s |
| Mean Throughput | index | 1788.38 | docs/s |
| Median Throughput | index | 1788.38 | docs/s |
| Max Throughput | index | 1788.38 | docs/s |
| 50th percentile latency | index | 510.658 | ms |
| 100th percentile latency | index | 537.261 | ms |
| 50th percentile service time | index | 510.658 | ms |
| 100th percentile service time | index | 537.261 | ms |
| error rate | index | 0 | % |
| Min Throughput | wait-until-merges-finish | 62.35 | ops/s |
| Mean Throughput | wait-until-merges-finish | 62.35 | ops/s |
| Median Throughput | wait-until-merges-finish | 62.35 | ops/s |
| Max Throughput | wait-until-merges-finish | 62.35 | ops/s |
| 100th percentile latency | wait-until-merges-finish | 15.3661 | ms |
| 100th percentile service time | wait-until-merges-finish | 15.3661 | ms |
| error rate | wait-until-merges-finish | 0 | % |
| Min Throughput | default | 9.64 | ops/s |
| Mean Throughput | default | 9.64 | ops/s |
| Median Throughput | default | 9.64 | ops/s |
| Max Throughput | default | 9.64 | ops/s |
| 100th percentile latency | default | 114.247 | ms |
| 100th percentile service time | default | 9.88235 | ms |
| error rate | default | 0 | % |
| Min Throughput | range | 30.3 | ops/s |
| Mean Throughput | range | 30.3 | ops/s |
| Median Throughput | range | 30.3 | ops/s |
| Max Throughput | range | 30.3 | ops/s |
| 100th percentile latency | range | 41.3047 | ms |
| 100th percentile service time | range | 7.95881 | ms |
| error rate | range | 0 | % |
| Min Throughput | distance_amount_agg | 18.78 | ops/s |
| Mean Throughput | distance_amount_agg | 18.78 | ops/s |
| Median Throughput | distance_amount_agg | 18.78 | ops/s |
| Max Throughput | distance_amount_agg | 18.78 | ops/s |
| 100th percentile latency | distance_amount_agg | 61.501 | ms |
| 100th percentile service time | distance_amount_agg | 7.89941 | ms |
| error rate | distance_amount_agg | 0 | % |
| Min Throughput | autohisto_agg | 23.93 | ops/s |
| Mean Throughput | autohisto_agg | 23.93 | ops/s |
| Median Throughput | autohisto_agg | 23.93 | ops/s |
| Max Throughput | autohisto_agg | 23.93 | ops/s |
| 100th percentile latency | autohisto_agg | 55.2257 | ms |
| 100th percentile service time | autohisto_agg | 13.049 | ms |
| error rate | autohisto_agg | 0 | % |
| Min Throughput | date_histogram_agg | 44.55 | ops/s |
| Mean Throughput | date_histogram_agg | 44.55 | ops/s |
| Median Throughput | date_histogram_agg | 44.55 | ops/s |
| Max Throughput | date_histogram_agg | 44.55 | ops/s |
| 100th percentile latency | date_histogram_agg | 30.1235 | ms |
| 100th percentile service time | date_histogram_agg | 7.30196 | ms |
| error rate | date_histogram_agg | 0 | % |

[INFO] SUCCESS (took 13 seconds)
--------------------------------
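
To make the two reports above easier to compare, here is a minimal sketch that computes the relative difference of the extension run against the plugin run for a handful of metrics. The numbers are copied from the two reports in this thread; the `RESULTS` table, `percent_change`, and `summarize` are hypothetical helpers written for this example and are not part of OpenSearch Benchmark. Each figure comes from a single run on the 1k-document `nyc_taxis` data, so small differences may simply be run-to-run noise.

```python
from typing import Dict, List, Tuple

# (metric, task) -> (extension value, plugin value), copied from the
# two benchmark reports above. Only a subset of rows is included.
RESULTS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("Mean Throughput (docs/s)", "index"): (1891.56, 1788.38),
    ("Mean Throughput (ops/s)", "default"): (11.16, 9.64),
    ("Mean Throughput (ops/s)", "range"): (34.17, 30.3),
    ("Mean Throughput (ops/s)", "distance_amount_agg"): (19.06, 18.78),
    ("Mean Throughput (ops/s)", "autohisto_agg"): (24.74, 23.93),
    ("Mean Throughput (ops/s)", "date_histogram_agg"): (48.84, 44.55),
    ("100th percentile service time (ms)", "default"): (7.98621, 9.88235),
    ("100th percentile service time (ms)", "range"): (7.57088, 7.95881),
}


def percent_change(extension: float, plugin: float) -> float:
    """Relative difference of the extension run versus the plugin run, in percent."""
    return (extension - plugin) / plugin * 100.0


def summarize(
    results: Dict[Tuple[str, str], Tuple[float, float]]
) -> List[Tuple[str, str, float]]:
    """Return (metric, task, percent change) rows in insertion order."""
    return [
        (metric, task, percent_change(ext, plug))
        for (metric, task), (ext, plug) in results.items()
    ]


if __name__ == "__main__":
    for metric, task, change in summarize(RESULTS):
        print(f"{metric:<36} {task:<22} {change:+.2f}%")
```

A positive value means the extension run reported a higher number than the plugin run; for throughput that is better, for service time it is worse.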
