Add Splunk HEC trace exporter #182

Conversation

tigrannajaryan
Contributor

**Description:**

- Add Splunk HEC trace exporter to AWS OpenTelemetry distro
- Add end-to-end tests following the wiki description
- Add exporter to README list

**Link to tracking Issue:** n/a

**Testing:**

- Unit tests in the collector contrib repository: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/master/exporter/splunkhecexporter
- E2E tests: aws-observability/aws-otel-test-framework#140

**Documentation:**

- Added exporter to README list
- Usage documentation: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/master/exporter/splunkhecexporter/README.md
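
For readers following along, here is a minimal, illustrative collector configuration using the splunk_hec exporter. The token and endpoint values are placeholders, and the splunkhecexporter README linked above is the authoritative reference for the available options:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  splunk_hec:
    # Placeholder values; substitute a real HEC token and endpoint.
    token: "00000000-0000-0000-0000-000000000000"
    endpoint: "https://splunk.example.com:8088/services/collector"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [splunk_hec]
```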

@tigrannajaryan force-pushed the feature/tigran/add-splunkhec-exporter branch from 351556c to 14e73b6 on November 26, 2020 16:21
@codecov-io

codecov-io commented Nov 26, 2020

Codecov Report

Merging #182 (c92c34f) into main (6527d0b) will increase coverage by 0.27%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #182      +/-   ##
==========================================
+ Coverage   78.20%   78.48%   +0.27%     
==========================================
  Files           4        4              
  Lines          78       79       +1     
==========================================
+ Hits           61       62       +1     
  Misses          9        9              
  Partials        8        8              
Impacted Files Coverage Δ
pkg/defaultcomponents/defaults.go 77.77% <100.00%> (+0.63%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6527d0b...c92c34f. Read the comment docs.

@tigrannajaryan force-pushed the feature/tigran/add-splunkhec-exporter branch 2 times, most recently from 52e88b0 to ac9e94d on December 1, 2020 01:34
@tigrannajaryan
Contributor Author

@mxiamxia @wyTrivail can you please review this PR?

Contributor

@wyTrivail left a comment

lgtm, thanks!

@tigrannajaryan force-pushed the feature/tigran/add-splunkhec-exporter branch from ac9e94d to c92c34f on December 3, 2020 14:12
@tigrannajaryan
Contributor Author

@wyTrivail thank you for reviewing. Is there anything else needed to merge this?

@tigrannajaryan force-pushed the feature/tigran/add-splunkhec-exporter branch from c92c34f to 59aa0c6 on December 7, 2020 14:24
@wyTrivail
Contributor

@tigrannajaryan thx for reminding! Merging it.

btw, did you receive anything from your PM/team saying that a performance model against a real Splunk endpoint needs to be provided before the AWS OTel Collector releases it?

@wyTrivail merged commit a53c2c7 into aws-observability:main on Dec 7, 2020
@tigrannajaryan
Contributor Author

Thank you for merging @wyTrivail

> btw, did you receive anything from your PM/team saying that a performance model against a real Splunk endpoint needs to be provided before the AWS OTel Collector releases it?

No, I am afraid I haven't heard anything. I was under the impression that the performance tests that run as part of this repository are all we need. Do you know who can provide this additional information about the performance tests?

@tigrannajaryan deleted the feature/tigran/add-splunkhec-exporter branch on December 7, 2020 20:00
@wyTrivail
Contributor

thx for letting me know @tigrannajaryan.

I'm not aware of the POC for that, but I will send an email internally to our PM to ensure the information reaches every partner.

The performance tests in this repo run against the mocked_server instead of the real endpoint. We had some internal discussion and would like partners to do a round of performance testing against the real endpoint, since that's what we can't do in the repo. I will double check whether the information was communicated correctly.

@tigrannajaryan
Contributor Author

@wyTrivail ok, thanks for checking. I look forward to hearing more about this.

Note that the performance tests in the official OpenTelemetry Collector repository use real protocol implementations, so they are more realistic. Could you perhaps base your performance test requirements on this?

@wyTrivail
Contributor

> Note that the performance tests in the official OpenTelemetry Collector repository use real protocol implementations, so they are more realistic. Could you perhaps base your performance test requirements on this?

Thx for the suggestion! I will look into it. When you say "real protocol implementation", I reckon you mean the mocked server validates the data structure of the metrics/traces?

Right now we are still exploring the proper way to build a realistic mocked_server that could give us "real" performance data. From the exporter's view, I'd prefer to simulate real latency, throttling, and other possible factors that impact the performance of the collector.

btw, this just reminded me of one more thing you might want to take a look at. The sapm exporter was failing the negative soaking test, which routes traffic to a "bad/invalid" endpoint; the memory of the collector went to around 4GB. Below is the error message:

validator_1  | 10:21:08.753 [main] ERROR com.amazon.aoc.validators.AlarmPullingValidator - alarm otel-soaking-mem-alarm-7a37315a9b7b5147 is alarming, status is ALARM, metric is [{Id: mem,MetricStat: {Metric: {Namespace: AWSOtelCollector/SoakingTest,MetricName: procstat_memory_rss,Dimensions: [{Name: negative_soaking,Value: true}, {Name: data_rate,Value: trace-1000}, {Name: testcase,Value: sapm_exporter_trace_mock}, {Name: testing_ami,Value: soaking_linux}, {Name: process_name,Value: aws-otel-collector}, {Name: InstanceId,Value: i-0d952c01207774a21}, {Name: instance_type,Value: m5.2xlarge}, {Name: exe,Value: aws-otel-collector}, {Name: launch_date,Value: 2020-12-05}, {Name: commit_id,Value: 4a2e09059cffc498e91aa972c34575e7bfd9e50f}]},Period: 60,Stat: Average,},ReturnData: true,}], failing to bake

link: https://github.com/aws-observability/aws-otel-collector/runs/1503147806?check_suite_focus=true

It would be great if you could take a look; to me it looks like this is an issue with backfilling.

@tigrannajaryan
Contributor Author

> I reckon you mean the mocked server validates the data structure of the metrics/traces?

Yes. The Collector testbed uses mocked servers, which however fully implement the receiving side of the corresponding protocol and return the expected responses to the Collector. From the perspective of testing the performance and functionality of the Collector, the testbed is very close to sending data to the actual backend.

> btw, this just reminded me of one more thing you might want to take a look at. The sapm exporter was failing the negative soaking test, which routes traffic to a "bad/invalid" endpoint; the memory of the collector went to around 4GB.

Do you use the memory limiter processor in the soak test's Collector config? If not, then the memory can grow up to the configured sending queue size (the default is 5000 requests). Queue limits are based on the number of requests, so memory usage in bytes is hard to estimate without knowing the data composition of the requests. 4GB may be expected if there is no memory limiter and requests are large.
The memory limiter is a recommended default processor for production configurations.
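
As a rough illustration of that advice, a memory-limited pipeline could look like the sketch below, shown here with the splunk_hec exporter from this PR but equally applicable to the sapm soak-test config. The limits and the sending_queue block are illustrative assumptions; the exact option names and their availability should be verified against the memorylimiter processor and exporter documentation:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  memory_limiter:
    # Illustrative limits; tune to the soak-test instance size.
    check_interval: 1s
    limit_mib: 1500
    spike_limit_mib: 500
  batch:

exporters:
  splunk_hec:
    token: "00000000-0000-0000-0000-000000000000"   # placeholder
    endpoint: "https://splunk.example.com:8088/services/collector"  # placeholder
    # If the exporter exposes the exporterhelper queue settings, a smaller
    # queue bounds how much data can pile up when the endpoint is unreachable.
    sending_queue:
      enabled: true
      queue_size: 1000  # default is 5000 requests

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [splunk_hec]
```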

@wyTrivail
Contributor

@tigrannajaryan thank you for the suggestions and sorry for my late response!

For the performance test, I added a 15ms delay to the mock server's response. I can see the CPU and memory usage of the collector increased compared to before, and the result is almost the same as the performance test we ran against the real endpoint.
So I feel it's better to simulate the true network environment and different response codes rather than implementing the actual expected response body. It also seems that most exporters just check the status code.
I will definitely look into the official performance tests to see if we can leverage them to make contributors' lives easier :)

For the memory limiter, we didn't use it before. I tried it yesterday; the memory became stable and the collector started to drop data. I think it is a more realistic use case, so I will add the memory limiter to our testing. Thanks again for this suggestion; I will also update our example to guide customers to use it in production.

@tigrannajaryan
Contributor Author

@wyTrivail sounds good.

BTW, I am working on improving the memory limiter to use less CPU. If you notice high CPU usage, you may need to update the dependency once open-telemetry/opentelemetry-collector#2250 is merged.

@wyTrivail
Contributor

@tigrannajaryan thx, will check it and let you know if there's any anomaly.

vastin pushed a commit to vastin/aws-otel-collector that referenced this pull request Mar 3, 2021