
Add Metricbeat ETCD overview dashboard #10591

Closed

Conversation

sayden
Contributor

@sayden sayden commented Feb 5, 2019

An overview dashboard for the ETCD Metricbeat module. I have initially placed it in the 7 folder for Kibana.

I configured a cluster and sent some random operations in a for loop to generate some data, but the screenshot still looks a bit static 😄

I also added an entry in the docs and a screenshot file.

[screenshot: ETCD overview dashboard]

@sayden sayden added the enhancement, Metricbeat, :Dashboards, and Team:Integrations labels Feb 5, 2019
@sayden sayden self-assigned this Feb 5, 2019
@sayden sayden requested review from a team as code owners February 5, 2019 21:50
@ruflin ruflin added the review label Feb 5, 2019
Member

@ruflin ruflin left a comment


LGTM

Tested it locally and it seems to load as expected. My view looks even more broken since I only have 1 node in the dropdown. ++ on the dropdown for the node selection.

@sayden sayden force-pushed the feature/mb/etcd-overview-dashboard branch from 4c0eaa5 to 56ecf99 Compare February 5, 2019 23:24
@sayden
Contributor Author

sayden commented Feb 6, 2019

jenkins, test this

@sayden
Contributor Author

sayden commented Feb 6, 2019

I'm really struggling to understand what's wrong with https://travis-ci.org/elastic/beats/jobs/489300684#L6610. @ruflin @jsoriano, if either of you has some time, can you help me with this, please?

@jsoriano
Member

jsoriano commented Feb 7, 2019

I'm really struggling to understand what's wrong with https://travis-ci.org/elastic/beats/jobs/489300684#L6610. @ruflin @jsoriano, if either of you has some time, can you help me with this, please?

Can you reproduce this failure running test_dashboards.py locally? There you should have more info in logs.

@sayden
Contributor Author

sayden commented Feb 7, 2019

Thanks @jsoriano, but there's no test_dashboards.py in the entire Beats repo. The file executed there is test_base.py, and I have already tried running it a few times without luck.

@jsoriano
Member

jsoriano commented Feb 7, 2019

Thanks @jsoriano, but there's no test_dashboards.py in the entire Beats repo. The file executed there is test_base.py, and I have already tried running it a few times without luck.

Oh yes, sorry, I meant this one 🙂 Let me try...

@jsoriano
Member

jsoriano commented Feb 7, 2019

Hmm, this actually works for me. Have you tried relaunching the builds?

@jsoriano
Member

jsoriano commented Feb 7, 2019

jenkins, test this

@sayden
Contributor Author

sayden commented Feb 7, 2019

jenkins, test this before I go totally nuts 😄

  • First, I couldn't manage to reproduce the problem.
  • Then, at some point, I managed to reproduce the error locally. I went back on master as far as I could, and it failed every time.
  • Then I deleted my build folder, recreated it with python-env, and now the error is gone again.

3 builds so far: 2 with the dashboard error, 1 with "The job exceeded the maximum time limit for jobs, and has been terminated." on Metricbeat... let's see the 4th...

@sayden
Contributor Author

sayden commented Feb 7, 2019

jenkins, test this please. Now we are at 4 builds: 2 with the dashboard error, 2 with "The job exceeded the maximum time limit for jobs, and has been terminated." on Metricbeat... let's see the 5th...

@ruflin
Member

ruflin commented Feb 11, 2019

The CI failure is related to this change. The test_dashboards build has the following error:

2019-02-07T21:28:32.123Z	ERROR	instance/beat.go:788	Exiting: Failed to import dashboard: Failed to load directory /go/src/github.com/elastic/beats/metricbeat/build/system-tests/run/test_base.Test.test_dashboards/kibana/7/dashboard:
  error loading /go/src/github.com/elastic/beats/metricbeat/build/system-tests/run/test_base.Test.test_dashboards/kibana/7/dashboard/Metricbeat-etcd-overview.json: returned 400 to import file: <nil>. Response: {"statusCode":400,"error":"Bad Request","message":"Document \"4c811bd0-2946-11e9-9303-89807bf647b5\" has property \"visualization\" which belongs to a more recent version of Kibana (7.0.0)."}

What version did you use to export the dashboard?
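One quick way to answer that question is to inspect the `migrationVersion` field that Kibana stamps on each saved object in an export; it is exactly what the 6.6 importer is rejecting in the error above. A minimal sketch (assuming the 7.x export layout with a top-level `objects` list; verify against the actual dashboard file):

```python
def migration_versions(export):
    """Return {saved-object type: set of Kibana versions} found in an
    exported dashboard, based on each object's migrationVersion field."""
    found = {}
    for obj in export.get("objects", []):
        for doc_type, version in obj.get("migrationVersion", {}).items():
            found.setdefault(doc_type, set()).add(version)
    return found

# Minimal sample mimicking the failing export: a visualization migrated
# by a 7.0.0 Kibana, which a 6.6 Kibana refuses to import.
sample = {
    "objects": [
        {"id": "4c811bd0-2946-11e9-9303-89807bf647b5",
         "type": "visualization",
         "migrationVersion": {"visualization": "7.0.0"}},
    ],
    "version": "7.0.0",
}
print(migration_versions(sample))  # {'visualization': {'7.0.0'}}
```

If the result reports a version newer than the Kibana the tests run against, the export has to be redone from a matching Kibana.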

@sayden
Contributor Author

sayden commented Feb 12, 2019

@ruflin I'm not doing anything special with this dashboard. It was created the same way as the previous dashboards I did in recent weeks: Kibana and ES 7.0.0.

Where did you see that error? I can only find this on Travis, and I don't get any errors locally:

ok  	github.com/elastic/beats/metricbeat/module/kvm/dommemstat	0.005s	coverage: 48.1% of statements
command [go test -tags integration -cover -coverprofile /tmp/gotestcover-492695661 github.com/elastic/beats/metricbeat/module/kibana/stats]: exit status 1
Building kibana
Step 1/2 : FROM docker.elastic.co/kibana/kibana:6.6.0
 ---> dfc685453eaa
Step 2/2 : HEALTHCHECK --interval=1s --retries=300 CMD curl -f http://localhost:5601/api/status | grep '"disconnects"'
 ---> Using cache
 ---> f87076599096
Successfully built f87076599096
Successfully tagged metricbeatcfe4f4909f32174462b3047c9e5ad1832f33594f_kibana:latest
Creating metricbeatcfe4f4909f32174462b3047c9e5ad1832f33594f_kibana_1 ... done
--- FAIL: TestFetch (116.60s)
	stats_integration_test.go:60: 
			Error Trace:	stats_integration_test.go:60
			Error:      	Should be empty, but was [HTTP error 500 in stats: 500 Internal Server Error]
			Test:       	TestFetch
	stats_integration_test.go:61: 
			Error Trace:	stats_integration_test.go:61
			Error:      	Should NOT be empty, but was []
			Test:       	TestFetch
--- FAIL: TestData (1.03s)
	stats_integration_test.go:76: getting kibana version key not found
FAIL
coverage: 13.3% of statements
FAIL	github.com/elastic/beats/metricbeat/module/kibana/stats	117.645s
?   	github.com/elastic/beats/metricbeat/module/logstash/node	[no test files]
?   	github.com/elastic/beats/metricbeat/module/logstash/node_stats	[no test files]
?   	github.com/elastic/beats/metricbeat/module/memcached	[no test files]

@ruflin
Member

ruflin commented Feb 12, 2019

I copied the above error from test_dashboards.py (metricbeat) test run on Jenkins. Does this pass locally?

@sayden sayden force-pushed the feature/mb/etcd-overview-dashboard branch 2 times, most recently from 9fe0987 to d114b03 Compare February 13, 2019 17:42
@sayden
Contributor Author

sayden commented Feb 15, 2019

jenkins, test this please

@sayden sayden force-pushed the feature/mb/etcd-overview-dashboard branch from d114b03 to 0996789 Compare February 18, 2019 20:16
@sayden
Contributor Author

sayden commented Feb 19, 2019

jenkins, test this please

There's probably something wrong in my setup, but here's the output of the same command run in my local setup:

INTEGRATION_TESTS=true nosetests -v -s tests/system/test_base.py
Step 1/2 : FROM docker.elastic.co/kibana/kibana:6.6.0
 ---> dfc685453eaa
Step 2/2 : HEALTHCHECK --interval=1s --retries=300 CMD curl -f http://localhost:5601/api/status | grep '"disconnects"'
 ---> Using cache
 ---> 31d3d9b881c7
Successfully built 31d3d9b881c7
Successfully tagged metricbeat_kibana:latest
Step 1/2 : FROM docker.elastic.co/elasticsearch/elasticsearch:6.6.0
 ---> 13aa43015aa1
Step 2/2 : HEALTHCHECK --interval=1s --retries=300 CMD curl -f http://localhost:9200/_xpack/license
 ---> Using cache
 ---> 3c737ffa8adc
Successfully built 3c737ffa8adc
Successfully tagged metricbeat_elasticsearch:latest
Recreating metricbeat_kibana_1        ... done
Recreating metricbeat_elasticsearch_1 ... done
Test that the dashboards can be loaded with `setup --dashboards` ... render config
ok
Metricbeat starts and stops without error. ... render config
ok
Test that the template can be loaded with `setup --template` ... render config
ok
Killing metricbeat_kibana_1           ... done
Killing metricbeat_elasticsearch_1    ... done

----------------------------------------------------------------------
Ran 3 tests in 53.188s

OK
(python-env) ➜  metricbeat git:(feature/mb/etcd-overview-dashboard) python --version
Python 2.7.15
(python-env) ➜  metricbeat git:(feature/mb/etcd-overview-dashboard) go version
go version go1.10.3 linux/amd64
(python-env) ➜  metricbeat git:(feature/mb/etcd-overview-dashboard)

I really don't see anything relevant apart from the fact that the dashboard was created with 7.0.0 and here it's tested with 6.6.0, but without errors anyway.

@sayden
Contributor Author

sayden commented Feb 19, 2019

Ok, I found the reason why it is not failing locally. It seems that if I have an environment launched with `make start` (which means the containers are 7.0.0-SNAPSHOT), the tests in test_base.py run against them, even though the test recreates docker.elastic.co/kibana/kibana:6.6.0.

So if I ensure that the environment launched with `make start` is down, the test launches its own Kibana, which is 6.6, and the test fails (because the dashboard was made with 7.0):

INTEGRATION_TESTS=true nosetests -v -s tests/system/test_base.py
Step 1/2 : FROM docker.elastic.co/kibana/kibana:6.6.0
 ---> dfc685453eaa
Step 2/2 : HEALTHCHECK --interval=1s --retries=300 CMD curl -f http://localhost:5601/api/status | grep '"disconnects"'
 ---> Using cache
 ---> 31d3d9b881c7
Successfully built 31d3d9b881c7
Successfully tagged metricbeat_kibana:latest
Step 1/2 : FROM docker.elastic.co/elasticsearch/elasticsearch:6.6.0
 ---> 13aa43015aa1
Step 2/2 : HEALTHCHECK --interval=1s --retries=300 CMD curl -f http://localhost:9200/_xpack/license
 ---> Using cache
 ---> 3c737ffa8adc
Successfully built 3c737ffa8adc
Successfully tagged metricbeat_elasticsearch:latest
Recreating metricbeat_elasticsearch_1 ... done
Recreating metricbeat_kibana_1        ... done
Test that the dashboards can be loaded with `setup --dashboards` ... render config
FAIL
Metricbeat starts and stops without error. ... render config
ok
Test that the template can be loaded with `setup --template` ... render config
FAIL
Killing metricbeat_kibana_1           ... done
Killing metricbeat_elasticsearch_1    ... done

======================================================================
FAIL: Test that the dashboards can be loaded with `setup --dashboards`
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mcastro/go/src/github.com/elastic/beats/metricbeat/tests/system/test_base.py", line 76, in test_dashboards
    assert exit_code == 0
AssertionError: 
-------------------- >> begin captured logging << --------------------
compose.config.config: DEBUG: Using configuration files: ./docker-compose.yml
docker.utils.config: DEBUG: Trying paths: ['/home/mcastro/.docker/config.json', '/home/mcastro/.dockercfg']
docker.utils.config: DEBUG: No config file found
docker.utils.config: DEBUG: Trying paths: ['/home/mcastro/.docker/config.json', '/home/mcastro/.dockercfg']
docker.utils.config: DEBUG: No config file found
compose.service: INFO: Building kibana
docker.api.build: DEBUG: Looking for auth config
docker.api.build: DEBUG: No auth config in memory - loading from filesystem
docker.utils.config: DEBUG: Trying paths: ['/home/mcastro/.docker/config.json', '/home/mcastro/.dockercfg']
docker.utils.config: DEBUG: No config file found
docker.api.build: DEBUG: Sending auth config ()
compose.service: INFO: Building elasticsearch
docker.api.build: DEBUG: Looking for auth config
docker.api.build: DEBUG: No auth config in memory - loading from filesystem
docker.utils.config: DEBUG: Trying paths: ['/home/mcastro/.docker/config.json', '/home/mcastro/.dockercfg']
docker.utils.config: DEBUG: No config file found
docker.api.build: DEBUG: Sending auth config ()
compose.parallel: DEBUG: Pending: set([<Service: elasticsearch>, <Service: kibana>])
compose.parallel: DEBUG: Starting producer thread for <Service: elasticsearch>
compose.parallel: DEBUG: Starting producer thread for <Service: kibana>
compose.parallel: DEBUG: Pending: set([<Container: metricbeat_elasticsearch_1 (9b9747)>])
compose.parallel: DEBUG: Pending: set([<Container: metricbeat_kibana_1 (5c3df9)>])
compose.parallel: DEBUG: Starting producer thread for <Container: metricbeat_kibana_1 (5c3df9)>
compose.parallel: DEBUG: Starting producer thread for <Container: metricbeat_elasticsearch_1 (9b9747)>
compose.service: DEBUG: Added config hash: 7e9d728c9d78349c18963a9da4c38b03da283546f3b15b2965b80156b5b96c90
compose.service: DEBUG: Added config hash: 5571053dd18a6272a3c2361b71143ef69407bf7341ba0a359194554bf6237c82
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Finished processing: <Container: metricbeat_elasticsearch_1 (9b9747)>
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Finished processing: <Service: elasticsearch>
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Finished processing: <Container: metricbeat_kibana_1 (5c3df9)>
compose.parallel: DEBUG: Pending: set([])
compose.parallel: DEBUG: Finished processing: <Service: kibana>
compose.parallel: DEBUG: Pending: set([])
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: Test that the template can be loaded with `setup --template`
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mcastro/go/src/github.com/elastic/beats/metricbeat/tests/system/test_base.py", line 51, in test_template
    assert exit_code == 0
AssertionError

----------------------------------------------------------------------
Ran 3 tests in 50.221s

FAILED (failures=2)

@sayden sayden requested a review from a team as a code owner February 19, 2019 12:46
Member

@ruflin ruflin left a comment


Is this PR still ready for review or should we set it to in progress?

Review comment on metricbeat/docker-compose.yml (outdated, resolved)
@sayden sayden added the in progress label and removed the review label Mar 13, 2019
@sayden sayden force-pushed the feature/mb/etcd-overview-dashboard branch from a1aedc3 to 97a5963 Compare April 26, 2019 14:43
@sayden sayden added the [zube]: In Review and review labels and removed the [zube]: Backlog and in progress labels Apr 26, 2019
@odacremolbap
Contributor

Can we rename this dashboard to ETCD V2? The metrics included here are V2-oriented only.

A V3 dashboard should remove most of these metrics and include the Metricbeat etcd metricset metrics.

@sayden
Contributor Author

sayden commented Apr 30, 2019

Can we rename this dashboard to ETCD V2? The metrics included here are V2-oriented only.

A V3 dashboard should remove most of these metrics and include the Metricbeat etcd metricset metrics.

I have concerns about naming a dashboard after a version. The main reason is that we usually just "publish" what is already developed, so we maintain a snapshot of Beats.

Once we have a new ETCD module that supports v3, we must decide whether and how we will maintain support for version 2, which can be tricky and/or affect the names of some metrics, which would affect the dashboard too.

Another, less important reason is that right now we don't have any version-specific dashboards, so this could confuse some users, especially when the module itself is not version-named either.

I propose to keep the name as is and then, once we have a v3-specific module, decide what to do with it (delete, rename, modify, whatever) so the state of modules <-> dashboards stays consistent. WDYT @odacremolbap? 😉

@odacremolbap
Contributor

#11280 (comment)
^^
Probably we need to look at Etcd V2 and Etcd V3 as different products that do the same thing.

My main issue is that everyone should be using Etcd V3 (same binary as V2) and almost no one should be using V2. When I tested the dashboard it was empty, and it took me a minute to remember that it targets V2 only.

I think we need to highlight that this is V2 only.

If we need to come up with only one dashboard, it would be a V3 one. My take is to keep this one as Etcd V2 and add a new one as Etcd or Etcd V3, but, as said above, think of them as different products.

@sayden
Contributor Author

sayden commented Apr 30, 2019

Oh, but I didn't know that we were already fetching v3 metrics; this changes everything.

So now I'm leaning towards closing this PR without merging, if v2 is going to be deprecated in the short term anyway. WDYT @ruflin?

@ruflin
Member

ruflin commented Apr 30, 2019

If most users are on v3 already, I would say we only need a v3 dashboard. But if there is still a big portion (let's say more than 20%) on v2 for some reason, we could have 2 dashboards (we do that for other modules too). We could then have a dropdown that allows jumping between the dashboards, or a link list.

@odacremolbap
Contributor

We already ship the V2 API metrics and the dashboard looks good. My take is to go forward with this one, just because we worked on it and someone will probably benefit from it. If Etcd decided to include the V2 API in the V3 binary, I assume there will be people using it, mainly apps that integrate with etcd and didn't update their client libs when V3, which was fully incompatible, came out (just as a reminder, data created using one version of etcd can't be read using the other version).

It is hard to know who is still using the V2 API. I would say fewer than that 20%, because I'm guessing most etcd users are Kubernetes clusters --> V3.

@odacremolbap
Contributor

@sayden not sure if this is related to my environment (I will double-check with a fresh setup in a while), but I'm seeing some flickering and missing information in the dashboard.

On my local host I'm running 3 etcd instances and a single Metricbeat, with etcd configured as:

- module: etcd
  period: 10s
  hosts: ["localhost:12379","localhost:22379","localhost:32379"]

Client processes are reading and writing to each instance.

I've had a scenario where the displayed data would flicker, usually showing no data and then showing some on certain refreshes. I tried from different browsers and computers, with the same results.

My first configured etcd instance seemed to influence the graphs, while the other 2 didn't. Even after nuking members 2 and 3 and leaving the cluster without quorum, there were no visualization changes for such a critical scenario.

As said above, I'll set up a more realistic scenario and re-test.
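For context on why killing 2 of the 3 members is such a critical scenario: etcd uses raft, which can only commit writes while a strict majority of the configured members is reachable. A tiny illustrative sketch (not etcd code, just the majority rule):

```python
def has_quorum(alive_members: int, cluster_size: int) -> bool:
    """etcd (raft) can commit writes only while a strict majority of the
    configured members is reachable."""
    return alive_members > cluster_size // 2

# 3-member cluster: killing two members drops it below majority.
print(has_quorum(3, 3))  # True
print(has_quorum(2, 3))  # True  (still a majority)
print(has_quorum(1, 3))  # False (no quorum: writes unavailable)
```

So a dashboard that shows no change at all when the cluster drops from 3 members to 1 is missing one of the most important signals it could surface.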

@sayden
Contributor Author

sayden commented May 3, 2019

Closing this. If not many users are on v2 anymore and you found some issues, I think it's better to work on a v3 dashboard directly. Thanks for the comments!
