Observability tools #160

Merged
merged 4 commits into from
Dec 29, 2021

@garyparrot (Collaborator) commented Dec 22, 2021

Context: #10

This PR implements or improves the scripts for the following tools:

  • Grafana
  • Node Exporter
  • Prometheus

Tested on our cluster; it works well.

image
image

Dashboard sources:

  1. https://grafana.com/grafana/dashboards/1860
  2. https://grafana.com/grafana/dashboards/721

Demo

# execute node exporter
./docker/start_node_exporter.sh
[INFO] Container ID of node_exporter: 16d0e741ba516495465fad6977bcea7abd80b4c872ab44bdc2713db08557e6a9
[INFO] node_exporter running at http://192.168.0.2:9100

# execute Prometheus
./docker/start_prometheus.sh start 192.168.0.2:5566 192.168.0.2:9100
[INFO] Start existing prometheus instance
prometheus-9090
[INFO] =================================================
[INFO] config file: /tmp/prometheus-9090.yml
[INFO] prometheus address: http://192.168.0.2:9090
[INFO] command to run grafana at this host: ./docker/grafana.sh start
[INFO] command to add prometheus to grafana datasource: ./docker/grafana.sh add_prom_source <USERNAME>:<PASSWORD> Prometheus http://192.168.0.2:9090
[INFO] =================================================

# execute Grafana
./docker/grafana.sh start
[INFO] Restart Grafana docker image
Grafana_3000
[INFO] Access Grafana dashboard here:  http://192.168.0.2:3000

# add prometheus datasource to Grafana
./docker/grafana.sh add_prom_source <USERNAME>:<PASSWORD> Prometheus http://192.168.0.2:9090
{"datasource":{"id":3,"uid":"H_C2ZBTnk","orgId":1,"name":"MyPrometheus","type":"prometheus","typeLogoUrl":"","access":"proxy","url":"http://192.168.0.2:9090","password":"","user":"","database":"","basicAuth":false,"basicAuthUser":"","basicAuthPassword":"","withCredentials":false,"isDefault":false,"jsonData":{},"secureJsonFields":{},"version":1,"readOnly":false},"id":3,"message":"Datasource added","name":"MyPrometheus"}

# refresh/update Prometheus configuration (from argument) 
./docker/start_prometheus.sh refresh 192.168.0.2:5566 192.168.0.2:9100

# refresh/update Prometheus configuration (from config file /tmp/prometheus-9090)
./docker/start_prometheus.sh refresh

* Grafana
* Prometheus
* Node Exporter
On some machines, `curl` takes longer to realize that the target
service is done. This commit fixes the issue by explicitly specifying
the connect timeout in seconds.
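
For illustration, a bounded `curl` probe could look roughly like the sketch below. The helper name `wait_for_service`, its arguments, and the default values are assumptions and are not taken from this PR; only `curl --connect-timeout` is the mechanism the commit message refers to.

```shell
# Hypothetical helper (not from this PR): poll a URL until it answers,
# bounding every attempt so a slow connect cannot stall the script.
# $1 = URL, $2 = number of attempts (default 10), $3 = connect timeout in seconds (default 2)
wait_for_service() {
  local url="$1" retries="${2:-10}" connect_timeout="${3:-2}" attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # --connect-timeout caps how many seconds curl spends opening the connection.
    if curl --silent --output /dev/null --connect-timeout "$connect_timeout" "$url"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Pause between attempts, but not after the last one.
    if [ "$attempt" -le "$retries" ]; then
      sleep 1
    fi
  done
  return 1
}
```

A caller might use it as `wait_for_service "http://192.168.0.2:9090" || echo "Prometheus is not reachable"`, with the address taken from the demo above.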
@chia7712 (Contributor) left a comment:

@garyparrot Thanks for the new scripts and the refactor. Please take a look at a few thoughts on the design.

elif [[ "$1" == "refresh" ]]; then
refresh_config_from_file
elif [[ "$1" == "stop" ]]; then
@chia7712 (Contributor):

Do you intend for the scripts to support "stop" in addition to "start"? If so, the scripts will all need to be renamed, because their names currently begin with "start".

image_name=prom/prometheus
prometheus_port="$(($(($RANDOM % 10000)) + 10000))"
container_name="prometheus-${prometheus_port}"
@chia7712 (Contributor):

I want to confirm the thinking here. These container-wrapped scripts were originally meant to let users quickly start multiple instances on the same node, which is why the ports were all generated randomly. With this change, the script can run only one instance per machine unless the user changes the port themselves.

@garyparrot (Collaborator, Author):

"quickly start multiple instances"

I can't think of a scenario where one machine runs multiple Prometheus instances. Would the test environment ever need several Prometheus instances on a single machine?
That was my reasoning for keeping this part simple, and node exporter defaults to a fixed port for the same reason.

@chia7712 (Contributor):

These scripts are mainly used for testing, so it is quite likely that multiple services will run on the same node. Of course, if one Prometheus can connect to multiple Kafka instances, this may not be needed in practice. It's your call.

@garyparrot (Collaborator, Author):

I think that in 80% of cases we will still use only one Prometheus, so choosing defaults for that scenario brings the greater benefit. If someone really needs to run multiple Prometheus instances on the same machine, they can change the port themselves.

One reason for having users stick to a fixed port is to discourage treating Prometheus as disposable (create, use, delete). When the same Prometheus is reused, the data it keeps from past experiments can be compared with the results of future experiments.
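
As a rough sketch of the "fixed default, change it yourself if needed" idea, the port could fall back to a fixed value while still allowing an override. The `PROMETHEUS_PORT` environment variable and the helper name are hypothetical; 9090 is the port that appears in the demo output above.

```shell
# Hypothetical override (not from this PR): use PROMETHEUS_PORT if set,
# otherwise fall back to the fixed port shown in the demo (9090).
resolve_prometheus_port() {
  local port="${PROMETHEUS_PORT:-9090}"
  case "$port" in
    # Reject anything that contains a non-digit character.
    *[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  echo "$port"
}

prometheus_port="$(resolve_prometheus_port)"
container_name="prometheus-${prometheus_port}"
echo "$container_name"
```

With this shape, the common case needs no arguments, and a second instance on the same machine only requires setting one variable.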

@chia7712 (Contributor):

If the goal is for users to keep their experiment results, the data also needs to be persisted on the host.
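
One possible way to do that is to mount a Docker volume over the container's data directory. The sketch below only prints the command instead of running it. It assumes the `prom/prometheus` image keeps its time-series data under `/prometheus`, which is its documented default as far as I know; the helper name and the volume name are made up for illustration. The config-file mount that the real script needs is left out.

```shell
# Hypothetical helper (not from this PR): build a docker command that keeps
# Prometheus data in a named volume, so it survives container removal.
# $1 = host port, $2 = name of the Docker volume
prometheus_run_command() {
  local port="$1" volume="$2"
  # /prometheus is assumed to be the image's default data directory.
  printf 'docker run -d --name prometheus-%s -p %s:9090 -v %s:/prometheus prom/prometheus\n' \
    "$port" "$port" "$volume"
}

prometheus_run_command 9090 prometheus-data
# prints: docker run -d --name prometheus-9090 -p 9090:9090 -v prometheus-data:/prometheus prom/prometheus
```

A named volume is used here instead of a host directory because it avoids file-permission issues with the user the container runs as; a bind mount to a host path would also work if the directory is writable by that user.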

function stop() {
info "Stop prometheus"
docker stop "$container_name"
@chia7712 (Contributor):

Following up on the above: once users specify their own port, they must also specify that port when calling stop in order to shut down the correct container. Is there a way to make this foolproof?
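
One way this could be made safer, sketched under assumptions: look up the running containers by name and refuse to guess when the choice is ambiguous. The helper `pick_container` is hypothetical and is not part of this PR; it only relies on the `prometheus-<port>` naming shown in the snippet above.

```shell
# Hypothetical helper (not from this PR): decide which container "stop" acts on.
# $1 = newline-separated names of running containers, $2 = optional port
pick_container() {
  local names="$1" port="${2:-}" count
  if [ -n "$port" ]; then
    # An explicit port must match a running container exactly.
    if printf '%s\n' "$names" | grep -qx "prometheus-${port}"; then
      echo "prometheus-${port}"
      return 0
    fi
    echo "no running container named prometheus-${port}" >&2
    return 1
  fi
  # Count the non-empty lines; "|| true" keeps a zero count from failing.
  count="$(printf '%s\n' "$names" | grep -c . || true)"
  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$names"
    return 0
  fi
  echo "found ${count} prometheus containers, please pass the port explicitly" >&2
  return 1
}

# Possible wiring inside the script (not executed here):
#   names="$(docker ps --filter "name=prometheus-" --format '{{.Names}}')"
#   target="$(pick_container "$names" "$2")" && docker stop "$target"
```

With exactly one running instance, stop needs no port. With several, the script fails with a message rather than stopping the wrong container.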

@garyparrot (Collaborator, Author):

In that case, how about I just remove this feature? We don't really need to handle stop for users, and sometimes it is better not to automate dangerous actions like this too much.

@chia7712 chia7712 merged commit 66d6f26 into opensource4you:main Dec 29, 2021