Observability tools #160
* Grafana
* Prometheus
* Node Exporter
On some machines, `curl` takes a long time to realize that the target service is done. This commit fixes the issue by explicitly specifying the connect timeout in seconds.
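The fix described in this commit message could look roughly like the sketch below. It is an illustration only, not the code from the PR: the function name `probe_service` and the one-second default are assumptions. `--connect-timeout` is the curl option that limits how long the connection phase may take.

```shell
# Hypothetical helper (name and default are assumptions, not from the PR).
# Probes a URL once and returns curl's exit status.
# --connect-timeout bounds the connection phase, so an unreachable
# service is reported quickly instead of hanging.
probe_service() {
  local url="$1"
  local timeout="${2:-1}"   # seconds; assumed default
  curl --silent --output /dev/null --connect-timeout "$timeout" "$url"
}
```

A caller can then loop on `probe_service "http://localhost:9090"` and react to a non-zero status without waiting on a long default timeout.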
@garyparrot Thanks for the new scripts and the refactor. Please take a look at a few design thoughts below.
docker/start_prometheus.sh
Outdated
elif [[ "$1" == "refresh" ]]; then
  refresh_config_from_file
elif [[ "$1" == "stop" ]]; then
Do you intend for the script to support "stop" in addition to "start"? If so, the scripts need to be renamed, because every script name currently begins with `start`.
image_name=prom/prometheus
prometheus_port="$(($(($RANDOM % 10000)) + 10000))"
container_name="prometheus-${prometheus_port}"
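I'd like to confirm the intent here. These container-wrapped scripts were originally meant to let users quickly start multiple instances on the same node, which is why the ports are generated randomly. With this change, the script can run only one instance per machine unless the user changes the port manually.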
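"quickly start multiple instances"
I can't think of a scenario where one machine runs multiple Prometheus instances. Would the test environment ever need to run several Prometheus instances on a single machine?
That is why I kept this part simple. Node Exporter defaults to a fixed port for the same reason.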
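These scripts are mainly for testing, so it is quite possible that multiple services will run on the same node. Of course, if one Prometheus can be hooked up to multiple Kafka instances, this may not be needed in practice. It's your call.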
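I think that in 80% of cases we will still use only one Prometheus, so making that scenario the default brings the greater benefit. If someone really needs to run multiple Prometheus instances on the same machine, they can change the port themselves.
Another reason for having users stick to a specific fixed port is to discourage treating Prometheus as disposable (start it, use it, delete it). Reusing the same Prometheus keeps the data from past experiments, which can then be compared with future results.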
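The "fixed default, change it yourself if needed" idea from this comment could be sketched as follows. This is only an illustration under assumptions: the environment variable name `PROMETHEUS_PORT` and the helper `resolve_prometheus_port` are hypothetical and do not come from the PR; 9090 is the port Prometheus listens on by default.

```shell
# Hypothetical sketch: fixed port by default, overridable by the user.
# PROMETHEUS_PORT is an assumed variable name, not part of the PR.
resolve_prometheus_port() {
  echo "${PROMETHEUS_PORT:-9090}"
}

prometheus_port="$(resolve_prometheus_port)"
container_name="prometheus-${prometheus_port}"
```

With nothing set, the script always targets `prometheus-9090`; running it with `PROMETHEUS_PORT=19090` would start a second, separately named instance.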
If the goal is for users to keep their experiment results, the data also needs to be persisted on the host.
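One possible way to keep the data outside the container is a named Docker volume, sketched below. The helper name `prometheus_run_args` and the volume name `prometheus-data` are assumptions for illustration, and the sketch assumes the `prom/prometheus` image stores its data under `/prometheus`; the PR itself may handle this differently.

```shell
# Hypothetical sketch: build `docker run` arguments that mount a named
# volume, so the stored data survives removal of the container.
# Assumes the prom/prometheus image keeps its data under /prometheus.
prometheus_run_args() {
  local port="$1"
  local volume="${2:-prometheus-data}"   # assumed volume name
  echo "--detach --name prometheus-${port}" \
       "--publish ${port}:9090" \
       "--volume ${volume}:/prometheus" \
       "prom/prometheus"
}

# Usage (word splitting of the output is intentional):
#   docker run $(prometheus_run_args 9090)
```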
docker/start_prometheus.sh
Outdated
function stop() {
  info "Stop prometheus"
  docker stop "$container_name"
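Following up on the above: once a user specifies a custom port, they must also specify that port when calling `stop` so that the correct container is shut down. Is there a way to make this foolproof?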
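One conceivable safeguard is to stop a container only when the target is unambiguous. The sketch below is an assumption-laden illustration, not the PR's code: the function name `stop_prometheus` is hypothetical, and it relies on the `prometheus-<port>` naming shown in the diff together with `docker ps --filter` and `--format`.

```shell
# Hypothetical sketch: refuse to guess which container to stop.
stop_prometheus() {
  local port="${1:-}"
  local names count
  if [ -n "$port" ]; then
    # Explicit port: stop exactly that container.
    docker stop "prometheus-${port}"
    return
  fi
  # No port given: list running containers whose name contains "prometheus-".
  names="$(docker ps --filter "name=prometheus-" --format '{{.Names}}')"
  count="$(printf '%s\n' "$names" | grep -c . || true)"
  if [ "$count" -eq 1 ]; then
    docker stop "$names"
  else
    echo "Found $count Prometheus containers; please specify a port." >&2
    return 1
  fi
}
```

With exactly one running instance, `stop_prometheus` needs no argument; with zero or several, it stops nothing and asks for a port.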
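Then maybe I'll just drop this feature. We don't really need to stop the containers for users, and sometimes it is better not to automate risky actions like this too much.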
Context: #10
This PR implements or improves the scripts for the following tools
Tested on our cluster; it works well.
Dashboard source here.
Demo