This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
tf-operator version: v1.2.1
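We found that some TFJobs took a long time to go from creation to the start of scheduling, up to 8 minutes. Our preliminary conclusion is that tf-operator was given too little CPU, so it could not consume the workqueue fast enough (400+ TFJobs per day, 3800+ pods). After we increased the CPU and threadiness, things now look roughly normal.
We therefore suggest adding the client-go workqueue metrics to the tf-operator metrics. This would make it easier to analyze the cause of the latency and to confirm that the problem is fixed, and it would give users a basis for tuning CPU and threadiness.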
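To illustrate the kind of signal we are asking for, here is a minimal stdlib-only sketch, not tf-operator or client-go code. `InstrumentedQueue` and its methods are hypothetical names invented for this example. It tracks the three values that would have helped us diagnose the delay: queue depth, total adds, and how long a key waited before a worker picked it up. As far as we know, client-go reports comparable values through the `MetricsProvider` interface registered with `workqueue.SetProvider`, so the real change would wire that into the operator's existing metrics instead of reimplementing the queue.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is a queued key plus the time it was enqueued.
type entry struct {
	key      string
	enqueued time.Time
}

// InstrumentedQueue is a hypothetical FIFO queue that records
// depth, total adds, and the longest time any key waited in the queue.
type InstrumentedQueue struct {
	mu         sync.Mutex
	entries    []entry
	adds       int
	maxLatency time.Duration
}

func NewInstrumentedQueue() *InstrumentedQueue {
	return &InstrumentedQueue{}
}

// Add enqueues a key and counts it.
func (q *InstrumentedQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.entries = append(q.entries, entry{key: key, enqueued: time.Now()})
	q.adds++
}

// Get dequeues the oldest key and updates the max queue latency.
// It returns false when the queue is empty.
func (q *InstrumentedQueue) Get() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.entries) == 0 {
		return "", false
	}
	e := q.entries[0]
	q.entries = q.entries[1:]
	if waited := time.Since(e.enqueued); waited > q.maxLatency {
		q.maxLatency = waited
	}
	return e.key, true
}

// Depth is the number of keys still waiting.
func (q *InstrumentedQueue) Depth() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.entries)
}

// Adds is the total number of keys ever enqueued.
func (q *InstrumentedQueue) Adds() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.adds
}

// MaxQueueLatency is the longest enqueue-to-dequeue wait seen so far.
func (q *InstrumentedQueue) MaxQueueLatency() time.Duration {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.maxLatency
}

func main() {
	q := NewInstrumentedQueue()
	for i := 0; i < 5; i++ {
		q.Add(fmt.Sprintf("default/tfjob-%d", i))
	}
	// A single slow worker: later keys wait longer, so latency grows.
	for {
		key, ok := q.Get()
		if !ok {
			break
		}
		time.Sleep(10 * time.Millisecond) // simulated reconcile work
		fmt.Printf("processed %s depth=%d maxQueueLatency=%v\n",
			key, q.Depth(), q.MaxQueueLatency())
	}
	fmt.Printf("adds=%d\n", q.Adds())
}
```

With a single slow worker, the maximum queue latency grows with each key while depth drains slowly. That is the pattern we would expect an under-provisioned operator to show, and more CPU or a higher threadiness should flatten it.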