"IsAlive" for drivers goes red after a while #279
Comments
By the way, I scaled up to 32 drivers and the controller is now losing contact much more quickly. Within a few hours all the "IsAlive" indicators are red. I have log_level=DEBUG on the controller and all drivers, but nothing is reported in any of the logs to indicate what's wrong. The drivers still respond to requests from my browser, too. I am running the drivers in Docker, and they are launched in such a way that port 18088 inside the container is not necessarily seen as 18088 from the outside. I am adjusting the configs accordingly and, as I said, everything runs fine for some period of time after restarting the controller.
Do you have http_proxy set before running cosbench? It's suggested to "unset http_proxy" before running cosbench.
We're running cosbench controller and drivers inside docker containers, on a Mesos cluster, via Marathon. Definitely there is some network address and port mapping due to docker. I prefer not to eliminate Docker from our test setup though. Mesos is how we run our production services. I was hoping that raising the logging level to DEBUG would tell me more about the failure but there is no logging implemented in that part of the code base.
Specifically answering your question: no, we do not have http_proxy set before running cosbench.
So could you list the information you expect to see from DEBUG level?
Well, for example, information to assist with debugging. On the controller side
and logging from the driver side
I'm working on this myself. As far as I can see the controller's "ping" is just connecting to the driver hostnames and ports one at a time. I'm logging the attempt and, if an exception is thrown, then I log the exception.
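To illustrate the kind of logging described above, here is a minimal sketch in Python. It is not the COSBench code (which is Java); the names `ping_driver` and `ping_all`, the logger name, and the timeout value are all hypothetical. It assumes the ping amounts to a plain TCP connect to each driver's host and port, logs every attempt, and logs the exception when the connect fails.

```python
import logging
import socket

# Hypothetical logger name; not taken from COSBench.
log = logging.getLogger("ping_sketch")


def ping_driver(host: str, port: int, timeout: float = 2.0) -> bool:
    """Try a TCP connect to one driver, logging the attempt and any failure."""
    log.debug("pinging driver at %s:%d", host, port)
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            log.debug("driver at %s:%d is reachable", host, port)
            return True
    except OSError as exc:
        # Record the exception so a red "IsAlive" leaves a trace in the log.
        log.warning("driver at %s:%d is unreachable: %r", host, port, exc)
        return False


def ping_all(drivers: dict) -> dict:
    """Ping drivers one at a time; `drivers` maps a name to (host, port)."""
    return {name: ping_driver(host, port) for name, (host, port) in drivers.items()}
```

With Docker port mapping, the host and port passed in would need to be the externally visible ones, not the 18088 seen inside the container.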
Thanks for your work, feel free to contribute your code. -yaguang
Signed-off-by: ywang19 <yaguang.wang@intel.com>
See the upcoming 0.4.2.c3 release for the fix.
Signed-off-by: ywang19 <yaguang.wang@intel.com>
As the heading says: we are running the controller and drivers as Docker containers on OpenStack VMs. Overnight the entire system is usually idle (because we are just ramping up with cosbench, trying to understand the system, learning how to construct synthetic workloads, and so on). By morning the controller shows all its drivers as "red". There is no indication of any error in the logs on either the controller or the drivers. We can recover by restarting the controller, but it seems something is wrong.