webots: blinking vgl, some times #172

Closed
juju2013 opened this issue Jul 11, 2021 · 10 comments

Comments

@juju2013

Application: webots R2021a (https://www.cyberbotics.com/)
VirtualGL: 2.6.90
TurboVNC: 2.2.6
Container OS: Ubuntu 18.04.5 LTS
Host OS: Centos 7
GPU: Tesla M2090, driver version: 390.116

When Webots runs simulations, some run fine, but in others the main display (VGL window) blinks. Screen-recorded example: https://video.fbxl.net/videos/watch/3389fa83-c31e-4cc4-a861-805e94b16a59

By trial and error, it looks like that weird behavior is linked to some "robots" (PROTO nodes/robots), but not all of them. In the same simulation, adding some robots causes blinking; removing them and adding other robots makes the simulation run OK.

Here's the trace output of an (almost) empty simulation: http://dl.free.fr/lH4IZIt0F

The same plus a Boston Dynamics dog (the one in the screen recording): http://dl.free.fr/wl8bbqSMD

Command to launch webots: vglrun +v -d /dev/dri/card1 webots

Docker compose file:

webots:
  container_name: webots
  shm_size: 2G
  image: webots
  runtime: nvidia
  privileged: true
  environment:
    - NVIDIA_VISIBLE_DEVICES=all
  networks:
    dockernet:
      ipv4_address: 10.99.0.33
  volumes:
    - /home/xxx/webots/:/data/
    - ./init.sh:/init.sh
    - ./xorg.conf:/etc/X11/xorg.conf
  devices:
    - "/dev/dri"
    - "/dev/vga_arbiter"
    - "/dev/nvidia0"
    - "/dev/nvidia1"
    - "/dev/nvidiactl"
    - "/dev/nvidia-modeset"
    - "/dev/nvidia-uvm"
    - "/dev/nvidia-uvm-tools"
    - "/dev/fb0"
  command:
    /init.sh

@dcommander
Member

Reproduced. Investigating. This issue is specific to the EGL back end, which is not surprising.

@dcommander
Member

In the process of investigating this issue, I discovered several other issues with the EGL back end and fixed those, but I have yet to find the cause of this issue. Since I have very limited resources to work on the EGL back end, it may take a while for this issue to get fixed. Please be patient.

@dcommander
Member

Thanks to piglit, I have discovered and fixed numerous conformance issues in the EGL back end, but this issue still eludes me. Either it is somehow related to rarely-used OpenGL and GLX features that are still missing in the EGL back end (see #134 and #136) or it is yet another conformance issue that hasn't been uncovered yet. Unfortunately I must declare defeat for now.

@LeehanLee

I found that the main display (VGL window) seems to blink only if I launch a robot whose controller includes webots/camera.h and uses the camera-related API.
image
image

blinking1.mp4

If I comment out this camera-related code and rebuild the project, the blinking disappears:

non-blinking.mp4

I don't know how to solve this issue.

@dcommander
Member

@LeehanLee That is a good clue. I reproduced the issue, and it is almost certainly a bug in VGL, but I have thus far been unable to find the bug. Now that I know that it is specific to one mode of operation, I can hopefully look at the application source code and figure out what that mode of operation does at the OpenGL level.

@dcommander
Member

I have spent more hours trying to diagnose this, including comparing the apitrace output with and without cameras enabled. Unfortunately I am still at a loss.

@LeehanLee

LeehanLee commented Jun 18, 2022

I found the description of the Webots function "wb_camera_enable" here:
https://www.cyberbotics.com/doc/reference/camera#wb_camera_enable
image

As you can see from the screenshot above, I changed the second parameter, "sampling_period", from "2 * time_step" to "50 * time_step", and the blinking in the rendering area then slowed down:

slowly_blinking.mp4

I'm not sure whether this helps you diagnose the issue.

@dcommander
Member

That is a good clue. I’ll see if I can find where in the code it copies the image.

@dcommander
Member

I am now able to build Webots from source and get an OpenGL API trace from it, both with and without cameras enabled. Unfortunately, it hasn't revealed any obvious issues, so I am still clueless. It still isn't clear exactly how enabling cameras changes the OpenGL call sequence. I have tried to add print statements to the code to understand the mechanism by which that happens, but so far it hasn't been revealing. Unfortunately I have to shelve this yet again, as I don't have any more time right now to pursue it.

@dcommander
Member

Ouch, that was difficult. It ultimately took more than 60 uncompensated hours to diagnose the problem, and I still cannot figure out how to reproduce it in isolation (using fakerut.) However, the following patch seems to fix it:

--- a/server/backend.cpp
+++ b/server/backend.cpp
@@ -73,25 +73,32 @@ static FakePbuffer *getCurrentFakePbuffer(EGLint readdraw)
 void bindFramebuffer(GLenum target, GLuint framebuffer, bool ext)
 {
 	#ifdef EGLBACKEND
+	const GLenum *oldDrawBufs = NULL;  GLsizei nDrawBufs = 0;
+	GLenum oldReadBuf = GL_NONE;
+	FakePbuffer *drawpb = NULL, *readpb = NULL;
+
 	if(fconfig.egl)
 	{
 		if(framebuffer == 0)
 		{
 			if(target == GL_DRAW_FRAMEBUFFER || target == GL_FRAMEBUFFER)
 			{
-				FakePbuffer *pb = pbhashegl.find(getCurrentDrawableEGL());
-				if(pb)
+				drawpb = pbhashegl.find(getCurrentDrawableEGL());
+				if(drawpb)
 				{
-					framebuffer = pb->getFBO();
+					oldDrawBufs =
+						ctxhashegl.getDrawBuffers(_eglGetCurrentContext(), nDrawBufs);
+					framebuffer = drawpb->getFBO();
 					ctxhashegl.setDrawFBO(_eglGetCurrentContext(), 0);
 				}
 			}
 			if(target == GL_READ_FRAMEBUFFER || target == GL_FRAMEBUFFER)
 			{
-				FakePbuffer *pb = pbhashegl.find(getCurrentReadDrawableEGL());
-				if(pb)
+				readpb = pbhashegl.find(getCurrentReadDrawableEGL());
+				if(readpb)
 				{
-					framebuffer = pb->getFBO();
+					oldReadBuf = ctxhashegl.getReadBuffer(_eglGetCurrentContext());
+					framebuffer = readpb->getFBO();
 					ctxhashegl.setReadFBO(_eglGetCurrentContext(), 0);
 				}
 			}
@@ -107,6 +114,20 @@ void bindFramebuffer(GLenum target, GLuint framebuffer, bool ext)
 	#endif
 	if(ext) _glBindFramebufferEXT(target, framebuffer);
 	else _glBindFramebuffer(target, framebuffer);
+	#ifdef EGLBACKEND
+	if(fconfig.egl)
+	{
+		if(oldDrawBufs)
+		{
+			if(nDrawBufs == 1)
+				drawpb->setDrawBuffer(oldDrawBufs[0], false);
+			else if(nDrawBufs > 0)
+				drawpb->setDrawBuffers(nDrawBufs, oldDrawBufs, false);
+			delete [] oldDrawBufs;
+		}
+		if(oldReadBuf) readpb->setReadBuffer(oldReadBuf, false);
+	}
+	#endif
 }

In a nutshell, the complexity of the Webots camera rendering code exposed a really esoteric aspect of FBO behavior, which is that the draw and read buffer state is attached to the FBO state, but when using the default framebuffer, the draw and read buffer state should be attached to the context instead. I already emulated that behavior in the EGL back end glXMake*Current() functions but didn't realize that I also needed to emulate it in the EGL back end implementation of glBindFramebuffer(..., 0). Ugh. Did I mention how much simpler this would be if EGL supported a "multiview" Pbuffer extension similar to EGL_EXT_multiview_window? I tried to get nVidia on board with that several years ago, but no dice. Thus, here I am with hundreds of unpaid hours invested in the EGL back end, with barely enough money in the General Fund to maintain it, much less divert project resources for weeks to track down complicated bugs with it.

I am going to do some regression testing tomorrow, and I should be able to push this patch by the end of the day.
