"Resource temporarily unavailable" on NVIDIA Jetson TX2 #6780

Closed
rcroset opened this issue Jul 8, 2020 · 91 comments

@rcroset

rcroset commented Jul 8, 2020

Required Info
Camera Model: D400
Operating System & Version: Ubuntu 18.04.4
Kernel Version (Linux Only): 4.9.140-tegra
Platform: NVIDIA Jetson TX2
SDK Version: 3.35.2
Language: python
Segment: other

Issue Description

I'm currently working on a system involving 3 D435 cameras connected to an NVIDIA Jetson TX2 platform via a USB3 hub. Quite often when trying to access the cameras, I get a "Resource temporarily unavailable" error from one of the cameras, and I can no longer access the device that throws this error, even in the RealSense Viewer. It happens quite randomly, on any of the cameras. To get things back on track, I have to shut down the TX2 platform and reconnect the USB hub manually. Is there any way to do this (programmatically or otherwise) without having to manually reconnect things? Since this system will soon go to production, it won't be easy to fix things manually. Any help or suggestions on how to get rid of this issue? Many thanks :)

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jul 8, 2020

Hi @rcroset, you can add a hardware_reset() call to your script to perform an automated hardware reset of the camera, which has a similar effect to physically unplugging and re-plugging the camera in the USB port. The link below has example Python scripting for doing so with multiple cameras using the camera serial numbers:

#5428
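
As a rough illustration of the reset routine described above (not the script from the linked case), a minimal sketch could look like the following. It assumes the pyrealsense2 calls `query_devices()`, `get_info()` and `hardware_reset()`; the function name `reset_all_devices` and the 5-second settle time are hypothetical choices, not values from the SDK.

```python
import time


def reset_all_devices(ctx, serial_info_key, settle_seconds=5.0):
    """Call hardware_reset() on every device the context can enumerate.

    ctx             -- an rs.context() instance from pyrealsense2
    serial_info_key -- rs.camera_info.serial_number
    settle_seconds  -- assumed wait for re-enumeration (hypothetical value)
    Returns the serial numbers of the devices that were reset.
    """
    serials = []
    for dev in ctx.query_devices():
        serials.append(dev.get_info(serial_info_key))
        dev.hardware_reset()
    if serials:
        # The cameras drop off the bus and re-enumerate; give them time.
        time.sleep(settle_seconds)
    return serials


# Usage on a machine with librealsense and cameras attached:
#   import pyrealsense2 as rs
#   print(reset_all_devices(rs.context(), rs.camera_info.serial_number))
```

This only helps while the device can still be enumerated and opened; it cannot reach a camera that the context is unable to access.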

If you are able to check the CPU usage of your TX2, what percentage of the CPU is being used when your project is running, please? Is it "maxing out" at or near 100% usage?

If you are using poll_for_frames() in your project (as is recommended for multicam applications) then it is important to control when the CPU is put to sleep and for how long, otherwise the CPU can max-out its processing. More information about this can be found in the link below.

#2422 (comment)
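
To make the point about sleeping concrete, here is a hedged sketch of a multi-camera polling loop that yields the CPU between passes. The helper name `poll_all`, the 1 ms sleep and the 1 s timeout are assumptions for illustration and are not taken from the linked comment; testing the result with `size()` is also an assumption about how an empty result from `poll_for_frames()` is best detected.

```python
import time


def poll_all(pipelines, idle_sleep=0.001, timeout=1.0):
    """Poll every pipeline until each has delivered a frameset or time runs out.

    pipelines -- dict mapping a camera serial number to a started rs.pipeline
    Returns a dict serial -> frameset for the cameras that delivered in time.
    """
    got = {}
    deadline = time.monotonic() + timeout
    while len(got) < len(pipelines) and time.monotonic() < deadline:
        for serial, pipe in pipelines.items():
            if serial in got:
                continue
            frameset = pipe.poll_for_frames()  # returns immediately
            if frameset.size() > 0:
                got[serial] = frameset
        if len(got) < len(pipelines):
            # Sleep briefly instead of spinning, so the loop does not
            # consume a whole CPU core while waiting for frames.
            time.sleep(idle_sleep)
    return got
```

A camera missing from the returned dict did not deliver within the timeout, which the caller can log or treat as a fault.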

@rcroset
Author

rcroset commented Jul 9, 2020

I unfortunately can't perform the hardware_reset() routine because I cannot access the device. When I want to query a device from the context, I get the error (see below), so I cannot call any routine on any devices:

>>> ctx = rs2.context()
>>> list = ctx.query_devices()
>>> list[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: set_xu(ctrl=1) failed! Last Error: Resource temporarily unavailable

The CPU usage of the TX2 is quite high (but still not near 100%) when running my project, but even when running the snippet above the error occurs.
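
For anyone reproducing this, the failing device can be isolated without aborting the whole enumeration. The sketch below wraps each device access in a try/except, assuming (as in the traceback above) that the error surfaces as a RuntimeError when the device is indexed; `probe_devices` is a hypothetical helper name.

```python
def probe_devices(ctx, serial_info_key):
    """Try each enumerated device and separate usable ones from failing ones.

    ctx             -- an rs.context() instance from pyrealsense2
    serial_info_key -- rs.camera_info.serial_number
    Returns (usable_serials, errors), where errors is a list of
    (index, message) pairs for devices that raised RuntimeError.
    """
    devices = ctx.query_devices()
    usable, errors = [], []
    for index in range(len(devices)):
        try:
            dev = devices[index]
            usable.append(dev.get_info(serial_info_key))
        except RuntimeError as exc:
            errors.append((index, str(exc)))
    return usable, errors
```

This does not fix the unavailable device, but it lets a program keep using the cameras that still respond and report which index failed.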

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jul 9, 2020

@tispratik had a similar situation with this set_xu(ctrl=1) failed! Last Error: Resource temporarily unavailable error when using Ubuntu and Python, and doing a hardware_reset() did not correct the problem for them either.

#6132

@rcroset
Author

rcroset commented Jul 9, 2020

Thank you for pointing out this link. Unfortunately, it doesn't seem to provide a solution. Rebooting the system is not enough in our case.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jul 9, 2020

In past cases where doing a hardware_reset() on the camera has not been practical due to detection problems and you cannot reset the computer, there has been the possibility of achieving the same effect by using a USB port reset script with Ubuntu. Please google for ubuntu usb port reset script for more details.
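
One common form of such a script issues the USBDEVFS_RESET ioctl on the device node. The sketch below is a generic Linux example, not something specific to RealSense; the bus and device numbers are assumed to come from `lsusb`, and the helper names are hypothetical.

```python
import fcntl
import os

# _IO('U', 20) from <linux/usbdevice_fs.h>
USBDEVFS_RESET = (ord("U") << 8) | 20


def usb_node(bus, device):
    """Build the usbfs path for a bus/device pair as listed by lsusb."""
    return "/dev/bus/usb/{:03d}/{:03d}".format(bus, device)


def reset_usb_device(bus, device):
    """Ask the kernel to reset one USB device (usually requires root)."""
    fd = os.open(usb_node(bus, device), os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)
```

Resetting the hub's own node rather than a single camera is another variation, though whether either recovers a camera in this state is not guaranteed.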

@rcroset
Author

rcroset commented Jul 9, 2020

I've already tried that kind of script, unsuccessfully. It appears that only a manual reconnection works.

@MartyG-RealSense
Collaborator

Thanks for your patience. I have been considering further possibilities. To aid that analysis, could you tell me please if the project is being used outdoors, with the cameras exposed to sunlight?

@rcroset
Author

rcroset commented Jul 9, 2020

The project is used indoors. There is a small window providing sunlight but the cameras are not directly exposed (i.e. they are on the same wall as the window).

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jul 9, 2020

Okay, thank you very much. My reading of your situation is that 3 cameras are attached to one TX2 board. And when one camera stops responding, the other two remain accessible. Is that correct, please?

What FPS speed are the cameras running at?

@rcroset
Author

rcroset commented Jul 9, 2020

The 3 cameras are actually connected to a USB3 hub connected to the TX2 board. When a camera stops responding, it also shuts down the cameras plugged in below it on the hub (e.g. when the camera plugged into USB slot 2 of the hub stops responding, it also affects the one plugged into slot 3). So when the first camera stops responding, it blocks all cameras.

The cameras are running at 6 FPS with maximum resolution for depth and color stream.
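
For a sense of the USB load this implies, here is a back-of-the-envelope estimate. It assumes "maximum resolution" means 1280x720 depth and 1920x1080 color, that both travel over USB at 2 bytes per pixel (Z16 and YUY2), and it ignores protocol overhead; these are assumptions, not measurements.

```python
def stream_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Raw payload rate of one uncompressed video stream."""
    return width * height * bytes_per_pixel * fps


def total_megabits_per_second(cameras, fps):
    """Payload for N cameras, each streaming depth and color (assumed modes)."""
    depth = stream_bytes_per_second(1280, 720, 2, fps)   # assumed Z16 depth
    color = stream_bytes_per_second(1920, 1080, 2, fps)  # assumed YUY2 color
    return cameras * (depth + color) * 8 / 1e6


for fps in (6, 15, 30):
    print(fps, "FPS:", round(total_megabits_per_second(3, fps)), "Mbit/s")
```

Under these assumptions, three cameras need roughly 0.86 Gbit/s of payload at 6 FPS and about 2.16 Gbit/s at 15 FPS, all of it sharing the hub's single upstream link.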

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jul 9, 2020

6 FPS is a speed that is prone to errors that disappear at 15 FPS or higher. Would it be possible to try 15 FPS and see if you continue to experience the problems, please?

@rcroset
Author

rcroset commented Jul 10, 2020

Ok I'll try this and I'll get back to you in a few days to inform you if the problem comes back.

How is it possible that a low FPS introduces errors that disappear at a higher frame rate?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jul 10, 2020

I believe it is because when the frames are updating at a slow rate, it can lead to timeouts while waiting for frames to arrive.

@rcroset
Author

rcroset commented Jul 16, 2020

We've been running at 15 FPS for almost one week and this problem hasn't occurred (yet). But a new problem has appeared: the depth frames no longer seem to update regularly. They update only every minute or so. I've tried using wait_for_frames and poll_for_frames and nothing changes. When I switch back to 6 FPS, everything is back to normal. How is this possible?

@MartyG-RealSense
Collaborator

As you are using Python, how you are storing the frames could be a factor. Like in the case below:

#946

@rcroset
Author

rcroset commented Jul 16, 2020

Thanks for this link. However, I am not storing the frames in an array. All post-processing is done on a copy of the frame, and the variable containing the original frame is overwritten at each iteration of the loop.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jul 16, 2020

My understanding is that when a frame is modified (for example, by a post-processing filter), the original is not destroyed but instead a copy of the frame is automatically created. Every frame has a counter stating how many copies of the same frame are being held in the frameset (which is a collection of frame objects).

Old frames get pushed out when a new frame enters the frame queue. wait_for_frames() pulls a frame from the queue.

@rcroset
Author

rcroset commented Jul 16, 2020

Indeed, but the original is destroyed and overwritten by the return value of wait_for_frames. How can I access this copy counter?

Is it possible to have an example on how to properly use wait_for_frames or poll_for_frames for multicam in Python? I can't find any.

@MartyG-RealSense
Collaborator

Remember from earlier in this case that for multicam projects, it is recommended that poll_for_frames() is used instead of wait_for_frames(). This includes programs that do not use multicam hardware sync, such as rs-multicam.

#6780 (comment)

I could not find a short and neat example of multicam use of poll_for_frames() in Python, though a device manager script in the multi-camera box_dimensioner_multicam example program makes use of it:

https://github.com/IntelRealSense/librealsense/blob/master/wrappers/python/examples/box_dimensioner_multicam/realsense_device_manager.py#L208

You can read the frame counter from metadata with Python, though I'm not sure that is what you were asking for.

#3179 (comment)
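
If the goal is to confirm whether depth frames have really stopped advancing, comparing frame numbers between iterations may be enough. The sketch below is an illustrative helper (the class name `StaleFrameDetector` is hypothetical) that relies only on `get_frame_number()` of a pyrealsense2 frame.

```python
class StaleFrameDetector:
    """Counts consecutive updates whose frame number did not advance."""

    def __init__(self):
        self.last_number = None
        self.repeats = 0

    def update(self, frame):
        """Record one frame and return how many times in a row it repeated."""
        number = frame.get_frame_number()
        if number == self.last_number:
            self.repeats += 1
        else:
            self.repeats = 0
            self.last_number = number
        return self.repeats


# Usage inside a capture loop, with one detector per camera:
#   detector = StaleFrameDetector()
#   depth = frames.get_depth_frame()
#   if detector.update(depth) > 30:   # threshold is an arbitrary example
#       print("depth stream appears stuck")
```

A count that keeps rising while the color frame number advances would suggest that the same depth frame is being delivered repeatedly rather than a display problem.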

@rcroset
Author

rcroset commented Jul 16, 2020

I remember, thanks :)
I tried to do the same thing as in the box_dimensioner_multicam example, but still, the depth frames do not update. The color frames are fine. And when switching back to 6 FPS, the problem disappears.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jul 16, 2020

I apologise for the limitations of my Python programming knowledge, which may be slowing this diagnostic process.

Some Python users who have multiple streams active have found that it can help to separate the stream types into different pipelines. The Python script in the link below separates RGB / depth in one pipeline and IMU in the other. Whilst you are not using IMU, conceivably you could put RGB in one pipeline and depth in the other pipeline.

#6773 (comment)

@rcroset
Author

rcroset commented Jul 16, 2020

Thanks for the link and the idea but I'm not sure this will suit our project. We need to gather color frames and the corresponding depth frames to be able to look at them at the same time. Will those two pipelines be perfectly synchronous with each other?

@MartyG-RealSense
Collaborator

I would think so. Librealsense can freely pass data between threads for purposes such as having a different processing pipeline for each stream type.

https://dev.intelrealsense.com/docs/frame-management#section-frames-and-threads

@rcroset
Author

rcroset commented Jul 16, 2020

Thanks, I will try this. Do you recommend continuing to use pipelines or switching to something else like a syncer?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jul 16, 2020

My understanding is that syncer is useful for synchronizing between different streams and between different devices (e.g all sensors across all cameras). So it may be appropriate for your three-camera project.

@MartyG-RealSense
Collaborator

Hi @rcroset Do you have an update that you can provide about this case, please? Thanks!

@rcroset
Author

rcroset commented Sep 9, 2020

Hi @MartyG-RealSense, we still haven't tried your idea, and we have issues with the cameras disconnecting by themselves, forcing us to reboot the platform every hour, so things are moving slowly. I'll come back to you as soon as possible.

@MartyG-RealSense
Collaborator

Thanks @rcroset for the update - good luck with your work.

@rcroset
Author

rcroset commented Sep 10, 2020

Hi @MartyG-RealSense! We still haven't tried your idea (increasing the frame queue size) but we noticed something strange during our investigations... Just before the cameras die and crash the whole USB system (which forces us to reboot to regain access to the cameras), it seems that librealsense calls USBDEVFS_CLEAR_HALT (similar dmesg messages as mentioned in issue #6123). Recall that the cameras might run perfectly fine for about one hour before crashing.
Any suggestions?
FYI, we recompiled the library with the CMake flag FORCE_RSUSB_BACKEND set to True, as suggested in issue #6123.

@MartyG-RealSense
Collaborator

Does your program check for events during long runs? The RSUSB method has the potential to miss events because it only checks for device changes every 5 seconds by default.

#6921 (comment)

@rcroset
Author

rcroset commented Sep 16, 2020

Hi @MartyG-RealSense! Sorry for the long delay in answering. We don't perform checking of events. We'll soon try to do as suggested in the issue you pointed to.
In the meantime, I have another question. It seems that those issues mainly come from the USB traffic and the USB bus having trouble handling frames coming from 3 cameras at the same time. In our project, all cameras need to stream together but only for a given amount of time; they can then be stopped, but should be able to start again quite quickly. For now we let them stream and just drop the frames when we don't need them. To reduce the amount of USB traffic while we don't need the frames, we are looking for a way to pause the pipelines (i.e. keep the cameras alive but not streaming over USB) without having to perform auto-exposure again. We can't just stop the pipeline and start it again later because we need the cameras to be reactive and the auto-exposure takes too much time. Is there a way to do so? I've tried looking at the documentation and examples but didn't find anything.
(Sorry if this is a little off-topic for this issue)

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Sep 16, 2020

Hi @rcroset, what some projects do to keep the cameras running in 'low power' mode when capture is not currently required is to set the Laser Power value to zero. Doing so turns off the projector but keeps the pipeline active. When a capture needs to be performed, the Laser Power value is increased above zero, activating the projector. When Laser Power is minimized, the depth image will be sparse in detail. When the capture is completed, Laser Power is set to zero again.
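
As a sketch of how the Laser Power toggle might be scripted in Python (the helper name `set_laser_power` is hypothetical, and the clamping to the reported range is my own addition), assuming the pyrealsense2 sensor methods `supports()`, `get_option_range()` and `set_option()`:

```python
def set_laser_power(depth_sensor, laser_power_option, value):
    """Set laser power if the sensor supports the option.

    depth_sensor       -- e.g. profile.get_device().first_depth_sensor()
    laser_power_option -- rs.option.laser_power
    Returns the value actually applied, or None if unsupported.
    """
    if not depth_sensor.supports(laser_power_option):
        return None
    limits = depth_sensor.get_option_range(laser_power_option)
    # Clamp to the range the camera reports so set_option() is not
    # given an out-of-range value.
    clamped = max(limits.min, min(limits.max, value))
    depth_sensor.set_option(laser_power_option, clamped)
    return clamped


# Usage with a started pipeline:
#   import pyrealsense2 as rs
#   sensor = profile.get_device().first_depth_sensor()
#   set_laser_power(sensor, rs.option.laser_power, 0)    # idle
#   set_laser_power(sensor, rs.option.laser_power, 150)  # example value
```

The pipeline stays running throughout, so frames keep arriving while the projector is off.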

A way to have fine-control over timing and camera triggering that is compatible with D435 is external synchronization (genlock).

https://www.intelrealsense.com/depth-camera-external-sync-trigger/

https://dev.intelrealsense.com/docs/external-synchronization-of-intel-realsense-depth-cameras

@rcroset
Author

rcroset commented Sep 17, 2020

Thanks for your answer.
So if I understood correctly, when the projector is off, no frames (color and depth) are sent through the USB cables?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Sep 17, 2020

The projector is a component that is separate from the imagers. It can enhance the image by providing light and a dot-pattern projection that the camera can use as a texture source to perform depth analysis of surfaces that have low texture or no texture (doors, walls, desks, etc). The streams will still be active if the projector is turned off. By having Laser Power minimized, you can reduce the camera's power draw and lower the operating temperature during periods of low / no activity between capture periods.

@MartyG-RealSense
Collaborator

Hi @rcroset Do you still require assistance with this case, please? Thanks!

@rcroset
Author

rcroset commented Sep 28, 2020

Hi @MartyG-RealSense! Yes, we still have issues with the cameras disconnecting for no apparent reason and killing the whole USB bus. We need to fix this before going any further with this issue. I'll come back to you as soon as we have some results.

Btw if you have any hints for those cameras disconnecting, we'll be happy to read about them ;)

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Sep 28, 2020

SDK 2.35.2 was the version where improvements to the handling of multicam were introduced. This was during a period where there were a number of cases where Jetson boards were having non-detection problems with more than one camera (with the failed to set power state error particularly), so this SDK version is a good choice for Jetson multicam.

The improvements mainly address problems related to rs2::pipeline though, and Jetson issues related to specific models of USB hub may still occur. A brand of mains-powered USB 3 hub that Intel have successfully tested with when developing their multiple camera white-paper document is AmazonBasics. I have one myself on my workstation and have no problems with it.

@rcroset
Author

rcroset commented Sep 28, 2020

Great, thanks for your answer! We'll try to upgrade the SDK version as soon as possible. I'll keep you posted.

@MartyG-RealSense
Collaborator

Hi @rcroset Do you have an update for us please? Thanks!

@rcroset
Author

rcroset commented Oct 9, 2020

Hi @MartyG-RealSense Not yet, we still have some other issues to solve before going any further

@MartyG-RealSense
Collaborator

Okay, thanks very much @rcroset for the update. I will keep this case open for a further time period.

@MartyG-RealSense
Collaborator

Adding a note to keep this case open for a further time period.

@MartyG-RealSense
Collaborator

Adding a note to keep this case open for a further period.

@MartyG-RealSense
Collaborator

Adding a note to keep this case open for a further period.

@MartyG-RealSense
Collaborator

Adding a note to keep this case open for a further period.

@MartyG-RealSense
Collaborator

Hi @rcroset Do you have an update about whether you are ready to proceed with the subject on this case, please? Thanks!

@rcroset
Author

rcroset commented Nov 24, 2020

Hi @MartyG-RealSense. Not yet, sorry. But we have observed that the cameras crash less often when the infrared stream is also enabled. Unfortunately, we cannot investigate further yet as we have some other things to take care of first. Sorry for the long delay.

@MartyG-RealSense
Collaborator

No problem @rcroset I totally understand - thanks for the continued updates.

@MartyG-RealSense
Collaborator

Adding a note to keep this case open for a further time period.

@MartyG-RealSense
Collaborator

Adding a note to keep this case open for a further time period.

@MartyG-RealSense
Collaborator

Adding a note to keep this case open for a further time period.

@MartyG-RealSense
Collaborator

Adding a note to keep this case open for a further time period.

@MartyG-RealSense
Collaborator

Hi @rcroset Do you have an update that you can provide, please? Thanks!

@MartyG-RealSense
Collaborator

Case closed due to no further comments received.
