Commit

edits
freddyaboulton committed Sep 26, 2024
1 parent b14d30c commit 7504773
Showing 1 changed file with 23 additions and 30 deletions.
53 changes: 23 additions & 30 deletions guides/07_streaming/02_object-detection-from-webcam.md
@@ -2,15 +2,13 @@

Tags: VISION, STREAMING, WEBCAM

In this guide we'll use Yolo-v10 to do real time object detection in Gradio from a user's webcam feed.
Along the way, we'll be using the latest streaming features introduced in Gradio 5.0. You can see the finished product in action below:
In this guide, we'll use YOLOv10 to perform real-time object detection in Gradio from a user's webcam feed. We'll utilize the latest streaming features introduced in Gradio 5.0. You can see the finished product in action below:

![WebRTC Object Detection Demo](https://github.com/user-attachments/assets/4584cec6-8c1a-401b-9b61-a4fe0718b558)


## Setting up

We're going to start by installing all the dependencies. Add the following lines to a `requirements.txt` file and run `pip install -r requirements.txt`:
Start by installing all the dependencies. Add the following lines to a `requirements.txt` file and run `pip install -r requirements.txt`:

```bash
opencv-python
@@ -20,20 +18,19 @@ gradio-webrtc
onnxruntime-gpu
```

We'll use the ONNX runtime to speed up YoloV10 inference. This guide will assume you have access to a GPU. If you don't, change `onnxruntime-gpu` to `onnxruntime`. Without a GPU the model will run slower so the demo will appear laggy.

Additionally, we'll use opencv for some image manipulation and the [Gradio WebRTC](https://github.com/freddyaboulton/gradio-webrtc) custom component to use [WebRTC](https://webrtc.org/) under the hood to achieve near-zero latency.
We'll use the ONNX runtime to speed up YOLOv10 inference. This guide assumes you have access to a GPU. If you don't, change `onnxruntime-gpu` to `onnxruntime`. Without a GPU, the model will run slower, resulting in a laggy demo.
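
The GPU/CPU choice above can also be expressed at runtime. The following is a minimal sketch, not part of the original guide: `choose_providers` is a hypothetical helper, and it assumes you feed it the list returned by `onnxruntime.get_available_providers()`. How the resulting list is passed to the inference class is outside the scope of this sketch.

```python
def choose_providers(available):
    """Return ONNX Runtime execution providers in preferred order.

    `available` is a list of provider names, for example the result of
    `onnxruntime.get_available_providers()` (hypothetical helper, not
    part of the guide's code).
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    # Keep only the preferred providers that this installation offers.
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError("No supported ONNX Runtime execution provider found")
    return chosen


# Example provider lists, written by hand for illustration.
gpu_install = ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
cpu_install = ["CPUExecutionProvider"]

print(choose_providers(gpu_install))
print(choose_providers(cpu_install))
```

With a CPU-only install, the helper falls back to `CPUExecutionProvider`, which matches the slower, laggier behavior described above.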

Tip: If you want to deploy this app on any cloud provider, you'll have to use the free twilio api to use their [TURN servers](https://www.twilio.com/docs/stun-turn). So head there and create a free account. If you are not familiar with TURN servers, consult this [guide](https://www.twilio.com/docs/stun-turn/faq#faq-what-is-nat).
We'll use OpenCV for image manipulation and the [Gradio WebRTC](https://github.com/freddyaboulton/gradio-webrtc) custom component to use [WebRTC](https://webrtc.org/) under the hood, achieving near-zero latency.

**Tip:** If you want to deploy this app on any cloud provider, you'll need to use the free Twilio API for their [TURN servers](https://www.twilio.com/docs/stun-turn). Create a free account on Twilio. If you're not familiar with TURN servers, consult this [guide](https://www.twilio.com/docs/stun-turn/faq#faq-what-is-nat).

## The Inference Function

We'll download the YOLO V10 model from the Hugging Face hub and instantiate a custom inference class to use this model.
We'll download the YOLOv10 model from the Hugging Face hub and instantiate a custom inference class to use this model.

We won't cover the implementation of the inference class in this guide, but the source code is located [here](https://huggingface.co/spaces/freddyaboulton/webrtc-yolov10n/blob/main/inference.py#L9) if you're interested.
The implementation of the inference class isn't covered in this guide, but you can find the source code [here](https://huggingface.co/spaces/freddyaboulton/webrtc-yolov10n/blob/main/inference.py#L9) if you're interested.

Tip: We are using the `yolov10-n` variant because it has the lowest latency. See the [Performance](https://github.com/THU-MIG/yolov10?tab=readme-ov-file#performance) section of the README in the yolo-v10 github repository.
**Tip:** We're using the `yolov10-n` variant because it has the lowest latency. See the [Performance](https://github.com/THU-MIG/yolov10?tab=readme-ov-file#performance) section of the README in the YOLOv10 GitHub repository.

```python
from huggingface_hub import hf_hub_download
@@ -45,45 +42,42 @@ model_file = hf_hub_download(

model = YOLOv10(model_file)


def detection(image, conf_threshold=0.3):
    image = cv2.resize(image, (model.input_width, model.input_height))
    new_image = model.detect_objects(image, conf_threshold)
    return new_image
```

Our inference function called `detection` will accept a numpy array from the webcam as well as a desired conference threshold. Object detection models like YOLO identify many objects and assign a confidence score to each object. The lower the confidence, the higher the chance of a false positive. So we will let our users play with the conference threshold.

The function will return a numpy array corresponding to the same input image with all the detected objects in bounding boxes.
Our inference function, `detection`, accepts a numpy array from the webcam and a desired confidence threshold. Object detection models like YOLO identify many objects and assign a confidence score to each. The lower the confidence, the higher the chance of a false positive. We'll let users adjust the confidence threshold.

The function returns a numpy array corresponding to the same input image with all detected objects in bounding boxes.
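
To make the role of the confidence threshold concrete, here is a small sketch of the filtering idea. It is an illustration only: `filter_detections` and the dictionary layout are hypothetical and are not taken from the inference class used in this guide.

```python
def filter_detections(detections, conf_threshold=0.3):
    """Keep only detections whose score is at least `conf_threshold`."""
    return [d for d in detections if d["score"] >= conf_threshold]


# Made-up detections for illustration; a real model produces many more.
detections = [
    {"label": "person", "score": 0.91},
    {"label": "dog", "score": 0.42},
    {"label": "kite", "score": 0.12},
]

# A low threshold keeps more boxes (and more potential false positives).
print([d["label"] for d in filter_detections(detections, 0.3)])
# A higher threshold keeps only the most confident boxes.
print([d["label"] for d in filter_detections(detections, 0.5)])
```

Raising the threshold trades recall for precision: fewer boxes are drawn, but the ones that remain are less likely to be false positives.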

## The Gradio Demo

The Gradio demo will be pretty straightforward but we'll do a couple of things that are specific to this use case:
The Gradio demo is straightforward, but we'll implement a few specific features:

* We will use the `WebRTC` custom component to ensure the input and output are sent to/from the server with WebRTC.
* The WebRTC component will be both an input and an output component.
* We'll use the `time_limit` parameter of the `stream` event. The `time_limit` parameter will mean that we'll process each user's stream for that amount of time. In a multi-user setting, such as on Spaces, this means that after this period of time, we'll stop processing the current user's stream and move on to the next.
1. Use the `WebRTC` custom component to ensure input and output are sent to/from the server with WebRTC.
2. The WebRTC component will serve as both an input and output component.
3. Utilize the `time_limit` parameter of the `stream` event. This parameter sets a processing time for each user's stream. In a multi-user setting, such as on Spaces, we'll stop processing the current user's stream after this period and move on to the next.
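
The effect of a per-stream time limit can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not how Gradio or the WebRTC component implements `time_limit`: `process_stream`, `handler`, and the injectable `clock` are hypothetical names introduced here.

```python
import time


def process_stream(frames, handler, time_limit, clock=time.monotonic):
    """Yield processed frames until `time_limit` seconds have elapsed."""
    start = clock()
    for frame in frames:
        # Stop serving this stream once its time budget is used up.
        if clock() - start >= time_limit:
            break
        yield handler(frame)


# A fake clock that advances one "second" per call keeps the demo deterministic.
ticks = iter(range(100))


def fake_clock():
    return next(ticks)


processed = list(
    process_stream(["f1", "f2", "f3", "f4"], str.upper, time_limit=3, clock=fake_clock)
)
print(processed)
```

Once the budget is exhausted, the remaining frames are left unprocessed, which is what frees the server to move on to the next user's stream.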

In addition, we'll apply some custom css so that the webcam and slider are centered on the page.
We'll also apply custom CSS to center the webcam and slider on the page.

```python
css = """.my-group {max-width: 600px !important; max-height: 600 !important;}
.my-column {display: flex !important; justify-content: center !important; align-items: center !important};"""

css = """.my-group {max-width: 600px !important; max-height: 600px !important;}
.my-column {display: flex !important; justify-content: center !important; align-items: center !important;}"""

with gr.Blocks(css=css) as demo:
gr.HTML(
"""
<h1 style='text-align: center'>
YOLOv10 Webcam Stream (Powered by WebRTC ⚡️)
</h1>
"""
<h1 style='text-align: center'>
YOLOv10 Webcam Stream (Powered by WebRTC ⚡️)
</h1>
"""
)
gr.HTML(
"""
<h3 style='text-align: center'>
<a href='https://arxiv.org/abs/2405.14458' target='_blank'>arXiv</a> | <a href='https://github.com/THU-MIG/yolov10' target='_blank'>github</a>
<a href='https://arxiv.org/abs/2405.14458' target='_blank'>arXiv</a> | <a href='https://github.com/THU-MIG/yolov10' target='_blank'>GitHub</a>
</h3>
"""
)
@@ -106,9 +100,8 @@ if __name__ == "__main__":
demo.launch()
```


## Conclusion

Our app is hosted on Hugging Face Spaces [here](https://huggingface.co/spaces/freddyaboulton/webrtc-yolov10n).

You can use this app as a starting point to build real-time image applications with Gradio. Don't hesitate to open issues in the space or in the [WebRTC component github repo](https://github.com/freddyaboulton/gradio-webrtc).
You can use this app as a starting point to build real-time image applications with Gradio. Don't hesitate to open issues in the space or in the [WebRTC component GitHub repo](https://github.com/freddyaboulton/gradio-webrtc) if you have any questions or encounter problems.
