Author: Fasermaler
This documentation serves to collate various OpenCV concepts I had to self-learn. I have also compiled some of them into easy-to-use template code.
This will not be a carbon copy of the OpenCV documentation because it would not make sense to lift the official docs when referencing the official docs is faster and more comprehensive. Instead, it serves to fill the gaps in understanding and document certain approaches I have found to work in the past.
I will also attempt to supplement the content with real problems and solutions I have worked on in the past where possible.
- Imports
1.1 Basic Imports
1.1.1 cv2
1.1.2 numpy
1.2 Other Useful Imports
1.2.1 glob
1.2.2 Pillow
1.2.3 imutils
1.2.4 Time
- Image I/O
2.1 Image Read
2.2 Image Write
2.3 Supported Image Extensions
- Video I/O
3.1 Webcam/USB Connected Camera
3.2 IP Camera
3.2.1 Process
3.2.2 Troubleshooting Notes
3.3 Full Video Display Script
3.4 Video Writing
- Image Related Functions
4.1 Shape
4.2 Slicing Images
- Keystrokes
5.1 Close Program
5.2 Adding Additional User Functionality with Keystrokes
5.2.1 Working with Special Keys
- Working with Picamera
6.1 Imports
6.2 Initialization
6.2.1 Additional Parameters
6.2.1.1 Setting White Balance
6.3 Reading from Pi Camera
6.3.1 Video Formats
6.3.2 Flushing the Frame Buffer
6.4 Scripts
- Drawing Functions
7.1 Rectangle
7.2 Text
7.2.1 Font Options
7.2.2 Common Text Locations
7.3 Line
7.4 Circle
7.5 Other Drawing Functions
- Hough Lines Transform
8.1 Basic Hough Lines Transform Implementation
- Hough Circles Transform
9.1 Simple Demo
- Morphology Transforms
10.1 Preparing an Image for Morphology Transforms
10.1.1 Convert to Grayscale
10.1.2 Image Thresholding
10.1.3 Kernel
10.1.4 Inverting an Image
10.2 Erode and Dilate
10.3 Black Hat and Top Hat Transforms
10.4 Demo Script
- Centroid Detection
11.1 Contour Detection
11.2 Centroid Detection of Contours
- Mouse Events
12.1 Bind Callback Function
12.2 Mouse Event Detection
12.3 Scripts
- SIFT
13.1 SIFT in OpenCV 3
13.2 Using SIFT
13.3 When not to use SIFT
- Camera Calibration
14.1 Checkerboard Calibration
14.2 Getting the Camera Mtx
14.3 Getting the Optimal Camera Matrix for Undistortion
14.4 Steps for Demo
- Aruco Marker Detection
- Miscellaneous Tricks and Scripts
16.1 Limit Display FPS
16.2 Video Cropping Utility
16.3 ROI Selection Utility
16.4 Use Odd Shaped ROI
16.5 Working with Model Bounding Boxes
16.6 Circle Center Detection Using Circumcenter
16.7 Arrow Detection with Hough Lines Transform
16.8 Red Color Thresholding Using RGB
16.9 OpenCV 4 Build Instructions
16.10 Optimized OpenCV Build Instructions for Raspberry Pi
Basic imports needed for OpenCV to even work.
import cv2
Imports OpenCV. In older code, you might have to import it as `cv` instead, and even then a few changes might be needed as some attribute names were modified.
import cv2 as cv
import numpy as np
`np` is simply the convention used in most example code. If the code does not explicitly require `numpy`, this import statement can be omitted without significant issues.
Here are some other imports that could be useful. Do note that these will have to be installed in the virtual environment or simply `pip` installed, depending on the circumstance.
import glob
`glob` is a very useful built-in Python library when trying to process multiple images or videos. It allows the program to list files within a directory (even filtered by extension), which then allows the program to `cv2.imread` the files sequentially for processing. Glob documentation can be found here.
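As a quick illustration, here is a minimal sketch of globbing a folder of images and reading them one by one (the folder path is an assumption):

import glob
import cv2

# List every .jpg in a folder and read each image sequentially
for image_path in sorted(glob.glob('/path/to/images/*.jpg')):
    image = cv2.imread(image_path)
    if image is None: # skip files that failed to load
        continue
    print(image_path, image.shape)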
import PIL
Or sometimes:
from PIL import Image
`PIL`, or the `Pillow` library, is a powerful Python image processing library that adds additional tools for modifying images. These include resizing, rotation, histograms, automatic contrast enhancement and more. Full documentation here.
import imutils
`imutils`, or image utilities, is a convenience library that contains various functions such as skeletonization, translation, an FPS counter and so on. Its most important use is when referring to sample code by pyimagesearch, as their code makes judicious use of the `imutils` library. Documentation for `imutils` can be found here.
import time
`time` is a very useful library in general, but especially when dealing with image post-processing work. It can be used to measure the evaluation time per frame as well as to generate file names for output images.
Tip: If the script requires an FPS counter, you can simply use `imutils` instead of making one on your own using the `time` library. It will help you to save... time.
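A minimal sketch of both uses, timing a processing step and using the timestamp as an output file name (the test image path is an assumption):

import time
import cv2

frame = cv2.imread('test.jpg')

start = time.time()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) # stand-in for the real per-frame processing
elapsed = time.time() - start
print("Processing took {:.4f} s".format(elapsed))

# The current timestamp also makes a convenient unique file name
cv2.imwrite("capture_{}.jpg".format(int(time.time())), gray)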
Basic image read function.
image = cv2.imread('/path/to/image.jpg')
Accepts both relative and absolute paths - though absolute paths are recommended, especially when dealing with neural networks and the like.
It is also possible to set flags when calling `.imread()`:
image = cv2.imread('/path/to/image.jpg', cv2.IMREAD_COLOR) # Load the image as color
image = cv2.imread('/path/to/image.jpg', cv2.IMREAD_GRAYSCALE) # Load the image as grayscale
Basic image write function.
cv2.imwrite('/path/to/image.jpg', frame)
This writes the pixel data stored in `frame` as an image saved on the HDD at the path `/path/to/image.jpg`.
It is possible to set certain parameters when calling `.imwrite()`:
cv2.imwrite('/path/to/image.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 90]) # Set JPEG quality to 90
cv2.imwrite('/path/to/image.png', frame, [cv2.IMWRITE_PNG_COMPRESSION, 9]) # Set PNG compression level to 9
The following is a list of supported file extensions for images, for both `.imread()` and `.imwrite()`.
- Bitmaps: `*.bmp`, `*.dib`
- JPEG: `*.jpeg`, `*.jpg`, `*.jpe`
- JPEG 2000: `*.jp2`
- Portable Network Graphics: `*.png`
- WebP: `*.webp`
- Portable image format: `*.pbm`, `*.pgm`, `*.ppm`
- Sun Rasters: `*.sr`, `*.ras`
- TIFF: `*.tiff`, `*.tif`
Do note that some file extensions rely on codecs provided by the OS image. It might be necessary to install additional packages for full support (though JPEG and PNG are always supported).
To get footage from a webcam or USB connected camera, it is necessary to first define the capture object (the convention is to call it `cap`), then invoke the `.read()` method on the capture object to yield one frame.
cap = cv2.VideoCapture(0) # Set the capture object
ret, frame = cap.read() # Returns a frame
cap.release() # Release the capture object
Firstly, the `VideoCapture` object is set to `0`. By default, the first webcam connected to the computer will be `0`. The number can be changed if there are multiple webcams and a specific one has to be connected to. Additionally, if a script connected to a camera fails to exit properly or release the capture object properly, you might find that the capture object number has to be incremented on subsequent runs. Due to this, ensure that scripts always release the capture object before exiting.
`.read()` returns a return value, `ret`, as well as the frame, `frame`. `ret` and `frame` are merely common naming conventions.
It is also possible to connect to an IP camera, as OpenCV supports FFMPEG via HTTP or RTSP. Simply feed the relevant URL into the `VideoCapture` object. ispyconnect has a good list of camera URLs, and their open source utility can be used to find the appropriate URL of the camera as well.
- Check that the IP address assigned to the computer is within the same range as the camera. For instance, if the camera is set to `192.168.0.11`, the computer IP address should be set to `192.168.0.xx`. This can be checked using `ipconfig` in the command prompt (Windows) or `ifconfig` in the bash terminal (Linux).
  - If the IP addresses are not within the same range, the camera cannot be connected to. Assign a static IP to the computer under the network settings.
- Log in to the camera normally using the manufacturer's camera viewing utility to verify the camera is operational and that the login details are correct.
  - Alternatively, connect to the camera using the ispyconnect utility.
- If everything is in order, find the correct URL - it should look something like this: `http://admin:12345@192.168.0.11/?cgi-bin/h264`
Simply feed that exact URL into the `VideoCapture` object:
cap = cv2.VideoCapture('http://admin:12345@192.168.0.11/?cgi-bin/h264') # Set the capture object
# Rest of the code is identical
ret, frame = cap.read()
cap.release()
If the URL does not seem to work, do not worry - as long as the URL works in a browser or in ispyconnect, the actual URL cannot be far off.
- For RTSP, generally only the login details and IP are required. So if there are extra strings at the end of the URL, try removing them to see if it works.
cap = cv2.VideoCapture('http://admin:12345@192.168.0.11/MediaInput') # This might not work
cap = cv2.VideoCapture('http://admin:12345@192.168.0.11') # But this might work
- Check that the port is correct - sometimes the URL will require a port. Remove the port if there is one and try to find the port if there isn't one.
- Double check the URL - ensure that you are not using HTTPS and that the RTSP link does not start with an HTTP header by mistake.
In most cases, we do not want only 1 frame but seek to grab multiple frames from the camera. This section shows the full example script.
import cv2
cap = cv2.VideoCapture(0) # Set the capture object
while True:
    ret, frame = cap.read() # Returns a frame
    if ret:
        cv2.imshow('frame', frame)
        # break if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        # break if no frames left in the video
        break
cv2.destroyAllWindows() # Destroy the viewing window
cap.release() # Release the capture object
To write video, first define the output object with the relevant codec:
img_width = 640
img_height = 480
video_fps = 30
out = cv2.VideoWriter('/path/to/output.avi', cv2.VideoWriter_fourcc(*'XVID'), video_fps, (img_width, img_height)) # defines output object for .avi output format
out = cv2.VideoWriter('/path/to/output.mp4', cv2.VideoWriter_fourcc(*'MPEG'), video_fps, (img_width, img_height)) # defines output object for .mp4 output format
When defining the output writer, the resolution and codec are very important. Ensure that the codec matches the video format to be saved. If the resolution of the writer object does not match the frame size the video file will also be corrupted.
If the video FPS does not match up, the output video might be faster or slower though it does not result in file corruption.
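One way to avoid a resolution mismatch is to query the capture object itself before defining the writer. A minimal sketch, assuming a webcam source (the fallback of 30 FPS is an assumption, since some cameras report 0):

import cv2

cap = cv2.VideoCapture(0)

# Query the capture object so the writer matches the actual frame size and FPS
img_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
img_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
video_fps = cap.get(cv2.CAP_PROP_FPS) or 30 # fall back to 30 if the camera reports 0

out = cv2.VideoWriter('/path/to/output.avi', cv2.VideoWriter_fourcc(*'XVID'), video_fps, (img_width, img_height))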
Whenever a new frame is obtained, simply write the frame to the video writer:
out.write(frame)
Then close the output writer at the end of the program. If the output writer is not closed properly, the video file will also be corrupted.
out.release()
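Putting the pieces together, a minimal capture-and-record loop might look like the sketch below (the resolution, FPS and output path are assumptions):

import cv2

cap = cv2.VideoCapture(0)
out = cv2.VideoWriter('/path/to/output.avi', cv2.VideoWriter_fourcc(*'XVID'), 30, (640, 480))

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.resize(frame, (640, 480)) # make sure the frame matches the writer resolution
    out.write(frame) # write the frame to the output video
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

out.release() # close the writer so the file is not corrupted
cap.release()
cv2.destroyAllWindows()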
The shape attribute returns the rows, columns and channels of the image (in that order).
img_height = image.shape[0]
img_length = image.shape[1]
img_layers = image.shape[2]
`.shape[0]` and `.shape[1]` are some of the most commonly used attributes. It is easy to confuse the two, so do refer back when troubleshooting code.
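One way to avoid mixing the two up is to unpack both values in a single line. A minimal sketch (the image path is an assumption):

import cv2

image = cv2.imread('/path/to/image.jpg')
# Unpacking the first two values of .shape keeps height and width clearly named
img_height, img_width = image.shape[:2]
print(img_height, img_width)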
It is possible to slice images by working on them as numpy arrays. This would allow a specific section of the image to be cropped and saved as another image.
crop = frame[y_start:y_end, x_start:x_end]
A more concrete example is the `split_four.py` script in the example code folder.
def split_into_four(frame):
    Width, Height = frame.shape[1], frame.shape[0]
    crop_h, crop_w = int(Height/2), int(Width/2)
    # Get the 4 cropped frames and put them in an array
    crop1 = frame[0:crop_h, 0:crop_w]
    crop2 = frame[0:crop_h, crop_w:Width]
    crop3 = frame[crop_h:Height, 0:crop_w]
    crop4 = frame[crop_h:Height, crop_w:Width]
    crop_array = [crop1, crop2, crop3, crop4]
    return crop_array
In this example, the image is split into 4 equally sized images. This is done by finding the halfway point of both the height and width of the frame (assigned to `crop_h` and `crop_w`).
The script then proceeds to slice the image 4 times to form 4 cropped images which are then placed within a crop array to be returned by the function.
OpenCV uses the `waitKey` function to draw new frames onto any active OpenCV windows. Besides this primary purpose, the `waitKey` function also has 2 secondary purposes:
- Primary Function: Update the window
- Secondary Function 1: Allows developer to control the delay of the update loop
- Secondary Function 2: Allows developer to check for specific keystrokes to do specific actions
The most ubiquitous use of all is the use of the `waitKey` function to close a program.
In most OpenCV programs (especially pertaining to video), there is an active loop within the program. Should this loop be handled or exited incorrectly, it can cause various memory or GPU issues.
This is why the `waitKey` function is often used to check if the quit key (usually assigned to `q` by convention) has been pressed.
while True:
    ret, frame = cap.read()
    if ret:
        cv2.imshow('frame', frame)
        # if 'q' was pressed, break out of the loop
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
Most people have taken to appending `if cv2.waitKey(1) & 0xFF == ord('q'):` automatically to the end of their loops. But I personally think that the next approach, while more verbose, is superior. It is more extensible and builds a better habit than appending a line without thinking about what it actually means.
In the case where you'd like to add functionality to capture an image in a video stream, the `waitKey` function can also be used to detect the `c` keystroke, just like it was used to detect the `q` keystroke as a break condition.
key = cv2.waitKey(1) & 0xFF
if key == ord("q"):
    break
elif key == ord('c'):
    cv2.imwrite('output.jpg', image)
In the event that a special key is desired, such as `Esc`, spacebar, etc., the decimal (dec) code of the key can be used instead. Refer to the ASCII table for the specific dec codes.
The following is a snippet of code from an ROI selection utility I wrote in the past. In this case, the `Esc` key is used to allow the user to reset the currently selected ROI points.
# Various key events
key = cv2.waitKey(1) & 0xFF
# Esc key will reset the ref_points list
if key == 27: # 27 is the dec value for the Escape character
    ref_points = []
    status = "RESET"
# U key will remove last point selected
if key == ord('u'):
    ref_points.pop()
    status = "UNDO"
# S key will save if the ref_points list has 4 points
if key == ord('s'):
    if len(ref_points) < 4:
        status = "ERROR"
    else:
        cfg_read.write_roi_coordinates(*corrected_ref_points)
        status = "SAVED"
# Q will exit
if key == ord('q'):
    break
The RPi Picamera is unique as it connects directly to the RPi using a ribbon cable. Due to this, it requires the `picamera` library (to be installed separately) and does not use the `VideoCapture` object.
`picamera` generally requires 2 imports in order to work.
from picamera.array import PiRGBArray
from picamera import PiCamera
Basic initialization of the Pi Camera is as follows:
resolution_height = 240
resolution_width = 426
camera = PiCamera()
camera.resolution = (resolution_width, resolution_height)
camera.framerate = 24
rawCapture = PiRGBArray(camera, size=(resolution_width, resolution_height))
Check that the resolution is correct. Most Pi Cameras support only a few specific resolutions and associated frame rates. The `picamera` library will warn and default to the closest available resolution if the input resolution is invalid. However, this might break other parts of the code, so do be aware of this.
One of the more powerful aspects of using a Pi Camera is the ability to modify the camera settings on the software side instead of going into a menu built into the camera firmware. This allows lighting and other related settings to be tuned specifically for a given set of lighting conditions, boosting performance in specific use cases.
Some other useful parameters include the following:
camera.exposure_mode = 'off' # Turn off auto exposure compensation
camera.exposure_compensation = -3 # Set the specific level of exposure compensation
camera.drc_strength = 'off' # Sets the dynamic range compression
camera.still_stats = False # Retrieves or sets whether statistics will be calculated from still frames or the prior preview frame
camera.awb_mode = 'off' # Turn off auto white balance
camera.awb_gains = (Fraction(25, 16), Fraction(25,16)) # Set a specific white balance gain
The full API (and associated list of options) can be found here.
With regards to `camera.awb_gains`, it is better to print out the current `awb` value first by invoking the camera attribute:
print(camera.awb_gains)
Then compensate accordingly. The `.awb_gains` attribute requires the `fractions` library, so the full setting should be as follows:
from fractions import Fraction
camera.awb_mode = 'off' # Turn off auto white balance
camera.awb_gains = (Fraction(25, 16), Fraction(25,16)) # Set a specific white balance gain
Reading video from the Pi Camera involves reading from the camera's `rawCapture` buffer. This means that the capture object method normally used is completely circumvented. The following is the code for reading frames after the Pi Camera has been initialized:
for img in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    frame = img.array
    rawCapture.truncate(0)
It is important to truncate the `rawCapture` buffer or else it will not receive any more frames.
Now with OpenCV viewing functionality:
for img in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    frame = img.array
    cv2.imshow('frame', frame)
    rawCapture.truncate(0)
    k = cv2.waitKey(1) & 0xFF
    if k == ord("q"):
        break
cv2.destroyAllWindows()
camera.close()
In addition to the `bgr` format, `picamera` also supports other formats:
jpeg
png
gif
bmp
yuv
rgb
rgba
bgr
bgra
However, as we are working with OpenCV, `bgr` will likely be the format used.
Occasionally, in certain use cases, the output frame rate will stutter significantly, dropping to as low as 2 FPS where one would expect 24 or 30 FPS. One possible solution is to simply flush the frame buffer. In most cases, there will be no noticeable hit to performance if the actual camera frame rate is high (24 or higher) and the buffer is set low (5 or so frames).
for img in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    for i in range(5): # Clears the 5 frame buffer
        frame = img.array
    frame = img.array
    rawCapture.truncate(0)
The following are the scripts available in the sample code/template code folder:
- `pi_camera_single_frame.py`: Grabs a single Pi Camera frame and displays it. Can be used to test if the Pi Camera even works.
- `pi_camera_video.py`: Grabs video output from the Pi Camera. Has various options for the camera as well as the option to flush the frame buffer. Also allows you to capture the active frame by pressing `c`.
Drawing functions are basically the bread and butter of OpenCV. They allow you to highlight important information for the client as well as to do debugging. Detailed documentation of the drawing functions is available here.
Rectangle functions can be used to draw bounding boxes, regions of interest, etc.
# Syntax
cv2.rectangle(image, pt1, pt2, color, thickness)
# Example
cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 1)
This draws a rectangle from the `x1, y1` coordinates to the `x2, y2` coordinates. The tuple after them defines the color of the rectangle in BGR format. The last number refers to the thickness.
Text can be used to draw all sorts of useful information such as class names, confidence levels or even instructions to the user on how to use the utility.
# Syntax
cv2.putText(image, text, pt, font, fontScale, color, thickness)
# Example
cv2.putText(frame, "hi", (x, y), cv2.FONT_HERSHEY_PLAIN, fontScale=2.0, color=(255, 0, 0), thickness=2)
Pretty straightforward syntax-wise. It basically puts the text (in this case 'hi') on the frame at the coordinates `x` and `y`, using the font `FONT_HERSHEY_PLAIN` at a scale of `2.0`, in blue, with a line thickness of 2 pixels.
There are other font options for OpenCV. Usually the font is declared at the top of the program like so:
font = cv2.FONT_HERSHEY_PLAIN
cv2.putText(frame, "hi", (x, y), font, fontScale=2.0, color=(255, 0, 0), thickness=2)
Other font options include the following:
FONT_HERSHEY_SIMPLEX
FONT_HERSHEY_PLAIN
FONT_HERSHEY_DUPLEX
FONT_HERSHEY_COMPLEX
FONT_HERSHEY_TRIPLEX
FONT_HERSHEY_COMPLEX_SMALL
FONT_HERSHEY_SCRIPT_COMPLEX
FONT_HERSHEY_SCRIPT_SIMPLEX
Here is an image preview of what the fonts look like (image from codeyarns):
It is common to put text at the edges of the image (such as FPS counters). Because of this, the coordinates of the text can be dynamically obtained using the `.shape` attribute. Example below:
cv2.putText(frame, fps, (10, (frame.shape[0] - 20)), cv2.FONT_HERSHEY_PLAIN, fontScale=2.0, color=(0, 0, 255), thickness=2) # fps counter placed relative to the frame's height
Take note that text is drawn with the bottom-left corner of the text at the specified coordinates; subtracting 20 px from the image height keeps the text a small margin above the bottom edge of the frame.
Reminder: Origin of the image is in the top left corner of the image
Lines are another common entity used in image processing outputs. It can be used to visualize the ROI or even visualize velocities and trends of tracked objects.
# Syntax
cv2.line(image, pt1, pt2, color, thickness)
# Example
cv2.line(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
Similar to a rectangle except that it draws a straight line between the 2 points being referenced.
Circles are primarily used to draw Hough circles or dots (by drawing a tiny circle). Dots can be used to mark the centers of bounding boxes for tracking or other purposes.
# Syntax
cv2.circle(image, pt, radius, color, thickness)
# Example
cv2.circle(frame, (x, y), 5, (255, 0, 0), 2)
There are other drawing functions but as they are not as commonly used, they will not be discussed within the scope of this document. Full drawing documentation available here.
Hough lines is a line detection method that works as an additional layer after a Canny edge filter. This allows straight edges in an image to be detected.
It is a rather expensive algorithm to run and is susceptible to noise, so the input image has to have a clean signal and not too many candidate lines, unless real-time performance is not a requirement.
Full Hough Lines Transform theory and documentation here.
The following is a Hough Lines Transform used to detect a grid and subsequently a lit square in the grid. Notice how the grid can still be properly detected despite it being non-uniform in spacing. This is possible when the Hough Lines Transform is used instead of template matching.
import cv2
import numpy as np

img = cv2.imread('test.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) # convert the image to gray and use a canny edge detector
edges = cv2.Canny(gray,50,150,apertureSize = 3)
lines = cv2.HoughLines(edges,1,np.pi/180,200)
for line in lines: # each entry holds the rho and theta of one detected line
    rho,theta = line[0]
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a*rho
    y0 = b*rho
    x1 = int(x0 + 1000*(-b))
    y1 = int(y0 + 1000*(a))
    x2 = int(x0 - 1000*(-b))
    y2 = int(y0 - 1000*(a))
    cv2.line(img,(x1,y1),(x2,y2),(0,0,255),2)
cv2.imshow("detected lines", img)
cv2.waitKey(0)
Notice that the Hough Lines detector actually only gives the rho and theta of each line. This means that it is possible to work out what type of line it is, or even the equation of the line, from the given output. This information will be of use in the arrow detection implementation outlined in the Miscellaneous section.
Like the Hough Lines Transform, the Hough Circles Transform simply attempts to find circles in an image.
The Hough Circles Transform can be quite unreliable and susceptible to noise. Additionally, it requires the circles to appear circular when viewed head on (as is its output). It is incapable of detecting warped circles or even circles with obscured parts (or parts out of the frame). This means that it can be a rather unreliable algorithm for certain use cases.
See the circumcenter detection section under Miscellaneous for a possible algorithm that might fit your use case.
This demo is taken from the OpenCV docs.
import numpy as np
import cv2 as cv
img = cv.imread('test.jpg',0)
img = cv.medianBlur(img,5)
cimg = cv.cvtColor(img,cv.COLOR_GRAY2BGR)
circles = cv.HoughCircles(img,cv.HOUGH_GRADIENT,1,20, param1=50,param2=30,minRadius=0,maxRadius=0)
circles = np.uint16(np.around(circles))
for i in circles[0,:]:
    # draw the outer circle
    cv.circle(cimg,(i[0],i[1]),i[2],(0,255,0),2)
    # draw the center of the circle
    cv.circle(cimg,(i[0],i[1]),2,(0,0,255),3)
cv.imshow('detected circles',cimg)
cv.waitKey(0)
cv.destroyAllWindows()
The result is as seen below:
Morphology transforms are very useful in image processing for bringing out certain features or reducing unwanted noise to create a stronger signal.
Morphology transforms can only be applied to grayscale images. One way to get a grayscale image is to simply convert to grayscale.
This can be done using the `.cvtColor()` function:
gray_image = cv2.cvtColor(raw_img, cv2.COLOR_BGR2GRAY)
The other more common way to get a gray image is to simply do some form of thresholding to clean up the signal in the first place. By doing so, the image becomes cleaner as noise is filtered and a grayscale image is obtained automatically.
Here is a simple example of how to apply a binary threshold:
ret, thresh_img = cv2.threshold(raw_img,10,255,cv2.THRESH_BINARY)
`ret` is simply the return value (`True`/`False`), while `thresh_img` (or more commonly simply left as `th`) is the thresholded single-channel image.
Morphology transforms first require the selection of a kernel. There are various kernels available. In the arrow example shown later, a large cross kernel was used and, because of that, you can see that the middles of the arrow have been removed.
Selecting the right kernel and size can be tricky, but it is a necessary process as each use case is different. Start with a small kernel and then increase the size if the results look promising.
Some example kernels:
kernel = np.ones((5,5),np.uint8) # Creates a 5x5 kernel of 1s
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(19,19)) # 19x19 cross kernel
Documentation on Kernels here.
Warning: Do note that kernels have to be odd-number in size
Eroding an image is not the same as inverting an image and then dilating the image. Nor is dilating an image the same as inverting an image and then eroding the image. Due to this, it is prudent to try inverting an image to see if the desired features can be brought out better than the original image.
Inverting an image is relatively simple - just do a bitwise not operation on itself.
inverted_img = cv2.bitwise_not(raw_img, raw_img)
The OpenCV functions `.erode()` and `.dilate()` are the simplest of the morphology transforms. These 2 morphological operators are especially useful when trying to create more features in otherwise simple shapes with little lighting change.
For an illustration, we can use this image of an `i`:
Applying dilation:
Inverting the image and applying dilation:
Erosion of the original image:
Inverting the image and applying erosion:
If you'd like to know more about the theory, feel free to head over to the official docs.
Eroding or dilating an image is rather simple - just call `.erode()` or `.dilate()` with the frame and the kernel:
eroded_frame = cv2.erode(image, kernel)
dilated_frame = cv2.dilate(image, kernel)
Black Hat and Top Hat are more advanced morphological transforms. In a Black hat transform, the image is dilated and then eroded before being compared to the original image. In a top hat transform the image is eroded and then dilated before being compared to the original image. More detailed information can be found here.
Instead of talking about the theory, here is a practical example of a cross kernel being used to do a black hat/top hat transform on an arrow to detect key features:
By comparing the eroded features, we can see if 2 of the points are on the left or right of the centroid to determine the direction of the arrow.
Black Hat and Top Hat require a kernel and a grayscale image as well; however, the method to call them is different as they are part of the `.morphologyEx` function.
tophat = cv2.morphologyEx(raw_img, cv2.MORPH_TOPHAT, kernel) # Top Hat transform
blackhat = cv2.morphologyEx(raw_img, cv2.MORPH_BLACKHAT, kernel) # Black Hat transform
It makes no sense to keep talking about the theory when a quick demonstration will allow you to understand how these tools can be used to emphasize features you might need. Thus the demo script should be a good start in trying to find out if any of these morphological transforms are useful for you.
Feel free to find the demo script `morph_transform_demo.py`. The options available are as follows:
- `KERNEL`: defines the kernel to be used
- `INVERT`: sets whether the image should be inverted
- `THRESH`: sets whether to obtain a grayscale image via thresholding or by color conversion to grayscale
Centroid detection is a method to find the "center of mass" of a detected blob. This can be useful when trying to determine the center of a uniform object or used to track the position or velocity of an object that does not have a fixed form (relative to the camera or otherwise).
Centroid detection starts with Contour detection:
img2, contours, hierarchy = cv2.findContours(img,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
The most important return value of interest is the list of contours found in `contours`. (Note that in OpenCV 4, `cv2.findContours` returns only `contours` and `hierarchy` rather than three values.)
It is then possible to iterate through each contour and find its centroid. It is also possible to filter the contours such that only the largest contour is processed (a short sketch of this follows the example below).
for c in contours:
    if cv2.contourArea(c) > minimumArea:
        # calculate moments for each contour
        M = cv2.moments(c)
        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])
        cv2.circle(frame, (cX, cY), 5, (255, 100, 255), -1)
        cv2.putText(frame, "centroid", (cX - 25, cY - 25),cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
        cv2.drawContours(frame, [c], -1, (255, 100, 255), 2)
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),2)
        rX = int(x+(w/2))
        rY = int(y+(h/2))
        cv2.circle(frame, (rX, rY), 5, (0,255,255), -1)
In the example, a rectangle was used to enclose the contour. However, a circle could be used if the objects within the use case are circular in nature.
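If only the largest contour is of interest (as mentioned above), it can be selected directly instead of looping over every contour. A minimal sketch, continuing from the `findContours` call above:

# Pick the single largest contour by area
if len(contours) > 0:
    largest = max(contours, key=cv2.contourArea)
    M = cv2.moments(largest)
    if M["m00"] != 0: # avoid division by zero for degenerate contours
        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])
        cv2.circle(frame, (cX, cY), 5, (255, 100, 255), -1)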
OpenCV allows for mouse events on an active OpenCV window to do a callback to a specific function. This can be used to allow users to set ROI, select pixels or just add additional functionality.
As a side note, it's possible to make an entire GUI in OpenCV using rectangles and putText as well as checking mouse click locations - but please don't.
To begin, the callback function has to be bound to a specific OpenCV named window. This can be done by defining the name of the named window beforehand and setting the mouse callback.
# Defines some function with the standard mouse callback signature
def some_function(event, x, y, flags, param):
    pass

cv2.namedWindow('test') # Declares a named window called 'test'
# Attach some_function as a callback for the window 'test'
cv2.setMouseCallback('test', some_function)
# Actually start the window 'test'
cv2.imshow('test', frame)
The callback function receives the mouse `event` as its first argument, which is the specific mouse event passed to the function when the callback is invoked. The most basic events needed for most functionality are mouse left button down and mouse left button up.
Note: It is important to check that the mouse button has been released, or else the functionality enclosed within the mouse-button-down branch will take place forever.
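A minimal sketch of a callback that pairs the button-down and button-up events (the window name, image path and variable names are my own):

import cv2

ref_points = []
drawing = False

def select_points(event, x, y, flags, param):
    global drawing
    # Record the point where the left button goes down...
    if event == cv2.EVENT_LBUTTONDOWN:
        drawing = True
        ref_points.append((x, y))
    # ...and only finish the action once the button has been released
    elif event == cv2.EVENT_LBUTTONUP:
        drawing = False
        ref_points.append((x, y))

frame = cv2.imread('test.jpg')
cv2.namedWindow('test')
cv2.setMouseCallback('test', select_points)

while True:
    cv2.imshow('test', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cv2.destroyAllWindows()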
The example scripts that use mouse events are the `video_crop.py` and `roi_selection_demo.py` scripts. Both of these use mouse events to allow the user to define a specific region of interest.
Scale-Invariant Feature Transform (SIFT) is a template matching algorithm. The issue with such implementations is that template matching tends to be expensive and only works well in a specific set of circumstances.
SIFT utilizes feature recognition to match an object in the image to a template that it has as reference.
One very common issue with SIFT example code on the net is that it is often outdated and does not run on OpenCV 3. Another possibility is that the programmer might find the module nonexistent.
For a start, SIFT and SURF are both under the opencv-contrib library, which is separate from OpenCV itself. Be sure to check that the install/build method being referenced actually clones the opencv-contrib repo and references it during the OpenCV build process.
Additionally, the syntax for SIFT and SURF is a bit different in OpenCV 3 as compared to prior versions.
OpenCV Version < 3:
import cv2
detector = cv2.FeatureDetector_create("SIFT")
OpenCV Version 3 and up:
import cv2
detector = cv2.xfeatures2d.SIFT_create(1000)
Hopefully if you already know how to use SIFT and SURF, this has resolved the number 1 issue you've had going into this section.
frame = cv2.imread('test.png', 0) # read the test image in gray
ref_img = cv2.imread('ref.png',0) # read the reference image in gray
# Initiate SIFT detector
sift1 = cv2.xfeatures2d.SIFT_create(1000)
sift2 = cv2.xfeatures2d.SIFT_create(1000)
# Run feature detection on both images
kp1, des1 = sift1.detectAndCompute(frame,None)
kp2, des2 = sift2.detectAndCompute(ref_img,None)
# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 2)
search_params = dict(checks=100) # or pass empty dictionary
flann = cv2.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)
# Need to draw only good matches, so create a mask
matchesMask = [[0,0] for i in range(len(matches))]
# ratio test as per Lowe's paper
matched_f = 0
for i,(m,n) in enumerate(matches):
    if m.distance < 0.62*n.distance:
        matched_f += 1
        matchesMask[i]=[1,0]
draw_params = dict(matchColor = (0,255,0),
                   singlePointColor = (255,0,0),
                   matchesMask = matchesMask,
                   flags = 0)
final_img = cv2.drawMatchesKnn(frame,kp1,ref_img,kp2,matches,None,**draw_params)
cv2.imshow('image',final_img)
cv2.waitKey(0)
So what was all that? Basically, 2 sift objects were created to find features in both the reference and the actual image. Then the points were compared to see if enough features were similar. The output is then shown.
Of course the theory is more complex but the idea here is to show how it works and provide example code for you to get going.
Template matching is a pretty powerful tool, but there is a reason why I have left it to one of the last few sections: template matching should not be used as a crutch. It is slow, unreliable and generally gets affected by too much noise in complex environments. Where possible, you should always aim to use more efficient or generic algorithms instead of template matching - it should be a last resort.
At times, the use case will call for a fish-eye camera or even simply a lens with an FOV beyond 80 degrees - causing a slight distortion effect. While this should not affect the more robust or general algorithms, it can cause issues with tracking or detection models that are more sensitive to object size, shape and the like - especially when it comes to extreme fish-eye lenses, where the same object can appear to warp and stretch differently at different parts of the image.
To circumvent this, it is possible to map each pixel in the image to another location such that the image becomes undistorted. This will result in the edges of the image having all sorts of artifacts, but for the most part the corrected image can be used for image processing purposes.
To obtain this "map", more correctly known as the camera calibration matrix, the camera has to be calibrated using a black and white grid (checkerboard pattern). This is because the spacing and "groundtruth" of a checkerboard pattern can be easily created in OpenCV (to the point that there is a function to generate this groundtruth). By comparing the groundtruth and the actual checkerboard pattern detected in the image, the camera calibration matrix can be generated.
The checkerboard calibration involves taking many images of the checkerboard at various locations within the frame. The images are then globbed and converted to grayscale before `.findChessboardCorners` is invoked to find the checkerboard corners.
ret, corners = cv.findChessboardCorners(gray, (7,6), None)
In this case, `(7,6)` means that the checkerboard has 7x6 inner corners.
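For context, the `objpoints` and `imgpoints` lists used in the next snippet are typically accumulated over all the calibration images. A minimal sketch of that loop, assuming the images sit in a local `calib_images/` folder and a 7x6 inner-corner board:

import glob
import numpy as np
import cv2 as cv

# Groundtruth corner locations for a 7x6 inner-corner board: (0,0,0), (1,0,0), ..., (6,5,0)
objp = np.zeros((7 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)

objpoints = [] # 3D points in the checkerboard's own coordinate frame
imgpoints = [] # 2D corner locations detected in each image

for fname in glob.glob('calib_images/*.jpg'):
    img = cv.imread(fname)
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    ret, corners = cv.findChessboardCorners(gray, (7, 6), None)
    if ret:
        objpoints.append(objp)
        imgpoints.append(corners)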
The camera `mtx` is then obtained by using the corners from all detected checkerboards:
ret, mtx, dist, rvecs, tvecs = cv.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
`mtx` in this case refers to the camera calibration matrix.
With the `mtx` and `dist` matrices, the optimal camera matrix can be generated and passed to `.undistort` to yield the undistorted image.
newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (width,height), 1, (width,height))
undistort = cv2.undistort(frame, mtx, dist, None, newcameramtx)
2 demo scripts are available for trying out the OpenCV undistort functionality.
- Print out the checkerboard pattern from mrpt.org, or use whichever checkerboard pattern is desired
- Use an image capture script to capture images of the checkerboard pattern at different locations in the frame but perpendicular to the camera lens (capture at least 30 images)
- Save these images within a folder and then run `checkerboard_calib.py`. The `ret, mtx, dist, rvecs, tvecs` values will be printed in the console. These values will be important
- Enter the values into `camera_undistort_demo.py` and run the script
Aruco markers are little markers that look like QR codes. They are very robust and thus where possible, they can be used as the preferred method of object detection or even localization (it is possible to estimate pose from Aruco markers).
First, the OpenCV aruco dictionary is initialized based on the aruco marker size.
You can generate aruco markers of appropriate sizes at http://chev.me/arucogen/.
import cv2.aruco as aruco
aruco_dict = aruco.Dictionary_get(aruco.DICT_4X4_250) # initializing a dictionary for 4x4 aruco markers
Then the image is converted to gray and the detection subroutine is called.
parameters = aruco.DetectorParameters_create()
corners, ids, rejectedImgPoints = aruco.detectMarkers(gray, aruco_dict, parameters=parameters)
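Putting it together, a minimal detection sketch that also draws the detected markers back onto the frame (the `test.jpg` path is an assumption):

import cv2
import cv2.aruco as aruco

frame = cv2.imread('test.jpg')
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

aruco_dict = aruco.Dictionary_get(aruco.DICT_4X4_250)
parameters = aruco.DetectorParameters_create()
corners, ids, rejectedImgPoints = aruco.detectMarkers(gray, aruco_dict, parameters=parameters)

# Draw any detected markers onto the frame for inspection
if ids is not None:
    frame = aruco.drawDetectedMarkers(frame, corners, ids)

cv2.imshow('aruco', frame)
cv2.waitKey(0)
cv2.destroyAllWindows()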
The `aruco_detector.py` script is available and implemented as a class to allow easy import and integration into other code. Run it as a standalone script with a local file named `test.jpg` to test the code.
This allows the display loop to be set at a specific desired FPS.
desired_FPS = 30
img_ms = int(1000 / desired_FPS)
if cv2.waitKey(img_ms) & 0xFF == ord('q'):
break
`video_cropper.py` is a script that allows the user to set an ROI and then output a cropped version of the input video. Simply define the input and output paths for the videos and then launch the utility.
Use the mouse to select the ROI:
- Press `r` to reset the ROI
- Press `c` to confirm the ROI and begin the crop process
Within the example code is the `roi_selection_demo.py` script. This script allows the user to select 4 points to form an ROI. In the demo, the points are only printed in the console as output after being 'saved', but that segment of the code can be modified as necessary.
The main considerations are to allow the user to select an ROI beyond the bounds of the image, so as to allow them to select the shape they really want. Additionally, if the image is too large, it will be scaled down accordingly so that the user can actually see the whole image on a normal monitor without investing in 4K.
The key takeaways are that:
- OpenCV accepts negative coordinates and will compute them properly relative to the origin (top left of the image)
- Images and coordinates can be scaled up or down with almost no loss in accuracy.
An advanced version of this code was used as part of an ROI selection utility during my internship stint.
This works in conjunction with the ROI selection utility in the previous section as the user is likely to have selected a weird ROI.
The use case is as follows:
- The ROI is made up of a bunch of points that do not form a nice rectangle
- The script has to reject bounding boxes beyond the ROI
The approach uses contours and compares the center of bounding boxes to the contour.
roi_coordinates = [(54, 500), (64, 3), (995, 678)]
ctr = np.array(roi_coordinates).reshape((-1,1,2)).astype(np.int32) # Creates a contour based on the ROI coordinates
cv2.drawContours(frame, [ctr], 0, (0, 255, 255),1) # Draws the contour on the frame (optional)
To compare a specific bounding box with the format `[x, y, w, h]`:
x = bbox[0]
w = bbox[2]
y = bbox[1]
h = bbox[3]
dist = cv2.pointPolygonTest(ctr, (int(x + w/2), int(y + h/2)), True)
if dist >= 0:
    # If the bounding box is within the ROI
    # Draw the box as a green rectangle
    cv2.rectangle(frame, (x, y), ((x + w), (y + h)), (0,255,0), 3)
else:
    # If the bounding box is not within the ROI
    # Draw the box as a red rectangle
    cv2.rectangle(frame, (x, y), ((x + w), (y + h)), (0,0,255), 3)
When working with neural network models and the like, the output is usually a list of bounding boxes. Generally, the bounding boxes can be of 2 formats:
[x, y, w, h]
[x1, y1, x2, y2]
By my own convention, I refer to them as convention `0` and convention `1` respectively.
To allow swapping of models within the code, it would be good to take into account the 2 possible conventions.
Functions to draw bounding boxes for both conventions can be found in `draw_bounding_boxes.py`.
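As an illustration of the difference between the two conventions, here is a pair of hypothetical conversion helpers (these are not the functions in `draw_bounding_boxes.py`):

def xywh_to_xyxy(bbox):
    # Convention 0 -> convention 1
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

def xyxy_to_xywh(bbox):
    # Convention 1 -> convention 0
    x1, y1, x2, y2 = bbox
    return [x1, y1, x2 - x1, y2 - y1]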
This is an alternative algorithm to the Hough Circles Transform if your use case is as follows:
- There is only 1 circle to be detected
- The signal can be isolated such that only parts of the circle are detected and nothing else
- Parts of the circle may be obscured but the center of the circle still has to be located
This algorithm takes 3 points from what it assumes to be parts of the circle (this is why a clean signal of the circle and nothing but the circle is required). It sorts all possible candidate points, splits the 'circle' into 3 sections - top, middle and bottom - and chooses one random point in each section.
Then using the circumcenter formula, it attempts to compute the center of the circle.
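For reference, the circumcenter of three points can also be computed directly. A minimal sketch of the formula (my own helper, not the demo script's `sympy` implementation):

def circumcenter(p1, p2, p3):
    # Intersect the perpendicular bisectors of (p1, p2) and (p2, p3)
    ax, ay = p1
    bx, by = p2
    cx, cy = p3
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay) + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx) + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy)

# Example: three points on a circle of radius 5 centred at the origin
print(circumcenter((5, 0), (0, 5), (-5, 0))) # (0.0, 0.0)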
The demo implementation can be found in the sample code as `circumcenter_demo.py`. It requires `sympy` to run. It also assumes that you have a grayscale image with a strong signal to detect.
To determine the direction of an arrow, template matching or other methods could be used. However, one approach uses the Hough Lines Transform.
for rho,theta in line:
    if theta > 0.698 and theta < 1.047:
        a = np.cos(theta)
        b = np.sin(theta)
        x0 = a*rho
        y0 = b*rho
        x1 = int(x0 + 1000*(-b))
        y1 = int(y0 + 1000*(a))
        x2 = int(x0 - 1000*(-b))
        y2 = int(y0 - 1000*(a))
        x_cood = int(-(rY - (rho/b)) / (a/b))
        print(x_cood,rY)
        cv2.line(frame,(x1,y1),(x2,y2),(0,255,0),2)
        cv2.circle(frame, (x_cood, rY), 5, (0,255,255), -1)
    elif theta > 2.269 and theta < 2.443:
        a = np.cos(theta)
        b = np.sin(theta)
        x0 = a*rho
        y0 = b*rho
        x1 = int(x0 + 1000*(-b))
        y1 = int(y0 + 1000*(a))
        x2 = int(x0 - 1000*(-b))
        y2 = int(y0 - 1000*(a))
        x_cood = int(-(rY - (rho/b)) / (a/b))
        print(x_cood,rY)
        cv2.line(frame,(x1,y1),(x2,y2),(0,0,255),2)
        cv2.circle(frame, (x_cood, rY), 5, (0,255,255), -1)
By checking `rho` and `theta`, the lines forming the arrow head can be categorized. `rY` is the y coordinate of the centroid. The value of `rY` is then substituted into the equation of the line to get the `x_cood` of the point had it followed the line.
By comparing the x coordinate at `rY` on the line and the true x coordinate of the centroid of the arrow, it is possible to determine if the arrow is pointed to the left or right.
Unfortunately the arrow test script is an unedited full script of a drone flight test script. Feel free to reference it however - I will edit it in the future to isolate a demo snippet if I have the time.
This is a method of color filtering that does not use hue and saturation, instead opting to work directly on the image layers using numpy.
The use case is highly specific and the filtering method itself works extremely well within a controlled condition but is not robust enough to adapt itself to changing lighting or situations. Thus it is not recommended in most cases - it is however, very accurate when placed within the specific set of circumstances it has been calibrated for.
To illustrate the legitimacy of this algorithm, here is an image of a drone flying into a red-lit hoop running on this filtering method:
To reaffirm, this is a method used when detecting Red (or the other 2 base colors, Blue and Green). It will give a strong signal when used to evaluate base colors but is unlikely to work well with any other colors.
# Sets the percentage red threshold, below which a pixel will be eliminated from the final result
PERC_RED_THRESH = 0.3
ret, frame = cap.read()
# Gets the individual color channels
# The format assumed here is bgr
b_channel = np.array(frame[:,:,0]).astype('float')
g_channel = np.array(frame[:,:,1]).astype('float')
r_channel = np.array(frame[:,:,2]).astype('float')
# Create a base channel where all values are added together
bgr_channel = np.add((np.add(b_channel, g_channel)), r_channel)
# Take the average of the blue and green channels and subtract it from the red channel
# This will eliminate white color from the final image
reduce_white = np.subtract(r_channel,((b_channel + g_channel)/ 2))
# Find the percentage red by dividing the previous channel by 255
perc_red = np.divide(reduce_white,255)
# Eliminate all pixels that fail to meet the minimum percentage red using numpy conditional array value setting
perc_red[perc_red < PERC_RED_THRESH] = 0
# Set the values back to within a range of 255 and set the type back to uint8 so that the frame becomes valid for OpenCV processing
final_image = perc_red * 255
final_image = np.floor(final_image).astype('uint8')
# Run another round of openCV thresholding as required
ret, th = cv2.threshold(final_image,10,255,cv2.THRESH_BINARY)
If you would like to try this approach for your use case, feel free to run the script `red_thresholding_demo.py` in the example code.
Fair Warning: Much tuning is required to produce good results
The following instructions are from pyimagesearch and work for Ubuntu 16.04 and 18.04.
- Update and upgrade all existing packages:
  $ sudo apt-get update
  $ sudo apt-get upgrade
- Install cmake and developer tools:
  $ sudo apt-get install build-essential cmake unzip pkg-config
- Install video and image dependencies:
  $ sudo apt-get install libjpeg-dev libpng-dev libtiff-dev
  $ sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
  $ sudo apt-get install libxvidcore-dev libx264-dev
- Install GTK for the GUI backend:
  $ sudo apt-get install libgtk-3-dev
- Install ATLAS and gfortran for mathematical optimizations:
  $ sudo apt-get install libatlas-base-dev gfortran
- Install Python 3 development headers:
  $ sudo apt-get install python3-dev
- Download OpenCV 4 to the home directory:
  $ cd ~
  $ wget -O opencv.zip https://github.com/opencv/opencv/archive/4.0.0.zip
  $ wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.0.0.zip
- Unzip the archives:
  $ unzip opencv.zip
  $ unzip opencv_contrib.zip
- Install numpy:
  $ pip install numpy
- Create a build folder:
  $ cd ~/opencv
  $ mkdir build
  $ cd build
- Run CMake:
  $ cmake -D CMAKE_BUILD_TYPE=RELEASE \
        -D CMAKE_INSTALL_PREFIX=/usr/local \
        -D INSTALL_PYTHON_EXAMPLES=ON \
        -D INSTALL_C_EXAMPLES=OFF \
        -D OPENCV_ENABLE_NONFREE=ON \
        -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
        -D PYTHON_EXECUTABLE=~/.virtualenvs/cv/bin/python \
        -D BUILD_EXAMPLES=ON ..
- Compile OpenCV:
  $ make
  You can run make -j4 to use multiple cores for a faster compile, but I tend to just wait it out because compiling with no -j flag is less error prone, especially on low-end devices.
- Install OpenCV:
  $ sudo make install
  $ sudo ldconfig
It is also possible to install OpenCV in a virtual environment but I tend not to do that as most of my devices are locked down into a specific use case. Thus it is not necessary for me to use a virtual environment.
The following are instructions to build an optimized version of OpenCV on the Raspberry Pi to get a few more frames out of your algorithms. Instructions taken from pyimagesearch.
- Expand your filesystem if needed. This can be done by running:
  $ sudo raspi-config
  Reboot your system afterwards to ensure the changes take effect.
- Get more space by removing unnecessary pre-packaged software:
  $ sudo apt-get purge wolfram-engine
  $ sudo apt-get purge libreoffice*
  $ sudo apt-get clean
  $ sudo apt-get autoremove
- Install all the dependencies required for the optimizations:
  $ sudo apt-get update && sudo apt-get upgrade
  $ sudo apt-get install build-essential cmake pkg-config
  $ sudo apt-get install libjpeg-dev libtiff5-dev libjasper-dev libpng12-dev
  $ sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
  $ sudo apt-get install libxvidcore-dev libx264-dev
  $ sudo apt-get install libgtk2.0-dev libgtk-3-dev
  $ sudo apt-get install libcanberra-gtk*
  $ sudo apt-get install libatlas-base-dev gfortran
  $ sudo apt-get install python2.7-dev python3-dev
- Download the OpenCV source (in this example OpenCV 3.3.0 is used but any version should work):
  $ cd ~
  $ wget -O opencv.zip https://github.com/opencv/opencv/archive/3.3.0.zip
  $ unzip opencv.zip
  $ wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/3.3.0.zip
  $ unzip opencv_contrib.zip
- Install numpy:
  $ pip install numpy
- Create a build directory and run CMake:
  $ cd ~/opencv-3.3.0/
  $ mkdir build
  $ cd build
  $ cmake -D CMAKE_BUILD_TYPE=RELEASE \
        -D CMAKE_INSTALL_PREFIX=/usr/local \
        -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib-3.3.0/modules \
        -D ENABLE_NEON=ON \
        -D ENABLE_VFPV3=ON \
        -D BUILD_TESTS=OFF \
        -D INSTALL_PYTHON_EXAMPLES=OFF \
        -D BUILD_EXAMPLES=OFF ..
- Increase the swap size by opening /etc/dphys-swapfile. This can be done using nano or vi:
  $ sudo nano /etc/dphys-swapfile # nano
  $ sudo vi /etc/dphys-swapfile # vi
  Then modify the CONF_SWAPSIZE variable:
  # set size to absolute value, leaving empty (default) then uses computed value
  # you most likely don't want this, unless you have a special disk situation
  # CONF_SWAPSIZE=100
  CONF_SWAPSIZE=1024
  Exit the file and save:
  - Nano: ctrl+o, then ctrl+x
  - Vi: Esc, then type :wq
- Restart the swap service:
  $ sudo /etc/init.d/dphys-swapfile stop
  $ sudo /etc/init.d/dphys-swapfile start
- Make OpenCV:
  $ make
  You can run make -j4 to use multiple cores for a faster compile, but I tend to just wait it out because compiling with no -j flag is less error prone, especially on low-end devices.
- Install OpenCV:
  $ sudo make install
  $ sudo ldconfig
- Repeat steps 7 and 8 but set CONF_SWAPSIZE back to 100 MB.