Images are just 2D arrays.
Specifically, NumPy array objects.
Images are made of pixels - individual cells of the array.
Terms like VGA, HD, Full HD, 4K define the size of an image in pixels.
Videos are nothing but images flashed multiple times a second.
Binary Image : An image whose cells have only two values, typically zero and one.
In a greyscale image, black is usually denoted by 0 and white by 255, each pixel being 8 bits long.
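A minimal sketch of this, treating an image as a plain 8-bit NumPy array (the file path is a placeholder):

```python
import cv2
import numpy as np

# A synthetic 4x4 greyscale "image": a plain 2D NumPy array of 8-bit cells.
img = np.zeros((4, 4), dtype=np.uint8)
img[1:3, 1:3] = 255          # white square on a black background
print(img.shape, img.dtype)  # (4, 4) uint8 -> values 0..255, 8 bits per pixel

# An image read from disk is the same kind of array ("image.png" is a placeholder path).
loaded = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
if loaded is not None:
    print(loaded.shape, loaded.dtype)
```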
The human eye has three types of photoreceptor cells for color (a kind of three sensors for perception), and each sensor responds to red, green, or blue individually.
Birds like gannets determine the time to collision and not distance.
Humans have a stereoscopic visual system.
Primary components for cameras.
- Lens
- Imaging chip
Changing the focal length of the lens results in blurring of the image.
Increase the focal length and the field of view decreases while the object is magnified; decrease the focal length and the field of view increases, reducing the apparent object size.
Image plane is situated at the focal length of the camera.
Intrinsic camera parameters
Intrinsic camera parameters refer to the internal characteristics and properties of a camera that are essential for capturing images.
- Focal Length (f): The focal length is the distance between the lens and the image sensor when the lens is focused at infinity. It determines the magnification and field of view of the camera.
- Principal Point (or Optical Center): The principal point is the point where the optical axis intersects the image plane. It represents the center of the image and is crucial for correcting lens distortion.
- Pixel Aspect Ratio: This parameter describes the ratio of the width of a pixel to its height. It is necessary for accurately mapping pixel coordinates to physical dimensions.
- Lens Distortion Parameters: Distortion can occur due to imperfections in the camera lens. Common distortion types include radial distortion (caused by the curvature of the lens) and tangential distortion (resulting from the lens not being perfectly parallel to the image plane). Distortion parameters are used to correct these distortions.
where x, y, z is a point in the image world (coordinates of the object in the image) and x', y', z' is a point in the optical world (coordinates of the object with respect to the camera).
ax, ay are the pixel scaling factors.
px, py is the principal point, where the optical axis hits the image plane (ideally the center).
s is the skew factor, used when the image plane is not perpendicular to the ideal image plane.
f is the focal length of the camera.
This matrix is also known as the calibration matrix.
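A minimal numpy sketch of this projection, assuming the standard pinhole arrangement of the parameters above (the exact layout in the original matrix figure may differ, and the numeric values are only examples):

```python
import numpy as np

# Pinhole intrinsics: focal length f, pixel scaling (ax, ay), skew s, principal point (px, py).
f, ax, ay, s, px, py = 0.004, 250_000, 250_000, 0.0, 320.0, 240.0  # example values

K = np.array([[f * ax, s,      px],
              [0.0,    f * ay, py],
              [0.0,    0.0,    1.0]])   # calibration matrix

# Project a point given in camera (optical) coordinates x', y', z'.
X_cam = np.array([0.1, -0.05, 2.0])     # metres in front of the camera
u, v, w = K @ X_cam
print(u / w, v / w)                     # pixel coordinates in the image
```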
Computation of Intrinsic parameters using Vanishing points
The orientation of horizon tells how much the camera is tilted.
Find three vanishing points in the image (vertical as well as horizontal); the orthocenter of the triangle formed by these three vanishing points is the optical center (principal point) of the image.
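A small sketch of the orthocenter computation, assuming the three vanishing points are already known (the coordinates below are hypothetical):

```python
import numpy as np

def orthocenter(v1, v2, v3):
    """Orthocenter of the triangle formed by three vanishing points.

    The altitude through v1 is perpendicular to the side (v3 - v2) and the
    altitude through v2 is perpendicular to (v3 - v1); their intersection
    is the orthocenter, i.e. the estimated optical center."""
    v1, v2, v3 = (np.asarray(v, dtype=float) for v in (v1, v2, v3))
    A = np.array([v3 - v2, v3 - v1])
    b = np.array([np.dot(v3 - v2, v1), np.dot(v3 - v1, v2)])
    return np.linalg.solve(A, b)

# Hypothetical vanishing points in pixel coordinates.
print(orthocenter((1200.0, 300.0), (-400.0, 350.0), (320.0, 4000.0)))
```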
Projective Transformation
When the perspective of the image plane is changed.
A point in the world plane can be represented in the image plane after performing a homography transformation. The first column of the homography matrix represents the vanishing point in the x direction of the world plane and the second column represents the vanishing point in the y direction.
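A minimal numpy sketch with a hypothetical homography H, mapping a world-plane point to the image and reading the vanishing points off the columns:

```python
import numpy as np

# Hypothetical homography from a world plane (X, Y) to the image plane.
H = np.array([[1.2,   0.1,   100.0],
              [0.05,  1.1,    50.0],
              [0.001, 0.002,   1.0]])

def to_pixel(p_h):
    """Convert a homogeneous point to pixel coordinates."""
    return p_h[:2] / p_h[2]

# A world-plane point (X, Y) = (3, 4), expressed homogeneously as (3, 4, 1).
print(to_pixel(H @ np.array([3.0, 4.0, 1.0])))

# The world x-direction at infinity is (1, 0, 0); its image H @ (1, 0, 0) is
# the first column of H, i.e. the vanishing point of the x direction.
print(to_pixel(H[:, 0]), to_pixel(H[:, 1]))
```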
Cross Ratio
When an image is taken, a segment in the world is projected into the image; the midpoint of the segment is not preserved, but the cross ratio along the segment is preserved.
This can be used to interpret the position of the camera by observing the orientation of objects within the image.
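A small sketch of the cross ratio for four collinear points, using a hypothetical 1D projective map to mimic the projection:

```python
def cross_ratio(a, b, c, d):
    """Cross ratio of four collinear points given by scalar positions along the line."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

# Positions of four points on a world line.
pts = [0.0, 1.0, 2.0, 4.0]

# A hypothetical 1D projective map t -> (2t + 1) / (0.3t + 1), standing in for projection.
proj = [(2 * t + 1) / (0.3 * t + 1) for t in pts]

print(cross_ratio(*pts))   # same value...
print(cross_ratio(*proj))  # ...after projection (up to floating point error)
```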
Single View Measurement
Unwarp the image to do measurements.
Vanishing Point
Wiki - A vanishing point is a point on the image plane of a perspective rendering where the two-dimensional perspective projections of mutually parallel lines in three-dimensional space appear to converge.
Vanishing points lie on the horizon.
An image can have multiple vanishing points.
Any two parallel lines have the same vanishing point.
The height of the horizon in the image plane can be used to deduce the height of the camera above the ground.
Geometric intuition
Homogeneous coordinates
A point in the image is a ray in projective space - an association of a ray to a point. The origin is considered to be the camera itself, with the image plane along the z-axis at unit distance.
A line in the image plane can be considered as a plane in projective space.
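A short numpy sketch of this intuition, with the image plane at z = 1: lines are built and intersected using cross products of homogeneous coordinates.

```python
import numpy as np

# A pixel (u, v) corresponds to the ray (u, v, 1) through the camera origin,
# with the image plane placed at unit distance along the z-axis.
p1 = np.array([100.0, 50.0, 1.0])
p2 = np.array([400.0, 300.0, 1.0])

# The line through two image points is the plane through their two rays:
# its normal is the cross product of the two homogeneous points.
line = np.cross(p1, p2)
print(np.dot(line, p1), np.dot(line, p2))   # both ~0: the points lie on the line

# Dually, the intersection of two lines is the cross product of the lines.
line2 = np.cross([100.0, 300.0, 1.0], [400.0, 50.0, 1.0])
x = np.cross(line, line2)
print(x[:2] / x[2])                         # intersection point in pixels
```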
Axis convention:
(0,0)
--------→ x
|
|
|
↓
y
Color Spaces:
- RGB => Red, Green, Blue
- BGR => Blue, Green, Red
- HSV => Hue, Saturation, Value
Operations:
- Convert to greyscale
- Blur : Replace a pixel by the average of neighboring pixels
- Edge detection : Canny Algorithm
- Dilation : Enlarge regions to make feature more prominent
- Erosion : Reduce features
- Thresholding
- Bitwise => AND, OR, NOT, XOR
- Masking
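A hedged sketch chaining these operations with OpenCV (the file path is a placeholder):

```python
import cv2
import numpy as np

img = cv2.imread("photo.jpg")                      # placeholder path
if img is None:
    raise SystemExit("image not found")

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # convert to greyscale
blur = cv2.GaussianBlur(gray, (7, 7), 0)           # blur: smooth each pixel with its neighbours
edges = cv2.Canny(blur, 50, 150)                   # edge detection (Canny)

kernel = np.ones((3, 3), np.uint8)
dilated = cv2.dilate(edges, kernel, iterations=1)  # dilation: enlarge features
eroded = cv2.erode(dilated, kernel, iterations=1)  # erosion: reduce features

_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)   # thresholding

mask = np.zeros(gray.shape, np.uint8)
mask[50:200, 50:200] = 255                         # rectangular mask
masked = cv2.bitwise_and(gray, gray, mask=mask)    # bitwise AND for masking
```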
Transformations:
- Translations
- Rotations
- Flip
- Crop
- Resize
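A sketch of these transformations with OpenCV (placeholder path, example parameter values):

```python
import cv2
import numpy as np

img = cv2.imread("photo.jpg")                               # placeholder path
if img is None:
    raise SystemExit("image not found")
h, w = img.shape[:2]

M_translate = np.float32([[1, 0, 50], [0, 1, 30]])          # shift 50 px right, 30 px down
translated = cv2.warpAffine(img, M_translate, (w, h))

M_rotate = cv2.getRotationMatrix2D((w / 2, h / 2), 45, 1.0) # rotate 45 deg about the centre
rotated = cv2.warpAffine(img, M_rotate, (w, h))

flipped = cv2.flip(img, 1)                                  # horizontal flip
cropped = img[100:300, 200:400]                             # crop is plain array slicing
resized = cv2.resize(img, (320, 240))                       # resize to 320x240
```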
Objects:
- Generation of Image
- Lines
- Rectangles
- Circles
- Text
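A sketch of drawing these objects on a generated image:

```python
import cv2
import numpy as np

canvas = np.zeros((400, 600, 3), dtype=np.uint8)                 # generated blank image
cv2.line(canvas, (0, 0), (599, 399), (0, 255, 0), 2)             # green line
cv2.rectangle(canvas, (50, 50), (200, 150), (255, 0, 0), 2)      # blue rectangle (BGR order)
cv2.circle(canvas, (300, 200), 40, (0, 0, 255), -1)              # filled red circle
cv2.putText(canvas, "hello", (350, 350), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

cv2.imshow("shapes", canvas)
cv2.waitKey(0)
```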
Warp Perspective
Change the perspective of an image.
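A sketch using getPerspectiveTransform and warpPerspective; the corner points are assumed to be picked by hand for a hypothetical image:

```python
import cv2
import numpy as np

img = cv2.imread("card.jpg")                    # placeholder: image of a tilted rectangle

# Four corners of the region in the source image (assumed known).
src = np.float32([[120, 80], [480, 60], [510, 400], [90, 420]])
# Where those corners should land: a straight-on 400x300 view.
dst = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])

H = cv2.getPerspectiveTransform(src, dst)       # 3x3 homography
warped = cv2.warpPerspective(img, H, (400, 300))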
Stack Images
Color Detection
It is good practice to convert from RGB/BGR to HSV, as HSV also takes lighting conditions into account (hue is separated from brightness).
Create a mask and filter out the unwanted colors using a bitwise AND operation - that simple.
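A sketch of the mask-and-filter step; the HSV bounds below are hypothetical (roughly green) and need tuning for the actual colour:

```python
import cv2
import numpy as np

img = cv2.imread("scene.jpg")                       # placeholder path
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)          # OpenCV loads images as BGR

# Hypothetical HSV range for a green object.
lower = np.array([40, 50, 50])
upper = np.array([80, 255, 255])

mask = cv2.inRange(hsv, lower, upper)               # white where the colour is present
result = cv2.bitwise_and(img, img, mask=mask)       # keep only the wanted colour
```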
Contour Detection
Convert to grayscale to simplify the process.
Apply Blur if necessary to reduce noise
Apply edge detection
Or simply use findContours method.
Shapes can also be detected using contours.
Bounding Boxes are added to enclose the contours and display the object.
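A sketch of the contour pipeline described above (placeholder path):

```python
import cv2

img = cv2.imread("shapes.jpg")                              # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)                # simplify to one channel
blur = cv2.GaussianBlur(gray, (5, 5), 0)                    # reduce noise
edges = cv2.Canny(blur, 50, 150)                            # edge map

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)                      # bounding box around each contour
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.drawContours(img, contours, -1, (0, 0, 255), 1)
```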
Histogram
Distribution of pixel intensity in the image.
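A minimal sketch computing the intensity histogram of a greyscale image with calcHist (placeholder path):

```python
import cv2

img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)       # placeholder path
hist = cv2.calcHist([img], [0], None, [256], [0, 256])    # counts per intensity value 0..255
print(hist.argmax())                                      # most common pixel intensity
```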
ArUco Markers
A type of barcode-like marker used mostly for camera calibration and as reference points for tracking and recognizing objects or positions in the real world.
ArUco markers are based on Hamming code.
In the grid, the first, third and fifth columns represent parity bits. The second and fourth columns represent the data bits. Hence, there are ten total data bits.
A predefined dictionary is used for detection and generation of markers.
The dictionaries follow a specific naming convention NxN_M
Where NxN defines the size of the marker in terms of grid cells and also the number of bits of information it contains, so a 5x5 marker has a 5x5 grid of cells and contains 25 bits of information - each cell represents a single bit.
M represents the total number of unique markers that can be generated.
Each pattern within the dictionary has a unique ID associated.
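A sketch of marker generation and detection, written against the OpenCV 4.6 contrib API listed under Requirements (this API changed in later OpenCV versions); the paths and marker ID are placeholders:

```python
import cv2

# Dictionary of 5x5 markers with 250 unique IDs (DICT_5X5_250).
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_250)

# Generate a marker image for ID 23, 200x200 pixels, and save it.
marker = cv2.aruco.drawMarker(dictionary, 23, 200)
cv2.imwrite("marker_23.png", marker)

# Detect markers in a photo (placeholder path).
frame = cv2.imread("frame.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
params = cv2.aruco.DetectorParameters_create()
corners, ids, rejected = cv2.aruco.detectMarkers(gray, dictionary, parameters=params)
print(ids)                                   # IDs of the detected markers, if any
cv2.aruco.drawDetectedMarkers(frame, corners, ids)
```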
Requirements:
Python version: 3.10.12
Numpy version: 1.21.5
OpenCV version: 4.6.0
Numpy path: /home/aditya/.local/lib/python3.10/site-packages/numpy/__init__.py
OpenCV path: /home/aditya/.local/lib/python3.10/site-packages/cv2/__init__.py
pip3 install opencv-contrib-python==4.6.0.66
Clone the repository.
git clone git@github.com:maker-ATOM/Computer-Vision-for-Robotics.git
Move to the fundamentals directory of the cloned repository.
cd <path_to_cloned_directory>/fundamentals
Execute the script
python3 read_data.py
If in any case conflicting versions are already installed, remove them first:
pip3 uninstall opencv-contrib-python
pip3 uninstall opencv-python
sudo apt-get remove python3-numpy
Aruco Marker Generation site
https://chev.me/arucogen/