History and development of OpenCV
The library was created in 1999 by Intel Research scientists under the leadership of Gary Bradski. The project was intended to accelerate computer-vision research and give developers a common set of tools. The first public (alpha) release appeared in 2000, and the library was open source (BSD-licensed) from the start; in 2008 it gained long-term support from Willow Garage, which greatly accelerated its development.
Key development milestones:
- 1999‑2005: Development at Intel Research, focus on performance
- 2006‑2012: Transition to an open development model, creation of OpenCV 2.x
- 2013‑2015: Release of OpenCV 3.x with major architectural changes
- 2018‑present: OpenCV 4.x with enhanced deep‑learning support
Today OpenCV is maintained by the OpenCV Foundation and an active community of developers worldwide.
Architecture and modules of OpenCV
OpenCV is built on a modular principle, which provides flexibility and the ability to use only the required components:
Main modules:
- Core – basic data structures and algorithms
- Imgproc – image processing and filtering
- Imgcodecs – image encoding and decoding
- Videoio – video and camera handling
- Highgui – user interface
- Features2d – feature detectors and descriptors
- Calib3d – camera calibration and 3‑D reconstruction
- Objdetect – object detection
- DNN – deep‑learning module
- ML – classic machine‑learning algorithms
System requirements and supported platforms
OpenCV supports a wide range of operating systems and architectures:
Operating systems:
- Windows (7, 8, 10, 11)
- Linux (Ubuntu, CentOS, Debian and other distributions)
- macOS
- Android
- iOS
Programming languages:
- C++
- Python
- Java
- C# (via third‑party wrappers such as OpenCvSharp and Emgu CV)
- JavaScript (OpenCV.js)
Hardware accelerators:
- CUDA (NVIDIA GPU)
- OpenCL
- Intel TBB
- Intel IPP
Installation and configuration of OpenCV
Installation for Python
Basic installation:
pip install opencv-python
Installation with extra modules:
pip install opencv-contrib-python
For server environments without a display (no GUI functions):
pip install opencv-python-headless
Verification of installation
import cv2
print(cv2.__version__)
print(cv2.getBuildInformation())
IDE setup
For efficient work with OpenCV it is also recommended to install:
pip install numpy matplotlib jupyter
Core data structures of OpenCV
Mat – the primary class for image handling
In OpenCV images are represented as multi‑dimensional arrays of type Mat (in C++) or numpy.ndarray (in Python). Each pixel can contain 1 to 4 channels (e.g. grayscale, BGR color, or BGR with an alpha channel).
Fundamental data types:
- CV_8U – 8‑bit unsigned integers (0‑255)
- CV_8S – 8‑bit signed integers
- CV_16U – 16‑bit unsigned integers
- CV_16S – 16‑bit signed integers
- CV_32S – 32‑bit signed integers
- CV_32F – 32‑bit floating‑point numbers
- CV_64F – 64‑bit floating‑point numbers
Working with images
Loading and saving images
import cv2
# Load an image
image = cv2.imread('image.jpg')
gray_image = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
# Display the image
cv2.imshow('Original', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
# Save the image
cv2.imwrite('output.jpg', image)
cv2.imwrite('output.png', image, [cv2.IMWRITE_PNG_COMPRESSION, 9])
Handling various file formats
OpenCV's imgcodecs module supports many raster formats:
- Common raster: JPEG, PNG, BMP, TIFF, WebP
- Specialized: OpenEXR, JPEG 2000, PFM, Radiance HDR
Vector formats such as SVG are not supported by imgcodecs; rasterize them with another library first.
Color spaces and conversions
Main color spaces:
- BGR – default OpenCV channel order (Blue, Green, Red)
- RGB – standard order for most other libraries
- HSV – Hue, Saturation, Value (convenient for color‑based segmentation)
- LAB – perceptually uniform Lab color space
- YUV – used in video coding
- XYZ – CIE 1931 color space
Conversion examples:
# Convert BGR to various color spaces
rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lab_image = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Channel operations
b, g, r = cv2.split(image)
merged = cv2.merge([b, g, r])
Geometric transformations
Resizing and scaling
# Resize while preserving aspect ratio
def resize_with_aspect_ratio(image, width=None, height=None):
    h, w = image.shape[:2]
    if width is None and height is None:
        return image
    if width is None:
        ratio = height / h
        width = int(w * ratio)
    else:
        ratio = width / w
        height = int(h * ratio)
    return cv2.resize(image, (width, height))
# Different interpolation methods
resized_nearest = cv2.resize(image, (300, 300), interpolation=cv2.INTER_NEAREST)
resized_linear = cv2.resize(image, (300, 300), interpolation=cv2.INTER_LINEAR)
resized_cubic = cv2.resize(image, (300, 300), interpolation=cv2.INTER_CUBIC)
Rotation and affine transforms
# Rotate an image
def rotate_image(image, angle, center=None, scale=1.0):
    h, w = image.shape[:2]
    if center is None:
        center = (w // 2, h // 2)
    rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale)
    rotated = cv2.warpAffine(image, rotation_matrix, (w, h))
    return rotated
# Affine transformation (h and w are taken from the image)
h, w = image.shape[:2]
src_points = np.float32([[0, 0], [w, 0], [0, h]])
dst_points = np.float32([[0, 0], [w, 0], [100, h]])
affine_matrix = cv2.getAffineTransform(src_points, dst_points)
warped = cv2.warpAffine(image, affine_matrix, (w, h))
Filtering and image processing
Linear filters
# Various blur types
gaussian_blur = cv2.GaussianBlur(image, (15, 15), 0)
box_blur = cv2.blur(image, (15, 15))
median_blur = cv2.medianBlur(image, 15)
# Bilateral filter (preserves edges)
bilateral = cv2.bilateralFilter(image, 9, 75, 75)
# Custom kernels
kernel_sharpen = np.array([[-1, -1, -1],
[-1, 9, -1],
[-1, -1, -1]])
sharpened = cv2.filter2D(image, -1, kernel_sharpen)
Edge detectors
# Canny edge detector
edges = cv2.Canny(gray_image, 50, 150, apertureSize=3)
# Sobel operator
sobel_x = cv2.Sobel(gray_image, cv2.CV_64F, 1, 0, ksize=3)
sobel_y = cv2.Sobel(gray_image, cv2.CV_64F, 0, 1, ksize=3)
sobel_combined = cv2.magnitude(sobel_x, sobel_y)
# Laplacian operator
laplacian = cv2.Laplacian(gray_image, cv2.CV_64F)
Morphological operations
Basic operations
# Create structuring element
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
kernel_ellipse = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
# Core morphological ops (binary_image: a thresholded single-channel image)
_, binary_image = cv2.threshold(gray_image, 127, 255, cv2.THRESH_BINARY)
eroded = cv2.erode(binary_image, kernel, iterations=1)
dilated = cv2.dilate(binary_image, kernel, iterations=1)
opened = cv2.morphologyEx(binary_image, cv2.MORPH_OPEN, kernel)
closed = cv2.morphologyEx(binary_image, cv2.MORPH_CLOSE, kernel)
gradient = cv2.morphologyEx(binary_image, cv2.MORPH_GRADIENT, kernel)
Working with contours
Finding and analyzing contours
# Find contours
contours, hierarchy = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Analyze each contour
for contour in contours:
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, True)
    # Approximate the contour
    epsilon = 0.02 * perimeter
    approx = cv2.approxPolyDP(contour, epsilon, True)
    # Bounding rectangle
    x, y, w, h = cv2.boundingRect(contour)
    # Minimum enclosing circle
    (cx, cy), radius = cv2.minEnclosingCircle(contour)
    # Minimum area rectangle
    rect = cv2.minAreaRect(contour)
    box = cv2.boxPoints(rect)
    box = box.astype(np.intp)  # np.int0 was removed in NumPy 2.0
Working with video
Capturing video from a camera
cap = cv2.VideoCapture(0) # 0 – first camera
# Camera settings
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 30)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Process the frame
    processed_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cv2.imshow('Video', processed_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Working with video files
# Read a video file
cap = cv2.VideoCapture('video.mp4')
# Retrieve video information
fps = cap.get(cv2.CAP_PROP_FPS)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
duration = frame_count / fps
# Write video (the output frame size must match the incoming frames)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('output.avi', fourcc, fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Process the frame
    processed_frame = cv2.flip(frame, 1)
    out.write(processed_frame)
cap.release()
out.release()
Object detection and recognition
Haar cascades
# Load cascades
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')
# Detect faces
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x+w, y+h), (255, 0, 0), 2)
    # Detect eyes inside the face region
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = image[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)
Feature detectors
# SIFT detector
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
# ORB detector
orb = cv2.ORB_create()
keypoints, descriptors = orb.detectAndCompute(gray, None)
# Visualize keypoints
img_with_keypoints = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
Deep‑learning module (DNN)
Loading and using pretrained models
# Load a model
net = cv2.dnn.readNetFromONNX('model.onnx')
net = cv2.dnn.readNetFromTensorflow('model.pb')
net = cv2.dnn.readNetFromCaffe('deploy.prototxt', 'model.caffemodel')
# Prepare input blob
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(224, 224), mean=(104, 117, 123))
# Forward pass
net.setInput(blob)
outputs = net.forward()
# Process results
for output in outputs:
    for detection in output[0, 0]:
        confidence = detection[2]
        if confidence > 0.5:
            # Handle detection
            pass
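For a classification network, the raw forward() output is typically a 1×N score vector rather than detections; a sketch of turning it into a class id and confidence (the scores array below is a stand-in for a real network output):

```python
import numpy as np

# Stand-in for net.forward() on a 5-class classifier
scores = np.array([[0.2, 1.5, 0.1, 3.0, 0.4]], dtype=np.float32)

# Softmax over the scores gives pseudo-probabilities
e = np.exp(scores - scores.max())
probs = e / e.sum()

class_id = int(np.argmax(probs))
confidence = float(probs[0, class_id])
print(class_id)  # 3
```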
Camera calibration and 3‑D reconstruction
Camera calibration
# Prepare calibration points
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
objp = np.zeros((6*7, 3), np.float32)
objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)
objpoints = [] # 3‑D points in real world
imgpoints = [] # 2‑D points in image
# Find chessboard corners
ret, corners = cv2.findChessboardCorners(gray, (7, 6), None)
if ret:
    objpoints.append(objp)
    corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    imgpoints.append(corners2)
# Calibrate camera
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
# Undistort image
undistorted = cv2.undistort(image, mtx, dist, None, mtx)
Object tracking
Optical flow
# Parameters for corner detection
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)
# Parameters for Lucas‑Kanade optical flow
lk_params = dict(winSize=(15, 15), maxLevel=2, criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))
# Detect features to track
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)
# Compute optical flow
p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)
OpenCV trackers
# Different tracker types
tracker_types = ['BOOSTING', 'MIL', 'KCF', 'TLD', 'MEDIANFLOW', 'GOTURN', 'MOSSE', 'CSRT']
def create_tracker(tracker_type):
    if tracker_type == 'BOOSTING':
        tracker = cv2.legacy.TrackerBoosting_create()
    elif tracker_type == 'MIL':
        tracker = cv2.legacy.TrackerMIL_create()
    elif tracker_type == 'KCF':
        tracker = cv2.legacy.TrackerKCF_create()
    elif tracker_type == 'TLD':
        tracker = cv2.legacy.TrackerTLD_create()
    elif tracker_type == 'MEDIANFLOW':
        tracker = cv2.legacy.TrackerMedianFlow_create()
    elif tracker_type == 'GOTURN':
        tracker = cv2.TrackerGOTURN_create()  # requires the GOTURN model files
    elif tracker_type == 'MOSSE':
        tracker = cv2.legacy.TrackerMOSSE_create()
    elif tracker_type == 'CSRT':
        tracker = cv2.TrackerCSRT_create()
    else:
        raise ValueError(f"Unknown tracker type: {tracker_type}")
    return tracker
# Initialise tracker
tracker = create_tracker('CSRT')
ok = tracker.init(frame, bbox)
# Update tracker
ok, bbox = tracker.update(frame)
Complete table of OpenCV methods and functions
| Category | Function/Method | Description | Usage example |
|---|---|---|---|
| Image I/O | cv2.imread() | Load an image | img = cv2.imread('image.jpg') |
| | cv2.imshow() | Display an image | cv2.imshow('Image', img) |
| | cv2.imwrite() | Save an image | cv2.imwrite('output.jpg', img) |
| | cv2.waitKey() | Wait for a key press | cv2.waitKey(0) |
| | cv2.destroyAllWindows() | Close all windows | cv2.destroyAllWindows() |
| Video handling | cv2.VideoCapture() | Capture video | cap = cv2.VideoCapture(0) |
| | cv2.VideoWriter() | Write video | out = cv2.VideoWriter('out.avi', fourcc, fps, size) |
| | cap.read() | Read a frame | ret, frame = cap.read() |
| | cap.release() | Release resources | cap.release() |
| Color conversion | cv2.cvtColor() | Convert color space | gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) |
| | cv2.split() | Split channels | b, g, r = cv2.split(img) |
| | cv2.merge() | Merge channels | merged = cv2.merge([b, g, r]) |
| | cv2.inRange() | Create a mask based on color range | mask = cv2.inRange(hsv, lower, upper) |
| Geometric transforms | cv2.resize() | Resize image | resized = cv2.resize(img, (300, 300)) |
| | cv2.rotate() | Rotate by 90°/180°/270° | rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE) |
| | cv2.flip() | Flip image | flipped = cv2.flip(img, 1) |
| | cv2.warpAffine() | Affine warp | warped = cv2.warpAffine(img, M, (w, h)) |
| | cv2.warpPerspective() | Perspective warp | warped = cv2.warpPerspective(img, M, (w, h)) |
| | cv2.getRotationMatrix2D() | Rotation matrix | M = cv2.getRotationMatrix2D(center, angle, scale) |
| | cv2.getAffineTransform() | Affine matrix | M = cv2.getAffineTransform(src, dst) |
| | cv2.getPerspectiveTransform() | Perspective matrix | M = cv2.getPerspectiveTransform(src, dst) |
| Filtering | cv2.blur() | Simple blur | blurred = cv2.blur(img, (5, 5)) |
| | cv2.GaussianBlur() | Gaussian blur | blurred = cv2.GaussianBlur(img, (5, 5), 0) |
| | cv2.medianBlur() | Median blur | blurred = cv2.medianBlur(img, 5) |
| | cv2.bilateralFilter() | Bilateral filter | filtered = cv2.bilateralFilter(img, 9, 75, 75) |
| | cv2.filter2D() | Convolution with a kernel | filtered = cv2.filter2D(img, -1, kernel) |
| Edge detection | cv2.Canny() | Canny edge detector | edges = cv2.Canny(gray, 50, 150) |
| | cv2.Sobel() | Sobel operator | sobel = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3) |
| | cv2.Laplacian() | Laplacian operator | laplacian = cv2.Laplacian(gray, cv2.CV_64F) |
| | cv2.Scharr() | Scharr operator | scharr = cv2.Scharr(gray, cv2.CV_64F, 1, 0) |
| Thresholding | cv2.threshold() | Simple binary threshold | ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY) |
| | cv2.adaptiveThreshold() | Adaptive binary threshold | thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2) |
| Morphology | cv2.erode() | Erosion | eroded = cv2.erode(img, kernel, iterations=1) |
| | cv2.dilate() | Dilation | dilated = cv2.dilate(img, kernel, iterations=1) |
| | cv2.morphologyEx() | Combined morphological ops | opened = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel) |
| | cv2.getStructuringElement() | Create structuring element | kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)) |
| Contours | cv2.findContours() | Find contours | contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) |
| | cv2.drawContours() | Draw contours | cv2.drawContours(img, contours, -1, (0, 255, 0), 2) |
| | cv2.contourArea() | Contour area | area = cv2.contourArea(contour) |
| | cv2.arcLength() | Contour perimeter | perimeter = cv2.arcLength(contour, True) |
| | cv2.approxPolyDP() | Contour approximation | approx = cv2.approxPolyDP(contour, epsilon, True) |
| | cv2.boundingRect() | Bounding rectangle | x, y, w, h = cv2.boundingRect(contour) |
| | cv2.minAreaRect() | Minimum area rectangle | rect = cv2.minAreaRect(contour) |
| | cv2.minEnclosingCircle() | Minimum enclosing circle | (x, y), radius = cv2.minEnclosingCircle(contour) |
| | cv2.fitEllipse() | Ellipse fitting | ellipse = cv2.fitEllipse(contour) |
| | cv2.convexHull() | Convex hull | hull = cv2.convexHull(contour) |
| Drawing | cv2.line() | Line | cv2.line(img, pt1, pt2, color, thickness) |
| | cv2.rectangle() | Rectangle | cv2.rectangle(img, pt1, pt2, color, thickness) |
| | cv2.circle() | Circle | cv2.circle(img, center, radius, color, thickness) |
| | cv2.ellipse() | Ellipse | cv2.ellipse(img, center, axes, angle, startAngle, endAngle, color, thickness) |
| | cv2.polylines() | Polylines | cv2.polylines(img, [pts], isClosed, color, thickness) |
| | cv2.fillPoly() | Fill polygon | cv2.fillPoly(img, [pts], color) |
| | cv2.putText() | Text | cv2.putText(img, text, org, font, fontScale, color, thickness) |
| Object detection | cv2.CascadeClassifier() | Cascade classifier | face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml') |
| | detectMultiScale() | Detect objects | faces = face_cascade.detectMultiScale(gray, 1.1, 5) |
| | cv2.HOGDescriptor() | HOG descriptor | hog = cv2.HOGDescriptor() |
| | cv2.HOGDescriptor_getDefaultPeopleDetector() | People detector | hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()) |
| Feature detection | cv2.SIFT_create() | SIFT detector | sift = cv2.SIFT_create() |
| | cv2.ORB_create() | ORB detector | orb = cv2.ORB_create() |
| | cv2.AKAZE_create() | AKAZE detector | akaze = cv2.AKAZE_create() |
| | cv2.BRISK_create() | BRISK detector | brisk = cv2.BRISK_create() |
| | detectAndCompute() | Detect and compute descriptors | kp, des = sift.detectAndCompute(gray, None) |
| | cv2.drawKeypoints() | Draw keypoints | img_kp = cv2.drawKeypoints(img, kp, None) |
| | cv2.goodFeaturesToTrack() | Corner detection | corners = cv2.goodFeaturesToTrack(gray, maxCorners, qualityLevel, minDistance) |
| Feature matching | cv2.BFMatcher() | Brute‑Force matcher | bf = cv2.BFMatcher() |
| | cv2.FlannBasedMatcher() | FLANN matcher | flann = cv2.FlannBasedMatcher() |
| | match() | Match descriptors | matches = bf.match(des1, des2) |
| | knnMatch() | k‑NN matching | matches = bf.knnMatch(des1, des2, k=2) |
| | cv2.drawMatches() | Draw matches | img_matches = cv2.drawMatches(img1, kp1, img2, kp2, matches, None) |
| Geometric transforms for matching | cv2.findHomography() | Find homography | H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0) |
| | cv2.findFundamentalMat() | Fundamental matrix | F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_LMEDS) |
| | cv2.findEssentialMat() | Essential matrix | E, mask = cv2.findEssentialMat(pts1, pts2, focal, pp, cv2.RANSAC, 0.999, 1.0) |
| Tracking | cv2.calcOpticalFlowPyrLK() | Lucas‑Kanade optical flow | p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params) |
| | cv2.calcOpticalFlowFarneback() | Farneback optical flow | flow = cv2.calcOpticalFlowFarneback(old_gray, frame_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0) |
| | cv2.TrackerCSRT_create() | CSRT tracker | tracker = cv2.TrackerCSRT_create() |
| | cv2.TrackerKCF_create() | KCF tracker | tracker = cv2.TrackerKCF_create() |
| Camera calibration | cv2.findChessboardCorners() | Find chessboard corners | ret, corners = cv2.findChessboardCorners(gray, (9, 6), None) |
| | cv2.cornerSubPix() | Refine corner locations | corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria) |
| | cv2.drawChessboardCorners() | Draw detected corners | cv2.drawChessboardCorners(img, (9, 6), corners2, ret) |
| | cv2.calibrateCamera() | Camera calibration | ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None) |
| | cv2.undistort() | Undistort image | dst = cv2.undistort(img, mtx, dist, None, newcameramtx) |
| | cv2.getOptimalNewCameraMatrix() | Optimal new camera matrix | newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h)) |
| Stereo vision | cv2.stereoRectify() | Stereo rectification | R1, R2, P1, P2, Q, validPixROI1, validPixROI2 = cv2.stereoRectify(mtx1, dist1, mtx2, dist2, imgsize, R, T) |
| | cv2.initUndistortRectifyMap() | Maps for undistortion | map1x, map1y = cv2.initUndistortRectifyMap(mtx1, dist1, R1, P1, imgsize, cv2.CV_16SC2) |
| | cv2.remap() | Apply maps | img_rectified = cv2.remap(img, map1x, map1y, cv2.INTER_LINEAR) |
| | cv2.StereoBM_create() | Block Matching algorithm | stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15) |
| | cv2.StereoSGBM_create() | Semi‑Global Block Matching | stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=16, blockSize=3) |
| Deep learning (DNN) | cv2.dnn.readNetFromTensorflow() | Load TensorFlow model | net = cv2.dnn.readNetFromTensorflow('model.pb') |
| | cv2.dnn.readNetFromCaffe() | Load Caffe model | net = cv2.dnn.readNetFromCaffe('deploy.prototxt', 'model.caffemodel') |
| | cv2.dnn.readNetFromONNX() | Load ONNX model | net = cv2.dnn.readNetFromONNX('model.onnx') |
| | cv2.dnn.readNetFromDarknet() | Load Darknet model | net = cv2.dnn.readNetFromDarknet('yolo.cfg', 'yolo.weights') |
| | cv2.dnn.blobFromImage() | Prepare input blob | blob = cv2.dnn.blobFromImage(image, scalefactor, size, mean) |
| | cv2.dnn.blobFromImages() | Prepare batch of images | blob = cv2.dnn.blobFromImages(images, scalefactor, size, mean) |
| | net.setInput() | Set input data | net.setInput(blob) |
| | net.forward() | Forward pass | outputs = net.forward() |
| | net.getLayerNames() | Get layer names | layer_names = net.getLayerNames() |
| | net.getUnconnectedOutLayers() | Get output layers | output_layers = net.getUnconnectedOutLayers() |
| Machine learning | cv2.ml.KNearest_create() | k‑NN classifier | knn = cv2.ml.KNearest_create() |
| | cv2.ml.SVM_create() | SVM classifier | svm = cv2.ml.SVM_create() |
| | cv2.ml.RTrees_create() | Random Forest | rtrees = cv2.ml.RTrees_create() |
| | cv2.ml.NormalBayesClassifier_create() | Naïve Bayes classifier | bayes = cv2.ml.NormalBayesClassifier_create() |
| | cv2.ml.LogisticRegression_create() | Logistic regression | lr = cv2.ml.LogisticRegression_create() |
| | cv2.ml.ANN_MLP_create() | Multilayer perceptron | ann = cv2.ml.ANN_MLP_create() |
| Utilities | cv2.getTickCount() | Get tick counter | t1 = cv2.getTickCount() |
| | cv2.getTickFrequency() | Tick frequency | freq = cv2.getTickFrequency() |
| | cv2.norm() | Compute norm | norm = cv2.norm(img1, img2, cv2.NORM_L2) |
| | cv2.normalize() | Normalization | normalized = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX) |
| | cv2.minMaxLoc() | Find min and max values | minVal, maxVal, minLoc, maxLoc = cv2.minMaxLoc(gray) |
| | cv2.meanStdDev() | Mean and standard deviation | mean, stddev = cv2.meanStdDev(img) |
| | cv2.bitwise_and() | Bitwise AND | result = cv2.bitwise_and(img1, img2) |
| | cv2.bitwise_or() | Bitwise OR | result = cv2.bitwise_or(img1, img2) |
| | cv2.bitwise_xor() | Bitwise XOR | result = cv2.bitwise_xor(img1, img2) |
| | cv2.bitwise_not() | Bitwise NOT | result = cv2.bitwise_not(img) |
| | cv2.add() | Add images | result = cv2.add(img1, img2) |
| | cv2.subtract() | Subtract images | result = cv2.subtract(img1, img2) |
| | cv2.multiply() | Multiply images | result = cv2.multiply(img1, img2) |
| | cv2.divide() | Divide images | result = cv2.divide(img1, img2) |
| | cv2.absdiff() | Absolute difference | diff = cv2.absdiff(img1, img2) |
| | cv2.addWeighted() | Weighted addition | result = cv2.addWeighted(img1, alpha, img2, beta, gamma) |
Integration with other libraries
Collaboration with NumPy
import numpy as np
import cv2
# Create an image using NumPy
img_numpy = np.zeros((300, 300, 3), dtype=np.uint8)
img_numpy[:, :, 2] = 255 # Red channel
# Mathematical operations on images
img_float = img.astype(np.float32) / 255.0
enhanced = np.clip(img_float * 1.5, 0, 1) * 255
enhanced = enhanced.astype(np.uint8)
Working with Matplotlib
import matplotlib.pyplot as plt
# Proper display of OpenCV images in Matplotlib
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)
plt.axis('off')
plt.title('OpenCV Image in Matplotlib')
plt.show()
# Create a multi‑panel layout
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes[0, 0].imshow(img_rgb)
axes[0, 0].set_title('Original')
axes[0, 1].imshow(gray, cmap='gray')
axes[0, 1].set_title('Grayscale')
axes[1, 0].imshow(edges, cmap='gray')
axes[1, 0].set_title('Edges')
blurred_rgb = cv2.cvtColor(cv2.GaussianBlur(img, (15, 15), 0), cv2.COLOR_BGR2RGB)
axes[1, 1].imshow(blurred_rgb)
axes[1, 1].set_title('Blurred')
plt.tight_layout()
plt.show()
Integration with TensorFlow and PyTorch
import tensorflow as tf
import torch
# Prepare data for TensorFlow
def preprocess_for_tensorflow(image):
    image = cv2.resize(image, (224, 224))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = image.astype(np.float32) / 255.0
    image = np.expand_dims(image, axis=0)
    return image
# Prepare data for PyTorch
def preprocess_for_pytorch(image):
    image = cv2.resize(image, (224, 224))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = image.astype(np.float32) / 255.0
    image = np.transpose(image, (2, 0, 1))
    image = np.expand_dims(image, axis=0)
    return torch.tensor(image)
Performance optimization
Using multithreading
import threading
import concurrent.futures
def process_frame(frame):
    # Process a single frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    return edges
# Process video with a thread pool
def process_video_multithreaded(video_path):
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(frame)
    cap.release()
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        processed_frames = list(executor.map(process_frame, frames))
    return processed_frames
GPU optimization
# Use CUDA if available (requires an OpenCV build with CUDA support)
if cv2.cuda.getCudaEnabledDeviceCount() > 0:
    print("CUDA devices available:", cv2.cuda.getCudaEnabledDeviceCount())
    # Upload image to GPU
    gpu_img = cv2.cuda_GpuMat()
    gpu_img.upload(image)
    # Process on GPU
    gpu_gray = cv2.cuda.cvtColor(gpu_img, cv2.COLOR_BGR2GRAY)
    # Gaussian blur on the GPU goes through a filter object
    gauss = cv2.cuda.createGaussianFilter(cv2.CV_8UC1, cv2.CV_8UC1, (15, 15), 0)
    gpu_blur = gauss.apply(gpu_gray)
    # Download result back to CPU
    result = gpu_blur.download()
Error handling and debugging
Common errors and solutions
# Safe image loading
def safe_imread(path):
    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(f"Failed to load image: {path}")
    return img
# Verify camera operation
def check_camera(camera_id=0):
    cap = cv2.VideoCapture(camera_id)
    if not cap.isOpened():
        raise RuntimeError(f"Unable to open camera {camera_id}")
    ret, frame = cap.read()
    if not ret:
        cap.release()
        raise RuntimeError("Failed to capture a frame from the camera")
    cap.release()
    return True
# Validate image dimensions
def validate_image_size(img, min_size=(100, 100)):
    h, w = img.shape[:2]
    if h < min_size[0] or w < min_size[1]:
        raise ValueError(f"Image size {w}x{h} is too small. Minimum size: {min_size}")
Practical applications of OpenCV
Video surveillance system
class MotionDetector:
    def __init__(self, threshold=25, min_area=500):
        self.threshold = threshold
        self.min_area = min_area
        self.background_subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

    def detect_motion(self, frame):
        # Apply background subtraction
        fg_mask = self.background_subtractor.apply(frame)
        # Morphological cleanup
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
        fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
        # Find contours
        contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        motion_detected = False
        for contour in contours:
            if cv2.contourArea(contour) > self.min_area:
                motion_detected = True
                x, y, w, h = cv2.boundingRect(contour)
                cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
        return frame, motion_detected
Optical character recognition (OCR)
# Prepare image for OCR
def preprocess_for_ocr(image):
    # Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Adaptive binarization
    thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
    # Morphological cleanup
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    cleaned = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
    return cleaned
# Find text regions
def find_text_regions(image):
    # Use MSER to detect text blobs
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(image)
    # Filter regions by size
    hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]
    return hulls
Medical image analysis
def analyze_medical_image(image):
    # Enhance contrast with CLAHE
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(image)
    # Segmentation using watershed: prepare markers
    ret, thresh = cv2.threshold(enhanced, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Remove noise
    kernel = np.ones((3, 3), np.uint8)
    opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)
    # Identify background
    sure_bg = cv2.dilate(opening, kernel, iterations=3)
    # Find foreground regions
    dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
    ret, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)
    # Unknown region
    sure_fg = np.uint8(sure_fg)
    unknown = cv2.subtract(sure_bg, sure_fg)
    return enhanced, sure_fg, unknown
Optimization tips and best practices
Choosing the right data type
# Use uint8 for most operations
img_uint8 = img.astype(np.uint8)
# Use float32 for arithmetic
img_float32 = img.astype(np.float32) / 255.0
# Use float64 for high‑precision calculations
img_float64 = img.astype(np.float64)
Memory‑usage optimization
# Pre‑allocate memory for video processing
def process_video_optimized(video_path, output_path):
    cap = cv2.VideoCapture(video_path)
    # Retrieve video parameters
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # Pre-allocate a buffer for processed frames
    processed_frame = np.zeros((height, width, 3), dtype=np.uint8)
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # Process frame, writing into the pre-allocated buffer
        cv2.GaussianBlur(frame, (15, 15), 0, dst=processed_frame)
        out.write(processed_frame)
    cap.release()
    out.release()
Performance profiling
import time
def profile_function(func, *args, **kwargs):
    start_time = cv2.getTickCount()
    result = func(*args, **kwargs)
    end_time = cv2.getTickCount()
    execution_time = (end_time - start_time) / cv2.getTickFrequency()
    print(f"Function {func.__name__} executed in {execution_time:.4f} seconds")
    return result
# Example usage
processed_image = profile_function(cv2.GaussianBlur, image, (15, 15), 0)
Frequently asked questions
How does OpenCV handle different image formats?
When reading, OpenCV determines the image format from the file's content (its signature bytes) rather than the extension; when writing, the format is chosen from the output file's extension. In both cases the appropriate codec is selected automatically. Supported formats include JPEG, PNG, BMP, TIFF, WebP and many others.
Can OpenCV be used for real‑time video processing?
Yes, OpenCV is optimized for real‑time video processing. To achieve high performance, hardware acceleration (GPU) and algorithmic optimizations are recommended.
How does OpenCV work with color spaces?
OpenCV uses BGR as its default color order, which differs from the standard RGB used by most other libraries. When interfacing with other libraries, explicit color‑space conversion is required.
Does OpenCV support machine learning?
Yes, OpenCV includes a machine‑learning module with classic algorithms (SVM, k‑NN, Decision Trees) and a DNN module for deep neural networks.
How to ensure cross‑platform compatibility of OpenCV applications?
OpenCV runs on all major operating systems. To maintain portability, use only standard OpenCV APIs and avoid platform‑specific code.
Conclusion
OpenCV is a powerful and versatile library for tackling computer‑vision challenges. Its extensive functionality, active community, and continuous development make it an indispensable tool for researchers and developers. From simple image manipulation to sophisticated object‑recognition systems, OpenCV provides all the building blocks needed to create modern computer‑vision applications.
Thanks to its modular architecture, support for multiple programming languages and platforms, and seamless integration with popular machine‑learning libraries, OpenCV remains the industry benchmark. Whether you are working on academic research or a commercial project, OpenCV offers a reliable foundation for turning your computer‑vision ideas into reality.