OpenCV - computer vision

History and development of OpenCV

The library was created in 1999 by Intel Research scientists under the leadership of Gary Bradski. Initially the project was intended to accelerate research in computer vision and provide common tools for developers. The first public release appeared in 2000, and in 2008 Willow Garage took over primary support for the project.

Key development milestones:

  • 1999‑2005: Development at Intel Research, focus on performance
  • 2006‑2012: Transition to an open development model, creation of OpenCV 2.x
  • 2013‑2015: Release of OpenCV 3.x with major architectural changes
  • 2018‑present: OpenCV 4.x with enhanced deep‑learning support

Today OpenCV is maintained by the OpenCV Foundation and an active community of developers worldwide.

Architecture and modules of OpenCV

OpenCV is built on a modular principle, which provides flexibility and the ability to use only the required components:

Main modules:

  • Core – basic data structures and algorithms
  • Imgproc – image processing and filtering
  • Imgcodecs – image encoding and decoding
  • Videoio – video and camera handling
  • Highgui – user interface
  • Features2d – feature detectors and descriptors
  • Calib3d – camera calibration and 3‑D reconstruction
  • Objdetect – object detection
  • DNN – deep‑learning module
  • ML – classic machine‑learning algorithms

System requirements and supported platforms

OpenCV supports a wide range of operating systems and architectures:

Operating systems:

  • Windows (7, 8, 10, 11)
  • Linux (Ubuntu, CentOS, Debian and other distributions)
  • macOS
  • Android
  • iOS

Programming languages:

  • C++
  • Python
  • Java
  • C# (via third‑party wrappers such as Emgu CV)
  • JavaScript (OpenCV.js)

Hardware accelerators:

  • CUDA (NVIDIA GPU)
  • OpenCL
  • Intel TBB
  • Intel IPP

Installation and configuration of OpenCV

Installation for Python

Basic installation:

pip install opencv-python

Installation with extra modules:

pip install opencv-contrib-python

For servers and containers without a display, use the GUI‑free build instead:

pip install opencv-python-headless  # no highgui windowing support

Verification of installation

import cv2
print(cv2.__version__)
print(cv2.getBuildInformation())

IDE setup

For efficient work with OpenCV it is also recommended to install:

pip install numpy matplotlib jupyter

Core data structures of OpenCV

Mat – the primary class for image handling

In OpenCV images are represented as multi‑dimensional arrays of type Mat (in C++) or numpy.ndarray (in Python). Each pixel can contain 1 to 4 channels (e.g. grayscale, BGR, BGRA); note that OpenCV stores color channels in BGR order by default.

Fundamental data types:

  • CV_8U – 8‑bit unsigned integers (0‑255)
  • CV_8S – 8‑bit signed integers
  • CV_16U – 16‑bit unsigned integers
  • CV_16S – 16‑bit signed integers
  • CV_32S – 32‑bit signed integers
  • CV_32F – 32‑bit floating‑point numbers
  • CV_64F – 64‑bit floating‑point numbers
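In Python these depth constants correspond directly to NumPy dtypes, so creating an "empty Mat" is just creating an ndarray. A minimal sketch:

```python
import numpy as np

# The Mat depth constants map onto NumPy dtypes in Python:
#   CV_8U  -> np.uint8      CV_16U -> np.uint16
#   CV_32F -> np.float32    CV_64F -> np.float64
gray = np.zeros((480, 640), dtype=np.uint8)       # single-channel (CV_8UC1) image
color = np.zeros((480, 640, 3), dtype=np.uint8)   # 3-channel (CV_8UC3) image

print(color.shape)   # (height, width, channels)
print(color.dtype)   # uint8
```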

Working with images

Loading and saving images

import cv2

# Load an image
image = cv2.imread('image.jpg')
gray_image = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Display the image
cv2.imshow('Original', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

# Save the image
cv2.imwrite('output.jpg', image)
cv2.imwrite('output.png', image, [cv2.IMWRITE_PNG_COMPRESSION, 9])

Handling various file formats

OpenCV supports many formats:

  • Raster: JPEG, PNG, BMP, TIFF, WebP
  • Specialized: OpenEXR, JPEG 2000, PFM

Vector formats such as SVG are not supported by cv2.imread; rasterize them with another tool first.

Color spaces and conversions

Main color spaces:

  • BGR – default OpenCV channel order (Blue, Green, Red)
  • RGB – standard order in most other libraries
  • HSV – Hue, Saturation, Value (convenient for color‑based segmentation)
  • LAB – perceptually uniform CIE Lab color space
  • YUV – used in video coding
  • XYZ – CIE 1931 color space

Conversion examples:

# Convert BGR to various color spaces
rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lab_image = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Channel operations
b, g, r = cv2.split(image)
merged = cv2.merge([b, g, r])

Geometric transformations

Resizing and scaling

# Resize while preserving aspect ratio
def resize_with_aspect_ratio(image, width=None, height=None):
    h, w = image.shape[:2]
    if width is None and height is None:
        return image
    if width is None:
        ratio = height / h
        width = int(w * ratio)
    else:
        ratio = width / w
        height = int(h * ratio)
    return cv2.resize(image, (width, height))

# Different interpolation methods
resized_nearest = cv2.resize(image, (300, 300), interpolation=cv2.INTER_NEAREST)
resized_linear = cv2.resize(image, (300, 300), interpolation=cv2.INTER_LINEAR)
resized_cubic = cv2.resize(image, (300, 300), interpolation=cv2.INTER_CUBIC)

Rotation and affine transforms

# Rotate an image
def rotate_image(image, angle, center=None, scale=1.0):
    h, w = image.shape[:2]
    if center is None:
        center = (w // 2, h // 2)
    
    rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale)
    rotated = cv2.warpAffine(image, rotation_matrix, (w, h))
    return rotated

# Affine transformation (maps three source points to three destination points)
import numpy as np

h, w = image.shape[:2]
src_points = np.float32([[0, 0], [w, 0], [0, h]])
dst_points = np.float32([[0, 0], [w, 0], [100, h]])
affine_matrix = cv2.getAffineTransform(src_points, dst_points)
warped = cv2.warpAffine(image, affine_matrix, (w, h))

Filtering and image processing

Linear filters

# Various blur types
gaussian_blur = cv2.GaussianBlur(image, (15, 15), 0)
box_blur = cv2.blur(image, (15, 15))
median_blur = cv2.medianBlur(image, 15)

# Bilateral filter (preserves edges)
bilateral = cv2.bilateralFilter(image, 9, 75, 75)

# Custom kernels
kernel_sharpen = np.array([[-1, -1, -1],
                          [-1, 9, -1],
                          [-1, -1, -1]])
sharpened = cv2.filter2D(image, -1, kernel_sharpen)

Edge detectors

# Canny edge detector
edges = cv2.Canny(gray_image, 50, 150, apertureSize=3)

# Sobel operator
sobel_x = cv2.Sobel(gray_image, cv2.CV_64F, 1, 0, ksize=3)
sobel_y = cv2.Sobel(gray_image, cv2.CV_64F, 0, 1, ksize=3)
sobel_combined = cv2.magnitude(sobel_x, sobel_y)

# Laplacian operator
laplacian = cv2.Laplacian(gray_image, cv2.CV_64F)

Morphological operations

Basic operations

# Create structuring element
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
kernel_ellipse = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

# Core morphological ops
eroded = cv2.erode(binary_image, kernel, iterations=1)
dilated = cv2.dilate(binary_image, kernel, iterations=1)
opened = cv2.morphologyEx(binary_image, cv2.MORPH_OPEN, kernel)
closed = cv2.morphologyEx(binary_image, cv2.MORPH_CLOSE, kernel)
gradient = cv2.morphologyEx(binary_image, cv2.MORPH_GRADIENT, kernel)

Working with contours

Finding and analyzing contours

# Find contours
contours, hierarchy = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Analyze each contour
for contour in contours:
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, True)
    
    # Approximate the contour
    epsilon = 0.02 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    
    # Bounding rectangle
    x, y, w, h = cv2.boundingRect(contour)
    
    # Minimum enclosing circle (new names avoid shadowing the boundingRect results)
    (cx, cy), radius = cv2.minEnclosingCircle(contour)
    
    # Minimum area rectangle
    rect = cv2.minAreaRect(contour)
    box = cv2.boxPoints(rect)
    box = box.astype(np.int32)  # np.int0 was removed in NumPy 2.0

Working with video

Capturing video from a camera

cap = cv2.VideoCapture(0)  # 0 – first camera

# Camera settings
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 30)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Process the frame
    processed_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    
    cv2.imshow('Video', processed_frame)
    
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Working with video files

# Read a video file
cap = cv2.VideoCapture('video.mp4')

# Retrieve video information
fps = cap.get(cv2.CAP_PROP_FPS)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
duration = frame_count / fps

# Write video (the frame size must match the frames passed to write())
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('output.avi', fourcc, fps, (width, height))

while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Process the frame
    processed_frame = cv2.flip(frame, 1)
    out.write(processed_frame)

cap.release()
out.release()

Object detection and recognition

Haar cascades

# Load cascades
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')

# Detect faces
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x+w, y+h), (255, 0, 0), 2)
    
    # Detect eyes inside the face region
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = image[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)

Feature detectors

# SIFT detector
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)

# ORB detector
orb = cv2.ORB_create()
keypoints, descriptors = orb.detectAndCompute(gray, None)

# Visualize keypoints
img_with_keypoints = cv2.drawKeypoints(image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

Deep‑learning module (DNN)

Loading and using pretrained models

# Load a model (pick the loader that matches your framework)
net = cv2.dnn.readNetFromONNX('model.onnx')
# net = cv2.dnn.readNetFromTensorflow('model.pb')
# net = cv2.dnn.readNetFromCaffe('deploy.prototxt', 'model.caffemodel')

# Prepare input blob
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(224, 224), mean=(104, 117, 123))

# Forward pass
net.setInput(blob)
outputs = net.forward()

# Process results (SSD-style detectors return a (1, 1, N, 7) array,
# where each row is [batch_id, class_id, confidence, x1, y1, x2, y2])
for detection in outputs[0, 0]:
    confidence = detection[2]
    if confidence > 0.5:
        # Handle detection
        pass

Camera calibration and 3‑D reconstruction

Camera calibration

# Prepare calibration points
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
objp = np.zeros((6*7, 3), np.float32)
objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)

objpoints = []  # 3‑D points in real world
imgpoints = []  # 2‑D points in image

# Find chessboard corners
ret, corners = cv2.findChessboardCorners(gray, (7, 6), None)

if ret:
    objpoints.append(objp)
    corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    imgpoints.append(corners2)

# Calibrate camera
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

# Undistort image
undistorted = cv2.undistort(image, mtx, dist, None, mtx)

Object tracking

Optical flow

# Parameters for corner detection
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)

# Parameters for Lucas‑Kanade optical flow
lk_params = dict(winSize=(15, 15), maxLevel=2, criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Detect features to track
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

# Compute optical flow
p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)

OpenCV trackers

# Available tracker types (the legacy trackers live in cv2.legacy and require
# opencv-contrib-python; GOTURN additionally needs its pretrained model files)
tracker_types = ['BOOSTING', 'MIL', 'KCF', 'TLD', 'MEDIANFLOW', 'GOTURN', 'MOSSE', 'CSRT']

def create_tracker(tracker_type):
    if tracker_type == 'BOOSTING':
        tracker = cv2.legacy.TrackerBoosting_create()
    elif tracker_type == 'MIL':
        tracker = cv2.legacy.TrackerMIL_create()
    elif tracker_type == 'KCF':
        tracker = cv2.legacy.TrackerKCF_create()
    elif tracker_type == 'TLD':
        tracker = cv2.legacy.TrackerTLD_create()
    elif tracker_type == 'MEDIANFLOW':
        tracker = cv2.legacy.TrackerMedianFlow_create()
    elif tracker_type == 'GOTURN':
        tracker = cv2.TrackerGOTURN_create()
    elif tracker_type == 'MOSSE':
        tracker = cv2.legacy.TrackerMOSSE_create()
    elif tracker_type == 'CSRT':
        tracker = cv2.TrackerCSRT_create()
    else:
        raise ValueError(f"Unknown tracker type: {tracker_type}")
    return tracker

# Initialise tracker
tracker = create_tracker('CSRT')
ok = tracker.init(frame, bbox)

# Update tracker
ok, bbox = tracker.update(frame)

Complete table of OpenCV methods and functions

Category Function/Method Description Usage example
Image I/O cv2.imread() Load an image img = cv2.imread('image.jpg')
  cv2.imshow() Display an image cv2.imshow('Image', img)
  cv2.imwrite() Save an image cv2.imwrite('output.jpg', img)
  cv2.waitKey() Wait for a key press cv2.waitKey(0)
  cv2.destroyAllWindows() Close all windows cv2.destroyAllWindows()
Video handling cv2.VideoCapture() Capture video cap = cv2.VideoCapture(0)
  cv2.VideoWriter() Write video out = cv2.VideoWriter('out.avi', fourcc, fps, size)
  cap.read() Read a frame ret, frame = cap.read()
  cap.release() Release resources cap.release()
Color conversion cv2.cvtColor() Convert color space gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  cv2.split() Split channels b, g, r = cv2.split(img)
  cv2.merge() Merge channels merged = cv2.merge([b, g, r])
  cv2.inRange() Create a mask based on color range mask = cv2.inRange(hsv, lower, upper)
Geometric transforms cv2.resize() Resize image resized = cv2.resize(img, (300, 300))
  cv2.rotate() Rotate by 90°/180°/270° rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
  cv2.flip() Flip image flipped = cv2.flip(img, 1)
  cv2.warpAffine() Affine warp warped = cv2.warpAffine(img, M, (w, h))
  cv2.warpPerspective() Perspective warp warped = cv2.warpPerspective(img, M, (w, h))
  cv2.getRotationMatrix2D() Rotation matrix M = cv2.getRotationMatrix2D(center, angle, scale)
  cv2.getAffineTransform() Affine matrix M = cv2.getAffineTransform(src, dst)
  cv2.getPerspectiveTransform() Perspective matrix M = cv2.getPerspectiveTransform(src, dst)
Filtering cv2.blur() Simple blur blurred = cv2.blur(img, (5, 5))
  cv2.GaussianBlur() Gaussian blur blurred = cv2.GaussianBlur(img, (5, 5), 0)
  cv2.medianBlur() Median blur blurred = cv2.medianBlur(img, 5)
  cv2.bilateralFilter() Bilateral filter filtered = cv2.bilateralFilter(img, 9, 75, 75)
  cv2.filter2D() Convolution with a kernel filtered = cv2.filter2D(img, -1, kernel)
Edge detection cv2.Canny() Canny edge detector edges = cv2.Canny(gray, 50, 150)
  cv2.Sobel() Sobel operator sobel = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
  cv2.Laplacian() Laplacian operator laplacian = cv2.Laplacian(gray, cv2.CV_64F)
  cv2.Scharr() Scharr operator scharr = cv2.Scharr(gray, cv2.CV_64F, 1, 0)
Thresholding cv2.threshold() Simple binary threshold ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
  cv2.adaptiveThreshold() Adaptive binary threshold thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2)
Morphology cv2.erode() Erosion eroded = cv2.erode(img, kernel, iterations=1)
  cv2.dilate() Dilation dilated = cv2.dilate(img, kernel, iterations=1)
  cv2.morphologyEx() Combined morphological ops opened = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
  cv2.getStructuringElement() Create structuring element kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
Contours cv2.findContours() Find contours contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
  cv2.drawContours() Draw contours cv2.drawContours(img, contours, -1, (0, 255, 0), 2)
  cv2.contourArea() Contour area area = cv2.contourArea(contour)
  cv2.arcLength() Contour perimeter perimeter = cv2.arcLength(contour, True)
  cv2.approxPolyDP() Contour approximation approx = cv2.approxPolyDP(contour, epsilon, True)
  cv2.boundingRect() Bounding rectangle x, y, w, h = cv2.boundingRect(contour)
  cv2.minAreaRect() Minimum area rectangle rect = cv2.minAreaRect(contour)
  cv2.minEnclosingCircle() Minimum enclosing circle (x, y), radius = cv2.minEnclosingCircle(contour)
  cv2.fitEllipse() Ellipse fitting ellipse = cv2.fitEllipse(contour)
  cv2.convexHull() Convex hull hull = cv2.convexHull(contour)
Drawing cv2.line() Line cv2.line(img, pt1, pt2, color, thickness)
  cv2.rectangle() Rectangle cv2.rectangle(img, pt1, pt2, color, thickness)
  cv2.circle() Circle cv2.circle(img, center, radius, color, thickness)
  cv2.ellipse() Ellipse cv2.ellipse(img, center, axes, angle, startAngle, endAngle, color, thickness)
  cv2.polylines() Polylines cv2.polylines(img, [pts], isClosed, color, thickness)
  cv2.fillPoly() Fill polygon cv2.fillPoly(img, [pts], color)
  cv2.putText() Text cv2.putText(img, text, org, font, fontScale, color, thickness)
Object detection cv2.CascadeClassifier() Cascade classifier face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
  detectMultiScale() Detect objects faces = face_cascade.detectMultiScale(gray, 1.1, 5)
  cv2.HOGDescriptor() HOG descriptor hog = cv2.HOGDescriptor()
  cv2.HOGDescriptor_getDefaultPeopleDetector() People detector hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
Feature detection cv2.SIFT_create() SIFT detector sift = cv2.SIFT_create()
  cv2.ORB_create() ORB detector orb = cv2.ORB_create()
  cv2.AKAZE_create() AKAZE detector akaze = cv2.AKAZE_create()
  cv2.BRISK_create() BRISK detector brisk = cv2.BRISK_create()
  detectAndCompute() Detect and compute descriptors kp, des = sift.detectAndCompute(gray, None)
  cv2.drawKeypoints() Draw keypoints img_kp = cv2.drawKeypoints(img, kp, None)
  cv2.goodFeaturesToTrack() Corner detection corners = cv2.goodFeaturesToTrack(gray, maxCorners, qualityLevel, minDistance)
Feature matching cv2.BFMatcher() Brute‑Force matcher bf = cv2.BFMatcher()
  cv2.FlannBasedMatcher() FLANN matcher flann = cv2.FlannBasedMatcher()
  match() Match descriptors matches = bf.match(des1, des2)
  knnMatch() k‑NN matching matches = bf.knnMatch(des1, des2, k=2)
  cv2.drawMatches() Draw matches img_matches = cv2.drawMatches(img1, kp1, img2, kp2, matches, None)
Geometric transforms for matching cv2.findHomography() Find homography H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
  cv2.findFundamentalMat() Fundamental matrix F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_LMEDS)
  cv2.findEssentialMat() Essential matrix E, mask = cv2.findEssentialMat(pts1, pts2, focal, pp, cv2.RANSAC, 0.999, 1.0)
Tracking cv2.calcOpticalFlowPyrLK() Lucas‑Kanade optical flow p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)
  cv2.calcOpticalFlowFarneback() Farneback optical flow flow = cv2.calcOpticalFlowFarneback(old_gray, frame_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
  cv2.TrackerCSRT_create() CSRT tracker tracker = cv2.TrackerCSRT_create()
  cv2.TrackerKCF_create() KCF tracker tracker = cv2.TrackerKCF_create()
Camera calibration cv2.findChessboardCorners() Find chessboard corners ret, corners = cv2.findChessboardCorners(gray, (9, 6), None)
  cv2.cornerSubPix() Refine corner locations corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
  cv2.drawChessboardCorners() Draw detected corners cv2.drawChessboardCorners(img, (9, 6), corners2, ret)
  cv2.calibrateCamera() Camera calibration ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
  cv2.undistort() Undistort image dst = cv2.undistort(img, mtx, dist, None, newcameramtx)
  cv2.getOptimalNewCameraMatrix() Optimal new camera matrix newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))
Stereo vision cv2.stereoRectify() Stereo rectification R1, R2, P1, P2, Q, validPixROI1, validPixROI2 = cv2.stereoRectify(mtx1, dist1, mtx2, dist2, imgsize, R, T)
  cv2.initUndistortRectifyMap() Maps for undistortion map1x, map1y = cv2.initUndistortRectifyMap(mtx1, dist1, R1, P1, imgsize, cv2.CV_16SC2)
  cv2.remap() Apply maps img_rectified = cv2.remap(img, map1x, map1y, cv2.INTER_LINEAR)
  cv2.StereoBM_create() Block Matching algorithm stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
  cv2.StereoSGBM_create() Semi‑Global Block Matching stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=16, blockSize=3)
Deep learning (DNN) cv2.dnn.readNetFromTensorflow() Load TensorFlow model net = cv2.dnn.readNetFromTensorflow('model.pb')
  cv2.dnn.readNetFromCaffe() Load Caffe model net = cv2.dnn.readNetFromCaffe('deploy.prototxt', 'model.caffemodel')
  cv2.dnn.readNetFromONNX() Load ONNX model net = cv2.dnn.readNetFromONNX('model.onnx')
  cv2.dnn.readNetFromDarknet() Load Darknet model net = cv2.dnn.readNetFromDarknet('yolo.cfg', 'yolo.weights')
  cv2.dnn.blobFromImage() Prepare input blob blob = cv2.dnn.blobFromImage(image, scalefactor, size, mean)
  cv2.dnn.blobFromImages() Prepare batch of images blob = cv2.dnn.blobFromImages(images, scalefactor, size, mean)
  net.setInput() Set input data net.setInput(blob)
  net.forward() Forward pass outputs = net.forward()
  net.getLayerNames() Get layer names layer_names = net.getLayerNames()
  net.getUnconnectedOutLayers() Get output layers output_layers = net.getUnconnectedOutLayers()
Machine learning cv2.ml.KNearest_create() k‑NN classifier knn = cv2.ml.KNearest_create()
  cv2.ml.SVM_create() SVM classifier svm = cv2.ml.SVM_create()
  cv2.ml.RTrees_create() Random Forest rtrees = cv2.ml.RTrees_create()
  cv2.ml.NormalBayesClassifier_create() Naïve Bayes classifier bayes = cv2.ml.NormalBayesClassifier_create()
  cv2.ml.LogisticRegression_create() Logistic regression lr = cv2.ml.LogisticRegression_create()
  cv2.ml.ANN_MLP_create() Multilayer perceptron ann = cv2.ml.ANN_MLP_create()
Utilities cv2.getTickCount() Get tick counter t1 = cv2.getTickCount()
  cv2.getTickFrequency() Tick frequency freq = cv2.getTickFrequency()
  cv2.norm() Compute norm norm = cv2.norm(img1, img2, cv2.NORM_L2)
  cv2.normalize() Normalization normalized = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX)
  cv2.minMaxLoc() Find min and max values minVal, maxVal, minLoc, maxLoc = cv2.minMaxLoc(gray)
  cv2.meanStdDev() Mean and standard deviation mean, stddev = cv2.meanStdDev(img)
  cv2.bitwise_and() Bitwise AND result = cv2.bitwise_and(img1, img2)
  cv2.bitwise_or() Bitwise OR result = cv2.bitwise_or(img1, img2)
  cv2.bitwise_xor() Bitwise XOR result = cv2.bitwise_xor(img1, img2)
  cv2.bitwise_not() Bitwise NOT result = cv2.bitwise_not(img)
  cv2.add() Add images result = cv2.add(img1, img2)
  cv2.subtract() Subtract images result = cv2.subtract(img1, img2)
  cv2.multiply() Multiply images result = cv2.multiply(img1, img2)
  cv2.divide() Divide images result = cv2.divide(img1, img2)
  cv2.absdiff() Absolute difference diff = cv2.absdiff(img1, img2)
  cv2.addWeighted() Weighted addition result = cv2.addWeighted(img1, alpha, img2, beta, gamma)

Integration with other libraries

Collaboration with NumPy

import numpy as np
import cv2

# Create an image using NumPy
img_numpy = np.zeros((300, 300, 3), dtype=np.uint8)
img_numpy[:, :, 2] = 255  # Red channel

# Mathematical operations on images
img_float = img.astype(np.float32) / 255.0
enhanced = np.clip(img_float * 1.5, 0, 1) * 255
enhanced = enhanced.astype(np.uint8)

Working with Matplotlib

import matplotlib.pyplot as plt

# Proper display of OpenCV images in Matplotlib
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)
plt.axis('off')
plt.title('OpenCV Image in Matplotlib')
plt.show()

# Create a multi‑panel layout (img, gray and edges come from the earlier steps)
blurred_rgb = cv2.cvtColor(cv2.GaussianBlur(img, (15, 15), 0), cv2.COLOR_BGR2RGB)
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes[0, 0].imshow(img_rgb)
axes[0, 0].set_title('Original')
axes[0, 1].imshow(gray, cmap='gray')
axes[0, 1].set_title('Grayscale')
axes[1, 0].imshow(edges, cmap='gray')
axes[1, 0].set_title('Edges')
axes[1, 1].imshow(blurred_rgb)
axes[1, 1].set_title('Blurred')
plt.tight_layout()
plt.show()

Integration with TensorFlow and PyTorch

import tensorflow as tf
import torch

# Prepare data for TensorFlow
def preprocess_for_tensorflow(image):
    image = cv2.resize(image, (224, 224))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = image.astype(np.float32) / 255.0
    image = np.expand_dims(image, axis=0)
    return image

# Prepare data for PyTorch
def preprocess_for_pytorch(image):
    image = cv2.resize(image, (224, 224))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = image.astype(np.float32) / 255.0
    image = np.transpose(image, (2, 0, 1))
    image = np.expand_dims(image, axis=0)
    return torch.tensor(image)

Performance optimization

Using multithreading

import threading
import concurrent.futures

def process_frame(frame):
    # Process a single frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    return edges

# Process video with a thread pool
def process_video_multithreaded(video_path):
    cap = cv2.VideoCapture(video_path)
    frames = []
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(frame)
    
    cap.release()
    
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        processed_frames = list(executor.map(process_frame, frames))
    
    return processed_frames

GPU optimization

# Use CUDA if OpenCV was built with CUDA support
if cv2.cuda.getCudaEnabledDeviceCount() > 0:
    print("CUDA devices available:", cv2.cuda.getCudaEnabledDeviceCount())
    
    # Upload image to GPU
    gpu_img = cv2.cuda.GpuMat()
    gpu_img.upload(image)
    
    # Process on GPU (CUDA filters are created once, then applied)
    gpu_gray = cv2.cuda.cvtColor(gpu_img, cv2.COLOR_BGR2GRAY)
    gauss = cv2.cuda.createGaussianFilter(cv2.CV_8UC1, cv2.CV_8UC1, (15, 15), 0)
    gpu_blur = gauss.apply(gpu_gray)
    
    # Download result back to CPU
    result = gpu_blur.download()

Error handling and debugging

Common errors and solutions

# Safe image loading
def safe_imread(path):
    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(f"Failed to load image: {path}")
    return img

# Verify camera operation
def check_camera(camera_id=0):
    cap = cv2.VideoCapture(camera_id)
    if not cap.isOpened():
        raise RuntimeError(f"Unable to open camera {camera_id}")
    
    ret, frame = cap.read()
    if not ret:
        cap.release()
        raise RuntimeError("Failed to capture a frame from the camera")
    
    cap.release()
    return True

# Validate image dimensions
def validate_image_size(img, min_size=(100, 100)):
    h, w = img.shape[:2]
    if h < min_size[0] or w < min_size[1]:
        raise ValueError(f"Image size {w}x{h} is too small. Minimum size: {min_size}")

Practical applications of OpenCV

Video surveillance system

class MotionDetector:
    def __init__(self, threshold=25, min_area=500):
        self.threshold = threshold
        self.min_area = min_area
        self.background_subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
    
    def detect_motion(self, frame):
        # Apply background subtraction
        fg_mask = self.background_subtractor.apply(frame)
        
        # Morphological cleanup
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
        fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
        
        # Find contours
        contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        
        motion_detected = False
        for contour in contours:
            if cv2.contourArea(contour) > self.min_area:
                motion_detected = True
                x, y, w, h = cv2.boundingRect(contour)
                cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
        
        return frame, motion_detected

Optical character recognition (OCR)

# Prepare image for OCR
def preprocess_for_ocr(image):
    # Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    
    # Adaptive binarization
    thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
    
    # Morphological cleanup
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    cleaned = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
    
    return cleaned

# Find text regions
def find_text_regions(image):
    # Use MSER to detect text blobs
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(image)
    
    # Filter regions by size
    hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]
    
    return hulls

Medical image analysis

def analyze_medical_image(image):
    # Enhance contrast with CLAHE
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(image)
    
    # Segmentation using watershed
    # Prepare markers
    ret, thresh = cv2.threshold(enhanced, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    
    # Remove noise
    kernel = np.ones((3, 3), np.uint8)
    opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)
    
    # Identify background
    sure_bg = cv2.dilate(opening, kernel, iterations=3)
    
    # Find foreground regions
    dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
    ret, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)
    
    # Unknown region
    sure_fg = np.uint8(sure_fg)
    unknown = cv2.subtract(sure_bg, sure_fg)
    
    return enhanced, sure_fg, unknown

Optimization tips and best practices

Choosing the right data type

# Use uint8 for most operations
img_uint8 = img.astype(np.uint8)

# Use float32 for arithmetic
img_float32 = img.astype(np.float32) / 255.0

# Use float64 for high‑precision calculations
img_float64 = img.astype(np.float64)

Memory‑usage optimization

# Pre‑allocate memory for video processing
def process_video_optimized(video_path, output_path):
    cap = cv2.VideoCapture(video_path)
    
    # Retrieve video parameters
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    
    # Pre‑allocate a buffer for processed frames
    processed_frame = np.zeros((height, width, 3), dtype=np.uint8)
    
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        
        # Process frame using pre‑allocated buffer
        cv2.GaussianBlur(frame, (15, 15), 0, processed_frame)
        out.write(processed_frame)
    
    cap.release()
    out.release()

Performance profiling

import time

def profile_function(func, *args, **kwargs):
    start_time = cv2.getTickCount()
    result = func(*args, **kwargs)
    end_time = cv2.getTickCount()
    
    execution_time = (end_time - start_time) / cv2.getTickFrequency()
    print(f"Function {func.__name__} executed in {execution_time:.4f} seconds")
    
    return result

# Example usage
processed_image = profile_function(cv2.GaussianBlur, image, (15, 15), 0)

Frequently asked questions

How does OpenCV handle different image formats?

OpenCV automatically determines the image format from the file extension and uses the appropriate codec. Supported formats include JPEG, PNG, BMP, TIFF, WebP and many others.

Can OpenCV be used for real‑time video processing?

Yes, OpenCV is optimized for real‑time video processing. To achieve high performance, hardware acceleration (GPU) and algorithmic optimizations are recommended.

How does OpenCV work with color spaces?

OpenCV uses BGR as its default color order, which differs from the standard RGB used by most other libraries. When interfacing with other libraries, explicit color‑space conversion is required.

Does OpenCV support machine learning?

Yes, OpenCV includes a machine‑learning module with classic algorithms (SVM, k‑NN, Decision Trees) and a DNN module for deep neural networks.

How to ensure cross‑platform compatibility of OpenCV applications?

OpenCV runs on all major operating systems. To maintain portability, use only standard OpenCV APIs and avoid platform‑specific code.

Conclusion

OpenCV is a powerful and versatile library for tackling computer‑vision challenges. Its extensive functionality, active community, and continuous development make it an indispensable tool for researchers and developers. From simple image manipulation to sophisticated object‑recognition systems, OpenCV provides all the building blocks needed to create modern computer‑vision applications.

Thanks to its modular architecture, support for multiple programming languages and platforms, and seamless integration with popular machine‑learning libraries, OpenCV remains the industry benchmark. Whether you are working on academic research or a commercial project, OpenCV offers a reliable foundation for turning your computer‑vision ideas into reality.
