SikuliX - GUI automation with image recognition

SikuliX in Python: Complete Guide to Visual Interface Automation

Introduction to SikuliX

Automating interfaces, especially when no API is available, requires unconventional approaches. Visual recognition of elements is one of the most effective methods for solving such tasks. SikuliX is a unique automation tool that allows controlling GUI applications based on visual templates (screenshots of interface elements) rather than DOM structure or available APIs.

SikuliX was originally developed in Java, but it can be successfully used with Python via Jython or external calls through subprocess. The library provides powerful visual search, click, wait, and interaction capabilities with any interface elements represented as images.

What Is SikuliX and How It Works

SikuliX is an open‑source automation library that uses image‑recognition technology to interact with graphical user‑interface elements. Unlike traditional automation tools that rely on element selectors or APIs, SikuliX operates at the pixel level on the screen.

Key Features of SikuliX:

  • Visual recognition: uses OpenCV to locate images on the screen
  • Cross‑platform: works on Windows, macOS, and Linux
  • Technology-agnostic: can automate anything visible on the screen, including desktop applications, web pages in a browser, and mobile apps shown in an emulator
  • Ease of use: intuitive API for rapid learning

Installation and Configuration of SikuliX

System Requirements

To work with SikuliX you need:

  • Java 8 or newer
  • Python for integration: Jython (Python 2.7 syntax) inside SikuliX, or CPython 3.x when calling SikuliX through subprocess
  • Sufficient RAM for image processing (at least 2 GB is recommended)

Installation Process

SikuliX is distributed as a Java application and cannot be installed via pip. To install:

  1. Download the latest sikulixide.jar from the official site sikulix.com
  2. Ensure Java is correctly installed
  3. Configure Python integration (via Jython or subprocess)

Running SikuliX from Python

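import subprocess

# Assumed jar name; use the file you actually downloaded
SIKULIX_JAR = "sikulixide-2.0.5.jar"

# Launch the SikuliX IDE without blocking this script
subprocess.Popen(["java", "-jar", SIKULIX_JAR])

# Run an existing script and wait for it to finish
subprocess.run(["java", "-jar", SIKULIX_JAR, "-r", "myscript.sikuli"])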
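Before launching anything, it can help to assemble the command in one place and confirm that Java is reachable. The sketch below is only an illustration: `build_sikulix_command` and `java_available` are hypothetical helper names, and the jar file name and the `-r` option are taken from the example above.

```python
import shutil


def build_sikulix_command(jar_path, script_path=None, java="java"):
    """Return the argument list for launching SikuliX (hypothetical helper)."""
    command = [java, "-jar", jar_path]
    if script_path is not None:
        # "-r" is the run-script option used in the example above
        command += ["-r", script_path]
    return command


def java_available(java="java"):
    """Return True if a Java launcher can be found on PATH."""
    return shutil.which(java) is not None


command = build_sikulix_command("sikulixide-2.0.5.jar", "myscript.sikuli")
```

The resulting list can be passed directly to `subprocess.run`, and `java_available()` gives an early, readable failure when Java is missing.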

Architecture and Core Concepts

SikuliX Object Model

SikuliX is built around several key concepts:

Screen — represents the entire display or a specific monitor. It is the primary object for interacting with the interface.

Region — a rectangular area of the screen where search and interaction operations are performed. Using regions significantly speeds up scripts.

Pattern — an image object with additional parameters such as match similarity and click offset.

Match — the result of an image search, containing coordinates and a confidence score.

Recognition Process

SikuliX uses computer‑vision algorithms to match image patterns against screen content. The process includes:

  1. Capturing a screenshot of the screen or a region
  2. Comparing the pattern with various parts of the screenshot
  3. Calculating a similarity score
  4. Returning the coordinates of the best match
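The four steps above can be illustrated with a toy matcher in plain Python. This is not SikuliX's actual algorithm (SikuliX relies on OpenCV); it is a minimal sketch that assumes greyscale images stored as 2D lists and uses the mean absolute pixel difference as the similarity measure. The function name `best_match` is made up for this example.

```python
def best_match(screen, template):
    """Slide `template` over `screen` (2D lists of 0-255 grey values)
    and return (x, y, score) for the most similar position."""
    screen_h, screen_w = len(screen), len(screen[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best = (0, 0, -1.0)
    for y in range(screen_h - tmpl_h + 1):
        for x in range(screen_w - tmpl_w + 1):
            # Sum of absolute differences over the overlapping pixels
            diff = sum(
                abs(screen[y + dy][x + dx] - template[dy][dx])
                for dy in range(tmpl_h)
                for dx in range(tmpl_w)
            )
            # 1.0 means identical pixels, 0.0 means maximally different
            score = 1.0 - diff / (255.0 * tmpl_h * tmpl_w)
            if score > best[2]:
                best = (x, y, score)
    return best


screen = [
    [0, 0, 0, 0],
    [0, 0, 200, 50],
    [0, 0, 50, 200],
]
template = [
    [200, 50],
    [50, 200],
]
x, y, score = best_match(screen, template)  # exact copy sits at x=2, y=1
```

A minimum-similarity threshold, like the one set on a Pattern, would simply reject the result when `score` falls below it.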

Integration with Python

Using Jython

Jython runs Python code (Python 2.7 syntax, so no f-strings) directly on the JVM, providing direct access to SikuliX's Java libraries:

from sikuli import *

# Basic operations
click("login_button.png")
wait("loading_indicator.png", 10)
type("username")

Using subprocess (Python 3)

A more universal approach for Python 3:

import subprocess
import tempfile
import os

def run_sikuli_script(script_content):
    # Create a temporary Sikuli script
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write(script_content)
        script_path = f.name
    
    try:
        # Assumed jar name; adjust to the SikuliX jar you downloaded
        result = subprocess.run([
            "java", "-jar", "sikulixide-2.0.5.jar",
            "-r", script_path
        ], capture_output=True, text=True)
        return result.stdout, result.stderr
    finally:
        os.unlink(script_path)
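The script text passed to such a function has to be assembled somewhere. One possible approach, sketched here with a hypothetical `make_script` helper, is to describe each step as a function name plus arguments and let `repr` take care of quoting:

```python
def make_script(steps):
    """Turn (function_name, arguments) pairs into script source text.

    Hypothetical helper: it only formats text and knows nothing
    about which functions SikuliX actually provides.
    """
    lines = []
    for func, args in steps:
        # repr() quotes strings and leaves numbers untouched
        lines.append("%s(%s)" % (func, ", ".join(repr(a) for a in args)))
    return "\n".join(lines) + "\n"


script = make_script([
    ("click", ["login_button.png"]),
    ("wait", ["dashboard.png", 10]),
])
```

The returned string can then be handed to `run_sikuli_script`. Image paths inside the generated script must be resolvable from wherever SikuliX runs it.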

Working with Images and Patterns

Creating and Managing Patterns

SikuliX works with PNG images of interface elements. Image quality and size are critical for reliable operation:

from sikuli import *

# Simple image click
click("submit_button.png")

# Use Pattern with custom similarity
pattern = Pattern("button.png").similar(0.8)
click(pattern)

# Click with offset
offset_pattern = Pattern("icon.png").targetOffset(10, 5)
click(offset_pattern)
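To make the effect of a click offset concrete, the arithmetic can be written out in plain Python. This sketch assumes the offset is applied relative to the center of the matched rectangle; `click_point` is a hypothetical name, not a SikuliX function.

```python
def click_point(x, y, w, h, dx=0, dy=0):
    """Return the screen point that would be clicked for a match at
    (x, y) with size (w, h) and a target offset of (dx, dy)."""
    center_x = x + w // 2
    center_y = y + h // 2
    return center_x + dx, center_y + dy


# A 40x20 icon found at (100, 200), clicked 10 px right and 5 px below center
point = click_point(100, 200, 40, 20, dx=10, dy=5)
```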

Handling Multiple Matches

# Find all occurrences of an image
matches = findAll("item.png")
for match in matches:
    click(match)
    wait(1)  # Pause between clicks

Core SikuliX Methods and Functions

Summary Table of Main Methods

Method | Description | Parameters | Returns
------ | ----------- | ---------- | -------
click(target) | Clicks on the given image | target: image path or Pattern | 1 if the click was performed, 0 otherwise
doubleClick(target) | Performs a double-click | target: image / Pattern | 1 if performed, 0 otherwise
rightClick(target) | Performs a right-click | target: image / Pattern | 1 if performed, 0 otherwise
hover(target) | Moves the cursor over the element | target: image / Pattern | 1 if performed, 0 otherwise
wait(target, timeout) | Waits for the element to appear | target: image, timeout: seconds | Match (raises FindFailed on timeout)
exists(target, timeout) | Checks whether the element is present | target: image, timeout: seconds | Match or None
find(target) | Finds the element on the screen | target: image / Pattern | Match (raises FindFailed if not found)
findAll(target) | Finds all occurrences | target: image / Pattern | Iterator of Match
type(target, text) | Clicks the element, then types text | target: image, text: string | 1 if performed, 0 otherwise
paste(target, text) | Pastes text via the clipboard | target: image, text: string | 1 if performed, 0 otherwise
dragDrop(source, target) | Drags one element onto another | source, target: images | 1 if performed, 0 otherwise
wheel(target, direction, steps) | Scrolls the mouse wheel | target: image, direction: WHEEL_UP / WHEEL_DOWN, steps: integer | 1 if performed, 0 otherwise
capture(region) | Takes a screenshot | region: screen area | File path
highlight(seconds) | Highlights a region with a border (called on a Region or Match) | seconds: duration | Region

Keyboard Interaction Methods

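Method | Description | Parameters
------ | ----------- | ----------
keyDown(key) | Presses and holds a key | key: key code (for example Key.SHIFT)
keyUp(key) | Releases a key | key: key code
type(text, modifiers) | Types with modifiers | text: string, modifiers: KeyModifier.CTRL, KeyModifier.ALT, KeyModifier.SHIFT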

Application Management Methods

Method | Description | Parameters
------ | ----------- | ----------
App.open(application) | Opens an application | application: path or command
App.focus(title) | Activates a window | title: window title
App.close(title) | Closes an application | title: window title

Working with Regions

Regions limit the search area, which dramatically improves performance and accuracy:

# Create a region
search_area = Region(100, 100, 500, 400)

# Search only inside the defined area
search_area.click("button.png")

# Nested regions
header_region = Region(0, 0, 1920, 100)
menu_button = header_region.find("menu.png")

Dynamic Region Determination

# Find an element and create a region around it
dialog = find("dialog_header.png")
dialog_region = Region(dialog.x - 50, dialog.y, 400, 300)
dialog_region.click("ok_button.png")
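Computed coordinates such as `dialog.x - 50` can end up outside the screen when the dialog sits near an edge. A small clamping step, sketched here in plain Python with the hypothetical helper `clamp_region`, keeps the rectangle inside known screen bounds before it is turned into a Region. The 1920x1080 screen size is an assumption for the example.

```python
def clamp_region(x, y, w, h, screen_w, screen_h):
    """Clip the rectangle (x, y, w, h) to a screen of the given size."""
    left = max(0, x)
    top = max(0, y)
    right = min(screen_w, x + w)
    bottom = min(screen_h, y + h)
    if right <= left or bottom <= top:
        raise ValueError("region lies entirely outside the screen")
    return left, top, right - left, bottom - top


# Dialog header found at x=20: x - 50 would be negative without clamping
clamped = clamp_region(20 - 50, 100, 400, 300, 1920, 1080)
```

The four returned values can be passed to `Region(...)` in the same order.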

Error Handling and Debugging

How to Handle Image‑Search Errors?

from sikuli import *

try:
    wait("element.png", 10)
    click("element.png")
except FindFailed:
    print("Element not found within 10 seconds")
    # Alternative actions
    click("alternative.png")

How to Debug Recognition Issues?

# Enable detailed logging
Settings.InfoLogs = True
Settings.DebugLogs = True

# Visual debugging
match = find("element.png")
match.highlight(2)  # Highlight the found element for 2 seconds

# Save a screenshot for analysis
# (passing SCREEN avoids the interactive capture prompt)
screen_capture = capture(SCREEN)
print("Screenshot saved: %s" % screen_capture)

Practical Usage Examples

Automating Login

from sikuli import *

def login_automation(username, password):
    """Automates the login procedure"""
    try:
        # Wait for the login page
        wait("login_page.png", 15)
        
        # Enter username
        click("username_field.png")
        type(username)
        
        # Enter password
        click("password_field.png")
        type(password)
        
        # Click the login button
        click("login_button.png")
        
        # Verify successful login; exists() returns None on timeout,
        # whereas wait() would raise FindFailed
        if exists("dashboard.png", 10):
            print("Login successful")
            return True
        else:
            print("Login failed")
            return False

    except FindFailed as e:
        print("Element not found: %s" % e)
        return False

Handling Dialog Windows

def handle_dialog():
    """Processes appearing dialog windows"""
    dialogs = [
        "error_dialog.png",
        "warning_dialog.png",
        "confirmation_dialog.png"
    ]
    
    for dialog in dialogs:
        if exists(dialog, 1):
            print("Dialog found: %s" % dialog)
            click("ok_button.png")
            wait(1)
            return True
    return False

Bulk Data Processing

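def process_data_table(max_rows=100):
    """Processes a data table"""
    row_count = 0

    # max_rows guards against an endless loop: "table_row.png" keeps
    # matching for as long as any row is visible
    while row_count < max_rows and exists("table_row.png", 2):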
        # Click the row
        click("table_row.png")
        
        # Open row editor
        click("edit_button.png")
        wait("edit_form.png", 5)
        
        # Make changes
        click("status_field.png")
        type("Processed")
        
        # Save
        click("save_button.png")
        wait("table_view.png", 3)
        
        row_count += 1
        
        # Move to next row
        type(Key.DOWN)
    
    print("Rows processed: %d" % row_count)

Performance Optimization

How to Speed Up Script Execution?

  1. Use regions: limit the search area
  2. Adjust wait times: set optimal timeouts
  3. Cache images: reuse found elements
  4. Reduce image size: keep images as small as possible

# Optimized settings
Settings.WaitScanRate = 3      # Scan frequency
Settings.ObserveScanRate = 3
Settings.MinSimilarity = 0.7   # Minimum similarity

# Use a region for faster interaction
toolbar_region = Region(0, 0, 1920, 50)
toolbar_region.click("menu.png")
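A rough way to see why regions help is to count how many positions a template can occupy inside the search area. The arithmetic below is a simplification (real matchers such as OpenCV use optimized methods, so the actual speed-up is not strictly proportional), and the 100x40 template size is an assumed example value.

```python
def candidate_positions(area_w, area_h, tmpl_w, tmpl_h):
    """Number of top-left positions where the template fits in the area."""
    if tmpl_w > area_w or tmpl_h > area_h:
        return 0
    return (area_w - tmpl_w + 1) * (area_h - tmpl_h + 1)


full_screen = candidate_positions(1920, 1080, 100, 40)  # whole display
toolbar = candidate_positions(1920, 50, 100, 40)        # Region(0, 0, 1920, 50)
ratio = full_screen / float(toolbar)
```

For these numbers the toolbar region has roughly 95 times fewer candidate positions than the full screen.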

Working with Multiple Monitors

How to Handle Multi‑Monitor Configurations?

# Get monitor information
screen_count = getNumberScreens()
print("Number of screens: %d" % screen_count)

# Work with a specific monitor
screen1 = Screen(1)  # Second monitor (0‑based indexing)
screen1.click("button.png")

# Switch between screens
for i in range(screen_count):
    screen = Screen(i)
    if screen.exists("target.png"):
        screen.click("target.png")
        break
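When coordinates come from elsewhere (a log file or a recorded click, for instance), it is useful to know which monitor they belong to. The following plain-Python sketch assumes each monitor is described by an `(x, y, width, height)` tuple in global desktop coordinates; the layout values and the helper name `screen_index_for_point` are made up for illustration.

```python
def screen_index_for_point(bounds, x, y):
    """Return the index of the monitor containing (x, y), or None."""
    for index, (sx, sy, sw, sh) in enumerate(bounds):
        if sx <= x < sx + sw and sy <= y < sy + sh:
            return index
    return None


# Assumed layout: a 1920x1080 primary monitor with a 1280x1024 one to its right
monitors = [(0, 0, 1920, 1080), (1920, 0, 1280, 1024)]
index = screen_index_for_point(monitors, 2000, 500)
```

The resulting index can be used as the argument to `Screen(i)`.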

Best Practices and Recommendations

Creating High‑Quality Images

  1. Image size: use images between 50‑200 px
  2. Contrast: choose elements with good contrast
  3. Uniqueness: avoid repetitive elements
  4. Format: use PNG with transparency when needed
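The size recommendation can be checked automatically by reading the width and height from the PNG header, using only the standard library. In this sketch, applying the 50-200 px range to the longer side of the image is one interpretation of the guideline above, and both function names are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header):
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are big-endian 32-bit integers at the start of IHDR
    return struct.unpack(">II", header[16:24])


def size_is_recommended(width, height, low=50, high=200):
    """Check the longer side against the recommended pixel range."""
    return low <= max(width, height) <= high


# Build a header for a 120x40 image instead of reading a real file
sample = PNG_SIGNATURE + struct.pack(">I4sII", 13, b"IHDR", 120, 40)
width, height = png_size(sample)
```

With a real file, the header would come from `open(path, "rb").read(24)`.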

Project Structure

project/
├── images/
│   ├── buttons/
│   ├── dialogs/
│   └── icons/
├── scripts/
│   ├── login.py
│   ├── data_processing.py
│   └── utils.py
└── config/
    └── settings.py
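With a layout like this, scripts can build image paths from a category and a file name instead of repeating folder names. This is a minimal sketch; `PROJECT_ROOT` and `image_path` are illustrative names, and the root folder is assumed to match the tree above.

```python
import os

PROJECT_ROOT = "project"  # assumed root folder from the layout above


def image_path(category, name, root=PROJECT_ROOT):
    """Build the path to an image stored under images/<category>/."""
    return os.path.join(root, "images", category, name)


submit_button = image_path("buttons", "submit.png")
```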

Error Handling

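def robust_click(image, max_attempts=3):
    """Reliable click with retries"""
    for attempt in range(max_attempts):
        try:
            # exists() returns None when the image is missing, so a
            # failed attempt is reported below, not only in except
            if exists(image, 2):
                click(image)
                return True
        except FindFailed:
            pass  # image disappeared between exists() and click()
        print("Attempt %d failed" % (attempt + 1))
        wait(1)
    return False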
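The same retry idea can be separated from SikuliX entirely, which makes it easy to test without a screen. The sketch below is a generic helper (the name `retry` and its parameters are assumptions for illustration): it calls any function until it returns a truthy value or the attempts run out.

```python
import time


def retry(action, attempts=3, delay=1.0, sleep=time.sleep):
    """Call `action` until it returns a truthy value.

    Returns the 1-based number of the successful attempt, or 0 if
    every attempt failed. `sleep` is injectable so tests run instantly.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return 0


# Stand-in for a GUI action that only succeeds on the third call
calls = []

def flaky_action():
    calls.append(1)
    return len(calls) >= 3

succeeded_on = retry(flaky_action, attempts=5, delay=0)
```

Inside a SikuliX script, the action could be a lambda that combines `exists(image, 2)` with `click(image)`.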

Comparison with Alternative Tools

Criterion | SikuliX | PyAutoGUI | Selenium | AutoHotkey
--------- | ------- | --------- | -------- | ----------
Automation type | Visual | Coordinate-based | Web DOM | Windows API
Cross-platform | Yes | Yes | Yes | No (Windows only)
Ease of installation | Medium | High | High | High
Stability | Medium | Low | High | High
Execution speed | Medium | High | Medium | High
Best suited for | Any GUI type | Simple automation | Web applications | Windows applications

Limitations and Drawbacks

Main Limitations of SikuliX

  1. Resolution dependency: images are tied to a specific screen resolution
  2. UI changes impact: any design change requires updating images
  3. Performance: image search can be slow on large displays
  4. Localization: issues with interfaces in different languages

How to Minimize the Impact of Limitations?

# Use Pattern with lower similarity for resilience
# (values this low also increase the risk of false matches)
flexible_pattern = Pattern("button.png").similar(0.6)

# Provide alternative images
buttons = ["submit_en.png", "submit_ru.png", "submit_de.png"]
for button in buttons:
    if exists(button, 1):
        click(button)
        break
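When the interface language is known in advance, the list of alternative images can be ordered so the most likely one is tried first. This sketch assumes the `<name>_<language>.png` naming used above; `localized_candidates` is a hypothetical helper.

```python
def localized_candidates(base, locale, fallbacks=("en",)):
    """Return image file names to try, preferred language first."""
    order = [locale] + [code for code in fallbacks if code != locale]
    return ["%s_%s.png" % (base, code) for code in order]


candidates = localized_candidates("submit", "ru")
```

The returned list can be fed straight into a loop like the one above.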

Integration with Other Tools

Combining with Selenium

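# Note: "from sikuli import *" only works inside SikuliX's Jython runtime,
# while current Selenium Python bindings require CPython 3. Treat this as an
# outline of the flow; in practice the SikuliX part usually runs as a
# separate script (for example via subprocess).
from selenium import webdriver
from sikuli import *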

def hybrid_automation():
    """Combines Selenium and SikuliX"""
    # Launch browser with Selenium
    driver = webdriver.Chrome()
    driver.get("https://example.com")
    
    # Switch to SikuliX for complex actions
    wait("complex_element.png", 10)
    dragDrop("source.png", "target.png")
    
    # Return to Selenium for data extraction
    data = driver.find_element("id", "result").text
    driver.quit()
    
    return data

Using with Monitoring Systems

import logging
import time
from sikuli import *

def monitored_automation():
    """Automation with monitoring"""
    logger = logging.getLogger("sikuli_automation")

    start_time = time.time()
    try:
        # Core automation logic
        click("start_button.png")
        wait("process_complete.png", 300)

        execution_time = time.time() - start_time
        logger.info("Process completed in %.2f seconds", execution_time)

    except FindFailed as e:
        logger.error("Automation error: %s", e)
        # Screenshot for analysis; capture(SCREEN) returns the saved file path
        logger.error("Screenshot saved: %s", capture(SCREEN))
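If several steps need timing, the measurement can be moved into a reusable context manager. This is a sketch using only the standard library; `timed_step` is a hypothetical name, and the logger name is borrowed from the example above.

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def timed_step(name, logger=None):
    """Log how long the enclosed block took, including when it fails."""
    if logger is None:
        logger = logging.getLogger("sikuli_automation")
    start = time.time()
    try:
        yield
    except Exception:
        logger.exception("%s failed after %.2f s", name, time.time() - start)
        raise
    else:
        logger.info("%s finished in %.2f s", name, time.time() - start)


# Usage: wrap any block of automation steps
with timed_step("demo step"):
    total = sum(range(1000))
```

Because the exception is re-raised after logging, existing `except FindFailed` handlers keep working unchanged.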

Conclusion

SikuliX is a powerful and unique tool for visual automation, applicable to a wide range of tasks—from testing legacy systems to automating routine GUI operations. Although it has some limitations related to its reliance on visual representations, proper use of SikuliX together with Python can dramatically increase automation efficiency.

Key advantages of SikuliX include the ability to work with any type of interface, independence from the technology stack behind it, and a relatively gentle learning curve. To get reliable results, plan the project structure carefully, create high-quality image templates, and implement robust exception handling.

To achieve maximum effectiveness, it is recommended to combine SikuliX with other automation tools, use regions for performance optimization, and follow best practices for maintainable automation code.
