Watchdog: Monitoring File System Changes in Python

Introduction to File System Monitoring

In modern software development, monitoring changes in the file system is a critical task. Tracking file creation, modification, deletion, and movement is required in various scenarios: from automatically running tests when code changes to monitoring logs in production.

The Watchdog library is a cross‑platform solution for Python that provides efficient and reliable file system monitoring with minimal overhead. It leverages native operating‑system mechanisms and offers a uniform API for handling file system events.

What Is Watchdog

Watchdog is a Python library for real‑time file system monitoring. It ensures cross‑platform compatibility by using different native observation mechanisms depending on the OS: inotify on Linux, FSEvents on macOS, and ReadDirectoryChangesW on Windows.
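To check which backend was picked on the current machine, one option is to inspect the class that watchdog.observers.Observer resolves to. This is a small sketch; the helper name describe_backend is illustrative and not part of the library.

```python
from watchdog.observers import Observer


def describe_backend():
    """Return the module and class name of the platform's default observer."""
    # Observer is an alias that the library binds at import time
    # to the implementation suited to the current platform.
    return f"{Observer.__module__}.{Observer.__name__}"


print(describe_backend())
```

The printed value differs per operating system, so it is useful mainly as a quick diagnostic.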

Key Advantages

The library offers several important benefits compared to custom solutions or alternative tools. First, it delivers high performance by using native system APIs instead of polling the file system. Second, Watchdog provides a consistent interface across all supported platforms, simplifying the development of cross‑platform applications.

Installation and Setup

Install the library the standard way via pip:

pip install watchdog

To get the watchmedo command‑line utility together with its optional dependencies, install the watchmedo extra. The quotes keep shells such as zsh from interpreting the square brackets:

pip install "watchdog[watchmedo]"

The library requires no extra configuration and is ready to use immediately after installation.

Architecture and Core Components

Watchdog is built around several key components that work together to provide file system monitoring.

Observer

The Observer is the central component of the monitoring system. It starts and manages the observation process, running in a separate thread to keep the main program non‑blocking. An Observer can watch multiple paths with different event handlers simultaneously.
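As a rough illustration of one Observer serving several paths, the sketch below schedules two independent handlers on two temporary directories and waits until each has seen a file being created. The names Recorder and watch_two_dirs are illustrative, and the five-second timeout is an arbitrary assumption.

```python
import tempfile
import threading
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class Recorder(FileSystemEventHandler):
    """Sets a flag once a file is created under the watched path."""

    def __init__(self):
        super().__init__()
        self.seen = threading.Event()

    def on_created(self, event):
        if not event.is_directory:
            self.seen.set()


def watch_two_dirs(timeout=5.0):
    with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
        handler_a, handler_b = Recorder(), Recorder()
        observer = Observer()
        # One observer, two watches, each with its own handler
        observer.schedule(handler_a, dir_a, recursive=False)
        observer.schedule(handler_b, dir_b, recursive=False)
        observer.start()
        try:
            Path(dir_a, "one.txt").write_text("x")
            Path(dir_b, "two.txt").write_text("y")
            got_a = handler_a.seen.wait(timeout)
            got_b = handler_b.seen.wait(timeout)
        finally:
            observer.stop()
            observer.join()
    return got_a, got_b


print(watch_two_dirs())
```

Each handler only receives events for the path it was scheduled on, which is why two separate flags are used.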

EventHandler

The EventHandler defines how your program should react to various file system events. Developers create custom handlers by subclassing FileSystemEventHandler and overriding the necessary methods.

Event

Events are objects that contain information about changes that occurred in the file system. Each event includes the path to the file or directory, the type of operation, and additional metadata.
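To make the event attributes concrete, the following sketch builds a few event objects by hand and formats them. Constructing events manually is done here only for illustration (normally the observer creates them), and describe is a hypothetical helper.

```python
from watchdog.events import DirDeletedEvent, FileCreatedEvent, FileMovedEvent


def describe(event):
    """Format the main attributes of a Watchdog event as one line."""
    kind = "directory" if event.is_directory else "file"
    text = f"{event.event_type} {kind}: {event.src_path}"
    # dest_path is only meaningful for move events
    dest = getattr(event, "dest_path", "")
    if dest:
        text += f" -> {dest}"
    return text


for event in (
    FileCreatedEvent("notes.txt"),
    DirDeletedEvent("old_build"),
    FileMovedEvent("a.txt", "b.txt"),
):
    print(describe(event))
```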

Basic Usage Examples

Simple Monitoring

Here’s a basic example of creating a monitoring system:

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class BasicHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if not event.is_directory:
            print(f"File modified: {event.src_path}")
    
    def on_created(self, event):
        if not event.is_directory:
            print(f"File created: {event.src_path}")

observer = Observer()
# The watched directory must already exist when the observer starts
observer.schedule(BasicHandler(), path="./monitored_folder", recursive=True)
observer.start()

try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
    
observer.join()

Monitoring with Logging

For more detailed event tracking, you can use built‑in logging:

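import logging
import time
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S')

event_handler = LoggingEventHandler()
observer = Observer()
observer.schedule(event_handler, path=".", recursive=True)
observer.start()

# Keep the main thread alive; the observer runs in a background thread
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()

observer.join()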

Event Types in Detail

Watchdog's four primary event types each correspond to a specific file system operation. Recent releases also report file open and close events on some platforms; those are not covered here.

File and Directory Creation

The on_created() method is called when new files or directories are created:

def on_created(self, event):
    if event.is_directory:
        print(f"Directory created: {event.src_path}")
    else:
        print(f"File created: {event.src_path}")
        # process_new_file() is a placeholder for your own logic
        self.process_new_file(event.src_path)

Deletion of Items

The on_deleted() event fires when files or directories are removed:

def on_deleted(self, event):
    print(f"Deleted {'directory' if event.is_directory else 'file'}: {event.src_path}")
    # log_deletion() is a placeholder for your own logging logic
    self.log_deletion(event.src_path, event.is_directory)

Content Modification

The on_modified() method reacts to changes in files or directory metadata:

import os

def on_modified(self, event):
    if not event.is_directory:
        print(f"File modified: {event.src_path}")
        # Example: check file size (the file may already be gone again)
        try:
            size = os.path.getsize(event.src_path)
            print(f"New size: {size} bytes")
        except OSError:
            print("File is no longer accessible")

Move and Rename

The on_moved() event handles move and rename operations:

def on_moved(self, event):
    print(f"Moved: {event.src_path} → {event.dest_path}")
    # update_file_paths() is a placeholder, e.g. for updating a path database
    self.update_file_paths(event.src_path, event.dest_path)

Advanced Features

Recursive Watching

The recursive=True flag enables monitoring of all subdirectories:

# Monitor the entire directory tree
observer.schedule(handler, path="/path/to/root", recursive=True)

# Monitor only the root directory
observer.schedule(handler, path="/path/to/root", recursive=False)

Filtering by File Extensions

For efficiency, you often need to watch only specific file types:

from watchdog.events import FileSystemEventHandler

class FilteredHandler(FileSystemEventHandler):
    def __init__(self, extensions=None):
        super().__init__()
        # str.endswith() accepts a tuple of suffixes
        self.extensions = tuple(extensions or ('.txt', '.py', '.md'))
    
    def _is_relevant_file(self, path):
        return path.endswith(self.extensions)
    
    def on_modified(self, event):
        if not event.is_directory and self._is_relevant_file(event.src_path):
            print(f"Relevant file modified: {event.src_path}")
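Suffix checks based on str.endswith are case-sensitive, so a file named README.MD would be missed by a '.md' filter. One possible refinement, sketched here with the standard library only, compares the lowercased final suffix; has_watched_extension is an illustrative helper name.

```python
from pathlib import PurePath


def has_watched_extension(path, extensions=(".txt", ".py", ".md")):
    """Return True if the path's final suffix is in extensions, ignoring case."""
    suffix = PurePath(path).suffix.lower()
    return suffix in {ext.lower() for ext in extensions}


print(has_watched_extension("docs/README.MD"))
print(has_watched_extension("notes.txt.bak"))
```

Only the last suffix is considered, so archive.tar.gz counts as '.gz', not '.tar.gz'.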

Pattern Matching

The library provides a built‑in class for pattern‑based filtering:

from watchdog.events import PatternMatchingEventHandler

class PatternHandler(PatternMatchingEventHandler):
    def __init__(self):
        super().__init__(
            patterns=['*.py', '*.js', '*.css'],
            ignore_patterns=['*.tmp', '*.log'],
            ignore_directories=True,
            case_sensitive=False
        )
    
    def on_modified(self, event):
        print(f"Source code file modified: {event.src_path}")
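To see which events such a handler lets through, it can be fed hand-made events via dispatch(), without touching the file system. The sketch below is for experimentation only; SourceHandler and the sample paths are made up for this example.

```python
from watchdog.events import (
    DirModifiedEvent,
    FileModifiedEvent,
    PatternMatchingEventHandler,
)


class SourceHandler(PatternMatchingEventHandler):
    def __init__(self):
        super().__init__(
            patterns=["*.py", "*.css"],
            ignore_patterns=["*_test.py"],
            ignore_directories=True,
            case_sensitive=False,
        )
        self.seen = []

    def on_modified(self, event):
        self.seen.append(event.src_path)


handler = SourceHandler()
for event in (
    FileModifiedEvent("src/app.py"),       # matches *.py
    FileModifiedEvent("STYLE.CSS"),        # matches *.css because case is ignored
    FileModifiedEvent("src/app_test.py"),  # excluded by ignore_patterns
    FileModifiedEvent("README.md"),        # matches no pattern
    DirModifiedEvent("src"),               # directories are ignored
):
    handler.dispatch(event)

print(handler.seen)
```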

Watchdog Methods and Functions Reference

Class/Method | Description | Parameters | Return Value

Observer
Observer() | Creates a new observer instance | - | Observer
schedule(handler, path, recursive) | Registers a handler for a given path | handler, path, recursive=False | ObservedWatch
start() | Starts monitoring in a separate thread | - | None
stop() | Stops monitoring | - | None
join(timeout) | Waits for the observer thread to finish | timeout=None | None
is_alive() | Checks whether the observer is active | - | bool
unschedule_all() | Removes all registered handlers | - | None

FileSystemEventHandler
on_created(event) | Handles file/directory creation | event | None
on_deleted(event) | Handles file/directory deletion | event | None
on_modified(event) | Handles file/directory modification | event | None
on_moved(event) | Handles file/directory moves | event | None
dispatch(event) | Routes events to the appropriate methods | event | None

PatternMatchingEventHandler
PatternMatchingEventHandler() | Creates a handler with pattern filtering | patterns, ignore_patterns, ignore_directories, case_sensitive | Handler

LoggingEventHandler
LoggingEventHandler() | Creates a handler with automatic logging | logger=None | Handler

PollingObserver
PollingObserver() | Creates a polling‑based observer | timeout=1 | Observer

Events
event.src_path | Path to the source file/directory | - | str
event.dest_path | Path to the destination file/directory (for moves) | - | str
event.is_directory | True if the object is a directory | - | bool
event.key | Unique event key | - | tuple

Integration with Asynchronous Code

To use Watchdog in asynchronous applications, several approaches are possible.

Using asyncio with Threads

import asyncio
import queue
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class AsyncHandler(FileSystemEventHandler):
    def __init__(self, event_queue):
        super().__init__()
        self.event_queue = event_queue
    
    def on_modified(self, event):
        # Called in the observer thread; queue.Queue is thread-safe
        self.event_queue.put(('modified', event.src_path))

async def process_events(event_queue):
    while True:
        try:
            event_type, path = event_queue.get_nowait()
        except queue.Empty:
            # Nothing pending: sleep briefly before polling again
            await asyncio.sleep(0.1)
            continue
        print(f"Async processing: {event_type} - {path}")
        # Place asynchronous handling here
        await asyncio.sleep(0)  # let other tasks run during bursts

async def main():
    event_queue = queue.Queue()
    handler = AsyncHandler(event_queue)
    observer = Observer()
    observer.schedule(handler, path=".", recursive=True)
    observer.start()
    try:
        await process_events(event_queue)
    finally:
        # Always shut the observer thread down, even on Ctrl+C
        observer.stop()
        observer.join()

if __name__ == "__main__":
    asyncio.run(main())
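Another common way to bridge the observer thread and the event loop avoids polling altogether: the handler hands each event to the loop with loop.call_soon_threadsafe, and a coroutine awaits an asyncio.Queue. The sketch below simulates the observer thread with a plain thread and hand-made events so that it runs without touching the file system; LoopBridgeHandler and collect are illustrative names.

```python
import asyncio
import threading

from watchdog.events import FileModifiedEvent, FileSystemEventHandler


class LoopBridgeHandler(FileSystemEventHandler):
    """Forwards modified paths from the observer thread to an asyncio.Queue."""

    def __init__(self, loop, queue):
        super().__init__()
        self._loop = loop
        self._queue = queue

    def on_modified(self, event):
        # asyncio.Queue is not thread-safe, so schedule the put on the loop
        self._loop.call_soon_threadsafe(self._queue.put_nowait, event.src_path)


async def collect(paths):
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    handler = LoopBridgeHandler(loop, queue)

    def fake_observer_thread():
        # Stand-in for the real observer thread calling dispatch()
        for path in paths:
            handler.dispatch(FileModifiedEvent(path))

    worker = threading.Thread(target=fake_observer_thread)
    worker.start()
    received = [await queue.get() for _ in paths]
    worker.join()
    return received


print(asyncio.run(collect(["a.py", "b.py"])))
```

In a real application the same handler would be passed to observer.schedule() instead of being driven by a helper thread.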

Practical Applications

Automatic Command Execution

import shlex
import subprocess
from watchdog.events import FileSystemEventHandler

class CommandRunner(FileSystemEventHandler):
    def __init__(self, commands_map):
        super().__init__()
        self.commands_map = commands_map
    
    def on_modified(self, event):
        if event.is_directory:
            return
            
        for suffix, template in self.commands_map.items():
            if event.src_path.endswith(suffix):
                # Fill the {} placeholder, if present, with the quoted path
                # (shlex.quote targets POSIX shells)
                command = template.format(shlex.quote(event.src_path))
                print(f"Executing command: {command}")
                try:
                    subprocess.run(command, shell=True, check=True)
                except subprocess.CalledProcessError as e:
                    print(f"Command execution error: {e}")

# Configure automatic command execution
commands = {
    '.py': 'python -m py_compile {}',
    '.css': 'sass --update styles/',
    '.js': 'npm run build'
}

handler = CommandRunner(commands)
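Because file names come from outside the program, passing them through a shell is risky. A possible alternative, sketched below, stores each command as an argument list and substitutes the path as a single argument, so no shell parsing takes place. The "{path}" token, build_argv, run_for, and the ECHO template are assumptions made for this example.

```python
import subprocess
import sys


def build_argv(template, path):
    """Replace the literal token "{path}" in an argv template with the file path."""
    return [path if part == "{path}" else part for part in template]


def run_for(path, template):
    argv = build_argv(template, path)
    # No shell is involved, so spaces or semicolons in the path are harmless
    return subprocess.run(argv, check=False, capture_output=True, text=True)


# Hypothetical template: echo the path back using the current interpreter
ECHO = [sys.executable, "-c", "import sys; print(sys.argv[1])", "{path}"]

result = run_for("my file; echo oops.py", ECHO)
print(result.stdout.strip())
```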

Backup System

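import shutil
import os
from datetime import datetime
from watchdog.events import FileSystemEventHandler

class BackupHandler(FileSystemEventHandler):
    def __init__(self, backup_dir):
        super().__init__()
        # Keep backup_dir outside the watched tree; otherwise every backup
        # triggers further events
        self.backup_dir = backup_dir
        os.makedirs(backup_dir, exist_ok=True)
    
    def on_modified(self, event):
        if event.is_directory or event.src_path.endswith('.tmp'):
            return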
            
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = os.path.basename(event.src_path)
        backup_path = os.path.join(self.backup_dir, f"{timestamp}_{filename}")
        
        try:
            shutil.copy2(event.src_path, backup_path)
            print(f"Backup created: {backup_path}")
        except Exception as e:
            print(f"Backup creation error: {e}")

Performance and Optimization

Settings for Large Projects

When dealing with projects that contain many files, apply the following optimizations:

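import os
from watchdog.events import FileSystemEventHandler

class OptimizedHandler(FileSystemEventHandler):
    def __init__(self):
        super().__init__()
        # Path suffixes (extensions or whole file names) to skip
        self.ignored_extensions = {'.tmp', '.swp', '.DS_Store', '.git'}
        self.ignored_directories = {'node_modules', '.git', '__pycache__'}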
    
    def _should_ignore(self, path):
        # Extension check
        if any(path.endswith(ext) for ext in self.ignored_extensions):
            return True
        
        # Directory check
        path_parts = path.split(os.sep)
        if any(part in self.ignored_directories for part in path_parts):
            return True
            
        return False
    
    def dispatch(self, event):
        if not self._should_ignore(event.src_path):
            super().dispatch(event)

Event Debouncing

Editors and operating systems often emit several modification events for a single save. The handler below processes the first event for a path and ignores further events for that path until the delay has elapsed:

import time
from watchdog.events import FileSystemEventHandler

class DebouncedHandler(FileSystemEventHandler):
    def __init__(self, delay=0.5):
        super().__init__()
        self.delay = delay
        # Monotonic time of the last processed event, per path
        self.last_processed = {}
    
    def on_modified(self, event):
        current_time = time.monotonic()
        last_event_time = self.last_processed.get(event.src_path)
        
        if last_event_time is None or current_time - last_event_time > self.delay:
            self.last_processed[event.src_path] = current_time
            self._process_event(event)
    
    def _process_event(self, event):
        print(f"Processing debounced event: {event.src_path}")
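A different policy is trailing-edge debouncing: wait until a path has been quiet for the full delay, then act once on the final state. The sketch below shows the idea with threading.Timer from the standard library; the Debouncer class and its callback argument are illustrative and not part of Watchdog.

```python
import itertools
import threading
import time


class Debouncer:
    """Calls callback(key) once, after key has been quiet for `delay` seconds."""

    def __init__(self, delay, callback):
        self.delay = delay
        self.callback = callback
        self._lock = threading.Lock()
        self._tokens = itertools.count()
        self._latest = {}  # key -> token of the most recent trigger

    def trigger(self, key):
        with self._lock:
            token = next(self._tokens)
            self._latest[key] = token
        timer = threading.Timer(self.delay, self._fire, args=(key, token))
        timer.daemon = True
        timer.start()

    def _fire(self, key, token):
        with self._lock:
            if self._latest.get(key) != token:
                return  # a newer trigger arrived; let its timer handle it
            del self._latest[key]
        self.callback(key)


calls = []
debouncer = Debouncer(0.05, calls.append)
for _ in range(5):
    debouncer.trigger("report.txt")  # burst of events for one path
time.sleep(0.3)
print(calls)
```

Inside a handler, on_modified would simply call debouncer.trigger(event.src_path), and the callback would do the real work.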

Error Handling and Exceptions

A reliable monitoring system must gracefully handle various error conditions:

import logging
import os
from watchdog.events import FileSystemEventHandler

class RobustHandler(FileSystemEventHandler):
    def __init__(self, logger=None):
        super().__init__()
        self.logger = logger or logging.getLogger(__name__)
    
    def on_modified(self, event):
        try:
            self._safe_process_event(event)
        except PermissionError:
            self.logger.warning(f"No access to file: {event.src_path}")
        except FileNotFoundError:
            self.logger.warning(f"File not found: {event.src_path}")
        except Exception as e:
            self.logger.error(f"Unexpected error processing {event.src_path}: {e}")
    
    def _safe_process_event(self, event):
        # Safe event processing
        if os.path.exists(event.src_path):
            print(f"Processing file: {event.src_path}")

Alternatives and Comparison

Tool | Language | OS Support | Performance | Async Support | Configuration Complexity
Watchdog | Python | Windows / Linux / macOS | High | Via threads | Low
inotify | C / Python | Linux only | Very high | Yes | Medium
fswatch | C++ | macOS / Linux | High | No | High
chokidar | Node.js | All platforms | Medium | Yes | Low
Polling | Any | All platforms | Low | Yes | Very low

Best Practices

Code Organization

Structure your code for complex monitoring scenarios:

from watchdog.observers import Observer

class FileSystemMonitor:
    def __init__(self, config):
        self.config = config
        self.observer = Observer()
        self.handlers = {}
    
    def add_handler(self, name, handler_class, path, **kwargs):
        handler = handler_class(**kwargs)
        self.handlers[name] = handler
        self.observer.schedule(handler, path, recursive=True)
    
    def start(self):
        self.observer.start()
        print("Monitoring started")
    
    def stop(self):
        self.observer.stop()
        self.observer.join()
        print("Monitoring stopped")
    
    def __enter__(self):
        self.start()
        return self
    
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.stop()

Configuration via Files

import json

def load_monitoring_config(config_file):
    with open(config_file, 'r') as f:
        config = json.load(f)
    
    monitor = FileSystemMonitor(config)
    
    for watch_config in config['watches']:
        handler_class = globals()[watch_config['handler_class']]
        monitor.add_handler(
            watch_config['name'],
            handler_class,
            watch_config['path'],
            **watch_config.get('options', {})
        )
    
    return monitor
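Looking handler classes up in globals() lets a configuration file name any object in the module. A more restrictive option, sketched below, is an explicit registry of allowed handlers. HANDLER_REGISTRY, build_handler, and the sample JSON are assumptions made for this example; the two handler classes are Watchdog's own.

```python
import json

from watchdog.events import LoggingEventHandler, PatternMatchingEventHandler

# Only handlers listed here can be requested from a config file
HANDLER_REGISTRY = {
    "logging": LoggingEventHandler,
    "pattern": PatternMatchingEventHandler,
}


def build_handler(watch_config):
    """Instantiate the handler named in one watch entry of the config."""
    name = watch_config["handler_class"]
    try:
        handler_class = HANDLER_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown handler class: {name!r}") from None
    return handler_class(**watch_config.get("options", {}))


SAMPLE_CONFIG = """
{
  "watches": [
    {
      "name": "sources",
      "handler_class": "pattern",
      "path": "./src",
      "options": {"patterns": ["*.py"], "ignore_directories": true}
    }
  ]
}
"""

config = json.loads(SAMPLE_CONFIG)
handlers = {watch["name"]: build_handler(watch) for watch in config["watches"]}
print(type(handlers["sources"]).__name__)
```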

Frequently Asked Questions

How to avoid duplicate events when saving files in an IDE?

Many IDEs create temporary files during save operations, which can generate multiple events. Use debouncing or filter by extensions:

class IDECompatibleHandler(FileSystemEventHandler):
    def on_modified(self, event):
        # Ignore IDE temporary files (backup~ files, *.tmp, Vim swap files).
        # Matching on the suffix avoids false positives such as "page.tmpl"
        if event.src_path.endswith(('~', '.tmp', '.swp')):
            return
        
        print(f"File modified: {event.src_path}")

Can I monitor files on network drives?

For network file systems, it’s recommended to use PollingObserver:

from watchdog.observers.polling import PollingObserver

observer = PollingObserver(timeout=1)  # Check every second
observer.schedule(handler, network_path, recursive=True)

How to handle large files without blocking?

Use asynchronous processing or separate threads for heavyweight operations:

import threading

class NonBlockingHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Process in a separate thread
        thread = threading.Thread(target=self._process_large_file, args=(event.src_path,))
        thread.daemon = True
        thread.start()
    
    def _process_large_file(self, file_path):
        # Lengthy file processing logic
        pass
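Starting one thread per event can create a very large number of threads during a burst. One way to bound concurrency, sketched here, is to submit the work to a concurrent.futures.ThreadPoolExecutor. PooledHandler and the process callable are illustrative, and the events are created by hand so the example runs without an observer.

```python
from concurrent.futures import ThreadPoolExecutor

from watchdog.events import FileCreatedEvent, FileSystemEventHandler


class PooledHandler(FileSystemEventHandler):
    """Hands new files to a fixed-size worker pool instead of ad hoc threads."""

    def __init__(self, executor, process):
        super().__init__()
        self._executor = executor
        self._process = process

    def on_created(self, event):
        if not event.is_directory:
            # Returns immediately; the pool runs process(path) later
            self._executor.submit(self._process, event.src_path)


processed = []
with ThreadPoolExecutor(max_workers=2) as pool:
    handler = PooledHandler(pool, processed.append)
    for name in ("a.bin", "b.bin", "c.bin"):
        handler.dispatch(FileCreatedEvent(name))
# Leaving the with-block waits for all submitted work to finish

print(sorted(processed))
```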

Does the number of watched files affect performance?

Yes, performance can degrade with a large number of files. Recommendations:

  • Filter by file extensions
  • Exclude temporary directories
  • Use recursive watching judiciously

How to monitor changes to file permissions?

On Linux, metadata changes are captured via on_modified. On Windows and macOS, behavior may vary:

import os
import stat

def on_modified(self, event):
    if not event.is_directory:
        file_stat = os.stat(event.src_path)
        permissions = stat.filemode(file_stat.st_mode)
        print(f"Possible permission change: {permissions}")
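Since on_modified also fires for content changes, telling a real permission change apart requires remembering the previous mode. The following sketch, which assumes a POSIX system, keeps the last seen permission bits per path and reports a change only when they differ; PermissionTracker is a hypothetical helper.

```python
import os
import stat
import tempfile


class PermissionTracker:
    """Remembers permission bits per path and reports when they change."""

    def __init__(self):
        self._modes = {}

    def check(self, path):
        mode = stat.S_IMODE(os.stat(path).st_mode)
        previous = self._modes.get(path)
        self._modes[path] = mode
        if previous is None or previous == mode:
            return None  # first sighting, or nothing changed
        return oct(previous), oct(mode)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
try:
    tracker = PermissionTracker()
    os.chmod(path, 0o644)
    first = tracker.check(path)   # baseline recorded
    os.chmod(path, 0o600)
    second = tracker.check(path)  # change detected
finally:
    os.remove(path)

print(first, second)
```

A handler could call tracker.check(event.src_path) from on_modified and act only when the result is not None.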

Conclusion

The Watchdog library is a powerful and flexible tool for file system monitoring in Python applications. Its cross‑platform nature, high performance, and ease of use make it indispensable for a wide range of tasks—from development automation to production‑grade monitoring systems.

Key benefits include minimal overhead thanks to native OS mechanisms, an intuitive API, and active community support. Watchdog integrates smoothly into existing projects and can be adapted to the specific requirements of any application.

When combined with best practices such as robust error handling, performance tuning, and clean code organization, Watchdog becomes a reliable foundation for building efficient file monitoring solutions that operate consistently under diverse operating conditions.
