Introduction to File System Monitoring
In modern software development, monitoring changes in the file system is a critical task. Tracking file creation, modification, deletion, and movement is required in various scenarios: from automatically running tests when code changes to monitoring logs in production.
The Watchdog library is a cross‑platform solution for Python that provides efficient and reliable file system monitoring with minimal overhead. It leverages native operating‑system mechanisms and offers a uniform API for handling file system events.
What Is Watchdog
Watchdog is a Python library for real‑time file system monitoring. It ensures cross‑platform compatibility by using different native observation mechanisms depending on the OS: inotify on Linux, FSEvents on macOS, and ReadDirectoryChangesW on Windows.
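You can check which backend was picked on the current machine by inspecting the Observer alias. This is a minimal sketch. The class names listed in the comment reflect current watchdog releases and may differ between versions.

```python
from watchdog.observers import Observer
from watchdog.observers.polling import PollingObserver

# `Observer` is an alias for the platform-specific class chosen at import time,
# e.g. InotifyObserver on Linux, FSEventsObserver on macOS,
# WindowsApiObserver on Windows.
backend = Observer.__name__
print(f"Native backend: {backend}")

# PollingObserver is always available as a fallback that needs no OS support.
print(f"Fallback backend: {PollingObserver.__name__}")
```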
Key Advantages
The library offers several important benefits compared to custom solutions or alternative tools. First, it delivers high performance by using native system APIs instead of polling the file system. Second, Watchdog provides a consistent interface across all supported platforms, simplifying the development of cross‑platform applications.
Installation and Setup
Install the library the standard way via pip:
```shell
pip install watchdog
```
For additional features, including the watchmedo command-line utility, install the extended version. The quotes stop shells such as zsh from interpreting the brackets:
```shell
pip install "watchdog[watchmedo]"
```
The library requires no extra configuration and is ready to use immediately after installation.
Architecture and Core Components
Watchdog is built around several key components that work together to provide file system monitoring.
Observer
The Observer is the central component of the monitoring system. It starts and manages the observation process, running in a separate thread to keep the main program non‑blocking. An Observer can watch multiple paths with different event handlers simultaneously.
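The sketch below registers two paths with two different handlers on a single Observer. Each schedule() call returns an ObservedWatch that describes one (path, handler) registration. PrintHandler and watch_two_dirs are hypothetical names used only for illustration.

```python
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class PrintHandler(FileSystemEventHandler):
    """Hypothetical handler that tags every event with a label."""

    def __init__(self, label):
        super().__init__()
        self.label = label

    def on_any_event(self, event):
        print(f"[{self.label}] {event.event_type}: {event.src_path}")

def watch_two_dirs(logs_dir, src_dir):
    """Register two independent watches on one observer (not started yet)."""
    observer = Observer()
    logs_watch = observer.schedule(PrintHandler("logs"), logs_dir, recursive=False)
    src_watch = observer.schedule(PrintHandler("src"), src_dir, recursive=True)
    return observer, logs_watch, src_watch
```

Calling observer.start() once begins monitoring both paths. Each handler only receives events from the path it was scheduled on.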
EventHandler
The EventHandler defines how your program should react to various file system events. Developers create custom handlers by subclassing FileSystemEventHandler and overriding the necessary methods.
Event
Events are objects that describe a change in the file system. Each event records the type of operation (event_type), the affected path (src_path), and whether that path is a directory (is_directory). Move events also carry the destination path (dest_path).
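Event objects can also be constructed directly. This is handy for inspecting their attributes or for unit-testing handlers without touching the disk. A minimal sketch using watchdog's event classes; the paths are made up.

```python
from watchdog.events import DirCreatedEvent, FileModifiedEvent, FileMovedEvent

modified = FileModifiedEvent("/project/app.py")
moved = FileMovedEvent("/project/old.txt", "/project/new.txt")
created_dir = DirCreatedEvent("/project/build")

for event in (modified, moved, created_dir):
    # event_type is a string such as "modified", "moved" or "created"
    print(event.event_type, event.src_path, event.is_directory)

# Only move events carry a meaningful destination path
print(moved.dest_path)
```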
Basic Usage Examples
Simple Monitoring
Here’s a basic example of creating a monitoring system:
```python
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class BasicHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if not event.is_directory:
            print(f"File modified: {event.src_path}")

    def on_created(self, event):
        if not event.is_directory:
            print(f"File created: {event.src_path}")

observer = Observer()
observer.schedule(BasicHandler(), path="./monitored_folder", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
```
Monitoring with Logging
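For more detailed event tracking, the built-in LoggingEventHandler logs every event through Python's standard logging module. The observer runs in a daemon thread, so the main thread must stay alive for monitoring to continue:
```python
import logging
import time
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S')
event_handler = LoggingEventHandler()
observer = Observer()
observer.schedule(event_handler, path=".", recursive=True)
observer.start()
try:
    # Keep the main thread alive; the observer thread is a daemon
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
```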
Event Types in Detail
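Watchdog provides four primary event types, each corresponding to a specific file system operation and each with a matching handler method. The catch-all on_any_event() method also receives every event, whatever its type.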
File and Directory Creation
The on_created() method is called when new files or directories are created:
```python
def on_created(self, event):
    if event.is_directory:
        print(f"Directory created: {event.src_path}")
    else:
        print(f"File created: {event.src_path}")
        # Additional logic can be added here;
        # process_new_file() is a helper you define on your handler
        self.process_new_file(event.src_path)
```
Deletion of Items
The on_deleted() event fires when files or directories are removed:
```python
def on_deleted(self, event):
    print(f"Deleted {'directory' if event.is_directory else 'file'}: {event.src_path}")
    # You can log deletions here; log_deletion() is a helper you define
    self.log_deletion(event.src_path, event.is_directory)
```
Content Modification
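The on_modified() method fires when a file's contents or metadata change. Directories can also report modified events, for example when entries are added or removed, which is why the example skips them. The file may already be gone by the time the handler runs, so guard the size lookup:
```python
import os

def on_modified(self, event):
    if not event.is_directory:
        print(f"File modified: {event.src_path}")
        # Example: check file size (the file may have been deleted meanwhile)
        try:
            size = os.path.getsize(event.src_path)
        except FileNotFoundError:
            return
        print(f"New size: {size} bytes")
```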
Move and Rename
The on_moved() event handles move and rename operations:
```python
def on_moved(self, event):
    print(f"Moved: {event.src_path} → {event.dest_path}")
    # Update path database; update_file_paths() is a helper you define
    self.update_file_paths(event.src_path, event.dest_path)
```
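FileSystemEventHandler.dispatch() routes each event to the matching on_*() method, so you can exercise all four callbacks together with synthetic events. A minimal sketch; RecordingHandler is a hypothetical class that only records what it receives.

```python
from watchdog.events import (
    FileCreatedEvent,
    FileDeletedEvent,
    FileModifiedEvent,
    FileMovedEvent,
    FileSystemEventHandler,
)

class RecordingHandler(FileSystemEventHandler):
    """Collects (event_type, path) pairs instead of acting on them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def on_created(self, event):
        self.seen.append(("created", event.src_path))

    def on_modified(self, event):
        self.seen.append(("modified", event.src_path))

    def on_deleted(self, event):
        self.seen.append(("deleted", event.src_path))

    def on_moved(self, event):
        self.seen.append(("moved", f"{event.src_path} -> {event.dest_path}"))

handler = RecordingHandler()
for event in (
    FileCreatedEvent("/tmp/a.txt"),
    FileModifiedEvent("/tmp/a.txt"),
    FileMovedEvent("/tmp/a.txt", "/tmp/b.txt"),
    FileDeletedEvent("/tmp/b.txt"),
):
    handler.dispatch(event)  # the same entry point the observer calls

print(handler.seen)
```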
Advanced Features
Recursive Watching
The recursive=True flag enables monitoring of all subdirectories. By default (recursive=False), schedule() watches only the given directory itself:
```python
# Monitor the entire directory tree
observer.schedule(handler, path="/path/to/root", recursive=True)
# Monitor only the root directory
observer.schedule(handler, path="/path/to/root", recursive=False)
```
Filtering by File Extensions
To avoid unnecessary processing, you often need to react only to specific file types:
```python
from watchdog.events import FileSystemEventHandler

class FilteredHandler(FileSystemEventHandler):
    def __init__(self, extensions=None):
        super().__init__()
        # str.endswith() accepts a tuple, so one call checks every extension
        self.extensions = tuple(extensions or ('.txt', '.py', '.md'))

    def _is_relevant_file(self, path):
        return path.endswith(self.extensions)

    def on_modified(self, event):
        if not event.is_directory and self._is_relevant_file(event.src_path):
            print(f"Relevant file modified: {event.src_path}")
```
Pattern Matching
The library provides a built‑in class for pattern‑based filtering:
```python
from watchdog.events import PatternMatchingEventHandler

class PatternHandler(PatternMatchingEventHandler):
    def __init__(self):
        super().__init__(
            patterns=['*.py', '*.js', '*.css'],
            ignore_patterns=['*.tmp', '*.log'],
            ignore_directories=True,
            case_sensitive=False
        )

    def on_modified(self, event):
        print(f"Source code file modified: {event.src_path}")
```
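To see which paths a PatternMatchingEventHandler lets through, dispatch synthetic events to it. This minimal sketch uses the same patterns as above. The recording subclass and the paths are illustrative.

```python
from watchdog.events import FileModifiedEvent, PatternMatchingEventHandler

class RecordingPatternHandler(PatternMatchingEventHandler):
    def __init__(self):
        super().__init__(
            patterns=['*.py', '*.js', '*.css'],
            ignore_patterns=['*.tmp', '*.log'],
            ignore_directories=True,
            case_sensitive=False,
        )
        self.matched = []

    def on_modified(self, event):
        self.matched.append(event.src_path)

handler = RecordingPatternHandler()
for path in ("/project/app.py", "/project/STYLE.CSS", "/project/debug.log", "/project/notes.md"):
    # dispatch() applies the pattern filter before calling on_modified()
    handler.dispatch(FileModifiedEvent(path))

print(handler.matched)
```

With case_sensitive=False, STYLE.CSS matches *.css. debug.log is excluded by ignore_patterns, and notes.md matches none of the patterns.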
Watchdog Methods and Functions Reference
| Class/Method | Description | Parameters | Return Value |
|---|---|---|---|
| Observer | | | |
| Observer() | Creates a new observer instance (the platform's native observer class) | - | Observer |
| schedule(event_handler, path, recursive) | Registers a handler for a given path | event_handler, path, recursive=False | ObservedWatch |
| start() | Starts monitoring in a separate thread | - | None |
| stop() | Signals the observer thread to stop | - | None |
| join(timeout) | Waits for the observer thread to finish | timeout=None | None |
| is_alive() | Checks whether the observer thread is running | - | bool |
| unschedule_all() | Removes all registered watches | - | None |
| FileSystemEventHandler | | | |
| on_created(event) | Handles file/directory creation | event | None |
| on_deleted(event) | Handles file/directory deletion | event | None |
| on_modified(event) | Handles file/directory modification | event | None |
| on_moved(event) | Handles file/directory moves | event | None |
| on_any_event(event) | Receives every event, whatever its type | event | None |
| dispatch(event) | Routes events to the appropriate on_*() method | event | None |
| PatternMatchingEventHandler | | | |
| PatternMatchingEventHandler() | Creates a handler with pattern filtering | patterns=None, ignore_patterns=None, ignore_directories=False, case_sensitive=False | Handler |
| LoggingEventHandler | | | |
| LoggingEventHandler() | Creates a handler that logs every event | logger=None | Handler |
| PollingObserver | | | |
| PollingObserver() | Creates a polling-based observer; timeout is the interval between scans, in seconds | timeout=1 | Observer |
| Events | | | |
| event.event_type | Type of operation, e.g. "created", "modified", "moved" | - | str |
| event.src_path | Path to the source file/directory | - | str |
| event.dest_path | Path to the destination file/directory (for moves) | - | str |
| event.is_directory | True if the object is a directory | - | bool |
| event.key | Unique event key | - | tuple |
Integration with Asynchronous Code
Watchdog's callbacks run on the observer's thread, so an asynchronous application needs a thread-safe bridge into its event loop. The approach below forwards each event into an asyncio.Queue with loop.call_soon_threadsafe().
Using asyncio with Threads
```python
import asyncio
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class AsyncHandler(FileSystemEventHandler):
    def __init__(self, loop, event_queue):
        super().__init__()
        self.loop = loop
        self.event_queue = event_queue

    def on_modified(self, event):
        # Runs on the observer thread; asyncio.Queue is not thread-safe,
        # so schedule the put on the event loop's own thread
        self.loop.call_soon_threadsafe(
            self.event_queue.put_nowait, ('modified', event.src_path)
        )

async def process_events(event_queue):
    while True:
        event_type, path = await event_queue.get()
        print(f"Async processing: {event_type} - {path}")
        # Place asynchronous handling here

async def main():
    event_queue = asyncio.Queue()
    handler = AsyncHandler(asyncio.get_running_loop(), event_queue)
    observer = Observer()
    observer.schedule(handler, path=".", recursive=True)
    observer.start()
    try:
        await process_events(event_queue)
    finally:
        observer.stop()
        observer.join()

if __name__ == "__main__":
    asyncio.run(main())
```
Practical Applications
Automatic Command Execution
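```python
import shlex
import subprocess
from watchdog.events import FileSystemEventHandler

class CommandRunner(FileSystemEventHandler):
    def __init__(self, commands_map):
        super().__init__()
        # Maps a file suffix to a shell command; "{}" is replaced by the changed path
        self.commands_map = commands_map

    def on_modified(self, event):
        if event.is_directory:
            return
        for suffix, command in self.commands_map.items():
            if event.src_path.endswith(suffix):
                # Quote the path (POSIX shell rules) so spaces are harmless
                cmd = command.replace("{}", shlex.quote(event.src_path))
                print(f"Executing command: {cmd}")
                try:
                    subprocess.run(cmd, shell=True, check=True)
                except subprocess.CalledProcessError as e:
                    print(f"Command execution error: {e}")

# Configure automatic command execution.
# Beware: a command that writes watched files (e.g. a build emitting .js)
# can retrigger itself.
commands = {
    '.py': 'python -m py_compile {}',
    '.css': 'sass --update styles/',
    '.js': 'npm run build'
}
handler = CommandRunner(commands)
```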
Backup System
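```python
import os
import shutil
from datetime import datetime
from watchdog.events import FileSystemEventHandler

class BackupHandler(FileSystemEventHandler):
    def __init__(self, backup_dir):
        super().__init__()
        self.backup_dir = os.path.abspath(backup_dir)
        os.makedirs(self.backup_dir, exist_ok=True)

    def on_modified(self, event):
        src = os.path.abspath(event.src_path)
        # Skip directories, temp files, and anything inside the backup folder;
        # otherwise a backup dir inside the watched tree backs up its own copies
        if (event.is_directory or src.endswith('.tmp')
                or src.startswith(self.backup_dir + os.sep)):
            return
        # Microseconds keep two saves within the same second from colliding
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        filename = os.path.basename(src)
        backup_path = os.path.join(self.backup_dir, f"{timestamp}_{filename}")
        try:
            shutil.copy2(src, backup_path)
            print(f"Backup created: {backup_path}")
        except OSError as e:
            print(f"Backup creation error: {e}")
```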
Performance and Optimization
Settings for Large Projects
In projects with many files, skip irrelevant paths as early as possible by overriding dispatch(). Every event passes through dispatch() before it reaches the on_*() methods. This saves handler work, but the operating system still reports the events:
```python
import os
from watchdog.events import FileSystemEventHandler

class OptimizedHandler(FileSystemEventHandler):
    def __init__(self):
        super().__init__()
        self.ignored_extensions = ('.tmp', '.swp', '.DS_Store', '.git')
        self.ignored_directories = {'node_modules', '.git', '__pycache__'}

    def _should_ignore(self, path):
        # Extension (or exact file name) check
        if path.endswith(self.ignored_extensions):
            return True
        # Directory check: any path component in the ignore set
        path_parts = path.split(os.sep)
        return any(part in self.ignored_directories for part in path_parts)

    def dispatch(self, event):
        if not self._should_ignore(event.src_path):
            super().dispatch(event)
```
Event Debouncing
A single save often produces several modified events. The handler below processes the first event for a path and ignores further events for that path for delay seconds. Strictly speaking, this is leading-edge throttling rather than debouncing:
```python
import time
from watchdog.events import FileSystemEventHandler

class DebouncedHandler(FileSystemEventHandler):
    def __init__(self, delay=0.5):
        super().__init__()
        self.delay = delay
        self.last_processed = {}  # path -> monotonic time of last handled event

    def on_modified(self, event):
        now = time.monotonic()  # immune to wall-clock adjustments
        last = self.last_processed.get(event.src_path)
        if last is None or now - last > self.delay:
            self.last_processed[event.src_path] = now
            self._process_event(event)

    def _process_event(self, event):
        print(f"Processing debounced event: {event.src_path}")
```
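A throttling handler reacts to the first event and drops the rest, so it can run before an editor has finished writing. A trailing-edge debounce instead waits until a path has been quiet for delay seconds. This minimal sketch uses threading.Timer, a choice made here rather than a watchdog feature. The callback runs on a timer thread, not on the observer thread.

```python
import threading
from watchdog.events import FileSystemEventHandler

class TrailingDebounceHandler(FileSystemEventHandler):
    """Calls process(path) once per burst, `delay` seconds after the last event."""

    def __init__(self, process, delay=0.5):
        super().__init__()
        self.process = process
        self.delay = delay
        self._timers = {}  # path -> most recent pending threading.Timer
        self._lock = threading.Lock()

    def on_modified(self, event):
        if event.is_directory:
            return
        path = event.src_path
        with self._lock:
            # Each new event cancels the pending call and restarts the countdown
            pending = self._timers.get(path)
            if pending is not None:
                pending.cancel()
            timer = threading.Timer(self.delay, self._fire, args=(path,))
            timer.daemon = True
            self._timers[path] = timer
            timer.start()

    def _fire(self, path):
        with self._lock:
            # Only the newest timer for this path may run the callback;
            # an older timer that slipped past cancel() simply exits
            if self._timers.get(path) is not threading.current_thread():
                return
            del self._timers[path]
        self.process(path)
```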
Error Handling and Exceptions
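A reliable monitoring system must handle errors gracefully. An exception that escapes a handler method is raised in the observer's dispatching thread and can terminate it, after which no further events are delivered. Catch errors inside the handler:
```python
import logging
import os
from watchdog.events import FileSystemEventHandler

class RobustHandler(FileSystemEventHandler):
    def __init__(self, logger=None):
        super().__init__()
        self.logger = logger or logging.getLogger(__name__)

    def on_modified(self, event):
        try:
            self._safe_process_event(event)
        except PermissionError:
            self.logger.warning(f"No access to file: {event.src_path}")
        except FileNotFoundError:
            self.logger.warning(f"File not found: {event.src_path}")
        except Exception:
            # logger.exception() also records the traceback
            self.logger.exception(f"Unexpected error processing {event.src_path}")

    def _safe_process_event(self, event):
        # The file can disappear between the event and this check
        if os.path.exists(event.src_path):
            print(f"Processing file: {event.src_path}")
```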
Alternatives and Comparison
| Tool | Language | OS Support | Performance | Async Support | Configuration Complexity |
|---|---|---|---|---|---|
| Watchdog | Python | Windows / Linux / macOS | High | Via threads | Low |
| inotify | C / Python | Linux only | Very high | Yes | Medium |
| fswatch | C++ | macOS / Linux | High | No | High |
| chokidar | Node.js | All platforms | Medium | Yes | Low |
| Polling | Any | All platforms | Low | Yes | Very low |
Best Practices
Code Organization
For complex monitoring scenarios, wrap the observer in a small class that also works as a context manager:
```python
from watchdog.observers import Observer

class FileSystemMonitor:
    def __init__(self, config):
        self.config = config
        self.observer = Observer()
        self.handlers = {}

    def add_handler(self, name, handler_class, path, **kwargs):
        # Instantiate the handler and watch `path` recursively
        handler = handler_class(**kwargs)
        self.handlers[name] = handler
        self.observer.schedule(handler, path, recursive=True)

    def start(self):
        self.observer.start()
        print("Monitoring started")

    def stop(self):
        self.observer.stop()
        self.observer.join()
        print("Monitoring stopped")

    # Context-manager support: `with FileSystemMonitor(cfg) as monitor: ...`
    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.stop()
```
Configuration via Files
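The monitor can be built from a JSON file. Looking handler classes up in an explicit mapping, rather than in globals(), limits the config to handlers you have approved:
```python
import json

def load_monitoring_config(config_file, handler_classes):
    """Build a FileSystemMonitor (defined above) from a JSON file.

    handler_classes maps the names allowed in the config to handler classes,
    e.g. {"BackupHandler": BackupHandler}; an unknown name raises KeyError.
    """
    with open(config_file, 'r', encoding='utf-8') as f:
        config = json.load(f)
    monitor = FileSystemMonitor(config)
    for watch_config in config['watches']:
        # Explicit allow-list instead of globals()
        handler_class = handler_classes[watch_config['handler_class']]
        monitor.add_handler(
            watch_config['name'],
            handler_class,
            watch_config['path'],
            **watch_config.get('options', {})
        )
    return monitor
```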
Frequently Asked Questions
How to avoid duplicate events when saving files in an IDE?
Many IDEs create temporary files during save operations, which can generate multiple events. Use debouncing or filter out temporary file names:
```python
import os
from watchdog.events import FileSystemEventHandler

class IDECompatibleHandler(FileSystemEventHandler):
    def on_modified(self, event):
        # Ignore IDE temporary files, judged by file name rather than the whole path
        name = os.path.basename(event.src_path)
        if name.endswith(('~', '.tmp', '.swp')):
            return
        print(f"File modified: {event.src_path}")
```
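Many editors save "atomically". They write a temporary file and then rename it over the original, so the change arrives as a move whose dest_path is the real file rather than as a modification. The minimal sketch below treats both cases the same way. The temporary-name rules in the hypothetical is_temp_name helper are assumptions; adjust them for your editor.

```python
import os
from watchdog.events import FileSystemEventHandler

TEMP_SUFFIXES = ('~', '.tmp', '.swp', '.swx')  # assumed editor temp-file suffixes

def is_temp_name(path):
    """Hypothetical helper: True for names that look like editor scratch files."""
    name = os.path.basename(path)
    return name.endswith(TEMP_SUFFIXES) or name.startswith('.#')

class SaveAwareHandler(FileSystemEventHandler):
    def __init__(self):
        super().__init__()
        self.saved = []

    def _file_saved(self, path):
        if not is_temp_name(path):
            self.saved.append(path)
            print(f"File saved: {path}")

    def on_modified(self, event):
        if not event.is_directory:
            self._file_saved(event.src_path)

    def on_moved(self, event):
        # Atomic save: the temp file is renamed onto the real file
        if not event.is_directory:
            self._file_saved(event.dest_path)
```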
Can I monitor files on network drives?
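Native notification APIs generally see only changes made through the local kernel, so edits made by other machines on a network share (NFS, SMB) are often missed. For network file systems, use PollingObserver, which periodically re-scans the directory and compares snapshots:
```python
from watchdog.observers.polling import PollingObserver

observer = PollingObserver(timeout=1)  # re-scan roughly every second
# handler and network_path are your own handler instance and mounted path
observer.schedule(handler, network_path, recursive=True)
observer.start()
```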
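You can try PollingObserver locally before pointing it at a mounted share. This minimal sketch watches a temporary directory, creates a file, and waits a few seconds at most for the created event. The names CreatedSignal and wait_for_created are hypothetical.

```python
import os
import tempfile
import threading
from watchdog.events import FileSystemEventHandler
from watchdog.observers.polling import PollingObserver

class CreatedSignal(FileSystemEventHandler):
    """Sets a threading.Event when a file is created."""

    def __init__(self):
        super().__init__()
        self.created = threading.Event()
        self.path = None

    def on_created(self, event):
        if not event.is_directory:
            self.path = event.src_path
            self.created.set()

def wait_for_created(timeout=5.0):
    handler = CreatedSignal()
    with tempfile.TemporaryDirectory() as folder:
        observer = PollingObserver(timeout=0.2)  # re-scan every 0.2 s
        observer.schedule(handler, folder, recursive=False)
        observer.start()
        try:
            with open(os.path.join(folder, "hello.txt"), "w") as f:
                f.write("hi")
            found = handler.created.wait(timeout)
        finally:
            observer.stop()
            observer.join()
    return found, handler.path
```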
How to handle large files without blocking?
Use asynchronous processing or separate threads for heavyweight operations:
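```python
import threading
from watchdog.events import FileSystemEventHandler

class NonBlockingHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Process in a separate thread so the observer keeps dispatching events
        thread = threading.Thread(target=self._process_large_file, args=(event.src_path,))
        thread.daemon = True  # daemon threads are stopped abruptly at interpreter exit
        thread.start()

    def _process_large_file(self, file_path):
        # Lengthy file processing logic
        pass
```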
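Starting one thread per event has no upper bound: a burst of thousands of created files means thousands of threads. A pool from concurrent.futures caps concurrency. In this minimal sketch, process_large_file is a hypothetical placeholder for the real work.

```python
from concurrent.futures import ThreadPoolExecutor
from watchdog.events import FileSystemEventHandler

def process_large_file(path):
    """Hypothetical placeholder for slow work; here it just counts bytes."""
    with open(path, "rb") as f:
        return sum(len(chunk) for chunk in iter(lambda: f.read(1 << 20), b""))

class PooledHandler(FileSystemEventHandler):
    def __init__(self, max_workers=4):
        super().__init__()
        # At most max_workers files are processed at once; the rest wait in a queue
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def on_created(self, event):
        if not event.is_directory:
            # submit() returns immediately, so the observer thread is not blocked
            future = self.executor.submit(process_large_file, event.src_path)
            future.add_done_callback(self._report)

    def _report(self, future):
        # Surface errors that would otherwise stay hidden inside the future
        exc = future.exception()
        if exc is not None:
            print(f"Processing failed: {exc}")
        else:
            print(f"Processed {future.result()} bytes")

    def shutdown(self):
        # Call after observer.stop()/join() to let queued work finish
        self.executor.shutdown(wait=True)
```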
Does the number of watched files affect performance?
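Yes. On Linux in particular, recursive watching registers one inotify watch per directory, and very large trees can exceed the per-user fs.inotify.max_user_watches limit. Recommendations:
- Filter by file extensions to avoid needless handler work
- Exclude temporary and generated directories
- Use recursive watching only where subdirectories actually matter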
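You can run a rough pre-flight check before watching a large tree on Linux. This is a minimal sketch. count_directories and inotify_watch_limit are hypothetical helpers, the /proc path exists only on Linux, and other processes share the same watch budget, so treat the comparison as an estimate.

```python
import os

def count_directories(root):
    """Number of directories (including root) a recursive watch on root covers."""
    return sum(1 for _ in os.walk(root))

def inotify_watch_limit():
    """Per-user inotify watch limit on Linux, or None where it is unavailable."""
    try:
        with open("/proc/sys/fs/inotify/max_user_watches") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def check_watch_budget(root):
    """Return (directories needed, limit or None, fits) for a recursive watch."""
    needed, limit = count_directories(root), inotify_watch_limit()
    return needed, limit, limit is None or needed <= limit
```

For example, check_watch_budget("/path/to/project") returning fits=False suggests narrowing the watched paths or raising the sysctl limit.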
How to monitor changes to file permissions?
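On Linux, metadata changes such as chmod are reported through on_modified(). On Windows and macOS the behavior may vary. The event does not say which attribute changed, so the handler can only read the current permissions:
```python
import os
import stat

def on_modified(self, event):
    if not event.is_directory:
        try:
            file_stat = os.stat(event.src_path)
        except FileNotFoundError:
            return  # the file was removed before it could be inspected
        permissions = stat.filemode(file_stat.st_mode)
        print(f"Possible permission change: {permissions}")
```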
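on_modified() does not say what changed, so detecting a permission change means remembering each file's previous mode and comparing it with the current one. This is a minimal sketch. The in-memory cache is an assumption: it starts empty, so the first event for a file only records its mode.

```python
import os
import stat
from watchdog.events import FileSystemEventHandler

class PermissionChangeHandler(FileSystemEventHandler):
    def __init__(self):
        super().__init__()
        self.modes = {}    # path -> last seen st_mode
        self.changes = []  # (path, old, new) in stat.filemode() notation

    def on_modified(self, event):
        if event.is_directory:
            return
        path = event.src_path
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            self.modes.pop(path, None)  # the file vanished; forget it
            return
        old = self.modes.get(path)
        self.modes[path] = mode
        # Compare only the permission bits, not the file-type bits
        if old is not None and stat.S_IMODE(old) != stat.S_IMODE(mode):
            change = (path, stat.filemode(old), stat.filemode(mode))
            self.changes.append(change)
            print("Permissions changed: %s %s -> %s" % change)
```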
Conclusion
The Watchdog library is a powerful and flexible tool for file system monitoring in Python applications. Its cross-platform support, low overhead, and simple API make it a good fit for a wide range of tasks, from development automation to production-grade monitoring systems.
Key benefits include minimal overhead thanks to native OS mechanisms, an intuitive API, and active community support. Watchdog integrates smoothly into existing projects and can be adapted to the specific requirements of any application.
When combined with best practices such as robust error handling, performance tuning, and clean code organization, Watchdog becomes a reliable foundation for efficient file monitoring that behaves consistently across platforms and workloads.