PyDub - Audio processing

Introduction

Audio processing is a crucial component of modern applications: from creating sound effects and editing podcasts to developing voice assistants and comprehensive multimedia systems. In the Python ecosystem, one of the most versatile and powerful tools for these tasks is the pydub library.

pydub offers developers a broad range of capabilities for working with audio files: trimming and concatenating tracks, converting between formats, applying sound effects, precise volume control, and even playback. At the same time, the library maintains simplicity and an intuitive API, making it accessible to both beginners and experienced developers.

In this article we will explore pydub in depth, review its methods and functions, analyze typical use‑cases, discuss common pitfalls and their solutions, and examine integration with external tools such as ffmpeg.

What Is the pydub Library?

Architecture and Core Principles

pydub is built around the AudioSegment concept – the primary class for representing audio data. This class encapsulates all essential metadata (sample rate, channel count, sample width) and provides methods for manipulation.

The library follows an “immutability” principle – most operations return new AudioSegment objects without altering the original. This leads to predictable code and easier debugging.

Dependencies and Compatibility

pydub relies on external libraries for decoding and encoding audio formats:

  • ffmpeg – the main decoder for most formats
  • simpleaudio or pyaudio – for audio playback
  • scipy – for additional mathematical operations (optional)

Installation and Configuration of pydub

Basic Installation

Install the library using the standard pip command:

pip install pydub

Setting Up ffmpeg

To work with a wide range of audio formats you need to install ffmpeg:

Windows:

  1. Download ffmpeg from the official website
  2. Extract the archive to a convenient folder
  3. Add the path to the executable to the PATH environment variable

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install ffmpeg

macOS:

brew install ffmpeg

Installing Additional Playback Dependencies

For audio playback install one of the following libraries:

pip install simpleaudio
# or
pip install pyaudio

Getting Started with pydub

Creating and Loading AudioSegment Objects

pydub provides several ways to create AudioSegment instances:

from pydub import AudioSegment

# Load with automatic format detection
sound = AudioSegment.from_file("audio.mp3")

# Load with explicit format
mp3_audio = AudioSegment.from_mp3("track.mp3")
wav_audio = AudioSegment.from_wav("recording.wav")
ogg_audio = AudioSegment.from_ogg("sound.ogg")

# Create silence
silence = AudioSegment.silent(duration=5000)  # 5 seconds of silence

# Generate a sine wave (tone generators live in pydub.generators;
# AudioSegment itself has no sine() method)
from pydub.generators import Sine
sine_wave = Sine(440).to_audio_segment(duration=1000)  # 440 Hz, 1 second

Supported Formats

Thanks to ffmpeg, pydub supports a wide variety of audio formats:

  • MP3 – the most popular compressed format
  • WAV – uncompressed high‑quality format
  • OGG – open‑source compression format
  • FLAC – lossless compression
  • M4A/AAC – Apple’s audio format
  • WMA – Microsoft’s audio format
  • AIFF – Apple’s uncompressed audio format

Retrieving Audio Metadata

# Core properties
print(f"Duration: {len(sound)} ms")
print(f"Duration (seconds): {sound.duration_seconds}")
print(f"Sample rate: {sound.frame_rate} Hz")
print(f"Channels: {sound.channels}")
print(f"Sample width: {sound.sample_width} bytes")
print(f"Average loudness: {sound.dBFS:.2f} dBFS")
print(f"Peak level: {sound.max_dBFS:.2f} dBFS")

Basic Audio Operations

Trimming and Slicing

# Trim by milliseconds
first_10_seconds = sound[:10000]
middle_part = sound[5000:15000]
last_5_seconds = sound[-5000:]

# Trim using time markers
from_minute_2 = sound[2*60*1000:]  # from 2 min to end

Concatenating Audio Segments

# Simple concatenation
combined = sound1 + sound2 + sound3

# Concatenation with crossfade
combined_crossfade = sound1.append(sound2, crossfade=1000)

# Insert silence between tracks
with_pause = sound1 + AudioSegment.silent(duration=2000) + sound2

Repeating and Looping

# Repeat a track
triple_track = sound * 3

# Build a loop of a specific length
loop_duration = 30000  # 30 seconds
loops_needed = loop_duration // len(sound) + 1
looped = (sound * loops_needed)[:loop_duration]

Volume and Dynamics Control

Adjusting Loudness

# Change volume in decibels
quieter = sound - 10    # reduce by 10 dB
louder = sound + 6      # increase by 6 dB

# Apply precise gain
amplified = sound.apply_gain(-3.5)  # reduce by 3.5 dB

# Normalize to the maximum level
normalized = sound.normalize()

Fade‑In and Fade‑Out Effects

# Smooth fade‑in and fade‑out
faded = sound.fade_in(2000).fade_out(3000)

# Create a crossfade between tracks; append handles this directly
# (a plain overlay would truncate sound2, since overlay never extends
# the base segment)
transition = sound1.append(sound2, crossfade=1500)

Overlay and Mixing

Basic Overlay

# Overlay from the start of the track
overlayed = background.overlay(voice)

# Overlay at a specific position
overlayed_positioned = background.overlay(sound_effect, position=5000)

# Overlay with repeated short sound
repeated_overlay = background.overlay(beep * 10, position=1000)

Mixing with Volume Control

# Mix with level adjustments
music_quiet = background_music - 15  # lower background music
voice_clear = voice_track + 3        # raise voice
mixed = music_quiet.overlay(voice_clear)

Audio Effects and Filtering

Frequency Filtering

# Low‑pass filter (remove high frequencies)
bass_only = sound.low_pass_filter(300)

# High‑pass filter (remove low frequencies)
treble_only = sound.high_pass_filter(2000)

# Band‑pass filter (combine)
mid_range = sound.high_pass_filter(300).low_pass_filter(3000)

Altering Audio Characteristics

# Change sample rate
resampled = sound.set_frame_rate(44100)

# Convert to mono
mono_sound = stereo_sound.set_channels(1)

# Change bit depth
sound_16bit = sound.set_sample_width(2)  # 16‑bit
sound_24bit = sound.set_sample_width(3)  # 24‑bit

Reverse and Other Effects

# Play backwards
reversed_sound = sound.reverse()

# Add echo effect
echo_delay = 500  # ms
echo_volume = -10  # dB
with_echo = sound.overlay(sound.apply_gain(echo_volume), position=echo_delay)

Exporting and Saving Audio

Basic Export

# Export to various formats
sound.export("output.wav", format="wav")
sound.export("output.mp3", format="mp3")
sound.export("output.ogg", format="ogg")

Export with Additional Parameters

# MP3 with specific bitrate
sound.export("high_quality.mp3", format="mp3", bitrate="320k")

# WAV with custom ffmpeg arguments
sound.export("custom.wav", format="wav",
           parameters=["-ar", "48000", "-ac", "2"])

# Export a slice of the file
sound[10000:20000].export("excerpt.mp3", format="mp3")

Export to a Byte Stream

import io

# Export to memory
buffer = io.BytesIO()
sound.export(buffer, format="mp3")
buffer.seek(0)  # reset for reading

Audio Playback

Simple Playback

from pydub.playback import play

# Play the whole track
play(sound)

# Play a segment
play(sound[5000:15000])

Playback Configuration

# pydub.playback.play picks an available backend automatically
# (simpleaudio is preferred when installed)
from pydub.playback import play

# Convert for compatibility
playback_sound = sound.set_frame_rate(44100).set_channels(2).set_sample_width(2)
play(playback_sound)

Comprehensive Table of pydub Methods and Functions

| Category | Method / Function | Description | Example |
|---|---|---|---|
| Loading & Creation | AudioSegment.from_file(file, format=None) | Loads an audio file with automatic or explicit format detection | sound = AudioSegment.from_file("audio.mp3") |
| | AudioSegment.from_mp3(file) | Loads an MP3 file | mp3_sound = AudioSegment.from_mp3("track.mp3") |
| | AudioSegment.from_wav(file) | Loads a WAV file | wav_sound = AudioSegment.from_wav("audio.wav") |
| | AudioSegment.from_ogg(file) | Loads an OGG file | ogg_sound = AudioSegment.from_ogg("audio.ogg") |
| | AudioSegment.silent(duration) | Creates silence of the specified duration (ms) | silence = AudioSegment.silent(duration=5000) |
| | Sine(freq).to_audio_segment(duration) | Generates a sine wave (from pydub.generators) | tone = Sine(440).to_audio_segment(duration=1000) |
| Core Operations | sound1 + sound2 | Sequential concatenation of audio segments | combined = intro + main_track + outro |
| | sound * n | Repeats the sound n times | loop = sound * 3 |
| | sound[start:end] | Slices audio by time markers (ms) | excerpt = sound[1000:5000] |
| | sound.append(segment, crossfade=0) | Adds a segment with optional crossfade | result = sound1.append(sound2, crossfade=500) |
| | sound.overlay(segment, position=0) | Overlays a sound at the given position (ms) | mixed = background.overlay(voice, position=1000) |
| | sound.reverse() | Reverses the audio (backwards playback) | backwards = sound.reverse() |
| Properties & Metadata | len(sound) | Duration in milliseconds | duration_ms = len(sound) |
| | sound.duration_seconds | Duration in seconds | duration_s = sound.duration_seconds |
| | sound.frame_rate | Sample rate in Hz | sample_rate = sound.frame_rate |
| | sound.channels | Number of channels | channel_count = sound.channels |
| | sound.sample_width | Sample width in bytes | bit_depth = sound.sample_width |
| | sound.dBFS | Average loudness in dBFS | volume_level = sound.dBFS |
| | sound.max_dBFS | Peak loudness in dBFS | peak_level = sound.max_dBFS |
| Volume Control | sound + dB | Increases loudness by dB decibels | louder = sound + 6 |
| | sound - dB | Decreases loudness by dB decibels | quieter = sound - 10 |
| | sound.apply_gain(dB) | Applies gain or attenuation | adjusted = sound.apply_gain(-3.5) |
| | sound.normalize(headroom=0.1) | Normalizes to maximum level with optional headroom | normalized = sound.normalize() |
| Effects | sound.fade_in(duration) | Smooth fade-in | faded_in = sound.fade_in(2000) |
| | sound.fade_out(duration) | Smooth fade-out | faded_out = sound.fade_out(1500) |
| | sound.low_pass_filter(cutoff) | Low-pass filter | bass = sound.low_pass_filter(300) |
| | sound.high_pass_filter(cutoff) | High-pass filter | treble = sound.high_pass_filter(2000) |
| Conversion | sound.set_frame_rate(rate) | Changes the sample rate | resampled = sound.set_frame_rate(44100) |
| | sound.set_channels(count) | Sets the number of channels | mono = stereo.set_channels(1) |
| | sound.set_sample_width(bytes) | Changes the sample bit depth | sound_16bit = sound.set_sample_width(2) |
| Export | sound.export(file, format, **kwargs) | Exports audio to a file | sound.export("output.mp3", format="mp3") |
| Analysis | sound.get_array_of_samples() | Returns the samples as an array.array (convertible to NumPy) | samples = sound.get_array_of_samples() |
| | sound.split_to_mono() | Splits stereo into mono channels | left, right = stereo.split_to_mono() |
| Playback | play(sound) | Plays an AudioSegment | play(sound) |

Practical Examples and Use Cases

Automated Podcast Processing

def process_podcast(intro_file, main_files, outro_file, output_file):
    """Automatically build a podcast with normalized volume"""
    
    # Load components
    intro = AudioSegment.from_file(intro_file).normalize()
    outro = AudioSegment.from_file(outro_file).normalize()
    
    # Concatenate main parts
    main_content = AudioSegment.empty()
    for file in main_files:
        segment = AudioSegment.from_file(file).normalize()
        # Add a pause between segments
        main_content += segment + AudioSegment.silent(duration=1000)
    
    # Final assembly
    podcast = intro + main_content + outro
    
    # Export with optimal settings
    podcast.export(output_file, format="mp3", bitrate="128k")
    
    return len(podcast) // 1000  # duration in seconds

Creating an Audiobook from Separate Chapters

def create_audiobook(chapter_files, output_file, silence_between=2000):
    """Combine chapters into an audiobook with normalization"""
    
    audiobook = AudioSegment.empty()
    
    for i, chapter_file in enumerate(chapter_files):
        print(f"Processing chapter {i+1}")
        
        chapter = AudioSegment.from_file(chapter_file)
        
        # Normalize loudness
        chapter = chapter.normalize()
        
        # Add smooth fades
        if i == 0:
            chapter = chapter.fade_in(1000)
        if i == len(chapter_files) - 1:
            chapter = chapter.fade_out(2000)
        
        audiobook += chapter
        
        # Insert silence between chapters (except after the last)
        if i < len(chapter_files) - 1:
            audiobook += AudioSegment.silent(duration=silence_between)
    
    # Export in high quality
    audiobook.export(output_file, format="mp3", bitrate="192k")
    
    return {
        'duration_minutes': audiobook.duration_seconds / 60,
        'raw_size_mb': len(audiobook.raw_data) / (1024 * 1024),  # uncompressed PCM size, not the MP3 file size
        'chapters_count': len(chapter_files)
    }

Silence Detection and Removal

def remove_silence(audio_file, silence_threshold=-50, min_silence_len=1000):
    """Strip long pauses from an audio file"""
    from pydub.silence import split_on_silence
    
    audio = AudioSegment.from_file(audio_file)
    
    # Split on silence
    chunks = split_on_silence(
        audio,
        min_silence_len=min_silence_len,
        silence_thresh=silence_threshold,
        keep_silence=500  # keep 500 ms of padding
    )
    
    # Re‑assemble without long gaps
    result = AudioSegment.empty()
    for chunk in chunks:
        result += chunk + AudioSegment.silent(duration=200)
    
    return result

Generating Custom Sound Effects

def create_alarm_sound(base_freq=800, duration=5000):
    """Generate an alarm tone with a rising frequency"""
    from pydub.generators import Sine
    
    alarm = AudioSegment.empty()
    beep_duration = 200
    pause_duration = 100
    
    # Build a series of beeps with increasing pitch
    for i in range(duration // (beep_duration + pause_duration)):
        freq = base_freq + (i * 50)  # raise frequency each step
        
        beep = Sine(freq).to_audio_segment(duration=beep_duration)
        beep = beep.fade_in(20).fade_out(20)  # smooth edges
        
        alarm += beep + AudioSegment.silent(duration=pause_duration)
    
    return alarm[:duration]  # trim to exact length

Integration with Other Libraries

Using NumPy for Signal Analysis

import numpy as np
import matplotlib.pyplot as plt

def analyze_audio_spectrum(audio_segment):
    """Compute and plot the frequency spectrum of an audio segment"""
    
    # Get raw samples
    samples = audio_segment.get_array_of_samples()
    audio_data = np.array(samples)
    
    # If stereo, take the left channel
    if audio_segment.channels == 2:
        audio_data = audio_data.reshape((-1, 2))
        audio_data = audio_data[:, 0]
    
    # FFT for spectrum analysis
    fft = np.fft.fft(audio_data)
    freqs = np.fft.fftfreq(len(fft), 1/audio_segment.frame_rate)
    
    # Plot
    plt.figure(figsize=(12, 6))
    plt.plot(freqs[:len(freqs)//2], np.abs(fft[:len(fft)//2]))
    plt.xlabel('Frequency (Hz)')
    plt.ylabel('Amplitude')
    plt.title('Audio Frequency Spectrum')
    plt.grid(True)
    plt.show()
    
    return freqs, fft

Integrating SciPy for Advanced Processing

from scipy import signal
import numpy as np

def apply_custom_filter(audio_segment, filter_type='bandpass', lowcut=300, highcut=3400):
    """Apply a custom SciPy filter to an AudioSegment"""
    
    # Convert to NumPy array
    samples = np.array(audio_segment.get_array_of_samples())
    
    if audio_segment.channels == 2:
        samples = samples.reshape((-1, 2))
    
    # Filter design
    nyquist = audio_segment.frame_rate / 2
    low = lowcut / nyquist
    high = highcut / nyquist
    
    if filter_type == 'bandpass':
        b, a = signal.butter(5, [low, high], btype='band')
    elif filter_type == 'lowpass':
        b, a = signal.butter(5, high, btype='low')
    elif filter_type == 'highpass':
        b, a = signal.butter(5, low, btype='high')
    
    # Apply filter
    if audio_segment.channels == 1:
        filtered = signal.filtfilt(b, a, samples)
    else:
        filtered = np.column_stack([
            signal.filtfilt(b, a, samples[:, 0]),
            signal.filtfilt(b, a, samples[:, 1])
        ])
    
    # Back to AudioSegment
    filtered = filtered.astype(samples.dtype)
    filtered_audio = audio_segment._spawn(filtered.tobytes())
    
    return filtered_audio

Common Errors and Solutions

Error: “Couldn't find ffmpeg or avconv”

Cause: ffmpeg is not installed or not on the system PATH.
Fix:

# Verify ffmpeg installation
import subprocess
try:
    subprocess.run(["ffmpeg", "-version"], capture_output=True, check=True)
    print("ffmpeg is installed correctly")
except FileNotFoundError:
    print("ffmpeg not found. Install ffmpeg and add it to PATH")

Error: “CouldntDecodeError” when loading a file

Cause: Corrupted file or unsupported format.
Fix:

def safe_load_audio(file_path):
    """Robust audio loading with error handling"""
    try:
        return AudioSegment.from_file(file_path)
    except Exception as e:
        print(f"Failed to load {file_path}: {e}")
        # Retry with explicit format hint
        try:
            fmt = file_path.split('.')[-1].lower()
            return AudioSegment.from_file(file_path, format=fmt)
        except Exception as e2:
            print(f"Second attempt failed: {e2}")
            return None

Memory Issues with Large Files

Solution:

def process_large_file(file_path, chunk_size=60000):
    """Process a large audio file in chunks (default 60 s)"""
    audio = AudioSegment.from_file(file_path)
    
    processed_chunks = []
    
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start:start + chunk_size]
        # Example processing: normalize each chunk
        processed_chunk = chunk.normalize()
        processed_chunks.append(processed_chunk)
    
    # Concatenate processed chunks
    result = sum(processed_chunks)
    return result

Playback Error: “No playback software found”

Solution:

# Install a playback backend
# pip install simpleaudio

# Fallback: play via a temporary file
import tempfile
import os
import subprocess

def play_with_system_player(audio_segment):
    """Play audio using the OS default player"""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        audio_segment.export(tmp.name, format="wav")
        
        if os.name == 'nt':  # Windows
            os.startfile(tmp.name)  # handles spaces in paths, unlike "start"
        elif os.name == 'posix':  # Linux/macOS
            player = "xdg-open" if "linux" in os.uname().sysname.lower() else "open"
            subprocess.run([player, tmp.name])

Performance Optimization

Efficient Memory Usage

# Inefficient: load entire file into memory
large_audio = AudioSegment.from_file("huge_file.wav")
processed = large_audio.normalize()

# Efficient: process in chunks
def process_efficiently(file_path, output_path):
    audio = AudioSegment.from_file(file_path)
    
    chunk_size = 30000  # 30 seconds
    
    with open(output_path, 'wb') as out_file:
        for i in range(0, len(audio), chunk_size):
            chunk = audio[i:i + chunk_size]
            processed_chunk = chunk.normalize()
            
            # Export chunk to a temporary buffer
            buffer = io.BytesIO()
            processed_chunk.export(buffer, format="wav")
            
            if i == 0:
                # First chunk includes WAV header
                out_file.write(buffer.getvalue())
            else:
                # Subsequent chunks: skip the 44-byte WAV header
                # (note: the first header's size fields will not match the
                # final file; rewrite them afterwards for strict readers)
                buffer.seek(44)
                out_file.write(buffer.read())

Caching Results

from functools import lru_cache
import hashlib

@lru_cache(maxsize=10)
def cached_process_audio(file_path, operation):
    """Cache processed audio to avoid redundant work"""
    audio = AudioSegment.from_file(file_path)
    
    if operation == 'normalize':
        return audio.normalize()
    elif operation == 'fade':
        return audio.fade_in(1000).fade_out(1000)
    
    return audio

Advanced Techniques and Tricks

Dynamic Equalizer Creation

def apply_eq_curve(audio, eq_points):
    """
    Apply a custom EQ curve.
    eq_points: list of (frequency, gain_dB) tuples.
    """
    result = audio
    
    for freq, gain in eq_points:
        if gain != 0:
            # Narrow band filter around the target frequency
            filtered = result.high_pass_filter(freq * 0.7).low_pass_filter(freq * 1.4)
            
            if gain > 0:
                boosted = filtered.apply_gain(gain)
                result = result.overlay(boosted)
            else:
                # Attenuation: overlay a phase-inverted copy of the band
                # so it partially cancels in the mix (a rough approximation;
                # rebuilding an AudioSegment from raw_data does not flip phase)
                attenuated = filtered.apply_gain(gain)  # gain is negative here
                inverted = attenuated.invert_phase()
                result = result.overlay(inverted)
    
    return result

Automatic Volume Normalization for a Collection

def auto_normalize_collection(audio_files, target_dBFS=-20):
    """Normalize a batch of audio files to a target loudness"""
    
    normalized_files = []
    
    for file_path in audio_files:
        audio = AudioSegment.from_file(file_path)
        
        # Compute required gain
        current_dBFS = audio.dBFS
        gain_needed = target_dBFS - current_dBFS
        
        # Apply gain
        normalized = audio.apply_gain(gain_needed)
        
        # Save with "_normalized" inserted before the extension
        # (str.replace would mangle paths containing extra dots)
        output_path = file_path.rsplit('.', 1)[0] + '_normalized.mp3'
        normalized.export(output_path, format="mp3", bitrate="192k")
        
        normalized_files.append({
            'original': file_path,
            'normalized': output_path,
            'gain_applied': gain_needed
        })
    
    return normalized_files

Conclusion

The pydub library is a powerful and flexible tool for audio processing in Python, combining ease of use with a rich feature set. Its AudioSegment-centric architecture provides an intuitive API for both basic operations (trimming, concatenation, volume adjustment) and more complex sound‑processing tasks.

Key advantages include broad format support via ffmpeg, a clean syntax for chaining operations, seamless integration with scientific Python libraries (NumPy, SciPy) for advanced signal analysis, cross‑platform compatibility, and an active developer community.

Whether you are building podcasts, audiobooks, music apps, or speech‑recognition pipelines, pydub offers a solid foundation. Keep in mind its limitations—full‑file loading into memory and lack of real‑time processing—and complement it with specialized libraries when needed.

For maximum efficiency, pair pydub with tools like librosa for music analysis, scipy for advanced mathematics, and numpy for array manipulation.

Thanks to ongoing development and comprehensive documentation, pydub remains one of the top choices for audio processing tasks in Python, providing developers with a reliable base for creating audio applications of any complexity.
