How to use multithreading in Python?

Introduction to Multithreading and Multiprocessing in Python

When developing modern applications, you often need to improve performance. One way to do this is to run tasks concurrently or in parallel. Python offers two main tools for this: multithreading and multiprocessing. Despite the Global Interpreter Lock (GIL), multithreading can significantly speed up I/O-bound programs, such as those that make network requests or perform file operations.

In this guide, we will analyze how multithreading works in Python. We'll examine the threading library, compare it with multiprocessing, and provide practical examples to help you understand how to optimize your code for faster execution.

What is Multithreading in Python?

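Multithreading in Python allows you to run multiple threads within a single process. The threads share the process's memory and run concurrently: while one thread waits, another can execute. Because of the GIL (covered below), only one thread runs Python bytecode at any given moment, so this approach is most effective for tasks that spend their time waiting for external resources.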

Advantages of Multithreading for I/O-Bound Tasks

Multithreading accelerates programs when tasks include:

  • Network requests, such as downloading data from the internet.
  • File operations, including reading and writing.
  • Database interactions.
  • Other I/O operations where the processor is idle while waiting.

However, multithreading has limitations. CPython, the standard Python interpreter, has a Global Interpreter Lock (GIL) that prevents multiple threads from executing Python bytecode at the same time. As a result, multithreading does not speed up CPU-intensive tasks: the threads take turns on a single core instead of running on several.
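A rough way to see the effect of waiting is to time the same simulated I/O task run sequentially and in threads. In this sketch, fake_download is a hypothetical stand-in that only sleeps, and the task count and delay are arbitrary assumptions:

```python
import threading
import time

def fake_download(delay):
    # Hypothetical stand-in for a network call: sleeping releases the GIL,
    # much like waiting on a socket would.
    time.sleep(delay)

def run_sequential(n, delay):
    # Run the tasks one after another and return the elapsed time.
    start = time.perf_counter()
    for _ in range(n):
        fake_download(delay)
    return time.perf_counter() - start

def run_threaded(n, delay):
    # Run each task in its own thread and return the elapsed time.
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_download, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

sequential_time = run_sequential(5, 0.1)
threaded_time = run_threaded(5, 0.1)
print(f"Sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

The sequential run takes roughly the sum of the delays, while the threaded run takes roughly the longest single delay, because the waits overlap.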

When to Use Multithreading or Multiprocessing

The choice between multithreading and multiprocessing depends on the type of tasks. Multithreading is suitable for scenarios involving waiting, while multiprocessing is better for computations.

Recommendations by Scenario

  • Tasks with significant I/O: Use threading (threads) to avoid blocking the main thread.
  • Computationally intensive tasks: Employ multiprocessing (processes) to bypass the GIL and utilize multiple CPU cores.
  • Working with network APIs: Choose threading for efficient request handling.
  • Parallel data processing: Use multiprocessing to distribute the load across CPU cores.

This approach helps avoid inefficient resource use and speeds up program execution.
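For I/O-heavy work, the standard library's concurrent.futures module wraps threads in a pool. The sketch below is one possible illustration rather than part of this guide's own examples; fetch and its 0.1-second sleep are hypothetical placeholders for a real request:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(resource_id):
    # Hypothetical I/O-bound task: the sleep stands in for network latency.
    time.sleep(0.1)
    return f"resource-{resource_id}"

# The pool starts up to 4 threads; map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fetch, range(4)))

print(results)
```

ProcessPoolExecutor from the same module exposes the same map interface, so a CPU-bound function could be moved to processes by changing the executor class.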

The threading Library: Basics and Examples

The threading library in Python provides tools for creating and managing threads. It is easy to use and suitable for beginners.

Simple Example of Creating a Thread

Here's a basic example of multithreading in Python:

import threading
import time

def print_numbers():
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(1)

# Create a thread
thread = threading.Thread(target=print_numbers)
thread.start()

# The main thread continues
print("Main thread finished!")

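In this code, the main thread prints its message almost immediately, while the print_numbers function keeps running in a separate thread. Because the thread is not a daemon thread, the program itself exits only after print_numbers has finished. This demonstrates how multithreading lets the main thread continue working without waiting.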
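To check which thread actually runs a function, you can record threading.current_thread().name. This is a small sketch; the thread name "worker-1" and the report function are illustrative choices:

```python
import threading

seen = []

def report():
    # Record the name of the thread that is executing this call.
    seen.append(threading.current_thread().name)

worker = threading.Thread(target=report, name="worker-1")
worker.start()
worker.join()

report()  # called directly, so it runs in the calling thread

print(seen)
```

The first entry is the name given to the new thread; the second is the name of whichever thread called report directly, normally the main thread.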

Passing Arguments to a Thread

To pass data to a thread function, use the args parameter:

import threading

def greet(name):
    print(f"Hello, {name}!")

thread = threading.Thread(target=greet, args=("Ivan",))
thread.start()

Here, the argument "Ivan" is passed to the greet function. This is useful for parameterizing tasks in threads.
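Keyword arguments can be passed in a similar way through the kwargs parameter of threading.Thread. In this sketch, the greeting parameter and the messages list are additions made for illustration:

```python
import threading

messages = []

def greet(name, greeting="Hello"):
    message = f"{greeting}, {name}!"
    messages.append(message)  # kept so the result can be inspected later
    print(message)

# args supplies positional arguments, kwargs supplies keyword arguments.
thread = threading.Thread(target=greet, args=("Ivan",), kwargs={"greeting": "Welcome"})
thread.start()
thread.join()
```

Note the trailing comma in args=("Ivan",): args must be a tuple, even for a single value.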

Waiting for Threads to Complete Using join

The join method allows you to wait for a thread to finish:

thread.join()
print("Thread finished!")

This ensures that the main thread does not continue until the additional thread is complete.
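join also accepts an optional timeout in seconds. Since join always returns None, you check is_alive() afterwards to learn whether the thread has finished. A minimal sketch, with slow_task and the durations chosen arbitrarily:

```python
import threading
import time

def slow_task():
    # Hypothetical long-running job.
    time.sleep(0.5)

thread = threading.Thread(target=slow_task)
thread.start()

thread.join(timeout=0.1)            # stop waiting after 0.1 seconds
still_running = thread.is_alive()   # the thread needs 0.5 seconds, so it is still alive
print(f"Still running after timeout: {still_running}")

thread.join()                       # now wait until it really finishes
finished = not thread.is_alive()
print(f"Finished: {finished}")
```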

Working with Multiple Threads

To run multiple threads, create a list and manage them:

import threading
import time

def worker(number):
    print(f"Thread {number} is running")
    time.sleep(2)
    print(f"Thread {number} finished running")

threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

# Wait for all threads to complete
for t in threads:
    t.join()

print("All threads completed.")

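This example shows how to start and synchronize multiple threads. Without join, the main thread would print "All threads completed." while the workers are still running. (The program itself would still wait for them before exiting, because they are non-daemon threads.)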
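A thread's target function cannot return a value to the code that started it, so results are usually handed back through a thread-safe container. The sketch below uses queue.Queue from the standard library; the square function and the result layout are assumptions made for illustration:

```python
import queue
import threading

def square(number, out):
    # Put an (input, result) pair on the shared queue instead of returning it.
    out.put((number, number * number))

results_queue = queue.Queue()
threads = [threading.Thread(target=square, args=(i, results_queue)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All threads have finished, so exactly five items are waiting in the queue.
results = dict(results_queue.get() for _ in range(5))
print(results)
```

Storing the input alongside each result matters because the order in which threads finish is not guaranteed.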

Synchronizing Threads with Locks

When threads share resources, race conditions can occur: two threads read and update the same data at once, and one of the updates is lost. To prevent this, use a lock (threading.Lock).

Example of Using a Lock

import threading
import time

lock = threading.Lock()
counter = 0

def increment():
    global counter
    with lock:
        local_counter = counter
        local_counter += 1
        time.sleep(0.1)
        counter = local_counter

threads = [threading.Thread(target=increment) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Final counter value: {counter}")

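Without the lock, the final value would usually be far below 100: each thread reads counter, sleeps, and then writes back a stale value, overwriting the updates made by other threads. The lock makes the read-modify-write sequence execute in one thread at a time, so the result is always 100. Note that the sleep happens while the lock is held, so the 100 threads run one after another and the example takes about 10 seconds. Locking is critical wherever multiple threads modify shared data, such as counters or lists.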
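To compare the two behaviors side by side, the following sketch runs the same read-modify-write step with and without a lock. The run_counter helper, the thread count, and the 0.01-second pause are illustrative assumptions; the pause only makes the race easy to reproduce:

```python
import threading
import time

def run_counter(use_lock, n_threads=10):
    """Increment a shared counter from n_threads threads and return the final value."""
    state = {"counter": 0}
    lock = threading.Lock()

    def step():
        # Read-modify-write with a pause in the middle to widen the race window.
        value = state["counter"]
        time.sleep(0.01)
        state["counter"] = value + 1

    def increment():
        if use_lock:
            with lock:
                step()
        else:
            step()

    threads = [threading.Thread(target=increment) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]

safe_total = run_counter(use_lock=True)
unsafe_total = run_counter(use_lock=False)
print(f"With lock: {safe_total}, without lock: {unsafe_total}")
```

With the lock the total always equals the number of threads. Without it the total is typically much smaller, although the exact value varies from run to run.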

Multiprocessing in Python Using multiprocessing

Multiprocessing in Python is suitable for CPU-intensive tasks. The multiprocessing module starts separate processes, each with its own interpreter and its own GIL, so the work can run on several CPU cores at once. Each process also has its own memory, which isolates processes from one another but increases resource consumption and makes data exchange more expensive.

Example of Using multiprocessing

import multiprocessing
import time

def heavy_computation(x):
    print(f"Processing {x}")
    time.sleep(2)
    return x * x

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(heavy_computation, [1, 2, 3, 4])
    print(f"Results: {results}")

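Here, the process pool runs the four calls in parallel, so the script takes roughly 2 seconds rather than 8. Note that time.sleep only simulates a long computation; the benefit over threads appears with real CPU-bound work, such as processing large datasets or mathematical calculations, where threads would not help due to the GIL.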

Comparison of threading and multiprocessing

Here are the key differences between the libraries:

  • CPU Usage: threading - limited to roughly one core for Python code because of the GIL; multiprocessing - can use all cores.
  • I/O Operations: threading - effective; multiprocessing - works, but is usually excessive because processes are heavier to start and to communicate with.
  • Memory Usage: threading - low (shared memory); multiprocessing - high (each process has its own memory).
  • Data Sharing: threading - shared variables protected by locks; multiprocessing - queues or pipes.
  • GIL Limitation: threading - all threads share one GIL; multiprocessing - each process has its own GIL, so processes do not block one another.

This comparison helps choose the right tool based on project requirements.

Frequently Asked Questions (FAQ)

What is GIL and How Does it Affect Multithreading in Python?

The GIL (Global Interpreter Lock) is a mechanism in CPython that allows only one thread to execute Python bytecode at a time. It protects the interpreter's internal state, such as reference counts, but makes multithreading ineffective for computational tasks. However, the GIL does not hinder I/O operations, because a thread releases the lock while it waits.

Can the GIL be Completely Eliminated?

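In the default build of CPython, no. Starting with Python 3.13, CPython also offers an optional free-threaded build (PEP 703) that runs without the GIL; it was introduced as experimental. Other alternatives include interpreters such as Jython or IronPython, which have no GIL. You can also use multiprocessing to run separate processes, each with its own GIL, so they do not block one another.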
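If you want to check at runtime whether the interpreter is currently using a GIL, CPython 3.13 added sys._is_gil_enabled(). The sketch below assumes CPython and treats older versions, where the function does not exist, as always having the GIL; the gil_status helper name is hypothetical:

```python
import sys

def gil_status():
    # sys._is_gil_enabled() exists only in CPython 3.13 and later.
    check = getattr(sys, "_is_gil_enabled", None)
    if check is None:
        # Assumption: an older CPython, where the GIL is always present.
        return True
    return check()

gil_active = gil_status()
print(f"GIL enabled: {gil_active}")
```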

How to Pass Data Between Processes in multiprocessing?

Use a queue for data exchange:

from multiprocessing import Process, Queue

def worker(q):
    q.put("Hello from process!")

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()

A Queue pickles each object and sends it to the other process, so processes exchange data safely without sharing memory.

When Should Multiprocessing be Preferred?

Choose multiprocessing for tasks with heavy computations, such as image processing or machine learning, where you need to utilize multiple CPU cores.

Can Multithreading and Multiprocessing be Combined in One Project?

Yes, this is possible in a hybrid approach. For example, use threads for network requests inside processes for computations. This combines the advantages of both methods.

Which Module is Easier for Asynchronous Tasks?

threading is easier for beginners and suitable for I/O. multiprocessing is better for performance in computations. For asynchronous programming, consider asyncio, which uses coroutines for non-blocking code.
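As a rough illustration of the asyncio style, the sketch below runs three simulated requests concurrently in a single thread. The fetch coroutine and its delays are hypothetical placeholders for real non-blocking I/O:

```python
import asyncio
import time

async def fetch(name, delay):
    # asyncio.sleep yields control to the event loop instead of blocking the thread.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # gather() runs the coroutines concurrently and returns results in call order.
    return await asyncio.gather(fetch("a", 0.1), fetch("b", 0.1), fetch("c", 0.1))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)
print(f"Elapsed: {elapsed:.2f}s")
```

The three waits overlap, so the total time is close to one delay rather than the sum of all three.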

Conclusion

Multithreading and multiprocessing in Python are powerful tools for optimizing performance. threading is ideal for I/O-bound tasks, such as network requests or file operations. multiprocessing is suitable for resource-intensive computations where it is important to utilize all processor cores.

Choose your approach based on the type of tasks. Do not forget about synchronization, such as locks in threads or queues in processes, to ensure data safety. Experiment with examples to adapt these techniques to your projects.
