Matplotlib - Data Visualization

The Ultimate Guide to Matplotlib: From Basics to Advanced Techniques

Matplotlib is a Python library for creating static, animated, and interactive data visualizations. Created by John Hunter in 2003, it underpins other tools such as Seaborn and the pandas plotting interface, and it remains a standard choice for publication-quality graphics.

What is Matplotlib and Why Use It?

Matplotlib is a general-purpose plotting library for two-dimensional and three-dimensional graphs. Its pyplot module provides a programming interface modeled on MATLAB, which makes it familiar to users of that environment.

Key Advantages of Matplotlib:

  • Versatility: Supports over 20 different types of graphs.
  • Ease of Learning: Intuitive API for a quick start.
  • Customization Flexibility: Complete control over every element of visualization.
  • Integration with NumPy and Pandas: Seamless compatibility with core data libraries.
  • Multiple Export Formats: PNG, SVG, PDF, EPS, PGF, and more.
  • Cross-Platform Compatibility: Works on Windows, macOS, and Linux.
  • Active Community: Regular updates and extensive documentation.
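The export formats available in a particular installation can be checked at runtime. The following is a minimal sketch, assuming the non-interactive Agg backend (selected here only so the snippet runs without a display) and a throwaway figure:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to keep your default backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Dictionary mapping file extension -> human-readable format description
formats = fig.canvas.get_supported_filetypes()
for ext, description in sorted(formats.items()):
    print(f"{ext:5s} {description}")

plt.close(fig)
```

The exact list depends on the Matplotlib version and the active backend.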

Installation and Initial Setup

Install Matplotlib with pip:

pip install matplotlib

To work in Jupyter Notebook, install Jupyter alongside it:

pip install matplotlib jupyter

The conventional imports are:

import matplotlib.pyplot as plt
import numpy as np  # often used with matplotlib

Using the alias plt is a common practice in the Python community.

Architecture and Main Components

Matplotlib is built on a three-layer architecture:

  • Backend layer: Responsible for rendering graphics.
  • Artist layer: Manages graphical objects.
  • Scripting layer: pyplot - a high-level interface.
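The three layers can be seen side by side in a few lines. This is a rough sketch rather than a complete picture of the architecture; the choice of the Agg backend and the hand-built Line2D are illustrative assumptions:

```python
import matplotlib
matplotlib.use("Agg")                 # backend layer: choose a non-interactive renderer
import matplotlib.pyplot as plt       # scripting layer: the pyplot interface
from matplotlib.lines import Line2D   # artist layer: a concrete Artist class

backend_name = matplotlib.get_backend()

# The scripting layer creates the Figure and Axes objects for us
fig, ax = plt.subplots()

# The artist layer lets us build a graphical object by hand and attach it
line = Line2D([0, 1, 2], [0, 1, 0], color="green")
ax.add_line(line)
ax.set_xlim(0, 2)
ax.set_ylim(0, 1.5)

print("Active backend:", backend_name)
plt.close(fig)
```

In everyday use pyplot hides the other two layers, but they remain accessible when finer control is needed.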

Main Elements of a Graph:

  • Figure: Top-level container for all elements.
  • Axes: Area for plotting the graph.
  • Axis: Coordinate lines (X, Y, Z).
  • Artist: All visual elements.
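The containment relationships between these elements can be inspected directly. A small sketch, assuming a single Axes created with plt.subplots and the Agg backend for headless execution:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3], [2, 4, 1])

# The Figure holds the Axes, the Axes holds the Axis objects and the plotted line
print("Axes inside the figure:", len(fig.axes))
print("X axis object:", type(ax.xaxis).__name__)
print("Line belongs to this Axes:", line.axes is ax)

# Figure, Axes, Axis and the line are all Artist instances
all_artists = all(isinstance(obj, Artist) for obj in (fig, ax, ax.xaxis, line))
print("Everything is an Artist:", all_artists)

plt.close(fig)
```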

Creating Your First Graph

import matplotlib.pyplot as plt

# Simple data
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]

# Creating a graph
plt.plot(x, y)
plt.title("My First Graph")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.grid(True)
plt.show()

Types of Graphs and Their Applications

Line Plot

Ideal for displaying trends and changes over time:

# Creating data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

plt.plot(x, y1, label='sin(x)', color='blue', linewidth=2)
plt.plot(x, y2, label='cos(x)', linestyle='--', color='red')
plt.legend()
plt.title('Trigonometric Functions')
plt.show()

Scatter Plot

Shows relationships between variables:

# Generating random data
np.random.seed(42)
x = np.random.randn(100)
y = 2 * x + np.random.randn(100)

plt.scatter(x, y, alpha=0.6, c='purple')
plt.xlabel('X Variable')
plt.ylabel('Y Variable')
plt.title('Correlation between X and Y')
plt.show()

Histogram

Displays the distribution of data:

# Normal distribution
data = np.random.normal(100, 15, 1000)

plt.hist(data, bins=30, alpha=0.7, color='skyblue', edgecolor='black')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Data Distribution')
plt.show()

Bar Chart

Compares values across categories:

categories = ['A', 'B', 'C', 'D', 'E']
values = [23, 45, 56, 78, 32]

plt.bar(categories, values, color=['red', 'blue', 'green', 'orange', 'purple'])
plt.title('Sales by Category')
plt.ylabel('Quantity')
plt.show()

Pie Chart

Shows proportions of a whole:

labels = ['Python', 'Java', 'JavaScript', 'C++', 'Others']
sizes = [35, 25, 20, 15, 5]
colors = ['gold', 'lightcoral', 'lightskyblue', 'lightgreen', 'pink']

plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=90)
plt.title('Popularity of Programming Languages')
plt.axis('equal')
plt.show()

Customizing Styles and Appearance

Working with Colors

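# Different ways to specify colors (sample data defined here so the block runs on its own)
x = np.linspace(0, 10, 100)
y1, y2, y3, y4 = np.sin(x), np.cos(x), np.sin(2 * x), np.cos(2 * x)

plt.plot(x, y1, color='red')           # color name
plt.plot(x, y2, color='#FF5733')       # HEX code
plt.plot(x, y3, color=(0.1, 0.2, 0.5)) # RGB tuple, values from 0 to 1
plt.plot(x, y4, c='b')                 # short notation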
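All of these notations resolve to the same internal representation, an RGBA tuple with components between 0 and 1. The sketch below uses the conversion helpers in matplotlib.colors to show this; the variable names are illustrative:

```python
from matplotlib import colors as mcolors

# Each notation is converted to an (R, G, B, A) tuple of floats in [0, 1]
rgba_from_name = mcolors.to_rgba("red")
rgba_from_hex = mcolors.to_rgba("#FF5733")
rgba_from_tuple = mcolors.to_rgba((0.1, 0.2, 0.5))
rgba_from_short = mcolors.to_rgba("b")

# The reverse direction: any color specification to a HEX string
hex_from_name = mcolors.to_hex("red")

print(rgba_from_name, rgba_from_hex, rgba_from_tuple, rgba_from_short)
print(hex_from_name)
```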

Line Styles and Markers

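# Different styles (few sample points so the markers stay distinguishable)
x = np.linspace(0, 10, 20)
y1, y2, y3 = np.sin(x), np.cos(x), np.sin(2 * x)
plt.plot(x, y1, linestyle='-', marker='o', markersize=8)    # solid with circles
plt.plot(x, y2, linestyle='--', marker='s', markersize=6)   # dashed with squares
plt.plot(x, y3, linestyle=':', marker='^', markersize=10)   # dotted with triangles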

Applying Predefined Styles

# Available styles
print(plt.style.available)

# Applying a style ('seaborn-v0_8' is the name used in Matplotlib 3.6 and later)
plt.style.use('seaborn-v0_8')
# or
plt.style.use('ggplot')
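plt.style.use changes the style for the rest of the session. When a style should apply to a single figure only, the plt.style.context manager is one option. A minimal sketch, assuming the 'ggplot' style and the Agg backend for headless execution:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

grid_before = plt.rcParams["axes.grid"]

# The style is active only inside the with-block
with plt.style.context("ggplot"):
    grid_inside = plt.rcParams["axes.grid"]  # ggplot switches the grid on
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4, 5], [2, 4, 1, 5, 3])
    ax.set_title("Styled with ggplot")
    plt.close(fig)

# Outside the block the previous settings are restored
grid_after = plt.rcParams["axes.grid"]
print(grid_before, grid_inside, grid_after)
```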

Creating Complex Layouts

Subplots

# Creating a grid of subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Filling each subplot
axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title('sin(x)')

axes[0, 1].plot(x, np.cos(x), 'r--')
axes[0, 1].set_title('cos(x)')

axes[1, 0].scatter(x[:50], np.random.randn(50))
axes[1, 0].set_title('Scatter plot')

axes[1, 1].hist(np.random.randn(1000), bins=30)
axes[1, 1].set_title('Histogram')

plt.tight_layout()
plt.show()
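When all panels show comparable data, sharing the axes and looping over the grid keeps the code short. The following sketch assumes four sine curves of different frequency as sample data and uses the Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# sharex/sharey link the limits and ticks of all four panels
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))

# axes is a 2x2 NumPy array; .flat iterates over it row by row
for ax, freq in zip(axes.flat, [1, 2, 3, 4]):
    ax.plot(x, np.sin(freq * x))
    ax.set_title(f"sin({freq}x)")
    ax.grid(True)

fig.tight_layout()
fig.savefig("shared_axes.png")
plt.close(fig)
```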

Advanced Visualization Techniques

Annotations and Text

# Data from the first example; its maximum is at (4, 5)
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]
plt.plot(x, y)
plt.annotate('Maximum', xy=(4, 5), xytext=(2.2, 4.8),
            arrowprops=dict(arrowstyle='->', color='red'),
            fontsize=12, color='red')
plt.text(1, 4, 'Important Point', fontsize=10, bbox=dict(boxstyle="round,pad=0.3", facecolor="yellow"))
plt.show()

Axis Settings

# Logarithmic axes (pick one call per plot; log scales only show positive values)
plt.semilogy(x, np.exp(x))  # Y - logarithmic
plt.semilogx(x, x**2)       # X - logarithmic
plt.loglog(x, x**2)         # both axes logarithmic

# Setting limits and labels
plt.xlim(0, 10)
plt.ylim(-2, 2)
plt.xticks(np.arange(0, 11, 2))
plt.yticks([-2, -1, 0, 1, 2])
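The same settings are available as methods on an Axes object, which is convenient when a figure has several subplots. A short sketch, assuming an exponential curve as sample data and the Agg backend for headless execution:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1, 10, 50)

fig, ax = plt.subplots()
ax.plot(x, np.exp(x))

ax.set_yscale("log")                # logarithmic Y axis
ax.set_xlim(1, 10)                  # X-axis limits
ax.set_xticks(np.arange(2, 11, 2))  # ticks at 2, 4, 6, 8, 10
ax.set_xlabel("x")
ax.set_ylabel("exp(x)")

fig.savefig("log_axis.png")
plt.close(fig)
```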

Working with Data from Files

Integration with Pandas

import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'],
    'Sales': [120, 150, 180, 200, 240],
    'Profit': [30, 45, 55, 70, 85]
})

# Building a graph directly from DataFrame
df.plot(x='Month', y=['Sales', 'Profit'], kind='bar')
plt.title('Sales and Profit Dynamics')
plt.show()

Interactivity and Animation

Interactive Graphs

# For Jupyter Notebook (the widget backend requires the ipympl package)
# %matplotlib widget

# Creating an interactive graph
# fig, ax = plt.subplots()
# ax.plot(x, y)
# plt.show()

Simple Animation

from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
x = np.linspace(0, 2*np.pi, 100)
line, = ax.plot(x, np.sin(x))

def animate(frame):
    line.set_ydata(np.sin(x + frame/10))
    return line,

anim = FuncAnimation(fig, animate, frames=100, interval=50, blit=True)
plt.show()
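An animation can also be written to a file instead of being shown in a window. The sketch below saves a shortened version of the sine animation as a GIF with PillowWriter; the output location (the system temporary directory), the file name, and the frame count are assumptions chosen for illustration:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 100)
(line,) = ax.plot(x, np.sin(x))

def animate(frame):
    # Shift the phase of the sine wave a little on every frame
    line.set_ydata(np.sin(x + frame / 10))
    return (line,)

anim = FuncAnimation(fig, animate, frames=20, interval=50, blit=True)

# PillowWriter relies on the Pillow package, which Matplotlib installs as a dependency
gif_path = os.path.join(tempfile.gettempdir(), "sine_wave.gif")
anim.save(gif_path, writer=PillowWriter(fps=20))
plt.close(fig)

print("Saved animation to", gif_path)
```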

Saving and Exporting Graphs

# Different formats and settings
plt.savefig('graph.png', dpi=300, bbox_inches='tight', facecolor='white')
plt.savefig('graph.pdf', format='pdf', bbox_inches='tight')
plt.savefig('graph.svg', format='svg', bbox_inches='tight')

# Setting quality and size
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.savefig('high_quality.png', dpi=600, bbox_inches='tight', 
            facecolor='white', edgecolor='none')
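savefig also accepts a file-like object, which is useful when the image should go to a web response or a database rather than to disk. A minimal sketch, assuming an in-memory io.BytesIO buffer and the Agg backend:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3, 4, 5], [2, 4, 1, 5, 3])
ax.set_title("Saved to memory")

# With a buffer there is no file extension, so the format is given explicitly
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight")
plt.close(fig)

png_bytes = buffer.getvalue()
print(f"PNG size: {len(png_bytes)} bytes")
```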

Performance Optimization

# For large data
plt.plot(x, y, rasterized=True)  # rasterize this line inside vector output
plt.savefig('large_data.pdf', dpi=150)  # dpi sets the resolution of rasterized artists

# Disabling interactivity for batch processing
plt.ioff()  # disable interactive mode
# ... graph creation ...
plt.ion()   # enable back
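For batch jobs that produce many figures, closing each figure after saving keeps memory use flat. The loop below is one possible pattern, assuming the Agg backend, in-memory output buffers, and three sample curves:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suited to scripts and servers
import matplotlib.pyplot as plt
import numpy as np

plt.close("all")  # start from a clean state
x = np.linspace(0, 10, 200)
outputs = []

for freq in (1, 2, 3):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(freq * x))
    ax.set_title(f"sin({freq}x)")

    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    outputs.append(buffer.getvalue())

    plt.close(fig)  # release the figure; pyplot otherwise keeps it alive

open_figures = plt.get_fignums()
print(f"Rendered {len(outputs)} images, {len(open_figures)} figures still open")
```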

Table of Basic Matplotlib Functions and Methods

Category            Function                        Description
------------------  ------------------------------  ----------------------------------------
Creating Figures    plt.figure(figsize=(8,6))       Create a new figure
                    plt.subplots(rows, cols)        Create a grid of subplots
                    plt.subplot(rows, cols, index)  Add a subplot
Basic Graphs        plt.plot(x, y)                  Line graph
                    plt.scatter(x, y)               Scatter plot
                    plt.bar(x, height)              Bar chart
                    plt.hist(data, bins)            Histogram
                    plt.pie(sizes, labels=labels)   Pie chart
                    plt.boxplot(data)               Box plot
Axis Settings       plt.xlabel("text")              X-axis label
                    plt.ylabel("text")              Y-axis label
                    plt.title("text")               Title
                    plt.xlim(min, max)              X-axis limits
                    plt.ylim(min, max)              Y-axis limits
                    plt.xticks(ticks, labels)       X-axis ticks
                    plt.grid(True)                  Enable grid
Styling             plt.legend()                    Legend
                    plt.legend(loc='best')          Legend with position
                    plt.style.use('style')          Apply style
                    plt.tight_layout()              Automatic layout
Text & Annotations  plt.text(x, y, "text")          Add text
                    plt.annotate("text", xy=(x,y))  Annotation (pass arrowprops for an arrow)
Display             plt.show()                      Show graph
                    plt.savefig("file.png")         Save to file
                    plt.clf()                       Clear figure
                    plt.close()                     Close figure
Colors & Styles     color='red'                     Element color
                    linestyle='--'                  Line style
                    marker='o'                      Marker type
                    linewidth=2                     Line width
                    alpha=0.7                       Transparency
3D Graphs           ax.plot3D(x, y, z)              3D line
                    ax.scatter3D(x, y, z)           3D points
                    ax.plot_surface(X, Y, Z)        3D surface
Special             plt.imshow(data)                Display matrix as image
                    plt.colorbar()                  Color scale
                    plt.contour(X, Y, Z)            Contour plot
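The 3D entries in the table need an Axes created with a 3D projection. A brief sketch of a surface plot, assuming a radial sine function as sample data and the Agg backend for headless execution:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np

# Sample surface: a radial sine wave on a 50x50 grid
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection="3d")  # this is what makes the Axes three-dimensional

surface = ax.plot_surface(X, Y, Z, cmap="viridis")
fig.colorbar(surface, ax=ax, shrink=0.6, label="Z")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")

fig.savefig("surface_3d.png")
plt.close(fig)
```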

Practical Use Cases

Time Series Analysis

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Creating a time series
dates = pd.date_range(start='2023-01-01', periods=365, freq='D')
values = np.cumsum(np.random.randn(365)) + 100

plt.figure(figsize=(12, 6))
plt.plot(dates, values, linewidth=2)
plt.title('Time Series for 2023')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Creating a Dashboard

# Creating a complex dashboard
fig = plt.figure(figsize=(15, 10))

# Graph 1: Line
ax1 = plt.subplot(2, 3, 1)
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x), 'b-', label='sin(x)')
plt.title('Trigonometric Function')
plt.legend()

# Graph 2: Histogram
ax2 = plt.subplot(2, 3, 2)
data = np.random.normal(0, 1, 1000)
plt.hist(data, bins=30, alpha=0.7, color='green')
plt.title('Distribution')

# Graph 3: Scatter
ax3 = plt.subplot(2, 3, 3)
x_scatter = np.random.randn(100)
y_scatter = 2 * x_scatter + np.random.randn(100)
plt.scatter(x_scatter, y_scatter, alpha=0.6)
plt.title('Correlation')

# Graph 4: Bar chart
ax4 = plt.subplot(2, 3, 4)
categories = ['A', 'B', 'C', 'D']
values = [23, 45, 56, 78]
plt.bar(categories, values, color=['red', 'blue', 'green', 'orange'])
plt.title('Categories')

# Graph 5: Pie chart
ax5 = plt.subplot(2, 3, 5)
sizes = [30, 25, 20, 25]
plt.pie(sizes, labels=categories, autopct='%1.1f%%')
plt.title('Shares')

# Graph 6: Box plot
ax6 = plt.subplot(2, 3, 6)
data_box = [np.random.normal(0, std, 100) for std in range(1, 4)]
plt.boxplot(data_box)
plt.title('Distributions')

plt.tight_layout()
plt.show()

Working with Large Datasets

# Optimization for large data arrays
def plot_large_data(x, y, max_points=10000):
    if len(x) > max_points:
        # Data thinning
        step = len(x) // max_points
        x_reduced = x[::step]
        y_reduced = y[::step]
    else:
        x_reduced, y_reduced = x, y
    
    plt.plot(x_reduced, y_reduced, rasterized=True)
    plt.title(f'Graph with {len(x)} points (showing {len(x_reduced)})')
    plt.show()

# Usage example
large_x = np.linspace(0, 100, 1000000)
large_y = np.sin(large_x) + np.random.normal(0, 0.1, 1000000)
plot_large_data(large_x, large_y)
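Taking every n-th point is simple, but it can skip narrow spikes entirely. One alternative, sketched below, keeps the minimum and maximum of each chunk so that extremes survive the reduction. The helper name minmax_downsample and its n_chunks parameter are hypothetical, not part of Matplotlib or NumPy:

```python
import numpy as np

def minmax_downsample(x, y, n_chunks=5000):
    """Keep the min and max of y in each chunk (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y)
    chunk = len(y) // n_chunks
    if chunk == 0:
        return x, y  # fewer points than chunks: nothing to reduce
    usable = chunk * n_chunks  # points past this index are dropped
    y_chunks = y[:usable].reshape(n_chunks, chunk)
    offsets = np.arange(n_chunks) * chunk
    # Absolute indices of the smallest and largest value in every chunk
    idx = np.concatenate([offsets + y_chunks.argmin(axis=1),
                          offsets + y_chunks.argmax(axis=1)])
    idx = np.unique(idx)  # sorted, duplicates removed
    return x[idx], y[idx]

# A flat signal with a single one-sample spike
x = np.linspace(0, 100, 1_000_000)
y = np.zeros_like(x)
y[123_457] = 10.0

x_small, y_small = minmax_downsample(x, y)
stride_max = y[::100].max()  # plain thinning with step 100 for comparison

print(f"min/max kept {len(y_small)} points, peak = {y_small.max()}")
print(f"stride thinning peak = {stride_max}")
```

The reduced arrays can be passed to plt.plot in the same way as the thinned data above.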

Integration with Other Libraries

Working with NumPy

# Creating a 2D array for a heatmap
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

plt.imshow(Z, extent=[-5, 5, -5, 5], cmap='viridis', origin='lower')
plt.colorbar(label='Value')
plt.title('Heatmap of Function')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

Working with Pandas

# Creating a complex DataFrame
np.random.seed(42)
df = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=100, freq='D'),
    'value1': np.random.randn(100).cumsum(),
    'value2': np.random.randn(100).cumsum(),
    'category': np.random.choice(['A', 'B', 'C'], 100)
})

# Grouping and visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Time series
axes[0, 0].plot(df['date'], df['value1'], label='Value 1')
axes[0, 0].plot(df['date'], df['value2'], label='Value 2')
axes[0, 0].set_title('Time Series')
axes[0, 0].legend()
axes[0, 0].tick_params(axis='x', rotation=45)

# Bar chart of the mean of value1 by category
df.groupby('category')['value1'].mean().plot(kind='bar', ax=axes[0, 1])
axes[0, 1].set_title('Average by Category')

# Scatter plot
axes[1, 0].scatter(df['value1'], df['value2'], alpha=0.6)
axes[1, 0].set_title('Correlation of Values')
axes[1, 0].set_xlabel('Value 1')
axes[1, 0].set_ylabel('Value 2')

# Box plot
df.boxplot(column=['value1', 'value2'], ax=axes[1, 1])
axes[1, 1].set_title('Distribution of Values')

plt.tight_layout()
plt.show()

Frequently Asked Questions and Solutions

How to Change the Font Size?

plt.rcParams.update({'font.size': 14})
# or for a specific element
plt.title('Title', fontsize=16)
plt.xlabel('X Axis', fontsize=12)

How to Create a Legend Outside the Graph Area?

plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()

How to Save a Graph with High Resolution?

plt.savefig('high_res_plot.png', dpi=300, bbox_inches='tight', 
            facecolor='white', edgecolor='none')

How to Create a Transparent Background?

plt.savefig('transparent_plot.png', transparent=True, bbox_inches='tight')

Advanced Techniques and Tips

Creating Custom Color Palettes

from matplotlib.colors import LinearSegmentedColormap

# Creating a custom palette
colors = ['#FF0000', '#FFFF00', '#00FF00', '#00FFFF', '#0000FF']
n_bins = 100
cmap = LinearSegmentedColormap.from_list('custom', colors, N=n_bins)

plt.imshow(np.random.rand(10, 10), cmap=cmap)
plt.colorbar()
plt.show()
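A colormap object is callable: it maps a number between 0 and 1 to an RGBA color. Sampling it is a quick way to check a custom palette. A small sketch, reusing the five colors above (the variable names are illustrative):

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, to_hex

colors = ['#FF0000', '#FFFF00', '#00FF00', '#00FFFF', '#0000FF']
cmap = LinearSegmentedColormap.from_list('custom', colors, N=100)

# The ends of the range map to the first and last colors in the list
first_color = to_hex(cmap(0.0))
last_color = to_hex(cmap(1.0))

# Five evenly spaced samples across the palette
samples = [to_hex(cmap(value)) for value in np.linspace(0, 1, 5)]

print(first_color, last_color)
print(samples)
```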

Working with LaTeX in Captions

plt.rc('text', usetex=True)  # requires LaTeX installation
plt.plot(x, y)
plt.xlabel(r'$\alpha$ (radians)')
plt.ylabel(r'$f(x) = \sin(\alpha x)$')
plt.title(r'Graph of the function $f(x) = \sin(\alpha x)$')
plt.show()
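If no LaTeX installation is available, Matplotlib's built-in mathtext renderer handles a subset of TeX math inside dollar signs. The sketch below assumes usetex is switched off, a sample value of alpha = 2, and the Agg backend for headless execution:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np

plt.rcParams["text.usetex"] = False  # use built-in mathtext, no LaTeX needed

alpha = 2.0  # sample value chosen for illustration
x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(alpha * x))
ax.set_xlabel(r"$x$ (radians)")
ax.set_ylabel(r"$f(x) = \sin(\alpha x)$")
ax.set_title(r"Graph of $f(x) = \sin(\alpha x)$ for $\alpha = 2$")

fig.canvas.draw()  # rendering parses the math expressions
fig.savefig("mathtext_labels.png")
plt.close(fig)
```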

Matplotlib vs Other Libraries

When to Use Matplotlib:

  • You need full control over appearance.
  • You are creating publication-ready graphs.
  • You are integrating with existing Matplotlib-based code.
  • You are teaching or learning the basics of plotting.

Alternatives:

  • Seaborn: For statistical visualization.
  • Plotly: For interactive graphics.
  • Bokeh: For web applications.
  • Altair: For declarative visualization.

Conclusion

Matplotlib remains a core tool for data visualization in Python. It covers the range from simple line charts to complex scientific figures, and mastering it supports effective data analysis and clear presentation of research results.

The library is actively developed and continues to gain features and performance improvements. The official documentation is the best place to study it in depth and to follow changes between releases.
