Working with Word in Python: Creating, Reading, and Editing Documents with the python-docx Library

What is python-docx and why is it needed

python-docx is a library for working with Microsoft Word documents in the .docx format. It lets you create, read, edit, and format Word documents programmatically, with no need to install Microsoft Office. The library is especially useful for automating report generation, processing large batches of documents, and integrating Word output with other systems.

Installing and configuring python-docx

To start working with the library, you need to install it via pip:

pip install python-docx

After installation, you can immediately start working with Word documents in Python.
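
One detail that is easy to miss: the package is installed under the name python-docx but imported as docx. The sketch below uses only the standard library to check whether the distribution is present; installed_version is a hypothetical helper name chosen for this illustration, not part of python-docx.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "python-docx") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("python-docx is not installed; run: pip install python-docx")
else:
    print(f"python-docx {found} is installed (import it as 'docx')")
```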

Creating a new Word document

Creating a new document is one of the basic operations when working with python-docx:

from docx import Document

# Creating a new document
doc = Document()

# Adding headings of different levels
doc.add_heading('Main heading', level=1)
doc.add_paragraph('This is the main text of the document with important information.')

# Adding a subheading
doc.add_heading('Subheading', level=2)
doc.add_paragraph('Additional text under the subheading.')

# Saving the document (the path is relative to the current working directory)
doc.save('new_document.docx')
print("The document has been created successfully!")
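
The save call writes to exactly the name it is given, so stray whitespace or a missing extension ends up in the file name on disk. A minimal sketch of a name-cleaning step, assuming you want every output to end in .docx; docx_path is a hypothetical helper, not part of python-docx.

```python
from pathlib import Path


def docx_path(name: str) -> Path:
    """Return a cleaned-up output path that always ends in .docx."""
    cleaned = name.strip()
    if not cleaned:
        raise ValueError("file name must not be empty")
    path = Path(cleaned)
    if path.suffix.lower() != ".docx":
        # Append rather than replace, so 'report.v2' becomes 'report.v2.docx'
        path = path.with_name(path.name + ".docx")
    return path


print(docx_path("new_document.docx "))
print(docx_path("reports/summary"))
```

The result can then be passed to the save call as a string, for example doc.save(str(docx_path('new_document'))).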

Reading and extracting data from a Word document

The following approach is used to work with existing documents:

from docx import Document

# Opening an existing document
doc = Document('new_document.docx')

# Extract the entire text from the document
print("Document content:")
for paragraph in doc.paragraphs:
    if paragraph.text.strip(): # Skip empty paragraphs
        print(f"- {paragraph.text}")

# Getting information about headings
print("\nHeadings in the document:")
for paragraph in doc.paragraphs:
    if paragraph.style.name.startswith('Heading'):
        print(f"Level {paragraph.style.name[-1]} heading: {paragraph.text}")
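
Taking the last character of the style name works for names such as "Heading 1", but it would misread any other style whose name merely starts with "Heading". A slightly more defensive sketch, assuming the built-in English style names "Heading 1" through "Heading 9"; heading_level is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: heading styles are named "Heading <number>", as in the default template
_HEADING_RE = re.compile(r"^Heading (\d+)$")


def heading_level(style_name: str) -> Optional[int]:
    """Return the numeric level for a heading style name, or None otherwise."""
    match = _HEADING_RE.match(style_name)
    return int(match.group(1)) if match else None


for name in ["Heading 1", "Heading 2", "Normal", "Title"]:
    print(name, "->", heading_level(name))
```

Inside the loop over doc.paragraphs, the helper would be called as heading_level(paragraph.style.name).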

Working with tables in Word documents

python-docx provides convenient tools for creating and filling tables:

from docx import Document

# Creating a document with a table
doc = Document()
doc.add_heading('Sales report', level=1)

# Creating a 4x4 table
table = doc.add_table(rows=4, cols=4)
table.style = 'Table Grid' # Applying table style

# Filling in the table headers
headers = ['Product', 'Quantity', 'Price', 'Amount']
for i, header in enumerate(headers):
    cell = table.cell(0, i)
    cell.text = header
    # Making the header text bold (assigning .text creates a single run)
    cell.paragraphs[0].runs[0].bold = True

# Filling in the table data
data = [
    ['Apples', '10 kg', '150 rub/kg', '1500 rub'],
    ['Pears', '5 kg', '200 rub/kg', '1000 rub'],
    ['Oranges', '8 kg', '180 rub/kg', '1440 rub']
]

for row_idx, row_data in enumerate(data, 1):
    for col_idx, cell_data in enumerate(row_data):
        table.cell(row_idx, col_idx).text = cell_data

# Saving the document
doc.save('sales_report.docx')
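
In the listing above the amounts are typed in by hand, which invites arithmetic slips. As a sketch, the display rows could instead be derived from numeric data; build_rows and the tuple layout (product, kilograms, price per kilogram) are assumptions made for this illustration.

```python
from typing import List, Tuple


def build_rows(items: List[Tuple[str, int, int]]) -> List[List[str]]:
    """Turn (product, kilograms, price per kg) tuples into display strings."""
    rows = []
    for product, kilograms, price_per_kg in items:
        amount = kilograms * price_per_kg
        rows.append([
            product,
            f"{kilograms} kg",
            f"{price_per_kg} rub/kg",
            f"{amount} rub",
        ])
    return rows


items = [("Apples", 10, 150), ("Pears", 5, 200), ("Oranges", 8, 180)]
for row in build_rows(items):
    print(row)
```

The returned list has the same shape as the data list in the table example, so it can be fed to the same nested loop that fills the cells.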

Adding images to a document

The library lets you insert images and control their size:

from docx import Document
from docx.shared import Inches, Cm

# Creating a document with an image
doc = Document()
doc.add_heading('Report with images', level=1)

# Adding text before the image
doc.add_paragraph('Below is a diagram of the results:')

# Adding images with explicit dimensions
try:
    # Adding an image with a specified width (the height scales proportionally)
    doc.add_picture('chart.png', width=Inches(6.0))
    
    # You can also use centimeters
    doc.add_paragraph('Additional schema:')
    doc.add_picture('scheme.jpg', width=Cm(10))
    
except FileNotFoundError:
    doc.add_paragraph('Image not found. Check the file path.')

# Saving the document
doc.save('document_with_images.docx')
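
With the try block above, a missing second image is only discovered after the first image and the "Additional schema:" paragraph have already been added. One possible alternative is to check all paths before building the document. The sketch below uses only the standard library; missing_images is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable, List


def missing_images(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


missing = missing_images(["chart.png", "scheme.jpg"])
if missing:
    print("Missing image files:", ", ".join(missing))
else:
    print("All image files are present")
```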

Advanced text formatting

Python-docx provides extensive text formatting features:

from docx import Document
from docx.shared import Pt, RGBColor
from docx.enum.text import WD_COLOR_INDEX, WD_ALIGN_PARAGRAPH

# Creating a document with formatting
doc = Document()

# Adding a center-aligned heading
title = doc.add_heading('Formatted document', level=1)
title.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Creating a paragraph with different formatting
p = doc.add_paragraph()

# Plain text
run1 = p.add_run('Plain text, ')

# Bold text
run2 = p.add_run('bold text, ')
run2.bold = True

# Italics
run3 = p.add_run('italics, ')
run3.italic = True

# Underlined text
run4 = p.add_run('underlined, ')
run4.underline = True

# Colored text
run5 = p.add_run('colored text')
run5.font.color.rgb = RGBColor(255, 0, 0)  # Red

# Font and size adjustment
p2 = doc.add_paragraph()
run6 = p2.add_run('Text in Arial font, size 14pt')
run6.font.name = 'Arial'
run6.font.size = Pt(14)

# Highlighted text (like a marker pen)
run7 = p2.add_run(' with yellow highlighting')
run7.font.highlight_color = WD_COLOR_INDEX.YELLOW

# Saving the document
doc.save('formatted_document.docx')
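
Colors often arrive as hex strings such as "#FF0000" rather than three separate integers. A small sketch of a converter whose result can be unpacked into RGBColor; hex_to_rgb is a hypothetical helper written for this illustration and uses only plain Python.

```python
from typing import Tuple


def hex_to_rgb(color: str) -> Tuple[int, int, int]:
    """Convert '#RRGGBB' or 'RRGGBB' to an (r, g, b) tuple of ints in 0..255."""
    digits = color.strip().lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {color!r}")
    # int(..., 16) raises ValueError on non-hex characters
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue


print(hex_to_rgb("#FF0000"))
```

In the formatting example this would be used as run5.font.color.rgb = RGBColor(*hex_to_rgb('#FF0000')).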

Working with document metadata

Python-docx allows you to manage document properties:

from docx import Document
from datetime import datetime

# Creating a document with metadata
doc = Document()

# Setting document properties
doc.core_properties.title = 'My report'
doc.core_properties.author = 'Ivan Petrov'
doc.core_properties.subject = 'Monthly report'
doc.core_properties.comments = 'Created using python-docx'
doc.core_properties.created = datetime.now()

# Adding content
doc.add_heading('Report for the current month', level=1)
doc.add_paragraph('Main content of the report...')

# Saving
doc.save('report_with_metadata.docx')
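
When the same properties are set on many documents, it can be tidier to apply them from a dictionary. The sketch below is one way to do that; apply_core_properties and the ALLOWED set are assumptions made for this illustration, and a SimpleNamespace stands in for doc.core_properties so the snippet runs without a document.

```python
from types import SimpleNamespace
from typing import Any, Dict

# A subset of the attribute names exposed by doc.core_properties
ALLOWED = {"title", "author", "subject", "comments", "keywords", "category", "created"}


def apply_core_properties(props: Any, values: Dict[str, Any]) -> None:
    """Set several document properties at once, rejecting unknown names."""
    unknown = set(values) - ALLOWED
    if unknown:
        raise KeyError(f"unsupported properties: {sorted(unknown)}")
    for name, value in values.items():
        setattr(props, name, value)


# Stand-in object for the demo; pass doc.core_properties in real code
props = SimpleNamespace()
apply_core_properties(props, {"title": "My report", "author": "Ivan Petrov"})
print(props.title, "/", props.author)
```

Rejecting unknown names up front catches typos such as "autor" that plain attribute assignment might not report clearly.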

Error handling and best practices

When working with python-docx, it is important to handle possible errors:

from docx import Document
import os

def safe_document_processing(file_path):
    """
    Safely process a Word document with error handling.
    Returns a list of non-empty paragraph texts, or None on failure.
    """
    try:
        # Checking that the file exists
        if not os.path.exists(file_path):
            print(f"File {file_path} not found")
            return None

        # Opening the document
        doc = Document(file_path)

        # Extracting the text of non-empty paragraphs
        full_text = []
        for paragraph in doc.paragraphs:
            if paragraph.text.strip():
                full_text.append(paragraph.text)

        return full_text

    except Exception as e:
        print(f"Error processing the document: {e}")
        return None

# Using the function
text_content = safe_document_processing('example.docx')
if text_content:
    print("The document has been processed successfully")
    for line in text_content:
        print(f"- {line}")
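
An existence check does not tell you whether the file is actually a Word document. Because a .docx file is a ZIP archive whose main text conventionally lives in word/document.xml, a quick pre-check is possible with the standard library alone. This is a heuristic sketch, not a full validation; looks_like_docx is a hypothetical helper name.

```python
import zipfile


def looks_like_docx(file_path: str) -> bool:
    """Heuristic check: a ZIP archive that contains word/document.xml."""
    # is_zipfile returns False for missing files and non-ZIP content
    if not zipfile.is_zipfile(file_path):
        return False
    with zipfile.ZipFile(file_path) as archive:
        return "word/document.xml" in archive.namelist()


print(looks_like_docx("example.docx"))
```

Such a check could run before opening the document, so that a renamed .txt or a legacy .doc file is reported with a clearer message than a generic exception.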

Conclusion

The python-docx library provides powerful tools for automating work with Word documents. It allows you to create professionally designed documents, process large amounts of data, and integrate Word documents into the workflows of Python applications. The main features include creating and editing documents, working with tables and images, formatting text, and managing metadata.

 
