PDFKIT - PDF generation from HTML

What is PDFKit for Python

PDFKit is a Python library that wraps the wkhtmltopdf command-line utility, which uses the Qt WebKit engine to convert HTML documents into PDF files. It lets developers produce PDF documents that preserve page layout, CSS styling, embedded fonts, SVG graphics, and the results of basic JavaScript.

The main advantage of PDFKit is that it lets you leverage familiar web technologies (HTML, CSS) to produce documents that can then be rendered to PDF with high‑quality output. This makes it ideal for generating reports, invoices, tickets, certificates, and other dynamically‑filled documents.

Features of PDFKit Library

Supported Technologies

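  • HTML5 and CSS3 as implemented by wkhtmltopdf's WebKit build, which is older than current browsers
  • CSS frameworks such as Bootstrap and Tailwind CSS, although rules that rely on newer CSS features may need fallbacks
  • SVG graphics and web fonts
  • Tables with complex formatting
  • Flexbox through the older -webkit- prefixed syntax (CSS Grid is generally not supported)
  • CSS media queries
  • Basic JavaScript support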

Data Sources for Conversion

PDFKit can convert content from three primary sources:

  • Web page URLs
  • Local HTML files
  • In‑memory HTML strings

Installation and Setup

Installing the Python Package

pip install pdfkit

Installing wkhtmltopdf

Ubuntu/Debian:

sudo apt update
sudo apt install wkhtmltopdf

macOS (via Homebrew):

brew install --cask wkhtmltopdf

Windows: Download the installer from the official site https://wkhtmltopdf.org/downloads.html. Both 32‑bit and 64‑bit versions are available for Windows Vista and later.

CentOS/RHEL/Amazon Linux:

# For CentOS 7
sudo yum install wkhtmltopdf

# For newer releases
sudo dnf install wkhtmltopdf

Verifying the Installation

After installation, make sure wkhtmltopdf is accessible:

wkhtmltopdf --version

If the command is not found, you’ll need to set the path to the executable in your Python code.

Core Methods and Functions

Creating PDFs from Different Sources

import pdfkit

# From a URL
pdfkit.from_url("https://example.com", "output.pdf")

# From an HTML file
pdfkit.from_file("template.html", "output.pdf")

# From an HTML string
html_content = "<h1>Title</h1><p>Document content</p>"
pdfkit.from_string(html_content, "output.pdf")

Configuring the wkhtmltopdf Path

If wkhtmltopdf is not in the system PATH, you can specify it manually:

config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf')
pdfkit.from_string(html_content, "output.pdf", configuration=config)
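
The path lookup can also be automated. The sketch below is one possible approach, not part of pdfkit: the helper name find_wkhtmltopdf and the fallback locations are assumptions chosen for illustration and should be adapted to your system.

```python
import os
import shutil

# Illustrative fallback locations (assumptions); adjust them for your system.
FALLBACK_PATHS = (
    "/usr/local/bin/wkhtmltopdf",
    "/usr/bin/wkhtmltopdf",
    r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",
)


def find_wkhtmltopdf(fallbacks=FALLBACK_PATHS):
    """Return a path to the wkhtmltopdf binary, or None if it cannot be found."""
    # shutil.which searches the directories listed in PATH.
    found = shutil.which("wkhtmltopdf")
    if found:
        return found
    # Otherwise try the known locations one by one.
    for candidate in fallbacks:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

If the helper returns a path, pass it to pdfkit.configuration(wkhtmltopdf=path). If it returns None, it is usually better to report a clear installation error than to let the conversion fail later.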

Generating PDFs In‑Memory

For web applications and APIs you often need the PDF as bytes:

from flask import Response

# Passing False as the output path returns the PDF as bytes
pdf_bytes = pdfkit.from_string(html_content, False)

def pdf_response(data):
    # Call this from a view function to send the PDF in an HTTP response
    return Response(data, mimetype='application/pdf')

Generation Options Configuration

Basic Formatting Options

options = {
    'page-size': 'A4',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'no-outline': None,
    'enable-local-file-access': None
}

pdfkit.from_string(html_content, "output.pdf", options=options)
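
Each key in the options dictionary corresponds to a wkhtmltopdf command-line flag, and a value of None marks a switch that takes no argument. The function below is only an approximation written for illustration (options_to_args is a hypothetical name, not pdfkit's internal code); it shows roughly how such a dictionary turns into an argument list.

```python
def options_to_args(options):
    """Approximate how an options dict maps to wkhtmltopdf command-line arguments."""
    args = []
    for name, value in options.items():
        args.append("--" + name.lstrip("-"))
        # None or an empty string marks a switch that takes no value.
        if value is not None and value != "":
            args.append(str(value))
    return args


options = {
    "page-size": "A4",
    "margin-top": "0.75in",
    "no-outline": None,
}
print(options_to_args(options))
# ['--page-size', 'A4', '--margin-top', '0.75in', '--no-outline']
```

Thinking of the dictionary this way makes it easier to look up any option in the wkhtmltopdf help output: drop the leading dashes and use the rest as the key.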

Advanced Options

advanced_options = {
    'dpi': 300,                       # High‑resolution printing
    'print-media-type': None,        # Use CSS print media
    'zoom': 1.3,                      # Scale factor
    'javascript-delay': 1000,        # Wait for JS execution
    'no-stop-slow-scripts': None,    # Do not abort slow scripts
    'debug-javascript': None,        # Enable JS debugging
    'load-error-handling': 'ignore', # Ignore loading errors
    'load-media-error-handling': 'ignore'
}

Working with Headers and Footers

options = {
    'header-html': 'header.html',
    'footer-html': 'footer.html',
    'header-spacing': 5,
    'footer-spacing': 5,
    'header-font-size': 8,
    'footer-font-size': 8
}
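
The header-html and footer-html options point at HTML files, and wkhtmltopdf generally expects each of them to be a complete HTML document. A minimal sketch for generating such files on the fly follows; the helper names and the inline styling are assumptions made for this example.

```python
import html
import os
import tempfile

FRAGMENT_TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"></head>
<body style="font-size: 8pt; text-align: right;">{text}</body>
</html>
"""


def write_fragment(text, directory=None):
    """Write a complete HTML document containing `text` and return its path."""
    fd, path = tempfile.mkstemp(suffix=".html", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Escape the text so characters like < and & stay literal.
        handle.write(FRAGMENT_TEMPLATE.format(text=html.escape(text)))
    return path


def header_footer_options(header_text, footer_text, directory=None):
    """Build an options dict pointing at freshly written header/footer files."""
    return {
        "header-html": write_fragment(header_text, directory),
        "footer-html": write_fragment(footer_text, directory),
        "header-spacing": 5,
        "footer-spacing": 5,
    }
```

The returned dictionary can be passed as options to any pdfkit conversion call. The temporary files are not deleted automatically, so remove them once the PDF has been produced.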

Integration with Web Frameworks

Flask

from flask import Flask, render_template, Response
import pdfkit

app = Flask(__name__)

@app.route('/generate-pdf')
def generate_pdf():
    # Render HTML template
    html = render_template('invoice.html',
                           customer_name='Ivan Petrov',
                           amount=15000)

    # Convert to PDF
    pdf = pdfkit.from_string(html, False)

    return Response(pdf,
                    mimetype='application/pdf',
                    headers={'Content-Disposition': 'attachment; filename=invoice.pdf'})

Django

from datetime import datetime

from django.http import HttpResponse
from django.template.loader import render_to_string
import pdfkit

def generate_report(request):
    # Prepare context
    # get_report_data() is a placeholder for your own data-access function
    context = {
        'title': 'Monthly Report',
        'data': get_report_data(),
        'date': datetime.now()
    }

    # Render template
    html = render_to_string('report_template.html', context)

    # Create PDF
    pdf = pdfkit.from_string(html, False, options={
        'page-size': 'A4',
        'margin-top': '1in',
        'encoding': 'UTF-8'
    })

    response = HttpResponse(pdf, content_type='application/pdf')
    response['Content-Disposition'] = 'attachment; filename="report.pdf"'
    return response

FastAPI

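from fastapi import Body, FastAPI
from fastapi.responses import Response
import pdfkit

app = FastAPI()

# A plain "def" endpoint runs in FastAPI's thread pool, so the blocking
# wkhtmltopdf call does not stall the event loop.
# embed=True expects a JSON body of the form {"html_content": "..."}.
@app.post("/generate-pdf")
def create_pdf(html_content: str = Body(..., embed=True)):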
    options = {
        'page-size': 'A4',
        'margin-top': '0.75in',
        'encoding': "UTF-8",
    }

    pdf = pdfkit.from_string(html_content, False, options=options)

    return Response(
        content=pdf,
        media_type='application/pdf',
        headers={'Content-Disposition': 'attachment; filename=document.pdf'}
    )

Complete Methods and Parameters Table

Each entry lists the method, its description, and its parameters.

pdfkit.from_url(url, output_path, options=None, configuration=None)
Converts a web page to PDF.
  • url: page address
  • output_path: file path or False for bytes

pdfkit.from_file(input, output_path, options=None, configuration=None)
Converts an HTML file to PDF.
  • input: path to HTML file
  • output_path: destination file path

pdfkit.from_string(string, output_path, options=None, configuration=None)
Converts an HTML string to PDF.
  • string: HTML content
  • output_path: file path or False

pdfkit.configuration(wkhtmltopdf=None)
Sets the path to the wkhtmltopdf executable.
  • wkhtmltopdf: full path to the binary

Key Configuration Options

Option | Description | Possible Values
page-size | Page size | A4, A3, Letter, Legal, Tabloid
orientation | Page orientation | Portrait, Landscape
margin-top/bottom/left/right | Margins | Values in in, cm, mm, px
encoding | Text encoding | UTF-8, Windows-1251, etc.
dpi | Print resolution | 72, 150, 300, 600
zoom | Page scaling factor | 0.1 to 3.0
print-media-type | Use CSS print media | None (activates the flag)
javascript-delay | Delay for JavaScript execution (ms) | 0 to 10000
no-outline | Disable the PDF outline (bookmarks) | None
grayscale | Render in grayscale | None

Working with CSS and Styles

Print‑Optimized CSS

Create dedicated print styles for PDFs:

@media print {
    body { 
        font-family: 'DejaVu Sans', sans-serif;
        font-size: 12pt;
        line-height: 1.4;
    }
    
    .no-print { display: none; }
    
    .page-break { 
        page-break-before: always; 
    }
    
    table { 
        page-break-inside: avoid; 
    }
}
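
By default wkhtmltopdf renders with the screen media type, so rules inside @media print take effect only when the print-media-type option is set. The sketch below shows one way to inline such a stylesheet and pair it with that option; build_document and the constants are hypothetical names used for this example only.

```python
PRINT_CSS = """
@media print {
    .no-print { display: none; }
    .page-break { page-break-before: always; }
}
"""


def build_document(body_html, css=PRINT_CSS, title="Document"):
    """Wrap a body fragment in a full HTML page with the stylesheet inlined."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{title}</title>\n"
        f"<style>{css}</style>\n"
        "</head>\n"
        f"<body>{body_html}</body>\n</html>\n"
    )


# 'print-media-type' tells wkhtmltopdf to apply @media print rules.
PRINT_OPTIONS = {"print-media-type": None, "encoding": "UTF-8"}
```

The result of build_document(...) can then be passed to pdfkit.from_string together with options=PRINT_OPTIONS.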

Including External Fonts

<head>
    <link href="https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;700&display=swap" rel="stylesheet">
    <style>
        body {
            font-family: 'Roboto', sans-serif;
        }
    </style>
</head>

Error Handling and Debugging

Common Issues and Solutions

“wkhtmltopdf not found” error

import pdfkit

try:
    pdf = pdfkit.from_string(html, False)
except OSError as e:
    if "wkhtmltopdf" not in str(e):
        raise  # unrelated error: do not hide it
    # Retry with an explicit path to the executable
    config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf')
    pdf = pdfkit.from_string(html, False, configuration=config)

Encoding problems

options = {
    'encoding': 'UTF-8',
    'enable-local-file-access': None,
    'page-size': 'A4'
}
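
Garbled non-ASCII text is often caused by HTML that does not declare its own character set, so it can help to make sure a charset declaration is present before converting. A small sketch under that assumption (ensure_utf8_meta is a hypothetical helper, not a pdfkit function):

```python
import re

META_TAG = '<meta charset="utf-8">'


def ensure_utf8_meta(html):
    """Return html with a charset declaration, adding a UTF-8 one if absent."""
    if re.search(r"<meta[^>]+charset", html, flags=re.IGNORECASE):
        return html  # a charset is already declared
    # Match <head> or <head ...>, but not tags such as <header>.
    head = re.search(r"<head(\s[^>]*)?>", html, flags=re.IGNORECASE)
    if head:
        # Insert right after the opening <head> tag.
        return html[: head.end()] + META_TAG + html[head.end():]
    # No <head> at all: put the tag in front of the fragment.
    return META_TAG + html
```

Calling ensure_utf8_meta(html) just before pdfkit.from_string keeps templates unchanged while still giving wkhtmltopdf an explicit encoding hint.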

JavaScript debugging

debug_options = {
    'debug-javascript': None,
    'javascript-delay': 2000,
    'no-stop-slow-scripts': None
}

Performance Optimization

Caching Configuration

# Create configuration once
config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf')
options = {
    'page-size': 'A4',
    'encoding': 'UTF-8',
    'margin-top': '0.75in'
}

# Reuse for multiple PDFs
pdf1 = pdfkit.from_string(html1, False, options=options, configuration=config)
pdf2 = pdfkit.from_string(html2, False, options=options, configuration=config)
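
When most documents share the same settings but a few need small changes, the shared dictionary can be copied and adjusted per call instead of being edited in place. A minimal sketch, assuming a hypothetical helper named with_overrides:

```python
BASE_OPTIONS = {
    "page-size": "A4",
    "encoding": "UTF-8",
    "margin-top": "0.75in",
}


def with_overrides(base, **overrides):
    """Return a new options dict; keyword names use underscores for hyphens."""
    merged = dict(base)  # copy so the shared defaults are never mutated
    for key, value in overrides.items():
        merged[key.replace("_", "-")] = value
    return merged


landscape = with_overrides(BASE_OPTIONS, orientation="Landscape", margin_top="1in")
```

Here landscape keeps the A4 page size and UTF-8 encoding, changes the top margin, and adds the orientation, while BASE_OPTIONS stays untouched for later calls.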

Asynchronous Processing

import asyncio
import concurrent.futures

import pdfkit

# One shared pool instead of creating a new executor for every call
executor = concurrent.futures.ThreadPoolExecutor()

async def generate_pdf_async(html_content, options):
    loop = asyncio.get_running_loop()
    # Run the blocking conversion in a worker thread
    return await loop.run_in_executor(
        executor,
        pdfkit.from_string,
        html_content,
        False,
        options,
    )
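
Every conversion starts a separate wkhtmltopdf process, so when many documents are rendered at once it is usually worth capping how many run in parallel. The following sketch is one way to do that with a semaphore. The render argument is an assumption of this example: any blocking callable that takes an HTML string, for instance lambda html: pdfkit.from_string(html, False).

```python
import asyncio


async def render_all(documents, render, max_concurrent=4):
    """Render every document with `render`, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def render_one(html):
        async with semaphore:
            # Run the blocking call in a worker thread (Python 3.9+).
            return await asyncio.to_thread(render, html)

    # gather preserves the order of the input documents.
    return await asyncio.gather(*(render_one(doc) for doc in documents))
```

Passing the renderer in as a parameter also makes the function easy to exercise in tests with a stand-in that does not need wkhtmltopdf.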

Usage in Different Environments

Docker Containers

FROM python:3.9

# Install wkhtmltopdf
RUN apt-get update && apt-get install -y \
    wkhtmltopdf \
    xvfb \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install -r requirements.txt

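# Headless mode: DISPLAY only names an X display and does not start one.
# If your wkhtmltopdf build needs an X server, run it through xvfb-run.
ENV DISPLAY=:0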

AWS Lambda

To run in AWS Lambda you need a dedicated Lambda Layer containing wkhtmltopdf:

import base64
import os
import pdfkit

def lambda_handler(event, context):
    # Lambda-specific configuration
    config = pdfkit.configuration(wkhtmltopdf='/opt/bin/wkhtmltopdf')

    # Set environment variables for fonts and libraries
    os.environ['FONTCONFIG_PATH'] = '/opt/fonts'
    os.environ['LD_LIBRARY_PATH'] = '/opt/lib'

    # Assumption: the HTML to convert arrives in the event payload
    html = event['html']
    pdf = pdfkit.from_string(html, False, configuration=config)
    
    return {
        'statusCode': 200,
        'body': base64.b64encode(pdf).decode('utf-8'),
        'isBase64Encoded': True,
        'headers': {
            'Content-Type': 'application/pdf'
        }
    }

Frequently Asked Questions

How to add page numbers?

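wkhtmltopdf does not implement CSS @page margin boxes, so page numbers are set through its footer options instead. The placeholders [page] and [topage] are replaced with the current page number and the total page count:

options = {
    'footer-right': 'Page [page] of [topage]',
    'footer-font-size': 8
}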

Can I add watermarks?

Yes, using CSS:

body::before {
    content: "CONFIDENTIAL";
    position: fixed;
    top: 50%;
    left: 50%;
    transform: translate(-50%, -50%) rotate(-45deg);
    font-size: 80px;
    color: rgba(255, 0, 0, 0.1);
    z-index: 9999;
    pointer-events: none;
}

How to enforce page breaks?

.page-break {
    page-break-before: always;
}

.avoid-break {
    page-break-inside: avoid;
}
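
When a document is assembled from several independent HTML fragments, the same class can be applied programmatically. A small sketch, assuming each fragment should start on a new page (paginate is a hypothetical helper):

```python
PAGE_BREAK_CSS = ".page-break { page-break-before: always; }"
PAGE_BREAK_DIV = '<div class="page-break"></div>'


def paginate(sections):
    """Join HTML fragments into one page, forcing a break between them."""
    # The marker goes between sections, so the first one gets no break.
    body = PAGE_BREAK_DIV.join(sections)
    return (
        '<html><head><meta charset="utf-8">'
        f"<style>{PAGE_BREAK_CSS}</style></head>"
        f"<body>{body}</body></html>"
    )
```

For three sections the result contains two break markers, so the PDF has at least three pages.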

Are charts and graphs supported?

Yes, via SVG, Canvas, or libraries like Chart.js with a JavaScript delay:

options = {
    'javascript-delay': 3000,  # Wait 3 seconds for rendering
    'no-stop-slow-scripts': None
}
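
For simple charts, inline SVG generated on the server avoids JavaScript and the rendering delay altogether. The function below is a minimal illustration of that idea; its name, colors, and dimensions are arbitrary choices made for this sketch.

```python
def svg_bar_chart(values, width=400, height=200, gap=10):
    """Return an inline SVG bar chart for a list of non-negative numbers."""
    opening = (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    )
    if not values:
        return opening + "</svg>"
    peak = max(values) or 1  # avoid division by zero when all values are 0
    bar_width = (width - gap * (len(values) + 1)) / len(values)
    bars = []
    for index, value in enumerate(values):
        bar_height = height * value / peak
        x = gap + index * (bar_width + gap)
        y = height - bar_height  # the SVG y axis points downwards
        bars.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{bar_width:.1f}" '
            f'height="{bar_height:.1f}" fill="steelblue"/>'
        )
    return opening + "".join(bars) + "</svg>"
```

The returned string can be embedded directly in the HTML passed to pdfkit.from_string, and the tallest bar always fills the full chart height.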

Alternatives to PDFKit

WeasyPrint

  • No wkhtmltopdf binary or browser engine needed, though it relies on native libraries such as Pango
  • Better CSS support
  • Slower than PDFKit

ReportLab

  • Programmatic PDF creation
  • Greater layout control
  • Steeper learning curve

xhtml2pdf

  • Pure Python
  • Limited CSS support
  • Suitable for simple documents

Playwright PDF

  • Modern alternative
  • Better JavaScript handling
  • Higher resource consumption

Conclusion

PDFKit is a flexible tool for generating PDF documents from HTML in Python. Its key strengths are ease of use, reuse of existing HTML and CSS skills, and the ability to produce professional-grade documents with minimal effort.

Because rendering is delegated to wkhtmltopdf's WebKit engine, PDFKit handles complex layouts, web fonts, and SVG well, although the newest CSS features may not be available. This makes it a good choice for projects where document appearance matters and where developers already work with web technologies.

When wkhtmltopdf is properly configured and generation options are tuned, PDFKit works well in both small applications and higher-throughput systems, covering a wide range of business document needs.
