Fundamentals of regular expression syntax in Python: creating patterns for searching and processing string data

What are regular expressions in Python

Regular expressions (regex) in Python are a powerful tool for working with text data. They let you search, extract, validate, and replace text based on predefined patterns. The re module in the standard library provides a complete set of functions for working with regular expressions.

Syntax of regular expressions

Special characters

Symbol   Description
.        Any character except the newline character (\n)
^        Start of the string (with re.MULTILINE, also the start of each line)
$        End of the string (with re.MULTILINE, also the end of each line)
*        Zero or more repetitions of the preceding element
+        One or more repetitions of the preceding element
?        Zero or one repetition of the preceding element
{n}      Exactly n repetitions of the preceding element
{n,}     At least n repetitions of the preceding element
{n,m}    From n to m repetitions of the preceding element
\        Escapes a special character
[]       Character set: matches any one of the characters listed inside
|        Alternation: matches the expression on either side
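The entries in this table are easiest to compare side by side. The following is a minimal sketch using made-up sample strings; the variable names are illustrative only.

```python
import re

# Quantifiers apply to the element immediately before them (here, the letter b).
star = re.findall(r'ab*', 'a ab abbb')            # ['a', 'ab', 'abbb']
plus = re.findall(r'ab+', 'a ab abbb')            # ['ab', 'abbb']
optional = re.findall(r'ab?', 'a ab abbb')        # ['a', 'ab', 'ab']
bounded = re.findall(r'ab{2,3}', 'ab abb abbbb')  # ['abb', 'abbb']

# Anchors tie the match to the start or the end of the string.
starts = re.search(r'^cat', 'cat food') is not None  # True
ends = re.search(r'cat$', 'cat food') is not None    # False

# An unescaped dot matches any character; an escaped dot matches only '.'.
loose = re.findall(r'\d.\d', '3.5 and 4x2')    # ['3.5', '4x2']
strict = re.findall(r'\d\.\d', '3.5 and 4x2')  # ['3.5']

# A character set matches any one of the listed characters.
vowels = re.findall(r'[aeiou]', 'regex')       # ['e', 'e']

print(star, plus, optional, bounded, starts, ends, loose, strict, vowels)
```

Note that a quantifier binds only to the single element before it, so in `ab+` only `b` is repeated, not `ab`.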

Special sequences

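Sequence  Description
\d        Digit (0-9; in str patterns also other Unicode decimal digits unless re.ASCII is set)
\D        Any character that is not a digit
\w        Word character: letter, digit, or underscore (a-z, A-Z, 0-9, _; in str patterns also other Unicode letters and digits unless re.ASCII is set)
\W        Any character that is not a word character
\s        Whitespace character (space, tab, newline)
\S        Non-whitespace character
\b        Word boundary
\B        Not a word boundary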
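A short sketch of how these sequences behave on sample input. The strings and variable names are invented for illustration; the last two lines show that, for str patterns, \d also accepts non-ASCII digits unless re.ASCII is passed.

```python
import re

sample = 'Order 66: 3 cats, 12 dogs_ok'

digits = re.findall(r'\d+', sample)    # ['66', '3', '12']
words = re.findall(r'\w+', sample)     # ['Order', '66', '3', 'cats', '12', 'dogs_ok']
non_word = re.findall(r'\W+', 'a, b')  # [', ']

# \s matches spaces, tabs, and newlines alike.
fields = re.split(r'\s+', 'a  b\tc\nd')  # ['a', 'b', 'c', 'd']

# \b matches only at the edge of a word; \B matches only inside one.
whole_words = re.findall(r'\bcat\b', 'cat concat cats cat.')  # ['cat', 'cat']
inside = re.findall(r'\Bcat', 'cat concat cats')              # ['cat'] (from 'concat')

# U+0663 is the Arabic-Indic digit three.
arabic_three = '\u0663'
unicode_digit = re.fullmatch(r'\d', arabic_three) is not None          # True
ascii_digit = re.fullmatch(r'\d', arabic_three, re.ASCII) is not None  # False

print(digits, words, non_word, fields, whole_words, inside, unicode_digit, ascii_digit)
```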

Grouping and modifiers

Element                   Description
()                        Groups part of the pattern and captures the matched text
\1, \2, ...               Backreferences to the text matched by earlier groups
re.IGNORECASE (or re.I)   Ignore case when matching
re.MULTILINE (or re.M)    Allow ^ and $ to match at the beginning and end of each line
re.DOTALL (or re.S)       The . character matches any character, including a newline
re.VERBOSE (or re.X)      Allows whitespace and comments inside the pattern
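The sketch below shows a backreference and each of the four flags in action. The sample strings and variable names are made up for the example.

```python
import re

# Backreference: \1 must repeat exactly what group 1 matched (finds doubled words).
doubled = re.findall(r'\b(\w+) \1\b', 'it is is the the end')  # ['is', 'the']

# re.IGNORECASE: letter case is ignored.
any_case = re.findall(r'python', 'Python PYTHON python', re.IGNORECASE)

# re.MULTILINE: ^ matches at the start of every line, not only the string.
first_only = re.findall(r'^\w+', 'one two\nthree four')               # ['one']
each_line = re.findall(r'^\w+', 'one two\nthree four', re.MULTILINE)  # ['one', 'three']

# re.DOTALL: the dot also matches a newline.
without_dotall = re.search(r'a.b', 'a\nb') is not None          # False
with_dotall = re.search(r'a.b', 'a\nb', re.DOTALL) is not None  # True

# re.VERBOSE: whitespace and comments inside the pattern are ignored.
year_month = re.compile(r'''
    (\d{4})   # year
    -
    (\d{2})   # month
''', re.VERBOSE)
parts = year_month.search('2023-12').groups()  # ('2023', '12')

print(doubled, any_case, first_only, each_line, without_dotall, with_dotall, parts)
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.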

Basic functions of the re module

1. re.compile(pattern, flags=0)

Compiles a regular expression into a Pattern object for reuse.

import re

# Compiling a regular expression to search for numbers
pattern = re.compile(r'\d+')
text = 'Price: 1500 rubles'
result = pattern.search(text)
print(result.group()) # Output: 1500

2. re.search(pattern, string, flags=0)

Searches for the first regular expression match in a string.

import re

text = 'Phone: +7 (123) 456-78-90'
match = re.search(r'\+\d{1,3}', text)
if match:
    print(match.group()) # Output: +7

3. re.match(pattern, string, flags=0)

Checks for a match only at the beginning of the string.

import re

text = '2023-12-25 - date of the event'
match = re.match(r'\d{4}-\d{2}-\d{2}', text)
if match:
    print(match.group()) # Output: 2023-12-25
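The difference between re.match, re.search, and re.fullmatch is easiest to see on the same input. A minimal sketch with invented sample strings:

```python
import re

text = 'id 42'

# re.match only tries position 0, so digits later in the string are not found.
at_start = re.match(r'\d+', text)         # None
anywhere = re.search(r'\d+', text)        # matches '42'

# re.match does not require the match to reach the end of the string...
prefix = re.match(r'\d+', '42 items')     # matches '42'
# ...while re.fullmatch requires the whole string to match.
whole = re.fullmatch(r'\d+', '42 items')  # None
exact = re.fullmatch(r'\d+', '42')        # matches '42'

print(at_start, anywhere.group(), prefix.group(), whole, exact.group())
```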

4. re.findall(pattern, string, flags=0)

Finds all non-overlapping matches and returns them as a list.

import re

text = 'Email: user@example.com, admin@site.ru'
emails = re.findall(r'\w+@\w+\.\w+', text)
print(emails) # Output: ['user@example.com', 'admin@site.ru']
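One detail worth knowing: when the pattern contains capture groups, re.findall returns the groups rather than the full match. A short sketch on the same sample text (variable names are illustrative):

```python
import re

text = 'Email: user@example.com, admin@site.ru'

# One group: the list contains only what that group captured.
domains = re.findall(r'\w+@(\w+\.\w+)', text)
# ['example.com', 'site.ru']

# Several groups: the list contains tuples, one item per group.
pairs = re.findall(r'(\w+)@(\w+\.\w+)', text)
# [('user', 'example.com'), ('admin', 'site.ru')]

# A non-capturing group (?:...) groups without changing the result.
full = re.findall(r'\w+@(?:\w+\.\w+)', text)
# ['user@example.com', 'admin@site.ru']

print(domains, pairs, full)
```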

5. re.finditer(pattern, string, flags=0)

Returns an iterator of Match objects for all matches.

import re

text = 'Prices: 100p, 250p, 500p'
matches = re.finditer(r'\d+', text)
for match in matches:
    print(f'Number found: {match.group()} at position {match.start()}')

6. re.sub(pattern, repl, string, count=0, flags=0)

Replaces matches with the specified replacement (all of them when count is 0).

import re

text = 'Date: 25.12.2023'
# Converting the date from DD.MM.YYYY to YYYY-MM-DD
new_text = re.sub(r'(\d{2})\.(\d{2})\.(\d{4})', r'\3-\2-\1', text)
print(new_text) # Output: Date: 2023-12-25
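The repl argument can also be a function, and count limits the number of replacements. The sketch below illustrates both, plus re.subn, which also reports how many replacements were made. The helper name double_number is hypothetical.

```python
import re

def double_number(match):
    # Hypothetical helper: receives a Match object and returns the replacement text.
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double_number, 'Prices: 100p, 250p, 500p')
# 'Prices: 200p, 500p, 1000p'

# count=1 replaces only the first match.
first_only = re.sub(r'\d+', '#', 'a1 b2 c3', count=1)  # 'a# b2 c3'

# re.subn returns a tuple: (new string, number of replacements).
result, replaced = re.subn(r'\d+', '#', 'a1 b2 c3')    # ('a# b# c#', 3)

print(doubled, first_only, result, replaced)
```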

7. re.split(pattern, string, maxsplit=0, flags=0)

Splits a string using a regular expression.

import re

text = 'apples,pears;oranges:tangerines'
fruits = re.split(r'[,;:]', text)
print(fruits) # Output: ['apples', 'pears', 'oranges', 'tangerines']
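Two variations on re.split that often come up, shown as a small sketch with invented input: limiting the number of splits with maxsplit, and keeping the separators by wrapping the pattern in a capture group.

```python
import re

text = 'apples,pears;oranges:tangerines'

# maxsplit=1 splits at the first separator only.
head_tail = re.split(r'[,;:]', text, maxsplit=1)
# ['apples', 'pears;oranges:tangerines']

# A capture group keeps the separators in the result.
with_separators = re.split(r'([,;:])', 'apples,pears;oranges')
# ['apples', ',', 'pears', ';', 'oranges']

# Optional whitespace around the separator can be absorbed by the pattern.
trimmed = re.split(r'\s*,\s*', 'a , b,c ,  d')
# ['a', 'b', 'c', 'd']

print(head_tail, with_separators, trimmed)
```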

Practical usage examples

Validation of email addresses

import re

def validate_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return re.match(pattern, email) is not None

print(validate_email('user@example.com'))  # True
print(validate_email('invalid-email'))     # False
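One caveat with the ^...$ approach: $ also matches just before a trailing newline, so a string ending in '\n' passes the check. A possible alternative, sketched here under the assumption that the whole string must be an address, is re.fullmatch. The names EMAIL_PATTERN and validate_email_strict are hypothetical.

```python
import re

# Same character classes as the pattern above, without the ^ and $ anchors.
EMAIL_PATTERN = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def validate_email_strict(email):
    # fullmatch succeeds only if the entire string matches the pattern.
    return EMAIL_PATTERN.fullmatch(email) is not None

# The anchored re.match version accepts a trailing newline...
anchored = re.match(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$',
                    'user@example.com\n') is not None
print(anchored)                                     # True

# ...while fullmatch rejects it.
print(validate_email_strict('user@example.com\n'))  # False
print(validate_email_strict('user@example.com'))    # True
```

Neither pattern covers every address permitted by the email standards; both are simple format checks.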

Extracting phone numbers

import re

text = '''
Contacts:
+7 (123) 456-78-90
8-800-555-35-35
+7 987 654 32 10
'''

phone_pattern = r'\+?[78][-\s]?(?:\(\d{3}\)|\d{3})[-\s]?\d{3}[-\s]?\d{2}[-\s]?\d{2}'
phones = re.findall(phone_pattern, text)
print(phones)

Working with HTML tags

import re

html = '<p>This is a paragraph</p><div>This is a div</div>'
# Delete all HTML tags
clean_text = re.sub(r'<[^>]+>', '', html)
print(clean_text) # Output: This is a paragraphThis is a div

Search for words of a certain length

import re

text = 'Python is a powerful programming language'
# Find all words from 5 to 8 characters long
long_words = re.findall(r'\b\w{5,8}\b', text)
print(long_words) # Output: ['Python', 'powerful', 'language']

Capture groups and named groups

import re

text = 'Date of birth: 15.03.1990'
# Using capture groups
match = re.search(r'(\d{2})\.(\d{2})\.(\d{4})', text)
if match:
    day, month, year = match.groups()
    print(f'Day: {day}, Month: {month}, Year: {year}')

# Named groups
pattern = r'(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{4})'
match = re.search(pattern, text)
if match:
    print(f'Year: {match.group("year")}')
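Named groups can also be read all at once with groupdict() and referenced in a replacement string as \g<name>. A brief sketch, using its own sample date string:

```python
import re

date_pattern = re.compile(r'(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{4})')
sample = 'Date of birth: 15.03.1990'

match = date_pattern.search(sample)
fields = match.groupdict()
# {'day': '15', 'month': '03', 'year': '1990'}

# \g<name> inserts the text captured by the named group.
iso = date_pattern.sub(r'\g<year>-\g<month>-\g<day>', sample)
# 'Date of birth: 1990-03-15'

print(fields, iso)
```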

Optimization tips

1. Compile regular expressions for reuse

import re

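# Sample data (illustrative)
texts = ['room 101', 'no digits here', 'flight 370']

# Less efficient - the pattern is looked up in the re cache on every call
for text in texts:
    re.search(r'\d+', text)

# Efficient - compiles once and reuses the Pattern object
pattern = re.compile(r'\d+')
for text in texts:
    pattern.search(text)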

2. Use raw strings

# Preferred - raw string, backslashes reach the re module unchanged
pattern = r'\d+\.\d+'

# Equivalent but harder to read - every backslash must be doubled
pattern = '\\d+\\.\\d+'
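Why raw strings matter is clearest with \b, which Python itself interprets as a backspace character in an ordinary string literal. A small sketch (variable names are illustrative):

```python
import re

# In a normal literal '\b' is one character (backspace); in a raw literal it is two.
normal_len = len('\b')   # 1
raw_len = len(r'\b')     # 2

# Without the r prefix the pattern looks for backspace characters, not word boundaries.
broken = re.search('\bcat\b', 'a cat') is not None    # False
working = re.search(r'\bcat\b', 'a cat') is not None  # True

# A raw string and its doubled-backslash form are the same string.
same = r'\d+\.\d+' == '\\d+\\.\\d+'                   # True

print(normal_len, raw_len, broken, working, same)
```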
 
