Using regular expressions in Python to search and replace text: methods and examples of working with the re module.


A self-study guide for Python 3 compiled from the materials on this site. Primarily intended for those who want to learn the Python programming language from scratch.

Regular expressions in Python: working with text made easy

Regular expressions (regex) are a powerful tool for searching and processing text in Python. The re module provides all the necessary tools for working with regular expressions. In this article, we will look at the main methods and practical techniques for using them.
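As a quick orientation, here is a minimal sketch of three commonly used functions of the re module: re.search, re.findall, and re.sub. The sample string and the date pattern are made up for illustration and are not taken from the article.

```python
import re

# Hypothetical sample text used only for this illustration.
text = "2023-01-15: backup done; 2023-02-01: backup failed"
date_pattern = r"\d{4}-\d{2}-\d{2}"

# re.search returns the first match anywhere in the string, or None.
first = re.search(date_pattern, text)
first_date = first.group() if first else None

# re.findall returns every non-overlapping match as a list of strings.
all_dates = re.findall(date_pattern, text)

# re.sub replaces every match; here each date is masked.
masked = re.sub(date_pattern, "<date>", text)

print(first_date)
print(all_dates)
print(masked)
```

The same pattern string is reused for searching and for replacing, which is usually the simplest way to keep both operations consistent.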

 

Basic metacharacters of regular expressions

Metacharacter  Description                     Example   Matches
.              Any character except a newline  a.c       abc, a1c, a@c
^              Start of the string             ^hello    hello world (at the beginning)
$              End of the string               world$    hello world (at the end)
*              0 or more repetitions           ab*c      ac, abc, abbc
+              1 or more repetitions           ab+c      abc, abbc (not ac)
?              0 or 1 repetition               ab?c      ac, abc
{n}            Exactly n repetitions           a{3}      aaa
{n,}           n or more repetitions           a{2,}     aa, aaa, aaaa
{n,m}          From n to m repetitions         a{2,4}    aa, aaa, aaaa
[]             Character class                 [abc]     a, b, c
[^]            Negated character class         [^abc]    Any character except a, b, c
|              Alternation (OR)                cat|dog   cat or dog
()             Grouping                        (ab)+     ab, abab, ababab
\              Escaping                        \$        Literal character $
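The examples from the table can be checked mechanically. The sketch below uses re.fullmatch, which succeeds only when the whole string matches the pattern. The lists of non-matching strings are illustrative additions, not part of the table.

```python
import re

# (pattern, strings that should match in full, strings that should not)
cases = [
    (r"a.c", ["abc", "a1c", "a@c"], ["ac", "a\nc"]),
    (r"ab*c", ["ac", "abc", "abbc"], ["adc"]),
    (r"ab+c", ["abc", "abbc"], ["ac"]),
    (r"ab?c", ["ac", "abc"], ["abbc"]),
    (r"a{2,4}", ["aa", "aaa", "aaaa"], ["a", "aaaaa"]),
    (r"[^abc]", ["d", "1"], ["a", "b"]),
    (r"cat|dog", ["cat", "dog"], ["cow"]),
    (r"(ab)+", ["ab", "abab"], ["aba"]),
    (r"\$", ["$"], ["s"]),
]


def check(pattern, good, bad):
    """Return True if all 'good' strings match fully and no 'bad' string does."""
    ok_good = all(re.fullmatch(pattern, s) for s in good)
    ok_bad = not any(re.fullmatch(pattern, s) for s in bad)
    return ok_good and ok_bad


results = {pattern: check(pattern, good, bad) for pattern, good, bad in cases}
for pattern, ok in results.items():
    print(pattern, "OK" if ok else "FAILED")
```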

Character classes

Class  Description                            ASCII equivalent
\d     Digit                                  [0-9]
\D     Non-digit                              [^0-9]
\w     Letter, digit, or underscore           [a-zA-Z0-9_]
\W     Not a letter, digit, or underscore     [^a-zA-Z0-9_]
\s     Whitespace character                   [ \t\n\r\f\v]
\S     Non-whitespace character               [^ \t\n\r\f\v]
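A short sketch of the shorthand classes in action. The sample strings are invented for this illustration. For str patterns in Python 3 the classes are Unicode-aware by default, and the re.ASCII flag restricts them to the bracketed ASCII ranges.

```python
import re

# Hypothetical sample line containing a tab and a trailing newline.
sample = "Order 66:\tship_to=Tatooine\n"

digits = re.findall(r"\d+", sample)       # runs of digits
words = re.findall(r"\w+", sample)        # runs of letters, digits, underscores
non_space = re.findall(r"\S+", sample)    # runs of non-whitespace characters
spaces = re.findall(r"\s", sample)        # individual whitespace characters

# By default \w also matches non-ASCII letters; re.ASCII turns that off.
accented = "na\u00efve caf\u00e9"
unicode_words = re.findall(r"\w+", accented)
ascii_words = re.findall(r"\w+", accented, re.ASCII)

print(digits, words, non_space, spaces)
print(unicode_words, ascii_words)
```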

Quantifiers (greedy and lazy)

Greedy  Lazy     Description
*       *?       0 or more (as few as possible)
+       +?       1 or more (as few as possible)
?       ??       0 or 1 (as few as possible)
{n,m}   {n,m}?   From n to m (as few as possible)
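The difference between the two columns is easiest to see on a string with several possible end points. This is a minimal sketch; the quoted sample text is an assumption made for the example.

```python
import re

# Hypothetical sample with two quoted fragments.
text = '"one" and "two"'

# Greedy: .* grabs as much as possible, so the match runs to the last quote.
greedy = re.findall(r'".*"', text)

# Lazy: .*? grabs as little as possible, so each quoted fragment is separate.
lazy = re.findall(r'".*?"', text)

# The same applies to counted repetition.
greedy_count = re.match(r"a{2,4}", "aaaa").group()
lazy_count = re.match(r"a{2,4}?", "aaaa").group()

print(greedy)
print(lazy)
print(greedy_count, lazy_count)
```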

Word boundaries

Metacharacter  Description         Example
\b             Word boundary       \bcat\b finds cat as a separate word
\B             Non-word boundary   \Bcat\B finds cat inside a word
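A brief sketch of how \b and \B behave on the same input. The sample sentence is made up; the spans show where each match was found.

```python
import re

# Hypothetical sample: "cat" appears alone, inside, at the end and at the start of words.
text = "cat concatenate bobcat cats"

# \bcat\b: "cat" with a word boundary on both sides (a standalone word).
standalone = [m.span() for m in re.finditer(r"\bcat\b", text)]

# \Bcat\B: "cat" with word characters on both sides (strictly inside a word).
inside = [m.span() for m in re.finditer(r"\Bcat\B", text)]

# Replacing only the standalone word leaves the other words untouched.
replaced = re.sub(r"\bcat\b", "dog", text)

print(standalone)
print(inside)
print(replaced)
```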

Other special sequences

Sequence  Description
\n        Newline character
\t        Tab character
\r        Carriage return character
\f        Form feed character
\v        Vertical tab character
\0        Null character
\xhh      Character with hexadecimal code hh

Use cases

import re

# Email search
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# Phone number search
phone_pattern = r'\+?[1-9]\d{1,14}'

# Search for dates in the DD.MM.YYYY format
date_pattern = r'\d{2}\.\d{2}\.\d{4}'

# Search for an IP address
ip_pattern = r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b'
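One way to try these four kinds of pattern is to compile them once and run each against its own sample. This is a sketch under stated assumptions: the sample strings are invented, and the email pattern is written without the stray space and without the "|" inside the character class, since inside brackets "|" is a literal character.

```python
import re

# Compiled versions of the four patterns (email class cleaned up as noted above).
patterns = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "phone": re.compile(r"\+?[1-9]\d{1,14}"),
    "date": re.compile(r"\d{2}\.\d{2}\.\d{4}"),
    "ip": re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b"),
}

# Hypothetical sample strings, one per pattern, to avoid cross-matches
# (the phone pattern would also match digit runs inside dates or IPs).
samples = {
    "email": "Write to admin@example.com today",
    "phone": "Call +79261234567 now",
    "date": "Released on 05.11.2021",
    "ip": "Server at 192.168.0.10 is up",
}

found = {name: patterns[name].findall(samples[name]) for name in patterns}
for name, matches in found.items():
    print(name, matches)
```

Compiling with re.compile is optional here; it mainly helps when the same pattern is applied many times.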

 

The dot (.): any character

The dot metacharacter matches any character except the newline character. It is one of the most frequently used metacharacters in regular expressions.

 

import re

text = "cat, bat, mat, rat"
pattern = r".at"  # Any single character followed by "at"
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'bat', 'mat', 'rat']
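Because the dot skips newlines by default, a pattern can silently fail on multi-line input. A minimal sketch of the re.DOTALL flag, which makes the dot match newlines as well (the two-line sample string is an assumption for the example):

```python
import re

# Hypothetical two-line sample: "a", a newline, then "b".
text = "a\nb"

# Default behaviour: the dot does not match the newline, so nothing is found.
default_matches = re.findall(r"a.b", text)

# With re.DOTALL the dot matches any character, including the newline.
dotall_matches = re.findall(r"a.b", text, re.DOTALL)

print(default_matches)
print(dotall_matches)
```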

Start-of-string anchor (^)

The ^ symbol marks the start of the string. It lets you find a match only if it is located at the beginning of the string.

import re

text = "hello world"
pattern = r"^hello"  # Matches "hello" only at the start of the string
match = re.search(pattern, text)
if match:
    print(match.group())  # Output: hello
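By default ^ anchors only at the very start of the whole string. A short sketch of the re.MULTILINE flag, which lets ^ also match at the start of each line (the three-line sample text is made up for this example):

```python
import re

# Hypothetical three-line sample.
text = "hello world\nhello again\nsay hello"

# Without flags, ^ matches only at the start of the whole string.
single = re.findall(r"^hello", text)

# With re.MULTILINE, ^ also matches right after each newline.
per_line = re.findall(r"^hello", text, re.MULTILINE)

print(single)
print(per_line)
```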

End-of-string anchor ($)

The $ symbol marks the end of the string. It is used to find patterns at the end of the text.

import re

text = "hello world"
pattern = r"world$"  # Matches "world" only at the end of the string
match = re.search(pattern, text)
if match:
    print(match.group())  # Output: world
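Two details of $ are worth a small sketch: it also matches just before a trailing newline, and with re.MULTILINE it matches at the end of every line. \Z is the stricter anchor that matches only at the very end. The sample strings are assumptions for the example.

```python
import re

# Hypothetical sample with a trailing newline, as often read from a file.
with_newline = "hello world\n"

# $ tolerates one trailing newline; \Z does not.
dollar_found = re.search(r"world$", with_newline) is not None
strict_found = re.search(r"world\Z", with_newline) is not None

# With re.MULTILINE, $ matches at the end of each line.
lines = "first line\nsecond row"
last_only = re.findall(r"\w+$", lines)
last_per_line = re.findall(r"\w+$", lines, re.MULTILINE)

print(dollar_found, strict_found)
print(last_only, last_per_line)
```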

Quantifiers in regular expressions

Asterisk (*): zero or more occurrences

The quantifier * matches zero or more occurrences of the preceding element.

import re

text = "ac abc abbc abbbc"
pattern = r"ab*c"  # "a", then zero or more "b", then "c"
matches = re.findall(pattern, text)
print(matches)  # Output: ['ac', 'abc', 'abbc', 'abbbc']

Plus (+): one or more occurrences

The quantifier + requires at least one occurrence of the preceding element.

import re

text = "ac abc abbc abbbc"
pattern = r"ab+c"  # "a", then one or more "b", then "c"
matches = re.findall(pattern, text)
print(matches)  # Output: ['abc', 'abbc', 'abbbc']

Question mark (?): zero or one occurrence

The ? quantifier makes the preceding element optional.

import re

text = "color colour"
pattern = r"colou?r"  # Matches "color" and "colour": the "u" is optional
matches = re.findall(pattern, text)
print(matches)  # Output: ['color', 'colour']

Practical ways to use regular expressions

 
Extracting email addresses
Searching for and extracting email addresses is one of the most common tasks when working with regular expressions.
import re

text = "Emails: test@example.com, another.test@mail.co.uk"
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
matches = re.findall(pattern, text)
print(matches)  # Output: ['test@example.com', 'another.test@mail.co.uk']
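Extraction and validation are slightly different tasks. To check whether a whole string looks like an email address, re.fullmatch can be used instead of re.findall. The sketch below is a rough shape check only, not a complete validator; the helper name looks_like_email and the candidate strings are hypothetical.

```python
import re

# Same character sets as the search pattern, without \b since fullmatch
# already requires the whole string to match.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def looks_like_email(value):
    """Return True if the entire string has the rough shape of an email address."""
    return EMAIL_RE.fullmatch(value) is not None


# Hypothetical candidates for illustration.
candidates = [
    "test@example.com",
    "no-at-sign.example.com",
    "user@localhost",
    "a@b.co extra",
]
checks = {value: looks_like_email(value) for value in candidates}
for value, ok in checks.items():
    print(value, ok)
```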

Extracting URLs from text
Regular expressions help you find web addresses in various formats.
import re

text = "Visit my website at https://www.example.com"
pattern = r"https?://(?:www\.)?([a-zA-Z0-9-]+\.[a-zA-Z]{2,})"
matches = re.findall(pattern, text)
print(matches)  # Output: ['example.com']
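When a pattern contains a capturing group, re.findall returns only the group, which is why the scheme and "www." are missing from the result. A minimal sketch with re.finditer that keeps both the full match and the captured domain (the sample text is an assumption; note that this pattern does not cover paths, so "/about" is not part of the match):

```python
import re

# Hypothetical sample with two links.
text = "Docs: https://www.example.com and http://python.org/about"
pattern = r"https?://(?:www\.)?([a-zA-Z0-9-]+\.[a-zA-Z]{2,})"

# group(0) is the whole match, group(1) is the captured domain.
links = [(m.group(0), m.group(1)) for m in re.finditer(pattern, text)]

for full, domain in links:
    print(full, "->", domain)
```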

Hashtag search

When analyzing social media, you often need to extract hashtags from text.

 
import re

text = "Check out #Python and #DataScience on Twitter!"
pattern = r"#\w+"
matches = re.findall(pattern, text)
print(matches)  # Output: ['#Python', '#DataScience']
 
IP address search
 

Regular expressions efficiently find IP addresses in logs and configuration files.

 
import re

text = "IP addresses: 192.168.1.1 and 10.0.0.1"
pattern = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"
matches = re.findall(pattern, text)
print(matches)  # Output: ['192.168.1.1', '10.0.0.1']
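The pattern checks only the shape of an address, so it also accepts values such as 999.1.1.1. One possible follow-up, sketched here under that assumption, is to filter the candidates by octet range in plain Python; the function name valid_ipv4_addresses and the sample text are hypothetical.

```python
import re

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def valid_ipv4_addresses(text):
    """Return candidates whose four octets are all in the range 0..255."""
    result = []
    for candidate in IP_RE.findall(text):
        octets = candidate.split(".")
        if all(0 <= int(octet) <= 255 for octet in octets):
            result.append(candidate)
    return result


# Hypothetical sample containing one out-of-range candidate.
sample = "Seen 192.168.1.1, 10.0.0.1 and 999.1.1.1"
shape_only = IP_RE.findall(sample)
valid = valid_ipv4_addresses(sample)

print(shape_only)
print(valid)
```

Keeping the regular expression simple and doing the numeric check separately tends to be easier to read than encoding the 0..255 range in the pattern itself.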
 
Dates in the DD/MM/YYYY format

Dates are extracted from the text using capturing groups.

 
import re

text = "Dates: 20/01/2022 and 31/12/2023"
pattern = r"\b(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/(\d{4})\b"
matches = re.findall(pattern, text)
print(matches)  # Output: [('20', '01', '2022'), ('31', '12', '2023')]
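The pattern limits the day to 01..31 and the month to 01..12, but it cannot know that February has no 31st day. A hedged sketch of one way to combine it with datetime.strptime so that impossible dates are dropped; the function name real_dates and the sample text are assumptions for this example.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\b(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/(\d{4})\b")


def real_dates(text):
    """Return date objects for matches that are valid calendar dates."""
    dates = []
    for match in DATE_RE.finditer(text):
        try:
            parsed = datetime.strptime(match.group(), "%d/%m/%Y")
        except ValueError:
            # The shape matched, but the calendar date does not exist.
            continue
        dates.append(parsed.date())
    return dates


# Hypothetical sample: the middle date matches the pattern but is not a real date.
sample = "Dates: 20/01/2022, 31/02/2022 and 31/12/2023"
shape_matches = [m.group() for m in DATE_RE.finditer(sample)]
iso_dates = [d.isoformat() for d in real_dates(sample)]

print(shape_matches)
print(iso_dates)
```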
 
HTML tag search
 

Lazy quantifiers are used to work with HTML markup, because a greedy pattern would run from the first opening bracket to the last closing one.
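A minimal sketch of the idea, assuming a small flat HTML fragment invented for this example. Regular expressions handle simple snippets like this one; nested or malformed markup is generally better left to an HTML parser.

```python
import re

# Hypothetical flat HTML fragment.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">", so the whole fragment is one match.
greedy_tags = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">", so each tag is matched separately.
lazy_tags = re.findall(r"<.+?>", html)

# Tag name and content: \1 refers back to the captured tag name.
contents = re.findall(r"<(\w+)>(.*?)</\1>", html)

print(greedy_tags)
print(lazy_tags)
print(contents)
```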
