Regular expressions in Python: an easy way to work with them
Regular expressions (regex) are a powerful tool for searching and processing text in Python. The re module provides everything needed to work with them. This article covers modern methods and practical techniques for using them.
Basic metacharacters of regular expressions
| Metacharacter | Description | Example | Matches |
|---|---|---|---|
| . | Any character except newline | a.c | abc, a1c, a@c |
| ^ | Start of the line | ^hello | hello world (at the beginning) |
| $ | End of the line | world$ | hello world (at the end) |
| * | 0 or more repetitions | ab*c | ac, abc, abbc |
| + | 1 or more repetitions | ab+c | abc, abbc (not ac) |
| ? | 0 or 1 repetition | ab?c | ac, abc |
| {n} | Exactly n repetitions | a{3} | aaa |
| {n,} | n or more repetitions | a{2,} | aa, aaa, aaaa |
| {n,m} | From n to m repetitions | a{2,4} | aa, aaa, aaaa |
| [] | Character class | [abc] | a, b, c |
| [^] | Negated character class | [^abc] | Any character except a, b, c |
| \| | Alternation (OR) | cat\|dog | cat or dog |
| () | Grouping | (ab)+ | ab, abab, ababab |
| \ | Escaping | \$ | Literal character $ |
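As a minimal sketch of how a few of these metacharacters behave, the snippet below tests candidate strings with re.fullmatch, which succeeds only when the whole string matches. The checks list and the full_matches helper are illustrative names chosen for this example.

```python
import re

# (pattern, candidate strings) pairs; purely illustrative data.
checks = [
    (r"a{2,4}", ["a", "aa", "aaaa"]),
    (r"[^abc]", ["a", "d"]),
    (r"cat|dog", ["cat", "dog", "cow"]),
    (r"(ab)+", ["ab", "abab", "aba"]),
    (r"\$", ["$", "S"]),
]


def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


for pattern, candidates in checks:
    print(pattern, "->", full_matches(pattern, candidates))
```

Using re.fullmatch rather than re.search keeps partial hits (such as the "ab" inside "aba") from counting as matches.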
Character classes
| Class | Description | ASCII equivalent |
|---|---|---|
| \d | Digit | [0-9] |
| \D | Non-digit | [^0-9] |
| \w | Letter, digit, or underscore | [a-zA-Z0-9_] |
| \W | Not a letter, digit, or underscore | [^a-zA-Z0-9_] |
| \s | Whitespace character | [ \t\n\r\f\v] |
| \S | Non-whitespace character | [^ \t\n\r\f\v] |
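The bracket equivalents in the table hold exactly only for ASCII matching. For str patterns in Python 3, \d, \w and \s are Unicode-aware by default, and the re.ASCII flag restricts them to the ASCII ranges. The following sketch illustrates the difference; the sample string is an assumption made up for the example.

```python
import re

# "Zo\u00eb_K" is "Zoë_K" written with an escape so the ë is one code point.
text = "Zo\u00eb_K 42"

# Default (Unicode) matching: ë counts as a word character.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w means exactly [a-zA-Z0-9_], so ë splits the word.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

digits = re.findall(r"\d+", text)
parts = re.split(r"\s+", text)

print(unicode_words)
print(ascii_words)
print(digits, parts)
```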
Quantifiers (greedy and lazy)
| Greedy | Lazy | Description |
|---|---|---|
| * | *? | 0 or more (as few as possible) |
| + | +? | 1 or more (as few as possible) |
| ? | ?? | 0 or 1 (as few as possible) |
| {n,m} | {n,m}? | From n to m (as few as possible) |
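A short sketch of the difference, using a made-up sentence with two quoted fragments: the greedy form stretches to the last closing quote, while the lazy form stops at the first one it can.

```python
import re

text = 'say "hello" and "bye"'

# Greedy: .* grabs as much as possible, so one match spans both quotes.
greedy = re.findall(r'".*"', text)

# Lazy: .*? grabs as little as possible, so each quoted part is separate.
lazy = re.findall(r'".*?"', text)

print(greedy)  # ['"hello" and "bye"']
print(lazy)    # ['"hello"', '"bye"']
```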
Word boundaries
| Metacharacter | Description | Example |
|---|---|---|
| \b | Word boundary | \bcat\b finds cat as a separate word |
| \B | Not a word boundary | \Bcat\B finds cat inside a word |
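To make the two anchors concrete, here is a small sketch over a hypothetical word list. Note that \Bcat\B needs word characters on both sides of "cat", so a word that merely starts or ends with "cat" does not qualify.

```python
import re

words = "cat concat catalog bobcats".split()

# \bcat\b: "cat" with a word boundary on each side.
standalone = [w for w in words if re.search(r"\bcat\b", w)]

# \Bcat\B: "cat" with word characters on each side.
embedded = [w for w in words if re.search(r"\Bcat\B", w)]

print(standalone)  # ['cat']
print(embedded)    # ['bobcats']
```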
Other special characters
| Sequence | Description |
|---|---|
| \n | Newline character |
| \t | Tab character |
| \r | Carriage return character |
| \f | Form feed character |
| \v | Vertical tab character |
| \0 | Null character |
| \xhh | Character with hexadecimal code hh |
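Because Python string literals process some of these backslash sequences themselves, patterns are usually written as raw strings (the r prefix). The sketch below, with made-up sample strings, shows what can go wrong without the prefix and how \xhh behaves inside a pattern.

```python
import re

# Without the r prefix, "\b" is a single backspace character (ASCII 8),
# so the regex engine never sees a word-boundary sequence.
plain = re.findall("\bcat\b", "a cat")

# With a raw string, the re module receives \b and treats it as a boundary.
raw = re.findall(r"\bcat\b", "a cat")

# \xhh names a character by its two-digit hex code: \x41 is "A", \x42 is "B".
hex_match = re.findall(r"\x41\t\x42", "A\tB")

print(plain)      # []
print(raw)        # ['cat']
print(hex_match)  # ['A\tB']
```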
Use cases
import re
# Email search
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
# Phone number search
phone_pattern = r'\+?[1-9]\d{1,14}'
# Date search in the format DD.MM.YYYY
date_pattern = r'\d{2}\.\d{2}\.\d{4}'
# IP address search
ip_pattern = r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b'
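One way to put such patterns to work is to validate whole strings with re.fullmatch. The sketch below reuses the patterns from this section, with the email top-level domain written as [A-Za-z]; the PATTERNS dictionary and the is_valid helper are hypothetical names introduced only for this example.

```python
import re

# Patterns from this section, keyed by a short label (illustrative names).
PATTERNS = {
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    "phone": r"\+?[1-9]\d{1,14}",
    "date": r"\d{2}\.\d{2}\.\d{4}",
    "ip": r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",
}


def is_valid(kind, value):
    """Return True if the entire string matches the pattern for `kind`."""
    return re.fullmatch(PATTERNS[kind], value) is not None


print(is_valid("email", "admin@example.org"))  # True
print(is_valid("phone", "0123"))               # False: must not start with 0
print(is_valid("date", "5.3.2021"))            # False: two-digit day and month required
```

These checks are purely syntactic: a string can have the right shape and still not be a real address, number, or date.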
The dot (.) is a universal character
The dot metacharacter matches any character except the newline character. It is one of the most frequently used elements of regular expressions.
import re
text = "cat, bat, mat, rat"
pattern = r".at"  # Any single character followed by "at"
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'bat', 'mat', 'rat']
Start-of-line anchor (^)
The ^ symbol marks the start of a line. It lets you find a match only if it is located at the very beginning of the text.
import re
text = "hello world"
pattern = r"^hello"  # Match "hello" only at the start of the string
match = re.search(pattern, text)
if match:
    print(match.group())  # Output: hello
End-of-line anchor ($)
The $ symbol marks the end of a line. It is used to find patterns located at the very end of the text.
import re
text = "hello world"
pattern = r"world$"  # Match "world" only at the end of the string
match = re.search(pattern, text)
if match:
    print(match.group())  # Output: world
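By default, ^ and $ anchor to the start and end of the whole string. With the re.MULTILINE flag they also match at the start and end of every line. A brief sketch with a made-up three-line string:

```python
import re

text = "hello world\nhello there\ngoodbye world"

# Default: ^ matches only at the very start, $ only at the very end.
start_default = re.findall(r"^hello", text)
end_default = re.findall(r"world$", text)

# re.MULTILINE: ^ and $ also match at each line break.
start_multi = re.findall(r"^hello", text, flags=re.MULTILINE)
end_multi = re.findall(r"world$", text, flags=re.MULTILINE)

print(start_default, end_default)  # ['hello'] ['world']
print(start_multi, end_multi)      # ['hello', 'hello'] ['world', 'world']
```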
Quantifiers in regular expressions
Asterisk (*) - zero or more occurrences
The quantifier * matches zero or more occurrences of the preceding element.
import re
text = "ac abc abbc abbbc"
pattern = r"ab*c"  # "a", then zero or more "b", then "c"
matches = re.findall(pattern, text)
print(matches)  # Output: ['ac', 'abc', 'abbc', 'abbbc']
Plus (+) - one or more occurrences
The quantifier + requires at least one occurrence of the preceding element.
import re
text = "ac abc abbc abbbc"
pattern = r"ab+c"  # "a", then one or more "b", then "c"
matches = re.findall(pattern, text)
print(matches)  # Output: ['abc', 'abbc', 'abbbc']
Question mark (?) - zero or one occurrence
The quantifier ? makes the preceding element optional.
import re
text = "color colour"
pattern = r"colou?r"  # The "u" is optional, so both spellings match
matches = re.findall(pattern, text)
print(matches)  # Output: ['color', 'colour']
Practical ways to use regular expressions
Extracting email addresses
Finding and validating email addresses is one of the most common tasks when working with regular expressions.
import re
text = "Emails: test@example.com, another.test@mail.co.uk"
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
matches = re.findall(pattern, text)
print(matches)  # Output: ['test@example.com', 'another.test@mail.co.uk']
Extracting URLs from text
Regular expressions help you find web addresses in various formats.
import re
text = "Visit my website at https://www.example.com"
pattern = r"https?://(?:www\.)?([a-zA-Z0-9-]+\.[a-zA-Z]{2,})"
matches = re.findall(pattern, text)
print(matches)  # Output: ['example.com']
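re.findall returns only the captured group here, which is why the result is the bare domain rather than the full URL. When both are needed, re.finditer gives access to the whole match and to each group. A minimal sketch, with a sample sentence invented for the example:

```python
import re

text = "Visit https://www.example.com or http://python.org today"
url_re = re.compile(r"https?://(?:www\.)?([a-zA-Z0-9-]+\.[a-zA-Z]{2,})")

# group(0) is the entire match; group(1) is the captured domain.
results = [(m.group(0), m.group(1)) for m in url_re.finditer(text)]

for full_url, domain in results:
    print(full_url, "->", domain)
```

The (?:www\.)? part is a non-capturing group, so it takes part in matching without adding an entry to the results.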
Hashtag search in social media
Analyzing social media content often requires extracting hashtags.
import re
text = "Check out #Python and #DataScience on Twitter!"
pattern = r"#\w+"
matches = re.findall(pattern, text)
print(matches)  # Output: ['#Python', '#DataScience']
IP address search
Regular expressions efficiently find IP addresses in logs and configuration files.
import re
text = "IP addresses: 192.168.1.1 and 10.0.0.1"
pattern = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"
matches = re.findall(pattern, text)
print(matches)  # Output: ['192.168.1.1', '10.0.0.1']
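The pattern above checks only the shape of an address, so a string such as 999.10.0.1 also matches. One possible way to filter those out, sketched here as an assumption rather than as part of the original example, is to pass each candidate to the standard ipaddress module. The find_valid_ipv4 helper is a hypothetical name.

```python
import ipaddress
import re

IP_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_valid_ipv4(text):
    """Return dotted-quad candidates that are valid IPv4 addresses."""
    valid = []
    for candidate in IP_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # Right shape, but an octet is out of range
        valid.append(candidate)
    return valid


print(find_valid_ipv4("ok 192.168.1.1, bad 999.10.0.1, ok 10.0.0.1"))
```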
Dates in the format DD/MM/YYYY
Dates are extracted from text using capturing groups.
import re
text = "Dates: 20/01/2022 and 31/12/2023"
pattern = r"\b(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/(\d{4})\b"
matches = re.findall(pattern, text)
print(matches)  # Output: [('20', '01', '2022'), ('31', '12', '2023')]
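The groups can also be given names, and the captured parts can be handed to datetime.date to reject combinations the pattern cannot rule out, such as 31/02/2023. The following is a sketch under those assumptions; DATE_RE and parse_dates are illustrative names.

```python
import re
from datetime import date

DATE_RE = re.compile(
    r"\b(?P<day>0[1-9]|[12]\d|3[01])/(?P<month>0[1-9]|1[0-2])/(?P<year>\d{4})\b"
)


def parse_dates(text):
    """Return date objects for every DD/MM/YYYY match that is a real date."""
    dates = []
    for m in DATE_RE.finditer(text):
        try:
            dates.append(date(int(m["year"]), int(m["month"]), int(m["day"])))
        except ValueError:
            pass  # Matches the pattern but is not a calendar date
    return dates


print(parse_dates("Dates: 20/01/2022, 31/02/2023 and 31/12/2023"))
```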
HTML tag search
Lazy quantifiers are used to work with HTML markup.
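A minimal sketch of why laziness matters here, using a short HTML fragment invented for the example: a greedy .+ runs from the first < to the last >, while the lazy .+? stops at each closing bracket. The third pattern uses a backreference (\1) to pair an opening tag with its closing tag.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: one match covering the entire fragment.
greedy = re.findall(r"<.+>", html)

# Lazy: each tag is matched separately.
lazy = re.findall(r"<.+?>", html)

# Tag name and inner text; \1 requires the same name in the closing tag.
pairs = re.findall(r"<(\w+)>(.*?)</\1>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
print(pairs)   # [('b', 'bold'), ('i', 'italic')]
```

This approach suits simple, flat fragments; nested or malformed markup is usually better handled by an HTML parser.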