How to work with strings in Python: search, replace, merge

Python String Fundamentals

Strings as `str` Objects

Strings are Python's fundamental data type for textual information, and most programs depend on handling them correctly.

Strings are objects of the built-in `str` type and represent sequences of Unicode characters. Every string offers a large set of built-in methods for text processing, supports access to individual characters by index, and can be compared and sorted.

text = "Python"
print(type(text))  # <class 'str'>
print(len(text))   # 6

The Immutability Principle of Strings

Strings in Python are immutable objects. This means that any transformation operation creates a new string. The original string remains unchanged. This principle ensures data safety and code predictability.

original = "hello"
modified = original.upper()
print(original)  # 'hello' - unchanged
print(modified)  # 'HELLO' - a new string
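
As a small illustration of what immutability means in practice, the sketch below shows that item assignment fails and that a changed copy has to be built from slices instead. The variable names are illustrative only.

```python
word = "hello"

# Item assignment is not supported: a string cannot be changed in place
try:
    word[0] = "J"
except TypeError as error:
    failure = str(error)
    print(f"TypeError: {failure}")

# Build a new string from the parts you want to keep
changed = "J" + word[1:]

print(word)     # hello
print(changed)  # Jello
```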

Methods for Creating Strings

Python provides several ways to create strings:

- Single quotes for simple strings
- Double quotes for strings containing apostrophes
- Triple quotes for multi-line text
- Raw strings (prefix `r`) for text in which backslashes are kept literally instead of starting escape sequences

single = 'simple string'
double = "string with an 'apostrophe'"
multiline = """First line
Second line
Third line"""
raw = r"path\to\file"

Indexing and Slicing

Accessing String Characters

Each character in a string has its own index. Indexing starts from zero. Python supports both positive and negative indexing.

text = "Python"
print(text[0])   # 'P' - first character
print(text[5])   # 'n' - last character
print(text[-1])  # 'n' - last character via negative index
print(text[-6])  # 'P' - first character via negative index

Creating String Slices

Slices allow you to extract substrings from the original string. The slice syntax is: `string[start:end:step]`. The starting index is inclusive, and the ending index is exclusive.

text = "Python Programming"
print(text[1:4])    # 'yth'
print(text[:6])     # 'Python'
print(text[7:])     # 'Programming'
print(text[::2])    # 'Pto rgamn' - every second character
print(text[::-1])   # 'gnimmargorP nohtyP' - reverse the string
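
One detail worth a quick sketch: slices tolerate out-of-range bounds, while a single index does not. The helper `char_at` below is a hypothetical name introduced only for this example.

```python
text = "Python"

# Slices never raise: out-of-range bounds are clamped to the string
tail = text[2:100]     # 'thon'
nothing = text[10:]    # ''

def char_at(text, index, default=""):
    """Return text[index], or default when the index is out of range (hypothetical helper)."""
    try:
        return text[index]
    except IndexError:
        return default

print(repr(tail), repr(nothing))
print(repr(char_at(text, 1)))    # 'y'
print(repr(char_at(text, 10)))   # ''
```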

Practical Use of Negative Indexing

Negative indexing simplifies working with the end of a string. It is particularly useful when processing files and data of variable length.

filename = "document.pdf"
extension = filename[-4:]  # '.pdf'
name_part = filename[:-4]  # 'document'

String Searching Methods

Basic Substring Search Methods

The `find()` method returns the index of the first occurrence of a substring, or -1 if the substring is not found. The `rfind()` method returns the index of the last occurrence instead. Both accept optional start and end positions that limit the search.

text = "banana programming"
print(text.find("ana"))     # 1
print(text.find("python"))  # -1 (not found)
print(text.rfind("ana"))    # 3
print(text.find("a", 2))    # 3 (search starting from position 2)
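
Because `find()` signals "not found" with -1, its result should not be used directly as a condition. The following sketch, with illustrative variable names, shows why.

```python
text = "banana"

# Pitfall: 0 is a valid position but is falsy, and -1 (not found) is truthy
position_of_b = text.find("b")   # 0
position_of_z = text.find("z")   # -1

print(bool(position_of_b))  # False, although "b" was found
print(bool(position_of_z))  # True, although "z" was not found

# Compare against -1 explicitly, or use `in` when the position is not needed
found_b = text.find("b") != -1
found_z = "z" in text
print(found_b, found_z)     # True False
```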

Strict Search Methods with Exceptions

The `index()` and `rindex()` methods work similarly to `find()`. However, they raise a `ValueError` exception if the substring is not found.

text = "banana"
print(text.index("n"))      # 2
# print(text.index("z"))    # ValueError: substring not found

Counting Substring Occurrences

The `count()` method returns the number of non-overlapping occurrences of a substring. You can specify search boundaries.

text = "banana"
print(text.count("a"))        # 3
print(text.count("an"))       # 2
print(text.count("a", 1, 5))  # 2 (counts within text[1:5], i.e. 'anan')
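
"Non-overlapping" matters as soon as a substring can overlap with itself. A minimal sketch of counting overlapping occurrences follows; `count_overlapping` is a hypothetical helper, and returning 0 for an empty substring is an assumption made for this example.

```python
def count_overlapping(text, substring):
    """Count occurrences of substring, allowing overlaps (hypothetical helper)."""
    if not substring:
        return 0  # assumption: an empty substring is treated as having no matches
    count = 0
    position = text.find(substring)
    while position != -1:
        count += 1
        # Continue one character later so overlapping matches are found too
        position = text.find(substring, position + 1)
    return count

print("aaaa".count("aa"))               # 2
print(count_overlapping("aaaa", "aa"))  # 3
```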

Checking for Substring Presence

The `in` operator provides a simple way to check for the presence of a substring. It returns a boolean value.

text = "Python programming"
if "Python" in text:
    print("Found!")
if "Java" not in text:
    print("Java not found")

Replacing Text in Strings

The Core `replace()` Method

The `replace()` method replaces all occurrences of one substring with another. A new string with the replacements is created.

text = "hello world"
new_text = text.replace("world", "Python")
print(new_text)  # 'hello Python'

Limiting the Number of Replacements

The third parameter of the `replace()` method allows you to limit the number of replacements. This is useful for partial text processing.

text = "one, two, one, three, one"
result = text.replace("one", "1", 2)  # replace only the first 2
print(result)  # '1, two, 1, three, one'

Conditional Text Replacement

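`replace()` returns the string unchanged when the substring is absent, so a prior check is not required for correctness and only adds a second scan of the text. An explicit check is still useful when the program needs to know whether anything will be replaced, for example to log or count changes.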

text = "old text"
if "old" in text:
    text = text.replace("old", "new")
print(text)  # 'new text'

Multiple Replacements

To perform multiple replacements, you can use a loop or a dictionary. Each replacement creates a new string.

text = "красный синий зеленый"
replacements = {"красный": "red", "синий": "blue", "зеленый": "green"}
for old, new in replacements.items():
    text = text.replace(old, new)
print(text)  # 'red blue green'
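
One caveat with a loop of `replace()` calls: each call sees the output of the previous one, so replacements can cascade. The sketch below uses made-up sample data to show the effect, and contrasts it with a single pass through `re.sub()`, which is one possible way to avoid it.

```python
import re

text = "cat chases dog"
swaps = {"cat": "dog", "dog": "cat"}

# Sequential replace: the second step also rewrites what the first step produced
sequential = text
for old, new in swaps.items():
    sequential = sequential.replace(old, new)
print(sequential)  # cat chases cat

# Single pass: every match is looked up in the original text exactly once
pattern = re.compile("|".join(re.escape(key) for key in swaps))
single_pass = pattern.sub(lambda match: swaps[match.group(0)], text)
print(single_pass)  # dog chases cat
```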

Joining and Concatenating Strings

Efficient Joining with `join()`

The `join()` method is the most efficient way to combine multiple strings. It is called on the separator and takes an iterable of strings.

words = ["Python", "is", "awesome"]
sentence = " ".join(words)
print(sentence)  # 'Python is awesome'

numbers = ["1", "2", "3", "4"]
csv_line = ",".join(numbers)
print(csv_line)  # '1,2,3,4'
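
Since `join()` accepts only strings, mixed data has to be converted first. A short sketch with sample values chosen for illustration:

```python
values = [1, 2.5, None, "four"]

# join() raises TypeError as soon as it meets a non-string item
try:
    ", ".join(values)
    join_failed = False
except TypeError as error:
    join_failed = True
    print(f"join failed: {error}")

# Convert each item with str() before joining
joined = ", ".join(str(value) for value in values)
print(joined)  # 1, 2.5, None, four
```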

Concatenation with the `+` Operator

The `+` operator is suitable for joining a small number of strings. For many strings, it is better to use `join()`.

greeting = "Hello, " + "world!"
print(greeting)  # 'Hello, world!'

# Less efficient for many strings
result = ""
for word in ["a", "b", "c"]:
    result += word  # a new string is created in each iteration

Combining Strings with Numbers

Numeric values must be converted to strings before being combined. Use `str()` or formatting.

age = 25
message = "I am " + str(age) + " years old"
print(message)  # 'I am 25 years old'

# Better to use f-strings
message = f"I am {age} years old"
print(message)  # 'I am 25 years old'

Validating String Content

Methods for Checking Character Types

Python provides a set of methods for checking the type of characters in a string:

"123".isdigit()      # True - only digits
"abc".isalpha()      # True - only letters
"abc123".isalnum()   # True - letters and digits
"Hello World".isspace()  # False - not only whitespace
"   ".isspace()      # True - only whitespace characters
"Title Case".istitle()   # True - title-cased words
"UPPER".isupper()    # True - only uppercase letters
"lower".islower()    # True - only lowercase letters

Checking the Start and End of a String

The `startswith()` and `endswith()` methods check for prefixes and suffixes of strings. They accept both strings and tuples of strings.

filename = "index.html"
print(filename.endswith(".html"))     # True
print(filename.endswith((".html", ".htm")))  # True
print(filename.startswith("index"))   # True

url = "https://example.com"
print(url.startswith(("http://", "https://")))  # True

Checking for an Empty String

Whether a string that contains only whitespace counts as empty depends on the task. When it should, remove whitespace from both ends with `strip()` before testing the result.

def is_empty_string(s):
    return not s.strip()

print(is_empty_string(""))      # True
print(is_empty_string("   "))   # True
print(is_empty_string("text"))  # False

Case Conversion

Core Case Conversion Methods

Python provides several methods for changing the case of characters:

text = "python programming"
print(text.upper())      # 'PYTHON PROGRAMMING'
print(text.lower())      # 'python programming'
print(text.capitalize()) # 'Python programming'
print(text.title())      # 'Python Programming'
print(text.swapcase())   # 'PYTHON PROGRAMMING'

mixed = "PyThOn"
print(mixed.casefold())  # 'python' - a more aggressive lowercasing
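
The difference between `lower()` and `casefold()` shows up with characters such as the German "ß". A brief sketch with sample words chosen for illustration:

```python
first = "Straße"
second = "STRASSE"

# lower() keeps "ß", so the two spellings do not compare equal
lower_equal = first.lower() == second.lower()
print(lower_equal)     # False

# casefold() maps "ß" to "ss", so they do
casefold_equal = first.casefold() == second.casefold()
print(casefold_equal)  # True
```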

Practical Applications of Case Conversion

Case conversion is often used for normalizing data and comparing strings case-insensitively.

def normalize_email(email):
    return email.strip().lower()

def case_insensitive_compare(str1, str2):
    # casefold() handles more Unicode cases than lower(), e.g. German "ß"
    return str1.casefold() == str2.casefold()

user_input = "  USER@EXAMPLE.COM  "
clean_email = normalize_email(user_input)
print(clean_email)  # 'user@example.com'

Stripping Characters and Whitespace

String Cleaning Methods

The `strip()` family of methods removes characters from the ends of a string:

text = "   hello world   "
print(text.strip())    # 'hello world'
print(text.lstrip())   # 'hello world   '
print(text.rstrip())   # '   hello world'

# Removing specific characters
data = "***important data***"
clean = data.strip("*")
print(clean)  # 'important data'

# Removing multiple characters
messy = "...!!!text!!!..."
clean = messy.strip(".!")
print(clean)  # 'text'
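
The argument of `strip()`, `lstrip()`, and `rstrip()` is a set of characters, not a prefix or suffix. The sketch below shows how this can remove more than intended. It assumes Python 3.9 or newer for `removesuffix()`; `strip_suffix` is a hypothetical fallback for older versions.

```python
domain = "tomato.com"

# rstrip() removes any trailing run of the characters '.', 'c', 'o', 'm'
stripped = domain.rstrip(".com")
print(stripped)  # tomat

# removesuffix() (Python 3.9+) removes the exact suffix, at most once
trimmed = domain.removesuffix(".com")
print(trimmed)   # tomato

def strip_suffix(text, suffix):
    """Remove an exact suffix; fallback for older Python versions (hypothetical helper)."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

print(strip_suffix(domain, ".com"))  # tomato
```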

Advanced String Cleaning

Combining cleaning methods allows solving complex text processing tasks:

def clean_user_input(text):
    """Clean user input"""
    if not text:
        return ""
    
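    # Strip punctuation from both ends of each whitespace-separated word
    words = (word.strip(".,;!?") for word in text.split())

    # Drop words that consisted only of punctuation and normalize whitespace
    cleaned = " ".join(word for word in words if word)

    return cleaned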

user_data = "  Hello,,,   world!!!   "
result = clean_user_input(user_data)
print(result)  # 'Hello world'

String Formatting

The Classic `format()` Method

The `format()` method provides powerful string formatting capabilities:

template = "Hello, {}! Today is {}"
message = template.format("Andrew", "Monday")
print(message)  # 'Hello, Andrew! Today is Monday'

# Named parameters
template = "Hello, {name}! Age: {age}"
message = template.format(name="Maria", age=30)
print(message)  # 'Hello, Maria! Age: 30'

# Positional parameters
template = "Result: {1} + {0} = {2}"
message = template.format(5, 3, 8)
print(message)  # 'Result: 3 + 5 = 8'

Modern F-Strings

F-strings (formatted string literals) provide the most readable and efficient way to format strings:

name = "Elena"
age = 28
salary = 50000.75

message = f"Employee: {name}, age: {age}, salary: {salary:.2f}"
print(message)  # 'Employee: Elena, age: 28, salary: 50000.75'

# Evaluating expressions inside f-strings
x = 10
y = 20
print(f"Sum: {x + y}")  # 'Sum: 30'
print(f"Greater: {max(x, y)}")  # 'Greater: 20'

Advanced Formatting

F-strings support complex formats for numbers, dates, and other data types:

from datetime import datetime

price = 1234.567
percentage = 0.847
now = datetime.now()

print(f"Price: ${price:,.2f}")  # 'Price: $1,234.57'
print(f"Percentage: {percentage:.1%}")  # 'Percentage: 84.7%'
print(f"Date: {now:%Y-%m-%d %H:%M}")  # e.g. 'Date: 2024-03-15 14:30' (depends on the current time)

# Text alignment
name = "Python"
print(f"|{name:<10}|")  # '|Python    |' - left-aligned
print(f"|{name:>10}|")  # '|    Python|' - right-aligned
print(f"|{name:^10}|")  # '|  Python  |' - centered
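
Alignment widths do not have to be hard-coded: a format specification can itself contain a replacement field. The sketch below builds a small aligned table this way; the sample rows and column widths are made up for illustration.

```python
rows = [("apple", 3, 0.5), ("watermelon", 12, 2.25), ("kiwi", 150, 0.2)]

# Width of the first column is taken from the longest name
name_width = max(len(name) for name, _, _ in rows)

lines = []
for name, quantity, price in rows:
    # {name_width} inside the format spec is evaluated before formatting
    lines.append(f"{name:<{name_width}} | {quantity:>5} | {price:>8.2f}")

print("\n".join(lines))
```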

Practical Method Combinations

Splitting and Joining Strings

The combination of `split()` and `join()` allows for effective processing of structured text:

# Converting semicolon-separated data to another format
csv_data = "apple;banana;pear;orange"
items = csv_data.split(";")
formatted = " | ".join(items)
print(formatted)  # 'apple | banana | pear | orange'

# Normalizing a list with spaces
messy_list = " item1 , item2,item3 ,  item4"
clean_items = [item.strip() for item in messy_list.split(",")]
result = ", ".join(clean_items)
print(result)  # 'item1, item2, item3, item4'
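
A few details of `split()` are easy to miss: calling it without arguments behaves differently from `split(" ")`, and the optional `maxsplit` argument limits the number of splits. A short sketch with sample strings chosen for illustration:

```python
line = "  alpha   beta  gamma "

# No argument: split on runs of whitespace and ignore leading/trailing whitespace
by_whitespace = line.split()
print(by_whitespace)  # ['alpha', 'beta', 'gamma']

# Explicit " ": every single space is a separator, so empty strings appear
by_single_space = line.split(" ")
print(by_single_space)

record = "key=value=with=equals"

# maxsplit=1 splits only at the first separator
key, value = record.split("=", 1)
print(key, value)  # key value=with=equals

# partition() always returns three parts: before, separator, after
parts = record.partition("=")
print(parts)  # ('key', '=', 'value=with=equals')
```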

Chaining Cleaning and Replacement

Combining cleaning and replacement methods solves complex text processing tasks:

def normalize_phone(phone):
    """Normalize a phone number"""
    # Remove all characters except digits and +
    clean = "".join(char for char in phone if char.isdigit() or char == "+")
    
    # Replace country code
    if clean.startswith("8"):
        clean = "+7" + clean[1:]
    elif clean.startswith("7"):
        # A leading "7" without "+" is a country code missing its plus sign
        clean = "+" + clean
        
    return clean

phone = " +7 (912) 345-67-89 "
normalized = normalize_phone(phone)
print(normalized)  # '+79123456789'

Practical Problems and Solutions

Handling User Input

Properly handling user input is critical for program reliability:

def get_user_choice(prompt, valid_choices):
    """Get user choice with validation"""
    while True:
        user_input = input(prompt).strip().lower()
        if user_input in [choice.lower() for choice in valid_choices]:
            return user_input
        print(f"Please choose from: {', '.join(valid_choices)}")

def parse_numbers_list(input_string):
    """Parse a list of numbers from a string"""
    try:
        numbers = []
        for item in input_string.split(","):
            clean_item = item.strip()
            if clean_item:
                numbers.append(float(clean_item))
        return numbers
    except ValueError:
        return None

# Using the functions
choice = get_user_choice("Choose an action (yes/no): ", ["yes", "no"])
numbers_input = "1.5, 2.3, 3.7, 4.1"
numbers = parse_numbers_list(numbers_input)
print(f"Numbers: {numbers}")  # [1.5, 2.3, 3.7, 4.1]
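
When validating numeric input, character checks such as `isdigit()` and actual parsing can disagree, which is one reason the function above relies on `try`/`except`. The sketch below compares the two approaches; `parses_as_int` is a hypothetical helper and the sample strings are chosen for illustration.

```python
def parses_as_int(value):
    """Return True if int() accepts the string (hypothetical helper)."""
    try:
        int(value)
    except ValueError:
        return False
    return True

samples = ["42", "-5", "", "²", " 7 "]

# For each sample: (isdigit, isdecimal, accepted by int())
results = {
    sample: (sample.isdigit(), sample.isdecimal(), parses_as_int(sample))
    for sample in samples
}

for sample, checks in results.items():
    print(repr(sample), checks)
```

In this sketch "-5" and " 7 " are rejected by `isdigit()` although `int()` accepts them, while the superscript "²" passes `isdigit()` but cannot be parsed.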

Analyzing and Processing Text Data

String methods are widely used for analyzing text data:

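import re

def analyze_text(text):
    """Analyze textual information"""
    # Basic statistics
    words = text.split()
    # Split on sentence-ending punctuation followed by whitespace or the end
    # of the text, so the dot inside "3.9" is not counted as a sentence break
    sentences = re.split(r"[.!?]+(?:\s+|$)", text)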
    
    # Count different character types
    letters = sum(1 for char in text if char.isalpha())
    digits = sum(1 for char in text if char.isdigit())
    spaces = sum(1 for char in text if char.isspace())
    
    return {
        "Total characters": len(text),
        "Words": len(words),
        "Sentences": len([s for s in sentences if s.strip()]),
        "Letters": letters,
        "Digits": digits,
        "Spaces": spaces
    }

sample_text = "Python 3.9 is an excellent programming language. It is easy to learn."
stats = analyze_text(sample_text)
for key, value in stats.items():
    print(f"{key}: {value}")

Working with File Paths

String operations are often used for working with file paths:

def get_file_info(filepath):
    """Extract file information from a path"""
    # Split the path into components
    if "/" in filepath:
        parts = filepath.split("/")
    else:
        parts = filepath.split("\\")
    
    filename = parts[-1]
    directory = "/".join(parts[:-1]) if len(parts) > 1 else "."
    
    # Extract name and extension
    if "." in filename:
        name, extension = filename.rsplit(".", 1)
    else:
        name, extension = filename, ""
    
    return {
        "Full Path": filepath,
        "Directory": directory,
        "Filename": filename,
        "Name without extension": name,
        "Extension": extension
    }

file_path = "/home/user/documents/report.pdf"
info = get_file_info(file_path)
for key, value in info.items():
    print(f"{key}: {value}")
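
For real projects, the standard library's `pathlib` module covers the same ground as the manual splitting above. The sketch below uses `PurePosixPath` so that it behaves the same on every operating system; note that `suffix` includes the leading dot, unlike the `Extension` value returned by `get_file_info()`.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/home/user/documents/report.pdf")

print(path.parent)  # /home/user/documents
print(path.name)    # report.pdf
print(path.stem)    # report
print(path.suffix)  # .pdf

# Only the last extension is reported by suffix; suffixes lists all of them
archive = PurePosixPath("backup.tar.gz")
print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']
```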

Regular Expressions for Advanced Processing

Fundamentals of the `re` Module

The `re` module provides powerful tools for working with regular expressions. They are indispensable for complex pattern-based search and replacement.

import re

# Find email addresses
text = "Contacts: admin@site.com, user@example.org"
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(emails)  # ['admin@site.com', 'user@example.org']

# Find phone numbers
text = "Call: +7(912)345-67-89 or 8-800-555-35-35"
phones = re.findall(r'[+]?[7-8][\s\-\(\)]?\d{3}[\s\-\(\)]?\d{3}[\s\-]?\d{2}[\s\-]?\d{2}', text)
print(phones)  # ['+7(912)345-67-89', '8-800-555-35-35']

Key Functions of the `re` Module

Core functions of the `re` module for various text processing tasks:

import re

text = "Python 3.9.7 was released in 2021"

# re.search() - find the first match
match = re.search(r'\d+\.\d+\.\d+', text)
if match:
    print(f"Version found: {match.group()}")  # 'Version found: 3.9.7'

# re.findall() - find all matches
numbers = re.findall(r'\d+', text)
print(f"All numbers: {numbers}")  # ['3', '9', '7', '2021']

# re.sub() - replace by pattern
normalized = re.sub(r'\s+', ' ', "too   many    spaces")
print(f"Normalized: '{normalized}'")  # 'too many spaces'

# re.split() - split by pattern
data = "item1;item2,item3:item4"
items = re.split(r'[;,:]+', data)
print(f"Items: {items}")  # ['item1', 'item2', 'item3', 'item4']
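
When the same pattern is used repeatedly or its parts need names, `re.compile()` and named groups keep the code readable. A minimal sketch follows; the constant name `VERSION` and the group names are choices made for this example.

```python
import re

# Compile once, reuse many times; (?P<name>...) gives each group a name
VERSION = re.compile(r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)")

match = VERSION.search("Python 3.9.7 was released in 2021")
if match:
    # groupdict() maps group names to the matched text
    version = {name: int(value) for name, value in match.groupdict().items()}
    print(version)               # {'major': 3, 'minor': 9, 'patch': 7}
    print(match.group("minor"))  # 9
```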

Practical Regular Expression Patterns

Useful patterns for typical text processing tasks:

import re

def validate_data(data_dict):
    """Validate different data types"""
    patterns = {
        'email': r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$',
        'phone': r'^[+]?[7-8][\d\s\-\(\)]{10,15}$',
        'url': r'^https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
        'date': r'^\d{2}\.\d{2}\.\d{4}$',
        'time': r'^\d{2}:\d{2}$'
    }
    
    results = {}
    for key, value in data_dict.items():
        if key in patterns:
            results[key] = bool(re.match(patterns[key], value))
        else:
            results[key] = True
    
    return results

# Testing validation
test_data = {
    'email': 'user@example.com',
    'phone': '+7(912)345-67-89',
    'url': 'https://python.org',
    'date': '15.03.2024',
    'time': '14:30'
}

validation_results = validate_data(test_data)
for field, is_valid in validation_results.items():
    status = "✓" if is_valid else "✗"
    print(f"{field}: {status}")
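
Two limits of this kind of validation are worth keeping in mind. With `re.match()`, a trailing `$` also matches just before a final newline, and a pattern checks only the shape of the value, not its meaning. The sketch below illustrates both points with the time pattern and shows `re.fullmatch()` as a stricter alternative.

```python
import re

time_pattern = r"\d{2}:\d{2}"

# "$" also matches just before a trailing newline
match_with_newline = bool(re.match(time_pattern + "$", "14:30\n"))
print(match_with_newline)      # True

# fullmatch() requires the whole string to match the pattern
fullmatch_with_newline = bool(re.fullmatch(time_pattern, "14:30\n"))
print(fullmatch_with_newline)  # False

# The pattern checks only the shape: this is not a valid time of day
shape_only = bool(re.fullmatch(time_pattern, "99:99"))
print(shape_only)              # True
```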

Frequently Asked Questions (FAQ)

How to Find All Occurrence Positions of a Substring?

To find all positions, use a loop or regular expressions:

def find_all_positions(text, substring):
    """Find all occurrence positions of a substring"""
    positions = []
    start = 0
    while True:
        pos = text.find(substring, start)
        if pos == -1:
            break
        positions.append(pos)
        start = pos + 1
    return positions

text = "banana"
positions = find_all_positions(text, "a")
print(positions)  # [1, 3, 5]

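# Alternative with regular expressions (non-overlapping matches only);
# re.escape() keeps special characters in the substring literal
import re
positions = [m.start() for m in re.finditer(re.escape("a"), text)]
print(positions)  # [1, 3, 5]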
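
The two approaches differ when occurrences overlap: the `find()` loop advances one character at a time, while `re.finditer()` continues after the end of each match. The sketch below shows the difference and one possible way to get overlapping positions from a regular expression, using a lookahead.

```python
import re

text = "banana"
substring = "ana"

# Plain pattern: matching resumes after each match, so overlaps are skipped
non_overlapping = [m.start() for m in re.finditer(re.escape(substring), text)]
print(non_overlapping)  # [1]

# Lookahead: matches are zero-width, so every starting position is reported
lookahead = f"(?={re.escape(substring)})"
overlapping = [m.start() for m in re.finditer(lookahead, text)]
print(overlapping)      # [1, 3]
```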

How to Perform Multiple Replacements Efficiently?

For multiple replacements, build one regular expression from the keys of a replacement dictionary and apply it in a single pass:

def multiple_replace(text, replacements):
    """Efficient multiple replacement"""
    import re
    
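    # Create a pattern from dictionary keys, longest first, so that a key
    # that is a prefix of another key cannot shadow the longer one
    keys = sorted(replacements, key=len, reverse=True)
    pattern = '|'.join(re.escape(key) for key in keys)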
    
    # Replacement function
    def replace_func(match):
        return replacements[match.group(0)]
    
    return re.sub(pattern, replace_func, text)

text = "red blue green red"
replacements = {
    "red": "rouge",
    "blue": "bleu", 
    "green": "vert"
}

result = multiple_replace(text, replacements)
print(result)  # 'rouge bleu vert rouge'

How to Safely Combine Strings with Numbers?

Always convert numbers to strings before combining:

# Correct ways
number = 42
text1 = f"Number: {number}"
text2 = "Number: " + str(number)
text3 = "Number: {}".format(number)

# For lists of numbers
numbers = [1, 2, 3, 4, 5]
text = ", ".join(str(n) for n in numbers)
print(text)  # '1, 2, 3, 4, 5'

# Formatting with precision
price = 19.99
formatted = f"Price: ${price:.2f}"
print(formatted)  # 'Price: $19.99'

How to Remove Characters from a String by Index?

Since strings are immutable, create a new string without the desired characters:

def remove_char_at_index(text, index):
    """Remove a character at a specific index"""
    if 0 <= index < len(text):
        return text[:index] + text[index + 1:]
    return text

def remove_chars_at_indices(text, indices):
    """Remove characters at multiple indices"""
    # Sort indices in descending order
    for index in sorted(indices, reverse=True):
        if 0 <= index < len(text):
            text = text[:index] + text[index + 1:]
    return text

original = "Python"
without_char = remove_char_at_index(original, 2)  # remove 't'
print(without_char)  # 'Pyhon'

without_multiple = remove_chars_at_indices("programming", [0, 2, 4])
print(without_multiple)  # 'rgamming'

How to Work with Strings Efficiently in Loops?

Avoid concatenation in loops; use lists and `join()` instead:

# Inefficient
result = ""
for i in range(1000):
    result += f"item{i} "  # creates a new string in each iteration

# Efficient
items = []
for i in range(1000):
    items.append(f"item{i}")
result = " ".join(items)

# Even better - a generator expression passed directly to join()
result = " ".join(f"item{i}" for i in range(1000))

# For processing a list of strings
lines = [" line 1 ", " line 2 ", "line 3  "]
cleaned = [line.strip() for line in lines if line.strip()]
print(cleaned)  # ['line 1', 'line 2', 'line 3']

Best Practices for Effective Use

Choosing the Right Method

Different tasks require different approaches to string manipulation:

- Simple substring search: use `in` or `find()`
- Complex patterns: use regular expressions
- Multiple replacements: combine `re.sub()` with a replacement function
- Formatting: prefer f-strings

Performance Optimization

Following simple rules will help you write efficient code:

# Bad: repeated concatenation
result = ""
for word in words:
    result += word + " "

# Good: using join()
result = " ".join(words)

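# Acceptable for a single check, but the chain is repeated wherever the cleaned line is needed again
for line in lines:
    if line.strip().lower().startswith("error"):
        process_error(line)

# Better when the cleaned line is used more than once: compute it once and store it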
for line in lines:
    clean_line = line.strip().lower()
    if clean_line.startswith("error"):
        process_error(line)

Conclusion

Working with strings in Python is a fundamental skill for any developer. The language provides a rich set of built-in methods for handling almost any text processing task.

Core tools include search methods (find(), index()), replacement (replace()), splitting (split()), joining (join()), and cleaning (strip()). Modern f-strings provide convenient and efficient formatting. For complex tasks, regular expressions from the re module are indispensable.

Efficient string manipulation requires an understanding of immutability, choosing the right methods for specific tasks, and following performance best practices. Mastering these principles will enable you to create reliable and effective programs for processing text data, handling user input, and solving information analysis problems.
