Working with files in Python: text and binary files
Reading and writing files is one of the most common programming tasks. Python provides convenient tools for working with both text and binary files. Understanding the differences between these two file types will help you process data in various formats efficiently.
Text files in Python
Text files contain data in the form of characters that can be read and interpreted by humans. These include files with the extensions .txt, .csv, .json and .html, as well as other files containing textual information.
Text file access modes
Python provides the following modes for working with text files:
- 'r' - reading the file (the file must exist)
- 'w' - writing to a file (creates a new file or overwrites an existing one)
- 'a' - appending data to the end of the file (creates a new file or appends to an existing one)
- 'r+' - reading and writing (the file must exist)
- 'w+' - writing and reading (creates a new file or overwrites an existing one)
- 'a+' - appending and reading (creates a new file or appends to an existing one)
Examples of working with text files
# Reading from a text file
with open("text_file.txt", "r", encoding="utf-8") as file:
    data = file.read()
    print(data)
# Writing to a text file
with open("text_file.txt", "w", encoding="utf-8") as file:
    file.write("Hello, world!\n")
    file.write("This is a text file.")
# Appending to a text file
with open("text_file.txt", "a", encoding="utf-8") as file:
    file.write("\nAdd a new line to the file.")
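The differences between the access modes are easiest to see by running them against a throwaway file. The following is a minimal sketch, assuming a temporary directory can be created; the file name demo.txt and the helper name demo_modes are illustrative and not part of any API.

```python
import os
import tempfile


def demo_modes():
    """Show how 'w', 'a', 'r+' and 'a+' treat existing content."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

        # 'w' creates the file, or truncates it if it already exists
        with open(path, "w", encoding="utf-8") as file:
            file.write("first\n")

        # 'a' keeps the existing content and writes at the end
        with open(path, "a", encoding="utf-8") as file:
            file.write("second\n")

        # 'r+' starts at position 0, so writing overwrites the beginning
        with open(path, "r+", encoding="utf-8") as file:
            file.write("FIRST")
            file.seek(0)
            results["after_r_plus"] = file.read()

        # 'a+' writes at the end; seek(0) is needed before reading
        with open(path, "a+", encoding="utf-8") as file:
            file.write("third\n")
            file.seek(0)
            results["after_a_plus"] = file.read()

        # 'w' again discards everything written so far
        with open(path, "w", encoding="utf-8") as file:
            file.write("only\n")
        with open(path, "r", encoding="utf-8") as file:
            results["after_w"] = file.read()
    return results


for mode, content in demo_modes().items():
    print(mode, repr(content))
```

Note that 'r+' does not insert text: it replaces the characters that are already at the current position.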
Binary files in Python
Binary files contain data in the form of a sequence of bytes that are not intended for human reading. These files include images (.jpg, .png), audio files (.mp3, .wav), video files (.mp4, .avi), archives (.zip, .rar) and executable files.
Binary file access modes
The following modes are used to work with binary files:
- 'rb' - reading a binary file (the file must exist)
- 'wb' - writing to a binary file (creates a new file or overwrites an existing one)
- 'ab' - appending data to the end of a binary file (creates a new file or appends to an existing one)
- 'rb+' - reading and writing a binary file (the file must exist)
- 'wb+' - writing and reading a binary file (creates a new file or overwrites an existing one)
- 'ab+' - appending to and reading a binary file (creates a new file or appends to an existing one)
Examples of working with binary files
# Reading from a binary file
with open("binary_file.bin", "rb") as file:
    data = file.read()
    print(data)
# Writing to a binary file
with open("binary_file.bin", "wb") as file:
    file.write(b"\x48\x65\x6C\x6C\x6F\x2C\x20\x77\x6F\x72\x6C\x64\x21")  # "Hello, world!" in bytes
# Appending to a binary file
with open("binary_file.bin", "ab") as file:
    file.write(b"\x0A\x4E\x65\x77\x20\x64\x61\x74\x61")  # "\nNew data" in bytes
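To make the distinction concrete, the same file can be read in both modes: text mode returns a decoded str, while binary mode returns the raw bytes. This is a small sketch under the assumption that the file is UTF-8 encoded; the function name text_vs_binary and the file name sample.txt are hypothetical.

```python
import os
import tempfile


def text_vs_binary(text="Привет"):
    """Write a string as UTF-8 text, then read it back in both modes."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
        with open(path, "w", encoding="utf-8") as file:
            file.write(text)
        # Text mode decodes the bytes into characters
        with open(path, "r", encoding="utf-8") as file:
            as_text = file.read()
        # Binary mode returns the bytes exactly as stored on disk
        with open(path, "rb") as file:
            as_bytes = file.read()
    return as_text, as_bytes


as_text, as_bytes = text_vs_binary()
print(type(as_text).__name__, len(as_text))    # str 6
print(type(as_bytes).__name__, len(as_bytes))  # bytes 12
```

The lengths differ because each Cyrillic letter in this string occupies two bytes in UTF-8.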
The with context manager in Python
Opening files with the with statement is the recommended practice in Python. It ensures that the file is closed automatically once the block finishes, even if an exception is raised inside it.
# The recommended way to work with files
with open("example.txt", "r", encoding="utf-8") as file:
    content = file.read()
# The file is closed automatically after leaving the with block
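The claim that the file is closed even when an exception occurs can be checked directly through the closed attribute of the file object. The sketch below simulates a failure inside the block; the function name closed_after_error is an illustrative assumption.

```python
import os
import tempfile


def closed_after_error():
    """Return True if the file was closed although the with block raised."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "example.txt")
        handle = None
        try:
            with open(path, "w", encoding="utf-8") as file:
                handle = file  # keep a reference so it can be inspected later
                file.write("partial data")
                raise ValueError("simulated failure inside the with block")
        except ValueError:
            pass  # the with statement has already closed the file by now
        return handle.closed


print(closed_after_error())  # True
```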
Reading files line by line
For efficient processing of large text files, it is recommended to read them line by line:
with open("example.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line.strip())  # strip() removes leading and trailing whitespace, including the newline
Writing a list of lines to a file
Python allows you to write a whole list of strings in a single call:
lines = ["First line\n", "Second line\n", "Third line\n"]
with open("example.txt", "w", encoding="utf-8") as file:
    file.writelines(lines)
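One detail worth knowing is that writelines() does not add line separators itself, so strings without a trailing "\n" end up on one line. A possible way to handle this, sketched here with a hypothetical helper named write_lines, is to append the newline while writing:

```python
import os
import tempfile


def write_lines(path, lines):
    """Write each item on its own line, adding the newline ourselves."""
    with open(path, "w", encoding="utf-8") as file:
        # writelines() accepts any iterable of strings, including a generator
        file.writelines(f"{line}\n" for line in lines)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")
    write_lines(path, ["First line", "Second line", "Third line"])
    with open(path, "r", encoding="utf-8") as file:
        written = file.read()

print(repr(written))
```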
Reading a file into a list of lines
To load the entire contents of a file into a list, use the readlines() method:
with open("example.txt", "r", encoding="utf-8") as file:
    lines = file.readlines()
    print(lines)
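Each string returned by readlines() keeps its trailing newline. If the newlines are not wanted, one common alternative is read().splitlines(). The comparison below is a small sketch that uses a temporary file so it can be run anywhere:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write("First line\nSecond line\n")

    # readlines() keeps the line endings
    with open(path, "r", encoding="utf-8") as file:
        with_newlines = file.readlines()

    # read().splitlines() drops them
    with open(path, "r", encoding="utf-8") as file:
        without_newlines = file.read().splitlines()

print(with_newlines)     # ['First line\n', 'Second line\n']
print(without_newlines)  # ['First line', 'Second line']
```

Both approaches hold the whole file in memory, so they suit small and medium-sized files.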
Working with large files
When working with large files, it is important to save memory by reading the file in parts:
def read_large_file(filename, chunk_size=1024):
    """Read a large file in parts (chunk_size is counted in characters in text mode)."""
    with open(filename, "r", encoding="utf-8") as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:  # an empty string means the end of the file
                break
            yield chunk
# Using the function (process_chunk stands for your own processing function)
for chunk in read_large_file("large_file.txt"):
    process_chunk(chunk)  # Processing a part of the file
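The same chunked approach works for binary files, where the chunk size is counted in bytes. As an illustration, the sketch below computes a SHA-256 checksum without loading the whole file into memory; the function name sha256_of_file and the 64 KiB default chunk size are assumptions chosen for the example.

```python
import hashlib
import os
import tempfile


def sha256_of_file(filename, chunk_size=64 * 1024):
    """Hash a file of any size while holding only one chunk in memory."""
    digest = hashlib.sha256()
    with open(filename, "rb") as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "large_file.bin")
    payload = b"x" * 200_000  # larger than one chunk
    with open(path, "wb") as file:
        file.write(payload)
    checksum = sha256_of_file(path)

print(checksum == hashlib.sha256(payload).hexdigest())  # True
```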
Error handling when working with files
It is important to properly handle possible errors when working with files:
try:
    with open("nonexistent_file.txt", "r") as file:
        content = file.read()
except FileNotFoundError:
    print("File not found")
except PermissionError:
    print("Insufficient permissions to access the file")
except Exception as e:
    print(f"Error occurred: {e}")
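In practice this pattern is often wrapped in a small helper that returns a fallback value instead of failing. The sketch below is one possible shape for such a helper; the name read_text_or_default and its default parameter are hypothetical. FileNotFoundError and PermissionError are both subclasses of OSError, so the more specific handlers have to come first.

```python
import os
import tempfile


def read_text_or_default(path, default=""):
    """Return the file's text, or `default` if the file cannot be read."""
    try:
        with open(path, "r", encoding="utf-8") as file:
            return file.read()
    except FileNotFoundError:
        print(f"File not found: {path}")
    except PermissionError:
        print(f"Insufficient permissions: {path}")
    except OSError as error:
        # Any other I/O problem reported by the operating system
        print(f"Could not read {path}: {error}")
    return default


with tempfile.TemporaryDirectory() as tmp_dir:
    missing = os.path.join(tmp_dir, "nonexistent_file.txt")
    fallback = read_text_or_default(missing, default="(empty)")

print(fallback)  # (empty)
```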
Checking the existence of a file
Before working with a file, it can be useful to check that it exists:
import os
if os.path.exists("example.txt"):
    with open("example.txt", "r") as file:
        content = file.read()
else:
    print("The file does not exist")
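The same check can be written with the standard pathlib module. Because a file may disappear between the check and the read, the sketch below also catches FileNotFoundError. The helper name read_if_present is an assumption made for this example.

```python
import tempfile
from pathlib import Path


def read_if_present(path):
    """Return the file's text, or None if it is not an existing regular file."""
    file_path = Path(path)
    if not file_path.is_file():
        return None
    try:
        return file_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # The file was removed after the check but before the read
        return None


with tempfile.TemporaryDirectory() as tmp_dir:
    existing = Path(tmp_dir) / "example.txt"
    existing.write_text("content", encoding="utf-8")
    found = read_if_present(existing)
    not_found = read_if_present(Path(tmp_dir) / "missing.txt")

print(found)      # content
print(not_found)  # None
```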
File encoding
When working with text files, it is important to specify the correct encoding, especially for files that contain non-ASCII text such as Cyrillic:
# Explicit UTF-8 encoding
with open("russian_text.txt", "w", encoding="utf-8") as file:
    file.write("Text in Russian")
# Reading with the encoding specified
with open("russian_text.txt", "r", encoding="utf-8") as file:
    content = file.read()
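What happens when the encoding does not match can be shown with a file written in cp1251 (a legacy Cyrillic encoding) and read back as UTF-8. This is an illustrative sketch; the function name decode_with_wrong_encoding and the file name legacy.txt are hypothetical.

```python
import os
import tempfile


def decode_with_wrong_encoding(text="Привет"):
    """Write text as cp1251, then try to read it back in three ways."""
    outcome = {}
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "legacy.txt")  # hypothetical file name
        with open(path, "w", encoding="cp1251") as file:
            file.write(text)

        # Wrong encoding: these bytes are not valid UTF-8
        try:
            with open(path, "r", encoding="utf-8") as file:
                file.read()
            outcome["strict_utf8"] = "decoded"
        except UnicodeDecodeError:
            outcome["strict_utf8"] = "UnicodeDecodeError"

        # errors="replace" avoids the exception but loses the original text
        with open(path, "r", encoding="utf-8", errors="replace") as file:
            outcome["replaced"] = file.read()

        # The matching encoding recovers the text
        with open(path, "r", encoding="cp1251") as file:
            outcome["correct"] = file.read()
    return outcome


outcome = decode_with_wrong_encoding()
print(outcome["strict_utf8"])  # UnicodeDecodeError
print(outcome["correct"])      # Привет
```

The errors="replace" option substitutes the U+FFFD replacement character for undecodable bytes, which is useful for inspection but does not restore the data.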
Conclusion
Working with files in Python requires understanding the differences between text and binary files, choosing the right access mode, and consistently using the with context manager. This knowledge will help you process various types of files efficiently and avoid common mistakes when working with data.