Key Data Storage Formats in Python
Working with Excel and CSV files is a crucial skill for Python developers, especially in data analysis, report automation, and backend system development. Correctly saving data to files enables efficient storage, transfer, and processing of information between different systems.
This guide covers the main methods for saving data in popular formats using standard and third-party Python libraries. You'll learn practical examples of working with CSV and Excel files, mastering techniques for formatting and structuring data.
CSV Format: Features and Applications
What is CSV Format?
CSV (Comma-Separated Values) is a simple text format for storing tabular data. In this format, values are separated by commas or other delimiters, such as semicolons or tabs. CSV format is widely used for exchanging data between various applications and systems.
Advantages of CSV Format
- Lightweight and readable text format
- Quickly opens in any text editor, including Excel
- Ideal for transferring data between systems
- Small file size compared to binary formats
- Supported by most data processing programs
Limitations of CSV Format
- No support for complex formatting (colors, fonts)
- Inability to store formulas and calculations
- Limited support for different data types
- No possibility of storing images and nested tables
Working with CSV via the Standard Library
Basics of Using the csv Module
The csv module is included in the standard Python distribution and requires no additional installation. It provides convenient tools for reading and writing CSV files with various formatting settings.
import csv
data = [
['Имя', 'Возраст', 'Город'],
['Иван', 28, 'Москва'],
['Мария', 25, 'Санкт-Петербург']
]
with open('people.csv', 'w', newline='', encoding='utf-8') as file:
writer = csv.writer(file)
writer.writerows(data)
After executing this code, a file named people.csv is created with the following content:
Имя,Возраст,Город
Иван,28,Москва
Мария,25,Санкт-Петербург
Setting Delimiters in CSV
The standard delimiter in CSV can be changed depending on system requirements or regional settings:
with open('people.csv', 'w', newline='', encoding='utf-8') as file:
writer = csv.writer(file, delimiter=';')
writer.writerows(data)
Working with DictWriter
For convenient work with data in the form of dictionaries, the DictWriter class is used:
import csv
data = [
{'Имя': 'Иван', 'Возраст': 28, 'Город': 'Москва'},
{'Имя': 'Мария', 'Возраст': 25, 'Город': 'Санкт-Петербург'}
]
with open('people.csv', 'w', newline='', encoding='utf-8') as file:
fieldnames = ['Имя', 'Возраст', 'Город']
writer = csv.DictWriter(file, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(data)
Working with Excel Files via openpyxl
Installation and Configuration of the Library
Excel files are used when it is necessary to store structured data with formatting, formulas, or charts. The openpyxl library provides a complete set of tools for working with Excel files.
pip install openpyxl
Creating a Basic Excel File
from openpyxl import Workbook
data = [
['Имя', 'Возраст', 'Город'],
['Иван', 28, 'Москва'],
['Мария', 25, 'Санкт-Петербург']
]
wb = Workbook()
ws = wb.active
for row in data:
ws.append(row)
wb.save('people.xlsx')
Adding Formatting in Excel
The openpyxl library allows you to apply various formatting styles to cells:
from openpyxl.styles import Font, PatternFill, Alignment
# Formatting the header
ws['A1'].font = Font(bold=True, color='FF0000')
ws['A1'].fill = PatternFill(start_color='FFFF00', end_color='FFFF00', fill_type='solid')
ws['A1'].alignment = Alignment(horizontal='center')
wb.save('people_styled.xlsx')
Working with Formulas
You can add calculated formulas to Excel files:
ws['D1'] = 'Возраст через 10 лет'
ws['D2'] = '=B2+10'
ws['D3'] = '=B3+10'
wb.save('people_with_formulas.xlsx')
Using pandas for Data Handling
Installing the pandas Library
The pandas library provides powerful tools for working with large amounts of data and supports many file formats:
pip install pandas openpyxl
Saving Data to CSV with pandas
import pandas as pd
data = {
'Имя': ['Иван', 'Мария'],
'Возраст': [28, 25],
'Город': ['Москва', 'Санкт-Петербург']
}
df = pd.DataFrame(data)
df.to_csv('people_pandas.csv', index=False, sep=';', encoding='utf-8')
Creating Excel Files with pandas
df.to_excel('people_pandas.xlsx', index=False, engine='openpyxl')
Working with Multiple Excel Sheets
Pandas allows you to create Excel files with multiple sheets:
with pd.ExcelWriter('multi_sheet.xlsx', engine='openpyxl') as writer:
df.to_excel(writer, sheet_name='Сотрудники', index=False)
# Creating a second sheet with other data
sales_data = pd.DataFrame({
'Месяц': ['Январь', 'Февраль', 'Март'],
'Продажи': [10000, 15000, 12000]
})
sales_data.to_excel(writer, sheet_name='Продажи', index=False)
Additional Features and Settings
Working with Encodings
When working with different systems, it is important to correctly specify the file encoding:
# For CSV files
df.to_csv('data_utf8.csv', encoding='utf-8', index=False)
df.to_csv('data_cp1251.csv', encoding='cp1251', index=False)
# For Excel, encoding is usually not required
Data Filtering When Saving
# Saving only certain columns
df[['Имя', 'Город']].to_csv('filtered_columns.csv', index=False)
# Saving filtered rows
filtered_df = df[df['Возраст'] > 26]
filtered_df.to_csv('filtered_rows.csv', index=False)
Configuring Recording Parameters
# Various parameters for CSV
df.to_csv('custom.csv',
sep=';', # separator
decimal=',', # decimal separator
index=False, # do not save indexes
header=True, # save headers
quoting=csv.QUOTE_ALL) # enclose all values in quotes
Solving Typical Problems
Problems with Separators
When working with CSV files in different regions, problems may arise with separators:
# Using a semicolon for European systems
df.to_csv('european.csv', sep=';', decimal=',')
# Automatically detecting the separator when reading
df = pd.read_csv('unknown.csv', sep=None, engine='python')
Working with Large Files
For processing large amounts of data, use reading and writing in parts:
# Writing in parts
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
processed_chunk = chunk.copy() # data processing
processed_chunk.to_csv('output.csv', mode='a', header=False, index=False)
Saving Multi-Level Indexes
# Creating a DataFrame with a multi-level index
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df_multi = pd.DataFrame({'values': [1, 2, 3, 4]}, index=index)
# Saving while preserving the index structure
df_multi.to_csv('multi_index.csv', index=True)
df_multi.to_excel('multi_index.xlsx', index=True)
Choosing the Optimal Approach
The choice of data saving method depends on the specific requirements of the project:
Use CSV when:
- Simple data transfer between systems is required
- Minimum file size is required
- Data has a simple tabular structure
- Compatibility with various programs is required
Choose Excel for:
- Creating complex reports with formatting
- Working with formulas and calculations
- The need for multiple sheets in one file
- Creating documents for data presentation
Apply pandas when:
- Processing large arrays of data
- The need for flexible data operations
- Working with complex data structures
- Integration with data analysis systems
The Future of AI in Mathematics and Everyday Life: How Intelligent Agents Are Already Changing the Game
Experts warned about the risks of fake charity with AI
In Russia, universal AI-agent for robots and industrial processes was developed