Installation and Setup of Dataset
Installing the Library
pip install dataset
Additional drivers may be required for specific databases:
# PostgreSQL
pip install psycopg2-binary
# MySQL
pip install pymysql
# SQLite driver is built into Python
Connecting to Different Databases
import dataset
# SQLite (local file)
db = dataset.connect('sqlite:///mydatabase.db')
# PostgreSQL
db = dataset.connect('postgresql://user:password@localhost:5432/dbname')
# MySQL
db = dataset.connect('mysql://user:password@localhost/dbname')
# In‑memory (for testing)
db = dataset.connect('sqlite:///:memory:')
Core Principles of Dataset
Dataset operates on the following principles:
- Automatic table creation – tables are created on first use
- Dynamic schema – table structure is inferred automatically when data is inserted
- Dictionary interface – data is represented as Python dictionaries
- No SQL required – all operations are performed through Python methods
Creating and Managing Tables
Getting a Table
# Retrieve a table (created automatically if it does not exist)
users_table = db['users']
# Alternative method
users_table = db.get_table('users')
Creating a Table with Custom Settings
# Create a table with a custom primary key
db.create_table('products', primary_id='product_id', primary_type=db.types.string)
Inserting Data
Inserting a Single Record
table = db['users']
# Insert one user
table.insert({
    'name': 'Ivan Petrov',
    'email': 'ivan@example.com',
    'age': 30,
    'city': 'Moscow'
})
Bulk Insertion
# Insert multiple records
users_data = [
    {'name': 'Anna Sidorova', 'email': 'anna@example.com', 'age': 25},
    {'name': 'Peter Ivanov', 'email': 'peter@example.com', 'age': 35},
    {'name': 'Maria Smirnova', 'email': 'maria@example.com', 'age': 28}
]
table.insert_many(users_data)
Upsert Operations
# Insert or update (if a record with the same email already exists)
table.upsert({
    'name': 'Ivan Petrov',
    'email': 'ivan@example.com',
    'age': 31,
    'city': 'Saint Petersburg'
}, ['email'])
Searching and Filtering Data
Retrieving All Records
# Get all rows
all_users = list(table.all())
for user in all_users:
    print(f"{user['name']} - {user['email']}")
Conditional Queries
# Find a single record
user = table.find_one(name='Ivan Petrov')
# Find all records matching a condition
young_users = list(table.find(age={'<': 30}))
# Complex conditions
moscow_users = list(table.find(city='Moscow', age={'>=': 25}))
Advanced Filtering Operators
# Various comparison operators
results = table.find(age={'>=': 25, '<=': 35})
results = table.find(name={'like': '%Ivan%'})
results = table.find(city={'in': ['Moscow', 'Saint Petersburg']})
Distinct Values
# Get distinct cities
unique_cities = list(table.distinct('city'))
Updating and Deleting Data
Updating Records
# Update by primary key
table.update({'id': 1, 'age': 32}, ['id'])
# Update rows matched by a non-key column: set city for every 'Anna Sidorova'
table.update({'name': 'Anna Sidorova', 'city': 'Kazan'}, ['name'])
Deleting Records
# Delete by condition
table.delete(name='Peter Ivanov')
# Delete with multiple conditions
table.delete(age={'<': 18})
# Clear the whole table
table.delete()
Working with Transactions
Automatic Transactions
# All operations inside the block run in a single transaction
with db as tx:
    tx['users'].insert({'name': 'Test1', 'email': 'test1@example.com'})
    tx['users'].insert({'name': 'Test2', 'email': 'test2@example.com'})
# If an exception occurs inside the block, all changes are rolled back
Manual Transaction Management
# begin(), commit() and rollback() are methods on the database object itself;
# begin() does not return a transaction handle
db.begin()
try:
    db['users'].insert({'name': 'Test', 'email': 'test@example.com'})
    db.commit()
except Exception as e:
    db.rollback()
    print(f"Error: {e}")
Exporting and Importing Data
Export to CSV
import csv
# Export all users to CSV
with open('users.csv', 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['id', 'name', 'email', 'age', 'city']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for user in table.all():
        writer.writerow(user)
Export to JSON
import json
# Export to JSON
users_data = list(table.all())
with open('users.json', 'w', encoding='utf-8') as jsonfile:
    json.dump(users_data, jsonfile, ensure_ascii=False, indent=2)
Import from CSV
import csv
# Import from CSV
with open('users.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    users_to_import = []
    for row in reader:
        users_to_import.append({
            'name': row['name'],
            'email': row['email'],
            'age': int(row['age']),
            'city': row['city']
        })
table.insert_many(users_to_import)
Integration with Pandas
Converting to a DataFrame
import pandas as pd
# Load data from the database into a DataFrame
df = pd.DataFrame(table.all())
# Data analysis
print(df.describe())
print(df.groupby('city').size())
Saving a DataFrame to the Database
# Create a DataFrame
df = pd.DataFrame({
    'name': ['Alexey', 'Elena', 'Dmitry'],
    'age': [29, 34, 27],
    'city': ['Moscow', 'Saint Petersburg', 'Kazan']
})
# Insert into the database
table.insert_many(df.to_dict('records'))
Executing Arbitrary SQL Queries
SELECT Queries
# Run a raw SELECT query
results = db.query('''
    SELECT city, COUNT(*) AS user_count
    FROM users
    GROUP BY city
    ORDER BY user_count DESC
''')
for row in results:
    print(f"{row['city']}: {row['user_count']} users")
Other SQL Commands
# Create an index
db.query('CREATE INDEX idx_users_email ON users(email)')
# Update data via SQL: bind values as named parameters, never by string formatting
db.query('UPDATE users SET city = :city WHERE age > :age', city='Moscow', age=30)
Working with Metadata
Inspecting Tables
# List all tables
print("Database tables:", db.tables)
# Columns of a specific table
print("Columns in users table:", table.columns)
# Number of rows
print("Number of users:", len(table))
Checking Table Existence
# Verify if a table exists
if 'users' in db.tables:
print("Table 'users' exists")
else:
print("Table 'users' not found")
Complete Reference of Dataset Methods and Functions
| Category | Method / Function | Description | Example |
|---|---|---|---|
| Connection | dataset.connect(url) | Connect to a database | db = dataset.connect('sqlite:///db.db') |
| Tables | db['table_name'] | Retrieve a table | table = db['users'] |
| Tables | db.get_table('name') | Alternative way to get a table | table = db.get_table('users') |
| Tables | db.create_table(name, **kwargs) | Create a table with options | db.create_table('users', primary_id='uid') |
| Tables | db.tables | List of all tables | print(db.tables) |
| Insertion | table.insert(dict) | Insert a single record | table.insert({'name': 'Ivan'}) |
| Insertion | table.insert_many(list) | Bulk insert | table.insert_many([{'name': 'Anna'}]) |
| Insertion | table.upsert(dict, keys) | Insert or update | table.upsert({'name': 'Ivan'}, ['name']) |
| Querying | table.all() | Retrieve all rows | for row in table.all(): print(row) |
| Querying | table.find(**filters) | Conditional find | table.find(age={'>=': 18}) |
| Querying | table.find_one(**filters) | Find a single row | user = table.find_one(name='Ivan') |
| Querying | table.distinct('field') | Distinct values | cities = table.distinct('city') |
| Updating | table.update(dict, keys) | Update rows | table.update({'age': 30}, ['id']) |
| Deletion | table.delete(**filters) | Delete rows | table.delete(name='Ivan') |
| SQL Queries | db.query(sql) | Execute raw SQL, SELECT or otherwise | db.query('SELECT * FROM users') |
| Transactions | with db as tx: | Automatic transaction block | with db as tx: tx['users'].insert({}) |
| Transactions | db.begin() | Start a transaction | db.begin() |
| Transactions | db.commit() | Commit the transaction | db.commit() |
| Transactions | db.rollback() | Roll back the transaction | db.rollback() |
| Metadata | table.columns | List of table columns | print(table.columns) |
| Metadata | len(table) | Row count | count = len(table) |
| Metadata | table.exists | Existence check | if table.exists: print('Exists') |
Using Dataset in Web Applications
Flask Integration
from flask import Flask, request, jsonify
import dataset
app = Flask(__name__)
db = dataset.connect('sqlite:///app.db')
@app.route('/users', methods=['POST'])
def create_user():
    data = request.get_json()
    user_id = db['users'].insert(data)
    return jsonify({'id': user_id, 'status': 'created'})

@app.route('/users', methods=['GET'])
def get_users():
    users = list(db['users'].all())
    return jsonify(users)

@app.route('/users/<int:user_id>', methods=['PUT'])
def update_user(user_id):
    data = request.get_json()
    data['id'] = user_id
    db['users'].update(data, ['id'])
    return jsonify({'status': 'updated'})
FastAPI Integration
from fastapi import FastAPI
from pydantic import BaseModel
import dataset
app = FastAPI()
db = dataset.connect('sqlite:///fastapi.db')
class User(BaseModel):
    name: str
    email: str
    age: int

@app.post('/users/')
async def create_user(user: User):
    # Pydantic v2 uses model_dump(); on Pydantic v1, use user.dict()
    user_id = db['users'].insert(user.model_dump())
    return {'id': user_id, 'message': 'User created'}

@app.get('/users/')
async def get_users():
    return list(db['users'].all())
Testing with Dataset
In‑Memory Database for Tests
import pytest
import dataset
@pytest.fixture
def test_db():
    """Create an in-memory test database"""
    db = dataset.connect('sqlite:///:memory:')
    yield db
    db.close()

def test_user_creation(test_db):
    """Test creating a user"""
    users = test_db['users']
    user_id = users.insert({'name': 'Test', 'email': 'test@example.com'})
    assert user_id is not None
    user = users.find_one(id=user_id)
    assert user['name'] == 'Test'
    assert user['email'] == 'test@example.com'

def test_user_update(test_db):
    """Test updating a user"""
    users = test_db['users']
    user_id = users.insert({'name': 'Test', 'age': 25})
    users.update({'id': user_id, 'age': 26}, ['id'])
    updated_user = users.find_one(id=user_id)
    assert updated_user['age'] == 26
Comparison with Other Solutions
| Criterion | Dataset | SQLAlchemy ORM | Pandas | TinyDB |
|---|---|---|---|---|
| Ease of Learning | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Performance | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Feature Set | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Scalability | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐ |
| Prototyping Suitability | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
Best Practices for Using Dataset
Code Organization
# Separate module for database operations
class DatabaseManager:
    def __init__(self, db_url):
        self.db = dataset.connect(db_url)

    def add_user(self, user_data):
        return self.db['users'].insert(user_data)

    def get_user_by_email(self, email):
        return self.db['users'].find_one(email=email)

    def update_user(self, user_id, update_data):
        update_data['id'] = user_id
        return self.db['users'].update(update_data, ['id'])

    def delete_user(self, user_id):
        return self.db['users'].delete(id=user_id)
Error Handling
def safe_database_operation(func):
    """Decorator for safe database operations"""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            print(f"Database error: {e}")
            return None
    return wrapper

@safe_database_operation
def create_user(name, email):
    return db['users'].insert({'name': name, 'email': email})
Frequently Asked Questions
Can Dataset Be Used in Production?
Dataset is suitable for production in small‑to‑medium projects where complex business logic is not required. For large‑scale applications, a full‑featured ORM like SQLAlchemy is recommended.
Does Dataset Support Schema Migrations?
Dataset automatically creates and alters tables when new fields appear. For complex migrations, use Alembic together with SQLAlchemy.
How to Ensure Data Security?
Dataset uses parameterised queries, protecting against SQL injection. Additionally, validate input data and configure proper database access permissions.
Can Dataset Be Used with Asynchronous Code?
Dataset operates synchronously. For async database interactions, consider libraries such as asyncpg or Tortoise‑ORM.
How to Optimize Performance?
Performance tips:
- Use bulk operations (insert_many)
- Create indexes on frequently queried columns
- Group operations inside transactions
- Apply filters at the database level instead of in Python
Does Dataset Support JSON Fields?
Dataset automatically serialises dictionaries and lists to JSON when storing them and deserialises them back to Python objects on retrieval.
Conclusion
Dataset is a powerful yet easy‑to‑use tool for database work in Python. Its intuitive API and minimal setup make it ideal for rapid prototyping, automation scripts, web applications, and analytical projects.
Key benefits of Dataset:
- Very low learning curve
- Automatic schema management
- Support for many DBMSs
- Great integration with the Python ecosystem
- Fits both quick scripts and full‑featured applications
Choose Dataset when you need simplicity without sacrificing functionality, and move to more advanced ORM solutions only when you require their extended capabilities.