Dataset: Working with Databases in a Pandas-like Style

Installation and Setup of Dataset

Installing the Library

pip install dataset

Additional drivers may be required for specific databases:

# PostgreSQL
pip install psycopg2-binary

# MySQL
pip install pymysql

# SQLite driver is built into Python

Connecting to Different Databases

import dataset

# SQLite (local file)
db = dataset.connect('sqlite:///mydatabase.db')

# PostgreSQL
db = dataset.connect('postgresql://user:password@localhost:5432/dbname')

# MySQL
db = dataset.connect('mysql://user:password@localhost/dbname')

# In‑memory (for testing)
db = dataset.connect('sqlite:///:memory:')

Core Principles of Dataset

Dataset operates on the following principles:

  1. Automatic table creation – tables are created on first use
  2. Dynamic schema – table structure is inferred automatically when data is inserted
  3. Dictionary interface – data is represented as Python dictionaries
  4. No SQL required – all operations are performed through Python methods

Creating and Managing Tables

Getting a Table

# Retrieve a table (created automatically if it does not exist)
users_table = db['users']

# Alternative method
users_table = db.get_table('users')

Creating a Table with Custom Settings

# Create a table with a custom primary key
db.create_table('products', primary_id='product_id', primary_type=db.types.string)

Inserting Data

Inserting a Single Record

table = db['users']

# Insert one user
table.insert({
    'name': 'Ivan Petrov',
    'email': 'ivan@example.com',
    'age': 30,
    'city': 'Moscow'
})

Bulk Insertion

# Insert multiple records
users_data = [
    {'name': 'Anna Sidorova', 'email': 'anna@example.com', 'age': 25},
    {'name': 'Peter Ivanov', 'email': 'peter@example.com', 'age': 35},
    {'name': 'Maria Smirnova', 'email': 'maria@example.com', 'age': 28}
]

table.insert_many(users_data)

Upsert Operations

# Insert or update (if a record with the same email already exists)
table.upsert({
    'name': 'Ivan Petrov',
    'email': 'ivan@example.com',
    'age': 31,
    'city': 'Saint Petersburg'
}, ['email'])

Searching and Filtering Data

Retrieving All Records

# Get all rows
all_users = list(table.all())
for user in all_users:
    print(f"{user['name']} - {user['email']}")

Conditional Queries

# Find a single record
user = table.find_one(name='Ivan Petrov')

# Find all records matching a condition
young_users = list(table.find(age={'<': 30}))

# Complex conditions
moscow_users = list(table.find(city='Moscow', age={'>=': 25}))

Advanced Filtering Operators

# Various comparison operators
results = table.find(age={'>=': 25, '<=': 35})
results = table.find(name={'like': '%Ivan%'})
results = table.find(city={'in': ['Moscow', 'Saint Petersburg']})

Distinct Values

# Get distinct cities
unique_cities = list(table.distinct('city'))

Updating and Deleting Data

Updating Records

# Update by primary key
table.update({'id': 1, 'age': 32}, ['id'])

# Update rows matched by name, setting a new city
table.update({'name': 'Anna Sidorova', 'city': 'Kazan'}, ['name'])

Deleting Records

# Delete by condition
table.delete(name='Peter Ivanov')

# Delete using a comparison operator
table.delete(age={'<': 18})

# Clear the whole table
table.delete()

Working with Transactions

Automatic Transactions

# All operations inside the block run in a single transaction
with db as tx:
    tx['users'].insert({'name': 'Test1', 'email': 'test1@example.com'})
    tx['users'].insert({'name': 'Test2', 'email': 'test2@example.com'})
    # If an exception occurs, all changes are rolled back

Manual Transaction Management

# Start a transaction manually; begin/commit/rollback are
# methods on the Database object itself
db.begin()

try:
    db['users'].insert({'name': 'Test', 'email': 'test@example.com'})
    db.commit()
except Exception as e:
    db.rollback()
    print(f"Error: {e}")

Exporting and Importing Data

Export to CSV

import csv

# Export all users to CSV
with open('users.csv', 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['id', 'name', 'email', 'age', 'city']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    
    writer.writeheader()
    for user in table.all():
        writer.writerow(user)

Export to JSON

import json

# Export to JSON
users_data = list(table.all())
with open('users.json', 'w', encoding='utf-8') as jsonfile:
    json.dump(users_data, jsonfile, ensure_ascii=False, indent=2)

Import from CSV

import csv

# Import from CSV
with open('users.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    users_to_import = []
    
    for row in reader:
        users_to_import.append({
            'name': row['name'],
            'email': row['email'],
            'age': int(row['age']),
            'city': row['city']
        })
    
    table.insert_many(users_to_import)

Integration with Pandas

Converting to a DataFrame

import pandas as pd

# Load data from the database into a DataFrame
df = pd.DataFrame(table.all())

# Data analysis
print(df.describe())
print(df.groupby('city').size())

Saving a DataFrame to the Database

# Create a DataFrame
df = pd.DataFrame({
    'name': ['Alexey', 'Elena', 'Dmitry'],
    'age': [29, 34, 27],
    'city': ['Moscow', 'Saint Petersburg', 'Kazan']
})

# Insert into the database
table.insert_many(df.to_dict('records'))

Executing Arbitrary SQL Queries

SELECT Queries

# Run a raw SELECT query
results = db.query('''
    SELECT city, COUNT(*) AS user_count 
    FROM users 
    GROUP BY city 
    ORDER BY user_count DESC
''')

for row in results:
    print(f"{row['city']}: {row['user_count']} users")

Other SQL Commands

# Create an index
db.query('CREATE INDEX idx_users_email ON users(email)')

# Update data via SQL with named parameters: query strings are wrapped
# in SQLAlchemy text(), so use :name placeholders rather than ?
db.query('UPDATE users SET city = :city WHERE age > :age', city='Moscow', age=30)

Working with Metadata

Inspecting Tables

# List all tables
print("Database tables:", db.tables)

# Columns of a specific table
print("Columns in users table:", table.columns)

# Number of rows
print("Number of users:", len(table))

Checking Table Existence

# Verify if a table exists
if 'users' in db.tables:
    print("Table 'users' exists")
else:
    print("Table 'users' not found")

Complete Reference of Dataset Methods and Functions

Category Method / Function Description Example
Connection dataset.connect(url) Connect to a database db = dataset.connect('sqlite:///db.db')
Tables db['table_name'] Retrieve a table table = db['users']
  db.get_table('name') Alternative way to get a table table = db.get_table('users')
  db.create_table(name, **kwargs) Create a table with options db.create_table('users', primary_id='uid')
  db.tables List of all tables print(db.tables)
Insertion table.insert(dict) Insert a single record table.insert({'name': 'Ivan'})
  table.insert_many(list) Bulk insert table.insert_many([{'name': 'Anna'}])
  table.upsert(dict, keys) Insert or update table.upsert({'name': 'Ivan'}, ['name'])
Querying table.all() Retrieve all rows for row in table.all(): print(row)
  table.find(**filters) Conditional find table.find(age={'>=': 18})
  table.find_one(**filters) Find a single row user = table.find_one(name='Ivan')
  table.distinct('field') Distinct values cities = table.distinct('city')
Updating table.update(dict, keys) Update rows table.update({'age': 30}, ['id'])
Deletion table.delete(**filters) Delete rows table.delete(name='Ivan')
SQL Queries db.query(sql) Execute raw SQL db.query('SELECT * FROM users')
  db.query(sql, **params) Parameterised raw SQL db.query('SELECT ... WHERE age > :a', a=18)
Transactions with db as tx: Automatic transaction block with db as tx: tx['users'].insert({})
  db.begin() Start a transaction db.begin()
  db.commit() Commit the transaction db.commit()
  db.rollback() Roll back the transaction db.rollback()
Metadata table.columns List of table columns print(table.columns)
  len(table) Row count count = len(table)
  table.exists Existence check if table.exists: print('Exists')

Using Dataset in Web Applications

Flask Integration

from flask import Flask, request, jsonify
import dataset

app = Flask(__name__)
db = dataset.connect('sqlite:///app.db')

@app.route('/users', methods=['POST'])
def create_user():
    data = request.get_json()
    user_id = db['users'].insert(data)
    return jsonify({'id': user_id, 'status': 'created'})

@app.route('/users', methods=['GET'])
def get_users():
    users = list(db['users'].all())
    return jsonify(users)

@app.route('/users/<int:user_id>', methods=['PUT'])
def update_user(user_id):
    data = request.get_json()
    data['id'] = user_id
    db['users'].update(data, ['id'])
    return jsonify({'status': 'updated'})

FastAPI Integration

from fastapi import FastAPI
from pydantic import BaseModel
import dataset

app = FastAPI()
db = dataset.connect('sqlite:///fastapi.db')

class User(BaseModel):
    name: str
    email: str
    age: int

@app.post('/users/')
async def create_user(user: User):
    user_id = db['users'].insert(user.dict())  # Pydantic v2: user.model_dump()
    return {'id': user_id, 'message': 'User created'}

@app.get('/users/')
async def get_users():
    return list(db['users'].all())

Testing with Dataset

In‑Memory Database for Tests

import pytest
import dataset

@pytest.fixture
def test_db():
    """Create an in‑memory test database"""
    db = dataset.connect('sqlite:///:memory:')
    yield db
    db.close()

def test_user_creation(test_db):
    """Test creating a user"""
    users = test_db['users']
    user_id = users.insert({'name': 'Test', 'email': 'test@example.com'})
    
    assert user_id is not None
    user = users.find_one(id=user_id)
    assert user['name'] == 'Test'
    assert user['email'] == 'test@example.com'

def test_user_update(test_db):
    """Test updating a user"""
    users = test_db['users']
    user_id = users.insert({'name': 'Test', 'age': 25})
    
    users.update({'id': user_id, 'age': 26}, ['id'])
    updated_user = users.find_one(id=user_id)
    assert updated_user['age'] == 26

Comparison with Other Solutions

Criterion Dataset SQLAlchemy ORM Pandas TinyDB
Ease of Learning ⭐⭐⭐⭐⭐ ⭐⭐⭐ ⭐⭐⭐⭐ ⭐⭐⭐⭐⭐
Performance ⭐⭐⭐⭐ ⭐⭐⭐⭐⭐ ⭐⭐⭐ ⭐⭐⭐
Feature Set ⭐⭐⭐⭐ ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐⭐ ⭐⭐⭐
Scalability ⭐⭐⭐ ⭐⭐⭐⭐⭐ ⭐⭐ ⭐⭐
Prototyping Suitability ⭐⭐⭐⭐⭐ ⭐⭐⭐ ⭐⭐⭐⭐ ⭐⭐⭐⭐⭐

Best Practices for Using Dataset

Code Organization

import dataset

# Separate module for database operations
class DatabaseManager:
    def __init__(self, db_url):
        self.db = dataset.connect(db_url)
    
    def add_user(self, user_data):
        return self.db['users'].insert(user_data)
    
    def get_user_by_email(self, email):
        return self.db['users'].find_one(email=email)
    
    def update_user(self, user_id, update_data):
        update_data['id'] = user_id
        return self.db['users'].update(update_data, ['id'])
    
    def delete_user(self, user_id):
        return self.db['users'].delete(id=user_id)

Error Handling

def safe_database_operation(func):
    """Decorator for safe database operations"""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            print(f"Database error: {e}")
            return None
    return wrapper

@safe_database_operation
def create_user(name, email):
    return db['users'].insert({'name': name, 'email': email})

Frequently Asked Questions

Can Dataset Be Used in Production?

Dataset is suitable for production in small‑to‑medium projects where complex business logic is not required. For large‑scale applications, a full‑featured ORM like SQLAlchemy is recommended.

Does Dataset Support Schema Migrations?

Dataset automatically creates and alters tables when new fields appear. For complex migrations, use Alembic together with SQLAlchemy.

How to Ensure Data Security?

Dataset uses parameterised queries, protecting against SQL injection. Additionally, validate input data and configure proper database access permissions.

Can Dataset Be Used with Asynchronous Code?

Dataset operates synchronously. For async database interactions, consider libraries such as asyncpg or Tortoise‑ORM.

How to Optimize Performance?

Performance tips:

  • Use bulk operations (insert_many)
  • Create indexes on frequently queried columns
  • Group operations inside transactions
  • Apply filters at the database level instead of in Python

Does Dataset Support JSON Fields?

Dataset automatically serialises dictionaries and lists to JSON when storing them and deserialises them back to Python objects on retrieval.

Conclusion

Dataset is a powerful yet easy‑to‑use tool for database work in Python. Its intuitive API and minimal setup make it ideal for rapid prototyping, automation scripts, web applications, and analytical projects.

Key benefits of Dataset:

  • Very low learning curve
  • Automatic schema management
  • Support for many DBMSs
  • Great integration with the Python ecosystem
  • Fits both quick scripts and full‑featured applications

Choose Dataset when you need simplicity without sacrificing functionality, and move to more advanced ORM solutions only when you require their extended capabilities.