Over the past two years, I've been working on scaling database performance for a high-traffic application that handles millions of daily transactions. What started as a straightforward PostgreSQL setup quickly became a performance bottleneck as our data grew exponentially. This post shares the practical optimization techniques that helped us reduce query times from 15+ seconds to under 200ms.
The Problem: When Good Queries Go Bad
Our application was initially handling around 50,000 records. Fast forward to today, and we're managing over 10 million records with complex relationships. Here's what we learned the hard way.
Initial Performance Issues
-- Our original "simple" query
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at >= '2024-01-01'
ORDER BY o.created_at DESC;
-- Query time: 18 seconds ❌
This query seemed innocent enough, but it was killing our application's responsiveness.
Strategy 1: Index Optimization
Understanding Index Types
PostgreSQL offers several index types, and choosing the right one makes a massive difference.
1. B-tree Indexes (Default)
Best for equality and range queries:
-- Create B-tree index on frequently queried columns
CREATE INDEX idx_orders_created_at ON orders(created_at);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Composite index for multi-column queries
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, created_at);
Performance Impact: Query time dropped to 8 seconds
2. Partial Indexes
Index only relevant rows:
-- Only index active orders (not archived ones)
CREATE INDEX idx_orders_active
ON orders(created_at)
WHERE status = 'active';
-- Reduced index size by 60%
Performance Impact: Query time dropped to 4.5 seconds
3. GIN Indexes for JSONB
For JSON data columns:
-- Index JSONB column for fast lookups
CREATE INDEX idx_orders_metadata
ON orders USING GIN (metadata);
-- Now we can query JSON efficiently
SELECT * FROM orders
WHERE metadata @> '{"priority": "high"}';
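If containment (@>) is the only JSONB operator you query with, the jsonb_path_ops operator class typically yields a smaller, faster GIN index than the default:
-- Smaller GIN index that supports only the @> containment operator
CREATE INDEX idx_orders_metadata_path
ON orders USING GIN (metadata jsonb_path_ops);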
Index Maintenance
-- Monitor index usage
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
-- Remove unused indexes (idx_scan = 0)
DROP INDEX IF EXISTS unused_index_name;
-- Rebuild bloated indexes (REINDEX ... CONCURRENTLY requires PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_orders_created_at;
Strategy 2: Query Restructuring
Select Only What You Need
-- ❌ BAD: Fetching all columns
SELECT * FROM orders WHERE status = 'active';
-- ✅ GOOD: Fetch only required columns
SELECT id, customer_id, total_amount, created_at
FROM orders
WHERE status = 'active';
Performance Impact: 40% reduction in data transfer time
Use EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total_amount, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days';
Key metrics to watch:
- Seq Scan → usually fine on small tables, a red flag when filtering a large one (often a missing index)
- Index Scan → good for selective filters
- Bitmap Heap Scan → good for larger result sets
- Nested Loop → can be slow when both sides of the join are large
- Hash Join → usually the better choice for joining large tables
Optimize Joins
-- ❌ BAD: Multiple LEFT JOINs returning nulls
SELECT o.*, c.*, p.*, s.*
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
LEFT JOIN products p ON o.product_id = p.id
LEFT JOIN shipments s ON o.id = s.order_id;
-- ✅ GOOD: Use INNER JOIN when possible
SELECT o.id, o.total_amount, c.name, p.title
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
INNER JOIN products p ON o.product_id = p.id
WHERE o.status = 'completed';
Strategy 3: Partitioning Large Tables
Time-Based Partitioning
-- Create parent table
CREATE TABLE orders (
id SERIAL,
customer_id INTEGER,
total_amount DECIMAL(10,2),
created_at TIMESTAMP NOT NULL,
status VARCHAR(50)
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Create indexes on each partition
CREATE INDEX idx_orders_2024_01_customer
ON orders_2024_01(customer_id);
Performance Impact: Queries scanning specific date ranges became 10x faster
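To confirm the planner is actually pruning, EXPLAIN a date-bounded query and check that only the matching partition appears in the plan (the dates here are just an example):
-- Should touch orders_2024_01 only, not every monthly partition
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';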
List-Based Partitioning
-- Alternative layout: partition the same table by status instead of date
CREATE TABLE orders (
id SERIAL,
customer_id INTEGER,
status VARCHAR(50),
created_at TIMESTAMP
) PARTITION BY LIST (status);
CREATE TABLE orders_active PARTITION OF orders
FOR VALUES IN ('pending', 'processing', 'shipped');
CREATE TABLE orders_completed PARTITION OF orders
FOR VALUES IN ('delivered', 'completed');
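One caveat: an insert whose status matches neither list will fail outright. On PostgreSQL 11+ a catch-all default partition avoids that:
-- Rows that match no other partition land here
CREATE TABLE orders_other PARTITION OF orders DEFAULT;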
Strategy 4: Connection Pooling
PgBouncer Configuration
[databases]
myapp_db = host=localhost port=5432 dbname=production
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
; note: session-level features (e.g. named prepared statements) don't work in transaction mode
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
Implementation in Node.js:
const { Pool } = require('pg');
const pool = new Pool({
host: 'localhost',
port: 6432, // PgBouncer port
database: 'myapp_db',
user: 'app_user',
password: process.env.DB_PASSWORD,
max: 20, // max connections in pool
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
// Efficient query execution
async function getActiveOrders() {
const client = await pool.connect();
try {
const result = await client.query(
'SELECT id, total_amount FROM orders WHERE status = $1',
['active']
);
return result.rows;
} finally {
client.release(); // Always release!
}
}
Performance Impact: Reduced connection overhead by 70%
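To verify the pooler is doing its job, PgBouncer's admin console (connect to the special pgbouncer database, e.g. psql -p 6432 pgbouncer) exposes live statistics:
SHOW POOLS;   -- client/server connections per pool, including clients left waiting
SHOW STATS;   -- per-database query and transaction throughput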
Strategy 5: Caching Strategies
Application-Level Caching (Redis)
const redis = require('redis');
const client = redis.createClient();
client.connect(); // node-redis v4+: establish the connection before issuing commands
async function getCustomerOrders(customerId) {
const cacheKey = `customer:${customerId}:orders`;
// Check cache first
const cached = await client.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// Query database if not cached
const result = await pool.query(
'SELECT * FROM orders WHERE customer_id = $1',
[customerId]
);
// Cache for 5 minutes
await client.setEx(cacheKey, 300, JSON.stringify(result.rows));
return result.rows;
}
Materialized Views
-- Create materialized view for expensive aggregations
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT
DATE(created_at) as sale_date,
COUNT(*) as total_orders,
SUM(total_amount) as total_revenue,
AVG(total_amount) as avg_order_value
FROM orders
WHERE status = 'completed'
GROUP BY DATE(created_at);
-- REFRESH ... CONCURRENTLY requires a UNIQUE index on the materialized view
CREATE UNIQUE INDEX idx_daily_sales_date ON daily_sales_summary(sale_date);
-- Refresh the view (can be scheduled)
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales_summary;
Performance Impact: Dashboard queries went from 12s to 100ms
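The refresh itself can be scheduled however you like; one sketch, assuming the pg_cron extension is available (a plain cron job running psql works just as well):
-- Refresh every 15 minutes; CONCURRENTLY relies on the unique index above
SELECT cron.schedule(
  '*/15 * * * *',
  $$REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales_summary$$
);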
Strategy 6: Batch Operations
Efficient Bulk Inserts
// ❌ BAD: Multiple individual inserts (newOrders holds ~10,000 rows to insert)
for (const order of newOrders) {
  await pool.query(
    'INSERT INTO orders (customer_id, amount) VALUES ($1, $2)',
    [order.customerId, order.amount]
  );
}
// Time: 45 seconds
// ✅ GOOD: Single batch insert
const values = [];
const params = [];
newOrders.forEach((order, i) => {
  const offset = i * 2;
  values.push(`($${offset + 1}, $${offset + 2})`);
  params.push(order.customerId, order.amount);
});
await pool.query(
  `INSERT INTO orders (customer_id, amount) VALUES ${values.join(',')}`,
  params
);
// Time: 2 seconds ✅
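One caveat on very large batches: the PostgreSQL wire protocol allows at most 65,535 bind parameters per statement, so chunk the insert once you approach roughly 30,000 two-column rows.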
Using COPY for Massive Imports
const { pipeline } = require('node:stream/promises'); // promise-based pipeline, so it can be awaited
const { from } = require('pg-copy-streams');
async function bulkImport(dataStream) {
const client = await pool.connect();
try {
const stream = client.query(
from('COPY orders (customer_id, amount, created_at) FROM STDIN CSV')
);
await pipeline(dataStream, stream);
} finally {
client.release();
}
}
Strategy 7: Configuration Tuning
Key PostgreSQL Settings
# postgresql.conf optimizations
# Memory settings
shared_buffers = 4GB # 25% of RAM
effective_cache_size = 12GB # 75% of RAM
work_mem = 64MB # Per operation
maintenance_work_mem = 1GB # For VACUUM, CREATE INDEX
# Checkpoint settings
checkpoint_completion_target = 0.9
wal_buffers = 16MB
max_wal_size = 4GB
min_wal_size = 1GB
# Query planner
random_page_cost = 1.1 # For SSD
effective_io_concurrency = 200 # For SSD
# Parallelism
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
Vacuum and Analyze
-- Manual vacuum
VACUUM ANALYZE orders;
-- Autovacuum configuration (postgresql.conf)
autovacuum = on
autovacuum_max_workers = 4
autovacuum_naptime = 30s
-- Monitor vacuum progress
SELECT * FROM pg_stat_progress_vacuum;
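For a hot table like orders, the global thresholds can still be too lax; per-table settings are worth a look (the scale factors below are assumptions to tune against your own write rate):
-- Vacuum/analyze orders after ~5% / ~2% of rows change, instead of the 20% / 10% defaults
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);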
Real-World Results: Before & After
Query Performance Comparison
| Operation | Before | After | Improvement |
|---|---|---|---|
| Complex JOIN query | 18s | 0.18s | 100x faster |
| Aggregation report | 12s | 0.10s | 120x faster |
| Bulk insert (10k) | 45s | 2s | 22x faster |
| Full-text search | 8s | 0.3s | 26x faster |
| Dashboard load | 15s | 0.5s | 30x faster |
Infrastructure Impact
- Server Cost: Reduced from 3 servers to 1
- CPU Usage: Dropped from 80% to 25%
- Memory Usage: Optimized from 90% to 45%
- Storage I/O: Reduced by 60%
Monitoring & Maintenance
Essential Queries for Monitoring
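The first query below relies on the pg_stat_statements extension, which isn't enabled by default; a minimal setup sketch (the conf change requires a restart):
-- postgresql.conf (restart required):
--   shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;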
-- 1. Find slow queries
SELECT query,
mean_exec_time,
calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- 2. Check index usage
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, relname;
-- 3. Largest tables by total size (a quick first pass at spotting bloat)
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;
-- 4. Active connections
SELECT count(*), state
FROM pg_stat_activity
GROUP BY state;
Automated Monitoring Script
const { Pool } = require('pg');
const pool = new Pool(/* config */);
async function healthCheck() {
const checks = {
slowQueries: await pool.query(`
SELECT COUNT(*) FROM pg_stat_statements
WHERE mean_exec_time > 1000
`),
connections: await pool.query(`
SELECT COUNT(*) FROM pg_stat_activity
WHERE state = 'active'
`),
tableSize: await pool.query(`
SELECT pg_size_pretty(pg_database_size(current_database()))
`)
};
console.log('Database Health:', checks);
// Alert if thresholds exceeded (COUNT(*) comes back as a string)
if (Number(checks.slowQueries.rows[0].count) > 10) {
console.error('⚠️ Too many slow queries detected!');
}
}
// Run every 5 minutes
setInterval(healthCheck, 5 * 60 * 1000);
Common Pitfalls to Avoid
1. N+1 Query Problem
// ❌ BAD: N+1 queries
const orders = await pool.query('SELECT * FROM orders LIMIT 100');
for (let order of orders.rows) {
const customer = await pool.query(
'SELECT * FROM customers WHERE id = $1',
[order.customer_id]
);
order.customer = customer.rows[0];
}
// ✅ GOOD: Single JOIN query
const result = await pool.query(`
SELECT o.*, c.name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id
LIMIT 100
`);
2. Not Using Prepared Statements
// ❌ BAD: SQL injection risk + no query plan caching
const query = `SELECT * FROM orders WHERE id = ${userId}`;
// ✅ GOOD: Prepared statement
const query = 'SELECT * FROM orders WHERE id = $1';
await pool.query(query, [userId]);
3. Ignoring EXPLAIN Output
Always analyze your queries before deploying to production:
EXPLAIN (ANALYZE, BUFFERS, COSTS, VERBOSE, TIMING)
SELECT * FROM orders
WHERE created_at >= NOW() - INTERVAL '7 days';
Advanced Techniques
Full-Text Search Optimization
-- Add tsvector column
ALTER TABLE products ADD COLUMN search_vector tsvector;
-- Create GIN index
CREATE INDEX idx_products_search
ON products USING GIN(search_vector);
-- Update trigger to maintain search_vector
CREATE TRIGGER tsvector_update
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english', title, description);
-- Efficient full-text search
SELECT * FROM products
WHERE search_vector @@ to_tsquery('english', 'laptop & gaming');
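On PostgreSQL 12+, a stored generated column can replace both the manual column and the trigger above (assuming title and description are plain text columns):
-- Keeps search_vector in sync on every write, no trigger needed
ALTER TABLE products
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, ''))
  ) STORED;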
Write-Ahead Log (WAL) Tuning
# For high-write workloads
wal_compression = on
wal_writer_delay = 200ms
commit_delay = 10              # in microseconds
commit_siblings = 5            # commit_delay only applies with this many concurrent transactions
Lessons Learned
- Measure First: Use EXPLAIN ANALYZE before optimizing
- Index Wisely: More indexes ≠ better performance
- Monitor Continuously: pg_stat_statements is your best friend
- Test in Production-Like Environment: Your local DB won't show scaling issues
- Incremental Changes: Optimize one thing at a time
- Document Everything: Future you will thank present you
Conclusion
Optimizing PostgreSQL for large-scale applications is an iterative process. The techniques I've shared here reduced our query times by 100x and cut infrastructure costs by 60%.
The key takeaways:
- ✅ Strategic indexing (not over-indexing)
- ✅ Query restructuring and proper JOINs
- ✅ Partitioning for time-series data
- ✅ Connection pooling with PgBouncer
- ✅ Caching at multiple levels
- ✅ Regular monitoring and maintenance
Resources & Tools
- pgAdmin - GUI management
- pg_stat_statements - Query statistics
- PgBouncer - Connection pooling
- pgBadger - Log analyzer
- explain.depesz.com - Visual EXPLAIN analyzer
Have you faced similar scaling challenges? What optimization techniques worked for your use case? Let me know in the comments!