Skip to content

PostgreSQL Performance Optimization

This guide covers performance optimization techniques for PostgreSQL SQL tasks, including query optimization, indexing strategies, connection pooling, and performance monitoring.

PostgreSQL performance optimization focuses on reducing query execution time, minimizing resource usage, and improving overall pipeline efficiency. The key areas include query design, indexing, connection management, and performance monitoring.

  • Query Design: Efficient SQL patterns and structures
  • Indexing: Strategic index creation and maintenance
  • Connection Management: Connection pooling and optimization
  • Resource Utilization: Efficient use of database resources
  • Performance Monitoring: Tracking and optimizing performance metrics

Always filter by indexed columns to enable index usage.

-- Good: Filter by indexed columns
SELECT
user_id,
order_date,
SUM(revenue) as total_revenue
FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2024-02-01'
AND user_id = '12345'
GROUP BY 1, 2
-- Avoid: No index usage
SELECT
user_id,
order_date,
SUM(revenue) as total_revenue
FROM orders
WHERE revenue > 1000
GROUP BY 1, 2

Use appropriate JOIN types and conditions for better performance.

-- Good: Efficient JOIN with proper filtering
SELECT
u.user_id,
u.email,
o.order_count,
o.total_revenue
FROM users u
JOIN user_orders o ON u.user_id = o.user_id
WHERE u.created_date >= '2024-01-01'
AND o.order_date >= '2024-01-01'
AND u.status = 'active'
-- Avoid: Inefficient JOIN without filtering
SELECT
u.user_id,
u.email,
o.order_count,
o.total_revenue
FROM users u
JOIN user_orders o ON u.user_id = o.user_id
WHERE u.status = 'active'

Optimize window functions for better performance.

-- Good: Efficient window function
SELECT
user_id,
order_date,
revenue,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) as order_sequence
FROM orders
WHERE order_date >= '2024-01-01'
AND user_id IS NOT NULL
-- Avoid: Inefficient window function
SELECT
user_id,
order_date,
revenue,
ROW_NUMBER() OVER (ORDER BY order_date) as global_sequence
FROM orders
WHERE order_date >= '2024-01-01'

Choose the right data types to minimize storage and improve performance.

-- Good: Appropriate data types
SELECT
CAST(user_id AS VARCHAR) as user_id,
CAST(order_count AS INTEGER) as order_count,
CAST(revenue AS DECIMAL(10,2)) as revenue,
CAST(created_at AS TIMESTAMP) as created_at
FROM staging.orders
WHERE order_date >= '{{ ds }}'::date
-- Avoid: Unnecessary data type conversions
SELECT
user_id,
order_count,
revenue,
created_at
FROM staging.orders
WHERE order_date >= '{{ ds }}'::date

Avoid unnecessary casting and conversions.

-- Good: Minimal conversions
SELECT
user_id,
order_type,
DATE(order_date) as order_date,
COUNT(*) as order_count
FROM staging.orders
WHERE order_date >= '{{ ds }}'::date
GROUP BY 1, 2, 3
-- Avoid: Excessive conversions
SELECT
CAST(user_id AS VARCHAR) as user_id,
CAST(order_type AS VARCHAR) as order_type,
CAST(DATE(order_date) AS DATE) as order_date,
CAST(COUNT(*) AS INTEGER) as order_count
FROM staging.orders
WHERE order_date >= '{{ ds }}'::date
GROUP BY 1, 2, 3

Break down complex queries into manageable parts.

-- Good: Use CTEs for clarity and performance
WITH base_orders AS (
SELECT
user_id,
order_date,
revenue,
order_type
FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2024-02-01'
AND user_id IS NOT NULL
),
aggregated_orders AS (
SELECT
user_id,
order_type,
COUNT(*) as order_count,
SUM(revenue) as total_revenue
FROM base_orders
GROUP BY 1, 2
)
SELECT
user_id,
order_type,
order_count,
total_revenue,
CURRENT_TIMESTAMP as processed_at
FROM aggregated_orders
ORDER BY total_revenue DESC

Use efficient subquery patterns.

-- Good: Efficient subquery
SELECT
u.user_id,
u.email,
COALESCE(o.order_count, 0) as order_count
FROM users u
LEFT JOIN (
SELECT
user_id,
COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1
) o ON u.user_id = o.user_id
WHERE u.created_date >= '2024-01-01'
-- Avoid: Inefficient subquery
SELECT
u.user_id,
u.email,
(SELECT COUNT(*) FROM orders WHERE user_id = u.user_id AND order_date >= '2024-01-01') as order_count
FROM users u
WHERE u.created_date >= '2024-01-01'

Use B-tree indexes for equality and range queries.

-- Create B-tree index
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_user_date ON orders(user_id, order_date);

Use hash indexes for equality queries only.

-- Create hash index
CREATE INDEX idx_users_id_hash ON users USING HASH (user_id);

Use GIN indexes for array and JSON data.

-- Create GIN index for JSONB
CREATE INDEX idx_users_properties_gin ON users USING GIN (properties);
-- Create GIN index for arrays
CREATE INDEX idx_users_tags_gin ON users USING GIN (tags);

Create partial indexes for filtered queries.

-- Create partial index
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';
-- Create partial index for recent orders
CREATE INDEX idx_recent_orders ON orders(user_id, order_date)
WHERE order_date >= '2024-01-01';

Check index usage to identify optimization opportunities.

-- Check index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

Find indexes that are not being used.

-- Find unused indexes
SELECT
schemaname,
tablename,
indexname,
idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, tablename;

Analyze index effectiveness and performance.

-- Analyze index effectiveness
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE email = '[email protected]';

Set up connection pooling for better performance.

# Connection pooling configuration
connections:
postgres: "my-pg-conn"
pool_size: 10
max_connections: 20
connection_timeout: 30
idle_timeout: 300

Configure connection settings for optimal performance.

# Optimized connection settings
connections:
postgres: "my-pg-conn"
pool_size: 10
max_connections: 20
connection_timeout: 30
idle_timeout: 300
statement_timeout: 300000
lock_timeout: 30000

Implement efficient transaction patterns.

-- Good: Efficient transaction
BEGIN;
INSERT INTO processed_users (user_id, email, processed_at)
SELECT user_id, email, NOW()
FROM staging.users
WHERE created_at >= '{{ ds }}'::date;
UPDATE users
SET status = 'processed', updated_at = NOW()
WHERE created_at >= '{{ ds }}'::date
AND status = 'pending';
COMMIT;

Keep transactions as short as possible.

-- Good: Short transaction
BEGIN;
UPDATE users SET status = 'processed' WHERE user_id = '12345';
COMMIT;
-- Avoid: Long transaction
BEGIN;
-- Multiple complex operations
-- ... long running operations ...
COMMIT;

Monitor query execution time to identify performance issues.

-- Monitor query performance
SELECT
query,
mean_exec_time,
calls,
total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

Track connection usage for optimization.

-- Monitor connection usage
SELECT
state,
COUNT(*) as connection_count
FROM pg_stat_activity
GROUP BY state
ORDER BY connection_count DESC;

Monitor index usage and effectiveness.

-- Monitor index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

Analyze query execution plans to identify optimization opportunities.

-- Analyze query performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT
user_id,
COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1;

Monitor resource utilization for optimization.

-- Monitor resource utilization
SELECT
datname,
numbackends,
xact_commit,
xact_rollback,
blks_read,
blks_hit
FROM pg_stat_database
WHERE datname = current_database();

Always filter by indexed columns first.

-- Good: Index filter first
SELECT * FROM users
WHERE email = '[email protected]' -- Indexed column
AND status = 'active'
AND created_at >= '2024-01-01'

Choose the right JOIN type for your use case.

-- Good: Use INNER JOIN when appropriate
SELECT
u.user_id,
u.email,
o.order_count
FROM users u
INNER JOIN user_orders o ON u.user_id = o.user_id
WHERE u.created_date >= '2024-01-01'

Use efficient aggregation patterns.

-- Good: Efficient aggregation
SELECT
user_id,
COUNT(*) as order_count,
SUM(revenue) as total_revenue,
AVG(revenue) as avg_revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1

Regularly monitor and optimize query performance.

-- Monitor slow queries
SELECT
query,
mean_exec_time,
calls,
total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 1000 -- Slow queries
ORDER BY mean_exec_time DESC;

Develop and maintain an effective indexing strategy.

-- Create strategic indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_date ON orders(user_id, order_date);
CREATE INDEX idx_orders_status ON orders(status) WHERE status = 'pending';

Conduct regular performance optimization reviews.

  • Analyze query performance trends
  • Identify optimization opportunities
  • Implement performance improvements
  • Monitor performance impact

Problem: Queries running slowly

Solutions:

  • Check index usage
  • Optimize query structure
  • Review connection pooling
  • Monitor resource usage

Debug Steps:

  1. Analyze query execution plan
  2. Check index usage and effectiveness
  3. Review connection pool configuration
  4. Monitor resource utilization

Problem: High CPU or memory usage

Solutions:

  • Optimize query patterns
  • Implement proper indexing
  • Review connection management
  • Monitor resource utilization

Debug Steps:

  1. Monitor resource usage
  2. Check query performance
  3. Review indexing strategy
  4. Analyze connection patterns

Problem: Connection pool exhaustion or timeouts

Solutions:

  • Optimize connection pool settings
  • Review transaction patterns
  • Implement connection monitoring
  • Optimize query performance

Debug Steps:

  1. Monitor connection usage
  2. Check connection pool configuration
  3. Review transaction patterns
  4. Analyze query performance

Use EXPLAIN to analyze query execution plans.

EXPLAIN (ANALYZE, BUFFERS)
SELECT
user_id,
COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1;

Monitor query performance metrics.

-- Monitor query performance
SELECT
query,
mean_exec_time,
calls,
total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC;

Analyze resource usage and trends.

-- Analyze resource usage
SELECT
datname,
numbackends,
xact_commit,
xact_rollback,
blks_read,
blks_hit
FROM pg_stat_database
WHERE datname = current_database();