Skip to content

Snowflake Performance Optimization

This guide covers performance optimization techniques for Snowflake SQL tasks, including query optimization, clustering strategies, warehouse sizing, and cost management.

Snowflake performance optimization focuses on reducing query execution time, minimizing compute costs, and improving overall pipeline efficiency. The key areas include query design, clustering, warehouse sizing, and cost management.

  • Query Design: Efficient SQL patterns and structures
  • Clustering: Data organization for optimal performance
  • Warehouse Sizing: Right-sizing compute resources
  • Cost Management: Optimizing Snowflake credits
  • Resource Utilization: Efficient use of compute resources

Always filter by clustering keys to enable automatic clustering.

-- Good: Filter by clustering keys
SELECT
user_id,
order_date,
SUM(revenue) as total_revenue
FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2024-02-01'
AND user_id = '12345'
GROUP BY 1, 2
-- Avoid: No clustering key filter
SELECT
user_id,
order_date,
SUM(revenue) as total_revenue
FROM orders
WHERE revenue > 1000
GROUP BY 1, 2

Use appropriate JOIN types and conditions for better performance.

-- Good: Efficient JOIN with proper filtering
SELECT
u.user_id,
u.email,
o.order_count,
o.total_revenue
FROM users u
JOIN user_orders o ON u.user_id = o.user_id
WHERE u.created_date >= '2024-01-01'
AND o.order_date >= '2024-01-01'
AND u.status = 'active'
-- Avoid: Inefficient JOIN without filtering
SELECT
u.user_id,
u.email,
o.order_count,
o.total_revenue
FROM users u
JOIN user_orders o ON u.user_id = o.user_id
WHERE u.status = 'active'

Optimize window functions for better performance.

-- Good: Efficient window function
SELECT
user_id,
order_date,
revenue,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) as order_sequence
FROM orders
WHERE order_date >= '2024-01-01'
AND user_id IS NOT NULL
-- Avoid: Inefficient window function
SELECT
user_id,
order_date,
revenue,
ROW_NUMBER() OVER (ORDER BY order_date) as global_sequence
FROM orders
WHERE order_date >= '2024-01-01'

Choose the right data types to minimize storage and improve performance.

-- Good: Appropriate data types
SELECT
CAST(user_id AS VARCHAR) as user_id,
CAST(order_count AS INTEGER) as order_count,
CAST(revenue AS DECIMAL(10,2)) as revenue,
CAST(created_at AS TIMESTAMP) as created_at
FROM staging.orders
WHERE order_date >= '{{ ds }}'
-- Avoid: Unnecessary data type conversions
SELECT
user_id,
order_count,
revenue,
created_at
FROM staging.orders
WHERE order_date >= '{{ ds }}'

Avoid unnecessary casting and conversions.

-- Good: Minimal conversions
SELECT
user_id,
order_type,
DATE(order_date) as order_date,
COUNT(*) as order_count
FROM staging.orders
WHERE order_date >= '{{ ds }}'
GROUP BY 1, 2, 3
-- Avoid: Excessive conversions
SELECT
CAST(user_id AS VARCHAR) as user_id,
CAST(order_type AS VARCHAR) as order_type,
CAST(DATE(order_date) AS DATE) as order_date,
CAST(COUNT(*) AS INTEGER) as order_count
FROM staging.orders
WHERE order_date >= '{{ ds }}'
GROUP BY 1, 2, 3

Break down complex queries into manageable parts.

-- Good: Use CTEs for clarity and performance
WITH base_orders AS (
SELECT
user_id,
order_date,
revenue,
order_type
FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2024-02-01'
AND user_id IS NOT NULL
),
aggregated_orders AS (
SELECT
user_id,
order_type,
COUNT(*) as order_count,
SUM(revenue) as total_revenue
FROM base_orders
GROUP BY 1, 2
)
SELECT
user_id,
order_type,
order_count,
total_revenue,
CURRENT_TIMESTAMP() as processed_at
FROM aggregated_orders
ORDER BY total_revenue DESC

Use efficient subquery patterns.

-- Good: Efficient subquery
SELECT
u.user_id,
u.email,
COALESCE(o.order_count, 0) as order_count
FROM users u
LEFT JOIN (
SELECT
user_id,
COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1
) o ON u.user_id = o.user_id
WHERE u.created_date >= '2024-01-01'
-- Avoid: Inefficient subquery
SELECT
u.user_id,
u.email,
(SELECT COUNT(*) FROM orders WHERE user_id = u.user_id AND order_date >= '2024-01-01') as order_count
FROM users u
WHERE u.created_date >= '2024-01-01'

Cluster by frequently queried columns.

-- Create table with single column clustering
CREATE TABLE analytics.user_orders (
user_id VARCHAR,
order_date DATE,
revenue DECIMAL(10,2),
order_type VARCHAR
)
CLUSTER BY (user_id)

Cluster by multiple columns for complex query patterns.

-- Create table with multi-column clustering
CREATE TABLE analytics.user_orders (
user_id VARCHAR,
order_date DATE,
revenue DECIMAL(10,2),
order_type VARCHAR
)
CLUSTER BY (user_id, order_date)

Leverage clustering for better performance.

-- Good: Use clustered columns
SELECT * FROM analytics.user_orders
WHERE user_id = '12345'
AND order_date >= '2024-01-01'
AND order_date < '2024-02-01'
-- Avoid: Don't use clustered columns
SELECT * FROM analytics.user_orders
WHERE revenue > 1000
AND order_type = 'premium'

Use clustered columns in ORDER BY for optimal performance.

-- Good: Order by clustered columns
SELECT * FROM analytics.user_orders
WHERE user_id = '12345'
ORDER BY user_id, order_date
-- Avoid: Order by non-clustered columns
SELECT * FROM analytics.user_orders
WHERE user_id = '12345'
ORDER BY revenue, order_type

Check clustering effectiveness regularly.

-- Check clustering effectiveness
SELECT
SYSTEM$CLUSTERING_INFORMATION('analytics.user_orders')

Recluster tables when clustering effectiveness degrades.

-- Recluster table
ALTER TABLE analytics.user_orders RECLUSTER

Select the right warehouse size for your workload.

# Small workload
warehouse: "X-SMALL"
# Medium workload
warehouse: "SMALL"
# Large workload
warehouse: "MEDIUM"
# Very large workload
warehouse: "LARGE"

Configure warehouse settings for optimal performance.

# Warehouse configuration
warehouse: "MEDIUM"
auto_suspend: 60 # Suspend after 60 seconds of inactivity
auto_resume: true
max_cluster_count: 10
min_cluster_count: 1

Choose warehouse sizes based on workload requirements.

# Good: Right-size warehouse
warehouse: "SMALL" # For small to medium workloads
# Avoid: Over-provisioning
warehouse: "X-LARGE" # For small workloads

Enable auto-suspend to save costs.

# Good: Enable auto-suspend
warehouse: "MEDIUM"
auto_suspend: 60
auto_resume: true
# Avoid: Keep warehouse running
warehouse: "MEDIUM"
auto_suspend: 0
auto_resume: false

Use multi-cluster warehouses for variable workloads.

# Multi-cluster warehouse
warehouse: "MEDIUM"
max_cluster_count: 10
min_cluster_count: 1
scaling_policy: "STANDARD"

Track credit consumption to optimize costs.

-- Monitor credit usage
SELECT
DATE(start_time) as usage_date,
SUM(credits_used) as total_credits,
COUNT(*) as query_count,
AVG(credits_used) as avg_credits_per_query
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -7, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESC

Find queries that consume the most credits.

-- Find expensive queries
SELECT
query_id,
start_time,
credits_used,
execution_time,
query_text
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_DATE())
ORDER BY credits_used DESC
LIMIT 20

Monitor cost trends over time.

-- Track cost trends
SELECT
DATE(start_time) as usage_date,
SUM(credits_used) as daily_credits,
COUNT(*) as daily_query_count,
AVG(credits_used) as avg_credits_per_query
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESC

Use appropriate warehouse sizes for your workload.

# Good: Right-size warehouse
warehouse: "SMALL" # For small workloads
# Avoid: Over-provisioning
warehouse: "X-LARGE" # For small workloads

Enable auto-suspend to save costs.

# Good: Enable auto-suspend
warehouse: "MEDIUM"
auto_suspend: 60
auto_resume: true
# Avoid: Keep warehouse running
warehouse: "MEDIUM"
auto_suspend: 0
auto_resume: false

Leverage Snowflake’s query result caching.

-- Good: Use cached results
SELECT
user_id,
COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1

Use appropriate data types and compression.

-- Good: Use appropriate data types
CREATE TABLE analytics.user_orders (
user_id VARCHAR,
order_date DATE,
revenue DECIMAL(10,2),
order_type VARCHAR
)

Monitor query execution time to identify performance issues.

-- Monitor query performance
SELECT
query_id,
start_time,
execution_time,
credits_used,
query_text
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_DATE())
ORDER BY execution_time DESC

Track warehouse utilization for optimization.

-- Monitor warehouse utilization
SELECT
warehouse_name,
DATE(start_time) as usage_date,
SUM(credits_used) as total_credits,
COUNT(*) as query_count,
AVG(execution_time) as avg_execution_time
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_DATE())
GROUP BY 1, 2
ORDER BY 1, 2 DESC

Monitor data scanning to optimize costs.

-- Monitor data scanning
SELECT
query_id,
start_time,
bytes_scanned,
credits_used,
query_text
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_DATE())
ORDER BY bytes_scanned DESC

Analyze query execution plans to identify optimization opportunities.

-- Analyze query performance
EXPLAIN
SELECT
user_id,
COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1

Monitor resource utilization for optimization.

-- Monitor resource utilization
SELECT
warehouse_name,
DATE(start_time) as usage_date,
SUM(credits_used) as total_credits,
COUNT(*) as query_count,
AVG(execution_time) as avg_execution_time
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_DATE())
GROUP BY 1, 2
ORDER BY total_credits DESC

Always filter by clustering keys first.

-- Good: Clustering filter first
SELECT * FROM analytics.user_orders
WHERE user_id = '12345' -- Clustering key
AND order_date >= '2024-01-01'
AND order_date < '2024-02-01'

Choose the right JOIN type for your use case.

-- Good: Use INNER JOIN when appropriate
SELECT
u.user_id,
u.email,
o.order_count
FROM users u
INNER JOIN user_orders o ON u.user_id = o.user_id
WHERE u.created_date >= '2024-01-01'

Use efficient aggregation patterns.

-- Good: Efficient aggregation
SELECT
user_id,
COUNT(*) as order_count,
SUM(revenue) as total_revenue,
AVG(revenue) as avg_revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1

Regularly monitor and optimize query performance.

-- Monitor slow queries
SELECT
query_id,
execution_time,
credits_used,
query_text
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_DATE())
AND execution_time > 30000 -- Slow queries
ORDER BY execution_time DESC

Set up cost monitoring and controls.

-- Monitor costs
SELECT
DATE(start_time) as usage_date,
SUM(credits_used) as total_credits,
COUNT(*) as query_count
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -7, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESC

Conduct regular performance optimization reviews.

  • Analyze query performance trends
  • Identify optimization opportunities
  • Implement performance improvements
  • Monitor cost impact

Problem: Queries running slowly

Solutions:

  • Check clustering effectiveness
  • Optimize query structure
  • Right-size warehouse
  • Review resource usage

Debug Steps:

  1. Analyze query execution plan
  2. Check clustering effectiveness
  3. Review warehouse sizing
  4. Monitor resource utilization

Problem: Unexpected high credit consumption

Solutions:

  • Optimize query patterns
  • Right-size warehouses
  • Implement cost controls
  • Review data scanning

Debug Steps:

  1. Analyze credit usage by query
  2. Check warehouse sizing
  3. Review query patterns
  4. Monitor cost trends

Problem: Poor clustering effectiveness

Solutions:

  • Recluster tables
  • Review clustering strategy
  • Monitor clustering effectiveness
  • Optimize query patterns

Debug Steps:

  1. Check clustering information
  2. Review clustering strategy
  3. Monitor clustering effectiveness
  4. Recluster if needed

Use EXPLAIN to analyze query execution plans.

EXPLAIN
SELECT
user_id,
COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1

Monitor query performance metrics.

-- Monitor query performance
SELECT
query_id,
execution_time,
credits_used,
query_text
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_DATE())
ORDER BY execution_time DESC

Analyze credit usage and trends.

-- Analyze costs
SELECT
DATE(start_time) as usage_date,
SUM(credits_used) as total_credits,
COUNT(*) as query_count
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESC