PostgreSQL Advanced Features
This guide covers advanced PostgreSQL features available in Datablast SQL tasks, including JSON support, window functions, custom functions, and advanced data types.
Overview
Section titled “Overview”PostgreSQL provides advanced features that enable complex data processing, analytics, and custom logic. These features include JSON support, window functions, custom functions, and advanced data types.
Key Advanced Features
Section titled “Key Advanced Features”- JSON Support: Native JSON and JSONB data types
- Window Functions: Advanced analytical functions
- Custom Functions: User-defined functions and procedures
- Advanced Data Types: Arrays, ranges, and custom types
- CTEs: Common Table Expressions for complex queries
JSON Support
Section titled “JSON Support”JSONB Data Type
Section titled “JSONB Data Type”PostgreSQL’s JSONB data type provides efficient storage and querying of JSON data.
Creating JSONB Columns
Section titled “Creating JSONB Columns”-- Create table with JSONB columnCREATE TABLE user_sessions ( user_id VARCHAR, session_id VARCHAR, properties JSONB, created_at TIMESTAMP DEFAULT NOW());Querying JSONB Data
Section titled “Querying JSONB Data”-- Query JSONB dataSELECT user_id, session_id, properties->>'browser' as browser, properties->>'device' as device, properties->'location'->>'country' as country, properties->'location'->>'city' as cityFROM user_sessionsWHERE created_at >= '{{ ds }}'::date AND properties ? 'location'JSONB Operators
Section titled “JSONB Operators”-- Use JSONB operatorsSELECT user_id, properties->>'browser' as browser, -- Get text value properties->'tags' as tags, -- Get JSON value properties->'tags'->0 as first_tag, -- Get array element properties ? 'location' as has_location, -- Check key existence properties @> '{"device": "mobile"}' as is_mobile -- Contains checkFROM user_sessionsWHERE created_at >= '{{ ds }}'::dateJSONB Functions
Section titled “JSONB Functions”-- Use JSONB functionsSELECT user_id, jsonb_array_length(properties->'tags') as tag_count, jsonb_object_keys(properties) as property_keys, jsonb_pretty(properties) as formatted_propertiesFROM user_sessionsWHERE created_at >= '{{ ds }}'::date AND properties ? 'tags'JSON Aggregation
Section titled “JSON Aggregation”Aggregate JSON Data
Section titled “Aggregate JSON Data”-- Aggregate JSON dataSELECT user_id, jsonb_agg( jsonb_build_object( 'session_id', session_id, 'browser', properties->>'browser', 'device', properties->>'device' ) ) as session_summaryFROM user_sessionsWHERE created_at >= '{{ ds }}'::dateGROUP BY user_idBuild JSON Objects
Section titled “Build JSON Objects”-- Build JSON objectsSELECT user_id, jsonb_build_object( 'total_sessions', COUNT(*), 'browsers', jsonb_agg(DISTINCT properties->>'browser'), 'devices', jsonb_agg(DISTINCT properties->>'device') ) as user_summaryFROM user_sessionsWHERE created_at >= '{{ ds }}'::dateGROUP BY user_idWindow Functions
Section titled “Window Functions”Basic Window Functions
Section titled “Basic Window Functions”ROW_NUMBER()
Section titled “ROW_NUMBER()”-- Use ROW_NUMBER() for rankingSELECT user_id, order_date, revenue, ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY order_date DESC ) as order_rankFROM ordersWHERE order_date >= '{{ ds }}'::dateRANK() and DENSE_RANK()
Section titled “RANK() and DENSE_RANK()”-- Use RANK() and DENSE_RANK()SELECT user_id, revenue, RANK() OVER (ORDER BY revenue DESC) as revenue_rank, DENSE_RANK() OVER (ORDER BY revenue DESC) as dense_revenue_rankFROM user_totalsWHERE order_date >= '{{ ds }}'::dateLAG() and LEAD()
Section titled “LAG() and LEAD()”-- Use LAG() and LEAD() for time-series analysisSELECT user_id, order_date, revenue, LAG(revenue) OVER ( PARTITION BY user_id ORDER BY order_date ) as prev_revenue, LEAD(revenue) OVER ( PARTITION BY user_id ORDER BY order_date ) as next_revenueFROM ordersWHERE order_date >= '{{ ds }}'::dateAdvanced Window Functions
Section titled “Advanced Window Functions”Running Totals
Section titled “Running Totals”-- Calculate running totalsSELECT user_id, order_date, revenue, SUM(revenue) OVER ( PARTITION BY user_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as running_totalFROM ordersWHERE order_date >= '{{ ds }}'::dateMoving Averages
Section titled “Moving Averages”-- Calculate moving averagesSELECT user_id, order_date, revenue, AVG(revenue) OVER ( PARTITION BY user_id ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) as moving_avg_3FROM ordersWHERE order_date >= '{{ ds }}'::datePercentile Functions
Section titled “Percentile Functions”-- Use percentile functionsSELECT user_id, revenue, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY revenue) as median_revenue, PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY revenue) as p90_revenueFROM user_totalsWHERE order_date >= '{{ ds }}'::dateWindow Frame Specifications
Section titled “Window Frame Specifications”ROWS vs RANGE
Section titled “ROWS vs RANGE”-- Use ROWS frameSELECT user_id, order_date, revenue, SUM(revenue) OVER ( PARTITION BY user_id ORDER BY order_date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING ) as sum_3_rowsFROM ordersWHERE order_date >= '{{ ds }}'::date
-- Use RANGE frameSELECT user_id, order_date, revenue, SUM(revenue) OVER ( PARTITION BY user_id ORDER BY order_date RANGE BETWEEN INTERVAL '1 day' PRECEDING AND INTERVAL '1 day' FOLLOWING ) as sum_3_daysFROM ordersWHERE order_date >= '{{ ds }}'::dateCommon Table Expressions (CTEs)
Section titled “Common Table Expressions (CTEs)”Basic CTEs
Section titled “Basic CTEs”Simple CTE
Section titled “Simple CTE”-- Use simple CTEWITH user_totals AS ( SELECT user_id, SUM(revenue) as total_revenue, COUNT(*) as order_count FROM orders WHERE order_date >= '{{ ds }}'::date GROUP BY user_id)SELECT user_id, total_revenue, order_count, total_revenue / order_count as avg_order_valueFROM user_totalsWHERE total_revenue > 1000Multiple CTEs
Section titled “Multiple CTEs”-- Use multiple CTEsWITH user_totals AS ( SELECT user_id, SUM(revenue) as total_revenue, COUNT(*) as order_count FROM orders WHERE order_date >= '{{ ds }}'::date GROUP BY user_id),user_ranks AS ( SELECT user_id, total_revenue, RANK() OVER (ORDER BY total_revenue DESC) as revenue_rank FROM user_totals)SELECT u.user_id, u.total_revenue, u.revenue_rank, CASE WHEN u.revenue_rank <= 10 THEN 'Top 10' WHEN u.revenue_rank <= 100 THEN 'Top 100' ELSE 'Other' END as user_tierFROM user_ranks uRecursive CTEs
Section titled “Recursive CTEs”Hierarchical Data
Section titled “Hierarchical Data”-- Use recursive CTE for hierarchical dataWITH RECURSIVE user_hierarchy AS ( -- Base case SELECT user_id, parent_user_id, user_name, 1 as level FROM users WHERE parent_user_id IS NULL
UNION ALL
-- Recursive case SELECT u.user_id, u.parent_user_id, u.user_name, uh.level + 1 FROM users u JOIN user_hierarchy uh ON u.parent_user_id = uh.user_id)SELECT user_id, user_name, levelFROM user_hierarchyORDER BY level, user_idNumber Generation
Section titled “Number Generation”-- Use recursive CTE for number generationWITH RECURSIVE numbers AS ( SELECT 1 as n UNION ALL SELECT n + 1 FROM numbers WHERE n < 100)SELECT nFROM numbersAdvanced Data Types
Section titled “Advanced Data Types”Array Types
Section titled “Array Types”Working with Arrays
Section titled “Working with Arrays”-- Create table with array columnCREATE TABLE user_tags ( user_id VARCHAR, tags TEXT[], created_at TIMESTAMP DEFAULT NOW());
-- Query array dataSELECT user_id, tags, array_length(tags, 1) as tag_count, tags[1] as first_tag, 'premium' = ANY(tags) as is_premiumFROM user_tagsWHERE created_at >= '{{ ds }}'::dateArray Functions
Section titled “Array Functions”-- Use array functionsSELECT user_id, tags, array_agg(DISTINCT unnest(tags)) as unique_tags, array_length(tags, 1) as tag_countFROM user_tagsWHERE created_at >= '{{ ds }}'::dateGROUP BY user_id, tagsRange Types
Section titled “Range Types”Working with Ranges
Section titled “Working with Ranges”-- Create table with range columnCREATE TABLE user_sessions ( user_id VARCHAR, session_range TSRANGE, created_at TIMESTAMP DEFAULT NOW());
-- Query range dataSELECT user_id, session_range, lower(session_range) as session_start, upper(session_range) as session_end, session_range @> NOW() as is_activeFROM user_sessionsWHERE created_at >= '{{ ds }}'::dateCustom Types
Section titled “Custom Types”Creating Custom Types
Section titled “Creating Custom Types”-- Create custom typeCREATE TYPE user_status AS ENUM ('active', 'inactive', 'suspended');
-- Create table with custom typeCREATE TABLE users ( user_id VARCHAR, email VARCHAR, status user_status, created_at TIMESTAMP DEFAULT NOW());
-- Query custom typeSELECT user_id, email, status, CASE status WHEN 'active' THEN 'User is active' WHEN 'inactive' THEN 'User is inactive' WHEN 'suspended' THEN 'User is suspended' END as status_descriptionFROM usersWHERE created_at >= '{{ ds }}'::dateCustom Functions
Section titled “Custom Functions”User-Defined Functions
Section titled “User-Defined Functions”Creating Functions
Section titled “Creating Functions”-- Create function for data validationCREATE OR REPLACE FUNCTION validate_email(email TEXT)RETURNS BOOLEAN AS $$BEGIN RETURN email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$';END;$$ LANGUAGE plpgsql;
-- Use custom functionSELECT user_id, email, validate_email(email) as is_valid_emailFROM usersWHERE created_at >= '{{ ds }}'::dateFunctions with Parameters
Section titled “Functions with Parameters”-- Create function with parametersCREATE OR REPLACE FUNCTION calculate_user_score( user_id TEXT, start_date DATE, end_date DATE)RETURNS DECIMAL AS $$DECLARE total_revenue DECIMAL; order_count INTEGER; user_score DECIMAL;BEGIN SELECT COALESCE(SUM(revenue), 0), COUNT(*) INTO total_revenue, order_count FROM orders WHERE user_id = $1 AND order_date >= $2 AND order_date <= $3;
user_score := total_revenue * 0.1 + order_count * 0.01;
RETURN user_score;END;$$ LANGUAGE plpgsql;
-- Use function with parametersSELECT user_id, calculate_user_score( user_id, '{{ ds }}'::date, '{{ next_ds }}'::date ) as user_scoreFROM usersWHERE created_at >= '{{ ds }}'::dateStored Procedures
Section titled “Stored Procedures”Creating Procedures
Section titled “Creating Procedures”-- Create procedure for data processingCREATE OR REPLACE PROCEDURE process_user_data( process_date DATE)LANGUAGE plpgsql AS $$BEGIN -- Update user statistics UPDATE users SET last_order_date = ( SELECT MAX(order_date) FROM orders WHERE user_id = users.user_id AND order_date <= process_date ), total_orders = ( SELECT COUNT(*) FROM orders WHERE user_id = users.user_id AND order_date <= process_date ), total_revenue = ( SELECT COALESCE(SUM(revenue), 0) FROM orders WHERE user_id = users.user_id AND order_date <= process_date ) WHERE created_at <= process_date;
-- Log processing completion INSERT INTO processing_log (process_date, status, completed_at) VALUES (process_date, 'completed', NOW());END;$$;
-- Call procedureCALL process_user_data('{{ ds }}'::date);Advanced Query Patterns
Section titled “Advanced Query Patterns”Pivot Operations
Section titled “Pivot Operations”Using CASE for Pivoting
Section titled “Using CASE for Pivoting”-- Pivot data using CASESELECT user_id, SUM(CASE WHEN order_type = 'premium' THEN revenue ELSE 0 END) as premium_revenue, SUM(CASE WHEN order_type = 'standard' THEN revenue ELSE 0 END) as standard_revenue, SUM(CASE WHEN order_type = 'basic' THEN revenue ELSE 0 END) as basic_revenueFROM ordersWHERE order_date >= '{{ ds }}'::dateGROUP BY user_idComplex Aggregations
Section titled “Complex Aggregations”Using FILTER Clause
Section titled “Using FILTER Clause”-- Use FILTER clause for conditional aggregationSELECT user_id, COUNT(*) as total_orders, COUNT(*) FILTER (WHERE order_type = 'premium') as premium_orders, SUM(revenue) FILTER (WHERE order_type = 'premium') as premium_revenue, AVG(revenue) FILTER (WHERE order_type = 'premium') as avg_premium_revenueFROM ordersWHERE order_date >= '{{ ds }}'::dateGROUP BY user_idAdvanced JOINs
Section titled “Advanced JOINs”Lateral Joins
Section titled “Lateral Joins”-- Use lateral joinSELECT u.user_id, u.email, recent_orders.order_date, recent_orders.revenueFROM users uCROSS JOIN LATERAL ( SELECT order_date, revenue FROM orders o WHERE o.user_id = u.user_id AND o.order_date >= '{{ ds }}'::date ORDER BY o.order_date DESC LIMIT 3) recent_ordersWHERE u.created_at >= '{{ ds }}'::dateBest Practices
Section titled “Best Practices”JSON Best Practices
Section titled “JSON Best Practices”1. Use JSONB for Storage
Section titled “1. Use JSONB for Storage”-- Good: Use JSONB for storageCREATE TABLE user_sessions ( user_id VARCHAR, properties JSONB);
-- Avoid: Use JSON for storageCREATE TABLE user_sessions ( user_id VARCHAR, properties JSON);2. Create GIN Indexes for JSONB
Section titled “2. Create GIN Indexes for JSONB”-- Create GIN index for JSONB queriesCREATE INDEX idx_user_sessions_properties_ginON user_sessions USING GIN (properties);3. Use Appropriate JSON Operators
Section titled “3. Use Appropriate JSON Operators”-- Good: Use appropriate operatorsSELECT properties->>'browser' as browserFROM user_sessionsWHERE properties ? 'browser';
-- Avoid: Use wrong operatorsSELECT properties->'browser' as browserFROM user_sessionsWHERE properties ? 'browser';Window Function Best Practices
Section titled “Window Function Best Practices”1. Use Appropriate Window Frames
Section titled “1. Use Appropriate Window Frames”-- Good: Use appropriate window frameSELECT user_id, order_date, revenue, SUM(revenue) OVER ( PARTITION BY user_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as running_totalFROM orders2. Optimize Window Functions
Section titled “2. Optimize Window Functions”-- Good: Optimize window functionSELECT user_id, order_date, revenue, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) as order_sequenceFROM ordersWHERE order_date >= '{{ ds }}'::dateCTE Best Practices
Section titled “CTE Best Practices”1. Use CTEs for Complex Logic
Section titled “1. Use CTEs for Complex Logic”-- Good: Use CTEs for complex logicWITH user_totals AS ( SELECT user_id, SUM(revenue) as total_revenue FROM orders WHERE order_date >= '{{ ds }}'::date GROUP BY user_id)SELECT user_id, total_revenue, CASE WHEN total_revenue > 1000 THEN 'High Value' WHEN total_revenue > 100 THEN 'Medium Value' ELSE 'Low Value' END as user_tierFROM user_totals2. Keep CTEs Focused
Section titled “2. Keep CTEs Focused”-- Good: Keep CTEs focusedWITH base_data AS ( SELECT user_id, order_date, revenue FROM orders WHERE order_date >= '{{ ds }}'::date),aggregated_data AS ( SELECT user_id, COUNT(*) as order_count, SUM(revenue) as total_revenue FROM base_data GROUP BY user_id)SELECT * FROM aggregated_dataTroubleshooting
Section titled “Troubleshooting”Common Issues
Section titled “Common Issues”JSON Query Performance
Section titled “JSON Query Performance”Problem: Slow JSON queries
Solutions:
- Create GIN indexes on JSONB columns
- Use appropriate JSON operators
- Avoid complex JSON operations in WHERE clauses
Debug Steps:
- Check index usage
- Analyze query execution plan
- Review JSON operator usage
- Optimize JSON queries
Window Function Performance
Section titled “Window Function Performance”Problem: Slow window functions
Solutions:
- Optimize window frame specifications
- Use appropriate partitioning
- Avoid unnecessary window functions
Debug Steps:
- Analyze query execution plan
- Check window frame specifications
- Review partitioning strategy
- Optimize window function usage
CTE Performance
Section titled “CTE Performance”Problem: Slow CTE queries
Solutions:
- Optimize CTE logic
- Use appropriate indexing
- Avoid complex CTE operations
Debug Steps:
- Analyze CTE execution plan
- Check indexing strategy
- Review CTE logic
- Optimize CTE queries