SQL Task Overview
SQL tasks in Datablast allow you to execute data transformation and analysis queries against various databases. This guide covers the supported SQL dialects and basic configuration methods.
Supported SQL Dialects
Section titled “Supported SQL Dialects”| Task Type | Database | Description |
|---|---|---|
bq.sql | BigQuery | Google Cloud BigQuery with automatic materialization |
sf.sql | Snowflake | Snowflake data warehouse |
athena.sql | AWS Athena | Amazon Athena for S3 data analysis |
pg.sql | PostgreSQL | PostgreSQL databases |
Configuration Methods
Section titled “Configuration Methods”Method 1: Annotation-based Configuration
Section titled “Method 1: Annotation-based Configuration”Define task information directly in your SQL file using annotations:
-- @blast.name: core_model.users-- @blast.type: bq.sql-- @blast.description: Create users table from activity data-- @blast.depends: core_model.activity_daily-- @blast.materialization.type: table-- @blast.materialization.strategy: create+replace-- @blast.materialization.partition_by: dt-- @blast.materialization.cluster_by: user_id,event_type-- @blast.tests.columns.user_id: not_null,unique-- @blast.tests.columns.daily_active_users: not_null,positive
SELECT user_id, event_type, COUNT(*) as event_count, CURRENT_TIMESTAMP() as created_atFROM staging.eventsWHERE event_date = '{{ ds }}'GROUP BY 1, 2Method 2: YAML Configuration
Section titled “Method 2: YAML Configuration”Define task information in a separate YAML file:
name: "core_model.users"type: "bq.sql"description: "Create users table from activity data"depends: - core_model.activity_dailyrun: "users.sql"
# Materialization settingsmaterialization: type: "table" strategy: "create+replace" partition_by: "dt" cluster_by: ["user_id", "event_type"]
# Data quality teststests: columns: user_id: - name: "not_null" # blocking: true (default) - name: "unique" blocking: false # Must specify to override default daily_active_users: - name: "not_null" # blocking: true (default) - name: "positive" # blocking: true (default)Basic Configuration
Section titled “Basic Configuration”Required Fields
Section titled “Required Fields”name: "task.name" # Unique task identifiertype: "bq.sql" # Task typedescription: "Task description" # Human-readable descriptionrun: "script.sql" # Script file to executeOptional Fields
Section titled “Optional Fields”depends: - task1 - task2root_dir: "tasks/analytics" # Root directory for task filesconnections: gcp: "my-gcp-conn" # Database connectionsTask Dependencies
Section titled “Task Dependencies”Simple Dependencies
Section titled “Simple Dependencies”depends: - task1 - task2Conditional Dependencies
Section titled “Conditional Dependencies”depends: - "task1" - "task2" condition: "task1.status == 'success'"Connection Configuration
Section titled “Connection Configuration”Database Connections
Section titled “Database Connections”connections: gcp: "my-gcp-conn" # BigQuery connection snowflake: "my-snowflake-conn" # Snowflake connection aws: "my-aws-conn" # AWS connection postgres: "my-pg-conn" # PostgreSQL connectionBest Practices
Section titled “Best Practices”Task Design
Section titled “Task Design”- Single Responsibility: Each task should have one clear purpose
- Idempotency: Tasks should be safe to run multiple times
- Error Handling: Implement proper error handling and logging
- Resource Efficiency: Use appropriate resources for task complexity
Performance
Section titled “Performance”- Incremental Processing: Use incremental strategies for large datasets
- Partitioning: Implement appropriate partitioning strategies
- Clustering: Use clustering for frequently queried columns
- Resource Right-sizing: Match resources to task requirements
Next Steps
Section titled “Next Steps”- BigQuery Tasks – BigQuery-specific features and configuration
- Snowflake Tasks – Snowflake-specific features and configuration
- Athena Tasks – Athena-specific features and configuration
- PostgreSQL Tasks – PostgreSQL-specific features and configuration
- Materialization Strategies – Table and view materialization options
- Data Quality Testing – Built-in testing framework