Tasks

Tasks are the building blocks of Datablast pipelines. Each task represents a unit of work that can be executed as part of your data processing workflow.

SQL tasks execute queries against one of the supported databases:

  • BigQuery (bq.sql) - Google Cloud BigQuery
  • Snowflake (sf.sql) - Snowflake data warehouse
  • Athena (athena.sql) - AWS Athena
  • PostgreSQL (pg.sql) - PostgreSQL databases

Python tasks execute scripts for data processing:

  • Python (python) - Custom Python logic and ML workflows
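
The type identifiers listed above are what a task definition puts in its `type` field. As a rough illustration only, the lookup below maps each identifier to the platform it targets and rejects anything else. The `TASK_TYPES` table and `describe_task_type` helper are hypothetical names for this sketch and are not part of Datablast.

```python
# Task type identifiers as listed above, mapped to the platform they target.
TASK_TYPES = {
    "bq.sql": "Google Cloud BigQuery",
    "sf.sql": "Snowflake",
    "athena.sql": "AWS Athena",
    "pg.sql": "PostgreSQL",
    "python": "Python",
}


def describe_task_type(task_type: str) -> str:
    """Return the platform for a task type, or raise on an unknown identifier."""
    try:
        return TASK_TYPES[task_type]
    except KeyError:
        known = ", ".join(sorted(TASK_TYPES))
        raise ValueError(
            f"unknown task type {task_type!r}; expected one of: {known}"
        ) from None


print(describe_task_type("pg.sql"))  # prints "PostgreSQL"
```

Checking the type up front like this catches a misspelled identifier (for example `pq.sql`) before anything is executed.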

Sensor tasks wait for an external condition before the pipeline proceeds:

  • BigQuery Sensors - Wait for tables, partitions, or query results
  • Cloud Storage Sensors - Wait for files in GCS or S3
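
Conceptually, waiting for an external condition amounts to a polling loop with a timeout. The Python sketch below shows that general pattern under stated assumptions; it is not Datablast's sensor implementation. The `wait_for` helper, its `poke_interval` and `timeout` parameters, and the stand-in condition are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool],
             poke_interval: float = 1.0,
             timeout: float = 60.0) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Hypothetical helper: a real sensor would query a table or list a
    bucket inside `condition`.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poke_interval, max(0.0, deadline - time.monotonic())))


# Stand-in condition: "the file" shows up on the third check.
checks = {"count": 0}


def file_has_arrived() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


arrived = wait_for(file_has_arrived, poke_interval=0.01, timeout=1.0)
```

Returning `False` on timeout, rather than blocking forever, lets the caller decide whether to fail the pipeline or skip the downstream tasks.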

Tasks can be configured using:

  1. YAML Files - Separate configuration files
  2. Annotations - Configuration directly in code files

A YAML task definition uses the following fields:

name: task.name                 # Unique task identifier
type: bq.sql                    # Task type
description: Task description   # Human-readable description
depends:                        # Tasks this task depends on
  - task1
  - task2
run: script.sql                 # Script file to execute
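
The `depends` field implies an execution order: a task can only run after the tasks it lists. The sketch below shows one way such an order could be derived with Python's standard-library `graphlib` (Python 3.9+). It is an illustration, not how Datablast schedules tasks; the `execution_order` helper and the `task1.sql` and `task2.py` file names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task definitions mirroring the YAML fields above.
tasks = [
    {"name": "task.name", "type": "bq.sql",
     "depends": ["task1", "task2"], "run": "script.sql"},
    {"name": "task1", "type": "bq.sql", "depends": [], "run": "task1.sql"},
    {"name": "task2", "type": "python", "depends": ["task1"], "run": "task2.py"},
]


def execution_order(task_defs):
    """Return task names ordered so every task follows its dependencies."""
    graph = {t["name"]: set(t.get("depends", [])) for t in task_defs}
    # Reject dependencies that point at tasks nobody defined.
    unknown = {d for deps in graph.values() for d in deps} - set(graph)
    if unknown:
        raise ValueError(f"undefined dependencies: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(tasks)
print(order)  # prints ['task1', 'task2', 'task.name']
```

Validating the dependency names before sorting turns a typo in `depends` into an explicit error instead of a silently added phantom task.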
