Pipeline Scheduling

Pipeline scheduling in Datablast uses cron expressions to define when pipelines execute. Schedules can range from every few minutes to once a month, as the examples below show.

  • Cron Expressions: Standard cron syntax for scheduling
  • Timezone Support: UTC and custom timezone configuration
  • Backfill Support: Historical data processing
  • Schedule Validation: Automatic validation of cron expressions

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
│ │ │ │ │
* * * * *
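To make the five fields concrete, here is a minimal Python sketch that expands a cron expression into the values on which it fires. It is an illustration only and not Datablast's own parser: the names `FIELD_RANGES`, `expand_field` and `expand_cron` are hypothetical, and it handles only `*`, single values, ranges, steps and comma lists.

```python
# Illustrative sketch only; this is not Datablast's own parser.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),  # 0 = Sunday
]


def expand_field(field, low, high):
    """Expand one cron field such as '*/15', '8-18/2' or '22-23,0-5' into sorted values."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if step < 1 or not (low <= start <= end <= high):
            raise ValueError(f"field {field!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def expand_cron(expression):
    """Map each of the five field names to the values on which the schedule fires."""
    fields = expression.split()
    if len(fields) != len(FIELD_RANGES):
        raise ValueError(f"expected 5 fields, got {len(fields)}: {expression!r}")
    return {
        name: expand_field(field, low, high)
        for field, (name, low, high) in zip(fields, FIELD_RANGES)
    }


print(expand_cron("0 8-18/2 * * *")["hour"])     # [8, 10, 12, 14, 16, 18]
print(expand_cron("0 22-23,0-5 * * *")["hour"])  # [0, 1, 2, 3, 4, 5, 22, 23]
```

A free-text value such as "6 AM daily" has three whitespace-separated fields rather than five, so `expand_cron` raises `ValueError` for it.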
# Daily at midnight UTC
schedule: "0 0 * * *"
# Daily at 6 AM UTC
schedule: "0 6 * * *"
# Daily at 2 PM UTC
schedule: "0 14 * * *"
# Daily at 11:30 PM UTC
schedule: "30 23 * * *"

# Every Monday at midnight
schedule: "0 0 * * 1"
# Every Friday at 5 PM
schedule: "0 17 * * 5"
# Every Sunday at 2 AM
schedule: "0 2 * * 0"

# First day of every month at midnight
schedule: "0 0 1 * *"
# 15th of every month at 10 AM
schedule: "0 10 15 * *"
# Days 28-31 of every month at 11 PM (runs on each of these days that exists, not only on the last day of the month)
schedule: "0 23 28-31 * *"

# Every hour
schedule: "0 * * * *"
# Every 6 hours
schedule: "0 */6 * * *"
# Every 2 hours during business hours (8 AM - 6 PM)
schedule: "0 8-18/2 * * *"

# Every 15 minutes
schedule: "*/15 * * * *"
# Every 30 minutes during business hours (9:00 AM - 5:30 PM, Monday-Friday)
schedule: "*/30 9-17 * * 1-5"
# Every 2 hours on weekdays
schedule: "0 */2 * * 1-5"
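Standard five-field cron has no token for "last day of the month", so the monthly example `0 23 28-31 * *` fires on every one of days 28 to 31 that exists in a given month. One possible workaround, sketched below in Python, is to keep that schedule and have the pipeline's own code exit early unless the run date is the month's final day. The helper name `is_last_day_of_month` is hypothetical, and whether Datablast offers a built-in alternative is not covered here.

```python
import calendar
from datetime import date


def is_last_day_of_month(day: date) -> bool:
    """Return True only when `day` is the final calendar day of its month."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    return day.day == days_in_month


# A run triggered on 28 February 2024 (a leap year) would be skipped,
# while the run on 29 February would proceed.
print(is_last_day_of_month(date(2024, 2, 28)))  # False
print(is_last_day_of_month(date(2024, 2, 29)))  # True
```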
# Default UTC scheduling
id: utc-pipeline
schedule: "0 6 * * *" # 6 AM UTC
start_date: "2024-01-01"

# Process daily data at 2 AM UTC
id: daily-processing
schedule: "0 2 * * *"
start_date: "2024-01-01"

# Process data every 15 minutes
id: realtime-processing
schedule: "*/15 * * * *"
start_date: "2024-01-01"

# Process data every 6 hours
id: batch-processing
schedule: "0 */6 * * *"
start_date: "2024-01-01"

# Generate daily reports at 6 AM UTC
id: daily-reports
schedule: "0 6 * * *"
start_date: "2024-01-01"

# Generate weekly reports on Monday at 8 AM UTC
id: weekly-reports
schedule: "0 8 * * 1"
start_date: "2024-01-01"

# Generate monthly reports on the 1st at 9 AM UTC
id: monthly-reports
schedule: "0 9 1 * *"
start_date: "2024-01-01"

# Run data quality checks every 4 hours
id: data-quality
schedule: "0 */4 * * *"
start_date: "2024-01-01"

# Run health checks every 30 minutes
id: health-checks
schedule: "*/30 * * * *"
start_date: "2024-01-01"

# Run hourly during business hours (9 AM - 5 PM UTC, Monday-Friday)
id: business-hours
schedule: "0 9-17 * * 1-5"
start_date: "2024-01-01"

# Run hourly during off-peak hours (10 PM - 6 AM UTC; last run at 5 AM)
id: off-peak
schedule: "0 22-23,0-5 * * *"
start_date: "2024-01-01"

Datablast automatically validates cron expressions:

# Valid cron expression
schedule: "0 6 * * *" # ✅ Valid
# Invalid cron expression
schedule: "6 AM daily" # ❌ Invalid

# Good: Appropriate frequency for data volume
id: "high-volume"
schedule: "0 */2 * * *" # Every 2 hours for high-volume data
# Avoid: Too frequent for low-volume data
id: "low-volume"
schedule: "*/5 * * * *" # Every 5 minutes for low-volume data

# Good: Schedule after source data is available
id: "dependent-pipeline"
schedule: "0 8 * * *" # 8 AM after source data at 6 AM
# Avoid: Schedule before source data is available
id: "early-pipeline"
schedule: "0 4 * * *" # 4 AM before source data at 6 AM

# Good: Business-appropriate schedule
id: "daily-reports"
schedule: "0 6 * * *" # 6 AM for daily reports
# Avoid: Inappropriate schedule
id: "daily-reports"
schedule: "0 2 * * *" # 2 AM (too early for daily reports)

# Good: Realistic start date
id: "new-pipeline"
schedule: "0 6 * * *"
start_date: "2024-01-01"
# Avoid: Unrealistic start date
id: "new-pipeline"
schedule: "0 6 * * *"
start_date: "2020-01-01" # Too far in the past
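One way to judge whether a start date is realistic is to count how many daily intervals lie between it and today. The Python sketch below assumes that `start_date` marks the first interval a backfill would cover, which this page does not state explicitly. The function `daily_intervals` is hypothetical.

```python
from datetime import date


def daily_intervals(start_date: str, as_of: date) -> int:
    """Count the whole days between an ISO start date and `as_of` (never negative)."""
    start = date.fromisoformat(start_date)
    return max((as_of - start).days, 0)


as_of = date(2024, 2, 1)
print(daily_intervals("2024-01-01", as_of))  # 31
print(daily_intervals("2020-01-01", as_of))  # 1492
```

Under that assumption, moving the start date back to 2020 turns a one-month backfill into one spanning more than four years of daily runs.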

Problem: Pipeline fails to schedule

Solution: Use valid cron expressions

# Good: Valid cron expression
schedule: "0 6 * * *"
# Avoid: Invalid cron expression
schedule: "6 AM daily"

Problem: Multiple pipelines scheduled at the same time

Solution: Stagger pipeline schedules

# Good: Staggered schedules
id: "pipeline-1"
schedule: "0 6 * * *"
id: "pipeline-2"
schedule: "0 7 * * *"

# Avoid: Conflicting schedules
id: "pipeline-1"
schedule: "0 6 * * *"
id: "pipeline-2"
schedule: "0 6 * * *"
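Conflicts like the one above can be spotted before deployment by grouping pipelines by their schedule string. The Python sketch below assumes the pipeline ids and schedules have already been loaded into a plain dictionary. The function `find_conflicts` is hypothetical, and it only catches textually identical expressions, not different expressions that happen to fire at the same time.

```python
from collections import defaultdict


def find_conflicts(pipelines):
    """Return {schedule: [pipeline ids]} for every schedule shared by two or more pipelines."""
    by_schedule = defaultdict(list)
    for pipeline_id, schedule in pipelines.items():
        # Collapse repeated whitespace so "0  6 * * *" and "0 6 * * *" compare equal
        normalized = " ".join(schedule.split())
        by_schedule[normalized].append(pipeline_id)
    return {s: ids for s, ids in by_schedule.items() if len(ids) > 1}


pipelines = {
    "pipeline-1": "0 6 * * *",
    "pipeline-2": "0 6 * * *",
    "pipeline-3": "0 7 * * *",
}
print(find_conflicts(pipelines))  # {'0 6 * * *': ['pipeline-1', 'pipeline-2']}
```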