Pipeline Scheduling
Pipeline scheduling in Datablast uses cron expressions to define when pipelines execute. A cron expression can target specific minutes, hours, days of the month, months, and weekdays, so run times can be set down to the minute.
Key Scheduling Features
- Cron Expressions: Standard cron syntax for scheduling
- Timezone Support: UTC and custom timezone configuration
- Backfill Support: Historical data processing
- Schedule Validation: Built-in schedule validation
Cron Expression Format
Basic Syntax
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
│ │ │ │ │
* * * * *

Common Schedule Patterns
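When reading the patterns in this section, it can help to expand each field into the concrete values it matches. The following is a minimal Python sketch, not Datablast's parser: `expand` and `expand_field` are hypothetical helper names, and the sketch assumes plain five-field syntax with `*`, lists, ranges, and `/step` only.

```python
# Field names and bounds taken from the Basic Syntax diagram.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def expand_field(field, low, high):
    """Expand one cron field such as "*/15" or "8-18/2" into sorted values."""
    values = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return sorted(values)


def expand(expression):
    """Map each of the five fields to the list of values it matches."""
    parts = expression.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return {
        name: expand_field(part, low, high)
        for part, (name, low, high) in zip(parts, FIELDS)
    }


print(expand("0 8-18/2 * * *")["hour"])  # [8, 10, 12, 14, 16, 18]
```

Expanding a field this way makes it easy to see, for instance, that a range with a step includes its first value and then every step-th value up to the end of the range.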
Daily Schedules
# Daily at midnight UTC
schedule: "0 0 * * *"

# Daily at 6 AM UTC
schedule: "0 6 * * *"

# Daily at 2 PM UTC
schedule: "0 14 * * *"

# Daily at 11:30 PM UTC
schedule: "30 23 * * *"

Weekly Schedules
# Every Monday at midnight
schedule: "0 0 * * 1"

# Every Friday at 5 PM
schedule: "0 17 * * 5"

# Every Sunday at 2 AM
schedule: "0 2 * * 0"

Monthly Schedules
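# First day of every month at midnight
schedule: "0 0 1 * *"

# 15th of every month at 10 AM
schedule: "0 10 15 * *"

# Days 28-31 of every month at 11 PM. Standard five-field cron cannot express
# "last day of the month", so this fires on every one of those days that
# exists in the month, not only on the last one.
schedule: "0 23 28-31 * *"

Hourly Schedules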
# Every hour
schedule: "0 * * * *"

# Every 6 hours
schedule: "0 */6 * * *"

# Every 2 hours during business hours (8 AM - 6 PM)
schedule: "0 8-18/2 * * *"

Custom Intervals
# Every 15 minutes
schedule: "*/15 * * * *"

# Every 30 minutes during business hours (9:00 AM - 5:30 PM, Monday-Friday)
schedule: "*/30 9-17 * * 1-5"

# Every 2 hours on weekdays
schedule: "0 */2 * * 1-5"

Timezone Configuration
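Because the default schedule is evaluated in UTC, a local wall-clock time has to be shifted before it is written into the cron string. A minimal Python sketch of that conversion follows. `daily_cron_in_utc` is a hypothetical helper, and it assumes a fixed UTC offset, so zones with daylight saving time would drift by an hour for part of the year.

```python
from datetime import datetime, timedelta, timezone


def daily_cron_in_utc(hour, minute, utc_offset_hours):
    """Return a daily UTC cron expression for a local time at a fixed offset."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary: only the time of day matters for a daily schedule.
    local_time = datetime(2024, 1, 1, hour, minute, tzinfo=local_zone)
    utc_time = local_time.astimezone(timezone.utc)
    return f"{utc_time.minute} {utc_time.hour} * * *"


# 6:00 AM at UTC-5 corresponds to 11:00 AM UTC.
print(daily_cron_in_utc(6, 0, -5))  # 0 11 * * *
```

The sketch only covers daily schedules. For weekly or monthly schedules the conversion can also move the run to the previous or next day, so the day fields would need the same adjustment.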
UTC Scheduling (Default)
# Default UTC scheduling
id: utc-pipeline
schedule: "0 6 * * *" # 6 AM UTC
start_date: "2024-01-01"

Schedule Examples by Use Case
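Any schedule in the examples below can be sanity-checked by listing its next few fire times. The sketch below is an illustration in Python rather than Datablast's scheduler: `next_runs` is a hypothetical helper, times are treated as UTC, and since this page does not say whether Datablast fires at `start_date` itself, the sketch only returns times strictly after the given moment.

```python
from datetime import datetime, timedelta


def _expand(field, low, high):
    """Expand one cron field ("*", lists, ranges, "/step") into a set."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, int(step or 1)))
    return values


def next_runs(expression, after, count=3):
    """Return up to `count` fire times strictly after `after` (scans one year)."""
    minute, hour, dom, month, dow = expression.split()
    minutes, hours = _expand(minute, 0, 59), _expand(hour, 0, 23)
    doms, months = _expand(dom, 1, 31), _expand(month, 1, 12)
    dows = _expand(dow, 0, 6)

    runs = []
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = t + timedelta(days=366)
    while len(runs) < count and t < limit:
        cron_dow = (t.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
        if dom == "*" or dow == "*":
            day_ok = t.day in doms and cron_dow in dows
        else:
            # Classic cron rule: if both day fields are restricted, either may match.
            day_ok = t.day in doms or cron_dow in dows
        if t.minute in minutes and t.hour in hours and t.month in months and day_ok:
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


for run in next_runs("0 8 * * 1", datetime(2024, 1, 1)):
    print(run.isoformat())
# 2024-01-01T08:00:00
# 2024-01-08T08:00:00
# 2024-01-15T08:00:00
```

The minute-by-minute scan is deliberately simple. It is fast enough for a manual check but would be a poor fit for a production scheduler.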
Data Processing Pipelines
Daily Data Processing
# Process daily data at 2 AM UTC
id: daily-processing
schedule: "0 2 * * *"
start_date: "2024-01-01"

Real-time Data Processing
# Process data every 15 minutes
id: realtime-processing
schedule: "*/15 * * * *"
start_date: "2024-01-01"

Batch Processing
# Process data every 6 hours
id: batch-processing
schedule: "0 */6 * * *"
start_date: "2024-01-01"

Analytics and Reporting
Daily Reports
# Generate daily reports at 6 AM UTC
id: daily-reports
schedule: "0 6 * * *"
start_date: "2024-01-01"

Weekly Reports
# Generate weekly reports on Monday at 8 AM UTC
id: weekly-reports
schedule: "0 8 * * 1"
start_date: "2024-01-01"

Monthly Reports
# Generate monthly reports on the 1st at 9 AM UTC
id: monthly-reports
schedule: "0 9 1 * *"
start_date: "2024-01-01"

Data Quality and Monitoring
Data Quality Checks
# Run data quality checks every 4 hours
id: data-quality
schedule: "0 */4 * * *"
start_date: "2024-01-01"

System Health Checks
# Run health checks every 30 minutes
id: health-checks
schedule: "*/30 * * * *"
start_date: "2024-01-01"

Advanced Scheduling Patterns
Business Hours Scheduling
# Run hourly during business hours (9 AM - 5 PM UTC, Monday-Friday)
id: business-hours
schedule: "0 9-17 * * 1-5"
start_date: "2024-01-01"

Off-Peak Scheduling
# Run hourly during off-peak hours (10 PM - 6 AM UTC; last run starts at 5 AM)
id: off-peak
schedule: "0 22-23,0-5 * * *"
start_date: "2024-01-01"

Schedule Validation
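As an illustration of what structural validation of a five-field expression involves, here is a minimal Python sketch. It is not Datablast's validator, whose exact rules are not documented on this page: `validate_cron` is a hypothetical helper that accepts only `*`, numbers, ranges, lists, and `/step`, using the field ranges from the Basic Syntax diagram.

```python
import re

# Field names and bounds as listed in the Basic Syntax diagram.
BOUNDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 6),
]

# One list item: "*", "N", or "N-M", optionally followed by "/step".
PART = re.compile(r"(\*|\d+(-\d+)?)(/(\d+))?")


def validate_cron(expression):
    """Return a list of problems; an empty list means structurally valid."""
    fields = expression.split()
    if len(fields) != 5:
        return [f"expected 5 fields, got {len(fields)}"]

    problems = []
    for field, (name, low, high) in zip(fields, BOUNDS):
        for part in field.split(","):
            match = PART.fullmatch(part)
            if not match:
                problems.append(f"{name}: cannot parse {part!r}")
                continue
            numbers = [int(n) for n in re.findall(r"\d+", match.group(1))]
            if any(n < low or n > high for n in numbers):
                problems.append(f"{name}: {part!r} outside {low}-{high}")
            elif len(numbers) == 2 and numbers[0] > numbers[1]:
                problems.append(f"{name}: range {part!r} is reversed")
            if match.group(4) is not None and int(match.group(4)) == 0:
                problems.append(f"{name}: step in {part!r} must be positive")
    return problems


print(validate_cron("0 6 * * *"))   # []
print(validate_cron("6 AM daily"))  # ['expected 5 fields, got 3']
```

Returning a list of messages instead of a boolean makes it easier to report every problem in a config file in one pass.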
Built-in Validation
Datablast automatically validates cron expressions:
# Valid cron expression
schedule: "0 6 * * *" # ✅ Valid

# Invalid cron expression
schedule: "6 AM daily" # ❌ Invalid

Best Practices
1. Choose Appropriate Schedule Frequency
# Good: Appropriate frequency for data volume
id: "high-volume"
schedule: "0 */2 * * *" # Every 2 hours for high-volume data

# Avoid: Too frequent for low-volume data
id: "low-volume"
schedule: "*/5 * * * *" # Every 5 minutes for low-volume data

2. Consider Data Dependencies
# Good: Schedule after source data is available
id: "dependent-pipeline"
schedule: "0 8 * * *" # 8 AM after source data at 6 AM

# Avoid: Schedule before source data is available
id: "early-pipeline"
schedule: "0 4 * * *" # 4 AM before source data at 6 AM

3. Use Business-Appropriate Times
# Good: Business-appropriate schedule
id: "daily-reports"
schedule: "0 6 * * *" # 6 AM for daily reports

# Avoid: Inappropriate schedule
id: "daily-reports"
schedule: "0 2 * * *" # 2 AM (too early for daily reports)

4. Set Appropriate Start Dates
# Good: Realistic start date
id: "new-pipeline"
schedule: "0 6 * * *"
start_date: "2024-01-01"

# Avoid: Unrealistic start date
id: "new-pipeline"
schedule: "0 6 * * *"
start_date: "2020-01-01" # Too far in the past

Troubleshooting
Common Schedule Issues
Invalid Cron Expression
Problem: Pipeline fails to schedule
Solution: Replace free-text schedules with a valid five-field cron expression
# Good: Valid cron expression
schedule: "0 6 * * *"

# Avoid: Invalid cron expression
schedule: "6 AM daily"

Schedule Conflicts
Problem: Multiple pipelines scheduled at the same time
Solution: Stagger pipeline schedules
# Good: Staggered schedules
id: "pipeline-1"
schedule: "0 6 * * *"

id: "pipeline-2"
schedule: "0 7 * * *"

# Avoid: Conflicting schedules
id: "pipeline-1"
schedule: "0 6 * * *"

id: "pipeline-2"
schedule: "0 6 * * *"
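Identical schedules like the conflicting pair above can be caught before deployment with a small check. This Python sketch assumes the pipeline configs have already been loaded into dictionaries with `id` and `schedule` keys. `find_conflicts` is a hypothetical helper, not a Datablast command.

```python
from collections import defaultdict


def find_conflicts(pipelines):
    """Return {schedule: [pipeline ids]} for schedules used more than once."""
    by_schedule = defaultdict(list)
    for pipeline in pipelines:
        # Normalize whitespace so "0  6 * * *" and "0 6 * * *" compare equal.
        key = " ".join(pipeline["schedule"].split())
        by_schedule[key].append(pipeline["id"])
    return {
        schedule: ids for schedule, ids in by_schedule.items() if len(ids) > 1
    }


pipelines = [
    {"id": "pipeline-1", "schedule": "0 6 * * *"},
    {"id": "pipeline-2", "schedule": "0 6 * * *"},
    {"id": "pipeline-3", "schedule": "0 7 * * *"},
]
print(find_conflicts(pipelines))  # {'0 6 * * *': ['pipeline-1', 'pipeline-2']}
```

This only catches textually identical schedules. Expressions that differ but still overlap, such as `0 6 * * *` and `0 */6 * * *`, would need their fire times compared instead.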