Python Task Overview
Python tasks in Datablast allow you to execute complex data processing, machine learning, and custom logic using Python. This guide covers the basic configuration methods and key features.
Configuration Methods
Method 1: Annotation-based Configuration
Define task information directly in your Python file using annotations:

    # @blast.type: python
    # @blast.description: Generate churn predictions using trained model
    # @blast.depends: ml_models.churn_model_train
    # @blast.instance: d1.large
    # @blast.secrets: ML_API_KEY:ML_API_KEY,MODEL_SECRET:MODEL_SECRET

    import os
    import pandas as pd
    import numpy as np
    from datetime import datetime


    # Defined before use so the module-level call below does not raise NameError
    def process_data(execution_date):
        """Process data for the given execution date."""
        # Implementation here
        return "Processing completed"


    # Get execution date from environment variables
    execution_date = os.getenv('BLAST_START_DATE')

    # Your Python logic here
    result = process_data(execution_date)

    print(f"Successfully processed data for {execution_date}")

Method 2: YAML Configuration
Define task information in a separate YAML file:

    name: "ml_models.churn_prediction"
    type: "python"
    description: "Generate churn predictions using trained model"
    depends:
      - ml_models.churn_model_train
    run: "churn_prediction.py"
    instance: "d1.large"
    secrets:
      - "ML_API_KEY:ML_API_KEY"
      - "MODEL_SECRET:MODEL_SECRET"

Basic Configuration
Required Fields

    name: "task.name"                # Unique task identifier
    type: "python"                   # Task type
    description: "Task description"  # Human-readable description
    run: "script.py"                 # Script file to execute

Optional Fields

    depends:
      - task1
      - task2
    root_dir: "tasks/ml_models"      # Root directory for task files
    instance: "d1.medium"            # Compute instance type
    secrets:
      - "SECRET_NAME:ENV_VAR_NAME"   # Secret management

Instance Types
The platform supports the following instance types for Python tasks:

| Instance Type | CPU Limit | Memory Limit | CPU Request | Memory Request | Use Case |
|---|---|---|---|---|---|
| d1.nano | 250m | 512Mi | 250m | 256Mi | Lightweight tasks, testing (default) |
| d1.small | 500m | 1200Mi | 500m | 1Gi | Small data processing |
| d1.medium | 750m | 2400Mi | 750m | 2Gi | Medium workloads |
| d1.large | 1 | 4400Mi | 1 | 4Gi | Large data processing |
| d1.xlarge | 2 | 6600Mi | 2 | 6Gi | Heavy workloads, ML training |

Default instance: d1.nano. You only need to set instance when a task requires more resources.
⚠️ Important: Using instance types other than d1.nano may incur additional charges. Please consult with your Datablast representative for pricing details before upgrading instance types.
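
When deciding whether the default is enough, it can help to compare a rough memory estimate against the limits in the table. The helper below is a minimal sketch and not part of Datablast: `pick_instance` and its `headroom` factor are hypothetical names, and the limits are simply copied from the table above.

```python
# Memory limits in Mi, copied from the instance table above.
MEMORY_LIMIT_MI = {
    "d1.nano": 512,
    "d1.small": 1200,
    "d1.medium": 2400,
    "d1.large": 4400,
    "d1.xlarge": 6600,
}


def pick_instance(estimated_mi, headroom=1.25):
    """Return the smallest instance whose memory limit covers the estimate.

    `headroom` is an illustrative safety factor, not a platform setting.
    """
    needed = estimated_mi * headroom
    # Dicts preserve insertion order, so this walks from smallest to largest.
    for name, limit in MEMORY_LIMIT_MI.items():
        if limit >= needed:
            return name
    raise ValueError(f"No instance type offers {needed:.0f}Mi of memory")
```

For example, `pick_instance(1500)` asks for 1875Mi after headroom and therefore skips d1.small in favour of d1.medium. Because larger instances may cost more, treat the result as a starting point for a pricing conversation rather than a final choice.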
Environment Variables
Python tasks receive date information through environment variables:

    import os
    from datetime import datetime

    # Access date variables through environment variables
    data_interval_start = os.getenv('BLAST_DATA_INTERVAL_START')
    data_interval_end = os.getenv('BLAST_DATA_INTERVAL_END')
    start_date = os.getenv('BLAST_START_DATE')
    end_date = os.getenv('BLAST_END_DATE')
    start_date_nodash = os.getenv('BLAST_START_DATE_NODASH')
    end_date_nodash = os.getenv('BLAST_END_DATE_NODASH')

    # Convert to datetime objects if needed
    start_dt = datetime.fromisoformat(data_interval_start.replace('Z', '+00:00'))
    end_dt = datetime.fromisoformat(data_interval_end.replace('Z', '+00:00'))

    print(f"Processing data from {start_dt} to {end_dt}")

Available Environment Variables

| Variable | Description | Example |
|---|---|---|
| BLAST_DATA_INTERVAL_START | Start of data interval | 2024-01-15T00:00:00+00:00 |
| BLAST_DATA_INTERVAL_END | End of data interval | 2024-01-16T00:00:00+00:00 |
| BLAST_START_DATE | Data interval start date | 2024-01-15 |
| BLAST_END_DATE | Data interval end date | 2024-01-16 |
| BLAST_START_DATE_NODASH | Start date without dashes | 20240115 |
| BLAST_END_DATE_NODASH | End date without dashes | 20240116 |

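
Since `os.getenv` returns `None` when a variable is absent (for example, when the script runs outside the scheduler), a small wrapper can make that failure explicit. The following is a sketch under assumptions: `read_interval`, `partition_path`, and the `dt=` path layout are hypothetical and only illustrate how the variables in the table might be consumed.

```python
import os
from datetime import datetime


def read_interval(env=os.environ):
    """Parse the data interval bounds into timezone-aware datetimes."""
    bounds = []
    for name in ("BLAST_DATA_INTERVAL_START", "BLAST_DATA_INTERVAL_END"):
        raw = env.get(name)
        if raw is None:
            raise RuntimeError(f"{name} is not set; is this running as a task?")
        # Accept a trailing 'Z' as well as an explicit '+00:00' offset.
        bounds.append(datetime.fromisoformat(raw.replace("Z", "+00:00")))
    start, end = bounds
    return start, end


def partition_path(prefix, env=os.environ):
    """Build a dated output path from BLAST_START_DATE_NODASH (hypothetical layout)."""
    return f"{prefix}/dt={env['BLAST_START_DATE_NODASH']}/part.csv"
```

Passing the environment as a parameter keeps both functions easy to exercise locally with a plain dictionary holding the example values from the table.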
Secret Management
Using Secrets in Python Tasks

    # @blast.secrets: my_secret:my_secret_in_env, another_secret:another_secret_var

    import os

    # Access secrets through the environment variable names declared above
    first_secret = os.getenv("my_secret_in_env")
    second_secret = os.getenv("another_secret_var")

    # Secrets mapped under the same name on both sides (e.g. ML_API_KEY:ML_API_KEY)
    # are read by that name; each one must also be declared in the task's secrets
    api_key = os.getenv("ML_API_KEY")
    database_password = os.getenv("DB_PASSWORD")

Secret Configuration

    secrets:
      - "ML_API_KEY:ML_API_KEY"
      - "DB_PASSWORD:DB_PASSWORD"
      - "ENCRYPTION_KEY:ENCRYPTION_KEY"

Each entry has the form name_on_scheduler:name_to_be_exported_on_script. The first part is the secret's name on the scheduler; the second is the environment variable name your script reads.
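
A task that depends on several secrets may want to check for all of them up front instead of failing halfway through. The sketch below assumes that a secret which was not configured appears as an unset or empty environment variable; the source does not state this, and `require_secrets` is a hypothetical helper.

```python
import os


def require_secrets(names, env=os.environ):
    """Return {name: value} for every name, or raise listing what is missing."""
    missing = [n for n in names if not env.get(n)]
    if missing:
        # Report names only; never log secret values.
        raise RuntimeError("Missing secrets: " + ", ".join(missing))
    return {n: env[n] for n in names}
```

With the configuration above, a script could call `require_secrets(["ML_API_KEY", "DB_PASSWORD", "ENCRYPTION_KEY"])` once at startup and pass the resulting dictionary to the code that needs it.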
Python Dependencies
Requirements File Location
The platform searches for requirements.txt files hierarchically:
- Task Directory: Look for requirements.txt in the same directory as your Python task
- Parent Directories: Search upward through parent directories
- Repository Root: Check the root directory of your repository
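
The search order above can be reproduced locally to see which files are candidates for a given task. This is only a sketch of an upward directory walk: `find_requirements` is a hypothetical helper, and whether the platform uses the first match or combines several files is not stated here, so the function simply lists every candidate in search order.

```python
from pathlib import Path


def find_requirements(task_file, repo_root):
    """List requirements.txt files from the task directory up to repo_root."""
    root = Path(repo_root).resolve()
    current = Path(task_file).resolve().parent
    found = []
    while True:
        candidate = current / "requirements.txt"
        if candidate.is_file():
            found.append(candidate)
        # Stop at the repository root, or at the filesystem root if the
        # task file turns out not to live under repo_root.
        if current == root or current.parent == current:
            break
        current = current.parent
    return found
```

The first element of the returned list is the file closest to the task, and the last is the one nearest the repository root.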
Example Requirements Organization

    your-project/
    ├── requirements.txt              # Global dependencies
    └── tasks/
        ├── ml_models/
        │   ├── churn_model.py
        │   └── requirements.txt      # ML-specific dependencies
        └── export/
            ├── csv_export.py
            └── requirements.txt      # Export-specific dependencies

Best Practices
Code Structure
- Single Responsibility: Each task should have one clear purpose
- Error Handling: Implement proper error handling and logging
- Resource Efficiency: Use appropriate resources for task complexity
- Testing: Include comprehensive tests for critical functions
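
One way to combine these points is to keep the task body in a small, testable function and wrap it in an entry point that logs failures. The layout below is a sketch rather than a Datablast requirement: the logger name and the `run` and `main` functions are hypothetical, and it assumes (without confirmation from this guide) that a non-zero exit code marks the task as failed.

```python
import logging
import os

logger = logging.getLogger("churn_prediction")


def run(execution_date):
    """Single-purpose task body, kept free of setup code so it is easy to test."""
    if not execution_date:
        raise ValueError("execution date is required")
    return f"processed {execution_date}"


def main(env=os.environ):
    """Run the task and translate failures into a non-zero exit code."""
    logging.basicConfig(level=logging.INFO)
    try:
        result = run(env.get("BLAST_START_DATE"))
    except Exception:
        # logger.exception records the traceback alongside the message.
        logger.exception("Task failed")
        return 1
    logger.info("Task finished: %s", result)
    return 0


# At the bottom of the script, hand the result to the interpreter:
#     import sys
#     sys.exit(main())
```

Because `run` takes its input as an argument, a unit test can call it directly without setting any environment variables.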
Performance
- Resource Right-sizing: Match resources to task requirements
- Data Processing: Optimize data processing and use appropriate data structures
- Memory Management: Monitor memory usage and data sizes
- Caching: Implement caching where appropriate
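
As a rough illustration of the memory and caching points, the sketch below processes rows in fixed-size chunks and memoizes a repeated lookup using only the standard library. Both `chunked` and `lookup_segment` are hypothetical; the lookup is a stand-in for whatever expensive call a real task would make.

```python
from functools import lru_cache
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` rows so only one chunk is held at a time."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


@lru_cache(maxsize=1024)
def lookup_segment(customer_id):
    """Stand-in for an expensive lookup; repeated ids are served from the cache."""
    return "even" if customer_id % 2 == 0 else "odd"
```

Chunking keeps peak memory proportional to the chunk size instead of the full input, which matters most on the smaller instance types, while `lru_cache` bounds the cache itself through `maxsize`.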
Next Steps
- Instance Types – Resource allocation and instance configuration
- Environment Variables – Date variables and dynamic configuration
- Dependencies – Python package management and requirements
- Secret Management – Secure credential handling
- Code Structure – Best practices for Python task development
- Error Handling – Robust error handling and logging