# Pipeline Connections
Pipeline connections in Datablast provide secure access to external systems, databases, and services. Connections are configured at the pipeline level and can be used by individual tasks.
## Key Connection Features

- Secure Storage: Encrypted connection credentials
- Environment Support: Environment-specific connections
- Connection Pooling: Efficient connection management
- Validation: Built-in connection validation
## Connection Types

### Database Connections

#### BigQuery Connection

```yaml
# BigQuery connection configuration
default_connections:
  gcpConnectionId: my-bigquery-conn
```

#### Snowflake Connection

```yaml
# Snowflake connection configuration
default_connections:
  snowflake: my-snowflake-conn
```

#### PostgreSQL Connection

```yaml
# PostgreSQL connection configuration
default_connections:
  postgres: my-postgres-conn
```

#### MySQL Connection

```yaml
# MySQL connection configuration
default_connections:
  mysql: my-mysql-conn
```

#### Athena Connection

```yaml
# Athena connection configuration
default_connections:
  aws_conn_id: my-athena-conn
```

## Connection Configuration
### Basic Connection Setup

```yaml
# Basic pipeline with connections
id: analytics-pipeline
schedule: "0 6 * * *"
start_date: "2024-01-01"

# Default connections
default_connections:
  gcpConnectionId: analytics-gcp
  snowflake: analytics-snowflake
```

### Multiple Connection Types

```yaml
# Pipeline with multiple connection types
id: multi-conn-pipeline
schedule: "0 6 * * *"
start_date: "2024-01-01"

# Multiple connection types
default_connections:
  # Database connections
  gcpConnectionId: analytics-gcp
  snowflake: analytics-snowflake
  postgres: analytics-postgres
  # Cloud connections
  aws_conn_id: analytics-aws
```

## Connection Management
### Connection Naming Conventions

#### Descriptive Names

```yaml
# Good: Descriptive connection names
default_connections:
  gcpConnectionId: marketing-analytics-gcp
  snowflake: customer-data-snowflake

# Avoid: Generic connection names
default_connections:
  gcpConnectionId: gcp1
  snowflake: sf1
```

#### Environment Prefixes

```yaml
# Good: Environment-prefixed names
default_connections:
  gcpConnectionId: prod-analytics-gcp
  snowflake: prod-customer-snowflake

# Avoid: No environment indication
default_connections:
  gcpConnectionId: analytics-gcp
  snowflake: customer-snowflake
```

### Connection Validation
#### Built-in Validation

Datablast automatically validates connections:

```yaml
# Valid connection reference
default_connections:
  gcpConnectionId: existing-gcp-conn # ✅ Valid

# Invalid connection reference
default_connections:
  gcpConnectionId: non-existent-conn # ❌ Invalid
```

#### Manual Validation

You can validate connections manually:

```bash
# Validate connection
datablast validate-connection my-gcp-conn

# Test connection
datablast test-connection my-gcp-conn
```

## Connection Security
### Credential Management

#### Secure Storage

Connections use encrypted credential storage:

```yaml
# Credentials are stored securely
default_connections:
  gcpConnectionId: secure-gcp-conn
  # Credentials are encrypted and stored separately
```

#### Access Control

Access to each connection is governed by its permissions:

```yaml
# Connection access control
default_connections:
  gcpConnectionId: restricted-gcp-conn
  # Access is controlled by connection permissions
```

## Best Practices
### 1. Use Environment Variables

```yaml
# Good: Use environment variables
default_connections:
  gcpConnectionId: {{ env.GCP_CONNECTION }}
  snowflake: {{ env.SNOWFLAKE_CONNECTION }}

# Avoid: Hard-coded connection names
default_connections:
  gcpConnectionId: prod-gcp-conn
  snowflake: prod-snowflake-conn
```

### 2. Implement Connection Pooling

```yaml
# Good: Connection pooling configuration
default_connections:
  gcpConnectionId: pooled-gcp-conn
  # Connection pooling is configured automatically
```

### 3. Monitor Connection Usage

```yaml
# Good: Monitor connection usage
default_connections:
  gcpConnectionId: monitored-gcp-conn
  # Connection usage is monitored automatically
```

## Connection Examples
### Analytics Pipeline

```yaml
# Analytics pipeline with multiple connections
id: analytics-pipeline
schedule: "0 6 * * *"
start_date: "2024-01-01"
description: Daily analytics pipeline

# Analytics-specific connections
default_connections:
  gcpConnectionId: analytics-gcp
  snowflake: analytics-snowflake
```

### Data Processing Pipeline

```yaml
# Data processing pipeline
id: data-processing
schedule: "0 4 * * *"
start_date: "2024-01-01"
description: Data processing pipeline

# Data processing connections
default_connections:
  gcpConnectionId: processing-gcp
  postgres: processing-postgres
  mysql: processing-mysql
```

### Machine Learning Pipeline

```yaml
# Machine learning pipeline
id: ml-pipeline
schedule: "0 8 * * *"
start_date: "2024-01-01"
description: Machine learning pipeline

# ML-specific connections
default_connections:
  gcpConnectionId: ml-gcp
  snowflake: ml-snowflake
```

## Troubleshooting
### Common Connection Issues
#### Invalid Connection Reference
**Problem**: The task fails because it references an invalid connection
**Solution**: Use valid connection IDs
```yaml
# Good: Valid connection reference
default_connections:
  gcpConnectionId: existing-gcp-conn

# Avoid: Invalid connection reference
default_connections:
  gcpConnectionId: non-existent-conn
```
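
One way to catch invalid references before a task runs is to check every ID listed under `default_connections` with the `datablast validate-connection` command shown earlier. The following is a minimal sketch, not a Datablast feature: the helper names `list_connection_ids` and `validate_pipeline_connections`, the `DATABLAST_BIN` override, and the file name `pipeline.yml` are all hypothetical. It also assumes a simple block-style YAML file and connection IDs without whitespace.

```bash
# Print the value of every key nested directly under default_connections.
# Naive line-based parsing (an assumption of this sketch), not a full YAML parser.
list_connection_ids() {
  awk '
    /^default_connections:[ \t]*$/ { in_block = 1; next }
    # Any other top-level key ends the block.
    in_block && /^[^ \t#]/ { in_block = 0 }
    in_block && /^[ \t]+[A-Za-z_][A-Za-z0-9_]*:[ \t]*[^ \t#]/ {
      sub(/^[ \t]+[A-Za-z_][A-Za-z0-9_]*:[ \t]*/, "")  # drop the key
      sub(/[ \t]+#.*$/, "")                            # drop a trailing comment
      print
    }
  ' "$1"
}

# Run `datablast validate-connection` for each referenced ID.
# DATABLAST_BIN is a hypothetical override so the CLI can be stubbed.
validate_pipeline_connections() {
  pipeline_file="$1"
  status=0
  for conn_id in $(list_connection_ids "$pipeline_file"); do
    if ! "${DATABLAST_BIN:-datablast}" validate-connection "$conn_id"; then
      echo "invalid connection reference: $conn_id" >&2
      status=1
    fi
  done
  return "$status"
}
```

A call such as `validate_pipeline_connections pipeline.yml` returns a non-zero status if any referenced ID fails validation, which makes it usable as a gate in a deployment script.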
#### Connection Timeout
**Problem**: Connection timeout errors
**Solution**: Check connection configuration
```yaml
# Good: Proper connection configuration
default_connections:
  gcpConnectionId: configured-gcp-conn

# Avoid: Misconfigured connection
default_connections:
  gcpConnectionId: misconfigured-gcp-conn
```
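
When a timeout shows up, it can help to find out whether the failure is transient or persistent before changing any configuration. The sketch below retries the `datablast test-connection` command from this page a few times. The function name and the `MAX_ATTEMPTS`, `RETRY_DELAY`, and `DATABLAST_BIN` variables are hypothetical knobs introduced for illustration, and the sketch assumes the CLI exits with a non-zero status on failure.

```bash
# Retry `datablast test-connection` before declaring the connection broken.
# MAX_ATTEMPTS, RETRY_DELAY and DATABLAST_BIN are assumptions of this sketch.
test_connection_with_retry() {
  conn_id="$1"
  max_attempts="${MAX_ATTEMPTS:-3}"
  delay="${RETRY_DELAY:-5}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "${DATABLAST_BIN:-datablast}" test-connection "$conn_id"; then
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed for $conn_id" >&2
    attempt=$((attempt + 1))
    # Sleep only if another attempt remains.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

If every attempt fails, the problem is more likely in the connection configuration than in a momentary network issue.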
#### Permission Issues
**Problem**: Connection permission errors
**Solution**: Check connection permissions
```yaml
# Good: Proper permissions
default_connections:
  gcpConnectionId: permissioned-gcp-conn

# Avoid: Insufficient permissions
default_connections:
  gcpConnectionId: restricted-gcp-conn
```
### Debugging Connections
#### Connection Testing
Test connections before deployment:
```bash
# Test connection
datablast test-connection my-gcp-conn

# Test with specific environment
datablast test-connection my-gcp-conn --env prod
```
#### Connection Logging
Enable connection logging for debugging:
```yaml
# Enable connection logging
id: debug-pipeline
schedule: "0 6 * * *"
start_date: "2024-01-01"

# Connection with logging
default_connections:
  gcpConnectionId: logged-gcp-conn
  # Connection logging is enabled
```