Pipeline Connections

Pipeline connections in Datablast provide secure access to external systems, databases, and services. Connections are configured at the pipeline level and can be used by individual tasks.

  • Secure Storage: Encrypted connection credentials
  • Environment Support: Environment-specific connections
  • Connection Pooling: Efficient connection management
  • Validation: Built-in connection validation

# BigQuery connection configuration
default_connections:
  gcpConnectionId: my-bigquery-conn

# Snowflake connection configuration
default_connections:
  snowflake: my-snowflake-conn

# PostgreSQL connection configuration
default_connections:
  postgres: my-postgres-conn

# MySQL connection configuration
default_connections:
  mysql: my-mysql-conn

# Athena connection configuration
default_connections:
  aws_conn_id: my-athena-conn

# Basic pipeline with connections
id: analytics-pipeline
schedule: "0 6 * * *"
start_date: "2024-01-01"

# Default connections
default_connections:
  gcpConnectionId: analytics-gcp
  snowflake: analytics-snowflake

# Pipeline with multiple connection types
id: multi-conn-pipeline
schedule: "0 6 * * *"
start_date: "2024-01-01"

# Multiple connection types
default_connections:
  # Database connections
  gcpConnectionId: analytics-gcp
  snowflake: analytics-snowflake
  postgres: analytics-postgres

  # Cloud connections
  aws_conn_id: analytics-aws

# Good: Descriptive connection names
default_connections:
  gcpConnectionId: marketing-analytics-gcp
  snowflake: customer-data-snowflake

# Avoid: Generic connection names
default_connections:
  gcpConnectionId: gcp1
  snowflake: sf1

# Good: Environment-prefixed names
default_connections:
  gcpConnectionId: prod-analytics-gcp
  snowflake: prod-customer-snowflake

# Avoid: No environment indication
default_connections:
  gcpConnectionId: analytics-gcp
  snowflake: customer-snowflake
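
The environment-prefix convention above can be checked mechanically. The following is a minimal sketch, assuming the environments are named `dev`, `staging`, and `prod` (only `prod` appears in the examples); `has_env_prefix` is a hypothetical helper and not part of Datablast.

```bash
# Hypothetical check: succeed only if a connection name starts with a known
# environment prefix. The prefixes dev/staging/prod are assumptions; adjust
# them to the environments you actually use.
has_env_prefix() {
  case "$1" in
    dev-*|staging-*|prod-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   has_env_prefix prod-analytics-gcp && echo "name carries an environment"
```

Running a check like this over the connection names in a pipeline file catches names such as `analytics-gcp` that give no environment indication.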

Datablast automatically validates connections:

# Valid connection reference
default_connections:
  gcpConnectionId: existing-gcp-conn # ✅ Valid

# Invalid connection reference
default_connections:
  gcpConnectionId: non-existent-conn # ❌ Invalid

You can validate connections manually:

# Validate connection
datablast validate-connection my-gcp-conn
# Test connection
datablast test-connection my-gcp-conn
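
When a pipeline references several connections, the two commands above can be wrapped in a loop. This is a sketch only: it assumes both subcommands exit with a non-zero status on failure, which the documentation here does not state, and `check_connections` is a hypothetical helper name.

```bash
# Hypothetical helper: validate and test every connection ID passed as an
# argument. Assumes `datablast validate-connection` and
# `datablast test-connection` exit non-zero when the check fails.
check_connections() {
  local failed=0
  local conn
  for conn in "$@"; do
    if datablast validate-connection "$conn" && datablast test-connection "$conn"; then
      echo "ok: $conn"
    else
      echo "FAILED: $conn" >&2
      failed=1
    fi
  done
  # Non-zero if any connection failed, so callers can stop a deployment.
  return "$failed"
}

# Example:
#   check_connections my-gcp-conn my-snowflake-conn
```

The function keeps going after the first failure so that one run reports every broken connection rather than only the first.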

Connections use encrypted credential storage:

# Credentials are stored securely
default_connections:
  gcpConnectionId: secure-gcp-conn # Credentials are encrypted and stored separately

Connections have access control:

# Connection access control
default_connections:
  gcpConnectionId: restricted-gcp-conn # Access is controlled by connection permissions

# Good: Use environment variables
default_connections:
  gcpConnectionId: {{ env.GCP_CONNECTION }}
  snowflake: {{ env.SNOWFLAKE_CONNECTION }}

# Avoid: Hard-coded connection names
default_connections:
  gcpConnectionId: prod-gcp-conn
  snowflake: prod-snowflake-conn
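
One way to supply `GCP_CONNECTION` and `SNOWFLAKE_CONNECTION` is to export them in the shell that launches the pipeline. This is a sketch under assumptions: the documentation here does not describe how Datablast resolves `env.*`, the environment names and the `<env>-...` naming pattern are illustrative, and `export_connection_env` is a hypothetical helper.

```bash
# Hypothetical helper: export the connection names for one environment.
# The environment list and the naming pattern are assumptions for illustration.
export_connection_env() {
  local env_name="$1"
  case "$env_name" in
    dev|staging|prod) ;;
    *)
      echo "unknown environment: $env_name" >&2
      return 1
      ;;
  esac
  export GCP_CONNECTION="${env_name}-analytics-gcp"
  export SNOWFLAKE_CONNECTION="${env_name}-customer-snowflake"
}

# Example:
#   export_connection_env prod
#   echo "$GCP_CONNECTION"
```

Rejecting unknown environment names up front avoids silently pointing a pipeline at a connection that does not exist.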

# Good: Connection pooling configuration
default_connections:
  gcpConnectionId: pooled-gcp-conn # Connection pooling is configured automatically

# Good: Monitor connection usage
default_connections:
  gcpConnectionId: monitored-gcp-conn # Connection usage is monitored automatically

# Analytics pipeline with multiple connections
id: analytics-pipeline
schedule: "0 6 * * *"
start_date: "2024-01-01"
description: Daily analytics pipeline

# Analytics-specific connections
default_connections:
  gcpConnectionId: analytics-gcp
  snowflake: analytics-snowflake

# Data processing pipeline
id: data-processing
schedule: "0 4 * * *"
start_date: "2024-01-01"
description: Data processing pipeline

# Data processing connections
default_connections:
  gcpConnectionId: processing-gcp
  postgres: processing-postgres
  mysql: processing-mysql

# Machine learning pipeline
id: ml-pipeline
schedule: "0 8 * * *"
start_date: "2024-01-01"
description: Machine learning pipeline

# ML-specific connections
default_connections:
  gcpConnectionId: ml-gcp
  snowflake: ml-snowflake

**Problem**: A task fails because it references an invalid connection.

**Solution**: Reference only connection IDs that exist.

```yaml
# Good: Valid connection reference
default_connections:
  gcpConnectionId: existing-gcp-conn

# Avoid: Invalid connection reference
default_connections:
  gcpConnectionId: non-existent-conn
```
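
To find an invalid reference before a task fails, the IDs under `default_connections` can be pulled out of a pipeline file and checked one by one. This is a line-based sketch, not a YAML parser: it assumes the entries are indented under `default_connections:` with one `key: value` pair per line, and `list_connection_ids` is a hypothetical helper name.

```bash
# Hypothetical helper: print the connection IDs listed under
# default_connections in a pipeline YAML file (path given as $1).
list_connection_ids() {
  awk '
    /^default_connections:/ { in_block = 1; next }  # start of the block
    in_block && /^[^ \t#]/  { in_block = 0 }         # next top-level key ends it
    in_block && /^[ \t]+[A-Za-z_]/ {
      sub(/[ \t]*#.*$/, "")                           # drop a trailing comment
      sub(/^[^:]*:[ \t]*/, "")                        # drop the key and colon
      if ($0 != "") print
    }
  ' "$1"
}

# Example (pipeline.yml is a hypothetical file name; assumes
# validate-connection exits non-zero for an unknown ID):
#   list_connection_ids pipeline.yml | while read -r conn; do
#     datablast validate-connection "$conn" || echo "invalid: $conn" >&2
#   done
```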

#### Connection Timeout

**Problem**: Connection timeout errors

**Solution**: Check connection configuration

```yaml
# Good: Proper connection configuration
default_connections:
  gcpConnectionId: configured-gcp-conn

# Avoid: Misconfigured connection
default_connections:
  gcpConnectionId: misconfigured-gcp-conn
```

#### Permission Issues

**Problem**: Connection permission errors

**Solution**: Check connection permissions

```yaml
# Good: Proper permissions
default_connections:
  gcpConnectionId: permissioned-gcp-conn

# Avoid: Insufficient permissions
default_connections:
  gcpConnectionId: restricted-gcp-conn
```

### Debugging Connections

#### Connection Testing

Test connections before deployment:

```bash
# Test connection
datablast test-connection my-gcp-conn

# Test with specific environment
datablast test-connection my-gcp-conn --env prod
```

Enable connection logging for debugging:

# Enable connection logging
id: debug-pipeline
schedule: "0 6 * * *"
start_date: "2024-01-01"

# Connection with logging
default_connections:
  gcpConnectionId: logged-gcp-conn # Connection logging is enabled