Infrastructure
Overview
Self-hosted Prefect on AWS ECS with ALB, Aurora Serverless v2, and ECR.
Current production access: https://pipelines.rocketclub.online
All production infrastructure (VPC, ECS, Aurora, S3, IAM, Prefect API, etc.) is defined and deployed from this repository via Terraform (see
infra/platform/infra/envs/prod). This page summarizes the key components that the data platform depends on.
Cost: ~$46-71/month
Architecture
Internet → ALB (HTTP/HTTPS) → Prefect Server (ECS) → Aurora PostgreSQL (encrypted)
↓
Prefect Worker (ECS) → Flow Tasks
Components
Prefect Server
- Image: <account-id>.dkr.ecr.<region>.amazonaws.com/blog-data:prefect-3-python3.12-ecs
- Resources: Fargate task (default: 0.25 vCPU, 0.5 GB RAM via Terraform task_size)
- Port: 4200
- Command: prefect server start --host 0.0.0.0 --port 4200
- Database: Connection URL constructed at runtime from Secrets Manager
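Building the connection URL at runtime might look roughly like the sketch below. It is an illustration, not the deployed entrypoint: the helper names, the secret ID variable, and the way fields are read from the secret are assumptions. The `postgresql+asyncpg://` scheme is the one Prefect expects for PostgreSQL in `PREFECT_API_DATABASE_CONNECTION_URL`. The user and password are percent-encoded so that characters such as `@`, `/`, or `:` cannot break the URL.

```shell
# Percent-encode a string for use in the userinfo part of a URL.
# Handles ASCII input; the subshell body keeps helper variables local.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9.~_-]) out="$out$c" ;;                 # unreserved: keep as-is
      *) out="$out$(printf '%%%02X' "'$c")" ;;         # anything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Assemble the Prefect database URL.
# Arguments: user password host port dbname
build_prefect_db_url() {
  printf 'postgresql+asyncpg://%s:%s@%s:%s/%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$5"
}

# Example wiring (DB_SECRET_ID and the DB_* variables are hypothetical):
#   secret=$(aws secretsmanager get-secret-value --secret-id "$DB_SECRET_ID" \
#     --query SecretString --output text)
#   ...extract user/password from "$secret", then:
#   export PREFECT_API_DATABASE_CONNECTION_URL=$(build_prefect_db_url \
#     "$DB_USER" "$DB_PASSWORD" "$DB_HOST" 5432 "$DB_NAME")
```

Keeping the encoding in a small helper means a rotated password containing reserved characters does not require any change to the task definition.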
Prefect Worker
- Image: <account-id>.dkr.ecr.<region>.amazonaws.com/blog-data:prefect-3-python3.12-ecs
- Resources: Fargate task (default: 0.25 vCPU, 0.5 GB RAM via Terraform task_size)
- Command: prefect worker start --pool blog-data-pool --type ecs
- API URL: Retrieved from ALB DNS (Terraform output)
- Note: prefect-aws[ecs] and worker dependencies are baked into the base image.
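As a rough sketch of how the worker's API URL could be derived from the ALB DNS name: Prefect serves its API under the `/api` path, so the URL is the ALB address plus that suffix. The function name and the Terraform output name `alb_dns_name` are assumptions for illustration; check the Terraform outputs for the real name.

```shell
# Build the API URL a worker should use from an ALB DNS name.
# Arguments: alb_dns_name [scheme]; scheme defaults to http.
prefect_api_url() {
  _dns=${1%/}                    # tolerate a trailing slash
  _scheme=${2:-http}
  if [ -z "$_dns" ]; then
    echo "prefect_api_url: ALB DNS name is required" >&2
    return 1
  fi
  printf '%s://%s/api\n' "$_scheme" "$_dns"
}

# Example wiring (the output name "alb_dns_name" is hypothetical):
#   export PREFECT_API_URL=$(prefect_api_url "$(terraform output -raw alb_dns_name)")
```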
Database
- Type: Aurora Serverless v2 PostgreSQL 15.13
- Scaling: 0.5-1 ACU
- Encryption: KMS encrypted at rest
- Performance Insights: Enabled with KMS encryption (7-day retention)
- Credentials: Stored in AWS Secrets Manager
- Endpoint: Aurora cluster endpoint (see Terraform outputs or AWS console)
Security
- Database Password: AWS Secrets Manager (encrypted with RDS KMS key)
- RDS Encryption: KMS key with automatic rotation
- ECR Encryption: KMS encrypted
- HTTPS: Optional ACM certificate support (TLS 1.3)
- IAM: Assumable role with temporary credentials (see Security Setup below)
ECR
- Repository: blog-data
- Base Prefect tag: prefect-3-python3.12-ecs
- Architecture: amd64 (critical for Fargate)
- Encryption: KMS encrypted
- Source of truth: The base Prefect image is built and published by the CircleCI refresh-prefect-image workflow (in this repo) into the shared blog-data repository. This repo does not build or push the base Prefect image directly.
Deployment
Deploy Infrastructure
Infrastructure is deployed by CI/CD pipelines using the Terraform configuration
in this repository (infra/platform/infra/envs/prod).
Verify
To confirm the platform is up after Terraform has been applied:
# Check that the Prefect UI responds at the public endpoint
curl -I https://pipelines.rocketclub.online
Enable HTTPS (Optional)
HTTPS and custom domains for pipelines.rocketclub.online are configured via
the Terraform stack in this repository. Refer to
Networking Entry Points for ACM and DNS details.
Monitoring
Health Check
# Prefect UI health (external)
curl -I https://pipelines.rocketclub.online || echo "Prefect UI not reachable"
# Prefect API health (from within the VPC or via port-forward)
curl -I https://pipelines.rocketclub.online/api/health || true
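Right after a deploy the service can take a while to become healthy, and `curl -I` without `-f` exits successfully even when the server answers with an HTTP error. A small retry wrapper, sketched here under the assumption that polling the health endpoint is acceptable, makes the check stricter. The function name, attempt count, and delay are illustrative choices.

```shell
# Run a command until it succeeds, up to a maximum number of attempts.
# Arguments: max_attempts delay_seconds command [args...]
retry() {
  _retry_max=$1
  _retry_delay=$2
  shift 2
  _retry_attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$_retry_attempt" -ge "$_retry_max" ]; then
      echo "retry: giving up after $_retry_attempt attempts: $*" >&2
      return 1
    fi
    _retry_attempt=$((_retry_attempt + 1))
    sleep "$_retry_delay"
  done
}

# Example: poll the API health endpoint for up to about five minutes.
# -f makes curl return non-zero on HTTP 4xx/5xx responses.
#   retry 30 10 curl -fsS -o /dev/null https://pipelines.rocketclub.online/api/health
```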
Logs
# Prefect API service
aws logs tail /ecs/prod/prefect-api --follow --region ${AWS_REGION:-eu-west-2}
# Prefect worker service
aws logs tail /ecs/prod/prefect-worker --follow --region ${AWS_REGION:-eu-west-2}
# Prefect deployer (one-off ECS task)
aws logs tail /ecs/prod/prefect-deployer --follow --region ${AWS_REGION:-eu-west-2}
Alarms
CloudWatch alarms (including any Prefect-related alarms) are defined and managed in Terraform in this repository. See the Terraform config (and CloudWatch) for the current alarm set and thresholds.
Troubleshooting
Architecture Error
Symptom: exec /usr/bin/tini: exec format error
Fix: Delete the ARM64 image, then publish an amd64 image:
aws ecr batch-delete-image --repository-name blog-data \
--image-ids imageTag=prefect-3-python3.12-ecs --profile ron --region eu-west-2
# Then republish the amd64 image (see the refresh-prefect-image workflow under ECR)
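To catch the wrong architecture before an image reaches ECR, a pre-push check along these lines may help. It is a sketch: the function name and the `IMAGE` placeholder are assumptions, and it only encodes this page's requirement that the image be amd64.

```shell
# Succeed only for the architecture this stack expects (amd64).
# Argument: architecture string as reported by Docker or `uname -m`.
is_amd64() {
  case $1 in
    amd64|x86_64) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (IMAGE is a placeholder for the full ECR image URI):
#   arch=$(docker image inspect --format '{{.Architecture}}' "$IMAGE")
#   is_amd64 "$arch" || echo "refusing to push $arch image" >&2
#
# When building on an ARM host, request the platform explicitly:
#   docker buildx build --platform linux/amd64 -t "$IMAGE" .
```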
Worker Connection Error
Symptom: httpx.ConnectError: [Errno -2] Name or service not known
Fix: Verify PREFECT_API_URL uses the ALB DNS name for the Prefect API
surface (see the Terraform/ECS worker configuration under infra/platform/infra).
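"Name or service not known" means the host in PREFECT_API_URL did not resolve. One way to narrow this down is to extract the host and test resolution from the environment where the worker runs. The helper below is a minimal sketch (the function name is hypothetical, and IPv6 literals are not handled).

```shell
# Extract the host part of a URL (drops scheme, path, userinfo, and port).
url_host() {
  _h=${1#*://}     # strip scheme
  _h=${_h%%/*}     # strip path
  _h=${_h##*@}     # strip userinfo, if any
  _h=${_h%%:*}     # strip port, if any
  printf '%s\n' "$_h"
}

# Example: check that the configured host resolves (getent is available on
# most Linux images; substitute nslookup or dig where it is not).
#   getent hosts "$(url_host "$PREFECT_API_URL")" \
#     || echo "cannot resolve API host" >&2
```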
Database Timeout
Symptom: TimeoutError during server startup
Fix: Check security group allows Prefect server → RDS port 5432
Health Check Timeout
Symptom: Target "unhealthy" or "timeout"
Fix: Verify security group allows ALB → Prefect server port 4200
Security Groups
Prefect Server Security Group
- Ingress: Port 4200 from ALB and ECS tasks
- Egress: Port 5432 to RDS, 443 to internet, 53 for DNS
Database Security Group
- Ingress: Port 5432 from Prefect server and ECS tasks
ALB
- Ingress: Port 80 from internet
- Egress: Port 4200 to Prefect server
Security Setup
IAM Assumable Role Architecture
The pipeline uses an assumable IAM role instead of long-lived access keys for enhanced security:
Benefits:
- Automatic credential rotation (1 hour default)
- No secrets in Terraform state
- Comprehensive audit trail via CloudTrail
- Better security posture (AWS best practice)
- No manual key rotation needed
Architecture:
ECS Tasks / Local Dev / CI/CD
↓
STS AssumeRole (temporary credentials)
↓
blog-data-pipeline-role (S3 permissions)
↓
S3 Buckets
ECS Tasks (Automatic)
ECS tasks automatically assume the pipeline role - no configuration needed:
- ECS task execution starts
- Task role assumes pipeline role via STS
- Temporary credentials are obtained
- Task accesses S3 with temporary credentials
- Credentials automatically expire after 1 hour
Local Development Setup
Step 1: Create AWS CLI Profile
Add to ~/.aws/config:
[profile blog-data-pipeline]
role_arn = arn:aws:iam::421115711209:role/blog-data-pipeline-role
source_profile = blog-data-terraform
duration_seconds = 3600
Step 2: Update .env.local
AWS_PROFILE=blog-data-pipeline
AWS_BUCKET_NAME=ron-website-docs
AWS_REGION=eu-west-2
AWS_BLOG_DATA_RAW_BUCKET=blog-data-raw
AWS_BLOG_DATA_CLEAN_BUCKET=blog-data-clean
Step 3: Test
# Verify profile works
aws sts get-caller-identity --profile blog-data-pipeline
# Test S3 access
aws s3 ls s3://blog-data-raw --profile blog-data-pipeline
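When a role has been assumed, `aws sts get-caller-identity` reports an ARN of the form `arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>`. A scripted version of the verification step could check for that shape, as in this sketch (the function name is an assumption).

```shell
# Succeed if an STS caller ARN is an assumed-role session of the given role.
# Arguments: caller_arn role_name
is_assumed_role() {
  case $1 in
    arn:aws:sts::*:assumed-role/"$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   arn=$(aws sts get-caller-identity --profile blog-data-pipeline \
#     --query Arn --output text)
#   is_assumed_role "$arn" blog-data-pipeline-role \
#     || echo "unexpected identity: $arn" >&2
```

If the check fails with a `user/...` ARN, the profile is most likely falling back to the source credentials rather than assuming the role.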
CI/CD Setup
For CircleCI or other CI/CD, use STS AssumeRole:
# Get temporary credentials (matching this repo's CI pattern)
CREDENTIALS=$(aws sts assume-role \
--role-arn arn:aws:iam::174051987565:role/OrganizationAccountAccessRole \
--role-session-name circleci-blog-data \
--duration-seconds 3600 \
--query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
--output text)
# Export credentials
export AWS_ACCESS_KEY_ID=$(echo "$CREDENTIALS" | awk '{print $1}')
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDENTIALS" | awk '{print $2}')
export AWS_SESSION_TOKEN=$(echo "$CREDENTIALS" | awk '{print $3}')
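If the `assume-role` call fails, `CREDENTIALS` is empty and the three exports above silently set empty variables. A guarded variant, sketched here with a hypothetical function name, splits the text output once and refuses to export anything unless exactly three fields are present.

```shell
# Export STS credentials from one line of whitespace-separated text output.
# Argument: "<AccessKeyId> <SecretAccessKey> <SessionToken>"
export_sts_credentials() {
  # read splits on spaces and tabs; _extra catches any unexpected 4th field
  read -r _key _secret _token _extra <<EOF
$1
EOF
  if [ -z "$_token" ] || [ -n "$_extra" ]; then
    echo "export_sts_credentials: expected exactly 3 fields" >&2
    return 1
  fi
  export AWS_ACCESS_KEY_ID="$_key"
  export AWS_SECRET_ACCESS_KEY="$_secret"
  export AWS_SESSION_TOKEN="$_token"
}

# Example, reusing the CREDENTIALS variable from the snippet above:
#   export_sts_credentials "$CREDENTIALS" || exit 1
```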
IAM Resources
Pipeline Role:
- Name: blog-data-pipeline-role
- Permissions: Full S3 access to all pipeline buckets
- Trust Policy: Allows ECS task role and Terraform admin user to assume
S3 Buckets with Access:
- blog-data-cachekit-instructions
- design-files
- blog-data-raw
- blog-data-clean
Monitoring Role Usage
View role assumptions in CloudTrail:
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=ResourceName,AttributeValue=blog-data-pipeline-role \
--max-results 50
IAM Role Troubleshooting
"User is not authorized to perform: sts:AssumeRole"
Check that the principal (ECS task role or Terraform admin user) is in the trust policy:
aws iam get-role --role-name blog-data-pipeline-role \
--query 'Role.AssumeRolePolicyDocument'
"Access Denied" when accessing S3
Verify role has S3 permissions:
aws iam list-role-policies --role-name blog-data-pipeline-role
aws iam get-role-policy --role-name blog-data-pipeline-role \
--policy-name blog-data-pipeline-policy
Credentials expire too quickly
Increase duration_seconds in the AWS CLI profile or STS call. The value cannot exceed the role's configured maximum session duration, which can be set to at most 12 hours (43200 seconds):
[profile blog-data-pipeline]
role_arn = arn:aws:iam::421115711209:role/blog-data-pipeline-role
source_profile = blog-data-terraform
# 12 hours
duration_seconds = 43200
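Before changing the value, it can help to validate it and compare it with the role's limit. The sketch below (function name hypothetical) encodes the bounds that STS AssumeRole accepts: a minimum of 900 seconds and an upper limit of 43200 seconds, subject to the role's own maximum session duration.

```shell
# Validate a requested STS session duration in seconds.
# 900 (15 minutes) is the AssumeRole minimum; 43200 (12 hours) is the highest
# value a role's maximum session duration can be set to.
valid_duration_seconds() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 900 ] && [ "$1" -le 43200 ]
}

# Example: read the role's configured limit before raising duration_seconds.
#   aws iam get-role --role-name blog-data-pipeline-role \
#     --query 'Role.MaxSessionDuration' --output text
```

A request above the role's configured maximum is rejected by STS even when it is below 12 hours, so the `get-role` query is the value that actually matters.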
Next Steps
- HTTPS: Add ACM certificate and HTTPS listener
- Domain: Configure pipelines.rocketclub.online
- Backstage: Deploy on same ALB
- Flows: Create work pool and deploy pipelines