Infrastructure
Overview
Self-hosted Prefect on AWS ECS with ALB, Aurora Serverless v2, and ECR.
Current production access: https://pipelines.rocketclub.online
All production infrastructure (VPC, ECS, Aurora, S3, IAM, Prefect API, etc.) is defined and deployed from the sibling
blog_infra repository (see its infra/envs/prod stack). This page summarizes the key components that this repo depends on.
Cost: ~$46-71/month
Architecture
Internet → ALB (HTTP/HTTPS) → Prefect Server (ECS) → Aurora PostgreSQL (encrypted)
                                       ↓
                              Prefect Worker (ECS) → Flow Tasks
Components
Prefect Server
- Image: <account-id>.dkr.ecr.<region>.amazonaws.com/blog-data:prefect-3-python3.12-ecs
- Resources: Fargate task (default: 0.25 vCPU, 0.5 GB RAM via Terraform task_size)
- Port: 4200
- Command: prefect server start --host 0.0.0.0 --port 4200
- Database: Connection URL constructed at runtime from Secrets Manager
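As a rough illustration of the runtime URL construction noted in the Database bullet, the sketch below assembles a PostgreSQL connection URL from individual credential fields. The helper name, the argument order, and the idea that the secret exposes user, password, host, port, and database name separately are assumptions; the actual logic lives in blog_infra. The postgresql+asyncpg scheme and the PREFECT_API_DATABASE_CONNECTION_URL setting are what Prefect uses for a PostgreSQL backend.

```shell
# Hypothetical helper: assemble the URL that Prefect reads from
# PREFECT_API_DATABASE_CONNECTION_URL.
# Assumes the password contains only URL-safe characters; anything else
# would need percent-encoding before being passed in.
build_db_url() {
  # $1=user  $2=password  $3=host  $4=port  $5=database
  printf 'postgresql+asyncpg://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example with placeholder values (not real credentials):
# export PREFECT_API_DATABASE_CONNECTION_URL="$(build_db_url prefect s3cret db.example.internal 5432 prefect)"
```

Building the URL at container start, rather than storing it whole, keeps the password in Secrets Manager only.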
Prefect Worker
- Image: <account-id>.dkr.ecr.<region>.amazonaws.com/blog-data:prefect-3-python3.12-ecs
- Resources: Fargate task (default: 0.25 vCPU, 0.5 GB RAM via Terraform task_size)
- Command: prefect worker start --pool blog-data-pool --type ecs
- API URL: Retrieved from ALB DNS (Terraform output)
- Note: prefect-aws[ecs] and worker dependencies are baked into the base image.
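The worker's API URL is described above as coming from the ALB DNS name. A minimal sketch of that derivation follows; the helper name is hypothetical, and the real wiring is done in blog_infra.

```shell
# Hypothetical helper: derive PREFECT_API_URL from an ALB DNS name.
# The scheme defaults to http; pass https as the second argument when the
# listener has a certificate attached.
prefect_api_url() {
  alb_dns="$1"
  scheme="${2:-http}"
  printf '%s://%s/api\n' "$scheme" "$alb_dns"
}
```

Usage might look like export PREFECT_API_URL="$(prefect_api_url "$(terraform output -raw alb_dns_name)")", where alb_dns_name is an assumed output name, not one confirmed by blog_infra.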
Database
- Type: Aurora Serverless v2 PostgreSQL 15.13
- Scaling: 0.5-1 ACU
- Encryption: KMS encrypted at rest
- Performance Insights: Enabled with KMS encryption (7-day retention)
- Credentials: Stored in AWS Secrets Manager
- Endpoint: Aurora cluster endpoint (see blog_infra outputs or AWS console)
Security
- Database Password: AWS Secrets Manager (encrypted with RDS KMS key)
- RDS Encryption: KMS key with automatic rotation
- ECR Encryption: KMS encrypted
- HTTPS: Optional ACM certificate support (TLS 1.3)
- IAM: Assumable role with temporary credentials (see Security Setup below)
ECR
- Repository: blog-data
- Base Prefect tag: prefect-3-python3.12-ecs
- Architecture: amd64 (critical for Fargate)
- Encryption: KMS encrypted
- Source of truth: The base Prefect image is built and published by the blog_infra CircleCI refresh-prefect-image workflow into the shared blog-data repository. This repo does not build or push the base Prefect image directly.
Deployment
Deploy Infrastructure
Infrastructure is deployed via Terraform from the blog_infra repository
(infra/envs/prod stack) and its CI/CD pipelines. This repo does not apply
Terraform directly.
Verify
To confirm the platform is up after blog_infra has applied Terraform:
# Check that the Prefect UI responds at the public URL
curl -I https://pipelines.rocketclub.online
Enable HTTPS (Optional)
HTTPS and custom domains for pipelines.rocketclub.online are configured via
the blog_infra Terraform stack. Refer to blog_infra/docs/architecture for
ACM and DNS details.
Monitoring
Health Check
# Prefect UI health (external)
curl -I https://pipelines.rocketclub.online || echo "Prefect UI not reachable"
# Prefect API health (from within the VPC or via port-forward)
curl -I https://pipelines.rocketclub.online/api/health || true
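Right after a deployment the service may need a short while before it answers, so a single curl can give a false negative. The sketch below is one way to retry a probe; the function name and the attempt and delay values are illustrative assumptions.

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only between attempts, not after the final failure.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_until 10 5 curl -fsS https://pipelines.rocketclub.online/api/health would probe up to ten times, five seconds apart; curl's -f flag makes HTTP error statuses count as failures.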
Logs
# Prefect API service
aws logs tail /ecs/prod/prefect-api --follow --region ${AWS_REGION:-eu-west-2}
# Prefect worker service
aws logs tail /ecs/prod/prefect-worker --follow --region ${AWS_REGION:-eu-west-2}
# Prefect deployer (one-off ECS task)
aws logs tail /ecs/prod/prefect-deployer --follow --region ${AWS_REGION:-eu-west-2}
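When looking for a specific failure rather than following a live stream, it can help to limit the time window and filter on a pattern. The sketch below wraps the same log groups; the function names and the choice of ERROR as the pattern are assumptions, while --since and --filter-pattern are standard options of aws logs tail.

```shell
# Map a Prefect service name to its CloudWatch log group, following the
# /ecs/prod/prefect-<service> pattern used above. Unknown names are rejected.
prefect_log_group() {
  case "$1" in
    api|worker|deployer) printf '/ecs/prod/prefect-%s\n' "$1" ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Show the last hour of log lines matching ERROR for one service.
# Requires AWS CLI v2 and valid credentials.
recent_errors() {
  group="$(prefect_log_group "$1")" || return 1
  aws logs tail "$group" --since 1h --filter-pattern ERROR \
    --region "${AWS_REGION:-eu-west-2}"
}
```

Calling recent_errors worker would then print recent error lines from the worker service.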
Alarms
CloudWatch alarms (including any Prefect-related alarms) are defined and
managed in the blog_infra repository. See its documentation for the current
alarm set and thresholds.
Troubleshooting
Architecture Error
Symptom: exec /usr/bin/tini: exec format error
Fix: Delete the ARM64 image, then re-publish an amd64 build:
aws ecr batch-delete-image --repository-name blog-data \
  --image-ids imageTag=prefect-3-python3.12-ecs --profile ron --region eu-west-2
# Then re-publish the amd64 image via the blog_infra refresh-prefect-image workflow
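To catch this before an image reaches ECR, the architecture of a locally built image can be checked first. This is a sketch only: the function names are hypothetical, and it assumes Docker is available wherever the image is built.

```shell
# Succeeds only for the architecture this stack runs on Fargate (amd64).
is_amd64() {
  [ "$1" = "amd64" ]
}

# Inspect a local image and refuse anything that is not amd64.
# Requires Docker; $1 is an image reference such as repo:tag.
check_image_arch() {
  arch="$(docker image inspect --format '{{.Architecture}}' "$1")" || return 1
  if is_amd64 "$arch"; then
    echo "ok: $1 is amd64"
  else
    echo "refusing to push: $1 is $arch, expected amd64" >&2
    return 1
  fi
}
```

On an ARM machine (for example Apple Silicon), building with docker buildx build --platform linux/amd64 is the usual way to get an image that passes this check.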
Worker Connection Error
Symptom: httpx.ConnectError: [Errno -2] Name or service not known
Fix: Verify PREFECT_API_URL uses the ALB DNS name for the Prefect API
surface (see worker configuration in blog_infra).
Database Timeout
Symptom: TimeoutError during server startup
Fix: Check security group allows Prefect server → RDS port 5432
Health Check Timeout
Symptom: Target "unhealthy" or "timeout"
Fix: Verify security group allows ALB → Prefect server port 4200
Security Groups
Prefect Server Security Group
- Ingress: Port 4200 from ALB and ECS tasks
- Egress: Port 5432 to RDS, 443 to internet, 53 for DNS
Database Security Group
- Ingress: Port 5432 from Prefect server and ECS tasks
ALB
- Ingress: Port 80 from internet
- Egress: Port 4200 to Prefect server
Security Setup
IAM Assumable Role Architecture
The pipeline uses an assumable IAM role instead of long-lived access keys for enhanced security:
Benefits:
- Automatic credential rotation (1 hour default)
- No secrets in Terraform state
- Comprehensive audit trail via CloudTrail
- Better security posture (AWS best practice)
- No manual key rotation needed
Architecture:
ECS Tasks / Local Dev / CI/CD
↓
STS AssumeRole (temporary credentials)
↓
blog-data-pipeline-role (S3 permissions)
↓
S3 Buckets
ECS Tasks (Automatic)
ECS tasks automatically assume the pipeline role - no configuration needed:
- ECS task execution starts
- Task role assumes pipeline role via STS
- Temporary credentials are obtained
- Task accesses S3 with temporary credentials
- Credentials automatically expire after 1 hour
Local Development Setup
Step 1: Create AWS CLI Profile
Add to ~/.aws/config:
[profile blog-data-pipeline]
role_arn = arn:aws:iam::421115711209:role/blog-data-pipeline-role
source_profile = blog-data-terraform
duration_seconds = 3600
Step 2: Update .env.local
AWS_PROFILE=blog-data-pipeline
AWS_BUCKET_NAME=ron-website-docs
AWS_REGION=eu-west-2
AWS_BLOG_DATA_RAW_BUCKET=blog-data-raw
AWS_BLOG_DATA_CLEAN_BUCKET=blog-data-clean
Step 3: Test
# Verify profile works
aws sts get-caller-identity --profile blog-data-pipeline
# Test S3 access
aws s3 ls s3://blog-data-raw --profile blog-data-pipeline
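Before running the AWS checks, it can be worth confirming that .env.local actually defines every variable from Step 2. The sketch below does that with grep; the function name is hypothetical, and the key list simply mirrors Step 2.

```shell
# Report which required keys are missing (or empty) in an env file.
# Returns 0 when every key is present with a non-empty value, 1 otherwise.
check_env_file() {
  file="$1"
  missing=0
  for key in AWS_PROFILE AWS_BUCKET_NAME AWS_REGION \
             AWS_BLOG_DATA_RAW_BUCKET AWS_BLOG_DATA_CLEAN_BUCKET; do
    # Match "KEY=" at the start of a line followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_env_file .env.local prints any missing keys to stderr and exits non-zero if at least one is absent.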
CI/CD Setup
For CircleCI or other CI/CD, use STS AssumeRole:
# Get temporary credentials (matching blog_infra CI pattern)
CREDENTIALS=$(aws sts assume-role \
--role-arn arn:aws:iam::174051987565:role/OrganizationAccountAccessRole \
--role-session-name circleci-blog-data \
--duration-seconds 3600 \
--query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
--output text)
# Export credentials
export AWS_ACCESS_KEY_ID=$(echo "$CREDENTIALS" | awk '{print $1}')
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDENTIALS" | awk '{print $2}')
export AWS_SESSION_TOKEN=$(echo "$CREDENTIALS" | awk '{print $3}')
IAM Resources
Pipeline Role:
- Name: blog-data-pipeline-role
- Permissions: Full S3 access to all pipeline buckets
- Trust Policy: Allows ECS task role and Terraform admin user to assume
S3 Buckets with Access:
- blog-data-cache
- kit-instructions
- design-files
- blog-data-raw
- blog-data-clean
Monitoring Role Usage
View role assumptions in CloudTrail:
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=ResourceName,AttributeValue=blog-data-pipeline-role \
--max-results 50
IAM Role Troubleshooting
"User is not authorized to perform: sts:AssumeRole"
Check that the principal (ECS task role or Terraform admin user) is in the trust policy:
aws iam get-role --role-name blog-data-pipeline-role \
--query 'Role.AssumeRolePolicyDocument'
"Access Denied" when accessing S3
Verify role has S3 permissions:
aws iam list-role-policies --role-name blog-data-pipeline-role
aws iam get-role-policy --role-name blog-data-pipeline-role \
--policy-name blog-data-pipeline-policy
Credentials expire too quickly
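Increase duration_seconds in the AWS CLI profile or STS call. The ceiling is 12 hours, and the value cannot exceed the role's configured maximum session duration (1 hour unless it has been raised):
[profile blog-data-pipeline]
role_arn = arn:aws:iam::421115711209:role/blog-data-pipeline-role
source_profile = blog-data-terraform
# 12 hours
duration_seconds = 43200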
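The duration_seconds value is just hours multiplied by 3600. A small sketch that does the conversion and rejects values outside the 1 to 12 hour range follows; the function name is hypothetical.

```shell
# Convert a session length in whole hours to a duration_seconds value.
# Rejects non-numeric input, leading zeros, and anything outside 1-12 hours.
duration_for_hours() {
  hours="$1"
  case "$hours" in
    ''|*[!0-9]*|0*)
      echo "hours must be a positive whole number without leading zeros" >&2
      return 1
      ;;
  esac
  if [ "$hours" -gt 12 ]; then
    echo "hours must be between 1 and 12" >&2
    return 1
  fi
  echo $((hours * 3600))
}
```

For instance, duration_for_hours 12 yields the 43200 used in the profile above.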
Next Steps
- HTTPS: Add ACM certificate and HTTPS listener
- Domain: Configure pipelines.rocketclub.online
- Backstage: Deploy on same ALB
- Flows: Create work pool and deploy pipelines