Infrastructure

Overview

Self-hosted Prefect on AWS ECS with ALB, Aurora Serverless v2, and ECR.

Current production access: https://pipelines.rocketclub.online

All production infrastructure (VPC, ECS, Aurora, S3, IAM, Prefect API, etc.) is defined and deployed from this repository via Terraform (see infra/platform/infra/envs/prod). This page summarizes the key components that the data platform depends on.

Cost: ~$46-71/month

Architecture

Internet → ALB (HTTP/HTTPS) → Prefect Server (ECS) → Aurora PostgreSQL (encrypted)

                               Prefect Worker (ECS) → Flow Tasks

Components

Prefect Server

  • Image: <account-id>.dkr.ecr.<region>.amazonaws.com/blog-data:prefect-3-python3.12-ecs
  • Resources: Fargate task (default: 0.25 vCPU, 0.5 GB RAM via Terraform task_size)
  • Port: 4200
  • Command: prefect server start --host 0.0.0.0 --port 4200
  • Database: Connection URL constructed at runtime from Secrets Manager
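
At container start, the entrypoint can assemble PREFECT_API_DATABASE_CONNECTION_URL from the Secrets Manager payload. A minimal sketch; the secret id and JSON keys (username, password, host) are assumptions, so check the Terraform config for the real names:

```shell
# Build the Prefect database URL from a Secrets Manager SecretString (JSON).
build_db_url() {
  # $1 = SecretString JSON, e.g. {"username": "...", "password": "...", "host": "..."}
  python3 - "$1" <<'PY'
import json, sys
s = json.loads(sys.argv[1])
print(f"postgresql+asyncpg://{s['username']}:{s['password']}@{s['host']}:{s.get('port', 5432)}/{s.get('dbname', 'prefect')}")
PY
}

# In the ECS entrypoint (illustrative secret id):
# SECRET_JSON=$(aws secretsmanager get-secret-value --secret-id prod/prefect/db \
#   --query SecretString --output text)
# export PREFECT_API_DATABASE_CONNECTION_URL=$(build_db_url "$SECRET_JSON")
```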

Prefect Worker

  • Image: <account-id>.dkr.ecr.<region>.amazonaws.com/blog-data:prefect-3-python3.12-ecs
  • Resources: Fargate task (default: 0.25 vCPU, 0.5 GB RAM via Terraform task_size)
  • Command: prefect worker start --pool blog-data-pool --type ecs
  • API URL: Retrieved from ALB DNS (Terraform output)
  • Note: prefect-aws[ecs] and worker dependencies are baked into the base image.
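
The worker derives its API URL from the ALB DNS name. A sketch; the alb_dns_name output name is an assumption, so check the actual Terraform outputs:

```shell
# Compose the Prefect API URL from an ALB DNS name.
prefect_api_url() {
  printf 'http://%s/api\n' "$1"
}

# export PREFECT_API_URL=$(prefect_api_url "$(terraform output -raw alb_dns_name)")
# prefect worker start --pool blog-data-pool --type ecs
```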

Database

  • Type: Aurora Serverless v2 PostgreSQL 15.13
  • Scaling: 0.5-1 ACU
  • Encryption: KMS encrypted at rest
  • Performance Insights: Enabled with KMS encryption (7-day retention)
  • Credentials: Stored in AWS Secrets Manager
  • Endpoint: Aurora cluster endpoint (see Terraform outputs or AWS console)

Security

  • Database Password: AWS Secrets Manager (encrypted with RDS KMS key)
  • RDS Encryption: KMS key with automatic rotation
  • ECR Encryption: KMS encrypted
  • HTTPS: Optional ACM certificate support (TLS 1.3)
  • IAM: Assumable role with temporary credentials (see Security Setup below)

ECR

  • Repository: blog-data
  • Base Prefect tag: prefect-3-python3.12-ecs
  • Architecture: amd64 (required; the Fargate tasks run linux/amd64)
  • Encryption: KMS encrypted
  • Source of truth: The base Prefect image is built and published by the CircleCI refresh-prefect-image workflow (in this repo) into the shared blog-data repository. This repo does not build or push the base Prefect image directly.
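
Before relying on a locally pulled or retagged image, it is worth confirming it really is amd64. A small helper over `docker image inspect` output (illustrative):

```shell
# Print the architecture of an image from `docker image inspect` JSON on stdin.
image_arch() {
  python3 -c 'import sys, json; print(json.load(sys.stdin)[0]["Architecture"])'
}

# docker image inspect \
#   <account-id>.dkr.ecr.<region>.amazonaws.com/blog-data:prefect-3-python3.12-ecs | image_arch
# An amd64 image prints "amd64"; anything else will fail on these Fargate tasks.
```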

Deployment

Deploy Infrastructure

Infrastructure is defined in Terraform in this repository (infra/platform/infra/envs/prod) and deployed through CI/CD pipelines.

Verify

To confirm the platform is up after Terraform has been applied:

# Check Prefect UI via CloudFront
curl -I https://pipelines.rocketclub.online

Enable HTTPS (Optional)

HTTPS and custom domains for pipelines.rocketclub.online are configured via the Terraform stack in this repository. Refer to Networking Entry Points for ACM and DNS details.

Monitoring

Health Check

# Prefect UI health (external)
curl -I https://pipelines.rocketclub.online || echo "Prefect UI not reachable"

# Prefect API health
curl -I https://pipelines.rocketclub.online/api/health || true
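
For CI smoke tests, a small retry wrapper avoids flakiness while the service warms up. A generic sketch:

```shell
# Retry a command up to N times (1s apart); succeed on the first success.
check_with_retries() {
  attempts=$1; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# check_with_retries 10 curl -fsSI https://pipelines.rocketclub.online
```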

Logs

# Prefect API service
aws logs tail /ecs/prod/prefect-api --follow --region ${AWS_REGION:-eu-west-2}

# Prefect worker service
aws logs tail /ecs/prod/prefect-worker --follow --region ${AWS_REGION:-eu-west-2}

# Prefect deployer (one-off ECS task)
aws logs tail /ecs/prod/prefect-deployer --follow --region ${AWS_REGION:-eu-west-2}

Alarms

CloudWatch alarms (including any Prefect-related alarms) are defined and managed in Terraform in this repository. See the Terraform config (and CloudWatch) for the current alarm set and thresholds.

Troubleshooting

Architecture Error

Symptom: exec /usr/bin/tini: exec format error

Fix: Delete the arm64-tagged image and republish an amd64 build:

aws ecr batch-delete-image --repository-name blog-data \
  --image-ids imageTag=prefect-3-python3.12-ecs --profile ron --region eu-west-2

# Then republish the amd64 image via the CircleCI refresh-prefect-image workflow (see ECR above)

Worker Connection Error

Symptom: httpx.ConnectError: [Errno -2] Name or service not known

Fix: Verify PREFECT_API_URL uses the ALB DNS name for the Prefect API surface (see the Terraform/ECS worker configuration under infra/platform/infra).

Database Timeout

Symptom: TimeoutError during server startup

Fix: Check security group allows Prefect server → RDS port 5432

Health Check Timeout

Symptom: Target "unhealthy" or "timeout"

Fix: Verify security group allows ALB → Prefect server port 4200

Security Groups

Prefect Server Security Group

  • Ingress: Port 4200 from ALB and ECS tasks
  • Egress: Port 5432 to RDS, 443 to internet, 53 for DNS

Database Security Group

  • Ingress: Port 5432 from Prefect server and ECS tasks

ALB

  • Ingress: Port 80 (and 443 when HTTPS is enabled) from internet
  • Egress: Port 4200 to Prefect server

Security Setup

IAM Assumable Role Architecture

The pipeline uses an assumable IAM role instead of long-lived access keys for enhanced security:

Benefits:

  • Automatic credential rotation (1 hour default)
  • No secrets in Terraform state
  • Comprehensive audit trail via CloudTrail
  • Better security posture (AWS best practice)
  • No manual key rotation needed

Architecture:

ECS Tasks / Local Dev / CI/CD → STS AssumeRole (temporary credentials) → blog-data-pipeline-role (S3 permissions) → S3 Buckets

ECS Tasks (Automatic)

ECS tasks assume the pipeline role automatically; no extra configuration is needed:

  1. ECS task execution starts
  2. Task role assumes pipeline role via STS
  3. Temporary credentials are obtained
  4. Task accesses S3 with temporary credentials
  5. Credentials automatically expire after 1 hour
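
To confirm step 2 actually happened, inspect the caller identity from inside the task and pull the role name out of the assumed-role ARN. A sketch:

```shell
# Extract the role name from an STS assumed-role ARN.
role_from_arn() {
  # arn:aws:sts::<account>:assumed-role/<role>/<session> -> <role>
  rest=${1#*:assumed-role/}
  printf '%s\n' "${rest%%/*}"
}

# Inside a running task:
# role_from_arn "$(aws sts get-caller-identity --query Arn --output text)"
```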

Local Development Setup

Step 1: Create AWS CLI Profile

Add to ~/.aws/config:

[profile blog-data-pipeline]
role_arn = arn:aws:iam::421115711209:role/blog-data-pipeline-role
source_profile = blog-data-terraform
duration_seconds = 3600

Step 2: Update .env.local

AWS_PROFILE=blog-data-pipeline
AWS_BUCKET_NAME=ron-website-docs
AWS_REGION=eu-west-2
AWS_BLOG_DATA_RAW_BUCKET=blog-data-raw
AWS_BLOG_DATA_CLEAN_BUCKET=blog-data-clean

Step 3: Test

# Verify profile works
aws sts get-caller-identity --profile blog-data-pipeline

# Test S3 access
aws s3 ls s3://blog-data-raw --profile blog-data-pipeline

CI/CD Setup

For CircleCI or other CI/CD, use STS AssumeRole:

# Get temporary credentials (matching this repo's CI pattern)
CREDENTIALS=$(aws sts assume-role \
  --role-arn arn:aws:iam::174051987565:role/OrganizationAccountAccessRole \
  --role-session-name circleci-blog-data \
  --duration-seconds 3600 \
  --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
  --output text)

# Export credentials
export AWS_ACCESS_KEY_ID=$(echo "$CREDENTIALS" | awk '{print $1}')
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDENTIALS" | awk '{print $2}')
export AWS_SESSION_TOKEN=$(echo "$CREDENTIALS" | awk '{print $3}')

IAM Resources

Pipeline Role:

  • Name: blog-data-pipeline-role
  • Permissions: Full S3 access to all pipeline buckets
  • Trust Policy: Allows the ECS task role and the Terraform admin user to assume the role

S3 Buckets with Access:

  • blog-data-cache
  • kit-instructions
  • design-files
  • blog-data-raw
  • blog-data-clean

Monitoring Role Usage

View role assumptions in CloudTrail:

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=blog-data-pipeline-role \
  --max-results 50

IAM Role Troubleshooting

"User is not authorized to perform: sts:AssumeRole"

Check that the principal (ECS task role or Terraform admin user) is in the trust policy:

aws iam get-role --role-name blog-data-pipeline-role \
  --query 'Role.AssumeRolePolicyDocument'

"Access Denied" when accessing S3

Verify role has S3 permissions:

aws iam list-role-policies --role-name blog-data-pipeline-role
aws iam get-role-policy --role-name blog-data-pipeline-role \
  --policy-name blog-data-pipeline-policy
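
The returned policy document can then be checked for a given bucket. A loose substring check over the statements' resources (assumes Statement is a list, as IAM usually returns):

```shell
# Answer "yes"/"no": does any Resource in the policy mention the bucket name?
policy_covers_bucket() {
  # $1 = policy document JSON, $2 = bucket name
  python3 - "$1" "$2" <<'PY'
import json, sys
doc, bucket = json.loads(sys.argv[1]), sys.argv[2]
resources = []
for st in doc.get("Statement", []):
    r = st.get("Resource", [])
    resources.extend([r] if isinstance(r, str) else r)
print("yes" if any(bucket in res for res in resources) else "no")
PY
}

# POLICY=$(aws iam get-role-policy --role-name blog-data-pipeline-role \
#   --policy-name blog-data-pipeline-policy --query PolicyDocument --output json)
# policy_covers_bucket "$POLICY" blog-data-raw
```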

Credentials expire too quickly

Increase duration_seconds in the AWS CLI profile or STS call; the value cannot exceed the role's configured MaxSessionDuration (12 hours at most):

[profile blog-data-pipeline]
role_arn = arn:aws:iam::421115711209:role/blog-data-pipeline-role
source_profile = blog-data-terraform
duration_seconds = 43200  # 12 hours

Next Steps

  1. HTTPS: Add ACM certificate and HTTPS listener
  2. Domain: Configure pipelines.rocketclub.online
  3. Backstage: Deploy on same ALB
  4. Flows: Create work pool and deploy pipelines