Infrastructure

Overview

Self-hosted Prefect on AWS ECS with ALB, Aurora Serverless v2, and ECR.

Current production access: https://pipelines.rocketclub.online

All production infrastructure (VPC, ECS, Aurora, S3, IAM, Prefect API, etc.) is defined and deployed from the sibling blog_infra repository (see its infra/envs/prod stack). This page summarizes the key components that this repo depends on.

Cost: ~$46-71/month

Architecture

Internet → ALB (HTTP/HTTPS) → Prefect Server (ECS) → Aurora PostgreSQL (encrypted)

                               Prefect Worker (ECS) → Flow Tasks

Components

Prefect Server

  • Image: <account-id>.dkr.ecr.<region>.amazonaws.com/blog-data:prefect-3-python3.12-ecs
  • Resources: Fargate task (default: 0.25 vCPU, 0.5 GB RAM via Terraform task_size)
  • Port: 4200
  • Command: prefect server start --host 0.0.0.0 --port 4200
  • Database: Connection URL constructed at runtime from Secrets Manager
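A minimal sketch of how that URL might be assembled at container startup. The field values below are placeholders for illustration, not the actual Secrets Manager schema; in the real task the password would be fetched with `aws secretsmanager get-secret-value`:

```shell
# Placeholder values -- in production these are injected from Secrets Manager.
DB_USER="prefect"
DB_PASS="example-password"
DB_HOST="blog-data-aurora.cluster-xxxx.eu-west-2.rds.amazonaws.com"
DB_NAME="prefect"

# Prefect reads this setting to locate its backing database.
export PREFECT_API_DATABASE_CONNECTION_URL="postgresql+asyncpg://${DB_USER}:${DB_PASS}@${DB_HOST}:5432/${DB_NAME}"
echo "$PREFECT_API_DATABASE_CONNECTION_URL"
```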

Prefect Worker

  • Image: <account-id>.dkr.ecr.<region>.amazonaws.com/blog-data:prefect-3-python3.12-ecs
  • Resources: Fargate task (default: 0.25 vCPU, 0.5 GB RAM via Terraform task_size)
  • Command: prefect worker start --pool blog-data-pool --type ecs
  • API URL: Retrieved from ALB DNS (Terraform output)
  • Note: prefect-aws[ecs] and worker dependencies are baked into the base image.

Database

  • Type: Aurora Serverless v2 PostgreSQL 15.13
  • Scaling: 0.5-1 ACU
  • Encryption: KMS encrypted at rest
  • Performance Insights: Enabled with KMS encryption (7-day retention)
  • Credentials: Stored in AWS Secrets Manager
  • Endpoint: Aurora cluster endpoint (see blog_infra outputs or AWS console)

Security

  • Database Password: AWS Secrets Manager (encrypted with RDS KMS key)
  • RDS Encryption: KMS key with automatic rotation
  • ECR Encryption: KMS encrypted
  • HTTPS: Optional ACM certificate support (TLS 1.3)
  • IAM: Assumable role with temporary credentials (see Security Setup below)

ECR

  • Repository: blog-data
  • Base Prefect tag: prefect-3-python3.12-ecs
  • Architecture: amd64 (required by these Fargate tasks; ARM64 images fail with exec format errors)
  • Encryption: KMS encrypted
  • Source of truth: The base Prefect image is built and published by the blog_infra CircleCI refresh-prefect-image workflow into the shared blog-data repository. This repo does not build or push the base Prefect image directly.

Deployment

Deploy Infrastructure

Infrastructure is deployed via Terraform from the blog_infra repository (infra/envs/prod stack) and its CI/CD pipelines. This repo does not apply Terraform directly.

Verify

To confirm the platform is up after blog_infra has applied Terraform:

# Check the Prefect UI at its public endpoint
curl -I https://pipelines.rocketclub.online

Enable HTTPS (Optional)

HTTPS and custom domains for pipelines.rocketclub.online are configured via the blog_infra Terraform stack. Refer to blog_infra/docs/architecture for ACM and DNS details.

Monitoring

Health Check

# Prefect UI health (external)
curl -I https://pipelines.rocketclub.online || echo "Prefect UI not reachable"

# Prefect API health (from within the VPC or via port-forward)
curl -I https://pipelines.rocketclub.online/api/health || true

Logs

# Prefect API service
aws logs tail /ecs/prod/prefect-api --follow --region ${AWS_REGION:-eu-west-2}

# Prefect worker service
aws logs tail /ecs/prod/prefect-worker --follow --region ${AWS_REGION:-eu-west-2}

# Prefect deployer (one-off ECS task)
aws logs tail /ecs/prod/prefect-deployer --follow --region ${AWS_REGION:-eu-west-2}

Alarms

CloudWatch alarms (including any Prefect-related alarms) are defined and managed in the blog_infra repository. See its documentation for the current alarm set and thresholds.

Troubleshooting

Architecture Error

Symptom: exec /usr/bin/tini: exec format error

Fix: Delete the mismatched ARM64 image, then republish an amd64 image:

aws ecr batch-delete-image --repository-name blog-data \
  --image-ids imageTag=prefect-3-python3.12-ecs --profile ron --region ${AWS_REGION:-eu-west-2}

# Then re-run the blog_infra refresh-prefect-image workflow to publish an
# amd64 image (this repo does not build or push the base image)

Worker Connection Error

Symptom: httpx.ConnectError: [Errno -2] Name or service not known

Fix: Verify PREFECT_API_URL uses the ALB DNS name for the Prefect API surface (see worker configuration in blog_infra).
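As a sketch, the worker's API URL is derived from the ALB DNS name. The DNS value below is a made-up placeholder; in practice it comes from the Terraform output in blog_infra:

```shell
# Placeholder ALB DNS name; in practice: read the Terraform output in blog_infra.
ALB_DNS="blog-data-alb-123456789.eu-west-2.elb.amazonaws.com"

# The worker must point at the API path, not the bare host.
export PREFECT_API_URL="http://${ALB_DNS}/api"
echo "$PREFECT_API_URL"
```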

Database Timeout

Symptom: TimeoutError during server startup

Fix: Check security group allows Prefect server → RDS port 5432

Health Check Timeout

Symptom: Target "unhealthy" or "timeout"

Fix: Verify security group allows ALB → Prefect server port 4200

Security Groups

Prefect Server Security Group

  • Ingress: Port 4200 from ALB and ECS tasks
  • Egress: Port 5432 to RDS, 443 to internet, 53 for DNS

Database Security Group

  • Ingress: Port 5432 from Prefect server and ECS tasks

ALB

  • Ingress: Port 80 from the internet (plus 443 when HTTPS is enabled)
  • Egress: Port 4200 to Prefect server

Security Setup

IAM Assumable Role Architecture

The pipeline uses an assumable IAM role instead of long-lived access keys for enhanced security:

Benefits:

  • Automatic credential rotation (1-hour default)
  • No secrets in Terraform state
  • Comprehensive audit trail via CloudTrail
  • Better security posture (AWS best practice)
  • No manual key rotation needed

Architecture:

ECS Tasks / Local Dev / CI/CD
  → STS AssumeRole (temporary credentials)
  → blog-data-pipeline-role (S3 permissions)
  → S3 Buckets

ECS Tasks (Automatic)

ECS tasks automatically assume the pipeline role; no configuration is needed:

  1. ECS task execution starts
  2. Task role assumes pipeline role via STS
  3. Temporary credentials are obtained
  4. Task accesses S3 with temporary credentials
  5. Credentials automatically expire after 1 hour

Local Development Setup

Step 1: Create AWS CLI Profile

Add to ~/.aws/config:

[profile blog-data-pipeline]
role_arn = arn:aws:iam::421115711209:role/blog-data-pipeline-role
source_profile = blog-data-terraform
duration_seconds = 3600

Step 2: Update .env.local

AWS_PROFILE=blog-data-pipeline
AWS_BUCKET_NAME=ron-website-docs
AWS_REGION=eu-west-2
AWS_BLOG_DATA_RAW_BUCKET=blog-data-raw
AWS_BLOG_DATA_CLEAN_BUCKET=blog-data-clean

Step 3: Test

# Verify profile works
aws sts get-caller-identity --profile blog-data-pipeline

# Test S3 access
aws s3 ls s3://blog-data-raw --profile blog-data-pipeline

CI/CD Setup

For CircleCI or other CI/CD, use STS AssumeRole:

# Get temporary credentials (matching blog_infra CI pattern)
CREDENTIALS=$(aws sts assume-role \
  --role-arn arn:aws:iam::174051987565:role/OrganizationAccountAccessRole \
  --role-session-name circleci-blog-data \
  --duration-seconds 3600 \
  --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
  --output text)

# Export credentials
export AWS_ACCESS_KEY_ID=$(echo "$CREDENTIALS" | awk '{print $1}')
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDENTIALS" | awk '{print $2}')
export AWS_SESSION_TOKEN=$(echo "$CREDENTIALS" | awk '{print $3}')

IAM Resources

Pipeline Role:

  • Name: blog-data-pipeline-role
  • Permissions: Full S3 access to all pipeline buckets
  • Trust Policy: Allows the ECS task role and the Terraform admin user to assume it

S3 Buckets with Access:

  • blog-data-cache
  • kit-instructions
  • design-files
  • blog-data-raw
  • blog-data-clean
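As an illustration of the shape of that access (the actual policy lives in the blog_infra Terraform; bucket coverage and actions may differ), a statement granting S3 access to two of the buckets might look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::blog-data-raw",
        "arn:aws:s3:::blog-data-raw/*",
        "arn:aws:s3:::blog-data-clean",
        "arn:aws:s3:::blog-data-clean/*"
      ]
    }
  ]
}
```

The remaining buckets (blog-data-cache, kit-instructions, design-files) would be listed the same way, with both the bucket ARN and the `/*` object ARN.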

Monitoring Role Usage

View role assumptions in CloudTrail:

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=blog-data-pipeline-role \
  --max-results 50

IAM Role Troubleshooting

"User is not authorized to perform: sts:AssumeRole"

Check that the principal (ECS task role or Terraform admin user) is in the trust policy:

aws iam get-role --role-name blog-data-pipeline-role \
  --query 'Role.AssumeRolePolicyDocument'
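The returned document should contain a statement along these lines (the principal names are placeholders for whatever blog_infra actually provisions):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::421115711209:role/<ecs-task-role>",
          "arn:aws:iam::421115711209:user/<terraform-admin-user>"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```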

"Access Denied" when accessing S3

Verify role has S3 permissions:

aws iam list-role-policies --role-name blog-data-pipeline-role
aws iam get-role-policy --role-name blog-data-pipeline-role \
  --policy-name blog-data-pipeline-policy

Credentials expire too quickly

Increase duration_seconds in the AWS CLI profile or STS call. Note that the role's MaxSessionDuration (1 hour by default) must also be raised to allow longer sessions; the AWS hard limit is 12 hours:

[profile blog-data-pipeline]
role_arn = arn:aws:iam::421115711209:role/blog-data-pipeline-role
source_profile = blog-data-terraform
duration_seconds = 43200  # 12 hours

Next Steps

  1. HTTPS: ACM certificate and HTTPS listener (managed via blog_infra)
  2. Domain: pipelines.rocketclub.online DNS (managed via blog_infra)
  3. Backstage: Deploy on the same ALB
  4. Flows: Create the work pool and deploy pipelines