In a multi-account AWS environment handling energy trading workloads, a single misconfigured S3 bucket or an overly permissive IAM role is not just a security finding: it is a compliance violation, a potential regulatory breach, and an audit risk. At RWE Supply & Trading, I faced this challenge at scale: dozens of accounts, hundreds of Terraform modules, and continuous pressure to ship infrastructure quickly without compromising security posture.
This post documents the cloud security posture management (CSPM) architecture I designed and implemented: a centralized, automated control plane that continuously monitors posture, enforces policy, and auto-remediates critical findings, all driven by Infrastructure as Code.
The Problem with Point-in-Time Security Reviews
Traditional cloud security reviews are periodic. A team runs a checklist against a snapshot of the environment, flags findings, and assigns tickets. By the time those tickets are resolved, the environment has drifted further. In fast-moving cloud environments, this model breaks down within weeks.
The operational shift required is continuous posture management: every configuration change is evaluated against policy the moment it is applied, and deviations are either blocked before they land or remediated automatically within minutes.
Architecture Overview

The architecture has three layers:
1. Preventive layer: Checkov and OPA run in the CI/CD pipeline and block non-compliant Terraform before it is applied. AWS Service Control Policies (SCPs) at the Organizations level enforce hard boundaries that no account-level policy can override.
2. Detective layer: Amazon GuardDuty, AWS Config rules, Security Hub, and Orca Security continuously monitor all accounts. Security Hub aggregates findings centrally in the Security/Audit account.
3. Responsive layer: EventBridge rules trigger Lambda functions that auto-remediate critical findings (e.g., public S3 buckets, disabled CloudTrail, overly permissive security groups) within minutes of detection.
Setting Up the Security Account as the Control Plane
All findings flow into a dedicated Security/Audit account. This account is not a workload account — it exists solely to aggregate, analyse, and act on security findings.
# Enable Security Hub aggregator in the security account
resource "aws_securityhub_finding_aggregator" "central" {
  linking_mode = "ALL_REGIONS"
}

# Enable GuardDuty as organization delegated admin
# (this resource is applied from the Organizations management account)
resource "aws_guardduty_organization_admin_account" "delegated" {
  admin_account_id = var.security_account_id
}

# Auto-enable GuardDuty for existing and newly created member accounts
# (aws_guardduty_detector.main is the security account's detector, defined elsewhere)
resource "aws_guardduty_organization_configuration" "auto_enable" {
  auto_enable_organization_members = "ALL"
  detector_id                      = aws_guardduty_detector.main.id
}

Each member account is enrolled automatically via AWS Organizations, so new accounts inherit the full security stack on creation, with no manual onboarding required.
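Automatic enrollment is worth verifying from time to time. The following is a minimal sketch, not part of the original setup, of how an enrollment gap check could look. It assumes the inputs have the shape of the `Accounts` list returned by the Organizations `list_accounts` call and the `Members` list returned by the GuardDuty `list_members` call; the function name and the `exclude` parameter are hypothetical.

```python
from typing import Iterable


def find_unenrolled_accounts(
    org_accounts: Iterable[dict],
    guardduty_members: Iterable[dict],
    exclude: Iterable[str] = (),
) -> list[str]:
    """Return IDs of ACTIVE accounts that have no enabled GuardDuty membership."""
    enrolled = {
        member["AccountId"]
        for member in guardduty_members
        if member.get("RelationshipStatus") == "Enabled"
    }
    skipped = set(exclude)  # e.g. the delegated admin account, which is not its own member
    return sorted(
        account["Id"]
        for account in org_accounts
        if account.get("Status") == "ACTIVE"
        and account["Id"] not in enrolled
        and account["Id"] not in skipped
    )


if __name__ == "__main__":
    # Illustrative data only; real input would come from the two API calls above.
    accounts = [
        {"Id": "111111111111", "Status": "ACTIVE"},
        {"Id": "222222222222", "Status": "ACTIVE"},
        {"Id": "999999999999", "Status": "ACTIVE"},
    ]
    members = [{"AccountId": "111111111111", "RelationshipStatus": "Enabled"}]
    print(find_unenrolled_accounts(accounts, members, exclude=["999999999999"]))
```

Keeping the comparison as a pure function makes it easy to run in a scheduled job and alert on a non-empty result.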
Preventive Controls: Checkov + OPA in the CI/CD Pipeline
The pipeline never reaches `terraform apply` unless the IaC passes security linting. Checkov runs first. It ships with 500+ built-in rules covering CIS, NIST, and PCI-DSS, and the step below scans the repository's Terraform code against a selected set of them:
# .github/workflows/security-gate.yml
- name: Run Checkov IaC Scan
  uses: bridgecrewio/checkov-action@master
  with:
    directory: .
    framework: terraform
    output_format: sarif
    output_file_path: results.sarif
    soft_fail: false
    # `check` limits the scan to the listed IDs (comma-separated):
    #   CKV_AWS_18   S3 access logging
    #   CKV_AWS_19   S3 encryption at rest
    #   CKV_AWS_20   S3 public access
    #   CKV_AWS_36   CloudTrail log file validation
    #   CKV_AWS_119  DynamoDB encryption
    check: CKV_AWS_18,CKV_AWS_19,CKV_AWS_20,CKV_AWS_36,CKV_AWS_119

After Checkov, an OPA policy gate evaluates the Terraform plan JSON against custom Rego policies specific to our environment:
# policies/no_public_s3.rego
package terraform.s3

# Deny any public access block that leaves public ACLs allowed.
deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket_public_access_block"
    resource.change.after.block_public_acls == false
    msg := sprintf("S3 bucket '%v' must block public ACLs", [resource.address])
}

# Deny buckets that do not declare server-side encryption inline.
# Deletions are skipped: `after` is null there, so the `not` check would otherwise fire.
deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket"
    resource.change.after != null
    not resource.change.after.server_side_encryption_configuration
    msg := sprintf("S3 bucket '%v' must have server-side encryption enabled", [resource.address])
}

Any `deny` result blocks the pipeline and posts the violation reason directly to the PR as a review comment.
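To iterate on a policy without running OPA, it can help to mirror the rules in a small script and feed it the output of `terraform show -json`. The sketch below is an illustrative Python equivalent of the two S3 rules, not a replacement for the Rego gate. The function name is hypothetical, and it makes two labelled simplifications: resources being deleted (`after` is null) are skipped, and an empty or null encryption block is treated the same as a missing one.

```python
def evaluate_plan(plan: dict) -> list[str]:
    """Return one message per S3 violation found in a Terraform plan JSON document."""
    messages = []
    for rc in plan.get("resource_changes", []):
        after = rc.get("change", {}).get("after")
        if after is None:
            # Resource is being deleted; there is no planned state to check.
            continue

        if (
            rc.get("type") == "aws_s3_bucket_public_access_block"
            and after.get("block_public_acls") is False
        ):
            messages.append(f"S3 bucket '{rc['address']}' must block public ACLs")

        if rc.get("type") == "aws_s3_bucket" and not after.get(
            "server_side_encryption_configuration"
        ):
            messages.append(
                f"S3 bucket '{rc['address']}' must have server-side encryption enabled"
            )
    return messages


if __name__ == "__main__":
    # Illustrative plan fragment; a real one comes from `terraform show -json plan.out`.
    sample_plan = {
        "resource_changes": [
            {
                "address": "aws_s3_bucket.logs",
                "type": "aws_s3_bucket",
                "change": {"actions": ["create"], "after": {"bucket": "logs"}},
            }
        ]
    }
    for message in evaluate_plan(sample_plan):
        print(message)
```

A script like this is useful for unit-testing policy intent against hand-written plan fragments; the Rego policy remains the source of truth in the pipeline.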
Deploying AWS Config Rules at Scale with Terraform
AWS Config Rules run continuously in every account, evaluating resources against compliance rules whenever a configuration change is detected. I deploy them as an organization-wide Terraform module:
# modules/config-rules/main.tf
resource "aws_config_config_rule" "cis_s3_public_access" {
  name        = "s3-bucket-public-read-prohibited"
  description = "CIS 2.1.2 - S3 buckets must not allow public read"

  source {
    owner             = "AWS"
    source_identifier = "S3_BUCKET_PUBLIC_READ_PROHIBITED"
  }

  depends_on = [aws_config_configuration_recorder.main]
}

resource "aws_config_config_rule" "mfa_enabled_for_iam_console" {
  name        = "mfa-enabled-for-iam-console-access"
  description = "CIS 1.2 - MFA required for console access"

  source {
    owner             = "AWS"
    source_identifier = "MFA_ENABLED_FOR_IAM_CONSOLE_ACCESS"
  }
}

resource "aws_config_config_rule" "cloudtrail_enabled" {
  name        = "cloudtrail-enabled"
  description = "CIS 2.1 - CloudTrail must be enabled in all regions"

  source {
    owner             = "AWS"
    source_identifier = "CLOUD_TRAIL_ENABLED"
  }
}

resource "aws_config_config_rule" "encrypted_volumes" {
  name        = "encrypted-volumes"
  description = "CIS 2.2.1 - EBS volumes must be encrypted"

  source {
    owner             = "AWS"
    source_identifier = "ENCRYPTED_VOLUMES"
  }
}

Findings from Config flow into Security Hub, which normalises them into the ASFF (Amazon Security Finding Format) alongside GuardDuty and Inspector findings.
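Because every source ends up in the same format, a single piece of code can reason about findings from all of them. As a small illustration (a sketch, not part of the deployed stack), the function below counts findings per product and severity using the ASFF `ProductName` and `Severity.Label` fields; the function name and the `"unknown"` / `"UNKNOWN"` fallbacks are assumptions.

```python
from collections import Counter
from typing import Iterable


def summarise_findings(findings: Iterable[dict]) -> Counter:
    """Count ASFF findings by (product name, severity label)."""
    counts: Counter = Counter()
    for finding in findings:
        product = finding.get("ProductName", "unknown")
        label = finding.get("Severity", {}).get("Label", "UNKNOWN")
        counts[(product, label)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative findings reduced to the two fields the summary reads.
    sample = [
        {"ProductName": "Config", "Severity": {"Label": "HIGH"}},
        {"ProductName": "GuardDuty", "Severity": {"Label": "CRITICAL"}},
        {"ProductName": "Config", "Severity": {"Label": "HIGH"}},
    ]
    for (product, label), count in sorted(summarise_findings(sample).items()):
        print(f"{product:10s} {label:10s} {count}")
```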
Auto-Remediation with EventBridge and Lambda
Critical findings trigger immediate automated responses. The EventBridge rule pattern targets findings by severity and type:
{
  "source": ["aws.securityhub"],
  "detail-type": ["Security Hub Findings - Imported"],
  "detail": {
    "findings": {
      "Severity": {
        "Label": ["CRITICAL", "HIGH"]
      },
      "Compliance": {
        "Status": ["FAILED"]
      },
      "RecordState": ["ACTIVE"]
    }
  }
}

The Lambda function dispatches remediation based on finding type:
import boto3


def handler(event, context):
    # `findings` is an array, so iterate rather than assume a single element.
    for finding in event["detail"]["findings"]:
        finding_type = (finding.get("Types") or [""])[0]
        resource = finding["Resources"][0]
        if "S3/BucketPubliclyAccessible" in finding_type:
            remediate_s3_public_access(resource["Id"])
        elif "IAM/RootAccountUsage" in finding_type:
            notify_critical(finding)
        elif "CloudTrail/CloudTrailStopped" in finding_type:
            # "Region" is optional on ASFF resources; fall back to the finding's region.
            enable_cloudtrail(resource.get("Region", finding.get("Region")))


def remediate_s3_public_access(bucket_arn):
    # S3 bucket ARNs have the form arn:aws:s3:::bucket-name
    bucket_name = bucket_arn.split(":::")[-1]
    # Assumes the function's credentials are allowed to act on the bucket's account.
    s3 = boto3.client("s3")
    s3.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    print(f"[REMEDIATED] Blocked public access on S3 bucket: {bucket_name}")


# notify_critical() and enable_cloudtrail() live elsewhere in the module (not shown).

For findings that cannot be auto-remediated safely (e.g., IAM policy changes), the Lambda creates a Jira ticket with the finding detail, account ID, resource ARN, and a link to the relevant runbook.
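The ticket path can be kept testable by separating payload construction from the HTTP call. The sketch below shows one way to build such a payload from an ASFF finding. It assumes the Jira REST API v2 issue-creation body (`fields.project`, `summary`, `description`, `issuetype`) with a plain-text description; the function name, the `Task` issue type, the project key, and the runbook URL are all hypothetical placeholders.

```python
def build_jira_issue(finding: dict, project_key: str, runbook_url: str) -> dict:
    """Build an issue-creation payload from an ASFF finding (no network calls)."""
    resource_arn = finding["Resources"][0]["Id"]
    severity = finding.get("Severity", {}).get("Label", "UNKNOWN")
    description = "\n".join(
        [
            finding.get("Description", ""),
            "",
            f"Account: {finding['AwsAccountId']}",
            f"Resource: {resource_arn}",
            f"Finding ID: {finding['Id']}",
            f"Runbook: {runbook_url}",
        ]
    )
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": f"[{severity}] {finding['Title']}",
            "description": description,
            "issuetype": {"name": "Task"},  # assumed issue type
        }
    }


if __name__ == "__main__":
    # Illustrative finding containing only the fields the builder reads.
    sample_finding = {
        "Id": "example-finding-id",
        "AwsAccountId": "111111111111",
        "Title": "IAM policy allows full administrative privileges",
        "Description": "A customer managed policy grants *:* on all resources.",
        "Severity": {"Label": "HIGH"},
        "Resources": [{"Id": "arn:aws:iam::111111111111:policy/example"}],
    }
    issue = build_jira_issue(sample_finding, "SEC", "https://runbooks.example.com/iam")
    print(issue["fields"]["summary"])
```

The resulting dictionary would then be posted to the issue-creation endpoint by whatever HTTP client the Lambda already uses.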
Service Control Policies: The Non-Bypassable Layer
SCPs apply at the AWS Organizations level and cannot be overridden by any IAM policy within a member account, including root. This is the last-resort preventive control:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRootAccountActions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:root"
        }
      }
    },
    {
      "Sid": "DenyIAMPrivilegeEscalation",
      "Effect": "Deny",
      "Action": [
        "iam:CreatePolicyVersion",
        "iam:SetDefaultPolicyVersion",
        "iam:PassRole",
        "iam:CreateAccessKey"
      ],
      "Resource": "*",
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/SecurityBreakGlassRole"
        }
      }
    },
    {
      "Sid": "DenyUnauthorisedRegions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["eu-central-1", "eu-west-1"]
        }
      }
    }
  ]
}

The region restriction alone eliminates a large class of shadow-IT risks. If a developer accidentally provisions resources in `us-east-1`, the SCP blocks the API call before it lands. Because the statement denies every action outside the two approved regions, global services whose endpoints live in `us-east-1` (IAM, Organizations, Route 53, CloudFront, and similar) are blocked as well, so a production policy typically exempts them with a `NotAction` list.
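Hand-written SCPs are easy to get subtly wrong, so a small pre-deployment lint can pay for itself. The following is a sketch under stated assumptions, not an official validator: it checks three things, namely that no statement carries a `Principal` or `NotPrincipal` element (SCPs do not accept them), that the document stays within the 5,120-character SCP size quota documented at the time of writing, and that a region deny on `"Action": "*"` is flagged for review. The function name and message wording are hypothetical.

```python
import json

SCP_MAX_CHARS = 5120  # documented SCP size quota at the time of writing


def lint_scp(policy_json: str) -> list[str]:
    """Return human-readable problems found in an SCP document."""
    problems = []
    if len(policy_json) > SCP_MAX_CHARS:
        problems.append(
            f"policy is {len(policy_json)} characters; limit is {SCP_MAX_CHARS}"
        )

    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    for index, stmt in enumerate(statements):
        sid = stmt.get("Sid", f"statement #{index}")
        for key in ("Principal", "NotPrincipal"):
            if key in stmt:
                problems.append(f"{sid}: '{key}' is not supported in SCPs")

        condition_text = json.dumps(stmt.get("Condition", {}))
        if (
            stmt.get("Effect") == "Deny"
            and stmt.get("Action") == "*"
            and "aws:RequestedRegion" in condition_text
        ):
            problems.append(
                f"{sid}: region deny on Action '*' also blocks global services; "
                "consider a NotAction exemption list"
            )
    return problems


if __name__ == "__main__":
    # Illustrative policy with both issues present.
    example = json.dumps(
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "DenyUnauthorisedRegions",
                    "Effect": "Deny",
                    "Principal": "*",
                    "Action": "*",
                    "Resource": "*",
                    "Condition": {
                        "StringNotEquals": {"aws:RequestedRegion": ["eu-central-1"]}
                    },
                }
            ],
        }
    )
    for problem in lint_scp(example):
        print(problem)
```

Running a check like this in the same pipeline stage as Checkov keeps organisation-level guardrails under the same review discipline as the rest of the IaC.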
Integrating Orca Security for Agentless CSPM
Orca Security complements AWS-native tooling with agentless scanning that reads cloud provider APIs and storage snapshots without deploying agents into workloads. In the Orca dashboard, I configure:
– Attack path analysis: Identifies multi-hop paths from the internet to sensitive data (e.g., internet-facing EC2 → unrestricted S3 → PII data)
– Vulnerability prioritisation: CVEs ranked by exploitability and lateral movement risk, not just CVSS score
– Compliance posture: Continuous CIS, NIST CSF 2.0, ISO 27001, GDPR mapping
Orca findings feed back into Security Hub via the Orca Security Hub integration, keeping all findings in one pane of glass.
Results After 6 Months
After deploying this architecture across the full AWS estate:
– CI/CD gate blocks: Checkov catches an average of 12 IaC policy violations per sprint before they reach the AWS environment
– Mean time to remediate: critical findings dropped from ~72 hours (manual ticket) to **< 8 minutes** for auto-remediable findings
– False-positive rate: GuardDuty tuning and Security Hub suppression rules reduced noisy, low-value alerts by approximately 60%, so the on-call team focuses on signal
– Compliance posture: CIS AWS Foundations Benchmark v3.0 score improved from 62% → 91% within the first quarter
Key Takeaways
1. Shift left first: The cheapest fix is blocking a misconfiguration in the CI/CD pipeline before it reaches AWS. Checkov + OPA running on every PR costs nothing compared to a breach or audit finding.
2. Don’t build a SIEM, build automation: The goal of a CSPM control plane is not to show findings — it is to close them. Every HIGH/CRITICAL finding should have an automated response path.
3. SCPs are your safety net, not your primary control: SCPs are powerful but blunt. Use them for hard organisational boundaries, not fine-grained policy enforcement.
4. Orca and AWS-native tooling are complementary: AWS-native services (GuardDuty, Inspector, Config) have deep integration and low latency. Orca adds context (attack paths, sensitive data identification) that native tools do not provide.
5. Measure posture, not findings: Report compliance score trends (CIS score over time), not raw finding counts. Leadership cares whether posture is improving, not how many findings were generated this week.
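For the last point, a posture number can be derived directly from findings. The sketch below is a simplified illustration that assumes ASFF findings with `Compliance.Status` and `RecordState` fields: it reports the share of active compliance checks that passed. It is not Security Hub's own security score, which is calculated per control, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def compliance_score(findings: Iterable[dict]) -> Optional[float]:
    """Percentage of active PASSED checks among PASSED + FAILED; None if nothing to score."""
    passed = failed = 0
    for finding in findings:
        if finding.get("RecordState", "ACTIVE") != "ACTIVE":
            continue  # archived findings no longer reflect current posture
        status = finding.get("Compliance", {}).get("Status")
        if status == "PASSED":
            passed += 1
        elif status == "FAILED":
            failed += 1
        # WARNING / NOT_AVAILABLE and findings without Compliance are not scored
    total = passed + failed
    if total == 0:
        return None
    return round(100 * passed / total, 1)


if __name__ == "__main__":
    # Illustrative snapshot; tracking this value per week gives the trend line.
    snapshot = [
        {"Compliance": {"Status": "PASSED"}, "RecordState": "ACTIVE"},
        {"Compliance": {"Status": "PASSED"}, "RecordState": "ACTIVE"},
        {"Compliance": {"Status": "FAILED"}, "RecordState": "ACTIVE"},
    ]
    print(compliance_score(snapshot))
```

Storing one such value per reporting period is enough to plot the trend that leadership actually asks about.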