AWS Architecture: Security Data Lakehouse
Author: RedEye Security | Date: 2026-04-06 | Status: Draft v1.0
Account Structure
Investcloud AWS Organization
├── Management Account (billing, SCPs)
├── Security Account ← PRIMARY: lakehouse lives here
│ ├── VPC: security-vpc (10.10.0.0/16)
│ ├── S3: log-archive-bucket (raw + OCSF + Iceberg)
│ ├── Glue: Iceberg catalog
│ ├── Athena: query engine
│ ├── EKS: lakehouse services (Vector, API, AI)
│ └── Secrets Manager + KMS
├── Log Archive Account ← raw log landing zone (optional separation)
└── Shared Services Account ← Jenkins, GitLab, Grafana, CA
Simple start: Security Account + Shared Services Account - add Log Archive if compliance requires it.
Core AWS Services
| Layer | Service | Purpose |
|---|---|---|
| Ingest | Kinesis Data Streams | High-throughput log ingestion buffer |
| Ingest | MSK (Kafka) | Optional: existing Kafka sources |
| Transform | EKS (Vector pods) | OCSF normalization, fan-out |
| Storage | S3 | Raw + OCSF + Iceberg data lake |
| Catalog | AWS Glue | Iceberg table catalog, schema registry |
| Query | Amazon Athena | Serverless SQL on Iceberg |
| Compute | EKS | Services: API, AI layer, Vector |
| Identity | IAM Identity Center | AD federation, role mapping |
| PKI | ACM Private CA | Intermediate CA (chains to Investcloud CA) |
| Secrets | Secrets Manager | Creds, API keys, CA passwords |
| Encryption | KMS | S3, EKS secrets, Kinesis |
| CI/CD | CodePipeline (or Jenkins) | Deploy pipelines |
| Monitoring | CloudWatch + Grafana | Ops dashboards for the lakehouse itself |
| DNS | Route 53 (private) | Internal service discovery |
Network Architecture
On-Prem / Investcloud DC
│
│ Direct Connect or Site-to-Site VPN
▼
┌─────────────────────────────────────────────────────┐
│ Security VPC (10.10.0.0/16) │
│ │
│ ┌──────────────────┐ ┌──────────────────────┐ │
│ │ Public Subnets │ │ Private Subnets │ │
│ │ (ALB, NLB) │ │ (EKS nodes, RDS) │ │
│ └──────────────────┘ └──────────────────────┘ │
│ │
│ ┌──────────────────┐ ┌──────────────────────┐ │
│ │ Ingest Subnet │ │ Isolated Subnet │ │
│ │ Kinesis, MSK │ │ Athena VPC endpoint │ │
│ └──────────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────┘
│
│ VPC Peering / Transit Gateway
▼
Shared Services VPC
- Jenkins / GitLab
- Grafana
- ACM Private CA
- AD connector
Data Flow
Log Sources (on-prem + AWS)
│
├── Syslog/HEC ──► Kinesis Data Stream (raw-logs)
├── Kafka ──────► MSK ──► Kinesis
└── AWS services ► CloudWatch ──► Kinesis (via subscription filter)
│
▼
EKS: Vector pods (auto-scaled)
- Consume from Kinesis
- Transform to OCSF (VRL)
- Fan-out:
├──► S3: s3://log-archive/raw/sourcetype=<sourcetype>/dt=YYYY-MM-DD/
├──► S3: s3://log-archive/ocsf/class_uid=<class_uid>/dt=YYYY-MM-DD/
└──► (optional) Splunk HEC (parallel during migration)
│
▼
AWS Glue (Iceberg catalog)
- Iceberg tables: network_activity, authentication,
security_finding, dns_activity, http_activity
- Partitioned by: dt (date) + sourcetype + region
│
▼
Amazon Athena
- Serverless SQL
- Federated query (can query Splunk via connector during migration)
- Results cached in S3
│
┌────┴──────────────────┐
│ │
▼ ▼
Grafana AI Query API (EKS)
(AD/SAML auth) (FastAPI + Bedrock/Ollama)
│ │
▼ ▼
Team dashboards NL → SQL → results
Jira/Zendesk → dashboards
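The Athena step in the flow above can be sketched as a partition-pruned query against one of the Iceberg tables. This is a minimal sketch under stated assumptions, not the project's query layer: the database name `security_lakehouse` and the helper `build_athena_request` are hypothetical, and only the `dt` partition column is taken from the design. The returned keys match the keyword arguments of boto3's `start_query_execution`.

```python
from datetime import date


def build_athena_request(table: str, day: date, results_uri: str,
                         database: str = "security_lakehouse") -> dict:
    """Build kwargs for boto3's athena.start_query_execution (no AWS call here).

    `database` is a hypothetical Glue database name.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    # Filtering on the dt partition column lets Athena skip other days' files.
    sql = (
        f"SELECT count(*) AS events FROM {table} "
        f"WHERE dt = DATE '{day.isoformat()}'"
    )
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": results_uri},
    }


request = build_athena_request(
    "network_activity",
    date(2026, 4, 6),
    "s3://ic-security-log-archive-{account-id}/athena-results/",
)
print(request["QueryString"])
```

With credentials in place, `boto3.client("athena").start_query_execution(**request)` would submit the query.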
S3 Bucket Structure
s3://ic-security-log-archive-{account-id}/
├── raw/
│ └── sourcetype=cisco:asa/
│ └── dt=2026-04-06/
│ └── *.log.gz
├── ocsf/
│ └── class_uid=4001/ ← Network Activity
│ └── dt=2026-04-06/
│ └── *.json.gz
├── iceberg/
│ ├── network_activity/ ← Iceberg table data + metadata
│ ├── authentication/
│ ├── security_finding/
│ ├── dns_activity/
│ └── http_activity/
└── athena-results/ ← Query result cache (7-day TTL)
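The raw/ and ocsf/ prefixes above follow a Hive-style `name=value` partition layout. A small sketch of how a writer might build object keys for it; the helper `object_key` and the file names are hypothetical and only illustrate the layout.

```python
from datetime import date


def object_key(layer: str, partition: str, value: str, day: date, filename: str) -> str:
    """Return the object key for one file under the raw/ or ocsf/ layout."""
    if layer not in {"raw", "ocsf"}:
        raise ValueError(f"unknown layer: {layer!r}")
    # Layout: <layer>/<partition>=<value>/dt=<YYYY-MM-DD>/<filename>
    return f"{layer}/{partition}={value}/dt={day.isoformat()}/{filename}"


day = date(2026, 4, 6)
raw_key = object_key("raw", "sourcetype", "cisco:asa", day, "part-0001.log.gz")
ocsf_key = object_key("ocsf", "class_uid", "4001", day, "part-0001.json.gz")
print(raw_key)
print(ocsf_key)
```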
Lifecycle policies:
- raw/ - S3 Standard 30 days → S3-IA 60 days → Glacier 1 year → delete at 7 years
- ocsf/ - S3 Standard 90 days → S3-IA 1 year → Glacier 3 years
- iceberg/ - S3 Standard 90 days (hot query window) → S3-IA 1 year
- athena-results/ - delete after 7 days
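A hedged sketch of how two of these policies could be expressed in the structure boto3's `put_bucket_lifecycle_configuration` expects. It covers only raw/ and athena-results/, the two prefixes whose end state is stated explicitly. The day thresholds are one reading of the figures, assumed here: 30 days in Standard, then 60 days in S3-IA (so Glacier from day 90), and deletion at 7 years approximated as 2,555 days. The helper `lifecycle_rule` is hypothetical.

```python
def lifecycle_rule(prefix, transitions, expire_after_days=None):
    """Build one S3 lifecycle rule dict for the given key prefix.

    transitions: list of (object_age_in_days, storage_class) pairs.
    """
    rule = {
        "ID": f"{prefix.strip('/')}-lifecycle",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
    }
    if transitions:
        rule["Transitions"] = [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in transitions
        ]
    if expire_after_days is not None:
        rule["Expiration"] = {"Days": expire_after_days}
    return rule


rules = [
    # Assumed reading: Standard-IA at day 30, Glacier at day 90, delete at ~7 years.
    lifecycle_rule("raw/", [(30, "STANDARD_IA"), (90, "GLACIER")],
                   expire_after_days=7 * 365),
    # Query results are only a cache, so they are simply expired.
    lifecycle_rule("athena-results/", [], expire_after_days=7),
]
lifecycle_configuration = {"Rules": rules}
print(lifecycle_configuration)
```

The dict would be passed as `LifecycleConfiguration=lifecycle_configuration` together with the bucket name; the thresholds should be confirmed against the retention requirements before use.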
EKS Cluster Design
EKS Cluster: ic-security-lakehouse
├── Node Group: ingest (c5.2xlarge x 3–10, auto-scaled)
│ └── Vector pods (Kinesis consumer → OCSF → S3)
├── Node Group: api (t3.xlarge x 2–4)
│ ├── Query API (FastAPI, NL→SQL)
│ ├── Dashboard Generator (Jira/Zendesk → Grafana)
│ └── OCSF Ingest job (batch: S3 JSON → Iceberg commit)
└── Node Group: ai (g4dn.xlarge x 1–2, GPU optional)
└── LLM inference (Ollama or Bedrock)
Namespaces:
- ingest - Vector, Kinesis consumers
- api - query API, dashboard generator
- ai - LLM inference
- monitoring - Prometheus, Grafana agent
IAM / Identity Design
AD Groups ──► IAM Identity Center ──► Permission Sets ──► AWS Roles
| AD Group | Permission Set | Access |
|---|---|---|
| sg-splunk-admin | LakehouseAdmin | Full S3 + Athena + Glue + EKS |
| sg-splunk-analyst | LakehouseAnalyst | Athena read, S3 ocsf/ read |
| sg-splunk-<team> | LakehouseTeam-{team} | Athena read, scoped S3 prefix |
| sg-grafana-admin | GrafanaAdmin | Grafana API, all dashboards |
| sg-devops | LakehouseDevOps | EKS, CodePipeline, ECR |
Service accounts (EKS workload identity via IRSA):
- vector-sa → S3 write (raw/, ocsf/), Kinesis read
- iceberg-ingest-sa → S3 write (iceberg/), Glue write
- query-api-sa → Athena execute, S3 read (athena-results/), Glue read
- dashboard-gen-sa → Athena read, Grafana API write
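As an illustration of the least-privilege intent behind the service account list, here is a sketch of an IAM policy document for vector-sa (S3 write to raw/ and ocsf/, Kinesis read). The helper `vector_policy`, the region, and the account ID in the stream ARN are placeholders assumed for the example; the stream name raw-logs comes from the data flow.

```python
import json


def vector_policy(bucket: str, stream_arn: str) -> dict:
    """Build an IAM policy document for the vector-sa role (sketch only)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WriteRawAndOcsf",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Write access is limited to the two prefixes Vector owns.
                "Resource": [
                    f"arn:aws:s3:::{bucket}/raw/*",
                    f"arn:aws:s3:::{bucket}/ocsf/*",
                ],
            },
            {
                "Sid": "ReadRawLogsStream",
                "Effect": "Allow",
                "Action": [
                    "kinesis:DescribeStreamSummary",
                    "kinesis:ListShards",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                ],
                "Resource": [stream_arn],
            },
        ],
    }


policy = vector_policy(
    "ic-security-log-archive-{account-id}",  # placeholder from the bucket layout
    "arn:aws:kinesis:us-east-1:111122223333:stream/raw-logs",  # hypothetical ARN
)
print(json.dumps(policy, indent=2))
```

Because S3 and Kinesis are KMS-encrypted in this design, the role would presumably also need matching KMS permissions on the relevant keys; those are omitted here.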
PKI / Certificate Architecture
Investcloud Root CA
└── Investcloud Intermediate CA
└── Security Lakehouse Intermediate CA (ACM Private CA)
├── *.ic-security.internal (wildcard internal)
├── vector.ic-security.internal
├── api.ic-security.internal
├── grafana.ic-security.internal
└── EKS pod certs (cert-manager → ACM PCA issuer)
Implementation:
- ACM Private CA - subordinate CA, CSR sent to Investcloud PKI team for signing
- cert-manager on EKS with aws-privateca-issuer - auto-issues/renews pod certs
- mTLS enforced for all internal service-to-service traffic; calls to AWS service APIs (Vector ↔ Kinesis, API ↔ Athena) use TLS with SigV4-signed requests, since those endpoints do not accept client certificates
- External endpoints (Grafana, API): ACM public cert via ALB
Cost Estimate (rough, based on 1TB/day ingest)
| Service | Est. Monthly Cost |
|---|---|
| S3 (10TB stored, lifecycle tiered) | ~$230 |
| Kinesis Data Streams (10 shards) | ~$150 |
| EKS (cluster + nodes) | ~$800 |
| Athena (10TB scanned/month) | ~$50 |
| Glue catalog | ~$10 |
| ACM Private CA | ~$400 |
| Data transfer | ~$200 |
| Total est. | ~$1,840/mo |
vs. Splunk Enterprise: typically $50,000–$200,000+/year at this volume, against roughly $22,000/year ($1,840 × 12) for the estimate above.
Payback period: < 3 months after full migration.
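The arithmetic behind the table can be checked with a short script. It also estimates the minimum provisioned Kinesis shard count for 1TB/day, assuming decimal units, uncompressed data arriving at a perfectly even rate, and the standard per-shard write limit of 1 MB/s. Those assumptions are mine, not the estimate's.

```python
import math

# Monthly figures copied from the cost table (USD).
monthly_costs = {
    "S3": 230,
    "Kinesis Data Streams": 150,
    "EKS": 800,
    "Athena": 50,
    "Glue catalog": 10,
    "ACM Private CA": 400,
    "Data transfer": 200,
}
monthly_total = sum(monthly_costs.values())
annual_total = monthly_total * 12

# Shard sizing for provisioned mode, under the assumptions stated above.
INGEST_BYTES_PER_DAY = 1e12        # 1 TB/day, decimal units
SHARD_WRITE_BYTES_PER_SEC = 1e6    # 1 MB/s write limit per shard
SECONDS_PER_DAY = 86_400
avg_bytes_per_sec = INGEST_BYTES_PER_DAY / SECONDS_PER_DAY
min_shards = math.ceil(avg_bytes_per_sec / SHARD_WRITE_BYTES_PER_SEC)

print(f"monthly: ${monthly_total:,}  annual: ${annual_total:,}")
print(f"average ingest: {avg_bytes_per_sec / 1e6:.1f} MB/s -> at least {min_shards} shards")
```

The table total is consistent at $1,840/mo ($22,080/yr). The average rate alone works out to 12 shards before any headroom for spikes, so the 10-shard line item may need revisiting unless logs are compressed or batched before they reach Kinesis.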
Key Architecture Decisions
- Athena over Trino - serverless, no infrastructure to run, pay-per-query pricing, native Glue/Iceberg integration. Consider Trino if sub-second latency is needed.
- Kinesis over direct-to-S3 - buffers ingest spikes and absorbs backpressure; Vector reads from Kinesis, not directly from log sources.
- EKS over Lambda - OCSF transforms are stateful (enrichment lookups), and Lambda cold starts hurt at this volume.
- Iceberg over Delta/Hudi - best Athena support, OCSF project uses Iceberg natively, and time travel is critical for forensics.
- Bedrock option - if Ollama on EKS GPU is too expensive, Amazon Bedrock (Claude/Llama) is the fallback for the AI layer.
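To make the forensics argument for Iceberg concrete, here is a sketch of a time-travel query using Athena's `FOR TIMESTAMP AS OF` clause, which reads a table as it existed at a past instant. The helper `time_travel_sql` is hypothetical; the table name comes from the Glue catalog list.

```python
from datetime import datetime, timezone


def time_travel_sql(table: str, as_of: datetime) -> str:
    """Return an Athena query that reads an Iceberg table as of a past instant."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    # Normalize to UTC so the literal is unambiguous.
    stamp = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} FOR TIMESTAMP AS OF TIMESTAMP '{stamp} UTC'"


sql = time_travel_sql(
    "authentication",
    datetime(2026, 4, 6, 12, 0, tzinfo=timezone.utc),
)
print(sql)
```

Time travel only reaches snapshots that have not been expired, so snapshot retention on the Iceberg tables would need to cover the forensic lookback window.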