⚡ InvestCloud Security Lakehouse

AWS Architecture

Account structure, VPC design, S3/Iceberg/Athena/EKS data flow, IAM federation, PKI chain to Investcloud CA, and cost estimate.

AWS Architecture: Security Data Lakehouse

Author: RedEye Security | Date: 2026-04-06 | Status: Draft v1.0


Account Structure

Investcloud AWS Organization
├── Management Account (billing, SCPs)
├── Security Account  ← PRIMARY: lakehouse lives here
│   ├── VPC: security-vpc (10.10.0.0/16)
│   ├── S3: log-archive-bucket (raw + OCSF + Iceberg)
│   ├── Glue: Iceberg catalog
│   ├── Athena: query engine
│   ├── EKS: lakehouse services (Vector, API, AI)
│   └── Secrets Manager + KMS
├── Log Archive Account ← raw log landing zone (optional separation)
└── Shared Services Account ← Jenkins, GitLab, Grafana, CA

Simple start: Security Account + Shared Services Account. Add the Log Archive Account only if compliance requires separate log custody.


Core AWS Services

Layer       Service                    Purpose
----------  -------------------------  ------------------------------------------
Ingest      Kinesis Data Streams       High-throughput log ingestion buffer
Ingest      MSK (Kafka)                Optional: existing Kafka sources
Transform   EKS (Vector pods)          OCSF normalization, fan-out
Storage     S3                         Raw + OCSF + Iceberg data lake
Catalog     AWS Glue                   Iceberg table catalog, schema registry
Query       Amazon Athena              Serverless SQL on Iceberg
Compute     EKS                        Services: API, AI layer, Vector
Identity    IAM Identity Center        AD federation, role mapping
PKI         ACM Private CA             Intermediate CA (chains to Investcloud CA)
Secrets     Secrets Manager            Creds, API keys, CA passwords
Encryption  KMS                        S3, EKS secrets, Kinesis
CI/CD       CodePipeline (or Jenkins)  Deploy pipelines
Monitoring  CloudWatch + Grafana       Ops dashboards for the lakehouse itself
DNS         Route 53 (private)         Internal service discovery

Network Architecture

On-Prem / Investcloud DC
        │
        │  Direct Connect or Site-to-Site VPN
        ▼
┌─────────────────────────────────────────────────────┐
│  Security VPC  (10.10.0.0/16)                       │
│                                                     │
│  ┌──────────────────┐   ┌──────────────────────┐   │
│  │  Public Subnets  │   │  Private Subnets     │   │
│  │  (ALB, NLB)      │   │  (EKS nodes, RDS)    │   │
│  └──────────────────┘   └──────────────────────┘   │
│                                                     │
│  ┌──────────────────┐   ┌──────────────────────┐   │
│  │  Ingest Subnet   │   │  Isolated Subnet     │   │
│  │  Kinesis, MSK    │   │  Athena VPC endpoint │   │
│  └──────────────────┘   └──────────────────────┘   │
└─────────────────────────────────────────────────────┘
        │
        │  VPC Peering / Transit Gateway
        ▼
Shared Services VPC
  - Jenkins / GitLab
  - Grafana
  - ACM Private CA
  - AD connector
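
The diagram names four subnet tiers inside 10.10.0.0/16 but does not give their CIDR blocks. As a minimal sketch, the split could be worked out with Python's `ipaddress` module. The equal /18 blocks and the tier order are assumptions for illustration, not a documented plan.

```python
import ipaddress

# Security VPC CIDR taken from the diagram above.
VPC_CIDR = ipaddress.ip_network("10.10.0.0/16")

# Hypothetical tier order; the /18 block size is an assumption.
TIERS = ["public", "private", "ingest", "isolated"]


def carve_subnets(vpc, tiers, new_prefix=18):
    """Assign one equally sized block per tier, in order."""
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    if len(tiers) > len(blocks):
        raise ValueError("not enough blocks for the requested tiers")
    return dict(zip(tiers, blocks))


plan = carve_subnets(VPC_CIDR, TIERS)
for tier, block in plan.items():
    print(f"{tier:<9} {block}  ({block.num_addresses} addresses)")
```

In practice each tier would be subdivided again per Availability Zone, and the tiers would rarely need equal sizes (the EKS private tier usually needs the most addresses).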

Data Flow

Log Sources (on-prem + AWS)
        │
        ├── Syslog/HEC ──► Kinesis Data Stream (raw-logs)
        ├── Kafka ──────► MSK ──► Kinesis
        └── AWS services ► CloudWatch ──► Kinesis (via subscription filter)
                │
                ▼
        EKS: Vector pods (auto-scaled)
          - Consume from Kinesis
          - Transform to OCSF (VRL)
          - Fan-out:
                ├──► S3: s3://log-archive/raw/dt=YYYY-MM-DD/
                ├──► S3: s3://log-archive/ocsf/dt=YYYY-MM-DD/
                └──► (optional) Splunk HEC (parallel during migration)
                │
                ▼
        AWS Glue (Iceberg catalog)
          - Iceberg tables: network_activity, authentication,
            security_finding, dns_activity, http_activity
          - Partitioned by: dt (date) + sourcetype + region
                │
                ▼
        Amazon Athena
          - Serverless SQL
          - Federated query (can query Splunk via connector during migration)
          - Results cached in S3
                │
           ┌────┴──────────────────┐
           │                       │
           ▼                       ▼
      Grafana                 AI Query API (EKS)
      (AD/SAML auth)          (FastAPI + Bedrock/Ollama)
           │                       │
           ▼                       ▼
      Team dashboards         NL → SQL → results
                              Jira/Zendesk → dashboards
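
The transform step above runs in Vector using VRL. The Python sketch below only illustrates the shape of the mapping for one class (Network Activity, class_uid 4001). The raw field names (`timestamp`, `src_ip`, `src_port`, `dst_ip`, `dst_port`, `region`) are hypothetical, and the OCSF identifiers are an assumption based on the public OCSF schema, so verify them against the schema version in use.

```python
from datetime import datetime, timezone

# OCSF identifiers for Network Activity (assumed from the public schema).
CLASS_UID = 4001
CATEGORY_UID = 4
ACTIVITY_TRAFFIC = 6


def to_ocsf_network_activity(raw):
    """Map a parsed firewall record (hypothetical field names) to an OCSF-style dict."""
    # Expects an ISO 8601 timestamp with an explicit UTC offset.
    ts = datetime.fromisoformat(raw["timestamp"]).astimezone(timezone.utc)
    return {
        "class_uid": CLASS_UID,
        "category_uid": CATEGORY_UID,
        "activity_id": ACTIVITY_TRAFFIC,
        "type_uid": CLASS_UID * 100 + ACTIVITY_TRAFFIC,
        "time": int(ts.timestamp() * 1000),  # epoch milliseconds
        "src_endpoint": {"ip": raw["src_ip"], "port": int(raw["src_port"])},
        "dst_endpoint": {"ip": raw["dst_ip"], "port": int(raw["dst_port"])},
        "metadata": {"product": {"name": raw["sourcetype"]}},
        # Partition columns named in the data flow: dt + sourcetype + region.
        "dt": ts.strftime("%Y-%m-%d"),
        "sourcetype": raw["sourcetype"],
        "region": raw.get("region", "unknown"),
    }


sample = {
    "timestamp": "2026-04-06T12:00:00+00:00",
    "sourcetype": "cisco:asa",
    "src_ip": "10.10.1.5",
    "src_port": "51514",
    "dst_ip": "10.10.2.9",
    "dst_port": "443",
}
event = to_ocsf_network_activity(sample)
print(event["type_uid"], event["dt"])
```

Deriving the partition columns in the same step as the normalization keeps the raw and OCSF copies aligned on the same `dt` value.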

S3 Bucket Structure

s3://ic-security-log-archive-{account-id}/
├── raw/
│   └── sourcetype=cisco:asa/
│       └── dt=2026-04-06/
│           └── *.log.gz
├── ocsf/
│   └── class_uid=4001/          ← Network Activity
│       └── dt=2026-04-06/
│           └── *.json.gz
├── iceberg/
│   ├── network_activity/        ← Iceberg table data + metadata
│   ├── authentication/
│   ├── security_finding/
│   ├── dns_activity/
│   └── http_activity/
└── athena-results/              ← Query result cache (7-day TTL)
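
A writer that follows this layout has to build keys with the same Hive-style `name=value` segments shown in the tree. The helper names and file names below are hypothetical; this is a sketch of the convention, not the pipeline's actual code.

```python
from datetime import date


def raw_key(sourcetype, day, filename):
    """Key under raw/, partitioned by sourcetype then date."""
    return f"raw/sourcetype={sourcetype}/dt={day.isoformat()}/{filename}"


def ocsf_key(class_uid, day, filename):
    """Key under ocsf/, partitioned by OCSF class_uid then date."""
    return f"ocsf/class_uid={class_uid}/dt={day.isoformat()}/{filename}"


day = date(2026, 4, 6)
print(raw_key("cisco:asa", day, "part-0001.log.gz"))
print(ocsf_key(4001, day, "part-0001.json.gz"))
```

Sourcetypes such as `cisco:asa` contain a colon, which S3 accepts in keys but which some tools handle awkwardly, so it may be worth testing downstream consumers against such keys.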

Lifecycle policies:
- raw/: S3 Standard 30 days → S3-IA 60 days → Glacier 1 year → delete at 7 years
- ocsf/: S3 Standard 90 days → S3-IA 1 year → Glacier 3 years
- iceberg/: S3 Standard 90 days (hot query window) → S3-IA 1 year
- athena-results/: delete after 7 days


EKS Cluster Design

EKS Cluster: ic-security-lakehouse
├── Node Group: ingest (c5.2xlarge x 3–10, auto-scaled)
│   └── Vector pods (Kinesis consumer → OCSF → S3)
├── Node Group: api (t3.xlarge x 2–4)
│   ├── Query API (FastAPI, NL→SQL)
│   ├── Dashboard Generator (Jira/Zendesk → Grafana)
│   └── OCSF Ingest job (batch: S3 JSON → Iceberg commit)
└── Node Group: ai (g4dn.xlarge x 1–2, GPU optional)
    └── LLM inference (Ollama or Bedrock)

Namespaces:
- ingest: Vector, Kinesis consumers
- api: query API, dashboard generator
- ai: LLM inference
- monitoring: Prometheus, Grafana agent


IAM / Identity Design

AD Groups ──► IAM Identity Center ──► Permission Sets ──► AWS Roles
AD Group           Permission Set        Access
-----------------  --------------------  -----------------------------
sg-splunk-admin    LakehouseAdmin        Full S3 + Athena + Glue + EKS
sg-splunk-analyst  LakehouseAnalyst      Athena read, S3 ocsf/ read
sg-splunk-<team>   LakehouseTeam-{team}  Athena read, scoped S3 prefix
sg-grafana-admin   GrafanaAdmin          Grafana API, all dashboards
sg-devops          LakehouseDevOps       EKS, CodePipeline, ECR
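
The group-to-permission-set mapping above mixes fixed names with one templated row (`sg-splunk-<team>`). A small resolver makes the precedence explicit: fixed names win, then the team pattern applies. The function name and the allowed team-name characters are assumptions for illustration.

```python
import re

# Fixed rows from the mapping above.
STATIC_MAP = {
    "sg-splunk-admin": "LakehouseAdmin",
    "sg-splunk-analyst": "LakehouseAnalyst",
    "sg-grafana-admin": "GrafanaAdmin",
    "sg-devops": "LakehouseDevOps",
}

# Templated row: sg-splunk-<team> -> LakehouseTeam-{team}.
# The character set allowed in <team> is an assumption.
TEAM_PATTERN = re.compile(r"^sg-splunk-([a-z0-9][a-z0-9-]*)$")


def permission_set_for(group):
    """Return the permission set name for an AD group, or None if unmapped."""
    if group in STATIC_MAP:
        return STATIC_MAP[group]
    match = TEAM_PATTERN.match(group)
    if match:
        return f"LakehouseTeam-{match.group(1)}"
    return None


for g in ["sg-splunk-admin", "sg-splunk-netops", "sg-unknown"]:
    print(g, "->", permission_set_for(g))
```

Checking the fixed names first matters because `sg-splunk-admin` and `sg-splunk-analyst` would otherwise match the team pattern and resolve to team-scoped permission sets.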

Service accounts (EKS workload identity via IRSA):
- vector-sa → S3 write (raw/, ocsf/), Kinesis read
- iceberg-ingest-sa → S3 write (iceberg/), Glue write
- query-api-sa → Athena execute, S3 read (athena-results/), Glue read
- dashboard-gen-sa → Athena read, Grafana API write
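
As an illustration of how narrowly one of these roles could be scoped, the sketch below builds an IAM policy document for `vector-sa` (S3 write to raw/ and ocsf/, Kinesis read). The account ID, region, and exact action list are placeholders and assumptions; the stream name `raw-logs` comes from the data flow diagram.

```python
import json

# Placeholder account ID and assumed region; substitute real values.
BUCKET = "ic-security-log-archive-111122223333"
STREAM_ARN = "arn:aws:kinesis:us-east-1:111122223333:stream/raw-logs"


def vector_policy(bucket, stream_arn):
    """Least-privilege policy sketch for the vector-sa IRSA role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WriteRawAndOcsf",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}/raw/*",
                    f"arn:aws:s3:::{bucket}/ocsf/*",
                ],
            },
            {
                "Sid": "ReadRawLogsStream",
                "Effect": "Allow",
                "Action": [
                    "kinesis:DescribeStreamSummary",
                    "kinesis:ListShards",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                ],
                "Resource": [stream_arn],
            },
        ],
    }


policy = vector_policy(BUCKET, STREAM_ARN)
print(json.dumps(policy, indent=2))
```

If the bucket or stream is encrypted with a customer-managed KMS key, the role would likely also need `kms:GenerateDataKey` and `kms:Decrypt` on that key.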


PKI / Certificate Architecture

Investcloud Root CA
    └── Investcloud Intermediate CA
            └── Security Lakehouse Intermediate CA  (ACM Private CA)
                    ├── *.ic-security.internal  (wildcard internal)
                    ├── vector.ic-security.internal
                    ├── api.ic-security.internal
                    ├── grafana.ic-security.internal
                    └── EKS pod certs (cert-manager → ACM PCA issuer)

Implementation:
- ACM Private CA: subordinate CA; CSR sent to the Investcloud PKI team for signing
- cert-manager on EKS with aws-privateca-issuer: auto-issues and renews pod certs
- mTLS enforced for all internal service-to-service traffic. Calls to AWS-managed endpoints (Vector ↔ Kinesis, API ↔ Athena) use TLS with SigV4-signed requests, because those endpoints do not accept client certificates
- External endpoints (Grafana, API): ACM public cert via ALB
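
For the internal service-to-service mTLS requirement, the server side of a Python service such as the Query API could be configured roughly as follows. This is a minimal sketch using the standard `ssl` module. The function name and the example mount path are hypothetical, and a service mesh or ingress sidecar could enforce the same policy instead.

```python
import ssl


def build_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Server-side TLS context that requires a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid cert
    if ca_file:
        # Trust only the lakehouse CA chain for client certificates.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


# Without file paths this only demonstrates the policy settings.
ctx = build_mtls_server_context()
print(ctx.verify_mode, ctx.minimum_version)
```

A deployed service would pass the files from the cert-manager secret, for example `build_mtls_server_context("/etc/tls/tls.crt", "/etc/tls/tls.key", "/etc/tls/ca.crt")`, where `/etc/tls` is an assumed mount path.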


Cost Estimate (rough, based on 1TB/day ingest)

Service                               Est. Monthly Cost
------------------------------------  -----------------
S3 (10TB stored, lifecycle tiered)    ~$230
Kinesis Data Streams (10 shards)      ~$150
EKS (cluster + nodes)                 ~$800
Athena (10TB scanned/month)           ~$50
Glue catalog                          ~$10
ACM Private CA                        ~$400
Data transfer                         ~$200
Total est.                            ~$1,840/mo

For comparison, ~$1,840/mo is about $22,000/year, versus Splunk Enterprise at typically $50,000–$200,000+/year at this volume.

Payback period: < 3 months after full migration.
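
As a sanity check on the figures above, the sketch below re-adds the line items and derives the average ingest rate implied by 1TB/day. The 1 MB/s per-shard write limit applies to provisioned-mode Kinesis Data Streams; the $5/TB Athena rate is an assumption that varies by region.

```python
import math

# Line items from the cost estimate (USD per month).
monthly_costs = {
    "S3": 230,
    "Kinesis Data Streams": 150,
    "EKS": 800,
    "Athena": 50,
    "Glue catalog": 10,
    "ACM Private CA": 400,
    "Data transfer": 200,
}
total_monthly = sum(monthly_costs.values())
total_annual = total_monthly * 12

# Athena check: 10 TB scanned at an assumed $5 per TB.
athena_check = 10 * 5

# Average ingest rate implied by 1 TB/day (decimal units: 1 TB = 1,000,000 MB).
avg_mb_per_s = 1_000_000 / 86_400

# Provisioned-mode write limit per shard (1 MB/s).
SHARD_WRITE_MB_PER_S = 1.0
min_shards = math.ceil(avg_mb_per_s / SHARD_WRITE_MB_PER_S)

print(f"total: ${total_monthly}/mo, ${total_annual}/yr")
print(f"average ingest: {avg_mb_per_s:.2f} MB/s, minimum shards: {min_shards}")
```

Under these assumptions the average rate is about 11.6 MB/s, so 10 provisioned shards would sit below the average load before any headroom for spikes. The Kinesis line item may need revisiting, or the stream could use on-demand capacity mode.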


Key Architecture Decisions

  1. Athena over Trino - serverless, no infrastructure to run, pay-per-query pricing, and native Glue/Iceberg integration. Consider Trino if sub-second latency is needed.
  2. Kinesis over direct-to-S3 - provides backpressure handling for ingest spikes; Vector reads from Kinesis, not directly from log sources.
  3. EKS over Lambda - OCSF transforms are stateful (enrichment lookups), and Lambda cold starts hurt at this volume.
  4. Iceberg over Delta/Hudi - best Athena support, OCSF project uses Iceberg natively, and time travel is critical for forensics.
  5. Bedrock option - if Ollama on EKS GPU nodes is too expensive, use Amazon Bedrock (Claude/Llama) as the fallback for the AI layer.
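
The forensics argument for Iceberg rests on time travel: querying a table as it existed at an earlier moment. As a hedged sketch, a helper in the Query API could build such a statement using the `FOR TIMESTAMP AS OF` clause that Athena supports for Iceberg tables. The function name is hypothetical, and the table name and predicate are interpolated without sanitizing, so this is for illustration only.

```python
from datetime import datetime, timezone


def time_travel_query(table, as_of, where):
    """Build an Athena SQL string that reads an Iceberg table as of a past time."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    stamp = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"SELECT * FROM {table} "
        f"FOR TIMESTAMP AS OF TIMESTAMP '{stamp} UTC' "
        f"WHERE {where}"
    )


query = time_travel_query(
    "authentication",
    datetime(2026, 4, 6, 12, 0, tzinfo=timezone.utc),
    "dt = '2026-04-06'",
)
print(query)
```

Keeping the `dt` predicate in the query preserves partition pruning, which also keeps the scanned-bytes cost down for historical reads.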