⚡ InvestCloud Security Lakehouse

AWS Architecture

Account structure, VPC design, S3/Iceberg/Athena/EKS data flow, IAM federation, PKI chain to Investcloud CA, and cost estimate.

AWS Architecture: Security Data Lakehouse

Author: RedEye Security | Date: 2026-04-06 | Status: Draft v1.0

Account Structure

Investcloud AWS Organization
├── Management Account (billing, SCPs)
├── Security Account  ← PRIMARY: lakehouse lives here
│   ├── VPC: security-vpc (10.10.0.0/16)
│   ├── S3: log-archive-bucket (raw + OCSF + Iceberg)
│   ├── Glue: Iceberg catalog
│   ├── Athena: query engine
│   ├── EKS: lakehouse services (Vector, API, AI)
│   └── Secrets Manager + KMS
├── Log Archive Account ← raw log landing zone (optional separation)
└── Shared Services Account ← Jenkins, GitLab, Grafana, CA

Simple start: Security Account + Shared Services Account. Add the Log Archive Account only if compliance requires the separation.

Core AWS Services

Layer	Service	Purpose
Ingest	Kinesis Data Streams	High-throughput log ingestion buffer
Ingest	MSK (Kafka)	Optional: existing Kafka sources
Transform	EKS (Vector pods)	OCSF normalization, fan-out
Storage	S3	Raw + OCSF + Iceberg data lake
Catalog	AWS Glue	Iceberg table catalog, schema registry
Query	Amazon Athena	Serverless SQL on Iceberg
Compute	EKS	Services: API, AI layer, Vector
Identity	IAM Identity Center	AD federation, role mapping
PKI	ACM Private CA	Intermediate CA (chains to Investcloud CA)
Secrets	Secrets Manager	Creds, API keys, CA passwords
Encryption	KMS	S3, EKS secrets, Kinesis
CI/CD	CodePipeline (or Jenkins)	Deploy pipelines
Monitoring	CloudWatch + Grafana	Ops dashboards for the lakehouse itself
DNS	Route 53 (private)	Internal service discovery

Network Architecture

On-Prem / Investcloud DC
        │
        │  Direct Connect or Site-to-Site VPN
        ▼
┌─────────────────────────────────────────────────────┐
│  Security VPC  (10.10.0.0/16)                       │
│                                                     │
│  ┌──────────────────┐   ┌──────────────────────┐   │
│  │  Public Subnets  │   │  Private Subnets     │   │
│  │  (ALB, NLB)      │   │  (EKS nodes, RDS)    │   │
│  └──────────────────┘   └──────────────────────┘   │
│                                                     │
│  ┌──────────────────┐   ┌──────────────────────┐   │
│  │  Ingest Subnet   │   │  Isolated Subnet     │   │
│  │  Kinesis, MSK    │   │  Athena VPC endpoint │   │
│  └──────────────────┘   └──────────────────────┘   │
└─────────────────────────────────────────────────────┘
        │
        │  VPC Peering / Transit Gateway
        ▼
Shared Services VPC
  - Jenkins / GitLab
  - Grafana
  - ACM Private CA
  - AD connector

Data Flow

Log Sources (on-prem + AWS)
        │
        ├── Syslog/HEC ──► Kinesis Data Stream (raw-logs)
        ├── Kafka ──────► MSK ──► Kinesis
        └── AWS services ► CloudWatch ──► Kinesis (via subscription filter)
                │
                ▼
        EKS: Vector pods (auto-scaled)
          - Consume from Kinesis
          - Transform to OCSF (VRL)
          - Fan-out:
                ├──► S3: s3://ic-security-log-archive-{account-id}/raw/sourcetype=<sourcetype>/dt=YYYY-MM-DD/
                ├──► S3: s3://ic-security-log-archive-{account-id}/ocsf/class_uid=<class_uid>/dt=YYYY-MM-DD/
                └──► (optional) Splunk HEC (parallel during migration)
                │
                ▼
        AWS Glue (Iceberg catalog)
          - Iceberg tables: network_activity, authentication,
            security_finding, dns_activity, http_activity
          - Partitioned by: dt (date) + sourcetype + region
                │
                ▼
        Amazon Athena
          - Serverless SQL
          - Federated query (can query Splunk via connector during migration)
          - Results cached in S3
                │
           ┌────┴──────────────────┐
           │                       │
           ▼                       ▼
      Grafana                 AI Query API (EKS)
      (AD/SAML auth)          (FastAPI + Bedrock/Ollama)
           │                       │
           ▼                       ▼
      Team dashboards         NL → SQL → results
                              Jira/Zendesk → dashboards
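
Because Athena bills by data scanned, queries against the Iceberg tables benefit from always filtering on the partition columns listed above (dt, sourcetype, region). The following is a minimal sketch of how the Query API might build such a statement. The function name, the `security_lake` database name, and the default limit are hypothetical and not taken from this design.

```python
from datetime import date
from typing import Optional

# Iceberg tables named in the data flow above.
TABLES = {"network_activity", "authentication", "security_finding",
          "dns_activity", "http_activity"}


def build_query(table: str, start: date, end: date,
                sourcetype: Optional[str] = None,
                database: str = "security_lake",  # hypothetical database name
                limit: int = 100) -> str:
    """Return a SELECT restricted to a dt range so Athena can prune partitions."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table}")
    if start > end:
        raise ValueError("start must not be after end")
    predicates = [
        f"dt BETWEEN DATE '{start.isoformat()}' AND DATE '{end.isoformat()}'"
    ]
    if sourcetype is not None:
        # Escape single quotes; production code should prefer Athena
        # execution parameters over string interpolation.
        predicates.append("sourcetype = '" + sourcetype.replace("'", "''") + "'")
    where = " AND ".join(predicates)
    return f'SELECT * FROM "{database}"."{table}" WHERE {where} LIMIT {int(limit)}'


if __name__ == "__main__":
    print(build_query("dns_activity", date(2026, 4, 1), date(2026, 4, 6),
                      sourcetype="cisco:asa"))
```

Checking the table name against a fixed allow-list is a deliberate choice here, since the NL → SQL path would otherwise pass model-generated identifiers straight into a query.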

S3 Bucket Structure

s3://ic-security-log-archive-{account-id}/
├── raw/
│   └── sourcetype=cisco:asa/
│       └── dt=2026-04-06/
│           └── *.log.gz
├── ocsf/
│   └── class_uid=4001/          ← Network Activity
│       └── dt=2026-04-06/
│           └── *.json.gz
├── iceberg/
│   ├── network_activity/        ← Iceberg table data + metadata
│   ├── authentication/
│   ├── security_finding/
│   ├── dns_activity/
│   └── http_activity/
└── athena-results/              ← Query result cache (7-day TTL)
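
The prefix layout above can be captured in two small helpers so that every writer produces identical keys. This is only a sketch: the function names and the example filename are hypothetical, and only the prefix pattern comes from the layout shown.

```python
from datetime import date

OCSF_NETWORK_ACTIVITY = 4001  # class_uid shown in the layout above


def raw_key(sourcetype: str, day: date, filename: str) -> str:
    """Object key for an unmodified log file under raw/."""
    return f"raw/sourcetype={sourcetype}/dt={day.isoformat()}/{filename}"


def ocsf_key(class_uid: int, day: date, filename: str) -> str:
    """Object key for a normalized OCSF file under ocsf/."""
    return f"ocsf/class_uid={class_uid}/dt={day.isoformat()}/{filename}"


if __name__ == "__main__":
    day = date(2026, 4, 6)
    print(raw_key("cisco:asa", day, "part-0001.log.gz"))
    print(ocsf_key(OCSF_NETWORK_ACTIVITY, day, "part-0001.json.gz"))
```

Note that a colon in a key (as in `cisco:asa`) is legal in S3 but is among the characters that some tools handle awkwardly, so it may be worth normalizing the sourcetype before it is used as a prefix.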

Lifecycle policies:
- raw/: S3 Standard 30 days → S3-IA 60 days → Glacier 1 year → delete at 7 years
- ocsf/: S3 Standard 90 days → S3-IA 1 year → Glacier 3 years
- iceberg/: S3 Standard 90 days (hot query window) → S3-IA 1 year
- athena-results/: delete after 7 days
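
One way to express these policies is as a lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The sketch below makes two assumptions that the text does not settle: each duration is read as time spent in that tier (so raw/ moves to Glacier at day 30 + 60 = 90), and no expiration is set for ocsf/ or iceberg/ because none is stated. The helper name and rule IDs are hypothetical.

```python
import json


def rule(prefix, tiers, expire_days=None):
    """Build one lifecycle rule.

    tiers is a list of (days_spent_in_current_tier, next_storage_class);
    S3 expects cumulative object age, so the durations are summed.
    """
    transitions, elapsed = [], 0
    for days_in_tier, next_class in tiers:
        elapsed += days_in_tier
        transitions.append({"Days": elapsed, "StorageClass": next_class})
    result = {
        "ID": prefix.rstrip("/") + "-tiering",  # hypothetical rule ID
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
    }
    if transitions:
        result["Transitions"] = transitions
    if expire_days is not None:
        result["Expiration"] = {"Days": expire_days}
    return result


LIFECYCLE = {
    "Rules": [
        # Assumption: "delete at 7 years" means object age of 7 * 365 days.
        rule("raw/", [(30, "STANDARD_IA"), (60, "GLACIER")], expire_days=7 * 365),
        # Assumption: no deletion for ocsf/ or iceberg/, since none is stated.
        rule("ocsf/", [(90, "STANDARD_IA"), (365, "GLACIER")]),
        rule("iceberg/", [(90, "STANDARD_IA")]),
        rule("athena-results/", [], expire_days=7),
    ]
}

if __name__ == "__main__":
    # Could be passed as LifecycleConfiguration=LIFECYCLE to boto3's
    # s3.put_bucket_lifecycle_configuration; printed here instead.
    print(json.dumps(LIFECYCLE, indent=2))
```

Transitioning objects under iceberg/ deserves care in practice, because Iceberg metadata and recently compacted data files are read on every query regardless of their age.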

EKS Cluster Design

EKS Cluster: ic-security-lakehouse
├── Node Group: ingest (c5.2xlarge x 3–10, auto-scaled)
│   └── Vector pods (Kinesis consumer → OCSF → S3)
├── Node Group: api (t3.xlarge x 2–4)
│   ├── Query API (FastAPI, NL→SQL)
│   ├── Dashboard Generator (Jira/Zendesk → Grafana)
│   └── OCSF Ingest job (batch: S3 JSON → Iceberg commit)
└── Node Group: ai (g4dn.xlarge x 1–2, GPU optional)
    └── LLM inference (Ollama or Bedrock)

Namespaces:
- ingest: Vector, Kinesis consumers
- api: query API, dashboard generator
- ai: LLM inference
- monitoring: Prometheus, Grafana agent

IAM / Identity Design

AD Groups ──► IAM Identity Center ──► Permission Sets ──► AWS Roles

AD Group	Permission Set	Access
`sg-splunk-admin`	LakehouseAdmin	Full S3 + Athena + Glue + EKS
`sg-splunk-analyst`	LakehouseAnalyst	Athena read, S3 ocsf/ read
`sg-splunk-<team>`	LakehouseTeam-{team}	Athena read, scoped S3 prefix
`sg-grafana-admin`	GrafanaAdmin	Grafana API, all dashboards
`sg-devops`	LakehouseDevOps	EKS, CodePipeline, ECR
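
The mapping in this table, including the per-team pattern, could be resolved with a lookup like the one below. It is an illustrative sketch only: in practice IAM Identity Center performs the assignment, and the function name and the allowed character set for team names are assumptions.

```python
import re
from typing import Optional

# Fixed group-to-permission-set assignments from the table above.
EXACT = {
    "sg-splunk-admin": "LakehouseAdmin",
    "sg-splunk-analyst": "LakehouseAnalyst",
    "sg-grafana-admin": "GrafanaAdmin",
    "sg-devops": "LakehouseDevOps",
}

# sg-splunk-<team> maps to LakehouseTeam-{team}; the allowed characters
# for a team name are an assumption.
TEAM_GROUP = re.compile(r"^sg-splunk-(?P<team>[a-z0-9][a-z0-9-]*)$")


def permission_set_for(group: str) -> Optional[str]:
    """Return the permission set for an AD group, or None if unmapped."""
    if group in EXACT:  # exact names win over the team pattern
        return EXACT[group]
    match = TEAM_GROUP.match(group)
    if match:
        return f"LakehouseTeam-{match.group('team')}"
    return None


if __name__ == "__main__":
    for g in ("sg-splunk-admin", "sg-splunk-netops", "sg-finance"):
        print(g, "->", permission_set_for(g))
```

Checking the exact names first matters, because `sg-splunk-admin` and `sg-splunk-analyst` would otherwise match the team pattern and receive a team-scoped permission set.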

Service accounts (EKS workload identity via IRSA):
- vector-sa → S3 write (raw/, ocsf/), Kinesis read
- iceberg-ingest-sa → S3 write (iceberg/), Glue write
- query-api-sa → Athena execute, S3 read (athena-results/), Glue read
- dashboard-gen-sa → Athena read, Grafana API write
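
As an illustration of how narrowly these roles can be scoped, here is a sketch of an IAM policy document for the role behind vector-sa. The region, account ID, and bucket name in the usage lines are placeholders, the stream name `raw-logs` is taken from the data flow section, and the exact list of Kinesis actions a consumer needs is an assumption to verify against the consumer in use.

```python
import json


def vector_policy(bucket: str, region: str, account_id: str,
                  stream: str = "raw-logs") -> dict:
    """Least-privilege policy sketch: write raw/ and ocsf/, read one stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WriteRawAndOcsf",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}/raw/*",
                    f"arn:aws:s3:::{bucket}/ocsf/*",
                ],
            },
            {
                "Sid": "ReadIngestStream",
                "Effect": "Allow",
                # Assumed minimum set of read actions for a stream consumer.
                "Action": [
                    "kinesis:DescribeStreamSummary",
                    "kinesis:ListShards",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                ],
                "Resource": f"arn:aws:kinesis:{region}:{account_id}:stream/{stream}",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and region, for illustration only.
    policy = vector_policy("ic-security-log-archive-111122223333",
                           "us-east-1", "111122223333")
    print(json.dumps(policy, indent=2))
```

If the bucket or stream is encrypted with a customer-managed KMS key, the role would also need the corresponding KMS permissions, which are omitted here.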

PKI / Certificate Architecture

Investcloud Root CA
    └── Investcloud Intermediate CA
            └── Security Lakehouse Intermediate CA  (ACM Private CA)
                    ├── *.ic-security.internal  (wildcard internal)
                    ├── vector.ic-security.internal
                    ├── api.ic-security.internal
                    ├── grafana.ic-security.internal
                    └── EKS pod certs (cert-manager → ACM PCA issuer)

Implementation:
- ACM Private CA: subordinate CA; CSR sent to the Investcloud PKI team for signing
- cert-manager on EKS with aws-privateca-issuer: auto-issues and renews pod certs
- mTLS enforced for all internal service-to-service traffic. Calls to AWS service endpoints (Vector ↔ Kinesis, API ↔ Athena) use TLS with SigV4-signed requests instead, since those endpoints do not accept client certificates
- External endpoints (Grafana, API): ACM public cert via ALB

Cost Estimate (rough, based on 1TB/day ingest)

Service	Est. Monthly Cost
S3 (10TB stored, lifecycle tiered)	~$230
Kinesis Data Streams (10 shards)	~$150
EKS (cluster + nodes)	~$800
Athena (10TB scanned/month)	~$50
Glue catalog	~$10
ACM Private CA	~$400
Data transfer	~$200
Total est.	~$1,840/mo

vs. Splunk Enterprise: typically $50,000–$200,000+/year at this volume, compared with roughly $22,080/year ($1,840 × 12) for the lakehouse.

Estimated payback period: < 3 months after full migration.
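
The arithmetic behind the table can be checked with a few lines. This sketch uses only the figures above plus two assumptions: 1 TB is taken as 10^12 bytes, and ingest is spread evenly over the day. The per-shard write limit of 1 MB/s is the documented Kinesis Data Streams quota for provisioned shards.

```python
import math

# Monthly estimates copied from the table above (USD).
MONTHLY_COSTS = {
    "S3": 230,
    "Kinesis Data Streams": 150,
    "EKS": 800,
    "Athena": 50,
    "Glue catalog": 10,
    "ACM Private CA": 400,
    "Data transfer": 200,
}

total_monthly = sum(MONTHLY_COSTS.values())
total_annual = total_monthly * 12

# Splunk range quoted above, per year.
splunk_low, splunk_high = 50_000, 200_000
annual_savings = (splunk_low - total_annual, splunk_high - total_annual)

# Shard sizing: average ingest rate at 1 TB/day against 1 MB/s per shard.
BYTES_PER_DAY = 10**12           # assumption: decimal terabyte
SECONDS_PER_DAY = 86_400
SHARD_WRITE_BYTES_PER_S = 10**6  # provisioned-mode write limit per shard
avg_mb_per_s = BYTES_PER_DAY / SECONDS_PER_DAY / 10**6
min_shards = math.ceil(BYTES_PER_DAY / (SECONDS_PER_DAY * SHARD_WRITE_BYTES_PER_S))

if __name__ == "__main__":
    print(f"monthly total: ${total_monthly:,}")
    print(f"annual total:  ${total_annual:,}")
    print(f"annual savings vs. Splunk: ${annual_savings[0]:,} to ${annual_savings[1]:,}")
    print(f"average ingest: {avg_mb_per_s:.2f} MB/s, minimum shards: {min_shards}")
```

Under these assumptions the average rate alone needs 12 shards, so the 10-shard line item may be worth revisiting, and peak traffic would call for further headroom.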

Key Architecture Decisions

Athena over Trino - serverless, no infrastructure to run, pay-per-query pricing, and native integration with Glue/Iceberg. Consider Trino if sub-second latency is needed.
Kinesis over direct-to-S3 - buffers ingest spikes and provides backpressure handling; Vector reads from Kinesis, not directly from log sources.
EKS over Lambda - OCSF transforms are stateful (enrichment lookups), and Lambda cold starts hurt at this volume.
Iceberg over Delta/Hudi - best Athena support, the OCSF project uses Iceberg natively, and time travel is critical for forensics.
Bedrock option - if Ollama on EKS GPU nodes is too expensive, Amazon Bedrock (Claude/Llama) is the fallback for the AI layer.