InvestCloud Security Lakehouse
Timeline & ROI
24-week phased rollout from foundation to full AI capability. Estimated savings $78K–$478K/yr vs current Splunk licensing.
Project Timeline & Phased Rollout
Author: RedEye Security | Date: 2026-04-06 | Status: Draft v1.0
Summary
| Phase | Duration | Outcome |
| --- | --- | --- |
| 0 - Foundation | Weeks 1–4 | AWS infra live, parallel ingest flowing |
| 1 - Sourcetype Migration | Weeks 5–16 | All 100+ sourcetypes in lakehouse |
| 2 - Team Onboarding | Weeks 8–20 | Teams on Grafana/API, AD groups wired |
| 3 - AI Layer | Weeks 12–24 | NL queries, auto-dashboards, anomaly detection |
| 4 - Decommission | Weeks 24–52 | Splunk reduction, cost savings realized |
Total: 6 months to full capability, 12 months to
full Splunk decommission (optional).
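Because the phases overlap, it can help to check which ones are active in a given week. The sketch below encodes the week ranges from the summary table; the `PHASES` mapping and the `active_phases` helper are illustrative names, not part of the plan.

```python
# Week ranges copied from the summary table (inclusive on both ends).
PHASES = {
    "0 - Foundation": (1, 4),
    "1 - Sourcetype Migration": (5, 16),
    "2 - Team Onboarding": (8, 20),
    "3 - AI Layer": (12, 24),
    "4 - Decommission": (24, 52),
}


def active_phases(week: int) -> list[str]:
    """Return the phases whose week range contains the given week."""
    return [name for name, (start, end) in PHASES.items() if start <= week <= end]


if __name__ == "__main__":
    for week in (3, 10, 14, 24):
        print(week, active_phases(week))
```

Under these ranges, Weeks 12–16 have three phases running at once, which is the busiest stretch of the schedule.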
Phase 0 - Foundation (Weeks 1–4)
Goal: Infrastructure live, first data flowing, zero
Splunk disruption.
| Week | Tasks | Owner | Dependencies |
| --- | --- | --- | --- |
| 1 | AWS account setup (landing zone, SCPs, IAM Identity Center) | Platform | AWS account access |
| 1 | GitLab repo created, branch strategy, Jenkins connected | DevOps | GitLab instance |
| 1 | Terraform modules: S3, KMS, IAM roles scaffolded | DevOps | |
| 2 | ACM Private CA deployed, CSR sent to Investcloud PKI | Security | PKI team engagement |
| 2 | S3 buckets created (raw, ocsf, iceberg, athena-results) | DevOps | TF complete |
| 2 | Kinesis Data Stream (10 shards) deployed | DevOps | |
| 2 | VPC + networking (subnets, SGs, Transit Gateway peering) | Platform | |
| 3 | EKS cluster deployed (ingest node group only) | DevOps | VPC complete |
| 3 | Vector deployed on EKS, consuming 1 test sourcetype (aws:cloudwatch) | Eng | EKS up |
| 3 | Glue catalog + first Iceberg table (network_activity) | Eng | S3 + Glue |
| 3 | Athena workgroup configured, test query passes | Eng | Glue |
| 4 | Grafana: Athena datasource added, first raw volume dashboard | Eng | Athena |
| 4 | Jenkins pipelines: TF + Ansible + Helm all passing | DevOps | |
| 4 | AD → IAM Identity Center federation tested (2–3 test users) | Identity | AD team |
| 4 | Phase 0 sign-off: data flowing, dashboards live, no Splunk change | All | |
Milestone: aws:cloudwatch data visible
in Grafana. Splunk untouched.
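As a rough sizing check for the 10-shard Kinesis stream, the sketch below converts a daily ingest volume into an average write rate and compares it with shard capacity. It assumes the provisioned-mode write limit of 1 MB/s per shard, decimal units, and traffic spread evenly across shards and across the day. The 1 TB/day input is the estimate used in the ROI table, not a measured value, and the function names are illustrative.

```python
import math

SHARD_WRITE_MB_PER_S = 1.0  # assumed provisioned-mode write limit per shard
SECONDS_PER_DAY = 86_400


def average_mb_per_s(tb_per_day: float) -> float:
    """Average write rate in MB/s for a daily volume given in TB (decimal units)."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


def shards_needed(tb_per_day: float, peak_factor: float = 1.0) -> int:
    """Minimum shard count for the average rate scaled by a peak factor."""
    required_rate = average_mb_per_s(tb_per_day) * peak_factor
    return math.ceil(required_rate / SHARD_WRITE_MB_PER_S)


if __name__ == "__main__":
    print(f"1 TB/day averages {average_mb_per_s(1.0):.2f} MB/s")
    print("shards at average rate:", shards_needed(1.0))
    print("shards with 2x peak headroom:", shards_needed(1.0, peak_factor=2.0))
```

Under these assumptions 10 shards absorb roughly 0.86 TB/day, so the shard count is worth revisiting once real Phase 0 volumes are known. Compression or batching upstream of Kinesis would change the result.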
Phase 1 - Sourcetype Migration (Weeks 5–16)
Goal: All sourcetypes producing data in the
lakehouse. Teams notified, not forced.
Sprint 1 (Weeks 5–6): Cloud & Native JSON
| Sourcetype | OCSF class | Effort |
| --- | --- | --- |
| aws:cloudwatch | Network Activity | Low |
| aws:s3, aws:metadata, aws:billing:cur | Multiple | Low |
| cloudflare:json | Network Activity | Low |
| o365:graph:api | Authentication | Low |
Sprint 2 (Weeks 7–8): Kubernetes
| Sourcetype | OCSF class | Effort |
| --- | --- | --- |
| kube:container:* (all 10+ variants) | Application Activity | Medium |
| kube:container:eks | Network Activity | Medium |
Sprint 3 (Weeks 9–10): Linux & Network
| Sourcetype | OCSF class | Effort |
| --- | --- | --- |
| linux_audit, linux_secure, nix_logs | Authentication, File Activity | Medium |
| syslog (generic) | Network Activity | Low |
| cisco:asa | Network Activity | Medium |
| F5, nginx, squid | HTTP Activity | Medium |
Sprint 4 (Weeks 11–12): Windows
| Sourcetype | OCSF class | Effort |
| --- | --- | --- |
| XmlWinEventLog:Security | Authentication | Medium |
| XmlWinEventLog, XmlWinEventLog:System | Process Activity | Medium |
| WinHostMon, WinEventLog* | System Activity | Medium |
Sprint 5 (Weeks 13–14): Application
| Sourcetype | OCSF class | Effort |
| --- | --- | --- |
| kafka, iis, ms:iis:auto, tomcat_* | Application Activity | Medium |
| apigee, api | HTTP Activity | Medium |
| oracle_*, ora_unf_aud_dbx | Database Activity | High |
| SQL Script:* | Database Activity | High |
Sprint 6 (Weeks 15–16): Custom / Financial
| Sourcetype | OCSF class | Notes |
| --- | --- | --- |
| FIXTDR_PRD/PRE/UAT | Application Activity | FIX protocol - custom transform |
| MWM_PRD/PRE/UAT | Application Activity | Custom schema - needs SME |
| TFLOW_PRD/UAT | Application Activity | Custom - needs SME |
| RECON_PRD/PRE/UAT | Application Activity | Custom - needs SME |
| NDM | Network Activity | Custom |
Milestone: All sourcetypes in lakehouse. Grafana
dashboard per sourcetype family.
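The sprint tables amount to a routing table from sourcetype to OCSF class. A minimal sketch of such a lookup follows, assuming shell-style wildcard matching with first-match-wins ordering. The plan does not say how Vector will implement routing, so `SOURCETYPE_TO_OCSF` and `ocsf_class` are hypothetical names, and only a subset of the sourcetypes is listed.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (pattern, OCSF class) pairs taken from the sprint tables; a subset for illustration.
# Order matters: the first matching pattern wins, so exact names precede wildcards.
SOURCETYPE_TO_OCSF = [
    ("aws:cloudwatch", "Network Activity"),
    ("cloudflare:json", "Network Activity"),
    ("o365:graph:api", "Authentication"),
    ("kube:container:eks", "Network Activity"),
    ("kube:container:*", "Application Activity"),
    ("cisco:asa", "Network Activity"),
    ("XmlWinEventLog:Security", "Authentication"),
    ("oracle_*", "Database Activity"),
    ("FIXTDR_*", "Application Activity"),  # covers FIXTDR_PRD/PRE/UAT
]


def ocsf_class(sourcetype: str) -> Optional[str]:
    """Return the OCSF class for a sourcetype, or None if nothing matches."""
    for pattern, cls in SOURCETYPE_TO_OCSF:
        if fnmatchcase(sourcetype, pattern):
            return cls
    return None


if __name__ == "__main__":
    for st in ("kube:container:eks", "kube:container:nginx", "FIXTDR_PRD", "unmapped"):
        print(st, "->", ocsf_class(st))
```

Listing `kube:container:eks` before `kube:container:*` keeps the specific mapping from being shadowed by the wildcard. Returning `None` for unmapped sourcetypes makes gaps visible instead of silently defaulting to a class.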
Phase 2 - Team Onboarding (Weeks 8–20, overlaps Phase 1)
Goal: Each team has Grafana dashboards equivalent to
their Splunk content.
| Week | Activity |
| --- | --- |
| 8 | AD group mapping documented (sg-splunk-* → Grafana orgs) |
| 9 | Grafana SAML/AD auth live, 5 pilot users onboarded |
| 10 | Splunk DB Connect installed - Splunk users can query lakehouse without UI change |
| 11–12 | Team-by-team: "Your data is in Grafana - here's your dashboard link" |
| 13–14 | API access: service accounts issued to developer teams |
| 15–16 | Self-service: teams request new dashboards via Jira (AI generates) |
| 17–18 | Power users: direct Athena SQL access enabled |
| 19–20 | 30-day usage report: which teams fully migrated vs still Splunk-only |
Success metric: 50% of Splunk active users have
logged into Grafana.
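The success metric can be computed from two user lists, for example as part of the Weeks 19–20 usage report. This sketch assumes both systems can export a set of user IDs for the reporting window and that the target is met at 50% or above. The plan does not define "active" or say whether the threshold is inclusive, and the user names below are made up.

```python
TARGET = 0.5  # 50% of active Splunk users, per the success metric


def adoption_rate(splunk_active: set[str], grafana_logins: set[str]) -> float:
    """Share of active Splunk users who have also logged into Grafana."""
    if not splunk_active:
        return 0.0
    return len(splunk_active & grafana_logins) / len(splunk_active)


if __name__ == "__main__":
    splunk_active = {"alice", "bob", "carol", "dave"}   # hypothetical export
    grafana_logins = {"alice", "carol", "erin"}         # hypothetical export
    rate = adoption_rate(splunk_active, grafana_logins)
    print(f"adoption: {rate:.0%}, target met: {rate >= TARGET}")
```

Grafana users who were never active in Splunk (here `erin`) do not count toward the rate, since the metric is defined against the Splunk user base.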
Phase 3 - AI Layer (Weeks 12–24)
| Week | Milestone |
| --- | --- |
| 12–13 | Bedrock integration + Text2SQL working in dev |
| 14 | NL Query API deployed to EKS, tested with 3 sample questions |
| 15 | Grafana AI panel live (text input → query → results) |
| 16–17 | Dashboard Generator: Jira webhook → draft PR in < 5 min |
| 18 | Zendesk webhook added |
| 19–20 | Anomaly Detector: first scheduled run, first auto-Jira ticket |
| 21–22 | Tuning: LLM prompts refined from real usage |
| 23–24 | Anomaly detection in production, PagerDuty integration for critical |
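The plan does not specify the anomaly detection method. As a stand-in to make the scheduled-run idea concrete, the sketch below flags hourly event counts that deviate strongly from a trailing window, using a plain z-score. The function name, window size, and threshold are assumptions for illustration only.

```python
from statistics import mean, stdev


def volume_anomalies(counts: list[int], window: int = 24, threshold: float = 3.0) -> list[int]:
    """Return indexes whose count lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat history has sigma == 0; skip it rather than divide by zero.
        if sigma > 0 and abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # 24 hours of made-up, steady volume followed by a spike.
    hourly = [100, 102, 98, 101, 99, 100, 103, 97] * 3 + [400]
    print(volume_anomalies(hourly))
```

A flagged index would be the trigger for the auto-Jira ticket, and for PagerDuty when the deviation is classed as critical. A production detector would likely need seasonality handling that this sketch omits.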
Phase 4 - Decommission (Weeks 24–52)
Goal: Reduce or eliminate Splunk licensing cost.
| Milestone | Target Date |
| --- | --- |
| Splunk usage report: which sourcetypes/searches still active | Week 24 |
| Remove parallel ingest for fully-migrated sourcetypes (reduce Splunk ingest GB) | Week 28 |
| License negotiation: reduce Splunk GB/day commitment | Week 32 |
| Final holdout teams migrated or formally granted Splunk exemption | Week 40 |
| Decision: full Splunk decommission or retain for specific use cases | Week 48 |
| If decommission: Splunk turned off | Week 52 |
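One way to turn the Week 24 usage report into the Week 28 cut list is sketched below. The plan does not define "fully-migrated", so the rule here is an assumption: a sourcetype qualifies when it is present in the lakehouse and the usage report shows zero Splunk searches over the last 30 days. The function name and the sample counts are hypothetical.

```python
def safe_to_cut(splunk_searches_30d: dict[str, int], in_lakehouse: set[str]) -> list[str]:
    """Sourcetypes in the lakehouse that the usage report lists with zero searches.

    A sourcetype missing from the report is treated as unknown, not as unused,
    so it is never returned.
    """
    return sorted(
        st for st in in_lakehouse
        if splunk_searches_30d.get(st) == 0
    )


if __name__ == "__main__":
    report = {"aws:cloudwatch": 0, "cisco:asa": 42, "cloudflare:json": 0}  # made-up counts
    lakehouse = {"aws:cloudwatch", "cisco:asa", "cloudflare:json", "syslog"}
    print(safe_to_cut(report, lakehouse))
```

Treating missing report entries as unknown is the conservative choice: parallel ingest is removed only where there is positive evidence that nobody searches the data in Splunk.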
Risk & Mitigation
| Risk | Mitigation |
| --- | --- |
| ACM PCA signing delayed by Investcloud PKI team | Start in Week 1; use self-signed intermediate if delayed (swap later) |
| Custom sourcetypes (FIXTDR/MWM/TFLOW) require SME access | Schedule SME time in Weeks 13–16; hold sprint if blocked |
| Team resistance to leaving Splunk | Splunk stays live; DB Connect means zero forced change |
| AWS cost over-run | Budget alert at 80%; right-size after 30 days of production data |
| AD federation delays | Use IAM users as fallback for pilot; federation unblocks Phase 2 fully |
Resource Requirements
| Role | Phase 0 | Phase 1–2 | Phase 3–4 | Total |
| --- | --- | --- | --- | --- |
| DevOps / Platform (TF, EKS, pipelines) | 2 FTE | 1 FTE | 0.5 FTE | |
| Security Engineering (OCSF transforms, Vector) | 1 FTE | 2 FTE | 1 FTE | |
| AI/ML Engineer (LLM, query API, anomaly) | 0 | 0.5 FTE | 2 FTE | |
| Identity/AD integration | 0.5 FTE | 0.5 FTE | 0 | |
| SME per business app (FIXTDR, MWM, etc.) | 0 | 0.5 FTE | 0 | |
ROI Summary
| Item | Current (Splunk) | Future (Lakehouse) |
| --- | --- | --- |
| License cost | $100K–$500K/yr (est. at 1 TB/day) | ~$22K/yr (AWS services) |
| Query flexibility | Splunk SPL only | SQL + NL + Grafana + API |
| Access control | Splunk roles | AD groups, per-team scoping |
| Retention | Expensive (Splunk storage) | Cheap (S3 Glacier) |
| New dashboard time | Days to weeks | < 5 min (AI generator) |
| Multi-team self-service | Limited | Full (Grafana orgs, API tokens) |
Estimated annual savings: $78K–$478K depending on
current Splunk contract.
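The savings range follows directly from the ROI table: the estimated Splunk license range minus the estimated lakehouse cost. The sketch below reproduces that arithmetic. All three inputs are the estimates from the table, and the constant and function names are illustrative.

```python
LAKEHOUSE_COST = 22_000                     # ~$22K/yr AWS services (ROI table)
SPLUNK_LOW, SPLUNK_HIGH = 100_000, 500_000  # est. license range at 1 TB/day


def annual_savings(splunk_cost: int, lakehouse_cost: int = LAKEHOUSE_COST) -> int:
    """Annual saving in dollars: Splunk license cost minus lakehouse cost."""
    return splunk_cost - lakehouse_cost


if __name__ == "__main__":
    low, high = annual_savings(SPLUNK_LOW), annual_savings(SPLUNK_HIGH)
    print(f"${low // 1000}K to ${high // 1000}K per year")
```

This compares license cost with AWS service cost only. The staffing in Resource Requirements and the cost of running both platforms in parallel during migration are not part of the figure.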