AI Layer: Auto-Dashboards, NL Queries, Jira/Zendesk Integration
Author: RedEye Security | Date: 2026-04-06 | Status: Draft v1.0
Overview
Three AI capabilities:
1. NL Query: "Show me auth failures from EKS last 24h" → Athena SQL → results
2. Dashboard Generator: Jira/Zendesk ticket → Grafana dashboard → auto-deploy PR
3. Anomaly Detection: scheduled AI analysis → Jira ticket if anomaly found
All run on EKS. LLM options: AWS Bedrock (managed) or Ollama on GPU node (self-hosted).
1. Natural Language Query API
User (Grafana panel, Slack, CLI)
│ POST /query {"question": "auth failures from EKS last 24h"}
▼
Query API (FastAPI, EKS)
│
├── 1. Retrieve schema context (Glue catalog → table + column names)
├── 2. Build prompt: schema + question → LLM → SQL
├── 3. Validate SQL (parse, check table/column names)
├── 4. Execute via Athena
├── 5. Summarize result → LLM → plain English
└── 6. Return: {sql, rows, summary, athena_execution_id}
FastAPI service
```python
# query-api/app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import boto3, json, time

app = FastAPI()
athena = boto3.client('athena')
bedrock = boto3.client('bedrock-runtime')

SCHEMA_CONTEXT = """
Tables in database: security_lakehouse

network_activity: time(bigint), dt(date), src_endpoint(struct<ip:string,hostname:string>),
  dst_endpoint(struct<ip:string,hostname:string,port:int>), severity_id(int),
  activity_id(int), sourcetype(string), region(string), raw_data(string)

authentication: time(bigint), dt(date), user(struct<name:string,uid:string>),
  src_endpoint(struct<ip:string>), dst_endpoint(struct<hostname:string>),
  status_id(int), status(string), auth_protocol(string), sourcetype(string)

security_finding: time(bigint), dt(date), finding(struct<title:string,desc:string,severity:string>),
  resources(array<struct<type:string,name:string>>), sourcetype(string)
"""

TEXT2SQL_PROMPT = """You are a SQL expert for AWS Athena (Presto SQL dialect).
Convert the question to valid Athena SQL. Rules:
- Always include: WHERE dt >= current_date - interval '7' day (unless user specifies range)
- Use LIMIT 1000 unless asking for counts/aggregates
- For struct fields use dot notation: src_endpoint.ip
- Return ONLY the SQL, no explanation, no markdown

Schema:
{schema}

Question: {question}
SQL:"""

class QueryRequest(BaseModel):
    question: str
    max_rows: int = 100

@app.post("/query")
async def query(req: QueryRequest):
    # Step 1: Text to SQL
    sql = llm_call(TEXT2SQL_PROMPT.format(schema=SCHEMA_CONTEXT, question=req.question))
    sql = sql.strip().rstrip(';')

    # Step 2: Validate LLM output before execution (see Security Controls)
    if not sql.lower().startswith('select'):
        raise HTTPException(400, "Refusing non-SELECT statement")

    # Step 3: Execute in Athena
    execution = athena.start_query_execution(
        QueryString=sql,
        WorkGroup='security-lakehouse',
        ResultConfiguration={'OutputLocation': 's3://ic-security-log-archive/athena-results/'}
    )
    exec_id = execution['QueryExecutionId']

    # Step 4: Poll for completion (max ~60s)
    for _ in range(30):
        status = athena.get_query_execution(QueryExecutionId=exec_id)
        state = status['QueryExecution']['Status']['State']
        if state == 'SUCCEEDED':
            break
        if state in ('FAILED', 'CANCELLED'):
            raise HTTPException(500, f"Athena query {state}: {status['QueryExecution']['Status'].get('StateChangeReason')}")
        time.sleep(2)
    else:
        raise HTTPException(504, "Athena query timed out")

    # Step 5: Fetch results
    results = athena.get_query_results(QueryExecutionId=exec_id, MaxResults=req.max_rows)
    rows = parse_athena_results(results)

    # Step 6: Summarize
    summary = llm_call(f"Summarize these security findings in 2-3 sentences:\n{json.dumps(rows[:20])}")
    return {"sql": sql, "rows": rows, "summary": summary, "execution_id": exec_id}

def parse_athena_results(results: dict) -> list[dict]:
    # Athena returns the header as the first row; zip it against the data rows
    raw = results['ResultSet']['Rows']
    headers = [col.get('VarCharValue', '') for col in raw[0]['Data']]
    return [dict(zip(headers, (col.get('VarCharValue') for col in row['Data']))) for row in raw[1:]]

def llm_call(prompt: str) -> str:
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-haiku-20240307-v1:0',  # cheap + fast
        body=json.dumps({
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 1024,
            "anthropic_version": "bedrock-2023-05-31"
        })
    )
    return json.loads(response['body'].read())['content'][0]['text']
```
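The SELECT-only guard above can be tightened into the table/column allowlist check the Security Controls section calls for. A minimal sketch, assuming the three table names from SCHEMA_CONTEXT; the regex-based extraction is illustrative, not a full SQL parser:

```python
import re

# Allowlist derived from SCHEMA_CONTEXT
ALLOWED_TABLES = {'network_activity', 'authentication', 'security_finding'}

def validate_sql(sql: str) -> None:
    """Reject LLM-generated SQL that is not a plain SELECT over known tables."""
    stripped = sql.strip().lower()
    if not stripped.startswith('select'):
        raise ValueError("only SELECT statements are allowed")
    # Naive table extraction: identifiers following FROM/JOIN keywords
    tables = re.findall(r'\b(?:from|join)\s+([a-z_][a-z0-9_.]*)', stripped)
    for table in tables:
        if table.split('.')[-1] not in ALLOWED_TABLES:
            raise ValueError(f"table not in allowlist: {table}")
```

A production version would use a real SQL parser (e.g. sqlglot) and also verify column names against the Glue catalog.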
2. Dashboard Generator (Jira/Zendesk → Grafana)
Flow
Jira ticket created with label: "dashboard-request"
│
│ Jira webhook → POST /generate-dashboard
▼
Dashboard Generator API (FastAPI, EKS)
│
├── 1. Parse ticket: title, description, acceptance criteria
├── 2. Identify required data: extract sourcetypes/metrics mentioned
├── 3. Query Glue catalog: get relevant table schemas
├── 4. LLM → generate Grafana dashboard JSON
├── 5. Validate JSON (grafana-dash-linter)
├── 6. Git: create branch, commit dashboard JSON
├── 7. GitLab: open MR with ticket link
└── 8. Comment on Jira ticket: "Dashboard PR ready: <link>"
Dashboard Generator
```python
# dashboard-gen/app.py
# (gitlab_*, jira_comment, slugify, get_relevant_schema, validate_dashboard
#  are shared helpers, omitted here)

class DashboardRequest(BaseModel):
    ticket_id: str
    title: str
    description: str
    acceptance_criteria: str

DASHBOARD_PROMPT = """You are a Grafana dashboard expert. Generate a complete Grafana dashboard JSON.

Requirements from ticket:
Title: {title}
Description: {description}
Acceptance criteria: {acceptance_criteria}

Available data (Athena tables):
{schema_context}

Rules:
- Use Athena datasource (uid: "athena-security-lakehouse")
- All time queries must use $__timeFilter(dt) macro
- Include: title, description, tags, panels array, time range, refresh
- Panel types allowed: timeseries, bargauge, stat, table, piechart
- Each panel needs: title, type, targets (with rawSQL), gridPos
- Use variables: $sourcetype, $region, $environment
- Return ONLY valid JSON, no explanation

Generate the dashboard JSON:"""

@app.post("/generate-dashboard")
async def generate_dashboard(req: DashboardRequest):
    # 1. Get schema for relevant tables
    schema = get_relevant_schema(req.description)

    # 2. Generate dashboard JSON via LLM
    dashboard_json = llm_call(DASHBOARD_PROMPT.format(
        title=req.title,
        description=req.description,
        acceptance_criteria=req.acceptance_criteria,
        schema_context=schema
    ))

    # 3. Parse + validate
    dashboard = json.loads(dashboard_json)
    validate_dashboard(dashboard)  # check required fields, SQL syntax

    # 4. Commit to GitLab
    branch = f"auto/dashboard-{req.ticket_id}"
    filename = f"grafana/dashboards/auto/{req.ticket_id}-{slugify(req.title)}.json"
    gitlab_create_branch(branch)
    gitlab_commit_file(branch, filename, json.dumps(dashboard, indent=2))
    mr_url = gitlab_create_mr(branch, f"Auto: Dashboard for {req.ticket_id} - {req.title}")

    # 5. Comment on Jira
    jira_comment(req.ticket_id, f"Dashboard PR ready for review: {mr_url}\nPreview SQL included in PR description.")
    return {"mr_url": mr_url, "dashboard_title": dashboard["title"]}
```
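One possible shape for the `validate_dashboard` helper called above, checking the required-field and panel-type rules stated in DASHBOARD_PROMPT; the body here is a sketch, not the actual implementation:

```python
REQUIRED_DASHBOARD_KEYS = {'title', 'description', 'tags', 'panels', 'time', 'refresh'}
ALLOWED_PANEL_TYPES = {'timeseries', 'bargauge', 'stat', 'table', 'piechart'}

def validate_dashboard(dashboard: dict) -> None:
    """Raise ValueError if LLM-generated dashboard JSON violates the prompt rules."""
    missing = REQUIRED_DASHBOARD_KEYS - dashboard.keys()
    if missing:
        raise ValueError(f"missing top-level keys: {sorted(missing)}")
    for i, panel in enumerate(dashboard['panels']):
        if panel.get('type') not in ALLOWED_PANEL_TYPES:
            raise ValueError(f"panel {i}: disallowed type {panel.get('type')!r}")
        for key in ('title', 'targets', 'gridPos'):
            if key not in panel:
                raise ValueError(f"panel {i}: missing {key!r}")
        if not all('rawSQL' in target for target in panel['targets']):
            raise ValueError(f"panel {i}: every target needs rawSQL")
```

SQL syntax checking of each `rawSQL` (e.g. via a dry-run against the Athena workgroup) would layer on top of this.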
Zendesk integration (same flow)
```python
@app.post("/webhook/zendesk")
async def zendesk_webhook(payload: dict):
    ticket = payload['ticket']
    if 'dashboard-request' not in ticket.get('tags', []):
        return {"status": "ignored"}
    return await generate_dashboard(DashboardRequest(
        ticket_id=f"ZD-{ticket['id']}",
        title=ticket['subject'],
        description=ticket['description'],
        acceptance_criteria=extract_acceptance_criteria(ticket['description'])
    ))
```
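`extract_acceptance_criteria` is assumed to pull an "Acceptance criteria" section out of the free-text Zendesk ticket body (Zendesk has no structured field for it, unlike Jira). A minimal sketch:

```python
import re

def extract_acceptance_criteria(description: str) -> str:
    """Return the text under an 'Acceptance criteria' heading, or '' if absent.

    Captures everything after the heading up to the next blank line
    (or end of text). Case-insensitive; tolerates a trailing colon.
    """
    match = re.search(
        r'acceptance criteria:?\s*\n(.*?)(?:\n\s*\n|\Z)',
        description, re.IGNORECASE | re.DOTALL,
    )
    return match.group(1).strip() if match else ''
```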
3. Anomaly Detection
Scheduled job (runs every 15 min via EKS CronJob)
```python
# anomaly-detector/detector.py
import json
from datetime import datetime

ANOMALY_PROMPT = """Analyze this security data for anomalies.
Look for: unusual auth failure spikes, new source IPs, traffic to unexpected destinations,
volume deviations > 3 sigma from 7-day baseline.

Current window (last 15 min):
{current_data}

7-day baseline (avg per 15-min window):
{baseline_data}

If anomalies found, respond with JSON:
{{"anomalies": [{{"type": str, "severity": "low|medium|high|critical", "description": str, "recommended_action": str}}]}}
If no anomalies, respond: {{"anomalies": []}}"""

def run_detection():
    # Query current + baseline (the time column is epoch milliseconds)
    current = athena_query("""
        SELECT sourcetype, count(*) AS events, count(DISTINCT src_endpoint.ip) AS unique_ips
        FROM network_activity
        WHERE time >= (to_unixtime(now()) - 900) * 1000  -- last 15 min
        GROUP BY sourcetype
    """)
    baseline = athena_query("""
        SELECT sourcetype,
               avg(cnt) AS avg_events,
               stddev(cnt) AS stddev_events
        FROM (
            SELECT sourcetype,
                   floor(time / 900000) AS bucket,  -- 15-min buckets, matching the prompt
                   count(*) AS cnt
            FROM network_activity
            WHERE dt >= current_date - interval '7' day
            GROUP BY 1, 2
        ) t
        GROUP BY sourcetype
    """)

    # Ask LLM
    result = json.loads(llm_call(ANOMALY_PROMPT.format(
        current_data=current.to_string(),
        baseline_data=baseline.to_string()
    )))

    # Create Jira tickets for high/critical anomalies
    for anomaly in result['anomalies']:
        if anomaly['severity'] in ('high', 'critical'):
            jira_create_ticket(
                project='SEC',
                summary=f"[AUTO] {anomaly['type']}: {anomaly['description'][:80]}",
                description=f"""
**Detected by:** AI Anomaly Detector
**Severity:** {anomaly['severity']}
**Time:** {datetime.now().isoformat()}

**Description:**
{anomaly['description']}

**Recommended Action:**
{anomaly['recommended_action']}

**Query used:** See attachment
""",
                priority=anomaly['severity'].capitalize(),
                labels=['auto-detected', 'security-anomaly']
            )
```

Note: Athena's Presto dialect has no `unix_timestamp()`; `to_unixtime(now())` is the equivalent, and the baseline buckets above are 15 minutes wide so they are comparable to the current window.
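The "> 3 sigma" rule in ANOMALY_PROMPT can also be applied deterministically before the LLM call, e.g. as a cheap pre-filter that skips the Bedrock invocation entirely when no sourcetype deviates. A minimal sketch operating on the two query results as lists of dicts (the function names are illustrative):

```python
def sigma_deviation(current: float, avg: float, stddev: float) -> float:
    """How many standard deviations the current value sits from the baseline mean."""
    if stddev == 0:
        return 0.0 if current == avg else float('inf')
    return abs(current - avg) / stddev

def prefilter(current_rows: list[dict], baseline_rows: list[dict],
              threshold: float = 3.0) -> list[str]:
    """Return sourcetypes whose event volume deviates > threshold sigma from baseline."""
    baseline = {row['sourcetype']: row for row in baseline_rows}
    flagged = []
    for row in current_rows:
        base = baseline.get(row['sourcetype'])
        if base is None:
            # A never-before-seen sourcetype is itself worth flagging
            flagged.append(row['sourcetype'])
        elif sigma_deviation(row['events'], base['avg_events'], base['stddev_events']) > threshold:
            flagged.append(row['sourcetype'])
    return flagged
```

Only windows with a non-empty flag list would then be sent to the LLM for narrative analysis, cutting per-run cost.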
LLM Options
| Option | Cost | Latency | Privacy | Setup |
|---|---|---|---|---|
| AWS Bedrock Claude Haiku | ~$0.001/query | 1-2s | AWS boundary | Zero - already in AWS |
| AWS Bedrock Llama 3 | ~$0.0006/query | 1-3s | AWS boundary | Zero |
| Ollama on EKS GPU node | ~$0.80/hr (g4dn.xlarge) | 2-5s | Fully private | Medium |
| OpenRouter | ~$0.001/query | 1-3s | Leaves AWS | Already have key |
Recommendation: Bedrock Claude Haiku for Text2SQL and summarization (fast, cheap, no egress). Ollama/GPU only if hard data residency requirement.
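If the Ollama route is ever needed, `llm_call()` could be swapped for a thin client against Ollama's `/api/generate` endpoint; non-streaming responses return the generated text under the `response` key. The in-cluster service URL below is a placeholder:

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434/api/generate"  # placeholder in-cluster service URL

def build_ollama_request(prompt: str, model: str = "llama3") -> bytes:
    """Serialize a non-streaming Ollama generate request body."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ollama_call(prompt: str, model: str = "llama3") -> str:
    """Drop-in replacement for the Bedrock-backed llm_call()."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_ollama_request(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because both backends expose the same `str -> str` signature, the choice can be a deploy-time config flag rather than a code change.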
Grafana AI Panel
```json
{
  "type": "text",
  "title": "AI Security Analyst",
  "options": {
    "content": "<form onsubmit='queryAI(event)'><input id='q' placeholder='Ask a question...' style='width:80%'/><button>Ask</button></form><div id='result'></div><script>async function queryAI(e){e.preventDefault();const r=await fetch('https://api.ic-security.internal/query',{method:'POST',body:JSON.stringify({question:document.getElementById('q').value})});const d=await r.json();document.getElementById('result').innerHTML='<b>SQL:</b><pre>'+d.sql+'</pre><b>Summary:</b><p>'+d.summary+'</p>';}</script>"
  }
}
```

Caveat: Grafana sanitizes HTML in text panels by default, so this requires `disable_sanitize_html = true` in grafana.ini and may still be blocked by CSP; a small panel plugin is the more robust option.
Security Controls on AI Layer
- All Bedrock calls stay within AWS VPC endpoint (no public internet)
- Query API: AD auth required (IAM Identity Center SAML token)
- SQL injection prevention: LLM output validated against whitelist of allowed tables/columns before execution
- Athena workgroup: read-only, result encryption, max scan 10GB per query
- Anomaly tickets: auto-created but require human assignment before action
- Dashboard PRs: require human MR approval before deploy (not auto-merged)
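The workgroup limits above map to Athena's CreateWorkGroup configuration. A sketch of the enforcing config, reusing the results bucket from the Query API; the SSE-S3 encryption option is an assumption:

```python
# Enforced workgroup settings: clients cannot override these per-query
workgroup_config = {
    'EnforceWorkGroupConfiguration': True,
    'BytesScannedCutoffPerQuery': 10 * 1024 ** 3,  # 10 GB max scan per query
    'ResultConfiguration': {
        'OutputLocation': 's3://ic-security-log-archive/athena-results/',
        'EncryptionConfiguration': {'EncryptionOption': 'SSE_S3'},  # result encryption
    },
}

# Applied once at provisioning time, e.g.:
# boto3.client('athena').create_work_group(Name='security-lakehouse',
#                                          Configuration=workgroup_config)
```

Read-only access itself comes from the IAM policy attached to the Query API's service role (Athena/Glue read actions only), not from the workgroup.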