Active / flagship

kube-insight

The missing history layer for Kubernetes AIOps.

Logs have search systems. Metrics have time-series stores. Traces preserve application flows. Kubernetes infrastructure state is still too often reduced to whatever the apiserver shows right now. kube-insight turns that gap into an AIOps foundation: it records Kubernetes resource history at low operational cost, extracts facts and topology, and exposes human- and agent-friendly query surfaces. Agents can work from retained evidence first, then use live kubectl only for final confirmation instead of rebuilding context from scratch.

24-215 ms: five retained-evidence agent workflow queries
14.9x-221x: faster than comparable broad kubectl paths
auto-redaction: configurable filters and extractors keep sensitive data out of evidence

Current state is useful. It is not the whole story.

kubectl is still the live-state baseline, but many incidents are already gone by the time someone investigates: Events expire, rollouts are reverted, RBAC edits are fixed, EndpointSlices move, and Pods are replaced. kube-insight keeps the missing Kubernetes evidence and shapes it into fast, scoped investigation paths.

without history: current objects only
with kube-insight: versions, facts, edges, observations

Keep the state that disappeared

Events expire, Pods restart, EndpointSlices change, and deleted objects vanish from the apiserver. kube-insight keeps observed versions and timestamps so the old state can still be inspected.

Turn raw history into queryable clues

Extracted facts, changes, and topology edges let operators and agents rank candidate Services, Pods, Events, owners, RBAC, webhooks, and policies before opening full JSON proof.

Reduce the agent blast radius

Configurable filters and extractors redact sensitive data before storage. Future service mode will inherit Kubernetes RBAC so agents see only what they are allowed to inspect.
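As a rough illustration of filtering before storage, the sketch below masks Secret payloads before an object would be written to the evidence store. The rule set, the `redact` function, and the `<redacted>` placeholder are all assumptions made for this example; kube-insight's actual filter configuration is described in the project documentation and may look quite different.

```python
import copy

# Hypothetical rule: fields of a Secret whose values must never be stored.
SENSITIVE_FIELDS = ("data", "stringData")
REDACTED = "<redacted>"


def redact(obj: dict) -> dict:
    """Return a copy of a Kubernetes object with Secret values masked.

    Keys are kept so an investigator can still see which entries existed;
    only the values are replaced. Non-Secret objects pass through unchanged.
    """
    out = copy.deepcopy(obj)
    if out.get("kind") == "Secret":
        for field in SENSITIVE_FIELDS:
            if isinstance(out.get(field), dict):
                out[field] = {key: REDACTED for key in out[field]}
    return out


secret = {
    "kind": "Secret",
    "metadata": {"name": "db-credentials"},
    "data": {"password": "cGFzc3dvcmQ="},
}
stored = redact(secret)
print(stored["data"])
```

Masking values while keeping keys is one possible design choice: the retained version still shows that a `password` entry existed at that time, without the evidence store ever holding the value.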

Measured as investigation workflows, not isolated database tricks.

The validation compares retained evidence against broad live kubectl paths, then separates SQLite, ClickHouse, and chDB tradeoffs. The product claim is focused: pre-extracted evidence makes AIOps workflows faster and more repeatable before the final live-state check.

Validation profile

Evidence queries stay small because the joins are already shaped.

2026-05-18
agent query phase: 24-215 ms

Five retained-evidence workflows over SQLite evidence.

raw kubectl baseline: 3,104-5,745 ms

Comparable broad live calls reconstructing the same context.

live service case: 448.746 ms vs 3,462.546 ms

ClickHouse SQL/API path used 3 operations; raw kubectl used 4 calls.

Agent workflow benchmark

Retained evidence vs broad live kubectl

14.9x-221x
Scenario                      kube-insight   kubectl    Speedup
PolicyViolation Event count   215 ms         3,214 ms   14.9x
Event to affected resource    26 ms          3,307 ms   127.2x
Event keyword search          24 ms          3,794 ms   158.1x
Service topology candidates   32 ms          3,104 ms   97.0x
Workload scope inventory      26 ms          5,745 ms   221.0x
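The speedup column is the kubectl time divided by the kube-insight time for the same scenario. The short sketch below recomputes it from the published millisecond figures; the variable names are only illustrative.

```python
# (scenario, kube-insight ms, kubectl ms) taken from the benchmark table.
rows = [
    ("PolicyViolation Event count", 215, 3214),
    ("Event to affected resource", 26, 3307),
    ("Event keyword search", 24, 3794),
    ("Service topology candidates", 32, 3104),
    ("Workload scope inventory", 26, 5745),
]

# Speedup = baseline latency / retained-evidence latency, to one decimal.
speedups = {name: round(kubectl / insight, 1) for name, insight, kubectl in rows}

for name, value in speedups.items():
    print(f"{name}: {value}x")

print(f"range: {min(speedups.values())}x-{max(speedups.values())}x")
```

The minimum and maximum of these ratios give the 14.9x-221x headline range.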

Same-dataset storage harness

Choose by operating model, not a single latency number.

Mode                     Backend      Ingest    Service    Storage
smallest local start     SQLite       17.42 s   80.6 ms    4.61 MB DB
central history          ClickHouse   7.91 s    182.0 ms   597 KiB active, ~4.9x
local ClickHouse shape   chDB         1.52 s    506.9 ms   1.23 MB dir, ~5.7x

Actual investigation shapes from the project docs.

The website should show more than a capability list. These cases demonstrate how retained facts, edges, observations, and versions become practical incident evidence.

Expired events

PolicyViolation events after the workload looks healthy

Symptom

A deployment was rejected or repeatedly reconciled with policy warnings. By the time someone investigates, the workload may look healthy and Events may have rotated out.

Why live kubectl is weak later

  • Events are short-lived and often rotated.
  • Warning Events must be joined back to Deployments, ReplicaSets, and Pods.
  • The policy controller may no longer list every affected object.

Evidence kube-insight uses

  • k8s_event.reason, type, and message facts
  • event edges to involved resources
  • Deployment, ReplicaSet, and Pod retained versions
01 Check coverage
02 Find warning Events
03 Follow involved-object edges
04 Open retained history

Query shape

where fact_key in ('k8s_event.reason', 'k8s_event.type') and (fact_value = 'Warning' or severity >= 60)

What you get

PolicyViolation warning Events tied back to workload objects, even when the current cluster no longer shows the full incident window.
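To make the query shape above concrete, here is a minimal sketch that runs the same predicate against an in-memory SQLite table. The `facts` table layout (`object_uid`, `fact_key`, `fact_value`, `severity`) and the sample rows are assumptions made for this example, not the project's actual schema; the project's SQL recipes are the reference for real column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical facts table: one row per extracted fact about an object.
conn.execute(
    "create table facts ("
    " object_uid text, fact_key text, fact_value text, severity integer)"
)
conn.executemany(
    "insert into facts values (?, ?, ?, ?)",
    [
        ("evt-1", "k8s_event.type", "Warning", 60),
        ("evt-1", "k8s_event.reason", "PolicyViolation", 60),
        ("evt-2", "k8s_event.type", "Normal", 10),
        ("evt-2", "k8s_event.reason", "Scheduled", 10),
        ("pod-1", "pod.ready", "false", 70),  # not an Event fact, so excluded
    ],
)

# Same predicate as the query shape: Event facts that are warnings
# or carry a high severity score.
rows = conn.execute(
    """
    select distinct object_uid
    from facts
    where fact_key in ('k8s_event.reason', 'k8s_event.type')
      and (fact_value = 'Warning' or severity >= 60)
    order by object_uid
    """
).fetchall()

warning_events = [uid for (uid,) in rows]
print(warning_events)
```

From the returned Event identifiers, the next step in the workflow would be to follow event edges to the involved Deployments, ReplicaSets, and Pods.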

Service topology

Service / EndpointSlice proof after resources changed

Symptom

A Service briefly routed to no endpoints or unready Pods. Later the Service is healthy, old Pods may be replaced, and the useful topology has moved on.

Why live kubectl is weak later

  • Current EndpointSlices only show current endpoints.
  • Deleted rollout objects and old Pods cannot be reconstructed from live state alone.
  • Pod readiness transitions and Events may no longer line up in one live query.

Evidence kube-insight uses

  • endpointslice_for_service edges
  • endpointslice_targets_pod edges
  • Endpoint readiness, Pod readiness, and restart facts
  • Service investigation bundle with proof versions
01 Find Service facts
02 Expand EndpointSlice edges
03 Inspect Pod readiness
04 Cross-check current kubectl

Query shape

endpointslice_for_service -> endpointslice_targets_pod -> Pod readiness facts -> retained versions

What you get

The investigation can show which historical EndpointSlices pointed at which Pods, then use kubectl only as the final live-state comparison.
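The traversal described in the query shape can be sketched as a join over edge rows. The `edges` and `facts` table layouts, the edge direction (EndpointSlice as source), the `pod.ready` fact key, and the object identifiers are all assumptions made for this illustration; only the two edge type names come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical layout: src and dst are object identifiers.
    create table edges (edge_type text, src text, dst text);
    create table facts (object_id text, fact_key text, fact_value text);
    """
)
conn.executemany(
    "insert into edges values (?, ?, ?)",
    [
        ("endpointslice_for_service", "slice/web-abc", "svc/web"),
        ("endpointslice_targets_pod", "slice/web-abc", "pod/web-1"),
        ("endpointslice_targets_pod", "slice/web-abc", "pod/web-2"),
        ("endpointslice_for_service", "slice/api-xyz", "svc/api"),
        ("endpointslice_targets_pod", "slice/api-xyz", "pod/api-1"),
    ],
)
conn.executemany(
    "insert into facts values (?, ?, ?)",
    [
        ("pod/web-1", "pod.ready", "true"),
        ("pod/web-2", "pod.ready", "false"),
    ],
)

# Service -> EndpointSlices -> Pods -> readiness facts.
candidates = conn.execute(
    """
    select s.src as slice, t.dst as pod, f.fact_value as ready
    from edges s
    join edges t
      on t.src = s.src and t.edge_type = 'endpointslice_targets_pod'
    left join facts f
      on f.object_id = t.dst and f.fact_key = 'pod.ready'
    where s.edge_type = 'endpointslice_for_service' and s.dst = ?
    order by pod
    """,
    ("svc/web",),
).fetchall()

for slice_id, pod, ready in candidates:
    print(slice_id, pod, ready)
```

The rows returned are only candidates; in the workflow above, the retained versions of each EndpointSlice and Pod would then be opened as proof.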

Facts and edges are the candidate path. Versions are the proof.

Kubernetes data is captured once, filtered before storage, extracted into investigation tables, then served through narrow read surfaces: CLI, HTTP API, read-only SQL, MCP tools, and agent prompts.

Architecture flow

Same shape as the project architecture: capture, filter, store, query.

Kubernetes API: Discovery, List / Watch
kube-insight ingestion
Filters: redact, normalize, discard
Retained versions: content-addressed JSON
Evidence extraction: facts, edges, changes
Evidence store: versions, facts, edges, observations (SQLite default / chDB local / ClickHouse central)
Read surfaces (read-only outputs): CLI, HTTP API, SQL, MCP tools + prompts
Investigations: humans, scripts, and agents inspect scoped proof

Small tables, useful answers.

versions: content-addressed retained JSON
facts: status, events, RBAC, rollout, webhook, cert facts
edges: Service, EndpointSlice, owner, policy, and event relationships
observations: watch/list timestamps and coverage signals
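One way to picture how content-addressed versions and observations relate: identical JSON is stored once under a hash of its content, while every sighting is recorded with its timestamp. The sketch below is a simplified in-memory model; the `VersionStore` class, the SHA-256 choice, and the canonicalization rule are assumptions for illustration, not the project's storage format.

```python
import hashlib
import json


def content_key(obj: dict) -> str:
    """Hash a canonical JSON encoding so key order does not change the key."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class VersionStore:
    """Hypothetical store: blobs keyed by content, plus an observation log."""

    def __init__(self) -> None:
        self.blobs: dict[str, dict] = {}
        self.observations: list[tuple[str, str]] = []

    def observe(self, obj: dict, observed_at: str) -> str:
        key = content_key(obj)
        self.blobs.setdefault(key, obj)  # store the JSON only once
        self.observations.append((observed_at, key))  # always log the sighting
        return key


store = VersionStore()
v1 = {"kind": "Pod", "metadata": {"name": "web-1"}, "status": {"phase": "Running"}}
v1_reordered = {"status": {"phase": "Running"}, "metadata": {"name": "web-1"}, "kind": "Pod"}
v2 = {"kind": "Pod", "metadata": {"name": "web-1"}, "status": {"phase": "Failed"}}

store.observe(v1, "2026-05-18T10:00:00Z")
store.observe(v1_reordered, "2026-05-18T10:05:00Z")
store.observe(v2, "2026-05-18T10:10:00Z")

print(len(store.blobs), len(store.observations))
```

Under this model, repeated list/watch sightings of an unchanged object add only small observation rows, which is one plausible reason the tables stay small.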

Start local. Keep history central when the team needs it.

default local / smallest start

SQLite

A pure-Go default artifact with one local evidence database for first captures, laptops, CI fixtures, and local agent workflows.

local ClickHouse shape / embedded analytics

chDB

A chDB-enabled artifact when you want ClickHouse-compatible local tables without operating a ClickHouse server.

central history / team service

ClickHouse

A continuous evidence service for append-heavy history, compression, API/MCP reads, and future cold-tiering work.

Start with the repository quickstart and storage notes.

Installation, MCP usage, SQL recipes, security, retention, and storage-mode tradeoffs are kept in project documentation so the website can stay focused on product shape.