Active / flagship

kube-insight

The missing history layer for Kubernetes AIOps.

Logs have search systems. Metrics have time-series stores. Traces preserve application flows. Kubernetes infrastructure state is still too often reduced to whatever the apiserver shows right now. kube-insight turns that gap into an AIOps foundation: it records Kubernetes resource history at low operational cost, extracts facts and topology, and exposes human- and agent-friendly query surfaces. Agents can work from retained evidence first, then use live kubectl only for final confirmation instead of rebuilding context from scratch.


24-215 ms: five retained-evidence agent workflow queries

14.9x-221x faster than comparable broad kubectl paths

auto-redaction: configurable filters and extractors keep sensitive data out of evidence

Latest release

v0.1.3 adds A2A, kagent, and Helm integration.

Published on July 1, 2026, this release ships default Linux, macOS, and Windows archives, chDB-enabled archives, a matching container image, and the embedded Web UI.


A2A

Agent-to-agent integration surface for teams connecting kube-insight into agent workflows.

kagent

Dedicated kagent chart support for running kube-insight with Kubernetes-native agent automation.

Helm

Published chart path plus image defaults tied to the released container tag.

Web UI

Release binaries still ship the embedded React app for serve --app demos and local investigations.

Built-in agent demo

Ask about a node-pool change and get retained proof back.

The demo asks the embedded kube-insight agent whether a managed Kubernetes node pool changed recently. The answer uses retained Node lifecycle history, SQL aggregation, current node capacity, and citations instead of broad live cluster browsing.

Built-in Web UI agent answering from retained Node history, SQL aggregation, and citations.

Quickstart

Run the collector, Web UI, API, and MCP surface in one local loop.

Start with the SQLite-backed service mode. It discovers every list/watch-capable Kubernetes resource allowed by the current kubeconfig, keeps sanitized evidence locally, exposes read-only API checks, and gives agents the embedded Web UI plus a Streamable HTTP MCP surface.


Install the binary

Download a release artifact and keep the binary in the working directory for the local run.

# Resolve the latest release tag from the GitHub API, without the leading "v".
KI_VERSION="$(curl -fsSL https://api.github.com/repos/nowakeai/kube-insight/releases/latest | sed -n 's/.*"tag_name": "v\([^"]*\)".*/\1/p')"
test -n "${KI_VERSION}"
# Map the local OS and CPU names onto the release artifact naming.
KI_OS="$(uname -s | tr '[:upper:]' '[:lower:]')"
KI_ARCH="$(uname -m)"
case "${KI_ARCH}" in
  x86_64) KI_ARCH=amd64 ;;
  aarch64) KI_ARCH=arm64 ;;
esac

# -f makes curl fail on an HTTP error instead of saving the error page as the archive.
curl -fL -o kube-insight.tar.gz \
  "https://github.com/nowakeai/kube-insight/releases/download/v${KI_VERSION}/kube-insight_${KI_VERSION}_${KI_OS}_${KI_ARCH}.tar.gz"
tar -xzf kube-insight.tar.gz kube-insight
chmod +x kube-insight

Start service mode

Run collection, storage, the embedded Web UI, HTTP API, and the Streamable HTTP MCP service together. Leaving resources unspecified uses discovery for the resources the collector can list/watch.

./kube-insight serve --watch --app \
  --db kubeinsight.db \
  --listen 127.0.0.1:8090

Verify API and coverage

Check service health, inspect the active schema, then look for collection errors before trusting an investigation result.

curl http://127.0.0.1:8090/healthz
curl http://127.0.0.1:8090/api/v1/schema
curl 'http://127.0.0.1:8090/api/v1/health?errorsOnly=true&problemLimit=20'
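
The three checks above can be chained into a small preflight gate, so an investigation does not start against a service that is not answering. The sketch below is a hypothetical wrapper, not part of kube-insight: the `ki_preflight` function and the `KI_URL` variable are made up for illustration. Because the response bodies are not documented here, it relies only on HTTP status codes through `curl -f`.

```shell
#!/bin/sh
# Hypothetical helper: fail fast if any of the read-only endpoints shown
# above answers with an HTTP error. Response bodies are not parsed.
KI_URL="${KI_URL:-http://127.0.0.1:8090}"

ki_preflight() {
  base="${1:-$KI_URL}"
  for path in /healthz /api/v1/schema '/api/v1/health?errorsOnly=true&problemLimit=20'; do
    # -f turns HTTP status 400 and above into a non-zero exit code;
    # --max-time bounds the wait if the service hangs.
    if ! curl -fsS --max-time 5 -o /dev/null "${base}${path}"; then
      echo "preflight failed: ${path}" >&2
      return 1
    fi
  done
  echo "preflight ok: ${base}"
}

# Usage: ki_preflight && echo "endpoints reachable"
```

A passing preflight only shows that the endpoints respond. The output of the errorsOnly health call still needs to be read for collection errors before trusting a result.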

Connect an agent through Streamable HTTP MCP

Point the agent at the long-running /mcp service endpoint instead of opening the database from a new stdio process. Keep serve mcp only as a fallback for runtimes that do not support remote MCP.

{
  "mcpServers": {
    "kube-insight": {
      "type": "streamable-http",
      "url": "http://127.0.0.1:8090/mcp"
    }
  }
}

# First investigation loop:
kube_insight_schema -> kube_insight_health -> kube_insight_sql -> kube_insight_history

Docs map

The useful repo docs are now first-class website pages.

kube-insight's project repository now separates user, operator, and contributor docs. The website keeps the public paths task-oriented: agent setup, investigation cases, configuration, evidence vocabulary, and security boundaries are reachable without reading the repository tree.

Run the embedded agent: Built-in Web UI Agent

Configure server-side LLM credentials, start serve --app, and investigate retained proof in the browser.

Bring your own agent: External Agent Skill

Connect Codex, Claude, or another MCP-capable runtime to kube-insight as a retained-evidence service.

Operational cases: Real-World Cases

Webhook, Event, Service, EndpointSlice, RBAC, cert-manager, Flux, and Node-capacity investigations grounded in retained evidence.

Operate it: Configuration And Processing

Validate config, inspect effective defaults, tune resource profiles, and understand filters, extractors, and retention behavior.

Evidence vocabulary: Facts Catalog

Use extracted fact families as fast candidate paths before opening retained JSON versions as proof.

Security roadmap: RBAC-Aware Evidence

Track the design for Kubernetes RBAC inheritance and authorization-aware SQL over derived evidence.

Why it matters

Current state is useful. It is not the whole story.

kubectl is still the live-state baseline, but many incidents are already gone by the time someone investigates: Events expire, rollouts are reverted, RBAC edits are fixed, EndpointSlices move, and Pods are replaced. kube-insight keeps the missing Kubernetes evidence and shapes it into fast, scoped investigation paths.

without history: current objects only

with kube-insight: versions, facts, edges, observations

Keep the state that disappeared

Events expire, Pods restart, EndpointSlices change, and deleted objects vanish from the apiserver. kube-insight keeps observed versions and timestamps so the old state can still be inspected.

Turn raw history into queryable clues

Extracted facts, changes, and topology edges let operators and agents rank candidate Services, Pods, Events, owners, RBAC, webhooks, and policies before opening full JSON proof.

Reduce the agent blast radius

Configurable filters and extractors redact sensitive data before storage. Future service mode will inherit Kubernetes RBAC so agents see only what they are allowed to inspect.

Performance

Measured as investigation workflows, not isolated database tricks.

The validation compares retained evidence against broad live kubectl paths, then separates SQLite, ClickHouse, and chDB tradeoffs. The product claim is focused: pre-extracted evidence makes AIOps workflows faster and more repeatable before the final live-state check.

Validation profile

Evidence queries stay small because the joins are already shaped.

2026-05-18

agent query phase: 24-215 ms

Five retained-evidence workflows over SQLite evidence.

raw kubectl baseline: 3,104-5,745 ms

Comparable broad live calls reconstructing the same context.

live service case: 448.746 ms vs 3,462.546 ms

ClickHouse SQL/API path used 3 operations; raw kubectl used 4 calls.

Agent workflow benchmark

Retained evidence vs broad live kubectl

14.9x-221x

Scenario                      kube-insight   kubectl    Speedup
PolicyViolation Event count   215 ms         3,214 ms   14.9x
Event to affected resource    26 ms          3,307 ms   127.2x
Event keyword search          24 ms          3,794 ms   158.1x
Service topology candidates   32 ms          3,104 ms   97.0x
Workload scope inventory      26 ms          5,745 ms   221.0x
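
The Speedup column is the kubectl time divided by the kube-insight time, rounded to one decimal place. As a quick worked check, the sketch below recomputes it with awk. The `speedup` function name is illustrative, and the millisecond values are copied from the rows above.

```shell
#!/bin/sh
# Recompute "Speedup" as kubectl_ms / kube_insight_ms, one decimal place.
speedup() {
  awk -v kubectl_ms="$1" -v ki_ms="$2" 'BEGIN { printf "%.1fx\n", kubectl_ms / ki_ms }'
}

speedup 3214 215   # PolicyViolation Event count  -> 14.9x
speedup 3307 26    # Event to affected resource   -> 127.2x
speedup 3794 24    # Event keyword search         -> 158.1x
speedup 3104 32    # Service topology candidates  -> 97.0x
speedup 5745 26    # Workload scope inventory     -> 221.0x
```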

Same-dataset storage harness

Choose by operating model, not a single latency number.

smallest local start

SQLite

ingest: 17.42 s
service: 80.6 ms
storage: 4.61 MB DB

central history

ClickHouse

ingest: 7.91 s
service: 182.0 ms
storage: 597 KiB active, ~4.9x

local ClickHouse shape

chDB

ingest: 1.52 s
service: 506.9 ms
storage: 1.23 MB dir, ~5.7x

Use cases

Actual investigation shapes from the project docs.

The website should show more than a capability list. These cases demonstrate how retained facts, edges, observations, and versions become practical incident evidence.


Expired events

PolicyViolation events after the workload looks healthy

Symptom

A deployment was rejected or repeatedly reconciled with policy warnings. By the time someone investigates, the workload may look healthy and Events may have rotated out.

Why live kubectl is weak later

Events are short-lived and often rotated.
Warning Events must be joined back to Deployments, ReplicaSets, and Pods.
The policy controller may no longer list every affected object.

Evidence kube-insight uses

k8s_event.reason, type, and message facts
event edges to involved resources
Deployment, ReplicaSet, and Pod retained versions

01 Check coverage
02 Find warning Events
03 Follow involved-object edges
04 Open retained history

Query shape

where fact_key in ('k8s_event.reason', 'k8s_event.type') and (fact_value = 'Warning' or severity >= 60)
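
To make the predicate concrete, the sketch below applies the same condition to a few made-up fact rows with awk. The tab-separated layout (`fact_key`, `fact_value`, `severity`, object) and every row are hypothetical. The real table and column names should be read from /api/v1/schema or kube_insight_schema before writing SQL.

```shell
#!/bin/sh
# Hypothetical fact rows: fact_key <TAB> fact_value <TAB> severity <TAB> object
facts() {
  printf 'k8s_event.reason\tPolicyViolation\t70\tdeploy/web\n'
  printf 'k8s_event.type\tWarning\t0\tdeploy/web\n'
  printf 'k8s_event.type\tNormal\t10\tdeploy/api\n'
  printf 'k8s_event.message\tpolicy check failed\t80\tdeploy/web\n'
}

# Same predicate as the query shape:
#   fact_key in (reason, type) and (fact_value = 'Warning' or severity >= 60)
warning_candidates() {
  awk -F '\t' '
    ($1 == "k8s_event.reason" || $1 == "k8s_event.type") &&
    ($2 == "Warning" || $3 + 0 >= 60) { print $1, $2, $4 }'
}

facts | warning_candidates
```

With these rows, only the first two match. The Normal row fails both value conditions, and the message row is excluded by the key list even though its severity is high.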

What you get

PolicyViolation warning Events tied back to workload objects, even when the current cluster no longer shows the full incident window.

Service topology

Service / EndpointSlice proof after resources changed

Symptom

A Service briefly routed to no endpoints or unready Pods. Later the Service is healthy, old Pods may be replaced, and the useful topology has moved on.

Why live kubectl is weak later

Current EndpointSlices only show current endpoints.
Deleted rollout objects and old Pods cannot be reconstructed from live state alone.
Pod readiness transitions and Events may no longer line up in one live query.

Evidence kube-insight uses

endpointslice_for_service edges
endpointslice_targets_pod edges
Endpoint readiness, Pod readiness, and restart facts
Service investigation bundle with proof versions

01 Find Service facts
02 Expand EndpointSlice edges
03 Inspect Pod readiness
04 Cross-check current kubectl

Query shape

endpointslice_for_service -> endpointslice_targets_pod -> Pod readiness facts -> retained versions
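
The arrow chain reads as a two-hop join: start from a Service, collect the EndpointSlices that point at it, then list the Pods those slices target. The sketch below shows that traversal over a tiny hand-written edge list. The `edge_type src dst` row layout, the object names, and the `pods_for_service` function are assumptions for illustration. Only the two edge type names come from the text above.

```shell
#!/bin/sh
# Hypothetical edge rows: edge_type, source object, destination object.
edges() {
  cat <<'EOF'
endpointslice_for_service web-abc web
endpointslice_for_service api-xyz api
endpointslice_targets_pod web-abc web-7d9-1
endpointslice_targets_pod web-abc web-7d9-2
endpointslice_targets_pod api-xyz api-5c4-1
EOF
}

# Print "slice pod" pairs reachable from the Service named in $1.
pods_for_service() {
  awk -v svc="$1" '
    # Hop 1: remember every EndpointSlice that belongs to the Service.
    $1 == "endpointslice_for_service" && $3 == svc { slice[$2] = 1 }
    # Buffer slice -> Pod edges so the input order does not matter.
    $1 == "endpointslice_targets_pod" { targets[++n] = $2 " " $3 }
    END {
      # Hop 2: keep only Pods targeted by a remembered slice.
      for (i = 1; i <= n; i++) {
        split(targets[i], t, " ")
        if (t[1] in slice) print t[1], t[2]
      }
    }'
}

edges | pods_for_service web
```

In a real investigation, each resulting Pod would then be checked against readiness facts and retained versions, as the query shape describes.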

What you get

The investigation can show which historical EndpointSlices pointed at which Pods, then use kubectl only as the final live-state comparison.

Architecture

Facts and edges are the candidate path. Versions are the proof.

Kubernetes data is captured once, filtered before storage, extracted into investigation tables, then served through narrow read surfaces: CLI, HTTP API, read-only SQL, MCP tools, and agent prompts.

Architecture flow

Same shape as the project architecture: capture, filter, store, query.

Kubernetes API: Discovery, List / Watch

kube-insight ingestion
Filters: redact, normalize, discard
Retained versions: content-addressed JSON
Evidence extraction: facts, edges, changes

Evidence store: versions, facts, edges, observations (SQLite default / chDB local / ClickHouse central)

Read surfaces (read-only outputs): CLI, HTTP API, SQL, MCP tools + prompts

Investigations: humans, scripts, and agents inspect scoped proof
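
The flow above labels retained versions as content-addressed JSON. As a rough illustration of the idea, the sketch below stores each document under the SHA-256 of its bytes, so an unchanged object observed twice lands on the same key. The hash choice, the flat directory layout, and the `store_version` name are assumptions, not kube-insight's actual format, and the sketch assumes `sha256sum` from GNU coreutils is available.

```shell
#!/bin/sh
# Hypothetical content-addressed store: file name = SHA-256 of the content.
store_version() {
  store_dir="$1"
  tmp="$(mktemp "${store_dir}/incoming.XXXXXX")" || return 1
  cat > "$tmp"                                  # read the JSON document from stdin
  key="$(sha256sum "$tmp" | cut -d ' ' -f 1)"   # the content hash becomes the key
  mv -f "$tmp" "${store_dir}/${key}.json"       # identical content overwrites itself
  echo "$key"
}

store="$(mktemp -d)"
k1="$(printf '{"kind":"Pod","name":"web-1","ready":true}' | store_version "$store")"
k2="$(printf '{"kind":"Pod","name":"web-1","ready":true}' | store_version "$store")"
k3="$(printf '{"kind":"Pod","name":"web-1","ready":false}' | store_version "$store")"

[ "$k1" = "$k2" ] && echo "unchanged observation reused its key"
[ "$k1" != "$k3" ] && echo "changed object got a new key"
ls "$store" | wc -l   # two stored versions for three observations
```

This sketch hashes raw bytes, so the same object serialized with different key order or whitespace would get a different key. A real store would likely normalize the JSON first, which matches the normalize step listed under filters.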

Storage modes

Start local. Keep history central when the team needs it.

default local / smallest start

SQLite

A pure-Go default artifact with one local evidence database for first captures, laptops, CI fixtures, and local agent workflows.

local ClickHouse shape / embedded analytics

chDB

A chDB-enabled artifact when you want ClickHouse-compatible local tables without operating a ClickHouse server.

central history / team service

ClickHouse

A continuous evidence service for append-heavy history, compression, API/MCP reads, and future cold-tiering work.

Next steps

Use the docs site when you are ready for the full path.

Installation, MCP usage, SQL recipes, security, retention, and storage-mode tradeoffs are rendered in the docs site from the project repository source.
