Agent SQL Workflows
Agents should start with query schema, then use read-only SQL to investigate.
Wrapper commands and typed MCP/API tools are convenience shortcuts; SQL is the
primary interface because it lets agents compose new joins, rank candidates, and
ask follow-up questions without waiting for product-specific endpoints.
The active storage backend changes the SQL shape. SQLite exposes tables such as
object_facts, object_edges, object_changes, object_observations,
latest_index, and clusters. ClickHouse and chDB expose the
ClickHouse-compatible tables facts, edges, changes, observations,
versions, api_resources, and ingestion_offsets. Always read schema notes
first and adapt examples to the reported backend.
kube-insight query schema --db kubeinsight.dbkube-insight query sql --db kubeinsight.db --max-rows 50 --sql '...'The same primitives are available over the read-only HTTP API and MCP. Both
follow the configured storage.driver:
kube-insight serve api --db kubeinsight.db --listen 127.0.0.1:8080curl http://127.0.0.1:8080/api/v1/schemacurl -X POST http://127.0.0.1:8080/api/v1/sql \ -H 'content-type: application/json' \ -d '{"sql":"select name from latest_index limit 10","maxRows":10}'curl http://127.0.0.1:8080/api/v1/healthcurl 'http://127.0.0.1:8080/api/v1/history?kind=Pod&namespace=default&name=api-1&maxVersions=5&maxObservations=20'schema returns more than raw columns:
tables: tables/views, descriptions, columns, and indexes,relationships: stable join hints such as latest object -> kind, object -> versions/blobs, edge source/target, and collector health,recipes: small read-only SQL templates agents can adapt instead of calling scenario-specific tools,notes: global rules such as timestamp format and proof/index guidance.
SQLite timestamps are Unix milliseconds. ClickHouse-compatible timestamps are
DateTime64 UTC unless a column explicitly stores milliseconds. The examples
below are SQLite examples first because the default artifact uses SQLite. Use the
ClickHouse-compatible section when schema notes report ClickHouse or chDB.
For SQLite, use latest_raw_documents for the latest observed sanitized cluster
snapshot, and latest_documents for the latest retained/normalized proof
document.
SQL-First Investigation Loop
Section titled “SQL-First Investigation Loop”Use typed API/MCP endpoints as guardrails and summaries, not as the main investigation surface. A reliable agent loop is:
- Detect backend with schema.
- Check collector coverage from
ingestion_offsets. - Select one
cluster_idand keep it in every query. - Inventory available
fact_keyvalues before guessing what to search. - Query
factsandchangesto rank candidate objects. - Query
edgesto expand topology around candidateobject_idvalues. - Query
observationsandversionsfor proof timestamps, resource versions, document hashes, and retained JSON when needed.
For ClickHouse-compatible backends, ingestion_offsets is append-only. Always
collapse it with argMax(..., updated_at) before judging current coverage. A
plain group by status mixes old and current states.
Cluster Scope Pattern
Section titled “Cluster Scope Pattern”Start every investigation by identifying cluster scope. SQLite has a clusters
table:
select id, name, source from clusters order by id;ClickHouse-compatible stores keep the cluster ID directly on evidence rows:
select cluster_id, count() as rowsfrom versionsgroup by cluster_idorder by rows desc;Then keep the selected cluster_id in follow-up queries. This avoids broad
table scans in large multi-cluster databases and keeps agent queries predictable.
The SQLite examples below use cluster_id = 1; replace it with the selected
cluster.
Coverage First
Section titled “Coverage First”Before making a strong claim, inspect collector health:
select c.name as cluster, ar.api_group, ar.api_version, ar.resource, coalesce(io.status, 'not_started') as status, io.error, datetime(io.updated_at / 1000, 'unixepoch') as updated_atfrom clusters cjoin api_resources ar on ar.removed_at is nullleft join ingestion_offsets io on io.cluster_id = c.id and io.api_resource_id = ar.idwhere coalesce(io.status, 'not_started') in ('not_started', 'retrying', 'list_error', 'watch_error') and c.id = 1order by status, ar.api_group, ar.resourcelimit 50;Treat queued as healthy progress: the resource completed its initial LIST and
is waiting for a watch stream slot. Treat retrying as degraded coverage: the
resource has listed/watched before, but the stream is reconnecting and may lag
until a bookmark or watch event lands.
ClickHouse-Compatible Query Shape
Section titled “ClickHouse-Compatible Query Shape”Use these examples when query schema or kube_insight_schema reports
Active SQL backend: ClickHouse-compatible. They work for remote ClickHouse and
for the chDB-enabled local variant.
Check collector coverage with the current-state offset table:
with latest as ( select cluster_id, api_group, api_version, resource, kind, argMax(status, updated_at) as status, argMax(error, updated_at) as error, max(updated_at) as latest_update from ingestion_offsets group by cluster_id, api_group, api_version, resource, kind)select cluster_id, status, count() as resources, max(latest_update) as latest_updatefrom latestgroup by cluster_id, statusorder by resources desc;Show only degraded resources:
with latest as ( select cluster_id, api_group, api_version, resource, kind, argMax(status, updated_at) as status, argMax(error, updated_at) as error, max(updated_at) as latest_update from ingestion_offsets group by cluster_id, api_group, api_version, resource, kind)select cluster_id, api_group, api_version, resource, kind, status, error, latest_updatefrom latestwhere status in ('retrying', 'list_error', 'watch_error')order by latest_update desclimit 50;Inventory available facts before choosing a predicate:
select kind, fact_key, severity, count() as rows, max(ts) as latestfrom factswhere cluster_id = 'c1'group by kind, fact_key, severityorder by severity desc, rows desclimit 100;Useful Service LoadBalancer predicates include service.load_balancer.pending,
service.load_balancer.ingress_ip, service.load_balancer.ingress_hostname,
and service.load_balancer.ingress_count. Use these facts first, then open
versions.doc only when exact proof JSON is needed.
Find high-severity Pod candidates:
select ts, object_id, namespace, name, fact_key, fact_value, severityfrom factswhere cluster_id = 'c1' and kind = 'Pod' and severity >= 90order by ts desclimit 50;Compare status changes for the same candidates:
select ts, object_id, kind, namespace, name, change_family, path, op, old_scalar, new_scalar, severityfrom changeswhere cluster_id = 'c1' and severity >= 90order by ts desclimit 50;Find recent Event facts:
select ts as observed_at, object_id, fact_key, fact_value, severityfrom factswhere cluster_id = 'c1' and fact_key = 'k8s_event.reason' and fact_value in ('PolicyViolation', 'FailedScheduling', 'BackOff')order by ts desclimit 100;Follow topology edges around a known object ID:
select edge_type, src_id, dst_id, valid_from, valid_tofrom edgeswhere cluster_id = 'c1' and (src_id = 'c1/example-uid' or dst_id = 'c1/example-uid')order by valid_from desclimit 100;Pull proof observations and versions after narrowing candidates:
select observed_at, observation_type, resource, kind, namespace, name, resource_version, partialfrom observationswhere cluster_id = 'c1' and uid = 'example-uid'order by observed_at desclimit 20;
select object_id, kind, namespace, name, observed_at, resource_version, doc_hash, raw_size, stored_sizefrom versionswhere cluster_id = 'c1' and object_id = 'c1/example-uid'order by observed_at desclimit 10;Use typed commands or MCP tools for object history and service investigation when they package the final answer more cleanly. They hide backend-specific joins and return the same product DTO across SQLite, ClickHouse, and chDB, but SQL remains the main exploratory interface.
Webhook Broke GitOps
Section titled “Webhook Broke GitOps”Find admission webhooks that fail closed and the Service they call:
select c.name as cluster, ok.kind, li.name as webhook_config, f.fact_key, f.fact_value, f.severity, datetime(f.ts / 1000, 'unixepoch') as observed_atfrom object_facts fjoin clusters c on c.id = f.cluster_idjoin latest_index li on li.object_id = f.object_idjoin object_kinds ok on ok.id = f.kind_idwhere ok.kind in ('ValidatingWebhookConfiguration', 'MutatingWebhookConfiguration') and f.cluster_id = 1 and f.fact_key in ('admission_webhook.failure_policy', 'admission_webhook.service')order by f.severity desc, f.ts desclimit 100;ClickHouse-compatible version:
select ts, object_id, kind, namespace, name, fact_key, fact_value, severityfrom factswhere cluster_id = 'c1' and kind in ('ValidatingWebhookConfiguration', 'MutatingWebhookConfiguration') and fact_key in ('admission_webhook.failure_policy', 'admission_webhook.service')order by severity desc, ts desclimit 100;Connect webhook error Events to likely webhook configs by text:
select ev.namespace, ev.name as event_name, reason.fact_value as event_reason, preview.fact_value as message_preview, datetime(reason.ts / 1000, 'unixepoch') as event_timefrom object_facts previewjoin objects ev on ev.id = preview.object_idjoin object_kinds evk on evk.id = preview.kind_idjoin object_facts reason on reason.object_id = ev.id and reason.fact_key = 'k8s_event.reason'where evk.kind = 'Event' and preview.cluster_id = 1 and preview.fact_key = 'k8s_event.message_preview' and lower(preview.fact_value) like '%webhook%'order by reason.ts desclimit 50;Events And Involved Resources
Section titled “Events And Involved Resources”Show warning Events and the object they point at through extracted edges:
select ev.namespace as event_namespace, ev.name as event_name, dst_kind.kind as involved_kind, dst.namespace as involved_namespace, dst.name as involved_name, reason.fact_value as reason, preview.fact_value as message_preview, datetime(reason.ts / 1000, 'unixepoch') as event_timefrom object_facts reasonjoin objects ev on ev.id = reason.object_idjoin object_edges e on e.src_id = ev.idjoin objects dst on dst.id = e.dst_idjoin object_kinds dst_kind on dst_kind.id = dst.kind_idleft join object_facts preview on preview.version_id = reason.version_id and preview.fact_key = 'k8s_event.message_preview'where reason.fact_key = 'k8s_event.reason' and reason.cluster_id = 1 and reason.fact_key <> 'k8s_event.message_preview' and reason.severity >= 40order by reason.ts desclimit 100;RBAC Role And Binding Chain
Section titled “RBAC Role And Binding Chain”Find RoleBinding or ClusterRoleBinding changes and their referenced roles or subjects:
select ok.kind as binding_kind, li.namespace, li.name as binding_name, f.fact_key, f.fact_value, datetime(f.ts / 1000, 'unixepoch') as observed_atfrom object_facts fjoin latest_index li on li.object_id = f.object_idjoin object_kinds ok on ok.id = f.kind_idwhere ok.kind in ('RoleBinding', 'ClusterRoleBinding') and f.cluster_id = 1 and f.fact_key like 'rbac.%'order by f.ts desclimit 100;Find role rule mutations during an outage window:
select ok.kind, o.namespace, o.name, ch.path, ch.old_scalar, ch.new_scalar, datetime(ch.ts / 1000, 'unixepoch') as changed_atfrom object_changes chjoin objects o on o.id = ch.object_idjoin object_kinds ok on ok.id = o.kind_idwhere ok.kind in ('Role', 'ClusterRole') and ch.cluster_id = 1 and (ch.path like '%rules%' or ch.change_family = 'rbac')order by ch.ts desclimit 100;cert-manager Certificate Chain
Section titled “cert-manager Certificate Chain”Find unhealthy or recently changed cert-manager Certificates and related Secrets or Issuers:
select ok.kind, li.namespace, li.name, f.fact_key, f.fact_value, f.severity, datetime(f.ts / 1000, 'unixepoch') as observed_atfrom object_facts fjoin latest_index li on li.object_id = f.object_idjoin object_kinds ok on ok.id = f.kind_idwhere ok.kind in ('Certificate', 'CertificateRequest', 'Issuer', 'ClusterIssuer') and f.cluster_id = 1 and f.fact_key like 'status_condition.%'order by f.severity desc, f.ts desclimit 100;Follow extracted cert-manager edges:
select src_kind.kind as source_kind, src.namespace as source_namespace, src.name as source_name, e.edge_type, dst_kind.kind as target_kind, dst.namespace as target_namespace, dst.name as target_namefrom object_edges ejoin objects src on src.id = e.src_idjoin object_kinds src_kind on src_kind.id = src.kind_idjoin objects dst on dst.id = e.dst_idjoin object_kinds dst_kind on dst_kind.id = dst.kind_idwhere e.edge_type like 'certmanager_%' and e.cluster_id = 1order by e.valid_from desclimit 100;CRD Schema Or Conversion Regression
Section titled “CRD Schema Or Conversion Regression”Find CRD conversion webhook changes:
select li.name as crd_name, ch.path, ch.old_scalar, ch.new_scalar, datetime(ch.ts / 1000, 'unixepoch') as changed_atfrom object_changes chjoin latest_index li on li.object_id = ch.object_idjoin object_kinds ok on ok.id = li.kind_idwhere ok.kind = 'CustomResourceDefinition' and ch.cluster_id = 1 and (ch.path like '%conversion%' or ch.path like '%schema%')order by ch.ts desclimit 100;Find custom resources with degraded status conditions after a CRD change:
select ok.kind, li.namespace, li.name, f.fact_key, f.fact_value, f.severity, datetime(f.ts / 1000, 'unixepoch') as observed_atfrom object_facts fjoin latest_index li on li.object_id = f.object_idjoin object_kinds ok on ok.id = f.kind_idwhere f.fact_key like 'status_condition.%' and f.cluster_id = 1 and f.severity >= 50order by f.ts desclimit 100;