Gap Analysis

Gap Analysis identifies topics where your knowledge bases are missing content or where existing content is consistently unhelpful. It shows what agents and customers are asking that your KB can't answer, and helps you fill those gaps with auto-generated draft articles.

How It Works

The system continuously monitors every knowledge search query and tracks two types of gaps:

1. Zero-Hit Queries

Queries where the knowledge search found no matching sources at all. The customer asked a question, the AI searched the knowledge base, and came back empty. These are the clearest signal that content is missing.

Detection: Every `rag_query` event where `sources_count = 0` is recorded. The gap analysis aggregates these by query text and counts how many times the same (or similar) question was asked.
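The aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the product's implementation: the event dictionary shape (`type`, `query`, `sources_count` keys) and the `normalize` helper are assumptions, and grouping "similar" questions is approximated here by lowercasing and collapsing whitespace.

```python
from collections import Counter


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings group together."""
    return " ".join(query.lower().split())


def zero_hit_counts(events: list[dict]) -> Counter:
    """Count rag_query events that returned no sources, keyed by normalized query text."""
    counts: Counter = Counter()
    for event in events:
        # Assumed event shape: {"type": ..., "query": ..., "sources_count": ...}
        if event.get("type") == "rag_query" and event.get("sources_count") == 0:
            counts[normalize(event["query"])] += 1
    return counts


events = [
    {"type": "rag_query", "query": "How do I enroll my spouse?", "sources_count": 0},
    {"type": "rag_query", "query": "how do I enroll my  spouse?", "sources_count": 0},
    {"type": "rag_query", "query": "Reset my password", "sources_count": 2},
]
print(zero_hit_counts(events).most_common())
```

Only the first two events count as zero-hit, and they collapse into a single entry because they differ only in case and spacing.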

2. Rejected-Source Queries

Queries where the knowledge search found sources, but the agent rejected them as unhelpful. The AI returned an answer with citations, and the agent gave the sources a thumbs-down because they were irrelevant, outdated, or incorrect.

Detection: When `source_rejected` events outnumber `source_accepted` events for the same query, and the rejection count reaches the configured minimum (default: 3 rejections), the query is flagged as a gap.

This is often a more important signal than zero-hit queries — it means you have content but it's wrong or misleading, which is worse than having no content at all.
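As a rough sketch of the rejected-source rule, the check below tallies feedback events per query and applies both conditions. The event shape is hypothetical, and the code assumes the threshold is inclusive (a query with exactly the configured number of rejections qualifies), which matches the "Min Rejections" wording but is not confirmed here.

```python
from collections import defaultdict

MIN_REJECTIONS = 3  # documented default; treated as inclusive (assumption)


def rejected_source_gaps(events: list[dict], min_rejections: int = MIN_REJECTIONS) -> list[str]:
    """Return queries whose rejections outnumber acceptances and meet the minimum."""
    tallies = defaultdict(lambda: {"source_rejected": 0, "source_accepted": 0})
    for event in events:
        # Assumed event shape: {"type": "source_rejected" | "source_accepted", "query": ...}
        if event["type"] in ("source_rejected", "source_accepted"):
            tallies[event["query"]][event["type"]] += 1
    return sorted(
        query
        for query, tally in tallies.items()
        if tally["source_rejected"] > tally["source_accepted"]
        and tally["source_rejected"] >= min_rejections
    )


feedback = (
    [{"type": "source_rejected", "query": "cobra coverage"}] * 3
    + [{"type": "source_accepted", "query": "cobra coverage"}]
    + [{"type": "source_rejected", "query": "hsa limits"}] * 2
    + [{"type": "source_rejected", "query": "401k match"}] * 4
    + [{"type": "source_accepted", "query": "401k match"}] * 4
)
print(rejected_source_gaps(feedback))
```

In this sample only "cobra coverage" is flagged: "hsa limits" has too few rejections, and "401k match" has as many acceptances as rejections.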

Gap Analysis Views

Basic Mode (Default)

When you open Gap Analysis, it loads all gap queries for the selected date range. Each row shows:

Column	Description
Query	The exact question the customer/agent asked
Count	How many times this query went unanswered
Gap Type	`zero_hit` (no sources) or `rejected_sources` (sources found but unhelpful)
Priority	High (10+ occurrences), Medium (3-9), Low (1-2)
First Seen	When this gap was first detected
Last Seen	Most recent occurrence

Sorting options:

Frequency (default) — most asked questions first
Recency — most recently occurring gaps first
Priority — high/medium/low priority grouping
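The priority buckets and sort orders above could be expressed along these lines. The row shape and function names are illustrative assumptions, and breaking ties within a priority group by count is a choice made for this sketch rather than documented behavior.

```python
from datetime import date

PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def priority(count: int) -> str:
    """Map an occurrence count to the documented buckets: 10+, 3-9, 1-2."""
    if count >= 10:
        return "high"
    if count >= 3:
        return "medium"
    return "low"


SORT_KEYS = {
    "frequency": lambda row: -row["count"],
    "recency": lambda row: -row["last_seen"].toordinal(),
    # Tie-break by count inside each priority group (assumption).
    "priority": lambda row: (PRIORITY_RANK[priority(row["count"])], -row["count"]),
}


def sort_gaps(rows: list[dict], by: str = "frequency") -> list[dict]:
    """Return a new list of gap rows ordered by the chosen sort option."""
    return sorted(rows, key=SORT_KEYS[by])


rows = [
    {"query": "reset password", "count": 4, "last_seen": date(2024, 5, 20)},
    {"query": "enroll spouse", "count": 12, "last_seen": date(2024, 5, 1)},
    {"query": "dental coverage", "count": 1, "last_seen": date(2024, 5, 10)},
]
print([row["query"] for row in sort_gaps(rows, by="recency")])
```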

Semantic Mode (AI-Powered Clustering)

Click Analyze with AI to run semantic clustering. This groups similar queries together even if they use different words.

For example, these three queries would be clustered into one gap:

"How do I enroll my spouse?"
"Adding a dependent to my plan"
"Can I put my wife on my insurance?"

How semantic clustering works:

Embed — All gap queries are sent to the embedding model (gemini-embedding-001) to generate vector representations
Cluster — Queries with cosine similarity above the configured threshold (default: 0.82) are grouped into clusters using greedy centroid-based clustering
Label — The LLM reads each cluster's sample queries and generates a descriptive topic label and summary
Prioritize — Clusters are scored by total occurrence count across all queries in the cluster
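To make the Cluster step concrete, here is a minimal sketch of one common form of greedy centroid-based clustering. It is an illustration under assumptions, not the product's code: small hand-written 2-D vectors stand in for real embeddings, each query joins the most similar existing cluster whose centroid similarity is above the threshold, and centroids are updated as a running mean.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(vectors: list[list[float]], threshold: float = 0.82) -> list[list[int]]:
    """Assign each vector, in order, to the best cluster above the threshold or start a new one."""
    clusters: list[dict] = []  # each: {"centroid": [...], "members": [indices]}
    for index, vector in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if sim > best_sim:  # strictly above the threshold, and better than any earlier match
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": list(vector), "members": [index]})
        else:
            best["members"].append(index)
            n = len(best["members"])
            # Running mean keeps the centroid equal to the average of its members.
            best["centroid"] = [(c * (n - 1) + v) / n for c, v in zip(best["centroid"], vector)]
    return [cluster["members"] for cluster in clusters]


# Toy vectors standing in for query embeddings (real embeddings have many more dimensions).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.95, 0.05]]
print(greedy_cluster(vectors))
```

Because the pass is greedy, results depend on the order in which queries are processed; a different order can produce slightly different clusters.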

Each semantic cluster shows:

Field	Description
Topic	AI-generated descriptive label (e.g., "Spouse Benefits Enrollment")
Summary	One-sentence description of what customers are asking
Total Count	Sum of all individual query occurrences in the cluster
Query Count	Number of distinct query variations grouped together
Sample Queries	Up to 5 representative queries from the cluster
Gap Type	Dominant type in the cluster (`zero_hit` or `rejected_sources`)
Priority	Based on total count
Transcript Count	Number of conversations where these queries appeared
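The count-based fields in this table can be derived from the member queries of a cluster, as in the sketch below. The input shape is hypothetical, and two details are assumptions: the dominant gap type is weighted by occurrence count, and sample queries are the five most frequent ones.

```python
from collections import Counter

MAX_SAMPLE_QUERIES = 5  # documented limit on sample queries per cluster


def summarize_cluster(queries: list[dict]) -> dict:
    """Compute aggregate fields for a non-empty cluster of gap queries."""
    occurrences_by_type: Counter = Counter()
    for query in queries:
        occurrences_by_type[query["gap_type"]] += query["count"]
    ranked = sorted(queries, key=lambda q: q["count"], reverse=True)
    return {
        "total_count": sum(q["count"] for q in queries),
        "query_count": len(queries),
        "sample_queries": [q["text"] for q in ranked[:MAX_SAMPLE_QUERIES]],
        "gap_type": occurrences_by_type.most_common(1)[0][0],
    }


cluster = [
    {"text": "How do I enroll my spouse?", "count": 7, "gap_type": "zero_hit"},
    {"text": "Adding a dependent to my plan", "count": 4, "gap_type": "zero_hit"},
    {"text": "Can I put my wife on my insurance?", "count": 2, "gap_type": "rejected_sources"},
]
print(summarize_cluster(cluster))
```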

Configuration

Two settings in Settings > Knowledge & AI control gap analysis behavior:

Setting	Default	Description
Min Rejections	3	Minimum source rejections before a query is flagged as a rejected-sources gap
Similarity Threshold	0.82	Cosine similarity threshold for semantic clustering. Lower values create larger, broader clusters. Higher values create more specific clusters.
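One way to picture these two settings is as a small validated configuration object. The class name, field names, and validation ranges below are hypothetical; only the defaults (3 and 0.82) come from the table above. The example shows how a pair of queries with similarity 0.85 would be grouped at a lower threshold but kept apart at a higher one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapAnalysisSettings:
    """Hypothetical container for the two documented settings."""

    min_rejections: int = 3
    similarity_threshold: float = 0.82

    def __post_init__(self) -> None:
        # Validation ranges are assumptions made for this sketch.
        if self.min_rejections < 1:
            raise ValueError("min_rejections must be at least 1")
        if not 0.0 < self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be in (0, 1]")

    def groups_together(self, similarity: float) -> bool:
        """True if two queries with this similarity would fall in the same cluster."""
        return similarity > self.similarity_threshold


broad = GapAnalysisSettings(similarity_threshold=0.70)
defaults = GapAnalysisSettings()
strict = GapAnalysisSettings(similarity_threshold=0.90)
pair_similarity = 0.85
print([s.groups_together(pair_similarity) for s in (broad, defaults, strict)])
```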

Transcript Review

For any gap cluster, click View Transcripts to see the actual conversations where the gap occurred. This shows:

The agent's name
The full message exchange (customer + agent + bot messages)
Up to 10 conversations per cluster, 20 messages each

This helps you understand the context around the gap — what the customer actually needed, how the agent handled it without KB support, and what content would have helped.
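The display limits above (10 conversations per cluster, 20 messages each) amount to a simple truncation step. The sketch below assumes a hypothetical conversation shape and keeps the first items in each list; which conversations and messages the product actually keeps is not specified here.

```python
MAX_CONVERSATIONS = 10  # documented limit per cluster
MAX_MESSAGES = 20       # documented limit per conversation


def limit_transcripts(conversations: list[dict]) -> list[dict]:
    """Return at most 10 conversations, each trimmed to at most 20 messages."""
    return [
        # Copy each conversation so the caller's data is left unchanged.
        {**conversation, "messages": conversation["messages"][:MAX_MESSAGES]}
        for conversation in conversations[:MAX_CONVERSATIONS]
    ]


conversations = [
    {"agent": f"agent-{i}", "messages": [f"message {j}" for j in range(25)]}
    for i in range(12)
]
limited = limit_transcripts(conversations)
print(len(limited), len(limited[0]["messages"]))
```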

Auto-Generate Draft Articles

The most powerful feature: click Generate Draft on any gap cluster to automatically create a knowledge base article.

How it works:

The system fetches the conversation transcripts for the cluster's queries (up to 5 conversations)
An LLM reads the transcripts and the sample queries
It generates a well-structured markdown article based on what the agents actually said to customers — not invented content
The article includes: title, introduction, organized sections with headers, step-by-step instructions where applicable, and important notes
The draft is saved to the selected knowledge base as a markdown document with a [DRAFT] prefix in the title

If no transcripts are available (e.g., voice-only conversations without transcript storage), the system generates a structured outline with placeholder markers indicating where content should be added.
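The branching described above (transcripts available versus the outline fallback) might look roughly like this. Everything here is a sketch under assumptions: the function name, the prompt wording, the outline section names, and the `[TODO: add content]` placeholder marker are invented for illustration, and the actual LLM call is left out.

```python
DRAFT_PREFIX = "[DRAFT]"
MAX_DRAFT_CONVERSATIONS = 5          # documented limit for draft generation
PLACEHOLDER = "[TODO: add content]"  # hypothetical placeholder marker


def build_draft_input(topic: str, sample_queries: list[str], transcripts: list[str]) -> dict:
    """Prepare either an LLM prompt (transcripts exist) or a placeholder outline (none exist)."""
    title = f"{DRAFT_PREFIX} {topic}"
    selected = transcripts[:MAX_DRAFT_CONVERSATIONS]
    if not selected:
        outline = "\n".join([
            f"# {topic}", "",
            "## Overview", PLACEHOLDER, "",
            "## Steps", PLACEHOLDER, "",
            "## Important Notes", PLACEHOLDER,
        ])
        return {"title": title, "mode": "outline", "body": outline}
    prompt = "\n\n".join([
        "Write a markdown knowledge base article using only what agents said in these transcripts.",
        "Customer questions:\n" + "\n".join(f"- {query}" for query in sample_queries),
        "Transcripts:\n" + "\n---\n".join(selected),
    ])
    return {"title": title, "mode": "llm", "prompt": prompt}


fallback = build_draft_input("Spouse Benefits Enrollment", ["How do I enroll my spouse?"], [])
print(fallback["title"])
print(fallback["body"])
```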

After generation:

The draft appears in your knowledge base document list with [DRAFT] in the title
Review and edit the content before indexing
Once satisfied, trigger indexing to make it searchable
The gap's occurrence count should decrease as agents start getting answers for those queries

How Gaps Feed Back Into the System

Gap analysis creates a continuous improvement loop:

Customer asks question
  → KB search finds nothing (gap detected)
  → Gap appears in analysis
  → Admin generates draft from transcripts
  → Article added to KB and indexed
  → Next time customer asks → KB search finds the new article
  → Agent gets a suggestion → gap resolved

The gap count for that topic should decrease over time. If it doesn't, the article may need improvement — check the feedback analytics to see if agents are accepting or rejecting the new sources.
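A simple way to check whether a gap is shrinking is to compare occurrences in equal windows before and after the article was indexed. The sketch below is one possible approach, not a built-in report: the function name and the 14-day window are assumptions.

```python
from datetime import date


def gap_trend(occurrences: list[date], indexed_on: date, window_days: int = 14) -> dict:
    """Compare gap occurrences in equal-length windows before and after indexing."""
    before = sum(1 for day in occurrences if 0 < (indexed_on - day).days <= window_days)
    after = sum(1 for day in occurrences if 0 <= (day - indexed_on).days < window_days)
    return {"before": before, "after": after, "improved": after < before}


occurrences = [
    date(2024, 5, 1), date(2024, 5, 3), date(2024, 5, 8), date(2024, 5, 12),  # before indexing
    date(2024, 5, 16), date(2024, 5, 20),                                     # after indexing
]
print(gap_trend(occurrences, indexed_on=date(2024, 5, 15)))
```

If `improved` stays false after a reasonable period, that is a cue to review the article and the source feedback for that topic.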

Endpoints

Method	Path	Purpose
GET	`/knowledge/gaps`	List gap clusters (basic mode) with date/sort filters
GET	`/knowledge/unanswered`	Paginated list of individual unanswered queries
POST	`/knowledge/gaps/analyze`	Run semantic clustering with LLM labeling
POST	`/knowledge/gaps/transcripts`	Fetch conversation transcripts for a cluster's queries
POST	`/knowledge/gaps/draft`	Generate and save a KB article draft
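For orientation, the sketch below builds (but does not send) requests for two of these endpoints using Python's standard library. Only the methods and paths come from the table; the base URL, the query parameter names (`start`, `end`, `sort`), and the JSON body fields are hypothetical, so check the actual API reference before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://example.invalid/api"  # hypothetical base URL


def list_gaps_request(start: str, end: str, sort: str = "frequency") -> Request:
    """Build a GET /knowledge/gaps request; parameter names are assumptions."""
    query = urlencode({"start": start, "end": end, "sort": sort})
    return Request(f"{BASE_URL}/knowledge/gaps?{query}", method="GET")


def draft_request(payload: dict) -> Request:
    """Build a POST /knowledge/gaps/draft request with a JSON body (body fields are assumptions)."""
    return Request(
        f"{BASE_URL}/knowledge/gaps/draft",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


gaps = list_gaps_request("2024-05-01", "2024-05-31")
draft = draft_request({"topic": "Spouse Benefits Enrollment", "queries": ["How do I enroll my spouse?"]})
print(gaps.get_method(), gaps.full_url)
print(draft.get_method(), draft.full_url)
```

Passing either object to `urllib.request.urlopen` would send it; authentication is omitted because it is not described in this document.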

Tips

Start with High-Priority Gaps

Focus on gaps with 10+ occurrences first. These represent the questions agents hear most often without KB support. Even one article covering a high-frequency gap can significantly improve answer rates.

Use Semantic Mode for Patterns

Basic mode shows exact queries. Semantic mode reveals patterns — you might see 20 different wordings for the same question. One article can address the entire cluster.

Review Before Publishing

Auto-generated drafts are based on what agents said in conversations. Agents may have given incorrect or incomplete information. Always review drafts for accuracy before indexing.

Rejected Sources vs Zero Hits

A rejected-sources gap is often more urgent than a zero-hit gap. Zero hits mean missing content — agents know they need to find the answer elsewhere. Rejected sources mean the AI is confidently showing wrong content, which can mislead agents.
