Gap Analysis
Gap Analysis identifies topics where your knowledge bases are missing content or where existing content is consistently unhelpful. It shows you exactly what agents and customers are asking about that your KB can't answer — and helps you fill those gaps automatically.
How It Works
The system continuously monitors every knowledge search query and tracks two types of gaps:
1. Zero-Hit Queries
Queries where the knowledge search found no matching sources at all. The customer asked a question, the AI searched the knowledge base, and came back empty. These are the clearest signal that content is missing.
Detection: Every rag_query event where sources_count = 0 is recorded. The gap analysis aggregates these by query text and counts how many times the same (or similar) question was asked.
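The aggregation step can be sketched as follows. This is a minimal illustration, not the product's implementation: the event dictionaries, the `type` and `query` keys, and the lowercase/trim normalization are assumptions made for the example. Only `rag_query` and `sources_count` come from the description above, and matching "similar" questions is left to semantic mode.

```python
from collections import Counter


def count_zero_hit_queries(events):
    """Count zero-hit knowledge searches per normalized query text."""
    counts = Counter()
    for event in events:
        if event.get("type") == "rag_query" and event.get("sources_count") == 0:
            # Assumption: trimming and lowercasing is enough to merge exact repeats.
            counts[event["query"].strip().lower()] += 1
    return counts


# Hypothetical sample events for illustration only.
events = [
    {"type": "rag_query", "query": "How do I enroll my spouse?", "sources_count": 0},
    {"type": "rag_query", "query": "how do i enroll my spouse? ", "sources_count": 0},
    {"type": "rag_query", "query": "What is my deductible?", "sources_count": 2},
]
zero_hits = count_zero_hit_queries(events)
```

Here the two spouse-enrollment searches collapse into a single entry, while the deductible query is ignored because it returned sources.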
2. Rejected-Source Queries
Queries where the knowledge search found sources, but the agent rejected them as unhelpful. The AI returned an answer with citations, but the agent thumbs-downed the sources because they were irrelevant, outdated, or incorrect.
Detection: When source_rejected events outnumber source_accepted events for the same query, and the rejection count exceeds the configured threshold (default: 3 rejections), the query is flagged as a gap.
This is often a more important signal than zero-hit queries — it means you have content but it's wrong or misleading, which is worse than having no content at all.
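The rejected-source rule can be expressed as a small filter. The sketch below assumes feedback events arrive as dictionaries with hypothetical `type` and `query` keys. It also treats the threshold as a minimum (a query with exactly 3 rejections qualifies), which is one reading of the setting; the actual comparison may be strict.

```python
from collections import defaultdict


def find_rejected_source_gaps(events, min_rejections=3):
    """Return queries whose sources were rejected more often than accepted."""
    tallies = defaultdict(lambda: {"accepted": 0, "rejected": 0})
    for event in events:
        if event["type"] == "source_accepted":
            tallies[event["query"]]["accepted"] += 1
        elif event["type"] == "source_rejected":
            tallies[event["query"]]["rejected"] += 1
    return [
        query
        for query, tally in tallies.items()
        # Both conditions must hold: rejections outnumber acceptances,
        # and the rejection count meets the configured minimum (assumed >=).
        if tally["rejected"] > tally["accepted"]
        and tally["rejected"] >= min_rejections
    ]


# Hypothetical feedback events for illustration only.
feedback = (
    [{"type": "source_rejected", "query": "cobra coverage"}] * 3
    + [{"type": "source_accepted", "query": "cobra coverage"}]
    + [{"type": "source_rejected", "query": "hsa limits"}] * 4
    + [{"type": "source_accepted", "query": "hsa limits"}] * 5
    + [{"type": "source_rejected", "query": "dental waiting period"}] * 2
)
gaps = find_rejected_source_gaps(feedback)
```

Only "cobra coverage" is flagged: "hsa limits" has more acceptances than rejections, and "dental waiting period" falls below the minimum.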
Gap Analysis Views
Basic Mode (Default)
When you open Gap Analysis, it loads all gap queries for the selected date range. Each row shows:
| Column | Description |
|---|---|
| Query | The exact question the customer/agent asked |
| Count | How many times this query went unanswered |
| Gap Type | zero_hit (no sources) or rejected_sources (sources found but unhelpful) |
| Priority | High (10+ occurrences), Medium (3-9), Low (1-2) |
| First Seen | When this gap was first detected |
| Last Seen | Most recent occurrence |
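The priority buckets in the table map directly onto occurrence counts. A minimal sketch, where the function name `gap_priority` is hypothetical:

```python
def gap_priority(count: int) -> str:
    """Map an occurrence count to the priority bucket shown in the table."""
    if count >= 10:
        return "High"
    if count >= 3:
        return "Medium"
    return "Low"
```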
Sorting options:
- Frequency (default) — most asked questions first
- Recency — most recently occurring gaps first
- Priority — high/medium/low priority grouping
Semantic Mode (AI-Powered Clustering)
Click Analyze with AI to run semantic clustering. This groups similar queries together even if they use different words.
For example, these three queries would be clustered into one gap:
- "How do I enroll my spouse?"
- "Adding a dependent to my plan"
- "Can I put my wife on my insurance?"
How semantic clustering works:
- Embed — All gap queries are sent to the embedding model (gemini-embedding-001) to generate vector representations
- Cluster — Queries with cosine similarity above the configured threshold (default: 0.82) are grouped into clusters using greedy centroid-based clustering
- Label — The LLM reads each cluster's sample queries and generates a descriptive topic label and summary
- Prioritize — Clusters are scored by total occurrence count across all queries in the cluster
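To make the Cluster step concrete, here is one way greedy centroid-based clustering can work. It is a sketch under assumptions, not the product's code: each query joins the most similar existing cluster if its cosine similarity to that cluster's centroid is above the threshold, otherwise it starts a new cluster, and centroids are updated as a running mean. The vectors are tiny hand-written stand-ins for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_cluster(vectors, threshold=0.82):
    """Assign each vector to the best cluster above threshold, else open a new one."""
    clusters = []  # each entry: {"centroid": [...], "members": [vector indices]}
    for index, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": list(vec), "members": [index]})
        else:
            best["members"].append(index)
            n = len(best["members"])
            # Running mean keeps the centroid equal to the average of its members.
            best["centroid"] = [c + (v - c) / n for c, v in zip(best["centroid"], vec)]
    return [cluster["members"] for cluster in clusters]


# Toy 2-D "embeddings": the first two point in nearly the same direction.
groups = greedy_cluster([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
```

Because the pass is greedy, results can depend on the order in which queries are processed. Lowering the threshold merges more queries into each cluster, which matches the behavior described for the Similarity Threshold setting.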
Each semantic cluster shows:
| Field | Description |
|---|---|
| Topic | AI-generated descriptive label (e.g., "Spouse Benefits Enrollment") |
| Summary | One-sentence description of what customers are asking |
| Total Count | Sum of all individual query occurrences in the cluster |
| Query Count | Number of distinct query variations grouped together |
| Sample Queries | Up to 5 representative queries from the cluster |
| Gap Type | Dominant type in the cluster (zero_hit or rejected_sources) |
| Priority | Based on total count |
| Transcript Count | Number of conversations where these queries appeared |
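The count-based fields can be derived from the queries in a cluster, as the following sketch shows. The input shape (`text`, `count`, `gap_type` keys) is hypothetical, and two choices are assumptions rather than documented behavior: the dominant gap type is weighted by occurrence count, and the sample queries are the most frequent ones.

```python
from collections import Counter


def summarize_cluster(queries, max_samples=5):
    """Build the count-based fields for one cluster of gap queries."""
    by_type = Counter()
    for q in queries:
        by_type[q["gap_type"]] += q["count"]
    ranked = sorted(queries, key=lambda q: q["count"], reverse=True)
    return {
        "total_count": sum(q["count"] for q in queries),
        "query_count": len(queries),
        "sample_queries": [q["text"] for q in ranked[:max_samples]],
        "gap_type": by_type.most_common(1)[0][0],
    }


# Hypothetical cluster for illustration only.
summary = summarize_cluster([
    {"text": "Adding a dependent to my plan", "count": 3, "gap_type": "zero_hit"},
    {"text": "How do I enroll my spouse?", "count": 6, "gap_type": "zero_hit"},
    {"text": "Can I put my wife on my insurance?", "count": 2, "gap_type": "rejected_sources"},
])
```

In this example the cluster has a total count of 11 across 3 distinct queries, with `zero_hit` as the dominant type.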
Configuration
Two settings in Settings > Knowledge & AI control gap analysis behavior:
| Setting | Default | Description |
|---|---|---|
| Min Rejections | 3 | Minimum source rejections before a query is flagged as a rejected-sources gap |
| Similarity Threshold | 0.82 | Cosine similarity threshold for semantic clustering. Lower values create larger, broader clusters. Higher values create more specific clusters. |
Transcript Review
For any gap cluster, click View Transcripts to see the actual conversations where the gap occurred. This shows:
- The agent's name
- The full message exchange (customer + agent + bot messages)
- Up to 10 conversations per cluster, with up to 20 messages each
This helps you understand the context around the gap — what the customer actually needed, how the agent handled it without KB support, and what content would have helped.
Auto-Generate Draft Articles
The most powerful feature: click Generate Draft on any gap cluster to automatically create a knowledge base article.
How it works:
- The system fetches the conversation transcripts for the cluster's queries (up to 5 conversations)
- An LLM reads the transcripts and the sample queries
- It generates a well-structured markdown article based on what the agents actually said to customers — not invented content
- The article includes: title, introduction, organized sections with headers, step-by-step instructions where applicable, and important notes
- The draft is saved to the selected knowledge base as a markdown document with a [DRAFT] prefix
If no transcripts are available (e.g., voice-only conversations without transcript storage), the system generates a structured outline with placeholder markers indicating where content should be added.
After generation:
- The draft appears in your knowledge base document list with [DRAFT] in the title
- Review and edit the content before indexing
- Once satisfied, trigger indexing to make it searchable
- The gap should decrease as agents start getting answers for those queries
How Gaps Feed Back Into the System
Gap analysis creates a continuous improvement loop:
Customer asks question
→ KB search finds nothing (gap detected)
→ Gap appears in analysis
→ Admin generates draft from transcripts
→ Article added to KB and indexed
→ Next time customer asks → KB search finds the new article
→ Agent gets a suggestion → gap resolved
The gap count for that topic should decrease over time. If it doesn't, the article may need improvement: check the feedback analytics to see whether agents are accepting or rejecting the new sources.
Endpoints
| Method | Path | Purpose |
|---|---|---|
| GET | /knowledge/gaps | List gaps (basic mode) with date and sort filters |
| GET | /knowledge/unanswered | Paginated list of individual unanswered queries |
| POST | /knowledge/gaps/analyze | Run semantic clustering with LLM labeling |
| POST | /knowledge/gaps/transcripts | Fetch conversation transcripts for a cluster's queries |
| POST | /knowledge/gaps/draft | Generate and save a KB article draft |
Tips
Start with High-Priority Gaps
Focus on gaps with 10+ occurrences first. These represent the questions agents hear most often without KB support. Even one article covering a high-frequency gap can significantly improve answer rates.
Use Semantic Mode for Patterns
Basic mode shows exact queries. Semantic mode reveals patterns — you might see 20 different wordings for the same question. One article can address the entire cluster.
Review Before Publishing
Auto-generated drafts are based on what agents said in conversations. Agents may have given incorrect or incomplete information. Always review drafts for accuracy before indexing.
Rejected Sources vs Zero Hits
A rejected-sources gap is often more urgent than a zero-hit gap. Zero hits mean missing content — agents know they need to find the answer elsewhere. Rejected sources mean the AI is confidently showing wrong content, which can mislead agents.
