Reranking & Feedback Scoring
Agent Assist uses a multi-stage ranking pipeline to ensure the most relevant knowledge base content surfaces for each query. Agent feedback (thumbs up/down on sources) directly influences future search results through a feedback scoring system.
Search Pipeline Overview
```
Customer utterance
        │
        ▼
Classifier (is this a meaningful query?)
        │                        ┌──────────────────────┐
        ▼                        │  Runs in PARALLEL    │
Vector Search (pgvector)         │  to save ~150ms      │
        │                        └──────────────────────┘
        ▼
Min Score Filter (drop low-relevance chunks)
        │
        ▼
Feedback Boost (re-sort by feedback-adjusted score)
        │
        ▼
Top K Results → Context → LLM → Answer
```
Stage 1: Vector Similarity Search
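Vector similarity is the foundation of every search. Each query is converted to a vector embedding and compared, using cosine similarity, against the embeddings of every eligible chunk (same tenant, selected knowledge bases, embedding completed).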
```sql
SELECT content, 1 - (embedding <=> query_embedding) AS score
FROM chunks
WHERE tenant_id = :tenant_id
  AND knowledge_base_id = ANY(:kb_ids)
  AND embedding_status = 'completed'
ORDER BY embedding <=> query_embedding
LIMIT :top_k
```
- Score range: 0.0 (no match) to 1.0 (identical)
- Min score threshold: Configurable per tenant (default 0.5). Chunks below this score are discarded.
- Embedding model: Vertex AI gemini-embedding-001 (3072 dimensions)
- Suppressed chunks excluded: Chunks with embedding_status = 'suppressed' are filtered out at the SQL level and never appear in results.
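The min score filter can be pictured as a simple post-processing step over the rows returned by the query above. This is a minimal sketch, not the actual implementation: the `ScoredChunk` type, the `apply_min_score` function, and the sample rows are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredChunk:
    """Hypothetical container for one row returned by the vector search."""
    content: str
    score: float  # 1 - cosine distance, as computed in the SQL query


def apply_min_score(chunks: list[ScoredChunk], min_score: float = 0.5) -> list[ScoredChunk]:
    """Drop chunks scoring below the tenant threshold, preserving order."""
    return [c for c in chunks if c.score >= min_score]


# Illustrative rows only; real content comes from the knowledge base.
candidates = [
    ScoredChunk("Refund policy for annual plans", 0.82),
    ScoredChunk("Office holiday schedule", 0.41),
    ScoredChunk("How to cancel a subscription", 0.50),
]

kept = apply_min_score(candidates)
for chunk in kept:
    print(f"{chunk.score:.2f}  {chunk.content}")
```

The sketch keeps a chunk that scores exactly at the threshold, since only chunks below the threshold are described as discarded.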
Stage 2: Feedback-Boosted Reranking
When enabled, agent feedback adjusts the ranking of search results. Chunks that agents consistently find helpful are boosted higher; chunks they reject sink lower.
The Formula
```
combined_score = vector_score × (1 + weight × feedback_score × confidence)
```
Where:
- vector_score: Original cosine similarity (0.0 to 1.0)
- weight: How much influence feedback has (default: 0.15 = 15%)
- feedback_score: Running average of agent votes (-1.0 to +1.0)
- confidence: How much to trust the feedback score, based on vote count

```
confidence = min(feedback_count, max_influence) / max_influence
```
The max_influence cap (default: 20 votes) prevents any single chunk from having outsized influence. After 20 votes, additional votes still update the running average but don't increase confidence further.
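To make the cap concrete, the confidence term can be sketched as a one-line function. The function name `confidence` and its default argument simply mirror the formula and the documented default. This is an illustration, not the product's code.

```python
def confidence(feedback_count: int, max_influence: int = 20) -> float:
    """Fraction of full trust earned so far; saturates at max_influence votes."""
    return min(feedback_count, max_influence) / max_influence


# Confidence grows linearly with votes, then stays flat once the cap is reached.
for votes in (0, 5, 20, 50):
    print(votes, confidence(votes))
```

With the default cap, a chunk with 5 votes gets a quarter of the full feedback effect, while 20 votes and 50 votes are treated identically.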
Example
| Chunk | Vector Score | Feedback Score | Vote Count | Confidence | Combined Score |
|---|---|---|---|---|---|
| A | 0.82 | +0.6 | 15 | 0.75 | 0.82 × (1 + 0.15 × 0.6 × 0.75) = 0.875 |
| B | 0.85 | -0.3 | 8 | 0.40 | 0.85 × (1 + 0.15 × -0.3 × 0.40) = 0.835 |
| C | 0.80 | +0.8 | 25 | 1.00 | 0.80 × (1 + 0.15 × 0.8 × 1.0) = 0.896 |
Result: Chunk C (lower vector score but strong positive feedback) ranks first. Chunk B (highest vector score but negative feedback) drops to last.
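The example can be reproduced with a short sketch that applies the formula and re-sorts the candidates. The `Candidate` type and `combined_score` function are hypothetical names. The numbers are the ones from the table, with the default weight (0.15) and max influence (20).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical search candidate carrying its feedback statistics."""
    name: str
    vector_score: float
    feedback_score: float
    feedback_count: int


def combined_score(c: Candidate, weight: float = 0.15, max_influence: int = 20) -> float:
    """vector_score x (1 + weight x feedback_score x confidence)."""
    conf = min(c.feedback_count, max_influence) / max_influence
    return c.vector_score * (1 + weight * c.feedback_score * conf)


candidates = [
    Candidate("A", 0.82, 0.6, 15),
    Candidate("B", 0.85, -0.3, 8),
    Candidate("C", 0.80, 0.8, 25),
]

# Re-sort by the feedback-adjusted score, highest first.
ranked = sorted(candidates, key=combined_score, reverse=True)
for c in ranked:
    print(c.name, round(combined_score(c), 3))
```

Run as written, this orders the chunks C, A, B, matching the table.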
Configuration
Configure in Settings > Feedback Rerank in the Agent Assist Portal:
| Setting | Default | Range | Description |
|---|---|---|---|
| Feedback Rerank Enabled | Off | on/off | Toggle feedback-adjusted scoring |
| Feedback Weight | 0.15 | 0.0 – 1.0 | How much feedback influences ranking. 0.0 = no influence, 1.0 = feedback dominates. |
| Max Influence | 20 | 1 – 100 | Vote count cap for confidence calculation. Higher = requires more votes to reach full confidence. |
TIP
Start with the defaults (15% weight, 20 max influence). Only increase the weight if you have high feedback volume and trust your agents' judgment. Setting it too high can cause feedback bias — popular but generic content may outrank specific, relevant content.
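One way to reason about the weight is that it bounds the adjustment. Because feedback_score lies in [-1.0, +1.0] and confidence in [0.0, 1.0], the multiplier applied to a vector score always lies between 1 - weight and 1 + weight. The sketch below validates the documented ranges and exposes that bound. `FeedbackRerankSettings` and `multiplier_bounds` are hypothetical names, not the portal's actual settings schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackRerankSettings:
    """Hypothetical settings object mirroring the table above."""
    enabled: bool = False
    weight: float = 0.15
    max_influence: int = 20

    def __post_init__(self) -> None:
        # Enforce the ranges documented in the configuration table.
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("weight must be between 0.0 and 1.0")
        if not 1 <= self.max_influence <= 100:
            raise ValueError("max_influence must be between 1 and 100")

    def multiplier_bounds(self) -> tuple[float, float]:
        """Smallest and largest factor feedback can apply to a vector score."""
        return (1 - self.weight, 1 + self.weight)


defaults = FeedbackRerankSettings()
low, high = defaults.multiplier_bounds()
print(f"multiplier range: {low:.2f} to {high:.2f}")
```

With the defaults, feedback can move a chunk's score by at most 15% in either direction, which is why the default weight is a conservative starting point.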
Feedback Score Calculation
Every time an agent clicks thumbs up or thumbs down on a source citation, the chunk's feedback score is updated.
Running Average
```
new_score = (old_score × old_count + vote_value) / (old_count + 1)
```
Where vote_value = +1.0 for thumbs up, -1.0 for thumbs down.
The score is clamped to the range [-1.0, +1.0].
Example Progression
| Action | Score | Count |
|---|---|---|
| Initial state | 0.0 | 0 |
| Agent A: thumbs up | 1.0 | 1 |
| Agent B: thumbs down | 0.0 | 2 |
| Agent C: thumbs up | 0.333 | 3 |
| Agent D: thumbs up | 0.5 | 4 |
| Agent E: thumbs down | 0.2 | 5 |
The running average naturally smooths out individual opinions. A chunk needs sustained negative feedback to drop significantly.
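The progression above can be replayed with a small sketch of the update rule. The `update_feedback` function is a hypothetical name. It applies the running-average formula and the clamp to [-1.0, +1.0] described in this section.

```python
def update_feedback(old_score: float, old_count: int, vote_value: float) -> tuple[float, int]:
    """Fold one vote (+1.0 or -1.0) into the running average."""
    new_count = old_count + 1
    new_score = (old_score * old_count + vote_value) / new_count
    # Clamp defensively so the score never leaves [-1.0, +1.0].
    return max(-1.0, min(1.0, new_score)), new_count


score, count = 0.0, 0
history = []
# Votes from agents A to E: up, down, up, up, down.
for vote in (+1.0, -1.0, +1.0, +1.0, -1.0):
    score, count = update_feedback(score, count, vote)
    history.append((round(score, 3), count))

for entry in history:
    print(entry)
```

Only the previous score and count are needed for each update, so no per-vote history has to be read back to maintain the average.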
Automatic Chunk Suppression
Chunks with consistently negative feedback are automatically suppressed — removed from search results entirely.
Suppression Rules
| Condition | Threshold | Action |
|---|---|---|
| Suppress | feedback_score ≤ -0.7 AND feedback_count ≥ 5 | Set embedding_status = 'suppressed' |
| Restore | feedback_score > -0.3 (after being suppressed) | Set embedding_status = 'completed' |
Why two thresholds? The suppress threshold (-0.7) is much stricter than the restore threshold (-0.3) to prevent chunks from flipping between states. A chunk must be significantly rehabilitated by positive votes before it returns to search results.
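The two thresholds amount to a small state machine with hysteresis. The sketch below is an illustration under stated assumptions: the function `next_status` and the constant names are hypothetical, and any status other than 'completed' or 'suppressed' is assumed to be left untouched.

```python
SUPPRESS_SCORE = -0.7     # suppress at or below this score...
SUPPRESS_MIN_VOTES = 5    # ...but only once enough votes exist
RESTORE_SCORE = -0.3      # restore only when the score rises above this


def next_status(status: str, feedback_score: float, feedback_count: int) -> str:
    """Return the embedding_status a chunk should have after a vote."""
    if status == "completed":
        if feedback_score <= SUPPRESS_SCORE and feedback_count >= SUPPRESS_MIN_VOTES:
            return "suppressed"
    elif status == "suppressed":
        if feedback_score > RESTORE_SCORE:
            return "completed"
    # Between the thresholds (or in any other status) nothing changes.
    return status


print(next_status("completed", -1.0, 5))    # five straight rejections
print(next_status("suppressed", -0.5, 10))  # improved, but not enough
print(next_status("suppressed", -0.2, 10))  # rehabilitated
```

One consequence of these numbers: at exactly five votes, all five must be thumbs down to trigger suppression, because four down and one up averages to -0.6, which is above the -0.7 threshold.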
What Suppression Means
- The chunk is not deleted: its content, embedding, and metadata remain in the database
- It is invisible to search: the search query only matches embedding_status = 'completed', so chunks marked 'suppressed' are filtered out by the SQL WHERE clause
- It can be restored if feedback improves (e.g., new agents find it useful)
- It appears in the knowledge base document list (operations portal) with a "suppressed" indicator
When Suppression Helps
- An outdated policy document that agents keep rejecting
- A chunk that's technically accurate but never answers the actual question agents need
- Duplicate or low-quality content that dilutes search results
The Complete Feedback Loop
```
1. Agent sees suggestion with source citations
   │
   ├─ Thumbs UP on source
   │    └─ Socket event: "rag-source-feedback" (vote: "up")
   │
   └─ Thumbs DOWN on source
        └─ Socket event: "rag-source-feedback" (vote: "down")
             └─ Optional: reason code (irrelevant, incorrect, too generic, misleading)
             └─ Optional: comment
   │
   ▼
2. Connector service processes feedback
   ├─ Insert conversation_event (source_accepted / source_rejected)
   │    └─ Stored with query, chunk_id, doc_id, vote, reason, comment
   │
   └─ Update chunk in database
        ├─ feedback_count += 1
        ├─ feedback_score = running_average(old, vote)
        ├─ Check suppression: score ≤ -0.7 AND count ≥ 5 → suppress
        └─ Check restoration: score > -0.3 AND was suppressed → restore
   │
   ▼
3. Next search for same query
   ├─ Vector search returns candidates
   ├─ Suppressed chunks excluded (SQL WHERE)
   ├─ Feedback boost applied (if enabled):
   │    combined = vector_score × (1 + weight × feedback_score × confidence)
   ├─ Results re-sorted by combined_score
   └─ Top K returned to agent
   │
   ▼
4. Agent sees improved results
   ├─ Previously helpful chunks ranked higher
   ├─ Previously unhelpful chunks ranked lower or suppressed
   └─ Cycle continues: each vote makes future results more relevant
```
Analytics & Monitoring
Feedback data is tracked in two places:
1. Conversation Events (Analytics)
Every vote is recorded as a source_accepted or source_rejected event with full metadata. This powers:
- Feedback Analytics view: Acceptance rates, rejection reasons, per-agent breakdown
- Gap Analysis: Queries with consistently rejected sources are flagged as knowledge gaps
- Source Performance: Which documents/chunks are most/least helpful
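As a rough illustration of how such views could be derived from the recorded events, the sketch below computes an acceptance rate and a rejection-reason breakdown. The `FeedbackEvent` type and both function names are hypothetical. It carries only a subset of the stored fields, and the sample events are made up for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeedbackEvent:
    """Hypothetical, simplified view of one conversation event."""
    event_type: str                # "source_accepted" or "source_rejected"
    chunk_id: str
    reason: Optional[str] = None   # reason code, only set on some rejections


def acceptance_rate(events: list[FeedbackEvent]) -> float:
    """Share of votes that were thumbs up; 0.0 when there are no votes."""
    if not events:
        return 0.0
    accepted = sum(e.event_type == "source_accepted" for e in events)
    return accepted / len(events)


def rejection_reasons(events: list[FeedbackEvent]) -> Counter:
    """Count rejections per reason code, grouping missing codes together."""
    return Counter(
        e.reason or "unspecified"
        for e in events
        if e.event_type == "source_rejected"
    )


# Made-up sample data for illustration.
events = [
    FeedbackEvent("source_accepted", "chunk-1"),
    FeedbackEvent("source_accepted", "chunk-1"),
    FeedbackEvent("source_accepted", "chunk-2"),
    FeedbackEvent("source_rejected", "chunk-3", reason="irrelevant"),
    FeedbackEvent("source_rejected", "chunk-3"),
]

print(acceptance_rate(events))
print(rejection_reasons(events))
```

Grouping the same events by agent, document, or query instead of by reason would give the per-agent breakdown, source performance, and gap analysis views in the same way.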
2. Chunk Model (Scoring)
The feedback_score and feedback_count on each chunk enable:
- Feedback-boosted reranking: Real-time score adjustment in search
- Auto-suppression: Removing consistently unhelpful content
- KB health monitoring: Identify chunks that need updating
Best Practices
Enable Feedback Reranking After 2 Weeks
Don't enable feedback reranking on day one: it needs enough feedback data to be meaningful. Give agents about two weeks to vote, and wait until you have at least 100 feedback events across your knowledge base, then enable it with the default settings.
Monitor Suppressed Chunks
Regularly check the operations portal for suppressed chunks. They may indicate content that needs updating rather than removing. If a chunk about a valid topic keeps getting rejected, the content may be accurate but poorly written — rewrite it instead of leaving it suppressed.
Use Reason Codes
Encourage agents to select a reason when rejecting sources (irrelevant, incorrect, too generic, misleading). This data appears in Gap Analysis and helps you understand WHY content is failing, not just that it failed.
Don't Set Weight Too High
A feedback weight above 0.3 can cause popular-but-generic content to outrank specific, relevant content. The feedback signal is noisy — agents sometimes reject good content because they already knew the answer, not because the content was wrong.
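A small numeric sketch shows how the weight can flip a ranking. The two chunks are invented for illustration: a specific chunk with a strong vector match and no votes yet, and a generic chunk with a weaker match but unanimous positive feedback at full confidence. The function names are hypothetical.

```python
def combined_score(vector_score: float, feedback_score: float,
                   confidence: float, weight: float) -> float:
    """vector_score x (1 + weight x feedback_score x confidence)."""
    return vector_score * (1 + weight * feedback_score * confidence)


def winner(weight: float) -> str:
    """Which of the two illustrative chunks ranks first at this weight."""
    # Specific chunk: good vector match (0.80), no feedback yet.
    specific = combined_score(0.80, 0.0, 0.0, weight)
    # Generic chunk: weaker match (0.60), all thumbs up, full confidence.
    generic = combined_score(0.60, 1.0, 1.0, weight)
    return "specific" if specific >= generic else "generic"


winners = {weight: winner(weight) for weight in (0.15, 0.30, 0.50)}
for weight, name in winners.items():
    print(weight, name)
```

In this example the specific chunk stays on top at weights 0.15 and 0.30, and the generic chunk overtakes it at 0.50. The crossover sits at a weight of about 0.33, where 0.60 × (1 + weight) equals 0.80.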
