RAG Settings
RAG settings control how the retrieval pipeline searches your knowledge base and selects chunks to send to the LLM. Tuning these settings directly affects the quality and relevance of suggestions delivered to agents.
Figure: RAG settings panel showing the similarity threshold slider, top-K selector, reranking toggle, and chunk overlap input.
Retrieval Parameters
| Setting | Default | Range | Description |
|---|---|---|---|
| Similarity threshold | 0.7 | 0.0 - 1.0 | Minimum cosine similarity score for a chunk to be included in results. Chunks below this threshold are discarded. |
| Top-K results | 5 | 1 - 20 | Maximum number of chunks returned from the vector search before reranking. |
| Reranking enabled | true | - | When enabled, a cross-encoder reranker re-scores the top-K results for relevance before sending to the LLM. |
| Chunk overlap | 50 | 0 - 200 | Number of overlapping tokens between adjacent chunks. Higher overlap reduces the chance of splitting key information across chunks. |
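To make the interaction between the similarity threshold and top-K concrete, here is a minimal sketch in Python. It is not the product's implementation: the function names, the in-memory chunk list, and the order of operations (threshold filter first, then top-K cut) are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_chunks(query_vec, chunks, similarity_threshold=0.7, top_k=5):
    """Hypothetical helper: chunks is a list of (chunk_id, vector) pairs.

    Assumed order: discard chunks below the threshold, then keep at most
    top_k of the remaining chunks, highest score first.
    """
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunks]
    kept = [(cid, score) for cid, score in scored if score >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
chunks = [
    ("a", [1.0, 0.0]),
    ("b", [1.0, 1.0]),
    ("c", [0.0, 1.0]),
    ("d", [0.9, 0.1]),
]
for chunk_id, score in select_chunks([1.0, 0.0], chunks):
    print(chunk_id, round(score, 3))
```

In this toy data, raising the threshold from 0.7 to 0.8 drops chunk "b", whose score is about 0.707. The same effect explains why a higher threshold reduces irrelevant suggestions but can also hide borderline matches.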
Embedding Model Selection
The embedding model converts text into vector representations for similarity search. Choose the model when creating a knowledge base. Available models depend on your tenant's configured AI integrations.
| Consideration | Guidance |
|---|---|
| Accuracy | Larger embedding models generally produce more accurate similarity results |
| Speed | Smaller models generate embeddings faster, reducing indexing and query latency |
| Dimension | Higher-dimension embeddings capture more nuance but require more storage |
WARNING
Changing the embedding model requires a full re-index of all documents in the knowledge base. Schedule this during off-peak hours.
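As a rough way to reason about the storage side of the Dimension trade-off, the sketch below estimates raw vector storage. It assumes each value is stored as a 4-byte float and ignores index overhead and metadata; the function name, chunk count, and dimension sizes are illustrative assumptions, not properties of any specific model offered in the product.

```python
def embedding_storage_bytes(num_chunks, dimensions, bytes_per_value=4):
    """Estimate raw vector storage, assuming one fixed-width float per dimension."""
    return num_chunks * dimensions * bytes_per_value


# Hypothetical knowledge base of 100,000 chunks.
small = embedding_storage_bytes(100_000, 768)    # 307,200,000 bytes, about 293 MiB
large = embedding_storage_bytes(100_000, 1536)   # twice the storage of the 768-dim case
print(small, large)
```

Because storage scales linearly with dimension, doubling the dimension doubles the raw vector footprint for the same number of chunks.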
Reranking
When reranking is enabled, the pipeline applies a two-stage retrieval process:
- Stage 1 - Vector search: The top-K chunks are retrieved using fast approximate nearest neighbor search.
- Stage 2 - Cross-encoder reranking: A cross-encoder model re-scores each chunk against the original query for fine-grained relevance. The reranked results are sent to the LLM.
Reranking improves answer quality at the cost of slightly higher latency, because the cross-encoder must score each of the top-K chunks against the query. For most deployments, the quality improvement justifies the additional processing time.
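The two stages can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: `two_stage_retrieve` is a hypothetical function, the vector scores are supplied as precomputed numbers, and `token_overlap_score` is a toy stand-in for a real cross-encoder model.

```python
def token_overlap_score(query, text):
    """Toy stand-in for a cross-encoder: fraction of query tokens found in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def two_stage_retrieve(query, candidates, top_k=5, reranking_enabled=True,
                       rerank_fn=token_overlap_score):
    """candidates: list of (chunk_text, vector_score) pairs from the vector search."""
    # Stage 1: keep the top_k chunks by vector similarity score.
    stage1 = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    if not reranking_enabled:
        return [text for text, _ in stage1]
    # Stage 2: re-score each surviving chunk against the original query.
    reranked = sorted(stage1, key=lambda c: rerank_fn(query, c[0]), reverse=True)
    return [text for text, _ in reranked]


candidates = [
    ("billing cycles start on the first", 0.91),
    ("reset your password from the login page", 0.88),
    ("how to reset a forgotten password", 0.85),
    ("office opening hours", 0.40),
]
print(two_stage_retrieve("reset password", candidates, top_k=3))
```

Note that stage 2 only reorders what stage 1 returned. A relevant chunk that falls outside the top-K in stage 1 is never seen by the reranker, which is why increasing top-K and enabling reranking are often tuned together.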
Feedback-Based Chunk Suppression
Agent Assist uses feedback signals to improve retrieval quality over time. When agents reject a suggestion (thumbs down), the system tracks which chunks contributed to that answer.
The suppression mechanism works as follows:
- An agent marks a suggestion as unhelpful.
- The system records negative feedback against each chunk that was used to generate the answer.
- Chunks that accumulate repeated negative feedback have their relevance scores penalized in future searches.
- Severely penalized chunks are effectively suppressed and no longer appear in results.
This creates a continuous improvement loop where agent feedback directly refines the quality of future suggestions.
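The loop described above can be sketched as follows. The actual penalty formula and suppression cutoff are not documented here, so the linear penalty, the `penalty_per_rejection` and `suppress_below` values, and the `ChunkFeedbackTracker` class itself are all assumptions chosen for illustration.

```python
from collections import defaultdict


class ChunkFeedbackTracker:
    """Hypothetical tracker: penalizes chunks linearly per rejection."""

    def __init__(self, penalty_per_rejection=0.25, suppress_below=0.5):
        self.penalty_per_rejection = penalty_per_rejection
        self.suppress_below = suppress_below
        self.negative_counts = defaultdict(int)

    def record_rejection(self, chunk_ids):
        """Record one thumbs-down against every chunk used in the answer."""
        for chunk_id in chunk_ids:
            self.negative_counts[chunk_id] += 1

    def multiplier(self, chunk_id):
        """Score multiplier in [0, 1]; 1.0 means no negative feedback."""
        count = self.negative_counts.get(chunk_id, 0)
        return max(0.0, 1.0 - self.penalty_per_rejection * count)

    def apply(self, results):
        """Penalize scores and drop chunks whose multiplier is below the cutoff."""
        adjusted = []
        for chunk_id, score in results:
            m = self.multiplier(chunk_id)
            if m < self.suppress_below:
                continue  # effectively suppressed
            adjusted.append((chunk_id, score * m))
        adjusted.sort(key=lambda pair: pair[1], reverse=True)
        return adjusted


tracker = ChunkFeedbackTracker()
results = [("c1", 0.9), ("c2", 0.8)]
tracker.record_rejection(["c1"])
print(tracker.apply(results))  # c1 is penalized and now ranks below c2
```

With these assumed values, a chunk drops out of results after its third rejection. A real deployment would likely also weigh positive feedback and the age of the signal, which this sketch omits.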
TIP
Review suppressed chunks periodically in the Feedback Reports dashboard. Some chunks may have been suppressed due to outdated content that should be updated rather than permanently excluded.
Tuning Recommendations
| Scenario | Suggested Changes |
|---|---|
| Too many irrelevant suggestions | Increase similarity threshold to 0.8, enable reranking |
| Missing answers that should be found | Decrease similarity threshold to 0.6, increase top-K to 10 |
| Suggestions are too slow | Disable reranking, reduce top-K to 3 |
| Answers lack sufficient context | Increase chunk size and chunk overlap in the knowledge base settings |
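To see why chunk size and overlap affect how much context an answer carries, the sketch below splits a token list into overlapping chunks. It is an illustration only: `chunk_tokens` is a hypothetical helper, tokens are simulated by whitespace splitting, and the product's actual chunking and tokenization may differ.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split tokens into chunks of chunk_size that share `overlap` tokens with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks


text = "to reset your password open the login page and select forgot password"
tokens = text.split()  # whitespace split as a stand-in for a real tokenizer
for chunk in chunk_tokens(tokens, chunk_size=6, overlap=2):
    print(" ".join(chunk))
```

With an overlap of 2, the last two tokens of each chunk reappear at the start of the next one, so a phrase that straddles a boundary is still found intact in at least one chunk. Higher overlap also produces more chunks for the same document, which increases indexing work and storage.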
