RAG Settings

RAG settings control how the retrieval pipeline searches your knowledge base and selects chunks to send to the LLM. Tuning these settings directly affects the quality and relevance of suggestions delivered to agents.

[Figure: Configuring RAG retrieval parameters. The RAG settings panel shows the similarity threshold slider, top-K selector, reranking toggle, and chunk overlap input.]

Retrieval Parameters

Setting	Default	Range	Description
Similarity threshold	`0.7`	0.0 - 1.0	Minimum cosine similarity score for a chunk to be included in results. Chunks below this threshold are discarded.
Top-K results	`5`	1 - 20	Maximum number of chunks returned from the vector search before reranking.
Reranking enabled	`true`	-	When enabled, a cross-encoder reranker re-scores the top-K results for relevance before they are sent to the LLM.
Chunk overlap	`50`	0 - 200	Number of overlapping tokens between adjacent chunks. Higher overlap reduces the chance of splitting key information across chunks.
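To illustrate why overlap helps, the sketch below splits a token list into fixed-size windows that share `overlap` tokens with the previous window. It is a minimal sketch under stated assumptions: whitespace-separated words stand in for tokens, and `chunk_size` is a hypothetical parameter (chunk size is set in the KB settings, not on this panel). The product's actual tokenizer and chunker may behave differently.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size; adjacent windows share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Whitespace "tokens" are an illustrative simplification of real tokenization.
tokens = "reset the router then hold the power button for ten seconds".split()
for overlap in (0, 3):
    chunks = [" ".join(c) for c in chunk_tokens(tokens, chunk_size=6, overlap=overlap)]
    print(overlap, chunks)
```

With no overlap, the instruction "hold the power button" is split across two chunks, so neither chunk answers the question on its own. With an overlap of 3 tokens, the middle chunk ("then hold the power button for") keeps the whole phrase together, at the cost of one extra chunk to embed and store.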

Embedding Model Selection

The embedding model converts text into vector representations for similarity search. Choose the model when creating a knowledge base. Available models depend on your tenant's configured AI integrations.

Consideration	Guidance
Accuracy	Larger embedding models generally produce more accurate similarity results
Speed	Smaller models generate embeddings faster, reducing indexing and query latency
Dimension	Higher-dimension embeddings capture more nuance but require more storage
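The storage side of the dimension trade-off is simple arithmetic: every chunk stores one vector with `dimensions` values. The sketch below assumes 32-bit floats (4 bytes per value) and ignores index structures and metadata, so treat the result as a lower bound. The dimensions 384 and 1536 are example values, not a list of models available in your tenant.

```python
def embedding_storage_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage, assuming float32 values and ignoring index/metadata overhead."""
    return num_chunks * dimensions * bytes_per_value


# Compare two example embedding sizes for a knowledge base of 100,000 chunks.
for dims in (384, 1536):
    size_mb = embedding_storage_bytes(100_000, dims) / 1_000_000
    print(f"{dims}-dimension vectors: {size_mb:.1f} MB")
```

For 100,000 chunks this gives 153.6 MB at 384 dimensions and 614.4 MB at 1536 dimensions: raw storage grows linearly with dimension, so quadrupling the dimension quadruples the footprint.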

WARNING

Changing the embedding model requires a full re-index of all documents in the knowledge base. Schedule this during off-peak hours.

Reranking

When reranking is enabled, the pipeline applies a two-stage retrieval process:

Stage 1 - Vector search: The top-K chunks are retrieved using fast approximate nearest neighbor search.
Stage 2 - Cross-encoder reranking: A cross-encoder model re-scores each chunk against the original query for fine-grained relevance. The reranked results are sent to the LLM.
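The two-stage flow can be sketched as follows. The assumptions are all hypothetical: the similarity threshold is applied to stage-1 cosine scores before the top-K cut; stage 1 scores every chunk by brute force, standing in for the approximate nearest neighbor index; and `rerank` is any function that scores a (query, chunk text) pair, standing in for the cross-encoder. The `word_overlap` scorer is a toy used only to make the example runnable, not a real reranker.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# chunk_id -> (chunk text, embedding vector)
Chunks = Dict[str, Tuple[str, List[float]]]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: str, query_vec: List[float], chunks: Chunks,
             rerank: Optional[Callable[[str, str], float]],
             threshold: float = 0.7, top_k: int = 5) -> List[str]:
    # Stage 1: cosine-score every chunk (brute force stands in for ANN search),
    # discard chunks below the threshold, and keep the top-K by similarity.
    scored = [(cosine(query_vec, vec), cid) for cid, (_, vec) in chunks.items()]
    kept = sorted((item for item in scored if item[0] >= threshold), reverse=True)
    candidates = [cid for _, cid in kept[:top_k]]
    if rerank is None:
        return candidates  # reranking disabled: vector order goes to the LLM
    # Stage 2: re-score each candidate against the original query text.
    return sorted(candidates, key=lambda cid: rerank(query, chunks[cid][0]), reverse=True)


def word_overlap(query: str, text: str) -> float:
    # Toy stand-in for a cross-encoder: counts shared lowercase words.
    return float(len(set(query.lower().split()) & set(text.lower().split())))


QUERY = "password reset email"
QUERY_VEC = [1.0, 0.2]
CHUNKS: Chunks = {
    "kb-1": ("reset a forgotten password", [0.9, 0.1]),
    "kb-2": ("password reset email not received", [0.8, 0.3]),
    "kb-3": ("billing cycle dates", [0.1, 0.9]),
}

print(retrieve(QUERY, QUERY_VEC, CHUNKS, rerank=None))          # ['kb-1', 'kb-2']
print(retrieve(QUERY, QUERY_VEC, CHUNKS, rerank=word_overlap))  # ['kb-2', 'kb-1']
```

Stage 1 discards `kb-3` (cosine about 0.30, below 0.7) and ranks `kb-1` first on vector similarity alone. The reranker, which reads the query and chunk text together, promotes `kb-2` because it matches more of the query. This is the kind of fine-grained correction that justifies the extra stage.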

Reranking improves answer quality at the cost of slightly higher latency. For most deployments, the quality improvement justifies the additional processing time.

Feedback-Based Chunk Suppression

Agent Assist uses feedback signals to improve retrieval quality over time. When agents reject a suggestion (thumbs down), the system tracks which chunks contributed to that answer.

The suppression mechanism works as follows:

1. An agent marks a suggestion as unhelpful.
2. The system records negative feedback against each chunk that was used to generate the answer.
3. Chunks that accumulate repeated negative feedback have their relevance scores penalized in future searches.
4. Severely penalized chunks are effectively suppressed and no longer appear in results.
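The four steps can be sketched as a small feedback tracker. The penalty formula is an assumption for illustration only. This page does not specify how Agent Assist penalizes scores, so the sketch multiplies a chunk's score by a hypothetical `PENALTY_PER_VOTE` for each negative vote and drops the chunk once that multiplier falls below a hypothetical `SUPPRESSION_FLOOR`. `FeedbackTracker` and its methods are likewise illustrative names, not a product API.

```python
from collections import Counter
from typing import List, Tuple

PENALTY_PER_VOTE = 0.8   # hypothetical: each thumbs-down scales the score by 0.8
SUPPRESSION_FLOOR = 0.5  # hypothetical: below this multiplier the chunk is dropped


class FeedbackTracker:
    def __init__(self) -> None:
        self.negative_votes = Counter()  # chunk_id -> number of negative votes

    def record_rejection(self, chunk_ids: List[str]) -> None:
        # Steps 1-2: charge one negative vote to every chunk behind the rejected answer.
        self.negative_votes.update(chunk_ids)

    def adjust(self, results: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
        # Steps 3-4: penalize scores, and drop chunks whose multiplier is below the floor.
        adjusted = []
        for chunk_id, score in results:
            multiplier = PENALTY_PER_VOTE ** self.negative_votes[chunk_id]
            if multiplier >= SUPPRESSION_FLOOR:
                adjusted.append((chunk_id, score * multiplier))
        return sorted(adjusted, key=lambda item: item[1], reverse=True)


tracker = FeedbackTracker()
for _ in range(4):
    tracker.record_rejection(["kb-12"])  # kb-12 contributed to four rejected answers
tracker.record_rejection(["kb-7"])       # kb-7 contributed to one

print(tracker.adjust([("kb-12", 0.95), ("kb-7", 0.90), ("kb-3", 0.80)]))
```

With these illustrative constants, four negative votes push `kb-12` below the floor (0.8^4 is about 0.41), so it is suppressed. A single vote lowers `kb-7` to 80% of its score, enough to drop it below `kb-3`, which has no feedback and is untouched.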

This creates a continuous improvement loop where agent feedback directly refines the quality of future suggestions.

TIP

Review suppressed chunks periodically in the Feedback Reports dashboard. Some chunks may have been suppressed due to outdated content that should be updated rather than permanently excluded.

Tuning Recommendations

Scenario	Suggested Changes
Too many irrelevant suggestions	Increase similarity threshold to 0.8, enable reranking
Missing answers that should be found	Decrease similarity threshold to 0.6, increase top-K to 10
Suggestions are too slow	Disable reranking, reduce top-K to 3
Answers lack sufficient context	Increase chunk size and overlap in the KB settings
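The recommendations above can be captured as presets layered over the defaults. This is a hypothetical sketch, not a product API: `RetrievalSettings`, `TUNING_PRESETS`, and `apply_preset` are illustrative names, and the validation mirrors the ranges in the Retrieval Parameters table. The last row is not modeled because chunk size lives in the KB settings and the table gives no target values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetrievalSettings:
    similarity_threshold: float = 0.7  # minimum cosine similarity, 0.0 - 1.0
    top_k: int = 5                     # chunks returned by vector search, 1 - 20
    reranking_enabled: bool = True     # cross-encoder re-scoring on/off
    chunk_overlap: int = 50            # overlapping tokens, 0 - 200

    def __post_init__(self) -> None:
        # Enforce the ranges from the Retrieval Parameters table.
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be in [0.0, 1.0]")
        if not 1 <= self.top_k <= 20:
            raise ValueError("top_k must be in [1, 20]")
        if not 0 <= self.chunk_overlap <= 200:
            raise ValueError("chunk_overlap must be in [0, 200]")


# Changes from the Tuning Recommendations table, keyed by hypothetical scenario names.
TUNING_PRESETS = {
    "too_many_irrelevant": {"similarity_threshold": 0.8, "reranking_enabled": True},
    "missing_answers": {"similarity_threshold": 0.6, "top_k": 10},
    "too_slow": {"reranking_enabled": False, "top_k": 3},
}


def apply_preset(settings: RetrievalSettings, scenario: str) -> RetrievalSettings:
    # replace() builds a new instance, so __post_init__ re-validates the result.
    return replace(settings, **TUNING_PRESETS[scenario])


print(apply_preset(RetrievalSettings(), "too_slow"))
```

Using `dataclasses.replace` on a frozen dataclass returns a new object and re-runs `__post_init__`, so a preset (or a manual edit such as `top_k=25`) cannot silently produce an out-of-range value, and the original defaults stay unchanged.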
