
RAG Settings

RAG settings control how the retrieval pipeline searches your knowledge base and selects chunks to send to the LLM. Tuning these settings directly affects the quality and relevance of suggestions delivered to agents.

[Image: RAG settings panel showing similarity threshold slider, top-K selector, reranking toggle, and chunk overlap input]
Configuring RAG retrieval parameters

Retrieval Parameters

| Setting | Default | Range | Description |
| --- | --- | --- | --- |
| Similarity threshold | 0.7 | 0.0 - 1.0 | Minimum cosine similarity score for a chunk to be included in results. Chunks below this threshold are discarded. |
| Top-K results | 5 | 1 - 20 | Maximum number of chunks returned from the vector search before reranking. |
| Reranking enabled | true | - | When enabled, a cross-encoder reranker re-scores the top-K results for relevance before sending to the LLM. |
| Chunk overlap | 50 | 0 - 200 | Number of overlapping tokens between adjacent chunks. Higher overlap reduces the chance of splitting key information across chunks. |
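
To make the interaction between the similarity threshold and top-K concrete, here is a minimal Python sketch. It is not the product's implementation: the function names (`cosine_similarity`, `select_chunks`) and the in-memory list of `(chunk_id, vector)` pairs are assumptions made for illustration, and the exhaustive scan stands in for whatever vector index the pipeline actually uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as having no similarity
    return dot / (norm_a * norm_b)

def select_chunks(query_vec, chunks, similarity_threshold=0.7, top_k=5):
    """Hypothetical helper: keep chunks at or above the threshold,
    then return at most top_k of them, best first.

    chunks is a list of (chunk_id, vector) pairs.
    """
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold must be within 0.0 - 1.0")
    if not 1 <= top_k <= 20:
        raise ValueError("top_k must be within 1 - 20")
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunks]
    kept = [(cid, score) for cid, score in scored if score >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]

# Illustrative data only.
chunks = [
    ("a", [1.0, 0.0]),   # similarity 1.0
    ("b", [1.0, 1.0]),   # similarity ~0.707
    ("c", [0.0, 1.0]),   # similarity 0.0, discarded by the threshold
    ("d", [1.0, 0.5]),   # similarity ~0.894
]
results = select_chunks([1.0, 0.0], chunks, similarity_threshold=0.7, top_k=2)
print([cid for cid, _ in results])  # ['a', 'd']
```

The threshold removes weak matches regardless of how many slots remain, while top-K caps the number of strong matches, so a query can legitimately return fewer than top-K chunks.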

Embedding Model Selection

The embedding model converts text into vector representations for similarity search. Choose the model when creating a knowledge base. Available models depend on your tenant's configured AI integrations.

| Consideration | Guidance |
| --- | --- |
| Accuracy | Larger embedding models generally produce more accurate similarity results |
| Speed | Smaller models generate embeddings faster, reducing indexing and query latency |
| Dimension | Higher-dimension embeddings capture more nuance but require more storage |
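
The storage cost of a higher dimension can be estimated with simple arithmetic. The sketch below assumes vectors are stored as 32-bit floats (4 bytes per value) and counts raw vector data only, ignoring index overhead and metadata. The chunk count and the two dimensions are illustrative values, not properties of any model offered in the product.

```python
def raw_vector_bytes(num_chunks, dimensions, bytes_per_value=4):
    """Estimate raw storage for embeddings: one value per dimension per chunk.

    bytes_per_value=4 assumes float32 storage (an assumption, not a product fact).
    """
    return num_chunks * dimensions * bytes_per_value

num_chunks = 100_000  # illustrative knowledge base size
small = raw_vector_bytes(num_chunks, 384)
large = raw_vector_bytes(num_chunks, 1536)

print(small)          # 153600000 bytes
print(large)          # 614400000 bytes
print(large / small)  # 4.0
```

Raw vector storage grows linearly with dimension, so quadrupling the dimension quadruples the space needed for the same number of chunks.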

WARNING

Changing the embedding model requires a full re-index of all documents in the knowledge base. Schedule this during off-peak hours.

Reranking

When reranking is enabled, the pipeline applies a two-stage retrieval process:

  1. Stage 1 - Vector search: The top-K chunks are retrieved using fast approximate nearest neighbor search.
  2. Stage 2 - Cross-encoder reranking: A cross-encoder model re-scores each chunk against the original query for fine-grained relevance. The reranked results are sent to the LLM.
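
The two stages can be sketched as follows. This is a simplified illustration, not the pipeline's code: stage 1 uses an exact cosine scan in place of approximate nearest neighbor search, and stage 2 accepts any scoring callable in place of a real cross-encoder. The `toy_overlap_score` function is a deliberately crude stand-in that counts shared words, and every name here is hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query_vec, chunks, top_k):
    """Stage 1: rank chunks by embedding similarity and keep the top_k.
    (Exact scan here; a real system would use an ANN index.)"""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return ranked[:top_k]

def rerank(query, candidates, score_fn):
    """Stage 2: re-score each candidate text against the original query."""
    return sorted(candidates, key=lambda c: score_fn(query, c[0]), reverse=True)

def retrieve(query, query_vec, chunks, top_k=5, reranking_enabled=True, score_fn=None):
    """chunks is a list of (text, embedding) pairs; returns texts in final order."""
    candidates = vector_search(query_vec, chunks, top_k)
    if reranking_enabled and score_fn is not None:
        candidates = rerank(query, candidates, score_fn)
    return [text for text, _ in candidates]

def toy_overlap_score(query, text):
    """Stand-in for a cross-encoder: fraction of query words found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

# Illustrative data only.
chunks = [
    ("reset your password from the login page", [0.9, 0.1]),
    ("password policy requires twelve characters", [1.0, 0.0]),
    ("shipping times for international orders", [0.0, 1.0]),
]
query, query_vec = "how do I reset my password", [1.0, 0.0]

print(retrieve(query, query_vec, chunks, top_k=2, reranking_enabled=False))
print(retrieve(query, query_vec, chunks, top_k=2, score_fn=toy_overlap_score))
```

With reranking disabled, the password-policy chunk comes first because its embedding is closest to the query vector. With reranking enabled, the reset-instructions chunk moves to the top because the second stage compares the query and chunk text directly. Note that reranking only reorders the stage 1 candidates; a chunk that misses the top-K cut is never seen by the reranker.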

Reranking improves answer quality at the cost of slightly higher latency. For most deployments, the quality improvement justifies the additional processing time.

Feedback-Based Chunk Suppression

Agent Assist uses feedback signals to improve retrieval quality over time. When agents reject a suggestion (thumbs down), the system tracks which chunks contributed to that answer.

The suppression mechanism works as follows:

  1. An agent marks a suggestion as unhelpful.
  2. The system records negative feedback against each chunk that was used to generate the answer.
  3. Chunks that accumulate repeated negative feedback have their relevance scores penalized in future searches.
  4. Severely penalized chunks are effectively suppressed and no longer appear in results.

This creates a continuous improvement loop where agent feedback directly refines the quality of future suggestions.
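
One way to picture this loop is the sketch below. The documentation does not specify how penalties are computed, so every number and name here is an assumption chosen for illustration: a minimum of three rejections before any penalty applies, a 10% score reduction per rejection from that point on, and a cutoff of 0.3 below which a chunk is dropped from results.

```python
from collections import Counter

class FeedbackTracker:
    """Hypothetical model of feedback-based chunk suppression."""

    def __init__(self, min_rejections=3, penalty_per_rejection=0.1, suppress_below=0.3):
        self.negative = Counter()  # chunk_id -> count of negative feedback
        self.min_rejections = min_rejections
        self.penalty_per_rejection = penalty_per_rejection
        self.suppress_below = suppress_below

    def record_rejection(self, chunk_ids):
        """Steps 1-2: a rejected suggestion counts against each contributing chunk."""
        for chunk_id in chunk_ids:
            self.negative[chunk_id] += 1

    def penalty_factor(self, chunk_id):
        """Step 3: repeated negative feedback scales the score down."""
        count = self.negative[chunk_id]
        if count < self.min_rejections:
            return 1.0
        excess = count - self.min_rejections + 1
        return max(0.0, 1.0 - self.penalty_per_rejection * excess)

    def adjust(self, scored):
        """Step 4: apply penalties and drop chunks that fall below the cutoff.

        scored is a list of (chunk_id, relevance_score) pairs.
        """
        adjusted = []
        for chunk_id, score in scored:
            new_score = score * self.penalty_factor(chunk_id)
            if new_score >= self.suppress_below:
                adjusted.append((chunk_id, new_score))
        adjusted.sort(key=lambda pair: pair[1], reverse=True)
        return adjusted

tracker = FeedbackTracker()
for _ in range(10):
    tracker.record_rejection(["a"])   # chunk "a" is rejected repeatedly
tracker.record_rejection(["b"])       # chunk "b" is rejected once

results = tracker.adjust([("a", 0.9), ("b", 0.8), ("c", 0.75)])
print([chunk_id for chunk_id, _ in results])  # ['b', 'c']
```

In this model a single rejection has no effect, which keeps one unhappy agent from hiding a chunk, while sustained negative feedback eventually pushes the chunk below the cutoff.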

TIP

Review suppressed chunks periodically in the Feedback Reports dashboard. Some chunks may have been suppressed due to outdated content that should be updated rather than permanently excluded.

Tuning Recommendations

| Scenario | Suggested Changes |
| --- | --- |
| Too many irrelevant suggestions | Increase similarity threshold to 0.8, enable reranking |
| Missing answers that should be found | Decrease similarity threshold to 0.6, increase top-K to 10 |
| Suggestions are too slow | Disable reranking, reduce top-K to 3 |
| Answers lack sufficient context | Increase chunk size and overlap in the KB settings |
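
For the chunk size and overlap recommendation, a sliding-window sketch shows what the overlap setting changes. This is an illustration under stated assumptions: integers stand in for tokens, `chunk_tokens` is a hypothetical helper, and the product's actual chunking strategy is not documented here.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split tokens into windows of chunk_size that share `overlap` tokens
    with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks

tokens = list(range(10))  # stand-in for a 10-token document

print(chunk_tokens(tokens, chunk_size=4, overlap=0))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

print(chunk_tokens(tokens, chunk_size=4, overlap=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Without overlap, tokens 3 and 4 never appear in the same chunk, so a sentence spanning that boundary is split. With an overlap of 2, the second chunk contains both, at the cost of producing more chunks to embed and store.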

OmniBots Agent Assist