
Tenant Configuration

Each Agent Assist tenant has its own configuration that controls how the assist pipeline behaves. Manage these settings on the Settings page of the Agent Assist Portal.

Assist Mode

| Mode | Description |
| --- | --- |
| Omni (default) | Uses the built-in RAG pipeline (classifier, vector search, and LLM answer generation) with any supported AI model |
| Google Native | Uses Google CCAI natively with Dialogflow CX agents for suggestions |

WARNING

Switching between assist modes requires reconfiguring your CCaaS integration. Plan mode changes during a maintenance window.

Knowledge & AI Settings

These settings control how the RAG pipeline searches and generates answers.

| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| Default Knowledge Base | | None | The KB collection used when no queue-specific mapping exists |
| Default Prompt | | None | The published system prompt used for answer generation. Can be overridden per queue in Queue Mapping. |
| Max Sources | 1 – 20 | 5 | Maximum number of source document chunks returned per query. Higher values provide more context but increase token usage and latency. |
| Confidence Threshold | 0.0 – 1.0 | 0.7 | Minimum vector similarity score for a chunk to be included in results. Chunks scoring below this are filtered out. |
| Max Tokens | | 4096 | Maximum number of tokens the LLM can generate per answer |
| Context Mode | rag / structured / agent_ready / minimal | rag | Controls how much context is sent to the LLM. rag includes full KB chunks; minimal sends only the query. |
| System Prompt | Text | | Custom instructions sent to the LLM for answer generation. Use this to set tone, formatting rules, or domain-specific guidance. |
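
As an illustration of how Confidence Threshold and Max Sources interact, the following is a minimal sketch, not the product's actual retrieval code. The `Chunk` type, the `select_chunks` function, and the order of operations (filter first, then cap) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    score: float  # vector similarity, assumed to lie in [0.0, 1.0]


def select_chunks(chunks: List[Chunk],
                  confidence_threshold: float = 0.7,
                  max_sources: int = 5) -> List[Chunk]:
    """Drop chunks scoring below the threshold, then keep the best max_sources."""
    kept = [c for c in chunks if c.score >= confidence_threshold]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:max_sources]


# Hypothetical search results for a single query.
candidates = [
    Chunk("refund policy", 0.91),
    Chunk("shipping times", 0.68),
    Chunk("returns form", 0.74),
]
selected = select_chunks(candidates)
```

With the default values, the 0.68 chunk is filtered out and the remaining two are returned in descending score order. Raising Max Sources has no effect here because only two chunks clear the threshold.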

Classifier Settings

The classifier evaluates each customer utterance and decides whether it should trigger a knowledge base search.

| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| Classifier Threshold | 0.0 – 1.0 | 0.5 | Minimum confidence for the classifier to proceed with retrieval. If the classifier returns a confidence below this threshold, the message is treated as noise even if the LLM classified it as meaningful. |

Adjusting the classifier threshold trades off between coverage and noise:

  • Lower threshold (0.2 – 0.4): More suggestions generated, but some may be triggered by irrelevant utterances
  • Default threshold (0.5): Balanced coverage and relevance
  • Higher threshold (0.7 – 0.9): Fewer suggestions, higher precision; only clear questions trigger search
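
The gating rule described above can be sketched as a single predicate. This is an illustrative sketch only: the `should_retrieve` function and the `"meaningful"` / `"noise"` label strings are hypothetical names, not the product's API.

```python
def should_retrieve(label: str, confidence: float, threshold: float = 0.5) -> bool:
    """Return True only when the utterance is labeled meaningful AND
    the classifier's confidence meets the configured threshold."""
    return label == "meaningful" and confidence >= threshold


# A borderline utterance: labeled meaningful, but with low confidence.
blocked_at_default = should_retrieve("meaningful", 0.45)                 # threshold 0.5
allowed_when_lowered = should_retrieve("meaningful", 0.45, threshold=0.3)
```

The same utterance is treated as noise at the default threshold and triggers a search once the threshold is lowered to 0.3, which is the coverage versus noise trade-off in miniature.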

TIP

Start with the default threshold and adjust based on agent feedback. The Classifier Analytics dashboard shows filter rates and confidence distributions to guide tuning.

Domain Keywords

Domain keywords help the classifier recognize industry-specific terminology that it might otherwise treat as noise. Navigate to Settings > Domain Keywords to manage them.

Keywords are organized into categories (e.g., "Medical Terms", "Product Names", "Legal Terms"). When the classifier encounters a domain keyword in an utterance, it receives a hint that boosts confidence, making it more likely to trigger a knowledge search.
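
To make the idea of keyword hints concrete, here is a small sketch of category-based keyword matching. How the real classifier consumes the hint and how much it boosts confidence are not documented here, so the sketch stops at detecting matches. The keyword entries and the `find_keyword_hints` function are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical categories and keywords, for illustration only.
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "Medical Terms": ["copay", "prior authorization"],
    "Product Names": ["ExampleRouter"],
}


def find_keyword_hints(utterance: str,
                       keywords: Dict[str, List[str]] = DOMAIN_KEYWORDS
                       ) -> List[Tuple[str, str]]:
    """Return (category, keyword) pairs for keywords that appear in the
    utterance as whole words or phrases, ignoring case."""
    hints = []
    for category, terms in keywords.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, utterance, flags=re.IGNORECASE):
                hints.append((category, term))
    return hints


hints = find_keyword_hints("Do I need Prior Authorization for this?")
```

Whole-word matching is a design choice in this sketch: it keeps a short acronym from matching inside a longer, unrelated word.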

TIP

Add domain-specific acronyms, product names, and jargon that general-purpose LLMs may not recognize. This is especially useful for technical support and regulated industries.

Prompt Caching

When enabled, the classifier system prompt is stored in Gemini's context cache. This reduces the input-token cost of the cached portion by approximately 90%; tokens outside the cache are billed at the normal rate.

| Setting | Description |
| --- | --- |
| Prompt Caching Enabled | Toggle on/off. Requires Gemini 2.5 Flash or later. |

TIP

Enable prompt caching for high-volume deployments. The classifier system prompt is the same for every request, so caching it avoids paying the full input price for the same tokens on every call.
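
A rough back-of-the-envelope calculation shows why the savings grow with volume. The sketch below is an estimate under stated assumptions: the request count, token counts, and price are placeholders (not actual Gemini pricing), the 90% discount is the approximate figure quoted above, and any cache storage or cache creation charges are ignored.

```python
def classifier_input_cost(requests: int,
                          prompt_tokens: int,
                          utterance_tokens: int,
                          price_per_million: float,
                          cached_discount: float = 0.90,
                          caching: bool = True) -> float:
    """Estimate total classifier input cost. Only the system prompt is
    assumed to be cacheable; the utterance is always billed in full."""
    prompt_rate = price_per_million * (1 - cached_discount) if caching else price_per_million
    prompt_cost = requests * prompt_tokens * prompt_rate / 1_000_000
    utterance_cost = requests * utterance_tokens * price_per_million / 1_000_000
    return prompt_cost + utterance_cost


# Placeholder scenario: 1M requests, 2,000-token prompt, 50-token utterances,
# and a made-up price of 0.30 per million input tokens.
without_cache = classifier_input_cost(1_000_000, 2000, 50, 0.30, caching=False)
with_cache = classifier_input_cost(1_000_000, 2000, 50, 0.30, caching=True)
```

In this made-up scenario the estimate drops from about 615 to about 75 currency units. The overall saving is slightly below 90% because the utterance tokens are never cached.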

Session Settings

| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| Session TTL | 5 min – 7 days | 24 hours | How long a conversation session stays active in Redis without activity. After expiry, the session data is cleared. |
| Max Transcript History | 10 – 200 | 50 | Maximum number of transcript messages stored in conversation history. This history is sent as context to the classifier and LLM. Higher values provide more context but increase token usage. |
| Auto-Summary Interval | 15 – 300 sec | 60 sec | How often the conversation summary auto-refreshes in the agent widget. Set higher to reduce LLM token usage. |
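
The effect of Max Transcript History can be pictured as a bounded buffer. The sketch below assumes that the oldest messages are dropped first once the limit is reached, which the settings table does not state explicitly. The `TranscriptHistory` class is a hypothetical stand-in, not the service's Redis-backed implementation.

```python
from collections import deque
from typing import List, Tuple


class TranscriptHistory:
    """Keeps at most max_messages transcript entries, dropping the oldest."""

    def __init__(self, max_messages: int = 50) -> None:
        # Mirror the documented range for Max Transcript History.
        if not 10 <= max_messages <= 200:
            raise ValueError("max_messages must be between 10 and 200")
        self._messages: deque = deque(maxlen=max_messages)

    def add(self, role: str, text: str) -> None:
        self._messages.append((role, text))

    def as_context(self) -> List[Tuple[str, str]]:
        """Return the retained messages, oldest first."""
        return list(self._messages)


history = TranscriptHistory(max_messages=10)
for i in range(12):
    history.add("END_USER", f"message {i}")
```

After twelve messages with a limit of ten, the first two have been discarded, so the context sent downstream starts at "message 2".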

Caching

Response Cache

Caches full RAG answers so identical queries return instantly without calling the LLM again.

| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| Enabled | on/off | On | Toggle response caching |
| TTL | 30 – 3600 sec | 300 sec | How long a cached answer is valid |
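
A time-to-live cache of this kind can be sketched in a few lines. This is a simplified in-memory illustration with exact-match keys; the real service's storage backend and key scheme are not described here, and the `ResponseCache` class is hypothetical. The clock is injectable so the expiry behavior can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ResponseCache:
    """Exact-match answer cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, query: str, answer: str) -> None:
        self._entries[query] = (answer, self._clock())

    def get(self, query: str) -> Optional[str]:
        entry = self._entries.get(query)
        if entry is None:
            return None
        answer, stored_at = entry
        if self._clock() - stored_at >= self.ttl:
            del self._entries[query]  # expired: evict and report a miss
            return None
        return answer


# Usage with a fake clock so the example runs instantly.
now = [0.0]
cache = ResponseCache(ttl_seconds=300, clock=lambda: now[0])
cache.put("how do I reset my password", "Use the reset link on the login page.")
hit = cache.get("how do I reset my password")
now[0] = 300.0
miss_after_ttl = cache.get("how do I reset my password")
```

A shorter TTL keeps answers fresher after KB updates, while a longer TTL saves more LLM calls.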

Classifier Cache

Caches classifier decisions so repeated utterances (e.g., "hello", "thank you") skip the LLM classifier call entirely.

| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| Enabled | on/off | Off | Toggle classifier caching |
| TTL | 60 – 3600 sec | 600 sec | How long a cached classification is valid |

TIP

Enable the classifier cache for high-volume voice deployments where the same phrases appear frequently. This significantly reduces LLM token usage for noise filtering.

Coaching

AI coaching provides real-time guidance to agents during conversations. Configure coaching behavior under the Coaching section in Settings.

| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| Coaching Enabled | on/off | Off | Master toggle for the coaching engine |
| Coaching Mode | deterministic / generative / hybrid | hybrid | deterministic: playbook-based steps only. generative: AI-generated guidance from KB. hybrid: tries playbooks first, falls back to generative. |
| Confidence Threshold | 0.0 – 1.0 | 0.75 | Minimum confidence for a coaching suggestion to be shown to the agent |
| Max Active Playbooks | 1 – 20 | 5 | Maximum number of playbooks that can be evaluated simultaneously per conversation |
| Coaching KB | | None | Knowledge base used for generative coaching (when no playbook matches) |
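
The three coaching modes can be illustrated with a small dispatcher. This is a sketch under assumptions: the `coach` function, the `match_playbook` and `generate` callables, and the `(text, confidence)` suggestion shape are all hypothetical, and "falls back" is interpreted as "no playbook matched".

```python
from typing import Callable, Optional, Tuple

Suggestion = Tuple[str, float]  # (guidance text, confidence)
Source = Callable[[str], Optional[Suggestion]]


def coach(utterance: str, mode: str, match_playbook: Source, generate: Source,
          threshold: float = 0.75) -> Optional[str]:
    """Pick a coaching suggestion according to the mode, then apply the
    confidence threshold before showing it to the agent."""
    if mode not in ("deterministic", "generative", "hybrid"):
        raise ValueError(f"unknown coaching mode: {mode}")
    suggestion = None
    if mode in ("deterministic", "hybrid"):
        suggestion = match_playbook(utterance)
    if suggestion is None and mode in ("generative", "hybrid"):
        suggestion = generate(utterance)
    if suggestion is None or suggestion[1] < threshold:
        return None
    return suggestion[0]


# Stub sources standing in for the playbook engine and the generative path.
def stub_playbook(utterance: str) -> Optional[Suggestion]:
    return ("Verify identity first", 0.9) if "account" in utterance else None


def stub_generate(utterance: str) -> Optional[Suggestion]:
    return ("Offer the refund article", 0.8)


from_playbook = coach("my account is locked", "hybrid", stub_playbook, stub_generate)
from_fallback = coach("where is my refund", "hybrid", stub_playbook, stub_generate)
```

In hybrid mode the first utterance is answered from the playbook stub, and the second falls through to the generative stub because no playbook matched.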

When coaching is enabled, the coaching feature flag must also be turned on in the widget feature toggles for agents to see the coaching tab. See Widget Deployment.

Playbooks are managed under Coaching > Playbooks and assigned to queues under Coaching > Queue Assignments.

RAG Trigger Roles

Controls which participant roles automatically trigger knowledge base search from speech transcription.

| Option | Description |
| --- | --- |
| HUMAN_AGENT (default) | Only agent utterances trigger search; useful when agents ask questions aloud to get help |
| END_USER | Only customer utterances trigger search |
| Both | Both agent and customer utterances trigger search |
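
The three options amount to a role filter applied to each transcribed utterance. The mapping and the `triggers_search` function below are a hypothetical sketch of that filter, using the option names from the table.

```python
# Which speaker roles trigger a KB search under each option.
TRIGGER_ROLES = {
    "HUMAN_AGENT": {"HUMAN_AGENT"},
    "END_USER": {"END_USER"},
    "Both": {"HUMAN_AGENT", "END_USER"},
}


def triggers_search(option: str, speaker_role: str) -> bool:
    """Return True if an utterance from speaker_role should trigger search."""
    return speaker_role in TRIGGER_ROLES[option]


customer_under_default = triggers_search("HUMAN_AGENT", "END_USER")
customer_under_both = triggers_search("Both", "END_USER")
```

Under the default option a customer utterance does not trigger search, while under Both it does.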

Service Toggles

| Setting | Default | Description |
| --- | --- | --- |
| AudioHook Enabled | Off | Enable real-time voice transcription via the AudioHook service |
| Middleware Enabled | Off | Enable the CCaaS middleware gateway for queue routing and conversation lifecycle |

Recording

| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| Recording Retention | 0 – 365 days | 1 day | How long audio recordings from AudioHook streams are retained in cloud storage. Set to 0 to disable recording. |

Budget

| Setting | Description |
| --- | --- |
| Budget Threshold | Monthly spending cap in cents. You will be notified when estimated LLM cost exceeds this amount. Leave empty for no limit. |
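
Because the threshold is expressed in cents, a limit of 500 dollars is entered as 50000. The check itself can be sketched as follows; `over_budget` is a hypothetical helper, and an empty setting is modeled as `None`.

```python
from typing import Optional


def over_budget(estimated_cost_cents: int, threshold_cents: Optional[int]) -> bool:
    """Return True when the estimated monthly LLM cost exceeds the threshold.
    A threshold of None models an empty setting, meaning no limit."""
    if threshold_cents is None:
        return False
    return estimated_cost_cents > threshold_cents


threshold = 500 * 100  # 500 dollars expressed in cents
notify = over_budget(50_001, threshold)
at_limit = over_budget(50_000, threshold)
```

Using integer cents avoids floating-point rounding when comparing amounts. In this sketch a cost exactly equal to the threshold does not trigger a notification, matching the word "exceeds" in the description.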
