Tenant Configuration

Each Agent Assist tenant has a configuration that controls how the assist pipeline behaves. These settings are managed on the Settings page of the Agent Assist Portal.

Assist Mode

Mode	Description
Omni (default)	Uses the built-in RAG pipeline — classifier, vector search, and LLM answer generation with any supported AI model
Google Native	Uses Google CCAI natively with Dialogflow CX agents for suggestions

WARNING

Switching between assist modes requires reconfiguring your CCaaS integration. Plan mode changes during a maintenance window.

Knowledge & AI Settings

These settings control how the RAG pipeline searches and generates answers.

Setting	Range	Default	Description
Default Knowledge Base	—	None	The KB collection used when no queue-specific mapping exists
Default Prompt	—	None	The published system prompt used for answer generation. Can be overridden per queue in Queue Mapping.
Max Sources	1 – 20	5	Maximum number of source document chunks returned per query. Higher values provide more context but increase token usage and latency.
Confidence Threshold	0.0 – 1.0	0.7	Minimum vector similarity score for a chunk to be included in results. Chunks scoring below this are filtered out.
Max Tokens	—	4096	Maximum number of tokens the LLM can generate per answer
Context Mode	`rag` / `structured` / `agent_ready` / `minimal`	`rag`	Controls how much context is sent to the LLM. `rag` includes full KB chunks; `minimal` sends only the query.
System Prompt	Text	—	Custom instructions sent to the LLM for answer generation. Use this to set tone, formatting rules, or domain-specific guidance.
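Max Sources and Confidence Threshold work together: low-scoring chunks are dropped, then the result is cut to a fixed count. The sketch below assumes the threshold filter is applied before ranking. The `Chunk` type and `select_sources` helper are hypothetical and are not the portal's actual retrieval code.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical shape of a retrieved knowledge base chunk.
    doc_id: str
    text: str
    score: float  # vector similarity, 0.0 to 1.0


def select_sources(chunks: list[Chunk], max_sources: int = 5,
                   confidence_threshold: float = 0.7) -> list[Chunk]:
    """Drop chunks below the threshold, then keep the top `max_sources` by score."""
    if not 1 <= max_sources <= 20:
        raise ValueError("max_sources must be between 1 and 20")
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    kept = [c for c in chunks if c.score >= confidence_threshold]
    kept.sort(key=lambda c: c.score, reverse=True)  # best matches first
    return kept[:max_sources]


results = [
    Chunk("refunds", "Refunds are issued within 5 days.", 0.91),
    Chunk("shipping", "Orders ship in 24 hours.", 0.64),
    Chunk("returns", "Returns need a receipt.", 0.78),
]
print([c.doc_id for c in select_sources(results, max_sources=2)])
# ['refunds', 'returns']
```

Raising the threshold can return fewer than Max Sources chunks (or none), which is why a very high threshold can leave the LLM with no context at all.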

Classifier Settings

The classifier evaluates each customer utterance and decides whether it should trigger a knowledge base search.

Setting	Range	Default	Description
Classifier Threshold	0.0 – 1.0	0.5	Minimum confidence for the classifier to proceed with retrieval. If the classifier returns a confidence below this threshold, the message is treated as noise even if the LLM classified it as meaningful.

Adjusting the classifier threshold trades coverage against noise:

Lower threshold (0.2 – 0.4): More suggestions generated, but some may be triggered by irrelevant utterances
Default threshold (0.5): Balanced coverage and relevance
Higher threshold (0.7 – 0.9): Fewer suggestions, higher precision — only clear questions trigger search
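The retrieval gate described above can be sketched as a simple conjunction, assuming the classifier returns a meaningful/noise label plus a confidence score. `ClassifierResult` and `should_retrieve` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClassifierResult:
    # Hypothetical classifier output: a label plus a confidence score.
    is_meaningful: bool
    confidence: float


def should_retrieve(result: ClassifierResult, threshold: float = 0.5) -> bool:
    """Search the KB only if the utterance is meaningful AND confident enough."""
    # Below the threshold the message counts as noise, even if labeled meaningful.
    return result.is_meaningful and result.confidence >= threshold


borderline = ClassifierResult(is_meaningful=True, confidence=0.45)
print(should_retrieve(borderline))                 # False at the default 0.5
print(should_retrieve(borderline, threshold=0.3))  # True at a lower threshold
```

Changing the threshold only moves borderline utterances in or out; an utterance labeled as noise never triggers a search, whatever its confidence.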

TIP

Start with the default threshold and adjust based on agent feedback. The Classifier Analytics dashboard shows filter rates and confidence distributions to guide tuning.

Domain Keywords

Domain keywords help the classifier recognize industry-specific terminology that it might otherwise treat as noise. Navigate to Settings > Domain Keywords to manage them.

Keywords are organized into categories (e.g., "Medical Terms", "Product Names", "Legal Terms"). When the classifier encounters a domain keyword in an utterance, it receives a hint that boosts its confidence, making a knowledge search more likely.
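One way to picture the keyword hint: scan the utterance for configured terms and pass any matches to the classifier alongside the message. The whole-word, case-insensitive matching rule and the hint wording below are assumptions for illustration, and `keyword_hint` is a hypothetical helper.

```python
import re
from typing import Optional


def keyword_hint(utterance: str, keywords: dict[str, list[str]]) -> Optional[str]:
    """Return a hint naming matched terms and categories, or None if nothing matched."""
    matches = []
    for category, terms in keywords.items():
        for term in terms:
            # Whole-word, case-insensitive match, so "ACE" does not match inside "place".
            if re.search(rf"\b{re.escape(term)}\b", utterance, re.IGNORECASE):
                matches.append(f"{term} ({category})")
    if not matches:
        return None
    return "Domain terms present: " + ", ".join(matches)


kw = {"Medical Terms": ["prior authorization", "EOB"], "Product Names": ["ACE"]}
print(keyword_hint("Can you explain this EOB to me?", kw))
# Domain terms present: EOB (Medical Terms)
print(keyword_hint("Let me place the order", kw))  # None
```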

TIP

Add domain-specific acronyms, product names, and jargon that general-purpose LLMs may not recognize. This is especially useful for technical support and regulated industries.

Prompt Caching

When enabled, the classifier system prompt is stored in Gemini's context cache. Cached input tokens cost approximately 90% less than uncached input tokens.

Setting	Description
Prompt Caching Enabled	Toggle on/off. Requires Gemini 2.5 Flash or later.

TIP

Enable prompt caching for high-volume deployments. The classifier system prompt is identical for every request, so once it is cached, those tokens are billed at the reduced cached rate on subsequent calls instead of the full input rate.
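The 90% discount applies only to the cached portion of each request, so the overall saving per call depends on how large the system prompt is relative to the utterance. The worked example below uses a hypothetical price of 1.00 per million input tokens and hypothetical token counts; it ignores any storage charge for the context cache itself.

```python
def classifier_input_cost(prompt_tokens: int, utterance_tokens: int,
                          price_per_million: float, cached: bool,
                          cache_discount: float = 0.90) -> float:
    """Input cost of one classifier call, in the same currency as the price."""
    rate = price_per_million / 1_000_000
    # Only the system prompt is cached; utterance tokens are always full price.
    prompt_rate = rate * (1 - cache_discount) if cached else rate
    return prompt_tokens * prompt_rate + utterance_tokens * rate


# Hypothetical: 2,000-token system prompt, 100-token utterance.
cost_uncached = classifier_input_cost(2000, 100, 1.00, cached=False)
cost_cached = classifier_input_cost(2000, 100, 1.00, cached=True)
print(f"uncached: {cost_uncached:.6f}, cached: {cost_cached:.6f}")
print(f"saving per call: {1 - cost_cached / cost_uncached:.0%}")  # 86%
```

With these numbers the per-call saving is about 86% rather than the full 90%, because the uncached utterance tokens still cost full price.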

Session Settings

Setting	Range	Default	Description
Session TTL	5 min – 7 days	24 hours	How long an idle conversation session is kept in Redis. Any activity resets the timer; after expiry, the session data is cleared.
Max Transcript History	10 – 200	50	Maximum number of transcript messages stored in conversation history. This history is sent as context to the classifier and LLM. Higher values provide more context but increase token usage.
Auto-Summary Interval	15 – 300 sec	60 sec	How often the conversation summary auto-refreshes in the agent widget. Set higher to reduce LLM token usage.
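Session TTL and Max Transcript History can be sketched as an idle timer plus a bounded queue. The `Session` class below is a hypothetical in-memory stand-in for the Redis-backed store (in Redis, a sliding TTL is typically implemented by refreshing the key's `EXPIRE` on each write); an injectable clock keeps the example deterministic.

```python
import time
from collections import deque
from typing import Callable


class Session:
    """Hypothetical in-memory stand-in for a Redis-backed conversation session."""

    def __init__(self, ttl_seconds: int = 24 * 3600, max_history: int = 50,
                 clock: Callable[[], float] = time.monotonic):
        if not 5 * 60 <= ttl_seconds <= 7 * 24 * 3600:
            raise ValueError("Session TTL must be between 5 minutes and 7 days")
        if not 10 <= max_history <= 200:
            raise ValueError("Max Transcript History must be between 10 and 200")
        self.ttl = ttl_seconds
        self.clock = clock
        # A bounded deque drops the oldest message once max_history is reached,
        # which also caps the history sent to the classifier and LLM.
        self.history = deque(maxlen=max_history)
        self.last_activity = clock()

    def expired(self) -> bool:
        return self.clock() - self.last_activity >= self.ttl

    def add_message(self, text: str) -> None:
        if self.expired():
            self.history.clear()  # idle too long: session data is cleared
        self.history.append(text)
        self.last_activity = self.clock()  # any activity resets the idle timer


# Simulated clock so the example does not have to wait in real time.
now = [0.0]
s = Session(ttl_seconds=300, max_history=10, clock=lambda: now[0])
for i in range(12):
    s.add_message(f"msg {i}")
print(len(s.history), s.history[0])  # 10 msg 2
now[0] += 301  # just over the 5-minute TTL with no activity
s.add_message("after idle")
print(list(s.history))  # ['after idle']
```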

Caching

Response Cache

Caches full RAG answers so identical queries return instantly without calling the LLM again.

Setting	Range	Default	Description
Enabled	on/off	On	Toggle response caching
TTL	30 – 3600 sec	300 sec	How long a cached answer is valid
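A minimal time-to-live cache illustrating the response-cache behavior, assuming answers are keyed by the exact query text. A real key would likely also need to include the knowledge base and prompt in use; `TTLCache`, `answer`, and the fake LLM function are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """An entry stays valid for `ttl` seconds after it is written."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # stale: evict and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def answer(query: str, cache: TTLCache, generate: Callable[[str], str]) -> str:
    """Return a cached answer for an identical query, else call the LLM and cache it."""
    hit = cache.get(query)
    if hit is not None:
        return hit
    result = generate(query)
    cache.set(query, result)
    return result


now = [0.0]
calls = []


def fake_llm(query: str) -> str:
    calls.append(query)  # record each (simulated) LLM call
    return f"answer to {query}"


cache = TTLCache(ttl=300, clock=lambda: now[0])
answer("reset password", cache, fake_llm)
answer("reset password", cache, fake_llm)  # served from cache
now[0] += 300
answer("reset password", cache, fake_llm)  # expired, so the LLM is called again
print(len(calls))  # 2
```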

Classifier Cache

Caches classifier decisions so repeated utterances (e.g., "hello", "thank you") skip the LLM classifier call entirely.

Setting	Range	Default	Description
Enabled	on/off	Off	Toggle classifier caching
TTL	60 – 3600 sec	600 sec	How long a cached classification is valid

TIP

Enable the classifier cache for high-volume voice deployments where the same phrases appear frequently. This significantly reduces LLM token usage for noise filtering.
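Whether the classifier cache matches only byte-identical utterances or normalizes them first is not stated here. The sketch below assumes light normalization (Unicode NFKC, lowercase, punctuation and extra whitespace removed), which would let transcript variants such as "Thank you!" and "thank you." share one cached classification. `classifier_cache_key` is a hypothetical helper.

```python
import re
import unicodedata


def classifier_cache_key(utterance: str) -> str:
    """Normalize an utterance so trivial variants share one cache entry."""
    text = unicodedata.normalize("NFKC", utterance).lower()
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
    return " ".join(text.split())        # collapse runs of whitespace


variants = ["Thank you!", "thank you", "  THANK   you. "]
print({classifier_cache_key(v) for v in variants})  # {'thank you'}
```

Normalizing more aggressively raises the hit rate but risks merging utterances that should be classified differently, so it is best kept to changes that cannot alter meaning.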

Coaching

AI coaching provides real-time guidance to agents during conversations. Configure coaching behavior under the Coaching section in Settings.

Setting	Range	Default	Description
Coaching Enabled	on/off	Off	Master toggle for the coaching engine
Coaching Mode	`deterministic` / `generative` / `hybrid`	`hybrid`	`deterministic`: playbook-based steps only. `generative`: AI-generated guidance from KB. `hybrid`: tries playbooks first, falls back to generative.
Confidence Threshold	0.0 – 1.0	0.75	Minimum confidence for a coaching suggestion to be shown to the agent
Max Active Playbooks	1 – 20	5	Maximum number of playbooks that can be evaluated simultaneously per conversation
Coaching KB	—	None	Knowledge base used for generative coaching (when no playbook matches)
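The three coaching modes and the coaching confidence threshold can be sketched as follows. This assumes `hybrid` falls back to generative guidance only when no playbook matches, consistent with the Coaching KB description; the `Suggestion` type and the `playbook_match` and `generate` callables are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Suggestion:
    text: str
    confidence: float
    source: str  # "playbook" or "generative"


def coach(utterance: str, mode: str,
          playbook_match: Callable[[str], Optional[Suggestion]],
          generate: Callable[[str], Optional[Suggestion]],
          threshold: float = 0.75) -> Optional[Suggestion]:
    """Pick a coaching suggestion according to the configured Coaching Mode."""
    if mode not in ("deterministic", "generative", "hybrid"):
        raise ValueError(f"unknown coaching mode: {mode}")
    suggestion = None
    if mode in ("deterministic", "hybrid"):
        suggestion = playbook_match(utterance)
    if suggestion is None and mode in ("generative", "hybrid"):
        suggestion = generate(utterance)  # generative guidance from the Coaching KB
    # Suggestions below the confidence threshold are not shown to the agent.
    if suggestion is not None and suggestion.confidence < threshold:
        return None
    return suggestion


def no_playbook(utterance: str) -> Optional[Suggestion]:
    return None  # simulate "no playbook matched"


def kb_answer(utterance: str) -> Optional[Suggestion]:
    return Suggestion("Offer the retention discount.", 0.82, "generative")


print(coach("I want to cancel", "hybrid", no_playbook, kb_answer).source)  # generative
print(coach("I want to cancel", "deterministic", no_playbook, kb_answer))  # None
```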

Enabling coaching here is not enough on its own: agents see the coaching tab only when the coaching feature flag is also turned on in the widget feature toggles. See Widget Deployment.

Playbooks are managed under Coaching > Playbooks and assigned to queues under Coaching > Queue Assignments.

RAG Trigger Roles

Controls which participant roles automatically trigger knowledge base search from speech transcription.

Option	Description
HUMAN_AGENT (default)	Only agent utterances trigger search — useful when agents ask questions aloud to get help
END_USER	Only customer utterances trigger search
Both	Both agent and customer utterances trigger search
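A sketch of the role filter, assuming each transcribed utterance carries its participant role as a string. The `TriggerRoles` enum, its `BOTH` member name, and `triggers_search` are hypothetical; an utterance that passes this filter would presumably still go through the classifier before any search runs.

```python
from enum import Enum


class TriggerRoles(Enum):
    HUMAN_AGENT = "HUMAN_AGENT"
    END_USER = "END_USER"
    BOTH = "BOTH"  # hypothetical identifier for the "Both" option


def triggers_search(participant_role: str,
                    setting: TriggerRoles = TriggerRoles.HUMAN_AGENT) -> bool:
    """Decide whether a transcribed utterance from this role may trigger a KB search."""
    if setting is TriggerRoles.BOTH:
        return participant_role in ("HUMAN_AGENT", "END_USER")
    return participant_role == setting.value


print(triggers_search("END_USER"))                     # False (default is HUMAN_AGENT)
print(triggers_search("END_USER", TriggerRoles.BOTH))  # True
```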

Service Toggles

Setting	Default	Description
AudioHook Enabled	Off	Enable real-time voice transcription via the AudioHook service
Middleware Enabled	Off	Enable the CCaaS middleware gateway for queue routing and conversation lifecycle

Recording

Setting	Range	Default	Description
Recording Retention	0 – 365 days	1 day	How long audio recordings from AudioHook streams are retained in cloud storage. Set to 0 to disable recording.
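A sketch of how the retention window maps to a deletion cutoff, assuming each recording has a creation timestamp. `retention_cutoff` is a hypothetical helper; returning `None` for a value of 0 reflects that recording is disabled entirely.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def retention_cutoff(retention_days: int, now: datetime) -> Optional[datetime]:
    """Recordings created before the returned time are eligible for deletion.

    Returns None when retention is 0, because recording is disabled.
    """
    if not 0 <= retention_days <= 365:
        raise ValueError("Recording Retention must be between 0 and 365 days")
    if retention_days == 0:
        return None
    return now - timedelta(days=retention_days)


now = datetime(2025, 3, 10, 12, 0, tzinfo=timezone.utc)
print(retention_cutoff(1, now))  # 2025-03-09 12:00:00+00:00
```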

Budget

Setting	Description
Budget Threshold	Monthly spending threshold, in cents (for example, enter 50000 for $500). You are notified when the estimated monthly LLM cost exceeds this amount. Leave empty for no limit.
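Because the threshold is stored in cents, a dollar amount must be multiplied by 100 before it is entered: a $500 monthly budget is 50000. The sketch below shows that conversion and the comparison that would trigger a notification. `to_cents` and `over_budget` are hypothetical helpers, and the check only decides whether to notify.

```python
from decimal import Decimal
from typing import Optional


def to_cents(dollars: str) -> int:
    # Decimal avoids float rounding, so "19.99" becomes exactly 1999.
    return int(Decimal(dollars) * 100)


def over_budget(estimated_cost_cents: int, threshold_cents: Optional[int]) -> bool:
    """True when the month's estimated LLM cost exceeds the Budget Threshold."""
    if threshold_cents is None:  # empty setting: no limit
        return False
    return estimated_cost_cents > threshold_cents


print(to_cents("500"))                       # 50000
print(over_budget(50_001, to_cents("500")))  # True
print(over_budget(50_001, None))             # False
```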

Next Steps

Widget Deployment — Deploy and brand the agent widget
CCaaS Integration — Connect to your contact center platform
Knowledge Bases — Set up knowledge base collections
Domain Keywords — Add industry-specific terminology
