Tenant Configuration
Each Agent Assist tenant has a configuration that controls how the assist pipeline behaves. These settings are managed on the Settings page of the Agent Assist Portal.
Assist Mode
| Mode | Description |
|---|---|
| Omni (default) | Uses the built-in RAG pipeline — classifier, vector search, and LLM answer generation with any supported AI model |
| Google Native | Uses Google CCAI natively with Dialogflow CX agents for suggestions |
WARNING
Switching between assist modes requires reconfiguring your CCaaS integration. Schedule any mode change for a maintenance window.
Knowledge & AI Settings
These settings control how the RAG pipeline searches and generates answers.
| Setting | Range | Default | Description |
|---|---|---|---|
| Default Knowledge Base | — | None | The KB collection used when no queue-specific mapping exists |
| Default Prompt | — | None | The published system prompt used for answer generation. Can be overridden per queue in Queue Mapping. |
| Max Sources | 1 – 20 | 5 | Maximum number of source document chunks returned per query. Higher values provide more context but increase token usage and latency. |
| Confidence Threshold | 0.0 – 1.0 | 0.7 | Minimum vector similarity score for a chunk to be included in results. Chunks scoring below this are filtered out. |
| Max Tokens | — | 4096 | Maximum number of tokens the LLM can generate per answer |
| Context Mode | rag / structured / agent_ready / minimal | rag | Controls how much context is sent to the LLM. rag includes full KB chunks; minimal sends only the query. |
| System Prompt | Text | — | Custom instructions sent to the LLM for answer generation. Use this to set tone, formatting rules, or domain-specific guidance. |
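As a rough illustration of how Confidence Threshold and Max Sources interact, the sketch below filters retrieved chunks by similarity score and then caps the result. The `Chunk` type and `select_sources` function are hypothetical names invented for this example, not part of the Agent Assist API, and the best-first ordering is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # vector similarity score, assumed to lie in [0.0, 1.0]


def select_sources(chunks, confidence_threshold=0.7, max_sources=5):
    """Drop chunks scoring below the threshold, then keep the best max_sources."""
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    if not 1 <= max_sources <= 20:
        raise ValueError("max_sources must be between 1 and 20")
    kept = [c for c in chunks if c.score >= confidence_threshold]
    # Assumption: results are ranked best-first before the cap is applied.
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:max_sources]
```

With the defaults, a chunk scoring 0.65 is dropped while one scoring exactly 0.7 is kept, since only chunks below the threshold are filtered out.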
Classifier Settings
The classifier evaluates each customer utterance and decides whether it should trigger a knowledge base search.
| Setting | Range | Default | Description |
|---|---|---|---|
| Classifier Threshold | 0.0 – 1.0 | 0.5 | Minimum confidence for the classifier to proceed with retrieval. If the classifier returns a confidence below this threshold, the message is treated as noise even if the LLM classified it as meaningful. |
Adjusting the classifier threshold trades coverage against noise:
- Lower threshold (0.2 – 0.4): More suggestions generated, but some may be triggered by irrelevant utterances
- Default threshold (0.5): Balanced coverage and relevance
- Higher threshold (0.7 – 0.9): Fewer suggestions, higher precision — only clear questions trigger search
TIP
Start with the default threshold and adjust based on agent feedback. The Classifier Analytics dashboard shows filter rates and confidence distributions to guide tuning.
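The gating rule described above can be sketched as a small predicate. The label strings `"meaningful"` and `"noise"` and the function name are assumptions made for illustration; the real classifier output format is not documented here.

```python
def should_retrieve(label: str, confidence: float,
                    classifier_threshold: float = 0.5) -> bool:
    """Return True only when the utterance should trigger a KB search."""
    # Below the threshold the message is treated as noise,
    # even if the classifier labelled it as meaningful.
    if confidence < classifier_threshold:
        return False
    return label == "meaningful"
```

For example, a message labelled meaningful with confidence 0.45 is filtered at the default threshold but would pass if the threshold were lowered to 0.4.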
Domain Keywords
Domain keywords help the classifier recognize industry-specific terminology that it might otherwise treat as noise. Navigate to Settings > Domain Keywords to manage them.
Keywords are organized into categories (e.g., "Medical Terms", "Product Names", "Legal Terms"). When the classifier encounters a domain keyword in an utterance, it receives a hint that boosts confidence, making it more likely to trigger a knowledge search.
TIP
Add domain-specific acronyms, product names, and jargon that general-purpose LLMs may not recognize. This is especially useful for technical support and regulated industries.
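To make the idea of categorized keywords concrete, here is a minimal sketch of detecting domain keywords in an utterance. The keyword values and the `find_domain_keywords` helper are hypothetical; how the product actually passes the hint to the classifier is not specified in this guide, so the sketch covers detection only.

```python
import re

# Hypothetical example data: category name -> keywords in that category.
DOMAIN_KEYWORDS = {
    "Medical Terms": ["copay", "prior authorization"],
    "Product Names": ["AcmeRouter X2"],
}


def find_domain_keywords(utterance, keywords_by_category):
    """Return {category: [matched keywords]} for whole-word, case-insensitive hits."""
    matches = {}
    for category, keywords in keywords_by_category.items():
        found = [
            kw for kw in keywords
            if re.search(r"\b" + re.escape(kw) + r"\b", utterance, re.IGNORECASE)
        ]
        if found:
            matches[category] = found
    return matches
```

A non-empty result would be the signal that, per the description above, boosts classifier confidence.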
Prompt Caching
When enabled, the classifier system prompt is cached in Gemini's context cache. This reduces input token costs by approximately 90% on cached tokens.
| Setting | Description |
|---|---|
| Prompt Caching Enabled | Toggle on/off. Requires Gemini 2.5 Flash or later. |
TIP
Enable prompt caching for high-volume deployments. The classifier system prompt is the same for every request, so every classifier call benefits from the reduced cost of cached tokens.
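The effect of the roughly 90% reduction on cached tokens can be estimated with simple arithmetic. The function below is a back-of-the-envelope sketch: the price per million tokens is a placeholder you supply, and the 0.90 discount is the approximate figure quoted above, not a guaranteed rate.

```python
def estimated_input_cost(prompt_tokens, cached_tokens, price_per_million,
                         cached_discount=0.90):
    """Estimate input cost when cached_tokens of the prompt are billed at a discount."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    full_price = uncached_tokens * price_per_million / 1_000_000
    discounted = cached_tokens * price_per_million * (1 - cached_discount) / 1_000_000
    return full_price + discounted
```

For a 2,000-token classifier request where 1,500 tokens are the cached system prompt, the billable input drops to the equivalent of 500 + 150 = 650 full-price tokens, about a third of the uncached cost.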
Session Settings
| Setting | Range | Default | Description |
|---|---|---|---|
| Session TTL | 5 min – 7 days | 24 hours | How long an idle conversation session is kept in Redis. Once this time passes with no activity, the session data is cleared. |
| Max Transcript History | 10 – 200 | 50 | Maximum number of transcript messages stored in conversation history. This history is sent as context to the classifier and LLM. Higher values provide more context but increase token usage. |
| Auto-Summary Interval | 15 – 300 sec | 60 sec | How often the conversation summary auto-refreshes in the agent widget. Set higher to reduce LLM token usage. |
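One way to picture Max Transcript History is as a bounded buffer. The sketch below uses a `deque` with a fixed maximum length; the `TranscriptHistory` class is hypothetical, and dropping the oldest message first is an assumption, since the guide states only the maximum size.

```python
from collections import deque


class TranscriptHistory:
    """Bounded conversation history (illustrative only)."""

    def __init__(self, max_history=50):
        if not 10 <= max_history <= 200:
            raise ValueError("max_history must be between 10 and 200")
        # A deque with maxlen discards the oldest entry when a new one is appended.
        self._messages = deque(maxlen=max_history)

    def add(self, role, text):
        self._messages.append((role, text))

    def as_context(self):
        """Return the retained messages, oldest first."""
        return list(self._messages)
```

Every retained message counts toward the context sent to the classifier and LLM, which is why a larger limit increases token usage.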
Caching
Response Cache
Caches full RAG answers so that identical queries are served from the cache without calling the LLM again.
| Setting | Range | Default | Description |
|---|---|---|---|
| Enabled | on/off | On | Toggle response caching |
| TTL | 30 – 3600 sec | 300 sec | How long a cached answer is valid |
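The TTL behavior can be illustrated with a small in-memory cache. This is a generic sketch of time-based expiry, not the product's implementation; the `TTLCache` class and its injectable `clock` parameter are assumptions made so the example is easy to test.

```python
import time


class TTLCache:
    """Minimal time-to-live cache: entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: remove the entry so the caller regenerates the answer.
            del self._entries[key]
            return None
        return value
```

A shorter TTL keeps answers fresher after knowledge base updates; a longer TTL saves more LLM calls.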
Classifier Cache
Caches classifier decisions so repeated utterances (e.g., "hello", "thank you") skip the LLM classifier call entirely.
| Setting | Range | Default | Description |
|---|---|---|---|
| Enabled | on/off | Off | Toggle classifier caching |
| TTL | 60 – 3600 sec | 600 sec | How long a cached classification is valid |
TIP
Enable the classifier cache for high-volume voice deployments where the same phrases appear frequently. This significantly reduces LLM token usage for noise filtering.
Coaching
AI coaching provides real-time guidance to agents during conversations. Configure coaching behavior under the Coaching section in Settings.
| Setting | Range | Default | Description |
|---|---|---|---|
| Coaching Enabled | on/off | Off | Master toggle for the coaching engine |
| Coaching Mode | deterministic / generative / hybrid | hybrid | deterministic: playbook-based steps only. generative: AI-generated guidance from KB. hybrid: tries playbooks first, falls back to generative. |
| Confidence Threshold | 0.0 – 1.0 | 0.75 | Minimum confidence for a coaching suggestion to be shown to the agent |
| Max Active Playbooks | 1 – 20 | 5 | Maximum number of playbooks that can be evaluated simultaneously per conversation |
| Coaching KB | — | None | Knowledge base used for generative coaching (when no playbook matches) |
When coaching is enabled, the coaching feature flag must also be turned on in the widget feature toggles for agents to see the coaching tab. See Widget Deployment.
Playbooks are managed under Coaching > Playbooks and assigned to queues under Coaching > Queue Assignments.
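The three coaching modes differ only in which guidance source is consulted. The sketch below expresses that selection logic; `pick_guidance`, `playbook_lookup`, and `generate` are hypothetical names, and returning `None` for "no playbook matched" is an assumption.

```python
def pick_guidance(mode, playbook_lookup, generate, utterance):
    """Choose a guidance source according to the coaching mode."""
    if mode == "deterministic":
        # Playbook steps only; may return None when nothing matches.
        return playbook_lookup(utterance)
    if mode == "generative":
        return generate(utterance)
    if mode == "hybrid":
        # Try playbooks first, fall back to generative guidance.
        step = playbook_lookup(utterance)
        return step if step is not None else generate(utterance)
    raise ValueError(f"unknown coaching mode: {mode!r}")
```

In `hybrid` mode the generative path runs only when no playbook matches, which is when the Coaching KB setting comes into play.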
RAG Trigger Roles
Controls which participant roles automatically trigger knowledge base search from speech transcription.
| Option | Description |
|---|---|
| HUMAN_AGENT (default) | Only agent utterances trigger search — useful when agents ask questions aloud to get help |
| END_USER | Only customer utterances trigger search |
| Both | Both agent and customer utterances trigger search |
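The three options amount to a set-membership check on the speaker's role. The following sketch assumes transcript events carry a role string of either `HUMAN_AGENT` or `END_USER`; the function name is illustrative.

```python
# Which speaker roles trigger a search under each option.
TRIGGER_ROLES = {
    "HUMAN_AGENT": {"HUMAN_AGENT"},
    "END_USER": {"END_USER"},
    "Both": {"HUMAN_AGENT", "END_USER"},
}


def triggers_search(speaker_role, option="HUMAN_AGENT"):
    """Return True if an utterance from speaker_role should trigger a KB search."""
    return speaker_role in TRIGGER_ROLES[option]
```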
Service Toggles
| Setting | Default | Description |
|---|---|---|
| AudioHook Enabled | Off | Enable real-time voice transcription via the AudioHook service |
| Middleware Enabled | Off | Enable the CCaaS middleware gateway for queue routing and conversation lifecycle |
Recording
| Setting | Range | Default | Description |
|---|---|---|---|
| Recording Retention | 0 – 365 days | 1 day | How long audio recordings from AudioHook streams are retained in cloud storage. Set to 0 to disable recording. |
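As a quick illustration of the retention setting, the helper below computes when a recording would become eligible for deletion. The `recording_expiry` function is hypothetical, and measuring retention from the recording timestamp is an assumption.

```python
from datetime import datetime, timedelta, timezone


def recording_expiry(recorded_at, retention_days=1):
    """Return the expiry time for a recording, or None when recording is disabled."""
    if not 0 <= retention_days <= 365:
        raise ValueError("retention_days must be between 0 and 365")
    if retention_days == 0:
        # A value of 0 disables recording, so there is nothing to expire.
        return None
    return recorded_at + timedelta(days=retention_days)
```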
Budget
| Setting | Description |
|---|---|
| Budget Threshold | Monthly spending cap in cents (for example, 50000 for $500). You will be notified when the estimated LLM cost exceeds this amount. Leave empty for no limit. |
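Because the threshold is expressed in cents, it helps to keep all cost arithmetic in integer cents. The check below is a sketch of the notification condition as described; `budget_exceeded` is a hypothetical name, and `None` stands in for an empty (unlimited) setting.

```python
def budget_exceeded(estimated_cost_cents, threshold_cents):
    """Return True when the estimated monthly LLM cost exceeds the threshold."""
    if threshold_cents is None:
        # Empty setting: no limit, so no notification.
        return False
    return estimated_cost_cents > threshold_cents
```

Under this reading, a threshold of 50000 triggers the notification once the estimate passes $500.00, not when it equals it.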
Next Steps
- Widget Deployment — Deploy and brand the agent widget
- CCaaS Integration — Connect to your contact center platform
- Knowledge Bases — Set up knowledge base collections
- Domain Keywords — Add industry-specific terminology
