Architecture
Agent Assist is composed of several services that work together to deliver real-time AI suggestions, coaching, summaries, and analysis to contact center agents. This page explains what each component does, how they communicate, and how the two operating modes differ.
System Overview
Agent Assist architecture diagram showing CCaaS platform, connector service, RAG service, widget, portal, and database
┌──────────────────┐    Pub/Sub    ┌──────────────────────────┐      HTTP      ┌──────────────────┐
│  CCaaS Platform  │ ────────────> │ assist-connector-service │ <───────────── │  agent-assist-   │
│ (Genesys, Five9, │               │     (Socket.IO hub)      │                │  portal (admin)  │
│ 8x8, SF, etc.)   │               └────────────┬─────────────┘                └──────────────────┘
└──────────────────┘                            │
                                      Socket.IO │
                                                │
                 ┌──────────────────────────────┼──────────────────────────────┐
                 │                              │                              │
                 ▼                              ▼                              ▼
        ┌─────────────────┐          ┌─────────────────────┐          ┌──────────────────┐
        │   rag-service   │          │ agent-assist-widget │          │ AlloyDB + Redis  │
        │  (classifier +  │          │  (Vue 3 frontend)   │          │  (persistence +  │
        │  search + LLM)  │          └─────────────────────┘          │  session/cache)  │
        └─────────────────┘                                           └──────────────────┘
                                                                               ▲
        ┌──────────────────┐         ┌───────────────────┐                     │
        │ assist-audiohook │         │ assist-middleware │ ────────────────────┘
        │   (voice STT)    │         │  (CCaaS gateway)  │
        └──────────────────┘         └───────────────────┘
Services
| Service | Role | Technology |
|---|---|---|
| assist-connector-service | Central hub. Receives CCaaS events via Pub/Sub, runs the RAG pipeline in omni mode, manages sessions, orchestrates coaching, and pushes suggestions to the widget over Socket.IO. | FastAPI + python-socketio |
| rag-service | Handles utterance classification, document embedding, vector search (pgvector), and LLM answer streaming. Called by the connector service for each qualifying message. | FastAPI + pgvector |
| assist-audiohook-service | Receives real-time audio streams from voice calls, transcribes speech via STT, and forwards transcripts to the connector service via Pub/Sub for processing. | FastAPI + WebSocket |
| assist-middleware-service | Genesys Cloud integration gateway. Manages CCaaS connections, queue-to-KB mappings, and conversation lifecycle bridging. | FastAPI |
| agent-assist-widget | Vue 3 frontend embedded in the agent desktop (Genesys, Salesforce, 8x8, Five9, Google CCaaS, or standalone). Connects to the connector service over Socket.IO. Renders suggestions, transcript, summary, analysis, and coaching panels. | Vue 3 + Vite |
| agent-assist-portal | Vue 3 admin dashboard for configuration, analytics, coaching playbook management, knowledge base administration, and service health monitoring. | Vue 3 + PrimeVue + Vite |
| Dialogflow CCAI | Google-managed suggestion engine used in google_native mode. Publishes suggestion events to Pub/Sub, which the connector service relays to the widget. | Google Cloud |
Operating Modes
Agent Assist supports two modes, configured per tenant via the assist_mode setting.
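A minimal sketch of per-tenant mode resolution, assuming tenant settings are available as a plain dict. `AssistMode`, `resolve_assist_mode`, and `next_action` are hypothetical names, not the connector service's actual code; the fallback to `omni` reflects that omni is the default mode.

```python
from enum import Enum


class AssistMode(str, Enum):
    """Operating modes; values mirror the assist_mode setting."""
    OMNI = "omni"
    GOOGLE_NATIVE = "google_native"


def resolve_assist_mode(tenant_settings: dict) -> AssistMode:
    """Return the tenant's mode, falling back to omni when the setting is unset."""
    raw = tenant_settings.get("assist_mode") or AssistMode.OMNI.value
    try:
        return AssistMode(raw)
    except ValueError:
        # Reject typos instead of silently choosing a mode.
        raise ValueError(f"unknown assist_mode: {raw!r}") from None


def next_action(tenant_settings: dict) -> str:
    """Hypothetical dispatch: what the connector does with a customer event."""
    if resolve_assist_mode(tenant_settings) is AssistMode.OMNI:
        return "run_rag_pipeline"  # classify -> search -> stream -> deliver -> coach
    return "relay_to_widget"       # Dialogflow CCAI already produced the suggestion
```

Failing on an unrecognized value is a deliberate choice in this sketch: a misspelled mode surfaces as an error rather than quietly routing a tenant through the wrong pipeline.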
Omni Mode (omni)
In omni mode, the connector service orchestrates the full RAG pipeline within Agent Assist, calling rag-service for classification, search, and answer streaming:
- Classify -- A lightweight LLM classifier determines whether a knowledge base lookup is warranted, extracts a clean query, and optionally generates quick replies for noise messages.
- Search -- The RAG service performs a vector similarity search against the tenant's knowledge bases, returning the top-k relevant document chunks with feedback-based reranking.
- Stream -- An LLM generates a suggestion grounded in the retrieved chunks. The answer streams chunk-by-chunk over Socket.IO so the agent sees it progressively.
- Deliver -- Source citations, follow-up questions, and quick replies are delivered alongside the answer.
- Coach -- If coaching is enabled, the coaching engine evaluates the utterance against active playbooks or generates AI-based guidance.
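The five steps above can be sketched as a single async function. This is an illustration under stated assumptions, not the code in `rag_trigger.py`: the classifier, search, streaming, coaching, and emit callables are injected stand-ins with hypothetical signatures, and the payload shapes are invented. The event names are the ones listed in the Event Flow section. For simplicity the sketch classifies and then searches; the Event Flow section notes that `/retrieve` runs the two in parallel. Follow-up questions are omitted.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, Awaitable, Callable, Optional


@dataclass
class Classification:
    needs_kb: bool                      # is a knowledge base lookup warranted?
    query: str                          # cleaned-up search query
    quick_replies: list[str] = field(default_factory=list)


Emit = Callable[[str, dict], Awaitable[None]]


async def run_omni_pipeline(
    utterance: str,
    classify: Callable[[str], Awaitable[Classification]],
    search: Callable[[str], Awaitable[list[str]]],
    stream_answer: Callable[[str, list[str]], AsyncIterator[str]],
    coach: Callable[[str], Awaitable[Optional[str]]],
    emit: Emit,
) -> None:
    # Start coaching concurrently so it never waits on the RAG steps.
    coaching = asyncio.create_task(coach(utterance))

    result = await classify(utterance)                         # 1. Classify
    if not result.needs_kb:
        await emit("rag-quickreplies-event", {"replies": result.quick_replies})
    else:
        sources = await search(result.query)                   # 2. Search
        await emit("rag-content-event", {"sources": sources})
        parts: list[str] = []
        async for piece in stream_answer(result.query, sources):  # 3. Stream
            parts.append(piece)
            await emit("rag-content-event", {"chunk": piece})
        await emit("rag-complete-event",                       # 4. Deliver
                   {"answer": "".join(parts), "sources": sources})

    guidance = await coaching                                  # 5. Coach
    if guidance:
        await emit("coach-suggestion", {"text": guidance})
```

Creating the coaching task before classification mirrors the "async, parallel" coaching branch in the flow below: a slow coaching call does not delay the first streamed chunk.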
Customer message
→ Pub/Sub event (or audiohook transcript)
→ assist-connector-service
→ Check classifier cache (Redis)
→ rag-service /retrieve (classify + vector search)
→ rag-service /stream (LLM answer via SSE)
→ Socket.IO → widget (streaming chunks)
→ Coaching engine (async, parallel)
→ Socket.IO → widget (coaching suggestion)
When to use omni mode
Use omni mode when you want full control over the knowledge sources, embedding models, and LLM used for answer synthesis. This is the default and recommended mode for most deployments.
Google Native Mode (google_native)
In google_native mode, Dialogflow CCAI handles suggestion generation. The connector service acts as a pass-through:
- Dialogflow processes the conversation transcript and generates suggestions using its own knowledge connectors.
- Pub/Sub delivers suggestion events to the connector service.
- The connector service relays the suggestions to the widget over Socket.IO without modification.
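A sketch of the pass-through step, assuming each Pub/Sub message body is a JSON-encoded Dialogflow `HumanAgentAssistantEvent` whose `conversation` field holds the full conversation resource name. `build_relay` and the convention of using the trailing conversation ID as the Socket.IO room are hypothetical; the point is that the payload is parsed only to find its destination and is otherwise forwarded unchanged.

```python
import json
from typing import Tuple


def build_relay(message_data: bytes) -> Tuple[str, dict]:
    """Return (room, payload) for forwarding a suggestion event unchanged."""
    payload = json.loads(message_data.decode("utf-8"))
    conversation = payload.get("conversation")
    if not conversation:
        raise ValueError("suggestion event has no conversation name")
    # projects/<p>/locations/<l>/conversations/<id> -> <id> (hypothetical room naming)
    room = conversation.rsplit("/", 1)[-1]
    return room, payload  # payload is not modified
```

With python-socketio, the result would typically feed `await sio.emit(event_name, payload, room=room)`; the event name the connector actually uses for relayed suggestions is not documented on this page.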
Customer message
→ Dialogflow CCAI (processes + generates suggestion)
→ Pub/Sub event
→ assist-connector-service (relay)
→ Socket.IO → widget
Limitations of google_native mode
In google_native mode, you cannot use OmniBots knowledge bases or customize the RAG pipeline. All suggestion logic is managed by Google Dialogflow CCAI. Feedback analytics are still collected but do not influence suggestion quality directly.
Event Flow
All communication between the CCaaS platform and the agent widget passes through the connector service. Here is the detailed event flow for a typical interaction.
1. Connection & Session
When an agent opens the widget:
- The widget authenticates via embed token + provider OAuth (e.g., Genesys PKCE, Salesforce token exchange)
- The connector service validates credentials and issues a session JWT
- The widget connects via Socket.IO and emits `join-conversation` with the conversation name
- The connector service resolves session config (KB IDs, system prompt, LLM integration) from queue mappings or the deployment key
- The widget receives `conversation-joined` and `session-updated` events
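A minimal sketch of session config resolution, assuming queue mappings and per-deployment-key defaults are already loaded into dicts. The precedence shown (a queue mapping wins over the deployment key) is an assumption, as are the names `SessionConfig` and `resolve_session_config`; the actual logic lives in `handlers/session.py`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SessionConfig:
    kb_ids: Tuple[str, ...]  # knowledge bases to search
    system_prompt: str       # prompt used for answer synthesis
    llm_integration: str     # which configured LLM integration to call


def resolve_session_config(
    queue_id: Optional[str],
    deployment_key: str,
    queue_mappings: Dict[str, SessionConfig],
    deployment_defaults: Dict[str, SessionConfig],
) -> SessionConfig:
    # Assumed precedence: a queue-specific mapping first ...
    if queue_id is not None and queue_id in queue_mappings:
        return queue_mappings[queue_id]
    # ... then the defaults attached to the widget's deployment key.
    if deployment_key in deployment_defaults:
        return deployment_defaults[deployment_key]
    raise LookupError(f"no session config for queue={queue_id!r}, key={deployment_key!r}")
```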
2. Customer Message (Omni Mode)
When the customer sends a message or the audiohook service transcribes speech:
- A `NEW_RECOGNITION_RESULT` or `NEW_MESSAGE` event arrives via Pub/Sub
- The connector service checks the classifier cache (Redis, 10-min TTL)
- On cache miss: calls rag-service `/retrieve`, which runs classification + vector search in parallel
- If classified as noise: emits `rag-quickreplies-event` with quick replies and returns
- If meaningful: emits `rag-content-event` with sources, then streams answer chunks as `rag-content-event` messages
- On stream complete: emits `rag-complete-event` with the final answer, sources, and timing metrics
- Follow-up questions are emitted as `rag-followup-event`
- Results are cached in Redis (5-min TTL) for identical queries
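Both Redis caches behave like keys written with an expiry. The sketch below mimics that with an in-process stand-in so it runs without Redis; the key layout (`prefix:tenant:sha256(normalized query)`), the case and whitespace normalization, and the `TTLCache` class are assumptions, while the 10-minute and 5-minute TTLs come from this page. With redis-py, `r.setex(key, ttl, value)` and `r.get(key)` are the equivalent calls.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple

CLASSIFIER_TTL_S = 10 * 60  # classifier cache: 10-minute TTL
RAG_RESULT_TTL_S = 5 * 60   # RAG result cache: 5-minute TTL


def cache_key(prefix: str, tenant_id: str, query: str) -> str:
    """Hypothetical key layout: prefix, tenant, then a hash of the normalized query."""
    normalized = " ".join(query.lower().split())  # case- and whitespace-insensitive
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{prefix}:{tenant_id}:{digest}"


class TTLCache:
    """In-process stand-in for SETEX/GET expiry semantics."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def setex(self, key: str, ttl_s: int, value: Any) -> None:
        self._entries[key] = (self._clock() + ttl_s, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expire lazily on read
            return None
        return value
```

Including the tenant ID in the key is an assumption intended to keep one tenant's cached results from answering another tenant's identical query.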
3. Agent Feedback
When an agent rates a suggestion or source:
- The widget emits `rag-answer-feedback` (thumbs up/down with optional reason code and comment) or `rag-source-feedback` (per-source rating)
- The connector service persists feedback to AlloyDB
- Source feedback updates `kb_documents.feedback_score` -- chunks with low scores are automatically suppressed from future results
- The widget receives a `feedback-received` acknowledgement
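This page says only that low-scoring chunks are suppressed and that feedback influences reranking; it does not give the formula. The sketch below assumes a smoothed up-vote ratio, a hypothetical suppression threshold (`SUPPRESS_BELOW`) that applies only once a chunk has `MIN_VOTES` ratings, and a simple multiplicative boost on vector similarity. All three are illustrative choices, not how `kb_documents.feedback_score` is actually computed.

```python
from dataclasses import dataclass
from typing import List

SUPPRESS_BELOW = 0.3  # hypothetical suppression threshold
MIN_VOTES = 5         # hypothetical: no suppression until a chunk has enough ratings


@dataclass
class Chunk:
    chunk_id: str
    similarity: float  # vector similarity from the search step
    up_votes: int = 0
    down_votes: int = 0


def feedback_score(up: int, down: int, prior: float = 0.5, weight: int = 2) -> float:
    """Smoothed share of positive ratings; equals `prior` when there are no votes."""
    return (up + prior * weight) / (up + down + weight)


def filter_and_rerank(chunks: List[Chunk]) -> List[Chunk]:
    """Drop chunks with enough negative feedback, then boost or demote the rest."""
    scored = []
    for chunk in chunks:
        score = feedback_score(chunk.up_votes, chunk.down_votes)
        if chunk.up_votes + chunk.down_votes >= MIN_VOTES and score < SUPPRESS_BELOW:
            continue  # suppressed: never surfaces to the agent
        # Multiplier lies between 0.5 (all negative) and 1.5 (all positive).
        scored.append((chunk.similarity * (0.5 + score), chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored]
```

Requiring a minimum vote count keeps a single early thumbs-down from hiding an otherwise useful chunk.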
4. Coaching
When coaching is enabled for the tenant:
- On each customer utterance, the coaching engine evaluates the utterance against active playbooks (async, parallel to RAG)
- Deterministic mode: Matches playbook conditions and emits step-by-step guidance
- Generative mode: Uses LLM + KB context to generate situation-aware coaching
- Hybrid mode: Tries playbook match first, falls back to generative
- The widget receives `coach-suggestion` and `coach-step-update` events
- Agents provide coaching feedback, which is tracked for playbook effectiveness
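Hybrid mode's fallback order (playbook match first, generative second) can be sketched as below. The playbook condition format (lowercase trigger phrases matched as substrings) and the injected `generate` callable standing in for the LLM + KB call are assumptions; the real playbook condition format is not described on this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Playbook:
    name: str
    trigger_phrases: List[str]  # hypothetical condition format, lowercase
    steps: List[str]            # ordered guidance shown step by step


def match_playbook(utterance: str, playbooks: List[Playbook]) -> Optional[Playbook]:
    """Deterministic mode: first playbook whose trigger phrase appears in the utterance."""
    lowered = utterance.lower()
    for playbook in playbooks:
        if any(phrase in lowered for phrase in playbook.trigger_phrases):
            return playbook
    return None


def coach_hybrid(
    utterance: str,
    playbooks: List[Playbook],
    generate: Callable[[str], str],
) -> dict:
    """Hybrid mode: try a playbook match first, fall back to generative guidance."""
    playbook = match_playbook(utterance, playbooks)
    if playbook is not None:
        return {"mode": "deterministic", "playbook": playbook.name, "step": playbook.steps[0]}
    return {"mode": "generative", "text": generate(utterance)}
```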
5. Summary & Analysis
On-demand or auto-refreshing:
- The widget or portal requests a summary via `POST /conversations/{id}/summary`
- The connector service builds a transcript from stored messages and calls an LLM to generate a summary with situation, action, and next-steps sections
- Analysis (`POST /conversations/{id}/analysis`) generates customer sentiment, conversation quality scores, key topics, a talk-time breakdown, and a compliance risk assessment
- Results are stored in `assist_conversation_analyses` and returned to the widget
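Of the analysis outputs, the talk-time breakdown is the one that can be computed without an LLM. A minimal sketch, assuming each stored utterance carries a speaker label and start/end offsets in seconds (this page does not describe the message schema): sum speaking time per speaker and convert it to percentages of total speaking time. Silence is excluded, and overlapping speech counts toward both speakers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    speaker: str    # e.g. "agent" or "customer"
    start_s: float  # offset from conversation start, in seconds
    end_s: float


def talk_time_breakdown(utterances: List[Utterance]) -> Dict[str, float]:
    """Percentage of total speaking time per speaker, rounded to one decimal."""
    totals: Dict[str, float] = defaultdict(float)
    for u in utterances:
        totals[u.speaker] += max(0.0, u.end_s - u.start_s)  # ignore negative spans
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {speaker: round(100 * seconds / overall, 1) for speaker, seconds in totals.items()}
```

For example, an agent speaking 0-30 s and 55-65 s with the customer speaking 30-50 s gives `{"agent": 66.7, "customer": 33.3}`; the 5-second gap is not counted.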
6. Conversation End
When the conversation closes:
- A `leave-conversation` event is emitted (or a disconnect is detected)
- The connector service marks the conversation as completed in AlloyDB
- If applicable, the connector service triggers an export to CCAI Insights for reporting
- Session state is cleaned up from Redis
Audio Streaming
For voice conversations, the assist-audiohook-service provides real-time transcription:
| Stage | Description |
|---|---|
| Audio capture | The CCaaS platform streams audio via WebSocket (typically 16 kHz PCM) |
| Transcription | The audiohook service uses a speech-to-text engine to produce interim and final transcripts |
| PII redaction | Transcripts are passed through the PII redactor before storage |
| Forwarding | Final transcripts are published to Pub/Sub and received by the connector service |
| Processing | The connector service runs the same RAG pipeline used for chat messages |
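The PII redaction and forwarding stages can be sketched as one function that drops interim results and redacts final ones. The Key Files list places PII redaction in the connector's `pubsub_subscriber.py`, so redaction may run after Pub/Sub delivery rather than inside the audiohook service; this sketch does not depend on where it runs. The three regex rules are hypothetical examples, not the product's configurable redaction rules, and regexes alone miss many kinds of PII.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical example rules: (pattern, placeholder). Real rules are configurable.
REDACTION_RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),           # US SSN format
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[CARD]"),       # 13-16 digit card numbers
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),  # email addresses
]


@dataclass
class TranscriptResult:
    text: str
    is_final: bool  # STT engines emit interim hypotheses before a final result


def redact(text: str) -> str:
    """Apply each rule in order, replacing matches with a placeholder."""
    for pattern, placeholder in REDACTION_RULES:
        text = pattern.sub(placeholder, text)
    return text


def prepare_final_transcript(result: TranscriptResult) -> Optional[str]:
    """Return redacted text for final results; interim results are not forwarded."""
    if not result.is_final:
        return None
    return redact(result.text)
```

The card pattern starts and ends on a digit so that a match never swallows the space or punctuation that follows the number.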
TIP
The audiohook service supports the Genesys AudioHook protocol and generic WebSocket audio streams. Configure the audio format and encoding in the integration settings.
Data Storage
| Data | Storage | Retention |
|---|---|---|
| Conversation sessions & messages | AlloyDB | Configurable per tenant (default 90 days) |
| Suggestions, feedback, and analyses | AlloyDB | Same as conversation retention |
| Document embeddings | AlloyDB with pgvector | Persisted until document is deleted |
| Real-time session state | Redis | Duration of conversation + 1 hour TTL |
| Classifier cache | Redis | 10-minute TTL |
| RAG result cache | Redis | 5-minute TTL |
| Coaching step state | Redis | Duration of conversation |
| Widget translations | Redis | 24-hour TTL |
| Audio streams | Not persisted | Processed in real-time, discarded after transcription |
| LLM token usage | AlloyDB (ai_usage_records) | Indefinite (billing records) |
Security
- All service-to-service communication uses internal networking (no public endpoints for backend services)
- The widget authenticates via provider-specific OAuth (Genesys PKCE, Salesforce token exchange, etc.) followed by a tenant-scoped JWT
- Embed tokens are signed JWTs with allowed origin restrictions, validated on every bootstrap request
- Pub/Sub subscriptions use GCP IAM service accounts
- PII in conversation transcripts is redacted before storage using configurable redaction rules
- Tenant isolation is enforced on every database query via `tenant_id` filtering
- Feedback-based chunk suppression prevents low-quality sources from surfacing
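A sketch of the embed-token check, assuming tokens are HS256-signed JWTs carrying a hypothetical `allowed_origins` claim and an optional `exp`; this page does not specify the signing algorithm or claim names. It uses only the standard library so it runs anywhere; a production service would normally rely on a maintained library such as PyJWT (`jwt.decode(token, key, algorithms=["HS256"])`) rather than hand-rolled verification.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_embed_token(claims: dict, secret: bytes) -> str:
    """Issue a compact HS256 JWT (used here to produce test tokens)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def validate_embed_token(token: str, origin: str, secret: bytes,
                         now: Optional[float] = None) -> dict:
    """Verify signature, expiry, and origin; return the claims or raise PermissionError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise PermissionError("malformed token") from None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")  # never accept "none"
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if "exp" in claims and now >= claims["exp"]:
        raise PermissionError("token expired")
    if origin not in claims.get("allowed_origins", []):
        raise PermissionError("origin not allowed")
    return claims
```

The origin comparison is an exact string match against the browser-supplied origin, and the signature check uses `hmac.compare_digest` to avoid timing leaks.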
Key Files
| File | Description |
|---|---|
| backend/services/assist-connector-service/app/socket_handlers.py | Socket.IO event registration |
| backend/services/assist-connector-service/app/handlers/rag_trigger.py | RAG pipeline: classify → retrieve → stream → deliver |
| backend/services/assist-connector-service/app/handlers/coach_engine.py | Coaching orchestration (deterministic, generative, hybrid) |
| backend/services/assist-connector-service/app/handlers/session.py | Session config resolution (queue, deployment key, provider) |
| backend/services/assist-connector-service/app/handlers/feedback.py | Answer and source feedback persistence |
| backend/services/assist-connector-service/app/services/session_manager.py | Redis-backed session and transcript history |
| backend/services/assist-connector-service/app/services/pubsub_subscriber.py | Pub/Sub event routing and PII redaction |
| backend/services/rag-service/app/routes/assist.py | /retrieve and /stream endpoints |
| backend/services/rag-service/app/services/assist_classifier.py | LLM-based utterance classifier |
| frontend/agent-assist-widget/src/views/WidgetView.vue | Main widget layout (transcript, suggestions, summary, analysis, coaching) |
| frontend/agent-assist-widget/src/services/socket.ts | Socket.IO client event handling |
| frontend/agent-assist-portal/src/views/DashboardView.vue | Admin analytics dashboard |
Next Steps
- Quick Start -- Set up Agent Assist end-to-end
- Administration -- Configure tenants, knowledge bases, and integrations
- Agent Guide -- How agents interact with the widget panels
- Supervisor Guide -- Monitor and optimize assist performance
