Architecture

Agent Assist is composed of several services that work together to deliver real-time AI suggestions, coaching, summaries, and analysis to contact center agents. This page explains what each component does, how they communicate, and how the two operating modes differ.

System Overview

High-level architecture of Agent Assist

┌─────────────────┐     Pub/Sub      ┌──────────────────────────┐       HTTP        ┌──────────────────┐
│  CCaaS Platform  │ ──────────────> │  assist-connector-service │ <──────────────── │  agent-assist-    │
│  (Genesys, Five9,│                  │  (Socket.IO hub)          │                   │  portal (admin)   │
│   8x8, SF, etc.) │                  └────────────┬─────────────┘                   └──────────────────┘
└─────────────────┘                                │
                                         Socket.IO │
                                                   │
                        ┌──────────────────────────┼──────────────────────┐
                        │                          │                      │
                        ▼                          ▼                      ▼
               ┌─────────────────┐     ┌───────────────────┐   ┌──────────────────┐
               │   rag-service    │     │ agent-assist-widget│   │  AlloyDB + Redis  │
               │ (classifier +    │     │ (Vue 3 frontend)   │   │  (persistence +   │
               │  search + LLM)   │     └───────────────────┘   │   session/cache)   │
               └─────────────────┘                              └──────────────────┘
                                                                         ▲
               ┌─────────────────┐     ┌───────────────────┐             │
               │ assist-audiohook │     │ assist-middleware  │ ───────────┘
               │ (voice STT)      │     │ (CCaaS gateway)    │
               └─────────────────┘     └───────────────────┘

Services

Service	Role	Technology
assist-connector-service	Central hub. Receives CCaaS events via Pub/Sub, runs the RAG pipeline in `omni` mode, manages sessions, orchestrates coaching, and pushes suggestions to the widget over Socket.IO.	FastAPI + python-socketio
rag-service	Handles utterance classification, document embedding, vector search (pgvector), and LLM answer streaming. Called by the connector service for each qualifying message.	FastAPI + pgvector
assist-audiohook-service	Streams real-time audio from voice calls, transcribes speech via STT, and forwards transcripts to the connector service via Pub/Sub for processing.	FastAPI + WebSocket
assist-middleware-service	Genesys Cloud integration gateway. Manages CCaaS connections, queue-to-KB mappings, and conversation lifecycle bridging.	FastAPI
agent-assist-widget	Vue 3 frontend embedded in the agent desktop (Genesys, Salesforce, 8x8, Five9, Google CCaaS, or standalone). Connects to the connector service over Socket.IO. Renders suggestions, transcript, summary, analysis, and coaching panels.	Vue 3 + Vite
agent-assist-portal	Vue 3 admin dashboard for configuration, analytics, coaching playbook management, knowledge base administration, and service health monitoring.	Vue 3 + PrimeVue + Vite
Dialogflow CCAI	Google-managed suggestion engine used in `google_native` mode. Publishes suggestion events to Pub/Sub, which the connector service relays to the widget.	Google Cloud

Operating Modes

Agent Assist supports two modes, configured per tenant via the assist_mode setting.

Omni Mode (`omni`)

In omni mode, the connector service runs the full RAG pipeline locally:

Classify -- A lightweight LLM classifier determines whether a knowledge base lookup is warranted, extracts a clean query, and optionally generates quick replies for noise messages.
Search -- The RAG service performs a vector similarity search against the tenant's knowledge bases, returning the top-k relevant document chunks with feedback-based reranking.
Stream -- An LLM generates a suggestion grounded in the retrieved chunks. The answer streams chunk-by-chunk over Socket.IO so the agent sees it progressively.
Deliver -- Source citations, follow-up questions, and quick replies are delivered alongside the answer.
Coach -- If coaching is enabled, the coaching engine evaluates the utterance against active playbooks or generates AI-based guidance.

Customer message
  → Pub/Sub event (or audiohook transcript)
    → assist-connector-service
      → Check classifier cache (Redis)
        → rag-service /retrieve (classify + vector search)
          → rag-service /stream (LLM answer via SSE)
            → Socket.IO → widget (streaming chunks)
      → Coaching engine (async, parallel)
        → Socket.IO → widget (coaching suggestion)
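
The flow above can be sketched in Python roughly as follows. This is an illustration only, not the connector service's actual code: `run_omni_pipeline`, the `Classification` type, the payload field names, and the stub stages are all hypothetical, and the injected callables stand in for the classifier, the rag-service calls, and the Socket.IO emitter. The event names are the ones listed in the Event Flow section. Coaching, which runs in parallel, is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Classification:
    needs_lookup: bool        # False means the utterance was classified as noise
    query: str                # cleaned-up query extracted from the utterance
    quick_replies: list[str]  # canned replies offered for noise messages


def run_omni_pipeline(
    utterance: str,
    classify: Callable[[str], Classification],
    search: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], Iterable[str]],
    emit: Callable[[str, dict], None],
) -> None:
    """Classify, search, stream, and deliver using injected (hypothetical) stages."""
    result = classify(utterance)
    if not result.needs_lookup:
        emit("rag-quickreplies-event", {"quick_replies": result.quick_replies})
        return

    sources = search(result.query)
    emit("rag-content-event", {"sources": sources})

    parts = []
    for piece in generate(result.query, sources):
        parts.append(piece)
        emit("rag-content-event", {"chunk": piece})  # one event per streamed chunk

    emit("rag-complete-event", {"answer": "".join(parts), "sources": sources})


# Stub stages so the sketch runs without any external service.
def fake_classify(text: str) -> Classification:
    is_noise = text.strip().lower() in {"ok", "thanks", "hi"}
    return Classification(not is_noise, text.strip(), ["You're welcome!"] if is_noise else [])


def fake_search(query: str) -> list[str]:
    return [f"chunk about: {query}"]


def fake_generate(query: str, sources: list[str]) -> Iterable[str]:
    yield "Suggested reply: "
    yield f"see {len(sources)} source(s)."


events: list[tuple[str, dict]] = []
for text in ("How do I reset my password?", "thanks"):
    run_omni_pipeline(text, fake_classify, fake_search, fake_generate,
                      lambda name, payload: events.append((name, payload)))
```

Passing the stages in as callables keeps the control flow testable without a running rag-service or Socket.IO server. The noise branch returns early, so no search or LLM call is made for messages such as "thanks".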

When to use omni mode

Use omni mode when you want full control over the knowledge sources, embedding models, and LLM used for answer synthesis. This is the default and recommended mode for most deployments.

Google Native Mode (`google_native`)

In google_native mode, Dialogflow CCAI handles suggestion generation. The connector service acts as a pass-through:

Dialogflow processes the conversation transcript and generates suggestions using its own knowledge connectors.
Pub/Sub delivers suggestion events to the connector service.
Connector relays the suggestions to the widget over Socket.IO without modification.

Customer message
  → Dialogflow CCAI (processes + generates suggestion)
    → Pub/Sub event
      → assist-connector-service (relay)
        → Socket.IO → widget

Limitations of google_native mode

In google_native mode, you cannot use OmniBots knowledge bases or customize the RAG pipeline. All suggestion logic is managed by Google Dialogflow CCAI. Feedback analytics are still collected but do not influence suggestion quality directly.
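
As a rough sketch of how the two modes could diverge inside the connector service, the snippet below routes an event based on the tenant's `assist_mode`. The `AssistMode` enum, `handle_event`, and the shape of `tenant_settings` are assumptions made for illustration. Falling back to `omni` when the setting is absent reflects the statement that omni is the default, but the real fallback behaviour is not documented here.

```python
from enum import Enum
from typing import Callable


class AssistMode(str, Enum):
    OMNI = "omni"
    GOOGLE_NATIVE = "google_native"


def handle_event(
    tenant_settings: dict,
    event: dict,
    run_rag: Callable[[dict], None],
    relay: Callable[[dict], None],
) -> AssistMode:
    """Route an incoming Pub/Sub event according to the tenant's assist_mode."""
    # Assumption: a missing setting falls back to omni.
    mode = AssistMode(tenant_settings.get("assist_mode", "omni"))
    if mode is AssistMode.OMNI:
        run_rag(event)   # full classify/search/stream pipeline
    else:
        relay(event)     # pass the Dialogflow suggestion through unchanged
    return mode


handled = {"rag": [], "relay": []}
mode_a = handle_event({"assist_mode": "google_native"}, {"id": 1},
                      handled["rag"].append, handled["relay"].append)
mode_b = handle_event({}, {"id": 2},
                      handled["rag"].append, handled["relay"].append)
```

An unknown mode string raises `ValueError` from the enum constructor, which surfaces misconfiguration early instead of silently picking a mode.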

Event Flow

All communication between the CCaaS platform and the agent widget passes through the connector service. Here is the detailed event flow for a typical interaction.

1. Connection & Session

When an agent opens the widget:

The widget authenticates via embed token + provider OAuth (e.g., Genesys PKCE, Salesforce token exchange)
The connector service validates credentials and issues a session JWT
The widget connects via Socket.IO and emits join-conversation with the conversation name
The connector service resolves session config (KB IDs, system prompt, LLM integration) from queue mappings or deployment key
The widget receives conversation-joined and session-updated events
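
The session config resolution step might look something like the sketch below. `SessionConfig`, `resolve_session_config`, and the two lookup dictionaries are hypothetical names. The precedence shown (queue mapping first, then deployment key) is an assumption, since the text only says the config comes from one or the other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionConfig:
    kb_ids: tuple[str, ...]
    system_prompt: str
    llm_integration: str


def resolve_session_config(
    queue_id: Optional[str],
    deployment_key: Optional[str],
    queue_mappings: dict[str, SessionConfig],
    deployments: dict[str, SessionConfig],
) -> Optional[SessionConfig]:
    """Pick the config for a conversation, or None if nothing matches."""
    # Assumed precedence: a queue mapping wins over the deployment key.
    if queue_id is not None and queue_id in queue_mappings:
        return queue_mappings[queue_id]
    if deployment_key is not None and deployment_key in deployments:
        return deployments[deployment_key]
    return None


billing = SessionConfig(("kb-billing",), "You assist billing agents.", "llm-default")
general = SessionConfig(("kb-general",), "You assist support agents.", "llm-default")

from_queue = resolve_session_config("queue-1", "dep-1", {"queue-1": billing}, {"dep-1": general})
from_key = resolve_session_config("queue-9", "dep-1", {"queue-1": billing}, {"dep-1": general})
unresolved = resolve_session_config(None, None, {"queue-1": billing}, {"dep-1": general})
```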

2. Customer Message (Omni Mode)

When the customer sends a message or the audiohook service transcribes speech:

A NEW_RECOGNITION_RESULT or NEW_MESSAGE event arrives via Pub/Sub
The connector service checks the classifier cache (Redis, 10-min TTL)
On cache miss: calls rag-service /retrieve which runs classification + vector search in parallel
If classified as noise: emits rag-quickreplies-event with quick replies and returns
If meaningful: emits rag-content-event with sources, then streams answer chunks as rag-content-event messages
On stream complete: emits rag-complete-event with the final answer, sources, and timing metrics
Follow-up questions are emitted as rag-followup-event
Results are cached in Redis (5-min TTL) for identical queries
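
The two cache layers in the steps above can be illustrated with a small in-memory stand-in for Redis. The TTL values come from the text. `TTLCache`, `cache_key`, the key layout, and the whitespace/case normalization are assumptions for the sake of a runnable example, and the real service may define "identical queries" differently.

```python
import hashlib
import time
from typing import Any, Callable, Optional

CLASSIFIER_TTL_SECONDS = 10 * 60  # classifier cache TTL, per the text
RESULT_TTL_SECONDS = 5 * 60       # RAG result cache TTL, per the text


class TTLCache:
    """Tiny in-memory stand-in for a Redis set-with-expiry and get."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily evict expired entries
            return None
        return value


def cache_key(prefix: str, tenant_id: str, text: str) -> str:
    """Build a tenant-scoped key from a normalized query (assumed scheme)."""
    normalized = " ".join(text.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{prefix}:{tenant_id}:{digest}"


# A controllable clock makes expiry easy to demonstrate.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
key = cache_key("rag-result", "tenant-a", "How do I  reset my password?")
cache.set(key, {"answer": "Use the reset link."}, RESULT_TTL_SECONDS)

now[0] = 299.0
hit = cache.get(key)    # still inside the 5-minute window
now[0] = 300.0
miss = cache.get(key)   # expired
```

Including the tenant ID in the key keeps cached answers from leaking across tenants, which matches the isolation goal described under Security.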

3. Agent Feedback

When an agent rates a suggestion or source:

The widget emits rag-answer-feedback (thumbs up/down with optional reason code and comment) or rag-source-feedback (per-source rating)
The connector service persists feedback to AlloyDB
Source feedback updates kb_documents.feedback_score -- chunks with low scores are automatically suppressed from future results
The widget receives a feedback-received acknowledgement
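
To make the suppression idea concrete, here is a minimal sketch under stated assumptions: the score is treated as a running sum of +1/-1 votes, and `SUPPRESSION_THRESHOLD` is an invented cutoff. Neither the scoring formula nor the threshold is documented on this page, and `Chunk`, `apply_source_feedback`, and `visible_chunks` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed cutoff; the real value is not documented here.
SUPPRESSION_THRESHOLD = -2


@dataclass
class Chunk:
    doc_id: str
    text: str
    feedback_score: int = 0


def apply_source_feedback(chunk: Chunk, thumbs_up: bool) -> None:
    """Assumed scoring rule: +1 for a thumbs up, -1 for a thumbs down."""
    chunk.feedback_score += 1 if thumbs_up else -1


def visible_chunks(chunks: list[Chunk]) -> list[Chunk]:
    """Drop chunks at or below the threshold, then rank the rest by score."""
    kept = [c for c in chunks if c.feedback_score > SUPPRESSION_THRESHOLD]
    return sorted(kept, key=lambda c: c.feedback_score, reverse=True)


a = Chunk("doc-a", "Outdated refund policy")
b = Chunk("doc-b", "Current refund policy")
c = Chunk("doc-c", "Shipping times")

apply_source_feedback(a, thumbs_up=False)
apply_source_feedback(a, thumbs_up=False)
apply_source_feedback(b, thumbs_up=True)

ranked = visible_chunks([a, b, c])
```

A production reranker would presumably combine the feedback score with vector similarity rather than sorting on feedback alone.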

4. Coaching

When coaching is enabled for the tenant:

On each customer utterance, the coaching engine evaluates the utterance against active playbooks (asynchronously, in parallel with the RAG pipeline)
Deterministic mode: Matches playbook conditions and emits step-by-step guidance
Generative mode: Uses LLM + KB context to generate situation-aware coaching
Hybrid mode: Tries playbook match first, falls back to generative
The widget receives coach-suggestion and coach-step-update events
Agents can provide coaching feedback, which is tracked to measure playbook effectiveness
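
The difference between the three coaching modes can be sketched as below. Keyword matching is a deliberately simple stand-in for whatever playbook conditions the real engine evaluates, and `match_playbook`, `coach`, the playbook dictionary shape, and the `generate` callback are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical playbook shape: {"name": str, "keywords": [str], "steps": [str]}
Playbook = dict


def match_playbook(utterance: str, playbooks: list[Playbook]) -> Optional[Playbook]:
    """Return the first playbook with a keyword that appears in the utterance."""
    lowered = utterance.lower()
    for playbook in playbooks:
        if any(keyword in lowered for keyword in playbook["keywords"]):
            return playbook
    return None


def coach(
    utterance: str,
    playbooks: list[Playbook],
    mode: str,
    generate: Callable[[str], str],
) -> Optional[dict]:
    """mode is one of "deterministic", "generative", or "hybrid"."""
    if mode in ("deterministic", "hybrid"):
        matched = match_playbook(utterance, playbooks)
        if matched is not None:
            return {"source": "playbook", "name": matched["name"], "steps": matched["steps"]}
    if mode in ("generative", "hybrid"):
        # In hybrid mode this is only reached when no playbook matched.
        return {"source": "generative", "steps": [generate(utterance)]}
    return None


playbooks = [{"name": "cancellation", "keywords": ["cancel"],
              "steps": ["Acknowledge", "Offer retention plan"]}]
llm_stub = lambda text: f"Consider clarifying: {text}"

hit = coach("I want to cancel my plan", playbooks, "hybrid", llm_stub)
fallback = coach("My invoice looks wrong", playbooks, "hybrid", llm_stub)
no_match = coach("My invoice looks wrong", playbooks, "deterministic", llm_stub)
```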

5. Summary & Analysis

Summaries and analyses are generated on demand or refreshed automatically:

The widget or portal requests a summary via POST /conversations/{id}/summary
The connector service builds a transcript from stored messages and calls an LLM to generate situation/action/next-steps
Analysis (POST /conversations/{id}/analysis) generates customer sentiment, conversation quality scores, key topics, talk time breakdown, and compliance risk assessment
Results are stored in assist_conversation_analyses and returned to the widget
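
The transcript-building step could be approximated as follows. The `Message` shape, the speaker labels, and the prompt wording are all assumptions. Only the three summary sections (situation, action, next steps) come from the text, and the LLM call itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str       # assumed values: "customer" or "agent"
    text: str
    sent_at: float  # assumed: seconds since conversation start

SUMMARY_SECTIONS = ("situation", "action", "next_steps")


def build_transcript(messages: list[Message]) -> str:
    """Order messages by time and render one 'Speaker: text' line per message."""
    ordered = sorted(messages, key=lambda m: m.sent_at)
    lines = [f"{m.role.capitalize()}: {m.text.strip()}" for m in ordered if m.text.strip()]
    return "\n".join(lines)


def build_summary_prompt(transcript: str) -> str:
    """Assemble an illustrative prompt; the real prompt is not documented here."""
    wanted = ", ".join(SUMMARY_SECTIONS)
    return f"Summarize the conversation below as JSON with the keys {wanted}.\n\n{transcript}"


messages = [
    Message("agent", "Let me check that for you.", 12.0),
    Message("customer", "My card was declined. ", 3.5),
    Message("customer", "   ", 20.0),  # empty after stripping, so skipped
]
transcript = build_transcript(messages)
prompt = build_summary_prompt(transcript)
```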

6. Conversation End

When the conversation closes:

A leave-conversation event is emitted (or a disconnect is detected)
The connector service marks the conversation as completed in AlloyDB
If applicable, the connector service triggers an export to CCAI Insights for reporting
Session state is cleaned up from Redis

Audio Streaming

For voice conversations, the assist-audiohook-service provides real-time transcription:

Stage	Description
Audio capture	The CCaaS platform streams audio via WebSocket (typically 16 kHz PCM)
Transcription	The audiohook service uses a speech-to-text engine to produce interim and final transcripts
PII redaction	Transcripts are passed through the PII redactor before storage
Forwarding	Final transcripts are published to Pub/Sub and received by the connector service
Processing	The connector service runs the same RAG pipeline used for chat messages

TIP

The audiohook service supports the Genesys AudioHook protocol and generic WebSocket audio streams. Configure the audio format and encoding in the integration settings.
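
For the PII redaction stage in the table above, a rule-based redactor might be sketched like this. The patterns and labels in `REDACTION_RULES` are illustrative assumptions, not the product's configured rules, and they are intentionally simplistic. Real deployments need locale-aware patterns and validation.

```python
import re

# Illustrative rules only; order matters (card numbers before phone numbers).
REDACTION_RULES = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    ("PHONE", re.compile(r"\+?\d[\d ()-]{7,}\d")),
]


def redact(text: str, rules=REDACTION_RULES) -> str:
    """Replace every match of each rule with a bracketed label."""
    for label, pattern in rules:
        text = pattern.sub(f"[{label}]", text)
    return text


raw = "Email jane.doe@example.com or call +1 415 555 0100 about card 4111 1111 1111 1111."
clean = redact(raw)
```

Running the card rule before the phone rule prevents a long card number from being partially consumed as a phone number.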

Data Storage

Data	Storage	Retention
Conversation sessions & messages	AlloyDB	Configurable per tenant (default 90 days)
Suggestions, feedback, and analyses	AlloyDB	Same as conversation retention
Document embeddings	AlloyDB with pgvector	Persisted until document is deleted
Real-time session state	Redis	Duration of conversation + 1 hour TTL
Classifier cache	Redis	10-minute TTL
RAG result cache	Redis	5-minute TTL
Coaching step state	Redis	Duration of conversation
Widget translations	Redis	24-hour TTL
Audio streams	Not persisted	Processed in real-time, discarded after transcription
LLM token usage	AlloyDB (`ai_usage_records`)	Indefinite (billing records)

Security

All service-to-service communication uses internal networking (no public endpoints for backend services)
The widget authenticates via provider-specific OAuth (Genesys PKCE, Salesforce token exchange, etc.) followed by a tenant-scoped JWT
Embed tokens are signed JWTs with allowed origin restrictions, validated on every bootstrap request
Pub/Sub subscriptions use GCP IAM service accounts
PII in conversation transcripts is redacted before storage using configurable redaction rules
Tenant isolation is enforced on every database query via tenant_id filtering
Feedback-based chunk suppression prevents low-quality sources from surfacing
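
One way to make tenant filtering hard to forget is to route every read through a helper that always adds the `tenant_id` predicate. The sketch below uses SQLite from the Python standard library purely as a stand-in for AlloyDB. The `messages` table, its columns, and `fetch_for_tenant` are hypothetical.

```python
import sqlite3


def fetch_for_tenant(conn, tenant_id, where_sql="", params=()):
    """Run a read that is always scoped to one tenant.

    where_sql must be a trusted, parameterized fragment written by the
    developer; user input belongs in params, never in the SQL string.
    """
    sql = "SELECT id, tenant_id, body FROM messages WHERE tenant_id = ?"
    if where_sql:
        sql += f" AND ({where_sql})"
    sql += " ORDER BY id"
    return conn.execute(sql, (tenant_id, *params)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, body TEXT)"
)
conn.executemany(
    "INSERT INTO messages (tenant_id, body) VALUES (?, ?)",
    [("tenant-a", "hello"), ("tenant-b", "hi"), ("tenant-a", "refund please")],
)

rows_a = fetch_for_tenant(conn, "tenant-a")
rows_refund = fetch_for_tenant(conn, "tenant-a", "body LIKE ?", ("%refund%",))
rows_none = fetch_for_tenant(conn, "tenant-c")
```

Wrapping the extra condition in parentheses keeps an `OR` inside `where_sql` from widening the query beyond the tenant filter.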

Key Files

File	Description
`backend/services/assist-connector-service/app/socket_handlers.py`	Socket.IO event registration
`backend/services/assist-connector-service/app/handlers/rag_trigger.py`	RAG pipeline: classify → retrieve → stream → deliver
`backend/services/assist-connector-service/app/handlers/coach_engine.py`	Coaching orchestration (deterministic, generative, hybrid)
`backend/services/assist-connector-service/app/handlers/session.py`	Session config resolution (queue, deployment key, provider)
`backend/services/assist-connector-service/app/handlers/feedback.py`	Answer and source feedback persistence
`backend/services/assist-connector-service/app/services/session_manager.py`	Redis-backed session and transcript history
`backend/services/assist-connector-service/app/services/pubsub_subscriber.py`	Pub/Sub event routing and PII redaction
`backend/services/rag-service/app/routes/assist.py`	`/retrieve` and `/stream` endpoints
`backend/services/rag-service/app/services/assist_classifier.py`	LLM-based utterance classifier
`frontend/agent-assist-widget/src/views/WidgetView.vue`	Main widget layout (transcript, suggestions, summary, analysis, coaching)
`frontend/agent-assist-widget/src/services/socket.ts`	Socket.IO client event handling
`frontend/agent-assist-portal/src/views/DashboardView.vue`	Admin analytics dashboard

Next Steps

Quick Start -- Set up Agent Assist end-to-end
Administration -- Configure tenants, knowledge bases, and integrations
Agent Guide -- How agents interact with the widget panels
Supervisor Guide -- Monitor and optimize assist performance
