# Core Architecture Deep Dive

> **Audience:** Developers who need to understand how all DeVILSona components fit together before modifying code.

This page provides a full-stack breakdown of the DeVILSona ecosystem—from the VR headset hardware to the cloud database—using architecture diagrams and detailed component descriptions.
## System Component Diagram

```mermaid
graph TB
    subgraph "Meta Quest Headset (Android/UE5)"
        MIC[Microphone Input]
        AIS[AudioInputSubsystem<br/>Capture + Downsample + Encode]
        ORCH[AIConversationOrchestratorSubsystem<br/>Data Routing]
        WSS[WebSocketSubsystem<br/>Raw WebSocket I/O]
        OAIS[OpenAiApiSubsystem<br/>Protocol & Parsing]
        TOOLS[AIToolInterpreterSubsystem<br/>Function Call Execution]
        ACTOR[IntervieweeActor<br/>Character State & Lip Sync]
        METAHUMAN[MetaHuman Character<br/>Facial Animation]
        SAVESYS[BP_GameInstance<br/>Local Save System]
        AWSSAVE[SaveToAWS<br/>Remote Save System]
    end
    subgraph "OpenAI Cloud"
        OAI_RT[OpenAI Realtime API<br/>GPT-4 Audio Model]
    end
    subgraph "AWS Cloud (us-east-2)"
        APIGW[API Gateway<br/>FSE100-Session-API]
        LAMBDA_SAVE[Lambda: FSE100_SaveSession]
        LAMBDA_LOGIN[Lambda: FSE100_Login]
        DYNAMO[DynamoDB<br/>StudentSessions Table]
    end
    subgraph "Instructor Machine (Windows)"
        DS[DeVILStarter<br/>Wails + React]
        TF[Terraform<br/>Infrastructure as Code]
    end
    MIC --> AIS
    AIS -->|base64 PCM16 chunks| ORCH
    ORCH -->|SendAudioInputToAI| OAIS
    OAIS -->|JSON WebSocket messages| WSS
    WSS <-->|WSS: wss://api.openai.com| OAI_RT
    OAI_RT -->|Audio deltas, transcripts, function calls| OAIS
    OAIS -->|OnResponseAudioDeltaReceived| ACTOR
    OAIS -->|OnFunctionCallReceived| TOOLS
    TOOLS -->|OnSetEmotion| ACTOR
    ACTOR -->|PCM buffer + viseme weights| METAHUMAN
    SAVESYS <--> AWSSAVE
    AWSSAVE <-->|HTTPS REST| APIGW
    APIGW --> LAMBDA_SAVE
    APIGW --> LAMBDA_LOGIN
    LAMBDA_SAVE <--> DYNAMO
    LAMBDA_LOGIN <--> DYNAMO
    DS --> TF
    TF -->|Provisions/Destroys| APIGW
    TF -->|Provisions/Destroys| LAMBDA_SAVE
    TF -->|Provisions/Destroys| LAMBDA_LOGIN
    TF -->|Provisions/Destroys| DYNAMO
```
## Component Roles & Responsibilities

### UE5 Client (on Meta Quest)

| Component | Type | Responsibility |
|---|---|---|
| `AudioInputSubsystem` | `UGameInstanceSubsystem` (C++) | Captures microphone audio, downsamples to 24 kHz PCM16, base64-encodes, broadcasts chunks |
| `WebSocketSubsystem` | `UGameInstanceSubsystem` (C++) | Manages the raw WebSocket connection to OpenAI—connect, send, receive, reconnect logic |
| `OpenAiApiSubsystem` | `UGameInstanceSubsystem` (C++) | Formats OpenAI Realtime API messages, parses responses, fires typed events (audio, transcript, function calls) |
| `AIToolInterpreterSubsystem` | `UGameInstanceSubsystem` (C++) | Listens for function call events, executes `set_emotion` and other game logic triggers |
| `AIConversationOrchestratorSubsystem` | `UGameInstanceSubsystem` (C++) | Connects audio input to the OpenAI API; the "glue layer" between capture and transmission |
| `IntervieweeActor` | `AActor` (C++) | The AI character actor; handles audio playback, lip sync via OVRLipSync, emotion state, subtitle display |
| `BP_GameInstance` | Blueprint `UGameInstance` | Session state management, local save/load via `SG_SaveData`, orchestrates the startup flow |
| `SaveToAWS` | Static Blueprint Function Library (C++) | Asynchronous HTTP calls to `POST /session` (save) and `POST /login` (retrieve) |
| MetaHuman Blueprint | Blueprint (generated) | The visual character; drives `ABP_Face_PostProcess` for facial animation |
> **Learn More:** For fine-grained technical documentation on every public class, method, and delegate signature, see the Auto-generated API Documentation.

> **Learn More:** For a hands-on guide to referencing, calling, and subscribing to these subsystems from your own C++ classes, see Subsystems. A quick orientation sketch follows below.
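As a minimal sketch of that pattern: `UMyFeature`, `BindToPipeline`, `HandleAudioChunk`, and the `TWeakObjectPtr<UOpenAiApiSubsystem> CachedApi` member are hypothetical names, and the binding assumes `OnAudioChunkCaptured` is a native multicast delegate (a dynamic multicast delegate would use `AddDynamic` instead).

```cpp
// Hypothetical calling code, not project source: resolve two of the
// GameInstance subsystems and subscribe to captured audio chunks.
void UMyFeature::BindToPipeline(UGameInstance* GameInstance)
{
    UAudioInputSubsystem* AudioInput =
        GameInstance->GetSubsystem<UAudioInputSubsystem>();
    CachedApi = GameInstance->GetSubsystem<UOpenAiApiSubsystem>();

    if (AudioInput && CachedApi.IsValid())
    {
        // Assumes a native multicast delegate; use AddDynamic if the
        // project declares it as a dynamic multicast delegate.
        AudioInput->OnAudioChunkCaptured.AddUObject(
            this, &UMyFeature::HandleAudioChunk);
    }
}

void UMyFeature::HandleAudioChunk(const FString& Base64Audio)
{
    // Mirror what AIConversationOrchestratorSubsystem does internally:
    // forward each captured base64 chunk straight to the API subsystem.
    if (CachedApi.IsValid())
    {
        CachedApi->SendAudioInputToAI(Base64Audio);
    }
}
```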
### AWS Backend

| Component | Technology | Responsibility |
|---|---|---|
| API Gateway | AWS API Gateway V2 (HTTP API) | Public REST endpoint; routes `/session` → `FSE100_SaveSession` and `/login` → `FSE100_Login` |
| `FSE100_SaveSession` | AWS Lambda (Node.js 22) | Upserts session records to DynamoDB |
| `FSE100_Login` | AWS Lambda (Node.js 22) | Queries DynamoDB for a student's session records |
| `StudentSessions` | AWS DynamoDB | NoSQL persistent store for all student session data |
### DeVILStarter (Desktop Launcher)
| Component | Technology | Responsibility |
|---|---|---|
| Go backend | Wails + Go | Executes Terraform CLI commands, streams log output, manages process lifecycle |
| React frontend | React + TypeScript + Vite + MUI | UI: Start/Stop buttons, log panel, status indicators |
| Terraform | HashiCorp Terraform + HCL | Provisions/destroys all AWS resources as code |
## Data Flow: The Complete Lifecycle of One AI Response

This section traces exactly what happens from the moment a student speaks to the moment the AI character's mouth stops moving after its reply.
### Phase 1: Audio Capture & Transmission (Student Speaks)

```mermaid
sequenceDiagram
    participant Student
    participant Mic as Microphone
    participant AIS as AudioInputSubsystem
    participant ORCH as AIConversationOrchestratorSubsystem
    participant OAI as OpenAiApiSubsystem
    participant WSS as WebSocketSubsystem
    participant Cloud as OpenAI Realtime API
    Student->>Mic: Speak
    Mic->>AIS: Raw audio buffer
    Note over AIS: Downsample to 24kHz mono, PCM16, base64
    AIS->>ORCH: OnAudioChunkCaptured(base64)
    ORCH->>OAI: SendAudioInputToAI(base64)
    OAI->>WSS: SendMessage(JSON)
    WSS->>Cloud: input_audio_buffer.append
```
- Student's voice -> microphone hardware.
- Microphone hardware -> `AudioInputSubsystem` (via the UE5 audio capture API).
- `AudioInputSubsystem`:
    - Receives the raw PCM audio buffer (any sample rate, any channel count).
    - Downsamples to 24,000 Hz.
    - Converts to a single channel (mono).
    - Casts to int16 (PCM16) format.
    - Base64-encodes the buffer.
    - Broadcasts via `OnAudioChunkCaptured(const FString& Base64Audio)`.
- `AIConversationOrchestratorSubsystem` receives the event.
- The orchestrator calls `OpenAiApiSubsystem::SendAudioInputToAI(Base64Audio)`.
- `OpenAiApiSubsystem` constructs the JSON message (shown below).
- `OpenAiApiSubsystem` calls `WebSocketSubsystem::SendMessage(JsonString)`.
- `WebSocketSubsystem` sends the message over the persistent WSS connection.
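The constructed message follows the Realtime API's `input_audio_buffer.append` shape (abbreviated; only the fields relevant here are shown):

```json
{
  "type": "input_audio_buffer.append",
  "audio": "<base64-encoded 24 kHz mono PCM16 chunk>"
}
```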
### Phase 2: AI Processing (OpenAI Cloud)

```mermaid
sequenceDiagram
    participant Cloud as OpenAI Realtime API
    participant Client as UE5 Client
    Cloud->>Cloud: Receive audio chunks
    Cloud->>Cloud: Detect speech end (VAD)
    Cloud->>Cloud: Speech-to-text, LLM response
    Cloud-->>Client: response.audio.delta
    Cloud-->>Client: response.audio_transcript.delta
    Cloud-->>Client: response.function_call_arguments.done
    Cloud-->>Client: response.done
```
- OpenAI receives streamed audio chunks.
- OpenAI detects speech end (server-side VAD).
- OpenAI processes speech-to-text, generates LLM response.
- OpenAI streams back the response in small incremental messages:
    - `response.audio.delta`: `{ "delta": "<base64 PCM16 audio>" }`
    - `response.audio_transcript.delta`: `{ "delta": "Hello, I was saying..." }`
    - `response.done` (when the response completes)
    - `response.function_call_arguments.done` (if the character triggers an emotion)
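Each of these arrives as a JSON event whose `type` field identifies it. For example, an audio delta event looks roughly like this (identifier fields such as `response_id` omitted):

```json
{
  "type": "response.audio.delta",
  "delta": "<base64 PCM16 audio>"
}
```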
### Phase 3: AI Response Reception & Rendering

```mermaid
sequenceDiagram
    participant WSS as WebSocketSubsystem
    participant OAI as OpenAiApiSubsystem
    participant IA as IntervieweeActor
    participant Lip as OVRLipSyncContext
    participant ABP as ABP_Face_PostProcess
    participant Tools as AIToolInterpreterSubsystem
    WSS->>OAI: OnMessageReceived(JSON)
    OAI->>IA: OnResponseAudioDeltaReceived(base64)
    OAI->>Tools: OnFunctionCallReceived(json)
    IA->>IA: Queue audio + buffer PCM
    IA->>Lip: ProcessFrame(10ms)
    Lip-->>IA: Viseme weights
    ABP->>IA: GetCurrentVisemes()
    Tools-->>IA: OnSetEmotion(emotion)
```
- `WebSocketSubsystem::OnMessageReceived` fires with each JSON message.
- `OpenAiApiSubsystem` parses the JSON:
    - Audio delta -> fires `OnResponseAudioDeltaReceived(const FString& Base64Audio)`.
    - Transcript delta -> fires `OnResponseTranscriptDeltaReceived(const FString& Text)`.
    - Function call -> fires `OnFunctionCallReceived(TSharedPtr<FJsonObject> FunctionCall)`.
- `IntervieweeActor::HandleAudioDeltaReceived`:
    - Decodes base64 -> raw bytes.
    - Queues audio to `AIResponseSoundWave` (`USoundWaveProcedural`).
    - Casts bytes to int16 PCM samples.
    - Appends to `PlaybackPCMBuffer` (for lip sync processing).
    - Computes RMS loudness.
    - Starts the lip sync timer (10 ms interval) if not already running.
- Lip sync timer -> `IntervieweeActor::ProcessLipSyncFrames` (every 10 ms; see the sketch after this list):
    - Extracts a 240-sample frame (10 ms at 24 kHz).
    - Passes it to `OVRLipSyncContext::ProcessFrame()`.
    - Stores the returned 15 viseme weights in `CurrentVisemes[]`.
- `ABP_Face_PostProcess` (Animation Blueprint) runs at ~72 fps:
    - Reads `CurrentVisemes[]` via `GetCurrentVisemes()`.
    - Applies per-viseme multipliers.
    - Smooths with `FInterpTo`.
    - Writes to the `LipSyncCurves` map.
    - A ModifyCurve node applies the curves to the `CTRL_expressions_*` control rig.
    - RigLogic evaluates the full facial deformation.
- `AIToolInterpreterSubsystem::HandleFunctionCall`:
    - Reads the `name` field from the function call JSON.
    - If `name == "set_emotion"`: fires `OnSetEmotion(emotion)`.
    - `IntervieweeActor` receives `OnSetEmotion` and switches animation state.
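A rough sketch of that timer body, under stated assumptions: `LipSyncReadPos` is a hypothetical read cursor, and the two `OVRLipSyncContext` calls are paraphrased rather than the plugin's exact API.

```cpp
// Simplified sketch of the 10 ms lip sync tick, not project source.
// 24,000 Hz * 0.010 s = 240 samples per frame.
static constexpr int32 SamplesPerFrame = 240;

void AIntervieweeActor::ProcessLipSyncFrames()
{
    // Streaming packets arrive faster than playback, so several complete
    // frames may already be buffered; consume all of them this tick.
    while (PlaybackPCMBuffer.Num() - LipSyncReadPos >= SamplesPerFrame)
    {
        const int16* Frame = PlaybackPCMBuffer.GetData() + LipSyncReadPos;
        LipSyncReadPos += SamplesPerFrame;

        // Paraphrased OVRLipSync calls: analyze one frame, then read back
        // the 15 viseme weights that drive ABP_Face_PostProcess.
        LipSyncContext->ProcessFrame(Frame, SamplesPerFrame);
        CurrentVisemes = LipSyncContext->GetVisemes();
    }
}
```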
### Phase 4: Session Save (During/After Session)

```mermaid
sequenceDiagram
    participant GI as BP_GameInstance
    participant Save as SaveToAWS
    participant APIGW as API Gateway
    participant Lambda as FSE100_SaveSession
    participant DB as DynamoDB
    GI->>Save: SendStudentSessionToAWS(...)
    Save->>APIGW: POST /session
    APIGW->>Lambda: Invoke
    Lambda->>DB: PutItem
    DB-->>Lambda: OK
    Lambda-->>APIGW: 200 OK
    APIGW-->>Save: 200 OK
```
- `BP_GameInstance` calls `USaveToAWS::SendStudentSessionToAWS(...)`.
- `SaveToAWS` constructs the JSON body with session fields (see the sketch below).
- HTTP request: `POST https://<id>.execute-api.us-east-2.amazonaws.com/session`.
- API Gateway invokes the `FSE100_SaveSession` Lambda.
- The Lambda calls DynamoDB `PutItem` with the session data.
- Response: `200 OK` -> logged to the Unreal Output Log.
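A minimal sketch of how such a call can be issued with UE's `FHttpModule`. The single `BodyJson` parameter is illustrative (the real `SendStudentSessionToAWS(...)` takes individual session fields), and `<id>` stands for the deployed API Gateway ID, as above:

```cpp
#include "HttpModule.h"
#include "Interfaces/IHttpRequest.h"
#include "Interfaces/IHttpResponse.h"

// Illustrative sketch, not the project's actual implementation.
void USaveToAWS::SendStudentSessionToAWS(const FString& BodyJson)
{
    TSharedRef<IHttpRequest, ESPMode::ThreadSafe> Request =
        FHttpModule::Get().CreateRequest();
    Request->SetURL(TEXT("https://<id>.execute-api.us-east-2.amazonaws.com/session"));
    Request->SetVerb(TEXT("POST"));
    Request->SetHeader(TEXT("Content-Type"), TEXT("application/json"));
    Request->SetContentAsString(BodyJson);

    // Fires once API Gateway responds; the result goes to the Output Log.
    Request->OnProcessRequestComplete().BindLambda(
        [](FHttpRequestPtr, FHttpResponsePtr Response, bool bConnectedOk)
        {
            const int32 Code = (bConnectedOk && Response.IsValid())
                ? Response->GetResponseCode() : -1;
            UE_LOG(LogTemp, Log, TEXT("POST /session responded: %d"), Code);
        });
    Request->ProcessRequest();
}
```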
## Multi-Repository Relationship Map
| Repository | Role | Relationships |
|---|---|---|
| FSE100Capstone/DeVILSona | UE5 project | - References: DeVILSona-infra (API Gateway URLs embedded at build time). - Uses: OVRLipSync plugin (local, in Plugins/). - Uses: Meta XR plugin (Engine/Plugins/Marketplace/). |
| FSE100Capstone/DeVILSona-infra | Terraform configuration | - Outputs: API Gateway URLs (fed back into DeVILSona UE5 project). - Managed by: DeVILStarter. |
| FSE100Capstone/DeVILStarter | Desktop launcher | - Wraps: DeVILSona-infra (the Terraform directory). |
| FSE100Capstone/DeVILSona.wiki | Documentation only | - No runtime dependencies. |
## Key Design Decisions & Rationale
| Decision | Rationale |
|---|---|
| GameInstance Subsystems for AI pipeline | Subsystems persist across level transitions without needing cross-level references. The AI conversation continues seamlessly even if the UE5 level changes. |
| Separate AudioInputSubsystem | Isolates the platform-specific audio capture code. If the API changes between UE5 versions, only this subsystem needs updating. |
| Base64 PCM16 encoding | OpenAI Realtime API requires PCM16 audio at 24kHz mono, transmitted as base64 over JSON WebSocket messages. This is the only supported format. |
| Persistent WebSocket (not HTTP) | AI conversation requires bi-directional, low-latency streaming. HTTP request-response is unsuitable for real-time audio. WebSocket provides the persistent full-duplex channel needed. |
| OVRLipSync timer-based processing | Streaming packets arrive faster than playback speed. A timer-driven approach ensures lip sync continues for buffered audio even after the last network packet arrives. |
| AWS PAY_PER_REQUEST DynamoDB | The usage pattern is highly bursty (heavy during class, zero otherwise). Provisioned capacity would waste money. PAY_PER_REQUEST is nearly free at educational scale. |
| Terraform for infrastructure | Ensures reproducible, version-controlled infrastructure. Any developer can redeploy from scratch in minutes. |
| DeVILStarter wraps Terraform | Educators don't know Terraform. DeVILStarter provides a one-click interface for the non-technical operators who run the system in the classroom. |
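To make the base64 PCM16 row concrete, here is a minimal sketch (hypothetical helper name, not project source) of packing a chunk of 24 kHz mono int16 samples with UE's built-in `FBase64`:

```cpp
#include "Misc/Base64.h"

// Hypothetical helper: reinterpret int16 PCM samples as raw little-endian
// bytes and base64-encode them for an input_audio_buffer.append message.
FString EncodePcm16Chunk(const TArray<int16>& Samples)
{
    TArray<uint8> Bytes;
    Bytes.Append(reinterpret_cast<const uint8*>(Samples.GetData()),
                 Samples.Num() * sizeof(int16));
    return FBase64::Encode(Bytes);
}
```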
➡️ Next: The AI Pipeline