FlashTalk¶
When to Use FlashTalk¶
FlashTalk is suited to high-quality realtime digital humans, livestream hosts, customer-support avatars, and other scenarios that need stronger expressiveness. It is heavier than Wav2Lip / QuickTalk and should usually run as an OmniRT service instead of inside the OpenTalking API process.
OpenTalking and OmniRT Boundary¶
OpenTalking owns the WebUI, sessions, TTS, WebRTC, recording, and status management. FlashTalk weight loading, GPU scheduling, and actual inference are handled by OmniRT or a dedicated FlashTalk service.
Requirements¶
GPU¶
Multi-GPU is recommended, or a single GPU with enough VRAM. FlashTalk has stricter requirements for memory, throughput, and service stability.
NPU¶
If the FlashTalk backend is already adapted for NPU, expose it through OmniRT. OpenTalking should not manage the NPU runtime directly.
Memory¶
If memory is tight, prefer quantization, lower resolution, fewer concurrent sessions, shorter cache windows, or splitting the model service.
Disk¶
You need space for weights, quantized weights, temporary audio/video files, and logs. Production deployments should keep weights and runtime caches on fast disks.
Prepare Weights¶
FlashTalk weights usually live on the OmniRT service side. OpenTalking keeps the following default paths:
OPENTALKING_FLASHTALK_CKPT_DIR=./avatar_models/SoulX-FlashTalk-14B
OPENTALKING_FLASHTALK_WAV2VEC_DIR=./avatar_models/chinese-wav2vec2-base
For production, let OmniRT manage these paths.
Start OmniRT¶
Configure OpenTalking¶
export OPENTALKING_OMNIRT_ENDPOINT=http://127.0.0.1:9000
export OPENTALKING_FLASHTALK_BACKEND=omnirt
Start OpenTalking¶
bash scripts/start_unified.sh \
--backend omnirt \
--model flashtalk \
--omnirt http://127.0.0.1:9000
Verify¶
- Run
bash scripts/quickstart/status.sh. - Confirm that
flashtalkappears in the model list. - Select a FlashTalk-compatible avatar in the WebUI.
- Send a short prompt and observe first-frame latency, audio boundaries, and stability.
Performance Notes¶
- Do not run the API process and the FlashTalk heavy model in the same process.
- Limit session duration and queue length in production.
- TTS chunking affects first-frame latency and continuity.
- For multi-model deployments, give FlashTalk a dedicated GPU or host.
Troubleshooting¶
Queue is blocked¶
Check slot timeout, max session, and active sessions. Production deployments should have explicit session release rules.
First-frame latency is too high¶
Check for cold starts, long TTS waits, overly large frame_num, or heavy sampling settings.
Out of memory¶
Consider quantization, lower resolution, fewer concurrent sessions, service splitting, or a larger VRAM device.