OpenTalking

Requires Python >= 3.10. A PyPI package is planned.

Project Introduction

OpenTalking is an open-source orchestration framework for real-time digital-human applications. It connects frontend interaction, session state, LLM responses, TTS and voice settings, subtitle events, WebRTC audio/video playback, and local or remote digital-human synthesis backends.

OpenTalking is not a single talking-head model. It sits between product experiences and model services, organizing LLM, speech recognition, speech synthesis, avatar rendering, event streaming, and browser playback into a unified runtime. Developers can start with Mock validation and then move to real models and inference backends such as Wav2Lip, QuickTalk, FasterLivePortrait, MuseTalk, FlashTalk, or OmniRT.
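
As a rough illustration of the orchestration described above, the sketch below chains stand-in stages (LLM, TTS, avatar rendering) into one conversational turn and emits subtitle, audio, and frame events in order. Every name here (`Event`, `mock_llm`, `mock_tts`, `mock_avatar`, `run_turn`) is hypothetical and is not part of the OpenTalking API; the stages are plain mocks, in the spirit of running the Mock path before any real model.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event record; not an OpenTalking type."""
    kind: str      # "subtitle", "audio", or "frame"
    payload: str


async def mock_llm(prompt: str) -> str:
    # Stand-in for an LLM call: echoes the prompt back.
    return f"You said: {prompt}"


async def mock_tts(text: str) -> list[str]:
    # Stand-in for TTS: one fake audio chunk per word.
    return [f"pcm({word})" for word in text.split()]


async def mock_avatar(chunk: str) -> str:
    # Stand-in for a synthesis backend: one fake frame per audio chunk.
    return f"frame<{chunk}>"


async def run_turn(prompt: str) -> list[Event]:
    """Run one turn: LLM reply, then an audio chunk and a frame per word."""
    events: list[Event] = []
    reply = await mock_llm(prompt)
    events.append(Event("subtitle", reply))
    for chunk in await mock_tts(reply):
        events.append(Event("audio", chunk))
        events.append(Event("frame", await mock_avatar(chunk)))
    return events


if __name__ == "__main__":
    for event in asyncio.run(run_turn("hello")):
        print(event.kind, event.payload)
```

In a real deployment each mock would be replaced by a call to the corresponding service, and events would be streamed to the browser instead of collected in a list.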

It is designed for scenarios such as AI customer support, product demos, course presenters, news anchors, companion characters, and private digital-human deployments. If you are new to the project, start with Quick Start and run the Mock path first. If you are already evaluating models, runtime backends, GPU/NPU resources, or OmniRT, continue with Model Support.

Key Features

  • Real-time conversation pipeline: coordinates speech input, LLM response, TTS synthesis, subtitle events, avatar rendering, and WebRTC playback.
  • Pluggable model backends: supports backend modes such as mock, local, direct_ws, and omnirt, covering everything from local validation to remote inference services.
  • Multiple model paths: provides an evolving integration plan for Wav2Lip, QuickTalk, FasterLivePortrait, MuseTalk, FlashTalk, FlashHead, and related talking-head models.
  • Video Clone workflow: in the WebUI, use camera frames or an uploaded video as the driving input for a source digital-human avatar.
  • Open LLM/TTS configuration: supports OpenAI-compatible LLM endpoints such as DashScope, DeepSeek, Ollama, vLLM, and internal model services.
  • WebUI and command-line tools: use the WebUI for session validation, avatar selection, voice configuration, and model status; use the CLI entrypoints for service startup and debugging.
  • Production-oriented runtime modes: supports local development, Mock validation, Docker, API / Worker split, and external inference-service integration.
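
To make the pluggable-backend idea concrete, here is a minimal sketch of selecting a backend by mode name through a small registry. The mode names (mock, local, direct_ws, omnirt) come from the list above; everything else (`BackendConfig`, `register`, `create_backend`, the `endpoint` field, and the assumption that direct_ws and omnirt need a remote endpoint) is hypothetical and does not reflect OpenTalking's actual configuration or code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class BackendConfig:
    """Hypothetical config record; not an OpenTalking type."""
    mode: str
    endpoint: Optional[str] = None  # assumed to be required for remote modes


# Maps a mode name to a factory that returns a short backend description.
_REGISTRY: Dict[str, Callable[[BackendConfig], str]] = {}


def register(mode: str):
    """Decorator that registers a factory under a mode name."""
    def wrap(factory: Callable[[BackendConfig], str]):
        _REGISTRY[mode] = factory
        return factory
    return wrap


def _require_endpoint(cfg: BackendConfig) -> str:
    if not cfg.endpoint:
        raise ValueError(f"mode {cfg.mode!r} needs an endpoint")
    return cfg.endpoint


@register("mock")
def _mock(cfg: BackendConfig) -> str:
    return "mock: placeholder output, no model loaded"


@register("local")
def _local(cfg: BackendConfig) -> str:
    return "local: model runs in this process"


@register("direct_ws")
def _direct_ws(cfg: BackendConfig) -> str:
    # Assumption: this mode talks to a remote service over a socket.
    return f"direct_ws: connect to {_require_endpoint(cfg)}"


@register("omnirt")
def _omnirt(cfg: BackendConfig) -> str:
    # Assumption: this mode delegates inference to an external runtime.
    return f"omnirt: delegate to {_require_endpoint(cfg)}"


def create_backend(cfg: BackendConfig) -> str:
    """Look up the factory for cfg.mode and build the backend."""
    factory = _REGISTRY.get(cfg.mode)
    if factory is None:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown mode {cfg.mode!r}; expected one of: {known}")
    return factory(cfg)


if __name__ == "__main__":
    print(create_backend(BackendConfig("mock")))
    print(create_backend(BackendConfig("direct_ws", "ws://localhost:9000")))
```

A registry like this keeps the rest of the pipeline unaware of which backend is active, so moving from mock validation to a remote service is a configuration change instead of a code change.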

User Guide

  • Usage: command-line startup, WebUI usage, Video Clone, avatar configuration, and voice/TTS settings.
  • Examples: customer support, product demos, course presenters, and similar scenarios.
  • Model Support: model and backend selection, plus production topology.
  • Reference Materials: benchmark metrics and changelog entries.
  • FAQ: installation, configuration, WebRTC, model backend, and runtime issues.

License Information

OpenTalking is released under the Apache License 2.0. Talking-head models, model weights, TTS services, LLM services, and external inference backends may have their own licenses or terms of use. Check the corresponding project or service before deployment or commercial use.