docs: split quickstart paths

zyairehhh
2026-06-19 14:41:04 +08:00
parent f65f2bb5d0
commit 2128e1d256
5 changed files with 102 additions and 66 deletions


@@ -132,7 +132,14 @@ OpenTalking's **orchestration layer** (API / Worker / frontend) and **digital-hu
## Quickstart
### Quick Experience: Compshare Image
Choose one of the two quickstart paths first:
| Path | Use when | What you need | What it validates |
| --- | --- | --- | --- |
| Compshare image | You want to try OpenTalking before setting up dependencies or downloading model weights. | A Compshare instance created from the published image, with port `5173` open. | WebUI, LLM replies, streaming TTS, subtitle events, WebRTC delivery, and the prebuilt image workflow. |
| Self deployment | You want to run the repo on your own machine or server, customize config, or continue into local/remote model deployment. | Python, Node.js, FFmpeg, `.env` provider config; real models also need GPU/runtime/model weights. | Mock first-run path, then local QuickTalk or remote OmniRT model paths. |
### 1. Compshare Image
If you want to try the OpenTalking + OmniRT + QuickTalk real-time digital-human path before setting up everything manually, use the community image we published on Compshare:
@@ -142,7 +149,9 @@ If you want to try the OpenTalking + OmniRT + QuickTalk real-time digital-human
The image includes OpenTalking, OmniRT, the QuickTalk runtime environment, and model files. After deploying an instance, open port `5173` and visit the instance URL provided by the platform. If you need to restart services manually, follow the commands in the guide.
Use this path when you are trying the project for the first time and do not want to download video model weights yet. The digital-human image uses the built-in static Mock frame, while LLM replies, streaming TTS, subtitle events, and WebRTC delivery still run through the full product path.
### 2. Self Deployment
Use this path when you want to run OpenTalking from source. Start with Mock mode if you do not want to download video model weights yet: Mock mode uses the built-in static frame, while LLM replies, streaming TTS, subtitle events, and WebRTC delivery still run through the full product path.
```bash
git clone https://github.com/datascale-ai/opentalking.git
@@ -171,7 +180,7 @@ Stop services:
bash scripts/quickstart/stop_all.sh
```
### Real Model Entrypoints
#### Real Model Entrypoints
After Mock mode works, choose a real model path based on your machine. Weight downloads, directory layout, mirrors, checks, and troubleshooting are maintained in the docs; the README keeps only the startup entrypoints:


@@ -132,7 +132,14 @@ OpenTalking's **orchestration layer** (API / Worker / frontend) and **digital-hu
## Quickstart
### Quick Experience: Compshare Image
Choose one of the two quickstart paths first:
| Path | Use when | What you need | What it validates |
| --- | --- | --- | --- |
| Compshare image | You want to try OpenTalking before setting up dependencies or downloading model weights. | A Compshare instance created from the published image, with port `5173` open. | WebUI, LLM replies, streaming TTS, subtitle events, WebRTC delivery, and the prebuilt image workflow. |
| Self deployment | You want to run the repo on your own machine or server, customize config, or continue into local/remote model deployment. | Python, Node.js, FFmpeg, `.env` provider config; real models also need GPU/runtime/model weights. | Mock first-run path, then local QuickTalk or remote OmniRT model paths. |
### 1. Compshare Image
If you want to try the OpenTalking + OmniRT + QuickTalk real-time digital-human path before setting up everything manually, use the community image we published on Compshare:
@@ -142,7 +149,9 @@ If you want to try the OpenTalking + OmniRT + QuickTalk real-time digital-human
The image includes OpenTalking, OmniRT, the QuickTalk runtime environment, and model files. After deploying an instance, open port `5173` and visit the instance URL provided by the platform. If you need to restart services manually, follow the commands in the guide.
Use this path when you are trying the project for the first time and do not want to download video model weights yet. The digital-human image uses the built-in static Mock frame, while LLM replies, streaming TTS, subtitle events, and WebRTC delivery still run through the full product path.
### 2. Self Deployment
Use this path when you want to run OpenTalking from source. Start with Mock mode if you do not want to download video model weights yet: Mock mode uses the built-in static frame, while LLM replies, streaming TTS, subtitle events, and WebRTC delivery still run through the full product path.
```bash
git clone https://github.com/datascale-ai/opentalking.git
@@ -171,7 +180,7 @@ Stop services:
bash scripts/quickstart/stop_all.sh
```
### Real Model Entrypoints
#### Real Model Entrypoints
After Mock mode works, choose a real model path based on your machine. Weight downloads, directory layout, mirrors, checks, and troubleshooting are maintained in the docs; the README keeps only the startup entrypoints:


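@@ -132,7 +132,14 @@ OpenTalking's **orchestration layer** (API / Worker / frontend) and **digital-human synthesis back
## Quickstart
### Quick Experience: Compshare Image
First choose a quickstart path based on your goal:
| Path | Use when | What to prepare | What it validates |
| --- | --- | --- | --- |
| Compshare image | You want to try OpenTalking first without configuring dependencies or downloading model weights. | Create a Compshare instance from the published image and open port `5173`. | WebUI, LLM replies, streaming TTS, subtitle events, WebRTC delivery, and the prebuilt image workflow. |
| Self deployment | You want to run the repo on your own machine or server, adjust configuration, or go on to connect local/remote real models. | Python, Node.js, FFmpeg, and `.env` provider configuration; real models also need a GPU, runtime, and model weights. | Get the Mock first-run path working, then switch to local QuickTalk or remote OmniRT. |
### 1. Compshare
If you just want to try the OpenTalking + OmniRT + QuickTalk real-time digital-human pipeline first, use the community image we published on Compshare:
@@ -142,7 +149,9 @@ OpenTalking's **orchestration layer** (API / Worker / frontend) and **digital-human synthesis back
The image ships with OpenTalking, OmniRT, the QuickTalk runtime environment, and model files. After deploying an instance, open port `5173` and visit the instance address provided by the platform in your browser to enter the WebUI. If you need to restart services manually, follow the commands in the operation guide.
Use when: you are new to the project and want to run the product pipeline in Mock mode first, without downloading video model weights. The digital-human picture uses the built-in static frame, while LLM replies, streaming TTS, subtitle events, and WebRTC delivery still run through the full pipeline.
### 2. Self Deployment
Use when: you want to run OpenTalking from source. For a first deployment you can start in Mock mode without downloading video model weights: Mock mode uses the built-in static frame, while LLM replies, streaming TTS, subtitle events, and WebRTC delivery still run through the full pipeline.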
```bash
git clone https://github.com/datascale-ai/opentalking.git
@@ -171,7 +180,7 @@ bash scripts/start_unified.sh --mock --api-port 8210 --web-port 5280
bash scripts/quickstart/stop_all.sh
```
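### Real Model Entrypoints
#### Real Model Entrypoints
After Mock validation, choose a real model path based on your machine. Weight downloads, directory layout, China mirrors, checks, and troubleshooting are maintained on the docs site; the README keeps only the startup entrypoints: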


@@ -1,12 +1,12 @@
# Quick Start
This page helps you quickly run OpenTalking. Start with **Mock mode** to validate the orchestration layer, LLM, TTS, subtitle events, and WebRTC playback. Then use the real model **QuickTalk** to validate real digital-human video rendering.
This page helps you quickly run OpenTalking. Choose one of two paths first: use the published **Compshare image** for the fastest hosted trial, or use **self deployment** when you want to run and customize the repo on your own machine or server.
- Mock mode: no model weights, no GPU, uses built-in static frames to validate the full interaction flow.
- QuickTalk mode: uses a local CUDA GPU and QuickTalk weights to validate the real digital-human rendering path.
- Compshare image: no local dependency installation or model download; use the published instance image and open port `5173`.
- Self deployment: clone the repo, configure providers, start Mock mode first, then move to local QuickTalk or remote OmniRT when needed.
- WebUI validation: select avatar, model, and voice in the page, then start a real-time conversation.
## Quick Experience: Compshare Image
## 1. Compshare Image
If you want to skip local dependency installation and model downloads, deploy our published Compshare community image:
@@ -16,7 +16,11 @@ If you want to skip local dependency installation and model downloads, deploy ou
The image already includes OpenTalking, OmniRT, the QuickTalk runtime environment, and model files. Use it to try the real digital-human path first; continue with the source-based steps below when you need local installation or development.
## Mock Mode
## 2. Self Deployment
Use this path when you want to run OpenTalking from source, change configuration, or continue into local/remote model deployment.
### 2.1 Mock Mode
Mock mode is the recommended first path for OpenTalking. It does not require GPU, model weights, or an external inference service, but still validates the API, LLM, TTS, subtitle events, WebRTC, and browser playback path.
@@ -26,7 +30,7 @@ Use it for:
- Checking whether LLM / TTS configuration works.
- Previewing WebUI and session flow on a machine without GPU.
### Mock Mode Environment
#### Mock Mode Environment
| Dependency | Recommended Version | Notes |
| --- | --- | --- |
@@ -35,7 +39,7 @@ Use it for:
| FFmpeg | Available as a system command | Audio/video processing dependency. |
| GPU | Not required | Uses the built-in Mock static frame. |
### 1. Clone Repository
#### 1. Clone Repository
```bash
export DIGITAL_HUMAN_HOME=/opt/digital_human
@@ -46,7 +50,7 @@ git clone https://github.com/datascale-ai/opentalking.git
cd opentalking
```
### 2. Install Basic Dependencies
#### 2. Install Basic Dependencies
Using `uv` is recommended:
@@ -68,7 +72,7 @@ pip install --index-url https://pypi.tuna.tsinghua.edu.cn/simple -e ".[dev]"
cp .env.example .env
```
### 3. Configure Minimal Environment Variables
#### 3. Configure Minimal Environment Variables
Edit `.env` and configure at least LLM and TTS. The example below uses an OpenAI-compatible endpoint and `edge` TTS:
@@ -83,7 +87,7 @@ OPENTALKING_TTS_EDGE_VOICE=zh-CN-XiaoxiaoNeural
`edge` TTS does not require an API key. If you use DashScope STT or DashScope TTS, configure `OPENTALKING_STT_DASHSCOPE_API_KEY` or `OPENTALKING_TTS_DASHSCOPE_API_KEY` for that module.
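Before starting Mock mode, it can help to confirm that `.env` actually assigns the minimal variables. The sketch below is illustrative only: the helper name `missing_env_keys` is hypothetical and not part of the repo, and it assumes `.env` uses plain `KEY=value` lines. The key names are the ones this page mentions.

```bash
# Print every required key that has no non-empty assignment in the given env file.
# Usage: missing_env_keys FILE KEY...
missing_env_keys() {
  env_file=$1
  shift
  for key in "$@"; do
    # Match "KEY=<at least one character>" at the start of a line.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      printf '%s\n' "$key"
    fi
  done
}

# Example call; adjust the key list to the providers you actually use.
if [ -f .env ]; then
  missing_env_keys .env \
    OPENTALKING_LLM_BASE_URL \
    OPENTALKING_LLM_API_KEY \
    OPENTALKING_LLM_MODEL \
    OPENTALKING_TTS_EDGE_VOICE
fi
```

An empty result means every listed key has a value. The check does not validate the values themselves.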
### 4. Start Mock Mode
#### 4. Start Mock Mode
```bash
cd "$DIGITAL_HUMAN_HOME/opentalking"
@@ -101,7 +105,7 @@ To specify ports:
bash scripts/start_unified.sh --mock --api-port 8210 --web-port 5280
```
### 5. Open WebUI
#### 5. Open WebUI
After startup, the terminal prints the WebUI URL. The default URL is:
@@ -113,7 +117,7 @@ http://127.0.0.1:5173
*After startup, WebUI shows the avatar library, model selector, voice controls, and conversation area.*
### 6. Complete Your First Conversation
#### 6. Complete Your First Conversation
In WebUI, select Mock / driverless mode, confirm LLM and TTS configuration, enter a short test sentence, and start the session. If the browser plays audio, shows subtitles, and displays the Mock frame, the base pipeline is working.
@@ -121,7 +125,7 @@ In WebUI, select Mock / driverless mode, confirm LLM and TTS configuration, ente
*For the first validation, check user input, subtitle events, playback state, and video output.*
## QuickTalk Mode
### 2.2 QuickTalk Mode
QuickTalk mode is a faster path toward real digital-human output. It can load QuickTalk weights locally and is suitable for single-machine validation on consumer CUDA GPUs.
@@ -130,7 +134,7 @@ Use it when:
- You have an available NVIDIA GPU and CUDA environment.
- You want to see real lip motion and avatar driving.
### QuickTalk Mode Environment
#### QuickTalk Mode Environment
| Dependency | Recommended Version | Notes |
| --- | --- | --- |
@@ -140,7 +144,7 @@ Use it when:
| GPU | NVIDIA CUDA GPU | Start with a 3090 / 4090 class machine if possible. |
| Weights | QuickTalk, HuBERT, InsightFace `buffalo_l` | Download or sync offline according to this page. |
### 1. Check GPU and System Environment
#### 1. Check GPU and System Environment
QuickTalk mode requires a local CUDA GPU. Check:
@@ -151,7 +155,7 @@ python --version
node --version
```
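The individual checks above can be folded into one preflight pass. This is a minimal sketch, not a script shipped with the repo: `missing_commands` is a hypothetical helper, and the command list simply mirrors the tools this page names (`python`, `node`, `ffmpeg`, `nvidia-smi`).

```bash
# Print each command from the argument list that is not found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

missing=$(missing_commands python node ffmpeg nvidia-smi)
if [ -n "$missing" ]; then
  printf 'Not found on PATH:\n%s\n' "$missing"
else
  printf 'All required commands are available.\n'
fi
```

Finding `nvidia-smi` on `PATH` only shows that the driver tools are installed. It does not prove that CUDA is usable from Python.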
### 2. Install Model Dependencies
#### 2. Install Model Dependencies
```bash
cd "$DIGITAL_HUMAN_HOME/opentalking"
@@ -159,7 +163,7 @@ uv sync --extra dev --extra models --python 3.11
source .venv/bin/activate
```
### 3. Prepare QuickTalk Weights
#### 3. Prepare QuickTalk Weights
Place local QuickTalk weights and dependencies under repository-root `models/quicktalk/`.
@@ -231,7 +235,7 @@ models/
...
```
### 4. Prepare a Custom Avatar
#### 4. Prepare a Custom Avatar
You can start with the built-in QuickTalk example avatar. Later, if you want to upload your own identity, use a clear frontal or half-body image and create a custom avatar in WebUI through “upload from local”.
@@ -239,7 +243,7 @@ You can start with the built-in QuickTalk example avatar. Later, if you want to
*The WebUI avatar library supports built-in avatars and custom images through the upload entry.*
### 5. Start QuickTalk Mode
#### 5. Start QuickTalk Mode
```bash
export OPENTALKING_TORCH_DEVICE=cuda:0
@@ -262,7 +266,7 @@ bash scripts/start_unified.sh \
The first startup may build face cache and worker state, so it can take longer than Mock mode.
### 6. Select QuickTalk in WebUI
#### 6. Select QuickTalk in WebUI
After opening WebUI, select a `QuickTalk` avatar and the `quicktalk` model, then start a session. If the video frame is generated along with audio, the local QuickTalk rendering path is available.


@@ -1,13 +1,14 @@
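# Quick Start
This page helps you quickly run OpenTalking: start with **Mock mode** to validate the orchestration layer, LLM, TTS,
subtitle events, and WebRTC playback, then use the real model **QuickTalk** to validate real digital-human video rendering.
This page helps you quickly run OpenTalking. Choose a path first: if you just want the fastest trial, use the published
**Compshare image**; if you want to run on your own machine or server, change configuration, and go on to deploy models, choose
**self deployment**.
- Mock mode: no model weights to download and no GPU required; uses the built-in static frame to validate the full interaction flow.
- QuickTalk mode: uses a local CUDA GPU and QuickTalk weights to validate the real digital-human rendering path.
- Compshare image: no local dependency installation or model download; use the published instance image and open port `5173`.
- Self deployment: clone the repo and configure providers; get Mock mode working first, then switch to local QuickTalk or remote OmniRT as needed.
- WebUI validation: select an Avatar, model, and voice on the page, then start a real-time conversation.
## Quick Experience: Compshare Image
## 1. Compshare
If you want to skip local dependency installation and model downloads, you can deploy the Compshare community image we published: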
@@ -17,7 +18,11 @@
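The image ships with OpenTalking, OmniRT, the QuickTalk runtime environment, and model files, which makes it a good way to try the real digital-human pipeline first; continue with the rest of this page when you need to install from source or do custom development.
## Mock Mode
## 2. Self Deployment
If you want to run OpenTalking from source, change configuration, or go on to local/remote model deployment, follow the steps below.
### 2.1 Mock Mode
Mock mode is the recommended path for first-time OpenTalking users. It does not require a GPU, model weights, or an external inference service,
but it still runs the API, LLM, TTS, subtitle events, WebRTC, and browser playback path end to end.
@@ -28,7 +33,7 @@ Mock mode is the recommended path for first-time OpenTalking users. It does not require a GPU,
- Check whether the LLM / TTS configuration works.
- Preview the WebUI and session flow on a machine without a GPU.
### Mock Mode Environment
#### Mock Mode Environment
| Dependency | Recommended Version | Notes |
| --- | --- | --- |
@@ -37,7 +42,7 @@ Mock mode is the recommended path for first-time OpenTalking users. It does not require a GPU,
| FFmpeg | Available as a system command | Audio/video processing dependency. |
| GPU | Not required | Uses the built-in Mock static frame. |
### 1. Clone Repository
#### 1. Clone Repository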
```bash
export DIGITAL_HUMAN_HOME=/opt/digital_human
@@ -48,7 +53,7 @@ git clone https://github.com/datascale-ai/opentalking.git
cd opentalking
```
### 2. Install Basic Dependencies
#### 2. Install Basic Dependencies
Using `uv` to install dependencies is recommended:
@@ -70,7 +75,7 @@ pip install --index-url https://pypi.tuna.tsinghua.edu.cn/simple -e ".[dev]"
cp .env.example .env
```
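### 3. Configure Minimal Environment Variables
#### 3. Configure Minimal Environment Variables
Edit `.env` and configure at least the LLM and TTS. Below is a minimal example that uses an OpenAI-compatible endpoint and `edge`
TTS: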
@@ -87,7 +92,7 @@ OPENTALKING_TTS_EDGE_VOICE=zh-CN-XiaoxiaoNeural
`edge` TTS does not require an API key. If you use DashScope STT or DashScope TTS, configure
`OPENTALKING_STT_DASHSCOPE_API_KEY` or `OPENTALKING_TTS_DASHSCOPE_API_KEY` for that module.
### 4. Start Mock Mode
#### 4. Start Mock Mode
```bash
cd "$DIGITAL_HUMAN_HOME/opentalking"
@@ -105,7 +110,7 @@ bash scripts/start_unified.sh --mock
bash scripts/start_unified.sh --mock --api-port 8210 --web-port 5280
```
### 5. Open WebUI
#### 5. Open WebUI
After a successful startup, the terminal prints the WebUI URL. The default is:
@@ -117,7 +122,7 @@ http://127.0.0.1:5173
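*After a successful startup, the WebUI page shows the Avatar, model, voice, and conversation areas.*
### 6. Complete Your First Conversation
#### 6. Complete Your First Conversation
In WebUI, select Mock / driverless mode, confirm the LLM and TTS configuration, enter a test sentence, and start the session.
If the browser plays audio, shows subtitles, and displays the Mock frame, the base pipeline is working.
@@ -126,7 +131,7 @@ http://127.0.0.1:5173
*For the first validation, confirm that user input, subtitle events, playback state, and video output all work.*
## QuickTalk Mode
### 2.2 QuickTalk Mode
QuickTalk mode is a fast path that comes closer to real digital-human output. It can load
QuickTalk weights locally and suits single-machine validation on consumer CUDA GPUs.
@@ -136,7 +141,7 @@ QuickTalk weights locally and suits single-machine validation on consumer CUDA GPUs.
- You have an available NVIDIA GPU and CUDA environment.
- You want to see real lip motion and avatar driving.
### QuickTalk Mode Environment
#### QuickTalk Mode Environment
| Dependency | Recommended Version | Notes |
| --- | --- | --- |
@@ -146,7 +151,7 @@ QuickTalk weights locally and suits single-machine validation on consumer CUDA GPUs.
| GPU | NVIDIA CUDA GPU | Start validating on a 3090 / 4090 class machine if possible. |
| Weights | QuickTalk, HuBERT, InsightFace `buffalo_l` | Download or sync offline by following the steps on this page. |
### 1. Check GPU and System Environment
#### 1. Check GPU and System Environment
QuickTalk mode requires a local CUDA GPU. Check first: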
@@ -157,7 +162,7 @@ python --version
node --version
```
### 2. Install Model Dependencies
#### 2. Install Model Dependencies
```bash
cd "$DIGITAL_HUMAN_HOME/opentalking"
@@ -165,7 +170,7 @@ uv sync --extra dev --extra models --python 3.11
source .venv/bin/activate
```
### 3. Prepare QuickTalk Weights
#### 3. Prepare QuickTalk Weights
Keep local QuickTalk weights and dependencies together under `models/quicktalk/` in the repository root.
@@ -237,7 +242,7 @@ models/
...
```
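### 4. Prepare a Custom Avatar
#### 4. Prepare a Custom Avatar
You can start with the project's built-in QuickTalk example Avatar. Later, if you want to upload your own identity, use a clear frontal or half-body image
and create a custom Avatar in WebUI through “upload a new avatar from local”.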
@@ -246,7 +251,7 @@ models/
*The WebUI avatar library lets you select built-in Avatars or add custom ones through the upload entry.*
### 5. Start QuickTalk Mode
#### 5. Start QuickTalk Mode
```bash
export OPENTALKING_TORCH_DEVICE=cuda:0
@@ -269,7 +274,7 @@ bash scripts/start_unified.sh \
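The first startup may build the face cache and worker, so it can take longer than Mock mode.
### 6. Select QuickTalk in WebUI
#### 6. Select QuickTalk in WebUI
After opening WebUI, select a `QuickTalk` Avatar and the `quicktalk` model, then start a session.
If video frames are generated along with the audio, the local QuickTalk rendering path is available.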
@@ -278,9 +283,9 @@ bash scripts/start_unified.sh \
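*After selecting a QuickTalk Avatar and model, confirm that the generation state, connection state, and playback all look normal.*
## Verification
### 2.3 Verification
### Check Browser Playback
#### Check Browser Playback
In WebUI, confirm: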
@@ -289,7 +294,7 @@ bash scripts/start_unified.sh \
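- Subtitle or status events update.
- The browser can play audio and video.
### Check QuickTalk Output
#### Check QuickTalk Output
In QuickTalk mode, if the first session is slow, wait for the cache and worker to finish initializing. Confirm that the GPU is working: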
@@ -299,9 +304,9 @@ nvidia-smi
If WebUI shows moving video output, the local QuickTalk path is working.
## FAQ
### 2.4 FAQ
### Port Already in Use
#### Port Already in Use
Start with a different pair of ports:
@@ -315,51 +320,51 @@ bash scripts/start_unified.sh --mock --api-port 8210 --web-port 5280
bash scripts/quickstart/stop_all.sh
```
### LLM Authentication Failure
#### LLM Authentication Failure
Check `OPENTALKING_LLM_BASE_URL`, `OPENTALKING_LLM_API_KEY`, and
`OPENTALKING_LLM_MODEL` in `.env`. If you use DashScope compatible mode, confirm that the URL contains
`/compatible-mode/v1`.
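The URL check for DashScope compatible mode can be scripted. This is a minimal sketch under assumptions: `has_compatible_mode_path` is a hypothetical helper, and it only tests for the path segment named above; it does not contact the endpoint or validate the key.

```bash
# Succeed when the URL contains the DashScope compatible-mode path segment.
has_compatible_mode_path() {
  case "$1" in
    */compatible-mode/v1*) return 0 ;;
    *) return 1 ;;
  esac
}

# OPENTALKING_LLM_BASE_URL is read from the current shell environment here;
# nothing is printed when the variable is unset.
if [ -n "${OPENTALKING_LLM_BASE_URL:-}" ]; then
  if has_compatible_mode_path "$OPENTALKING_LLM_BASE_URL"; then
    echo "base URL contains /compatible-mode/v1"
  else
    echo "base URL is missing /compatible-mode/v1"
  fi
fi
```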
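### No Sound in Browser
#### No Sound in Browser
Confirm that the browser is not muted and the page has playback permission, and check that the TTS provider is available. `edge` TTS generally does not need an
API key; DashScope TTS requires `OPENTALKING_TTS_DASHSCOPE_API_KEY`.
### No WebRTC Video
#### No WebRTC Video
Use Mock mode first to confirm that the browser and WebRTC path work. If local access works but remote access fails, you usually need to check the network,
firewall, HTTPS, and TURN configuration.
### Mock Works but QuickTalk Is Unavailable
#### Mock Works but QuickTalk Is Unavailable
Common causes are missing model dependencies, an incomplete weights directory, CUDA being unavailable, or `OPENTALKING_QUICKTALK_ASSET_ROOT`
pointing to the wrong location. Run the `stat` command to check the key files first, then confirm GPU state with `nvidia-smi`.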
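As a coarse first check before inspecting individual files, the sketch below tests only that the asset root exists and contains at least one file. It is illustrative: `asset_root_has_files` is a hypothetical helper, and falling back to `models/quicktalk` when `OPENTALKING_QUICKTALK_ASSET_ROOT` is unset is an assumption made for this example, not documented behavior. It does not verify that the expected weight files are complete.

```bash
# Succeed when the given directory exists and contains at least one regular file.
asset_root_has_files() {
  root=$1
  [ -d "$root" ] || return 1
  # Keep only the first path found; an empty result means no files at all.
  [ -n "$(find "$root" -type f -print 2>/dev/null | head -n 1)" ]
}

# Assumed fallback for illustration: the repository-root models/quicktalk directory.
root=${OPENTALKING_QUICKTALK_ASSET_ROOT:-models/quicktalk}
if asset_root_has_files "$root"; then
  echo "asset root looks populated: $root"
else
  echo "asset root is missing or empty: $root"
fi
```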
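### GPU / Weight Path Errors
#### GPU / Weight Path Errors
Confirm that `OPENTALKING_TORCH_DEVICE`, `OPENTALKING_QUICKTALK_ASSET_ROOT`, and the `models/quicktalk`
directory layout are consistent. If you use offline weights, they work as long as the final file paths match the examples on this page.
## Next Steps
### 2.5 Next Steps
### Platform Environments
#### Platform Environments
See [Platform Notes](platform-notes.md) for Linux, macOS, Windows, China mirror sources, and common system dependencies.
### Docker Deployment
#### Docker Deployment
See [Docker Deployment](docker-deployment.md) for how to run in containers.
### WebUI Usage
#### WebUI Usage
See [WebUI Usage](../usage/webui/basic.md) to learn more about page layout, Avatars, voices, and session operations.
### Model Support
#### Model Support
See [Model Support](../model-support/index.md) to choose Wav2Lip, QuickTalk, MuseTalk, FlashTalk,
OmniRT, or a later inference backend.
### Command-Line Tools
#### Command-Line Tools
See [Command-Line Tools](../usage/cli.md) for `opentalking-unified`, startup scripts, and common parameters.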