feat: add local CosyVoice TRT sidecar deployment (#119)

zyairehhh
2026-06-23 23:06:58 +08:00
committed by GitHub
parent 61f4007965
commit 7f37c3b49d
14 changed files with 691 additions and 21 deletions


@@ -126,7 +126,7 @@ OpenTalking's **orchestration layer** (API / Worker / frontend) and **digital-hu
| Fast trial | `mock` | CPU / no GPU | Validate API, LLM, TTS, WebRTC, and browser playback without downloading model weights | [Quickstart](docs/en/user-guide/quickstart.md) |
| Entry validation | `quicktalk` / `wav2lip` | RTX 3050 Laptop, RTX 3060, RTX 4060 | Run real video rendering for demos and deployment validation; lower the resolution on low-memory devices | [QuickTalk](docs/en/model-deployment/quicktalk.md) / [Wav2Lip](docs/en/model-deployment/wav2lip-local.md) |
| Consumer-GPU single machine | `quicktalk` / `wav2lip` / `musetalk` | RTX 3090, RTX 4090 | Closer to real-time local demos, private validation, and lightweight pre-production evaluation | [Model deployment](docs/en/model-deployment/index.md) |
| Fully local private path | `sensevoice` + `local_cosyvoice` + `quicktalk` | RTX 3090 / 4090 or similar GPU | Run STT, TTS, and video driving with local models to reduce external dependencies | [Local STT/TTS + QuickTalk](docs/en/model-deployment/local-quicktalk-audio.md) |
| Fully local private path | `sensevoice` + `local_cosyvoice` + `quicktalk` | RTX 3090 / 4090 or similar GPU | Run STT, TTS, and video driving locally; OpenTalking uses the main `.venv`, while CosyVoice runs in a dedicated sidecar venv | [Local STT/TTS + QuickTalk](docs/en/model-deployment/local-quicktalk-audio.md) |
| High-quality remote inference | `flashtalk` / `flashhead` / `fasterliveportrait` + OmniRT | Multi-GPU, Ascend 910B2, remote GPU service | Multi-card, GPU/NPU, production isolation, higher visual quality, or video clone workflows | [FlashTalk](docs/en/model-deployment/flashtalk.md) / [FasterLivePortrait](docs/en/model-deployment/fasterliveportrait.md) |
| Docker / production deployment | API, Web, Worker, external model services | Single GPU, remote GPU, distributed cluster | Service deployment, remote GPU, distributed runtime, and production validation | [Deployment](docs/en/user-guide/deployment.md) |


@@ -126,7 +126,7 @@ OpenTalking's **orchestration layer** (API / Worker / frontend) and **digital-hu
| Fast trial | `mock` | CPU / no GPU | Validate API, LLM, TTS, WebRTC, and browser playback without downloading model weights | [Quickstart](docs/en/user-guide/quickstart.md) |
| Entry validation | `quicktalk` / `wav2lip` | RTX 3050 Laptop, RTX 3060, RTX 4060 | Run real video rendering for demos and deployment validation; lower the resolution on low-memory devices | [QuickTalk](docs/en/model-deployment/quicktalk.md) / [Wav2Lip](docs/en/model-deployment/wav2lip-local.md) |
| Consumer-GPU single machine | `quicktalk` / `wav2lip` / `musetalk` | RTX 3090, RTX 4090 | Closer to real-time local demos, private validation, and lightweight pre-production evaluation | [Model deployment](docs/en/model-deployment/index.md) |
| Fully local private path | `sensevoice` + `local_cosyvoice` + `quicktalk` | RTX 3090 / 4090 or similar GPU | Run STT, TTS, and video driving with local models to reduce external dependencies | [Local STT/TTS + QuickTalk](docs/en/model-deployment/local-quicktalk-audio.md) |
| Fully local private path | `sensevoice` + `local_cosyvoice` + `quicktalk` | RTX 3090 / 4090 or similar GPU | Run STT, TTS, and video driving locally; OpenTalking uses the main `.venv`, while CosyVoice runs in a dedicated sidecar venv | [Local STT/TTS + QuickTalk](docs/en/model-deployment/local-quicktalk-audio.md) |
| High-quality remote inference | `flashtalk` / `flashhead` / `fasterliveportrait` + OmniRT | Multi-GPU, Ascend 910B2, remote GPU service | Multi-card, GPU/NPU, production isolation, higher visual quality, or video clone workflows | [FlashTalk](docs/en/model-deployment/flashtalk.md) / [FasterLivePortrait](docs/en/model-deployment/fasterliveportrait.md) |
| Docker / production deployment | API, Web, Worker, external model services | Single GPU, remote GPU, distributed cluster | Service deployment, remote GPU, distributed runtime, and production validation | [Deployment](docs/en/user-guide/deployment.md) |


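@@ -126,7 +126,7 @@ OpenTalking's **orchestration layer** (API / Worker / frontend) and **digital-human synthesis back
| Fast trial | `mock` | CPU / no GPU | Validate the API, LLM, TTS, WebRTC, and browser playback chain first, without downloading model weights | [Quickstart](docs/zh/user-guide/quickstart.md) |
| Entry validation | `quicktalk` / `wav2lip` | RTX 3050 Laptop, RTX 3060, RTX 4060 | Runs real video rendering end to end; suitable for feature demos and deployment validation; lower the resolution on low-memory devices | [QuickTalk](docs/zh/model-deployment/quicktalk.md) / [Wav2Lip](docs/zh/model-deployment/wav2lip-local.md) |
| Consumer-GPU single machine | `quicktalk` / `wav2lip` / `musetalk` | RTX 3090, RTX 4090 | Closer to a real-time experience; suitable for local demos, private-deployment validation, and lightweight pre-production evaluation | [Model deployment](docs/zh/model-deployment/index.md) |
| Fully local private path | `sensevoice` + `local_cosyvoice` + `quicktalk` | RTX 3090 / 4090 or equivalent GPU | STT, TTS, and video driving all use local models, reducing external dependencies | [Local STT/TTS + QuickTalk](docs/zh/model-deployment/local-quicktalk-audio.md) |
| Fully local private path | `sensevoice` + `local_cosyvoice` + `quicktalk` | RTX 3090 / 4090 or equivalent GPU | STT, TTS, and video driving all run locally; OpenTalking uses the main `.venv`, and CosyVoice uses a dedicated sidecar venv | [Local STT/TTS + QuickTalk](docs/zh/model-deployment/local-quicktalk-audio.md) |
| High-quality remote inference | `flashtalk` / `flashhead` / `fasterliveportrait` + OmniRT | Multi-GPU, Ascend 910B2, remote GPU service | Multi-card, GPU/NPU, production isolation, higher visual quality, or video cloning | [FlashTalk](docs/zh/model-deployment/flashtalk.md) / [FasterLivePortrait](docs/zh/model-deployment/fasterliveportrait.md) |
| Docker / production deployment | API, Web, Worker, and external model services separated | Single-machine GPU, remote GPU, distributed cluster | Service deployment, remote GPU, distributed runtime, and production validation | [Deployment docs](docs/zh/user-guide/deployment.md) |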


@@ -24,6 +24,7 @@ router = APIRouter(prefix="/tts", tags=["tts"])
logger = logging.getLogger(__name__)
MAX_PREVIEW_TEXT_CHARS = 1000
LOCAL_COSYVOICE_PREVIEW_SECONDS = 3.0
_INDEXTTS_PROVIDERS = {"indextts", "local_indextts", "omnirt_indextts"}
PreviewUploadFile = UploadFile | StarletteUploadFile
@@ -36,6 +37,12 @@ class TTSPreviewRequest(BaseModel):
    indextts_config: dict[str, Any] | None = None


def _preview_sample_limit(provider: str | None, sample_rate: int) -> int | None:
    if provider == "local_cosyvoice":
        return max(1, int(sample_rate * LOCAL_COSYVOICE_PREVIEW_SECONDS))
    return None


def _wav_bytes(chunks: list[np.ndarray], sample_rate: int) -> bytes:
    pcm = np.concatenate(chunks) if chunks else np.zeros(0, dtype=np.int16)
    pcm = np.asarray(pcm, dtype="<i2").reshape(-1)
@@ -215,12 +222,17 @@ async def preview_tts(request: Request) -> Response:
    )
    chunks: list[np.ndarray] = []
    effective_sample_rate = sample_rate
    sample_limit = _preview_sample_limit(provider, sample_rate)
    total_samples = 0
    try:
        async for chunk in tts.synthesize_stream(text, voice=voice):
            arr = np.asarray(chunk.data, dtype=np.int16).reshape(-1)
            if arr.size:
                chunks.append(arr.copy())
                total_samples += int(arr.size)
                effective_sample_rate = int(chunk.sample_rate or effective_sample_rate)
            if sample_limit is not None and total_samples >= sample_limit:
                break
    except Exception as exc:
        raise HTTPException(status_code=502, detail=f"TTS preview failed: {exc}") from exc
    finally:
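
Reviewer note: the preview cap in this hunk can be exercised in isolation. The sketch below is a standalone, synchronous approximation, not the route itself: `fake_stream` and `collect_preview` are hypothetical stand-ins for `tts.synthesize_stream` and the loop in `preview_tts`, and plain lists replace the NumPy arrays.

```python
LOCAL_COSYVOICE_PREVIEW_SECONDS = 3.0  # same value as the constant in the diff


def preview_sample_limit(provider, sample_rate):
    """Mirror of _preview_sample_limit: only local_cosyvoice is capped."""
    if provider == "local_cosyvoice":
        return max(1, int(sample_rate * LOCAL_COSYVOICE_PREVIEW_SECONDS))
    return None


def fake_stream(n_chunks=20, sample_rate=16000):
    """Hypothetical stand-in for the TTS stream: one second of samples per chunk."""
    for _ in range(n_chunks):
        yield [1] * sample_rate


def collect_preview(provider, sample_rate=16000):
    """Pull chunks until the provider's sample limit (if any) is reached."""
    limit = preview_sample_limit(provider, sample_rate)
    chunks, total = [], 0
    for data in fake_stream(sample_rate=sample_rate):
        if data:
            chunks.append(data)
            total += len(data)
        if limit is not None and total >= limit:
            break  # enough audio for a preview; stop consuming the stream
    return chunks
```

With one-second chunks at 16 kHz, the capped provider stops once three seconds have been collected, while uncapped providers still consume the whole stream.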


@@ -246,6 +246,45 @@ def test_tts_preview_form_passes_indextts_emotion_audio_file(monkeypatch):
    assert calls[0]["emotion_audio_bytes"] == b"RIFFemotion"


def test_tts_preview_local_cosyvoice_returns_after_enough_preview_audio(monkeypatch):
    from apps.api.routes import tts_preview

    yielded: list[int] = []

    class FakeTTS:
        async def synthesize_stream(self, text: str, voice: str | None = None):
            for i in range(20):
                yielded.append(i)
                yield AudioChunk(
                    data=np.ones(16000, dtype=np.int16),
                    sample_rate=16000,
                    duration_ms=1000.0,
                )

    def fake_build_tts_adapter(**kwargs):
        return FakeTTS()

    monkeypatch.setattr(tts_preview, 'build_tts_adapter', fake_build_tts_adapter)
    app = FastAPI()
    app.include_router(tts_preview.router)
    client = TestClient(app)
    response = client.post(
        '/tts/preview',
        json={
            'text': '你好,我正在测试音色。',
            'voice': 'local-office-serena',
            'tts_provider': 'local_cosyvoice',
            'tts_model': 'FunAudioLLM/Fun-CosyVoice3-0.5B-2512',
        },
    )
    assert response.status_code == 200
    assert response.content.startswith(b'RIFF')
    assert 1 <= len(yielded) < 20


def test_tts_preview_rejects_empty_text():
    from apps.api.routes import tts_preview


@@ -60,13 +60,18 @@ OPENTALKING_TTS_DASHSCOPE_API_KEY=<dashscope-tts-key>
## Install and Models
```bash title="terminal"
uv sync --extra dev --extra models --extra local-audio --extra local-cosyvoice-service --extra quicktalk-cuda --python 3.11
uv sync --extra dev --extra models --extra local-audio --extra quicktalk-cuda --python 3.11
python scripts/download_local_audio_models.py \
--root ./models/local-audio \
--model sensevoice-small \
--model fun-cosyvoice3-0.5b-2512
```
Use the main `.venv` for OpenTalking, SenseVoice, and QuickTalk. Create a
separate CosyVoice sidecar venv after the runtime checkout.
For CosyVoice3 model sources and the optional fp16 TensorRT ONNX files, see [TTS deployment](../tts.md#local-cosyvoice3-05b).
Prepare QuickTalk weights as described in [QuickTalk Local](../quicktalk/local.md). Put the CosyVoice runtime under the model directory:
```bash title="terminal"
@@ -74,6 +79,9 @@ mkdir -p ./models/local-audio/runtime
git clone https://github.com/FunAudioLLM/CosyVoice.git ./models/local-audio/runtime/CosyVoice
cd ./models/local-audio/runtime/CosyVoice
git submodule update --init --recursive
cd "$DIGITAL_HUMAN_HOME/opentalking"
OPENTALKING_COSYVOICE_VENV_DIR=.venv-cosyvoice \
bash scripts/prepare_cosyvoice_venv.sh
```
## Start
@@ -81,8 +89,7 @@ git submodule update --init --recursive
Start the local TTS service first:
```bash title="terminal"
OPENTALKING_TTS_LOCAL_COSYVOICE_PRELOAD=1 \
python scripts/local_cosyvoice_service.py --host 127.0.0.1 --port 19090
bash scripts/quickstart/start_local_cosyvoice.sh --port 19090
```
Then start OpenTalking:


@@ -60,12 +60,41 @@ export UV_DEFAULT_INDEX="${UV_DEFAULT_INDEX:-https://pypi.tuna.tsinghua.edu.cn/s
export UV_INDEX_URL="${UV_INDEX_URL:-https://pypi.tuna.tsinghua.edu.cn/simple}"
export PIP_INDEX_URL="${PIP_INDEX_URL:-https://pypi.tuna.tsinghua.edu.cn/simple}"
export UV_LINK_MODE=copy
uv sync --extra dev --extra models --extra local-audio --extra local-cosyvoice-service --python 3.11
uv sync --extra dev --extra models --extra local-audio --python 3.11
.venv/bin/python scripts/download_local_audio_models.py \
--root ./models/local-audio \
--model fun-cosyvoice3-0.5b-2512
```
This downloads the base CosyVoice3 model from ModelScope:
| Asset | Source | Destination |
|---|---|---|
| Base CosyVoice3 weights | ModelScope `FunAudioLLM/Fun-CosyVoice3-0.5B-2512` | `./models/local-audio/FunAudioLLM__Fun-CosyVoice3-0.5B-2512/` |
The base model directory must include the files used by the sidecar runtime,
including `cosyvoice3.yaml`, `llm.pt`, `flow.pt`, `hift.pt`,
`speech_tokenizer_v3.onnx`, `speech_tokenizer_v3.batch.onnx`, `campplus.onnx`,
and `flow.decoder.estimator.fp32.onnx`. The built-in zero-shot voice also needs
a prompt wav configured by `OPENTALKING_TTS_LOCAL_COSYVOICE_PROMPT_AUDIO`; cloned
voices store their own prompt wav under the local voice directory.
For fp16 TensorRT, download the extra ONNX assets from Hugging Face and place
them in the same base model directory:
| Asset | Source | Required for |
|---|---|---|
| `flow.decoder.estimator.autocast_fp16.onnx` | Hugging Face `yuekai/Fun-CosyVoice3-0.5B-2512-FP16-ONNX` | `FP16 + LOAD_TRT=1`; OpenTalking builds `flow.decoder.estimator.autocast_fp16.mygpu.plan` from it. |
| `flow.decoder.estimator.streaming.autocast_fp16.onnx` | Hugging Face `yuekai/Fun-CosyVoice3-0.5B-2512-FP16-ONNX` | Optional streaming fp16 ONNX asset; keep beside the estimator ONNX for runtime compatibility. |
The generated `*.mygpu.plan` files are machine-specific TensorRT engines. Do not
copy them between different GPU / TensorRT / CUDA environments; rebuild them on
the target host from the ONNX files.
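
One way to avoid reusing a stale plan is to record which environment built it. The following is a minimal sketch under stated assumptions, not something OpenTalking is known to do: `plan_fingerprint` and `needs_rebuild` are hypothetical helpers, and the three input strings are assumed to come from calls such as `torch.cuda.get_device_name(0)`, `tensorrt.__version__`, and `torch.version.cuda` in the sidecar venv.

```python
from __future__ import annotations

import hashlib


def plan_fingerprint(gpu_name: str, trt_version: str, cuda_version: str) -> str:
    """Short, stable ID for the environment a TensorRT plan was built in."""
    key = "|".join([gpu_name.strip(), trt_version.strip(), cuda_version.strip()])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def needs_rebuild(recorded: str | None, gpu_name: str, trt_version: str, cuda_version: str) -> bool:
    """True when no fingerprint was recorded or the environment has changed."""
    return recorded != plan_fingerprint(gpu_name, trt_version, cuda_version)
```

Storing the fingerprint next to the `*.mygpu.plan` file (for example in a small text file) would let a startup script decide whether to delete the plan and rebuild it from the ONNX file.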
This main `.venv` is for OpenTalking, SenseVoice, and the video backend. Keep
CosyVoice in its own sidecar venv so its `transformers==4.51.3` runtime does not
conflict with OpenTalking's `transformers>=4.57,<6`.
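
To see why the two runtimes cannot share one environment, the version constraint can be checked directly. This is an illustrative sketch using only the standard library; `version_tuple` and `in_main_venv_range` are hypothetical helpers, and the simple parser ignores pre-release suffixes.

```python
from __future__ import annotations

from importlib import metadata


def version_tuple(version: str) -> tuple[int, ...]:
    """Parse leading numeric components: '4.51.3' -> (4, 51, 3)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def in_main_venv_range(version: str) -> bool:
    """Check the main-venv constraint quoted above: transformers>=4.57,<6."""
    return (4, 57) <= version_tuple(version) < (6,)


def installed_transformers_version() -> str | None:
    """Version of transformers in the current interpreter, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None
```

Running `installed_transformers_version()` with `.venv/bin/python` and with `.venv-cosyvoice/bin/python` should report different versions; the CosyVoice pin `4.51.3` falls outside the main venv's range.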
Prepare the CosyVoice runtime:
```bash title="Terminal"
@@ -75,18 +104,50 @@ cd ./models/local-audio/runtime/CosyVoice
git submodule update --init --recursive
```
Create or update the CosyVoice sidecar venv:
```bash title="Terminal"
OPENTALKING_COSYVOICE_VENV_DIR=.venv-cosyvoice \
bash scripts/prepare_cosyvoice_venv.sh
```
If you need TensorRT, install the TRT dependencies into the CosyVoice sidecar venv, not into OpenTalking's main `.venv`:
```bash title="Terminal"
PIP_EXTRA_INDEX_URL=https://pypi.nvidia.com/ OPENTALKING_COSYVOICE_INSTALL_TENSORRT=1 \
OPENTALKING_COSYVOICE_VENV_DIR=.venv-cosyvoice bash scripts/prepare_cosyvoice_venv.sh
```
Start the local TTS service:
```bash title="Terminal"
OPENTALKING_TTS_LOCAL_COSYVOICE_PRELOAD=1 \
python scripts/local_cosyvoice_service.py --host 127.0.0.1 --port 19090
bash scripts/quickstart/start_local_cosyvoice.sh --port 19090
```
In prior GPU validation, the main CosyVoice3 issue was not any single time-to-first-audio (TTFA) number but seed-dependent drift in output length. The local CosyVoice service therefore keeps two stability guards on by default: `OPENTALKING_TTS_LOCAL_COSYVOICE_MASK_STOP_TOKENS=1` masks every stop token exposed by the CosyVoice LLM, and `OPENTALKING_TTS_LOCAL_COSYVOICE_MAX_TOKEN_TEXT_RATIO=6` bounds the token/text ratio so that long prompts do not occasionally produce runaway audio. Keep both guards enabled for realtime use.
TensorRT is optional. Enable it only after confirming that the CosyVoice runtime, CUDA, the onnxruntime-gpu/TensorRT engines, and the model directory are compatible:
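
The effect of a token/text ratio bound can be illustrated with a few lines of Python. This is a sketch of the general idea only; how the sidecar and the CosyVoice LLM apply the ratio internally is not shown in this document, and `max_speech_tokens` and `clamp_generation` are hypothetical names.

```python
def max_speech_tokens(text_token_count: int, max_token_text_ratio: float = 6.0) -> int:
    """Upper bound on generated speech tokens for a given text length."""
    return max(1, int(text_token_count * max_token_text_ratio))


def clamp_generation(token_stream, text_token_count: int, max_token_text_ratio: float = 6.0):
    """Yield tokens from the stream, stopping once the ratio bound is reached."""
    limit = max_speech_tokens(text_token_count, max_token_text_ratio)
    for index, token in enumerate(token_stream):
        if index >= limit:
            break  # a runaway generation is cut off here
        yield token
```

With a ratio of 6, a 10-token text can produce at most 60 speech tokens, no matter how long an unlucky seed would otherwise keep generating.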
```env title=".env"
```bash title="Terminal"
.venv-cosyvoice/bin/python -c "import tensorrt as trt; print(trt.__version__)"
test -f ./models/local-audio/FunAudioLLM__Fun-CosyVoice3-0.5B-2512/flow.decoder.estimator.fp32.onnx
```
For CosyVoice3 fp16 TRT, prefer the official autocast fp16 ONNX asset. A TRT engine can be built from `flow.decoder.estimator.fp32.onnx`, but some GPU/TensorRT combinations can produce NaNs or silent audio. Before enabling `FP16 + LOAD_TRT`, place `flow.decoder.estimator.autocast_fp16.onnx` in the same model directory. If the server needs a proxy for Hugging Face, inject proxy variables only for the download command; do not add them to the OpenTalking service env:
```bash title="Terminal"
env ALL_PROXY=socks5h://127.0.0.1:7890 HTTPS_PROXY=socks5h://127.0.0.1:7890 \
HF_ENDPOINT=https://huggingface.co .venv-cosyvoice/bin/python - <<'PY'
from huggingface_hub import hf_hub_download
repo = "yuekai/Fun-CosyVoice3-0.5B-2512-FP16-ONNX"
target = "./models/local-audio/FunAudioLLM__Fun-CosyVoice3-0.5B-2512"
for name in ["flow.decoder.estimator.autocast_fp16.onnx", "flow.decoder.estimator.streaming.autocast_fp16.onnx"]:
hf_hub_download(repo_id=repo, filename=name, repo_type="model", local_dir=target)
PY
```
```env title="scripts/quickstart/env"
OPENTALKING_TTS_LOCAL_COSYVOICE_FP16=auto
OPENTALKING_TTS_LOCAL_COSYVOICE_LOAD_TRT=1
OPENTALKING_TTS_LOCAL_COSYVOICE_TRT_CONCURRENT=1
OPENTALKING_TTS_LOCAL_COSYVOICE_TOKEN_HOP_LEN=8
@@ -94,12 +155,26 @@ OPENTALKING_TTS_LOCAL_COSYVOICE_TOKEN_MAX_HOP_LEN=16
OPENTALKING_TTS_LOCAL_COSYVOICE_STREAM_SCALE_FACTOR=1
```
After startup, check the sidecar health payload and verify `runtime_flags.load_trt`, `streaming`, `llm_token_ratio`, and `llm_stop_token_patch`:
`start_local_cosyvoice.sh` automatically adds the sidecar venv's `site-packages/tensorrt_libs` directory to `LD_LIBRARY_PATH`. On first startup with `FP16 + LOAD_TRT=1`, if `flow.decoder.estimator.autocast_fp16.onnx` exists in the model directory, OpenTalking builds the GPU-specific `flow.decoder.estimator.autocast_fp16.mygpu.plan` from it; this can take longer than a normal startup. SenseVoice still runs in the OpenTalking main `.venv` and should not follow the CosyVoice TRT settings.
After startup, check the sidecar health payload and verify `runtime_flags.load_trt`, `runtime.trt_autocast_fp16`, `streaming`, `llm_token_ratio`, and `llm_stop_token_patch`:
```bash title="Terminal"
curl -fsS http://127.0.0.1:19090/health | python3 -m json.tool
```
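
The same check can be scripted. The sketch below assumes the health payload is a JSON object; the dotted paths `runtime_flags.load_trt` and `runtime.trt_autocast_fp16` come from the text above, while treating `streaming`, `llm_token_ratio`, and `llm_stop_token_patch` as top-level keys is an assumption that may need adjusting to the real payload. `lookup`, `summarize_health`, and `fetch_health` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Paths to report; nesting of the last three is an assumption.
HEALTH_PATHS = (
    "runtime_flags.load_trt",
    "runtime.trt_autocast_fp16",
    "streaming",
    "llm_token_ratio",
    "llm_stop_token_patch",
)


def lookup(payload, dotted_path):
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    node = payload
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def summarize_health(payload):
    """Map each path of interest to its value (None when absent)."""
    return {path: lookup(payload, path) for path in HEALTH_PATHS}


def fetch_health(url="http://127.0.0.1:19090/health", timeout=5.0):
    """Fetch and decode the sidecar health payload."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical use would be `print(summarize_health(fetch_health()))` once the sidecar is running; any path that prints `None` is either missing or nested differently than assumed here.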
The following numbers were measured on a Linux server with an NVIDIA RTX 3090,
using the CosyVoice3 sidecar venv with `FP16 + LOAD_TRT=1` and the autocast fp16
TensorRT plan loaded. The benchmark called the sidecar `/synthesize` endpoint
directly; TTFB is the time until the first PCM byte arrived, and RTF (real-time
factor) is wall time divided by audio duration:
| Text length | TTFB | Wall time | Audio duration | RTF |
|---:|---:|---:|---:|---:|
| 43 chars | 0.683 s | 6.215 s | 7.200 s | 0.863 |
| 42 chars | 0.642 s | 5.858 s | 6.960 s | 0.842 |
| 29 chars | 0.639 s | 5.771 s | 6.520 s | 0.885 |
| **Average** | **0.655 s** | **5.948 s** | **6.893 s** | **0.863** |
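
The RTF column and the averages follow from the other columns. A short sketch that recomputes them from the table values (the tuple layout and names are illustrative only):

```python
from statistics import mean

# (text length in chars, TTFB s, wall time s, audio duration s), copied from the table
RUNS = [
    (43, 0.683, 6.215, 7.200),
    (42, 0.642, 5.858, 6.960),
    (29, 0.639, 5.771, 6.520),
]


def rtf(wall_s: float, audio_s: float) -> float:
    """Real-time factor: values below 1.0 mean synthesis is faster than playback."""
    return wall_s / audio_s


rtfs = [round(rtf(wall, audio), 3) for _, _, wall, audio in RUNS]
avg_ttfb = round(mean(ttfb for _, ttfb, _, _ in RUNS), 3)
avg_wall = round(mean(wall for _, _, wall, _ in RUNS), 3)
avg_audio = round(mean(audio for _, _, _, audio in RUNS), 3)
avg_rtf = round(mean(rtfs), 3)
print(rtfs, avg_ttfb, avg_wall, avg_audio, avg_rtf)
```

An RTF below 1.0 in every run means the sidecar generates audio faster than it plays back, which is the precondition for streaming without underruns.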
For the full local speech input, speech synthesis, and QuickTalk video chain, see [Local STT/TTS + QuickTalk](recipes/local-quicktalk-audio.md).
## IndexTTS Deployment (provider = indextts)


@@ -60,13 +60,18 @@ OPENTALKING_TTS_DASHSCOPE_API_KEY=<dashscope-tts-key>
## Install and Models
```bash title="Terminal"
uv sync --extra dev --extra models --extra local-audio --extra local-cosyvoice-service --extra quicktalk-cuda --python 3.11
uv sync --extra dev --extra models --extra local-audio --extra quicktalk-cuda --python 3.11
python scripts/download_local_audio_models.py \
--root ./models/local-audio \
--model sensevoice-small \
--model fun-cosyvoice3-0.5b-2512
```
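The main `.venv` is only responsible for OpenTalking, SenseVoice, and QuickTalk. Once the
CosyVoice runtime is ready, create a dedicated sidecar venv.
For the CosyVoice3 base-weight sources and the optional fp16 TensorRT ONNX files, see [TTS deployment](../tts.md).
Prepare QuickTalk weights as described on the [QuickTalk Local](../quicktalk/local.md) page. The CosyVoice runtime can simply live under the model directory:
```bash title="Terminal"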
@@ -74,6 +79,9 @@ mkdir -p ./models/local-audio/runtime
git clone https://github.com/FunAudioLLM/CosyVoice.git ./models/local-audio/runtime/CosyVoice
cd ./models/local-audio/runtime/CosyVoice
git submodule update --init --recursive
cd "$DIGITAL_HUMAN_HOME/opentalking"
OPENTALKING_COSYVOICE_VENV_DIR=.venv-cosyvoice \
bash scripts/prepare_cosyvoice_venv.sh
```
## Start
@@ -81,8 +89,7 @@ git submodule update --init --recursive
Start the local TTS service first:
```bash title="Terminal"
OPENTALKING_TTS_LOCAL_COSYVOICE_PRELOAD=1 \
python scripts/local_cosyvoice_service.py --host 127.0.0.1 --port 19090
bash scripts/quickstart/start_local_cosyvoice.sh --port 19090
```
Then start OpenTalking:


@@ -60,12 +60,39 @@ export UV_DEFAULT_INDEX="${UV_DEFAULT_INDEX:-https://pypi.tuna.tsinghua.edu.cn/s
export UV_INDEX_URL="${UV_INDEX_URL:-https://pypi.tuna.tsinghua.edu.cn/simple}"
export PIP_INDEX_URL="${PIP_INDEX_URL:-https://pypi.tuna.tsinghua.edu.cn/simple}"
export UV_LINK_MODE=copy
uv sync --extra dev --extra models --extra local-audio --extra local-cosyvoice-service --python 3.11
uv sync --extra dev --extra models --extra local-audio --python 3.11
.venv/bin/python scripts/download_local_audio_models.py \
--root ./models/local-audio \
--model fun-cosyvoice3-0.5b-2512
```
This step downloads the base CosyVoice3 model from ModelScope:
| Asset | Source | Destination |
|---|---|---|
| Base CosyVoice3 weights | ModelScope `FunAudioLLM/Fun-CosyVoice3-0.5B-2512` | `./models/local-audio/FunAudioLLM__Fun-CosyVoice3-0.5B-2512/` |
The base model directory must contain at least the files the sidecar runtime loads, including `cosyvoice3.yaml`,
`llm.pt`, `flow.pt`, `hift.pt`, `speech_tokenizer_v3.onnx`,
`speech_tokenizer_v3.batch.onnx`, `campplus.onnx`, and
`flow.decoder.estimator.fp32.onnx`. The built-in zero-shot voice also needs
`OPENTALKING_TTS_LOCAL_COSYVOICE_PROMPT_AUDIO` to point to a prompt wav; cloned voices store
their own prompt wav under the local voice directory.
To enable fp16 TensorRT, also download the extra ONNX assets from Hugging Face and place them in the same base model directory:
| Asset | Source | Purpose |
|---|---|---|
| `flow.decoder.estimator.autocast_fp16.onnx` | Hugging Face `yuekai/Fun-CosyVoice3-0.5B-2512-FP16-ONNX` | Required for `FP16 + LOAD_TRT=1`; OpenTalking generates `flow.decoder.estimator.autocast_fp16.mygpu.plan` from it. |
| `flow.decoder.estimator.streaming.autocast_fp16.onnx` | Hugging Face `yuekai/Fun-CosyVoice3-0.5B-2512-FP16-ONNX` | Optional streaming fp16 ONNX asset; keep it beside the estimator ONNX for runtime compatibility. |
The generated `*.mygpu.plan` files are TensorRT engines bound to the machine that built them. Do not copy them across GPU / TensorRT /
CUDA environments; rebuild them from the ONNX files on the target machine.
This main `.venv` is responsible for OpenTalking, SenseVoice, and the video backend. CosyVoice needs its own
sidecar venv so that its `transformers==4.51.3` runtime does not conflict with OpenTalking's
`transformers>=4.57,<6`.
Prepare the CosyVoice runtime:
```bash title="Terminal"
@@ -75,18 +102,50 @@ cd ./models/local-audio/runtime/CosyVoice
git submodule update --init --recursive
```
Create or update the dedicated CosyVoice sidecar venv:
```bash title="Terminal"
OPENTALKING_COSYVOICE_VENV_DIR=.venv-cosyvoice \
bash scripts/prepare_cosyvoice_venv.sh
```
To enable TensorRT, install the TRT dependencies into the CosyVoice sidecar venv, not into OpenTalking's main `.venv`:
```bash title="Terminal"
PIP_EXTRA_INDEX_URL=https://pypi.nvidia.com/ OPENTALKING_COSYVOICE_INSTALL_TENSORRT=1 \
OPENTALKING_COSYVOICE_VENV_DIR=.venv-cosyvoice bash scripts/prepare_cosyvoice_venv.sh
```
Start the local TTS service:
```bash title="Terminal"
OPENTALKING_TTS_LOCAL_COSYVOICE_PRELOAD=1 \
python scripts/local_cosyvoice_service.py --host 127.0.0.1 --port 19090
bash scripts/quickstart/start_local_cosyvoice.sh --port 19090
```
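In prior GPU validation, the key CosyVoice3 issue was not a single TTFA measurement but output-length drift caused by the random seed. OpenTalking's local CosyVoice service therefore keeps two stability guards on by default: `OPENTALKING_TTS_LOCAL_COSYVOICE_MASK_STOP_TOKENS=1` masks every stop token exposed by the CosyVoice LLM, and `OPENTALKING_TTS_LOCAL_COSYVOICE_MAX_TOKEN_TEXT_RATIO=6` limits the token/text ratio so that long text does not occasionally produce overly long audio. Do not disable these two guards in pursuit of a faster first packet.
TensorRT is an optional acceleration. Enable it only when the current CosyVoice runtime, CUDA, the onnxruntime-gpu/TensorRT engines, and the model directory match:
```env title=".env"
```bash title="Terminal"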
.venv-cosyvoice/bin/python -c "import tensorrt as trt; print(trt.__version__)"
test -f ./models/local-audio/FunAudioLLM__Fun-CosyVoice3-0.5B-2512/flow.decoder.estimator.fp32.onnx
```
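For CosyVoice3 fp16 TRT, the official autocast fp16 ONNX asset is recommended. The plain `flow.decoder.estimator.fp32.onnx` can be used to build a TRT engine, but on some GPU/TensorRT combinations it produces NaNs or silent audio. Before enabling `FP16 + LOAD_TRT`, place `flow.decoder.estimator.autocast_fp16.onnx` in the same model directory. If the server needs a proxy to reach Hugging Face, inject the proxy variables only for the download command; do not write them into the OpenTalking main service environment:
```bash title="Terminal"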
env ALL_PROXY=socks5h://127.0.0.1:7890 HTTPS_PROXY=socks5h://127.0.0.1:7890 \
HF_ENDPOINT=https://huggingface.co .venv-cosyvoice/bin/python - <<'PY'
from huggingface_hub import hf_hub_download
repo = "yuekai/Fun-CosyVoice3-0.5B-2512-FP16-ONNX"
target = "./models/local-audio/FunAudioLLM__Fun-CosyVoice3-0.5B-2512"
for name in ["flow.decoder.estimator.autocast_fp16.onnx", "flow.decoder.estimator.streaming.autocast_fp16.onnx"]:
hf_hub_download(repo_id=repo, filename=name, repo_type="model", local_dir=target)
PY
```
```env title="scripts/quickstart/env"
OPENTALKING_TTS_LOCAL_COSYVOICE_FP16=auto
OPENTALKING_TTS_LOCAL_COSYVOICE_LOAD_TRT=1
OPENTALKING_TTS_LOCAL_COSYVOICE_TRT_CONCURRENT=1
OPENTALKING_TTS_LOCAL_COSYVOICE_TOKEN_HOP_LEN=8
@@ -94,12 +153,25 @@ OPENTALKING_TTS_LOCAL_COSYVOICE_TOKEN_MAX_HOP_LEN=16
OPENTALKING_TTS_LOCAL_COSYVOICE_STREAM_SCALE_FACTOR=1
```
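After startup, first check the sidecar health payload and confirm that `runtime_flags.load_trt`, `streaming`, `llm_token_ratio`, and `llm_stop_token_patch` are as expected:
`start_local_cosyvoice.sh` automatically adds the `site-packages/tensorrt_libs` directory of the sidecar venv to `LD_LIBRARY_PATH`. On the first startup with `FP16 + LOAD_TRT=1`, if `flow.decoder.estimator.autocast_fp16.onnx` exists in the model directory, OpenTalking builds the `flow.decoder.estimator.autocast_fp16.mygpu.plan` for the current GPU from it; this step can take longer than a normal startup. SenseVoice still runs in the OpenTalking main `.venv`; it does not need to, and should not, follow the CosyVoice TRT settings.
After startup, first check the sidecar health payload and confirm that `runtime_flags.load_trt`, `runtime.trt_autocast_fp16`, `streaming`, `llm_token_ratio`, and `llm_stop_token_patch` are as expected:
```bash title="Terminal"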
curl -fsS http://127.0.0.1:19090/health | python3 -m json.tool
```
Measured on a Linux server with an NVIDIA RTX 3090. CosyVoice3 used the dedicated sidecar venv with
`FP16 + LOAD_TRT=1` and the autocast fp16 TensorRT plan loaded. The test called the sidecar
`/synthesize` endpoint directly; TTFB is measured as the arrival time of the first batch of PCM bytes:
| Text length | TTFB | Wall time | Audio duration | RTF |
|---:|---:|---:|---:|---:|
| 43 chars | 0.683 s | 6.215 s | 7.200 s | 0.863 |
| 42 chars | 0.642 s | 5.858 s | 6.960 s | 0.842 |
| 29 chars | 0.639 s | 5.771 s | 6.520 s | 0.885 |
| **Average** | **0.655 s** | **5.948 s** | **6.893 s** | **0.863** |
For the full local speech input, speech synthesis, and QuickTalk video chain, see [Local STT/TTS + QuickTalk](recipes/local-quicktalk-audio.md).
## IndexTTS Deployment (provider = indextts)


@@ -104,7 +104,7 @@ quicktalk-cpu = [
]
quicktalk-cuda = [
"imageio-ffmpeg>=0.5",
"onnxruntime-gpu>=1.24.0",
"onnxruntime-gpu>=1.24.0,<1.27",
]
local-cosyvoice-service = [
"fastapi>=0.109",


@@ -1,6 +1,7 @@
from __future__ import annotations
import argparse
import importlib
import io
import os
import sys
@@ -18,6 +19,112 @@ from fastapi.responses import StreamingResponse
from pydantic import BaseModel


def _soundfile_load_wav(wav: str, target_sr: int):
    import torch

    audio, sr = sf.read(wav, dtype="float32", always_2d=False)
    arr = np.asarray(audio, dtype=np.float32)
    if arr.ndim > 1:
        arr = arr.mean(axis=1)
    tensor = torch.from_numpy(arr).unsqueeze(0)
    if int(sr) == int(target_sr):
        return tensor
    try:
        import torchaudio.functional as AF

        return AF.resample(tensor, int(sr), int(target_sr))
    except Exception:
        import torch.nn.functional as F

        n_dst = max(1, int(round(tensor.shape[-1] * int(target_sr) / int(sr))))
        return F.interpolate(
            tensor.unsqueeze(0),
            size=n_dst,
            mode="linear",
            align_corners=False,
        ).squeeze(0)


def _build_strongly_typed_trt(trt_model: str, trt_kwargs: dict[str, Any], onnx_model: str) -> None:
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    network_flags = 1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED)
    network = builder.create_network(network_flags)
    parser = trt.OnnxParser(network, logger)
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 32)
    profile = builder.create_optimization_profile()
    with open(onnx_model, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError(f"failed to parse {onnx_model}: {'; '.join(errors)}")
    for i, name in enumerate(trt_kwargs["input_names"]):
        profile.set_shape(name, trt_kwargs["min_shape"][i], trt_kwargs["opt_shape"][i], trt_kwargs["max_shape"][i])
    config.add_optimization_profile(profile)
    engine_bytes = builder.build_serialized_network(network, config)
    if engine_bytes is None:
        raise RuntimeError(f"failed to build TensorRT engine from {onnx_model}")
    with open(trt_model, "wb") as f:
        f.write(engine_bytes)


def _patch_cosyvoice_autocast_fp16_trt() -> None:
    try:
        import cosyvoice.cli.model as cosy_model
    except Exception:
        return
    if getattr(cosy_model, "_opentalking_autocast_fp16_trt_patched", False):
        return
    original_convert = cosy_model.convert_onnx_to_trt
    original_load_trt = cosy_model.CosyVoiceModel.load_trt

    def convert_onnx_to_trt(trt_model, trt_kwargs, onnx_model, fp16):
        onnx_path = Path(str(onnx_model))
        if fp16 and onnx_path.name == "flow.decoder.estimator.autocast_fp16.onnx":
            print(f"building strongly-typed autocast fp16 TensorRT engine: {trt_model}", flush=True)
            return _build_strongly_typed_trt(str(trt_model), trt_kwargs, str(onnx_model))
        return original_convert(trt_model, trt_kwargs, onnx_model, fp16)

    def load_trt(self, flow_decoder_estimator_model, flow_decoder_onnx_model, trt_concurrent, fp16):
        if fp16:
            model_dir = Path(str(flow_decoder_estimator_model)).parent
            autocast_onnx = model_dir / "flow.decoder.estimator.autocast_fp16.onnx"
            if autocast_onnx.exists():
                flow_decoder_estimator_model = str(model_dir / "flow.decoder.estimator.autocast_fp16.mygpu.plan")
                flow_decoder_onnx_model = str(autocast_onnx)
                setattr(self, "_opentalking_trt_autocast_fp16", True)
                setattr(self, "_opentalking_trt_plan", flow_decoder_estimator_model)
                setattr(self, "_opentalking_trt_onnx", flow_decoder_onnx_model)
                print(
                    "using CosyVoice autocast fp16 TensorRT asset "
                    f"onnx={flow_decoder_onnx_model} plan={flow_decoder_estimator_model}",
                    flush=True,
                )
        return original_load_trt(self, flow_decoder_estimator_model, flow_decoder_onnx_model, trt_concurrent, fp16)

    cosy_model.convert_onnx_to_trt = convert_onnx_to_trt
    cosy_model.CosyVoiceModel.load_trt = load_trt
    cosy_model._opentalking_autocast_fp16_trt_patched = True
    print("patched cosyvoice autocast fp16 TensorRT loader", flush=True)


def _patch_cosyvoice_load_wav() -> None:
    patched: list[str] = []
    for module_name in ("cosyvoice.utils.file_utils", "cosyvoice.cli.frontend"):
        try:
            module = importlib.import_module(module_name)
        except Exception:
            continue
        setattr(module, "load_wav", _soundfile_load_wav)
        patched.append(module_name)
    if patched:
        print(f"patched cosyvoice load_wav via soundfile modules={','.join(patched)}", flush=True)


class SynthesizeRequest(BaseModel):
    text: str
    voice: str | None = None
@@ -79,6 +186,15 @@ def apply_streaming_tuning(
    return {"requested": requested, "applied": applied, "effective": effective}


def ensure_cosyvoice_flow_half(cosyvoice: Any) -> bool:
    model = _cosyvoice_model(cosyvoice)
    flow = getattr(model, "flow", None)
    if flow is None or not hasattr(flow, "half"):
        return False
    flow.half()
    return True


def reset_streaming_tuning(cosyvoice: Any) -> dict[str, Any]:
    model = _cosyvoice_model(cosyvoice)
    baseline = getattr(model, "_opentalking_streaming_tuning", None)
@@ -207,6 +323,9 @@ def current_runtime_info(cosyvoice: Any) -> dict[str, Any]:
        "fp16": bool(getattr(cosyvoice, "fp16", False)),
        "flow_decoder_estimator": estimator_type,
        "flow_decoder_trt": estimator_type == "TrtContextWrapper",
        "trt_autocast_fp16": bool(getattr(model, "_opentalking_trt_autocast_fp16", False)),
        "trt_plan": getattr(model, "_opentalking_trt_plan", ""),
        "trt_onnx": getattr(model, "_opentalking_trt_onnx", ""),
    }
@@ -293,8 +412,10 @@ class CosyVoiceService:
        for path in (runtime, matcha):
            if str(path) not in sys.path:
                sys.path.insert(0, str(path))
        _patch_cosyvoice_load_wav()
        try:
            from cosyvoice.cli.cosyvoice import AutoModel

            _patch_cosyvoice_autocast_fp16_trt()
        except ImportError as exc:
            raise RuntimeError(
                "CosyVoice runtime is not importable. Clone FunAudioLLM/CosyVoice and install its requirements in this service venv."
@@ -318,6 +439,9 @@ class CosyVoiceService:
            "trt_concurrent": self.trt_concurrent,
        }
        self._model, self._loaded_model_kwargs = _instantiate_automodel(AutoModel, model_kwargs)
        flow_half_applied = False
        if self.load_trt and self.fp16:
            flow_half_applied = ensure_cosyvoice_flow_half(self._model)
        self._apply_runtime_tuning()
        # Keep the service zero-shot first so it does not require precomputed spk2info.pt.
        print(
@@ -325,6 +449,7 @@ class CosyVoiceService:
            f"model={self.model_dir} runtime={runtime} device={self.device} "
            f"fp16={self.fp16} load_jit={self.load_jit} load_trt={self.load_trt} "
            f"load_vllm={self.load_vllm} trt_concurrent={self.trt_concurrent} "
            f"flow_half_applied={flow_half_applied} "
            f"seconds={time.perf_counter() - t0:.3f}",
            flush=True,
        )
)
@@ -386,6 +511,7 @@ class CosyVoiceService:
"torch",
"torchaudio",
"numpy",
"onnxruntime-gpu",
"onnxruntime",
),
}
@@ -542,9 +668,10 @@ class CosyVoiceService:
self.model()
return
req = SynthesizeRequest(text=warmup_text)
# Exhaust the stream so CosyVoice releases its request state and model lock.
stream, _sr = self.synthesize_pcm_stream(req)
for _chunk in stream:
break
pass
def create_app(service: CosyVoiceService) -> FastAPI:

scripts/prepare_cosyvoice_venv.sh Executable file

@@ -0,0 +1,121 @@
#!/usr/bin/env bash
set -euo pipefail
script_dir="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
repo_root="$(cd -- "$script_dir/.." && pwd)"
venv_dir="${OPENTALKING_COSYVOICE_VENV_DIR:-$repo_root/.venv-cosyvoice}"
runtime_dir="${OPENTALKING_TTS_LOCAL_COSYVOICE_RUNTIME_DIR:-$repo_root/models/local-audio/runtime/CosyVoice}"
requirements_file="${OPENTALKING_COSYVOICE_REQUIREMENTS:-$runtime_dir/requirements.txt}"
export PIP_INDEX_URL="${PIP_INDEX_URL:-https://pypi.tuna.tsinghua.edu.cn/simple}"
if [[ ! -d "$runtime_dir" ]]; then
  echo "Missing CosyVoice runtime: $runtime_dir" >&2
  echo "Clone FunAudioLLM/CosyVoice there, or set OPENTALKING_TTS_LOCAL_COSYVOICE_RUNTIME_DIR." >&2
  exit 1
fi
if [[ ! -f "$requirements_file" ]]; then
  echo "Missing CosyVoice requirements: $requirements_file" >&2
  exit 1
fi
resolve_bootstrap_python() {
  if [[ -n "${OPENTALKING_COSYVOICE_BOOTSTRAP_PYTHON:-}" ]]; then
    printf '%s\n' "$OPENTALKING_COSYVOICE_BOOTSTRAP_PYTHON"
    return 0
  fi
  if [[ -n "${PYTHON:-}" ]]; then
    printf '%s\n' "$PYTHON"
    return 0
  fi
  if command -v python3.11 >/dev/null 2>&1; then
    command -v python3.11
    return 0
  fi
  if command -v uv >/dev/null 2>&1; then
    if uv python find 3.11 >/dev/null 2>&1; then
      uv python find 3.11
      return 0
    fi
    uv python install 3.11 >/dev/null
    uv python find 3.11
    return 0
  fi
  command -v python3
}
python_bin="$(resolve_bootstrap_python)"
if [[ ! -x "$venv_dir/bin/python" ]]; then
  echo "Creating CosyVoice venv: $venv_dir"
  "$python_bin" -m venv "$venv_dir"
fi
venv_python="$venv_dir/bin/python"
tmp_dir="${OPENTALKING_COSYVOICE_TMPDIR:-$venv_dir/.tmp}"
pip_cache_dir="${OPENTALKING_COSYVOICE_PIP_CACHE_DIR:-$venv_dir/.pip-cache}"
mkdir -p "$tmp_dir" "$pip_cache_dir"
export TMPDIR="$tmp_dir"
export PIP_CACHE_DIR="$pip_cache_dir"
find "$tmp_dir" -mindepth 1 -maxdepth 1 -name 'pip-*' -exec rm -rf {} +
pip_install_initial() {
  "$venv_python" -m pip install \
    --retries "${OPENTALKING_COSYVOICE_PIP_RETRIES:-10}" \
    --timeout "${OPENTALKING_COSYVOICE_PIP_TIMEOUT:-120}" \
    "$@"
}
echo "Installing CosyVoice runtime dependencies"
pip_install_initial --upgrade "pip<26" "setuptools<81" wheel
pip_common_args=(
  --retries "${OPENTALKING_COSYVOICE_PIP_RETRIES:-10}"
  --timeout "${OPENTALKING_COSYVOICE_PIP_TIMEOUT:-120}"
)
if "$venv_python" -m pip install --help | grep -q -- '--resume-retries'; then
  pip_common_args+=(--resume-retries "${OPENTALKING_COSYVOICE_PIP_RESUME_RETRIES:-10}")
fi
pip_install() {
  "$venv_python" -m pip install "${pip_common_args[@]}" "$@"
}
pip_install "numpy==1.26.4" "Cython>=3.0"
filtered_requirements="$(mktemp)"
trap 'rm -f "$filtered_requirements"' EXIT
filter_pattern='^[[:space:]]*(openai-whisper|pyworld|torch|torchaudio)=='
if [[ "${OPENTALKING_COSYVOICE_INSTALL_TENSORRT:-0}" != "1" ]]; then
  filter_pattern='^[[:space:]]*(openai-whisper|pyworld|torch|torchaudio|tensorrt-cu12.*)=='
fi
grep -Ev "$filter_pattern" "$requirements_file" \
  | grep -Ev '^[[:space:]]*--extra-index-url[[:space:]]+https://download\.pytorch\.org/' \
  >"$filtered_requirements"
pip_install \
  "torch==2.3.1" \
  "torchaudio==2.3.1"
pip_install -r "$filtered_requirements"
pip_install --no-build-isolation "openai-whisper==20231117"
pip_install --no-build-isolation "pyworld==0.3.4"
echo "Installing OpenTalking sidecar service dependencies"
pip_install \
  "fastapi>=0.109" \
  "uvicorn[standard]>=0.27" \
  "pydantic>=2" \
  "numpy>=1.24,<2" \
  "soundfile>=0.12" \
  "transformers==4.51.3"
"$venv_python" - <<'PY'
import importlib.metadata as metadata

for package in ("transformers", "tokenizers", "torch", "torchaudio", "onnxruntime-gpu", "onnxruntime"):
    try:
        print(f"{package}={metadata.version(package)}")
    except metadata.PackageNotFoundError:
        print(f"{package}=missing")
PY
echo "CosyVoice venv is ready: $venv_dir"
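The two `grep -Ev` passes above strip pins that the script installs separately (or skips) before handing the rest to pip. A rough Python equivalent of that filter is sketched below, assuming POSIX `[[:space:]]` maps to `\s`. The sample requirement lines are made up for illustration and are not the real CosyVoice `requirements.txt`:

```python
import re

# Package names the script installs on its own, mirroring the shell filter_pattern.
BASE_PINS = r"openai-whisper|pyworld|torch|torchaudio"
PYTORCH_INDEX = re.compile(r"^\s*--extra-index-url\s+https://download\.pytorch\.org/")


def filter_requirements(lines, install_tensorrt=False):
    """Return the lines that would survive both grep -Ev passes."""
    names = BASE_PINS if install_tensorrt else BASE_PINS + r"|tensorrt-cu12.*"
    pinned = re.compile(rf"^\s*({names})==")
    return [
        line
        for line in lines
        if not pinned.search(line) and not PYTORCH_INDEX.search(line)
    ]


# Hypothetical requirements content, for illustration only.
sample = [
    "--extra-index-url https://download.pytorch.org/whl/cu121",
    "torch==2.3.1",
    "torchaudio==2.3.1",
    "tensorrt-cu12==10.0.1",
    "tensorrt-cu12-libs==10.0.1",
    "openai-whisper==20231117",
    "librosa==0.10.2",
    "onnxruntime-gpu==1.18.0",
]
kept_default = filter_requirements(sample)
kept_with_trt = filter_requirements(sample, install_tensorrt=True)
```

Because the pattern requires `==` right after the name, only exact pins are dropped. A line such as `torchvision==...` would not match the `torch` alternative.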


@@ -45,6 +45,11 @@
# Local CosyVoice3 sidecar. Keep TensorRT off until the CosyVoice runtime has
# built/loaded compatible TRT engines for this GPU and model directory.
# Prepare the sidecar venv with OPENTALKING_COSYVOICE_INSTALL_TENSORRT=1 before
# setting OPENTALKING_TTS_LOCAL_COSYVOICE_LOAD_TRT=1. The sidecar starter adds
# .venv-cosyvoice/site-packages/tensorrt_libs to LD_LIBRARY_PATH automatically.
# For CosyVoice3 fp16 TRT, place the official flow.decoder.estimator.autocast_fp16.onnx
# in the model directory; OpenTalking will build flow.decoder.estimator.autocast_fp16.mygpu.plan.
# OPENTALKING_TTS_DEFAULT_PROVIDER=local_cosyvoice
# OPENTALKING_TTS_ENABLED_PROVIDERS=local_cosyvoice,dashscope,edge
# OPENTALKING_LOCAL_AUDIO_MODEL_ROOT=$DIGITAL_HUMAN_HOME/models/local-audio
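The comment above says the sidecar starter puts the venv's `tensorrt_libs` directory on `LD_LIBRARY_PATH`. A small sketch of that path handling, assuming the directory lives under the interpreter's `purelib` site-packages. The helper names are hypothetical:

```python
import os
import sysconfig


def tensorrt_lib_dir():
    """Return <site-packages>/tensorrt_libs for the running interpreter."""
    purelib = sysconfig.get_paths().get("purelib", "")
    return os.path.join(purelib, "tensorrt_libs")


def prepend_library_path(new_dir, current=""):
    """Prepend new_dir to a colon-separated list without leaving a trailing colon."""
    return f"{new_dir}:{current}" if current else new_dir


if __name__ == "__main__":
    lib_dir = tensorrt_lib_dir()
    if os.path.isdir(lib_dir):
        # Only extend the search path when the TensorRT wheels are installed.
        print(prepend_library_path(lib_dir, os.environ.get("LD_LIBRARY_PATH", "")))
    else:
        print("tensorrt_libs not found; LD_LIBRARY_PATH left unchanged")
```

Avoiding the trailing colon matters because the dynamic loader treats an empty entry as the current working directory.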


@@ -0,0 +1,205 @@
#!/usr/bin/env bash
set -euo pipefail
script_dir="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
repo_root="$(cd -- "$script_dir/../.." && pwd)"
default_home="$(cd -- "$repo_root/.." && pwd)"
# shellcheck disable=SC1091
source "$script_dir/_helpers.sh"
usage() {
  cat <<'USAGE'
Usage:
  bash scripts/quickstart/start_local_cosyvoice.sh [--host HOST] [--port PORT] [--env FILE]
Options:
  --host HOST  Bind host for the local CosyVoice sidecar. Defaults to OPENTALKING_TTS_LOCAL_COSYVOICE_HOST or 127.0.0.1.
  --port PORT  Bind port. Defaults to OPENTALKING_TTS_LOCAL_COSYVOICE_PORT, then the port in OPENTALKING_TTS_LOCAL_COSYVOICE_SERVICE_URL, then 19090.
  --env FILE   Source a quickstart env file before starting the sidecar. Defaults to OPENTALKING_QUICKSTART_ENV or the env file next to this script.
  --help       Show this help.
USAGE
}
env_file="${OPENTALKING_QUICKSTART_ENV:-$script_dir/env}"
host="${OPENTALKING_TTS_LOCAL_COSYVOICE_HOST:-127.0.0.1}"
port=""
while [[ $# -gt 0 ]]; do
  case "$1" in
    --host)
      if [[ $# -lt 2 ]]; then
        echo "Missing value for --host" >&2
        exit 2
      fi
      host="$2"
      shift 2
      ;;
    --port)
      if [[ $# -lt 2 ]]; then
        echo "Missing value for --port" >&2
        exit 2
      fi
      port="$2"
      shift 2
      ;;
    --env)
      if [[ $# -lt 2 ]]; then
        echo "Missing value for --env" >&2
        exit 2
      fi
      env_file="$2"
      export OPENTALKING_QUICKSTART_ENV="$env_file"
      shift 2
      ;;
    --help|-h)
      usage
      exit 0
      ;;
    *)
      echo "Unknown argument: $1" >&2
      usage >&2
      exit 2
      ;;
  esac
done
quickstart_source_env "$env_file"
export DIGITAL_HUMAN_HOME="${DIGITAL_HUMAN_HOME:-$default_home}"
run_dir="$DIGITAL_HUMAN_HOME/run"
log_dir="$DIGITAL_HUMAN_HOME/logs"
mkdir -p "$run_dir" "$log_dir"
if [[ -z "$port" ]]; then
  port="${OPENTALKING_TTS_LOCAL_COSYVOICE_PORT:-}"
fi
if [[ -z "$port" && -n "${OPENTALKING_TTS_LOCAL_COSYVOICE_SERVICE_URL:-}" ]]; then
  port="$(
    python3 - <<'PY'
import os
from urllib.parse import urlparse

url = os.environ.get("OPENTALKING_TTS_LOCAL_COSYVOICE_SERVICE_URL", "")
parsed = urlparse(url)
print(parsed.port or "")
PY
  )"
fi
port="${port:-19090}"
resolve_cosyvoice_python() {
  if [[ -n "${OPENTALKING_COSYVOICE_PYTHON:-}" ]]; then
    if [[ -x "$OPENTALKING_COSYVOICE_PYTHON" ]]; then
      printf '%s\n' "$OPENTALKING_COSYVOICE_PYTHON"
      return 0
    fi
    echo "OPENTALKING_COSYVOICE_PYTHON is not executable: $OPENTALKING_COSYVOICE_PYTHON" >&2
    return 1
  fi
  local candidate_dir=""
  for candidate_dir in \
    "${OPENTALKING_COSYVOICE_VENV_DIR:-}" \
    "$repo_root/.venv-cosyvoice" \
    "$DIGITAL_HUMAN_HOME/.venv-cosyvoice" \
    "/root/cosyvoice/.venv"
  do
    [[ -n "$candidate_dir" ]] || continue
    if [[ -x "$candidate_dir/bin/python" ]]; then
      printf '%s\n' "$candidate_dir/bin/python"
      return 0
    fi
  done
  echo "Missing CosyVoice sidecar venv." >&2
  echo "Create it first: OPENTALKING_COSYVOICE_VENV_DIR=\"$repo_root/.venv-cosyvoice\" bash scripts/prepare_cosyvoice_venv.sh" >&2
  return 1
}
cosy_python="$(resolve_cosyvoice_python)"
case "$cosy_python" in
  "$repo_root/.venv/"*)
    echo "Refusing to start local CosyVoice from the OpenTalking main venv: $cosy_python" >&2
    echo "Use OPENTALKING_COSYVOICE_VENV_DIR or OPENTALKING_COSYVOICE_PYTHON for the sidecar venv." >&2
    exit 1
    ;;
esac
# Quote the interpreter path so a venv directory containing spaces still works.
cosy_site_packages="$("$cosy_python" - <<'PY'
import sysconfig
print(sysconfig.get_paths().get("purelib", ""))
PY
)"
cosy_trt_lib_dir="$cosy_site_packages/tensorrt_libs"
if [[ -d "$cosy_trt_lib_dir" ]]; then
  export LD_LIBRARY_PATH="$cosy_trt_lib_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi
pid_file="$run_dir/local-cosyvoice-$port.pid"
log_file="$log_dir/local-cosyvoice-$port.log"
if [[ -f "$pid_file" ]]; then
  old_pid="$(cat "$pid_file" 2>/dev/null || true)"
  if [[ -n "$old_pid" ]] && kill -0 "$old_pid" >/dev/null 2>&1; then
    if curl --max-time 2 -fsS "http://127.0.0.1:$port/health" >/dev/null 2>&1; then
      echo "Local CosyVoice is already running: pid=$old_pid port=$port"
      echo "Log: $log_file"
      exit 0
    fi
    echo "Stale Local CosyVoice pid file: pid=$old_pid port=$port" >&2
  fi
  rm -f "$pid_file"
fi
if quickstart_port_in_use "$port"; then
  echo "Local CosyVoice port $port is already in use." >&2
  quickstart_describe_port "$port" >&2 || true
  exit 1
fi
echo "Starting Local CosyVoice"
echo " repo: $repo_root"
echo " python: $cosy_python"
echo " host: $host"
echo " port: $port"
echo " log: $log_file"
if [[ -d "$cosy_trt_lib_dir" ]]; then
  echo " trt lib: $cosy_trt_lib_dir"
fi
(
  cd "$repo_root"
  export PYTHONPATH="$repo_root${PYTHONPATH:+:$PYTHONPATH}"
  export OPENTALKING_TTS_LOCAL_COSYVOICE_PRELOAD="${OPENTALKING_TTS_LOCAL_COSYVOICE_PRELOAD:-1}"
  if declare -F quickstart_detach >/dev/null 2>&1; then
    quickstart_detach "$log_file" "$cosy_python" scripts/local_cosyvoice_service.py --host "$host" --port "$port" >"$pid_file"
  else
    setsid "$cosy_python" scripts/local_cosyvoice_service.py --host "$host" --port "$port" >"$log_file" 2>&1 < /dev/null &
    echo "$!" >"$pid_file"
  fi
)
pid="$(cat "$pid_file" 2>/dev/null || true)"
if [[ -z "$pid" ]]; then
  echo "Failed to capture Local CosyVoice pid." >&2
  exit 1
fi
for _ in {1..120}; do
  if ! kill -0 "$pid" >/dev/null 2>&1; then
    echo "Local CosyVoice exited during startup. Last log lines:" >&2
    tail -80 "$log_file" >&2 || true
    rm -f "$pid_file"
    exit 1
  fi
  if curl --max-time 2 -fsS "http://127.0.0.1:$port/health" >/dev/null 2>&1; then
    echo "Local CosyVoice is up: http://127.0.0.1:$port"
    exit 0
  fi
  sleep 1
done
echo "Local CosyVoice did not become ready in 120s. Last log lines:" >&2
tail -80 "$log_file" >&2 || true
exit 1
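The starter resolves the bind port in a fixed order: `--port`, then `OPENTALKING_TTS_LOCAL_COSYVOICE_PORT`, then the port embedded in `OPENTALKING_TTS_LOCAL_COSYVOICE_SERVICE_URL`, then 19090. A minimal Python sketch of that precedence follows. The `resolve_port` helper is hypothetical, and unlike the shell script it converts values to `int`:

```python
from urllib.parse import urlparse

DEFAULT_PORT = 19090


def resolve_port(cli_port, env):
    """Pick the sidecar port: CLI flag, PORT variable, URL port, then the default."""
    if cli_port:
        return int(cli_port)
    env_port = env.get("OPENTALKING_TTS_LOCAL_COSYVOICE_PORT", "")
    if env_port:
        return int(env_port)
    url = env.get("OPENTALKING_TTS_LOCAL_COSYVOICE_SERVICE_URL", "")
    # urlparse(...).port is None when the URL carries no explicit port.
    parsed_port = urlparse(url).port if url else None
    return parsed_port or DEFAULT_PORT


if __name__ == "__main__":
    import os

    print(resolve_port("", dict(os.environ)))
```

Keeping the service URL as a fallback means a single env entry can configure both the client and the sidecar, while the explicit port variable and flag still win when set.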