Skip to content

Wav2Lip Local Deployment

Local mode uses OpenTalking's built-in Wav2Lip adapter. It is the lightest path for validating real lip sync and works well for single-GPU demos and avatar-asset checks.

Use Cases

  • First move from mock to a real talking-head model.
  • Run inference inside the OpenTalking process without deploying OmniRT.
  • Use built-in or custom shared avatars, and let the Wav2Lip flow consume reference images or frame assets as needed.

Weight Preparation

Terminal
cd "$DIGITAL_HUMAN_HOME/opentalking"
mkdir -p models/wav2lip

uv pip install -U "huggingface_hub[cli]"
export HF_ENDPOINT="${HF_ENDPOINT:-https://hf-mirror.com}"

hf download Pypa/wav2lip384 \
  wav2lip384.pth \
  --local-dir models/wav2lip
hf download rippertnt/wav2lip \
  s3fd.pth \
  --local-dir models/wav2lip

stat models/wav2lip/wav2lip384.pth
stat models/wav2lip/s3fd.pth

Start Command

Terminal
cd "$DIGITAL_HUMAN_HOME/opentalking"
uv sync --extra dev --extra models --python 3.11

export OPENTALKING_WAV2LIP_MODEL_ROOT="$DIGITAL_HUMAN_HOME/opentalking/models/wav2lip"
export OPENTALKING_WAV2LIP_DEVICE=cuda
export OPENTALKING_WAV2LIP_BATCH_SIZE=16
export OPENTALKING_WAV2LIP_MAX_LONG_EDGE=832
export OPENTALKING_WAV2LIP_FACE_DET_DEVICE=cpu

bash scripts/start_unified.sh --backend local --model wav2lip --api-port 8210 --web-port 5280

Open http://localhost:5280, choose an available avatar, and select the wav2lip model.

Verification

Terminal
curl -fsS http://127.0.0.1:8210/health
curl -s http://127.0.0.1:8210/models | jq '.statuses[] | select(.id=="wav2lip")'

Expect backend=local and connected=true. The first load initializes the checkpoint, S3FD, and avatar cache, which can take tens of seconds.

Common Errors

Symptom Action
Checkpoint not found Check OPENTALKING_WAV2LIP_MODEL_ROOT and both .pth files.
Out of GPU memory Lower OPENTALKING_WAV2LIP_BATCH_SIZE or OPENTALKING_WAV2LIP_MAX_LONG_EDGE.
Slow first frame Set OPENTALKING_PREWARM_AVATARS=singer for common avatars.
Enhancement mode fails easy_enhanced requires GFPGAN and OPENTALKING_WAV2LIP_GFPGAN_CHECKPOINT.