Wav2Lip¶
When to Use Wav2Lip¶
Wav2Lip is a good fit for quick lip-sync validation, image avatars, short-video avatars, and lightweight local demos. It has a relatively low deployment cost and a clear asset path, which makes it a strong first real model for OpenTalking.
Requirements¶
- Python dependencies must include the Wav2Lip extra.
- The model weights must include
wav2lip384.pthor a compatible checkpoint. s3fd.pthis required for face detection.- NVIDIA GPU is recommended; CPU is only for functional checks.
- Avatars use OpenTalking's shared avatar flow; Wav2Lip consumes a reference image, preprocessed frames, or a detectable face region when it runs.
Prepare Weights¶
Default model directory:
Configurable paths:
export OPENTALKING_WAV2LIP_MODEL_ROOT=./avatar_models/wav2lip
export OPENTALKING_WAV2LIP_CHECKPOINT=./avatar_models/wav2lip/wav2lip384.pth
s3fd.pth can live at:
Full download commands are in Wav2Lip Local.
Prepare Avatar Derivatives¶
To pre-generate image-frame assets for Wav2Lip:
uv run python scripts/prepare_wav2lip_image_asset.py \
--source-image ./assets/avatar.png \
--out ./examples/avatars/my-wav2lip \
--avatar-id my-wav2lip \
--name "My Wav2Lip Avatar"
To pre-generate video-frame assets for Wav2Lip:
uv run python scripts/prepare_wav2lip_video_asset.py \
--source-video ./assets/avatar.mp4 \
--out ./examples/avatars/my-wav2lip-video \
--avatar-id my-wav2lip-video \
--name "My Wav2Lip Video Avatar"
Configure Backend¶
local¶
omnirt¶
Start Service¶
Verify¶
- Open the WebUI.
- Select an available avatar.
- Select the
wav2lipmodel. - Send a short sentence and confirm first frame, audio, and lip output.
Troubleshooting¶
Missing checkpoint¶
Check OPENTALKING_WAV2LIP_MODEL_ROOT and OPENTALKING_WAV2LIP_CHECKPOINT.
Missing s3fd.pth¶
Put s3fd.pth under models/wav2lip/.
Mouth region looks unnatural¶
Adjust OPENTALKING_WAV2LIP_PADS, the postprocess mode, and the avatar reference image.