13 Commits

Author SHA1 Message Date
kero-ly
57b3e48718 chore: ignore local worktrees
feat: add scene asset store
fix: validate scene asset edge cases
docs: record task 1 review fix
feat: expose scene asset api
test api composition deletion
feat: add scene assets to asset library
Fix scene asset library integration
Update asset tab test expectations
feat: render realtime stage with scene assets
feat: add immersive conversation mode
Gate immersive mode during startup
feat: surface avatar matting readiness
Fix scene asset selection and upload validation
Fix scene avatar anchor rendering
Fix unified scene asset routes
Stabilize unified scene route test
Fix default realtime stage scene behavior
Keep realtime video across view switches
Refine immersive chrome and keyboard handling
Preserve audio when switching realtime views
Keep realtime media mounted in immersive mode
Restore realtime audio and refine immersive controls
Scope scene backgrounds to matching avatar
Preserve transparent avatar backgrounds in scenes
Keep WebRTC remote stream identity
Fallback to muted stage video playback
Refine immersive QuickTalk controls
Make immersive adjustment panel opaque
Expand immersive adjustment ranges
Use avatar-specific scene defaults
Fix mypy type narrowing for scene assets
2026-06-22 19:58:38 +08:00
cwang0810
3519989ba3 feat: add Persona Package support (#87) 2026-06-10 20:41:18 +08:00
zyairehhh
f16f786889 feat: align avatar cache prewarm flow 2026-05-21 20:07:33 +08:00
lyfics
7fa6e5e6b4 Add Cantonese Qwen voices and avatar updates 2026-05-20 19:56:25 +08:00
lyfics
47cceb1d1e Improve Wav2Lip avatar playback and per-asset preload cache (#47)
* Support per-avatar Wav2Lip preload cache

* Align unified wav2lip preload startup with upstream

* Add Wav2Lip avatar caches under wav2lip directories

---------

Co-authored-by: cwang10 <cwang10@mail.ustc.edu.cn>
2026-05-20 15:32:12 +08:00
zyairehhh
ef99fec5e0 feat: align wav2lip and quicktalk asset handling 2026-05-19 19:45:50 +08:00
zyairehhh
e0da9184ef feat: route QuickTalk through OmniRT audio2video (#41)
* feat: route quicktalk through omnirt
* docs: add quicktalk quickstart
2026-05-15 12:03:39 +08:00
keroly
587d1fb16b Refactor/architecture v2 (#27)
* docs: add architecture review and refactor plan

* chore: snapshot baseline test/lint output before refactor

* chore: catalog import sites that depend on deletable code

* refactor: remove src/opentalking/engine (FlashTalk local inference)

* refactor: remove local model implementations (musetalk/wav2lip/flashtalk-local)

* chore: remove demo media, multitalk_utils, duplicate env examples; relocate images

* refactor: consolidate configs to root configs/, drop src duplicate

* refactor: drop OPENTALKING_FLASHTALK_MODE; rebuild model registry shim

* refactor: remove dead src/opentalking/server (superseded by apps/api)

* refactor: drop src/opentalking/cli stubs in favor of apps/cli

* refactor: move worker test to tests/unit; stash bailian_clone for relocation

* chore: temporarily restore bailian_clone (will relocate in phase D)

* fix: stub legacy registry get_adapter/register_model for import compat

* feat: unified .env.example, hardware profiles, omnirt endpoint catalog

* feat: install/up/down/ensure_omnirt scripts + cuda/dev compose

* docs: rewrite README + quickstart + architecture pointer; drop flashtalk-omnirt

* chore: drop refactor scratch files

* refactor: relocate src/opentalking → packages/opentalking (no internal restructure)

* feat: core/registry.py + STTAdapter/SynthesisAdapter interfaces

* fix: untrack 'packages/' from gitignore so packages/opentalking/ commits

* refactor(D): reorganize into providers/{stt,tts,llm,rtc,synthesis} + media; mass-rewrite imports

* refactor(E): avatars→avatar, voices→voice (singular naming)

* refactor(F): worker → pipeline/{session,speak,recording} + runtime/

* refactor(G): drop legacy wav2lip official_runtime imports; remove empty configs/worker dirs

* feat(registry): wire all providers via core.registry decorators + bootstrap

* chore(cli): drop dead generate_video / gradio_app (engine removed)

* refactor(pipeline): extract pure helpers from synthesis_runner (audio_utils/env_helpers/idle_frames/tts_openers)

* docs(env): rebuild .env.example aligned with actual Settings model + .env

* refactor: flatten packages/opentalking → opentalking (drop empty packages/ wrapper)

* docs: refresh architecture diagram for flat opentalking/ layout

* feat: two-path deploy (docker mock-default + python venv) — mock synthesis, opentalking-doctor, run_omnirt.sh

* docs: rewrite Quickstart around 3 paths (mock / lightweight / high-quality); update Project layout to flat layout

* docs(env): rewrite .env.example to match README's 3 paths (progressive complexity)

* fix(mock): wire OPENTALKING_INFERENCE_MOCK end-to-end (Settings + API + MockFlashTalkClient + task_consumer)

* fix: restore idle_generator.py (thin client, deleted by mistake in phase B)

* docs: clarify OPENTALKING_FLASHTALK_WS_URL (active) vs OMNIRT_ENDPOINT (placeholder, not wired)

* feat(omnirt): wire OMNIRT_ENDPOINT end-to-end (URL resolver + WS auth headers + path-based model routing)

* fix: omnirt_auth_headers import (real symbol is auth_headers; aliased only in providers/synthesis/__init__)

* fix(omnirt): align default path template with OmniRT actual routing (/v1/avatar/{model})

* fix: 3 UX issues — avatar/model decouple by input form, drop OPENTALKING_INFERENCE_MOCK, doctor loads .env

* fix(sessions): drop avatar/model compatibility gate entirely (full decoupling)

* fix(tts): decouple Edge voices (whitelist→format check) + silently ignore tts_model on Edge

* fix(runtime): preserve user's chosen model_type (musetalk/wav2lip/mock no longer relabeled to flashtalk)

* chore(config): drop unused OPENTALKING_DEFAULT_MODEL (model always supplied per-session)

* feat(api): SUPPORTED_MODELS allowlist (mock,flashtalk) — reject musetalk/wav2lip with 400 instead of silent FlashTalk fallback

* feat(web): surface backend 400 detail in toast (e.g. 'model not yet supported')

* feat(avatars): allow deleting custom avatars (DELETE /avatars/{id} + frontend × button)

* Adapt to wav2lip384 (#23)

* feat(wav2lip): integrate avatar metadata for architecture v2
* fix(wav2lip): validate mouth metadata freshness
* docs(wav2lip): add chinese pr summary
* ci: update refactor lint paths
* ci: remove missing worker test path

* feat(wav2lip): add preprocessed video avatar support (#25)

* feat: route avatar models through omnirt audio2video

* feat(wav2lip): route postprocess mode via audio2video

* feat: streamline omnirt audio2video setup

* feat: support configurable wav2lip modes and refresh assets

* Add QuickTalk model adapter

* Fix QuickTalk prefetch type annotation

* Keep QuickTalk init asynchronous

* Document QuickTalk configuration

* Refine FlashHead adapter integration

* Update model status test for QuickTalk

* Update lockfile for QuickTalk dependencies

* Fix QuickTalk OpenCV fourcc typing

---------

Co-authored-by: kero <keroly950928@gmail.com>
Co-authored-by: zyairehhh <zyaireliu@outlook.com>
Co-authored-by: cwang10 <cwang10@mail.ustc.edu.cn>
2026-05-11 23:25:25 +08:00
cwang10
c81ce98068 Refine FlashHead adapter integration 2026-05-06 22:44:44 +08:00
cwang10
9b280ab718 Improve quickstart defaults and avatar asset alignment 2026-05-03 21:06:27 +08:00
XX123122
82d7a08b9f feat: add musetalk and wav2lip runtime support (#13)
* feat: add musetalk and wav2lip runtime support

* fix: resolve worker lint errors

* fix: address mypy issues

* fix: lazy load wav2lip audio dependencies

* fix: lazy load wav2lip face detector

* fix: lazy load wav2lip network modules

* fix: align api and tts behavior with tests

* fix: address split mode review feedback
2026-05-03 00:21:14 +08:00
pb19834141522-ally
a93f70d1d9 feat: Bailian multi-route TTS/STT, subtitle sync, TTS opening greeting, idle video, FlashTalk queue scheduling, and new recording and audio upload features 2026-04-28 09:41:49 +08:00
cwang10
c33a43181a Initial commit: OpenTalking real-time digital human framework
Modular pipeline for text-driven talking avatars with WebRTC streaming:
- FlashTalk / Wav2Lip / MuseTalk model adapters
- LLM (OpenAI-compatible) → sentence split → Edge TTS → video generation
- Interleaved A/V queue for lip-sync accuracy
- Idle animation cache with crossfade and mouth stabilization
- Unified server mode (API + worker in one process)
- Immersive chat frontend (React + Tailwind + WebRTC)
- Docker Compose configs for local, distributed, and Ascend 910B deployments
2026-04-16 15:28:52 +08:00