datascale-ai-opentalking

mirror of https://github.com/datascale-ai/opentalking.git synced 2026-07-03 15:22:34 +08:00
Files
keroly 587d1fb16b Refactor/architecture v2 (#27)
* docs: add architecture review and refactor plan

* chore: snapshot baseline test/lint output before refactor

* chore: catalog import sites that depend on deletable code

* refactor: remove src/opentalking/engine (FlashTalk local inference)

* refactor: remove local model implementations (musetalk/wav2lip/flashtalk-local)

* chore: remove demo media, multitalk_utils, duplicate env examples; relocate images

* refactor: consolidate configs to root configs/, drop src duplicate

* refactor: drop OPENTALKING_FLASHTALK_MODE; rebuild model registry shim

* refactor: remove dead src/opentalking/server (superseded by apps/api)

* refactor: drop src/opentalking/cli stubs in favor of apps/cli

* refactor: move worker test to tests/unit; stash bailian_clone for relocation

* chore: temporarily restore bailian_clone (will relocate in phase D)

* fix: stub legacy registry get_adapter/register_model for import compat

* feat: unified .env.example, hardware profiles, omnirt endpoint catalog

* feat: install/up/down/ensure_omnirt scripts + cuda/dev compose

* docs: rewrite README + quickstart + architecture pointer; drop flashtalk-omnirt

* chore: drop refactor scratch files

* refactor: relocate src/opentalking → packages/opentalking (no internal restructure)

* feat: core/registry.py + STTAdapter/SynthesisAdapter interfaces

* fix: untrack 'packages/' from gitignore so packages/opentalking/ commits

* refactor(D): reorganize into providers/{stt,tts,llm,rtc,synthesis} + media; mass-rewrite imports

* refactor(E): avatars→avatar, voices→voice (singular naming)

* refactor(F): worker → pipeline/{session,speak,recording} + runtime/

* refactor(G): drop legacy wav2lip official_runtime imports; remove empty configs/worker dirs

* feat(registry): wire all providers via core.registry decorators + bootstrap

* chore(cli): drop dead generate_video / gradio_app (engine removed)

* refactor(pipeline): extract pure helpers from synthesis_runner (audio_utils/env_helpers/idle_frames/tts_openers)

* docs(env): rebuild .env.example aligned with actual Settings model + .env

* refactor: flatten packages/opentalking → opentalking (drop empty packages/ wrapper)

* docs: refresh architecture diagram for flat opentalking/ layout

* feat: two-path deploy (docker mock-default + python venv) — mock synthesis, opentalking-doctor, run_omnirt.sh

* docs: rewrite Quickstart around 3 paths (mock / lightweight / high-quality); update Project layout to flat layout

* docs(env): rewrite .env.example to match README's 3 paths (progressive complexity)

* fix(mock): wire OPENTALKING_INFERENCE_MOCK end-to-end (Settings + API + MockFlashTalkClient + task_consumer)

* fix: restore idle_generator.py (thin client, deleted by mistake in phase B)

* docs: clarify OPENTALKING_FLASHTALK_WS_URL (active) vs OMNIRT_ENDPOINT (placeholder, not wired)

* feat(omnirt): wire OMNIRT_ENDPOINT end-to-end (URL resolver + WS auth headers + path-based model routing)

* fix: omnirt_auth_headers import (real symbol is auth_headers; aliased only in providers/synthesis/__init__)

* fix(omnirt): align default path template with OmniRT actual routing (/v1/avatar/{model})

* fix: 3 UX issues — avatar/model decouple by input form, drop OPENTALKING_INFERENCE_MOCK, doctor loads .env

* fix(sessions): drop avatar/model compatibility gate entirely (full decoupling)

* fix(tts): decouple Edge voices (whitelist→format check) + silently ignore tts_model on Edge

* fix(runtime): preserve user's chosen model_type (musetalk/wav2lip/mock no longer relabeled to flashtalk)

* chore(config): drop unused OPENTALKING_DEFAULT_MODEL (model always supplied per-session)

* feat(api): SUPPORTED_MODELS allowlist (mock,flashtalk) — reject musetalk/wav2lip with 400 instead of silent FlashTalk fallback

* feat(web): surface backend 400 detail in toast (e.g. 'model not yet supported')

* feat(avatars): allow deleting custom avatars (DELETE /avatars/{id} + frontend × button)

* Adapt for wav2lip384 (#23)

* feat(wav2lip): integrate avatar metadata for architecture v2
* fix(wav2lip): validate mouth metadata freshness
* docs(wav2lip): add chinese pr summary
* ci: update refactor lint paths
* ci: remove missing worker test path

* feat(wav2lip): add preprocessed video avatar support (#25)

* feat: route avatar models through omnirt audio2video

* feat(wav2lip): route postprocess mode via audio2video

* feat: streamline omnirt audio2video setup

* feat: support configurable wav2lip modes and refresh assets

* Add QuickTalk model adapter

* Fix QuickTalk prefetch type annotation

* Keep QuickTalk init asynchronous

* Document QuickTalk configuration

* Refine FlashHead adapter integration

* Update model status test for QuickTalk

* Update lockfile for QuickTalk dependencies

* Fix QuickTalk OpenCV fourcc typing

---------

Co-authored-by: kero <keroly950928@gmail.com>
Co-authored-by: zyairehhh <zyaireliu@outlook.com>
Co-authored-by: cwang10 <cwang10@mail.ustc.edu.cn>
2026-05-11 23:25:25 +08:00
__init__.py    Refactor/architecture v2 (#27)    2026-05-11 23:25:25 +08:00
emitter.py     Refactor/architecture v2 (#27)    2026-05-11 23:25:25 +08:00
schemas.py     Refactor/architecture v2 (#27)    2026-05-11 23:25:25 +08:00