datascale-ai-opentalking

Mirror of https://github.com/datascale-ai/opentalking.git, synced 2026-07-03 15:22:34 +08:00

Author	SHA1	Message	Date
kero-ly	57b3e48718	chore: ignore local worktrees * feat: add scene asset store * fix: validate scene asset edge cases * docs: record task 1 review fix * feat: expose scene asset api * test api composition deletion * feat: add scene assets to asset library * Fix scene asset library integration * Update asset tab test expectations * feat: render realtime stage with scene assets * feat: add immersive conversation mode * Gate immersive mode during startup * feat: surface avatar matting readiness * Fix scene asset selection and upload validation * Fix scene avatar anchor rendering * Fix unified scene asset routes * Stabilize unified scene route test * Fix default realtime stage scene behavior * Keep realtime video across view switches * Refine immersive chrome and keyboard handling * Preserve audio when switching realtime views * Keep realtime media mounted in immersive mode * Restore realtime audio and refine immersive controls * Scope scene backgrounds to matching avatar * Preserve transparent avatar backgrounds in scenes * Keep WebRTC remote stream identity * Fallback to muted stage video playback * Refine immersive QuickTalk controls * Make immersive adjustment panel opaque * Expand immersive adjustment ranges * Use avatar-specific scene defaults * Fix mypy type narrowing for scene assets	2026-06-22 19:58:38 +08:00
cwang0810	3519989ba3	feat: add Persona Package support (#87)	2026-06-10 20:41:18 +08:00
zyairehhh	f16f786889	feat: align avatar cache prewarm flow	2026-05-21 20:07:33 +08:00
lyfics	7fa6e5e6b4	Add Cantonese Qwen voices and avatar updates	2026-05-20 19:56:25 +08:00
lyfics	47cceb1d1e	Improve Wav2Lip avatar playback and per-asset preload cache (#47) * Support per-avatar Wav2Lip preload cache * Align unified wav2lip preload startup with upstream * Add Wav2Lip avatar caches under wav2lip directories --------- Co-authored-by: cwang10 <cwang10@mail.ustc.edu.cn>	2026-05-20 15:32:12 +08:00
zyairehhh	ef99fec5e0	feat: align wav2lip and quicktalk asset handling	2026-05-19 19:45:50 +08:00
zyairehhh	e0da9184ef	feat: route QuickTalk through OmniRT audio2video (#41) * feat: route quicktalk through omnirt * docs: add quicktalk quickstart	2026-05-15 12:03:39 +08:00
keroly	587d1fb16b	Refactor/architecture v2 (#27) * docs: add architecture review and refactor plan * chore: snapshot baseline test/lint output before refactor * chore: catalog import sites that depend on deletable code * refactor: remove src/opentalking/engine (FlashTalk local inference) * refactor: remove local model implementations (musetalk/wav2lip/flashtalk-local) * chore: remove demo media, multitalk_utils, duplicate env examples; relocate images * refactor: consolidate configs to root configs/, drop src duplicate * refactor: drop OPENTALKING_FLASHTALK_MODE; rebuild model registry shim * refactor: remove dead src/opentalking/server (superseded by apps/api) * refactor: drop src/opentalking/cli stubs in favor of apps/cli * refactor: move worker test to tests/unit; stash bailian_clone for relocation * chore: temporarily restore bailian_clone (will relocate in phase D) * fix: stub legacy registry get_adapter/register_model for import compat * feat: unified .env.example, hardware profiles, omnirt endpoint catalog * feat: install/up/down/ensure_omnirt scripts + cuda/dev compose * docs: rewrite README + quickstart + architecture pointer; drop flashtalk-omnirt * chore: drop refactor scratch files * refactor: relocate src/opentalking → packages/opentalking (no internal restructure) * feat: core/registry.py + STTAdapter/SynthesisAdapter interfaces * fix: untrack 'packages/' from gitignore so packages/opentalking/ commits * refactor(D): reorganize into providers/{stt,tts,llm,rtc,synthesis} + media; mass-rewrite imports * refactor(E): avatars→avatar, voices→voice (singular naming) * refactor(F): worker → pipeline/{session,speak,recording} + runtime/ * refactor(G): drop legacy wav2lip official_runtime imports; remove empty configs/worker dirs * feat(registry): wire all providers via core.registry decorators + bootstrap * chore(cli): drop dead generate_video / gradio_app (engine removed) * refactor(pipeline): extract pure helpers from synthesis_runner (audio_utils/env_helpers/idle_frames/tts_openers) * docs(env): rebuild .env.example aligned with actual Settings model + .env * refactor: flatten packages/opentalking → opentalking (drop empty packages/ wrapper) * docs: refresh architecture diagram for flat opentalking/ layout * feat: two-path deploy (docker mock-default + python venv): mock synthesis, opentalking-doctor, run_omnirt.sh * docs: rewrite Quickstart around 3 paths (mock / lightweight / high-quality); update Project layout to flat layout * docs(env): rewrite .env.example to match README's 3 paths (progressive complexity) * fix(mock): wire OPENTALKING_INFERENCE_MOCK end-to-end (Settings + API + MockFlashTalkClient + task_consumer) * fix: restore idle_generator.py (thin client, deleted by mistake in phase B) * docs: clarify OPENTALKING_FLASHTALK_WS_URL (active) vs OMNIRT_ENDPOINT (placeholder, not wired) * feat(omnirt): wire OMNIRT_ENDPOINT end-to-end (URL resolver + WS auth headers + path-based model routing) * fix: omnirt_auth_headers import (real symbol is auth_headers; aliased only in providers/synthesis/__init__) * fix(omnirt): align default path template with OmniRT actual routing (/v1/avatar/{model}) * fix: 3 UX issues (avatar/model decouple by input form, drop OPENTALKING_INFERENCE_MOCK, doctor loads .env) * fix(sessions): drop avatar/model compatibility gate entirely (full decoupling) * fix(tts): decouple Edge voices (whitelist→format check) + silently ignore tts_model on Edge * fix(runtime): preserve user's chosen model_type (musetalk/wav2lip/mock no longer relabeled to flashtalk) * chore(config): drop unused OPENTALKING_DEFAULT_MODEL (model always supplied per-session) * feat(api): SUPPORTED_MODELS allowlist (mock,flashtalk); reject musetalk/wav2lip with 400 instead of silent FlashTalk fallback * feat(web): surface backend 400 detail in toast (e.g. 'model not yet supported') * feat(avatars): allow deleting custom avatars (DELETE /avatars/{id} + frontend × button) * Adapt for wav2lip384 (#23) * feat(wav2lip): integrate avatar metadata for architecture v2 * fix(wav2lip): validate mouth metadata freshness * docs(wav2lip): add chinese pr summary * ci: update refactor lint paths * ci: remove missing worker test path * feat(wav2lip): add preprocessed video avatar support (#25) * feat: route avatar models through omnirt audio2video * feat(wav2lip): route postprocess mode via audio2video * feat: streamline omnirt audio2video setup * feat: support configurable wav2lip modes and refresh assets * Add QuickTalk model adapter * Fix QuickTalk prefetch type annotation * Keep QuickTalk init asynchronous * Document QuickTalk configuration * Refine FlashHead adapter integration * Update model status test for QuickTalk * Update lockfile for QuickTalk dependencies * Fix QuickTalk OpenCV fourcc typing --------- Co-authored-by: kero <keroly950928@gmail.com> Co-authored-by: zyairehhh <zyaireliu@outlook.com> Co-authored-by: cwang10 <cwang10@mail.ustc.edu.cn>	2026-05-11 23:25:25 +08:00
cwang10	c81ce98068	Refine FlashHead adapter integration	2026-05-06 22:44:44 +08:00
cwang10	9b280ab718	Improve quickstart defaults and avatar asset alignment	2026-05-03 21:06:27 +08:00
XX123122	82d7a08b9f	feat: add musetalk and wav2lip runtime support (#13) * feat: add musetalk and wav2lip runtime support * fix: resolve worker lint errors * fix: address mypy issues * fix: lazy load wav2lip audio dependencies * fix: lazy load wav2lip face detector * fix: lazy load wav2lip network modules * fix: align api and tts behavior with tests * fix: address split mode review feedback	2026-05-03 00:21:14 +08:00
pb19834141522-ally	a93f70d1d9	feat: Bailian multi-route TTS/STT, subtitle sync, TTS opening greeting, idle video, FlashTalk queue scheduling, and new recording and audio-upload features	2026-04-28 09:41:49 +08:00
cwang10	c33a43181a	Initial commit: OpenTalking real-time digital human framework Modular pipeline for text-driven talking avatars with WebRTC streaming: - FlashTalk / Wav2Lip / MuseTalk model adapters - LLM (OpenAI-compatible) → sentence split → Edge TTS → video generation - Interleaved A/V queue for lip-sync accuracy - Idle animation cache with crossfade and mouth stabilization - Unified server mode (API + worker in one process) - Immersive chat frontend (React + Tailwind + WebRTC) - Docker Compose configs for local, distributed, and Ascend 910B deployments	2026-04-16 15:28:52 +08:00

13 Commits