nexu-io-open-design

mirror of https://github.com/nexu-io/open-design.git synced 2026-07-03 12:27:55 +08:00

Author	SHA1	Message	Date
Aadi Jai Gupta	49a1c7bd79	fix daemon opencode headless permissions (#4957)	2026-07-02 08:49:44 +00:00
Amy	56c410e9ef	[codex] strengthen daemon diagnostics coverage (#4948) * test: expand automation coverage gaps * test: cover additional automation gaps * test: strengthen daemon diagnostics coverage * fix(e2e): align media provider project modal helper	2026-06-30 16:25:17 +00:00
kokisanai	54de349c47	fix: replace expired Discord invite (#4849) Co-authored-by: koki yanlai xu <koki@kokideMacBook-Air.local>	2026-06-29 15:00:45 +00:00
open-design-bot[bot]	dbd280610e	docs(readme): refresh contributors wall (#4832) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-06-29 07:16:33 +00:00
xne998808-ai	23704a9d54	feat(landing-page): add 5 design-agent alternative comparison pages (Stitch, Genspark, Figma Make, Qoder, Trae) (#4853 ) * feat(landing-page): add 5 design-agent alternative comparison pages Add /alternatives/{stitch,genspark,figma-make,qoder,trae}/ pages comparing Open Design with five AI design/coding tools. Each page carries deeply researched, fact-specific copy (pricing, usage limits, lock-in, honest trade-offs), real competitor screenshots plus current Open Design product screenshots, a feature table, and a decision matrix. Rebuild the shared alternative-detail renderer onto an editorial band layout (full-width alternating bands, one unified content column, dark closing CTA band) so every /alternatives/ page is visually consistent. Render list and table block text through set:html so inline <b> emphasis works. Localise all five pages into the 11 landing locales (machine-translated then native-audited) and re-shard into alternatives-i18n.part-. Wire the new slugs into the /compare/ hub and the footer. Save the reusable current Open Design UI screenshots under docs/screenshots/current-2026-06/. fix(landing-page): address review on alternatives layout - clip page-level horizontal overflow on .v4 pages so the 100vw full-bleed shell no longer jitters when a vertical scrollbar is present - port the FAQ accordion refinements (border, padding, type scale) under .v4 using the landing token set, since the rich wrapper moved off .solution-page --------- Co-authored-by: Joey <276262049+xne998808-ai@users.noreply.github.com>	2026-06-28 13:17:43 +00:00
open-design-bot[bot]	59214704f6	docs(readme): refresh contributors wall (#4719) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-06-26 13:55:47 +00:00
PerishFire	273bacfc78	[codex] Add design-system tracking analytics (#4783 ) * feat(analytics): design-system tracking foundation (project_kind, entry source, brand picker) Implements the first slices of the design-system tracking spec: - §1 project_kind: brand-extraction backing projects now report project_kind=design_system so DS-project runs (creation + edits) drill down cleanly instead of collapsing into 'brand'. - U3/U4: report design_system_source from the real selection source (request->user_selected, plugin->template_inherited, project-> project_saved, app-default->default) instead of the hard-wired unknown/not_applicable; add design_system_kind (official/custom, derived from the user:<id> shape) and design_system_slug (official). - §3.1 create entry source: thread the real entry (onboarding / design_systems_page / composer_picker / home_card / library / project_canvas) from each navigate() call site into the create page_view and create_result, replacing the binary heuristic. - C7-C9 preset-brand picker: instrument the "Start from a brand" trigger click, picker surface_view, and brand_pick with preset_brand_category (never the domain). Contracts: add TrackingDesignSystemKind, TrackingDesignSystemEditSurface, DesignSystemsPresetBrandPicker{Click,SurfaceView}Props; extend RunCreatedProps with design_system_kind/slug/edit_surface and both entry_from enums with composer_picker/project_canvas/library. Adds docs/design-system-tracking-spec.md. Verified: contracts build + 170 tests, web typecheck + full suite (3463), daemon typecheck + run-analytics/brand-extraction tests (70). * feat(analytics): track AI-optimize (design_system_enrich) click C13 (tracking spec §3.3): instrument the "AI Optimize" banner CTA on a programmatically-extracted DS-as-project so the AI-conversion rate (clicked ÷ programmatic creates) is visible. 
Contracts: add design_system_enrich area, DesignSystemEnrichClickProps, DesignSystemEnrichResultProps, and the design_system_enrich_result event name (result event reserved for the C14 follow-up). Adds an implementation-status table to the spec. C14 (enrich_result) and C15 (ProjectMetadata.enrichmentStatus) remain follow-ups since they need correlation with the async enrichment run's completion. Verified: contracts build, web typecheck, brand-enrichment + analytics tests (89). * feat(analytics): report all design-system create sources (ds_source_origins) Comment ②: multi-source creates were flattened to a single `mixed` design_system_source, hiding which sources combined. Add a comma-joined `ds_source_origins` (e.g. `source_url,local_code`) on create_result that lists every source actually used, keeping the singular field for back-compat. Mirrors the target_platforms/connectors multi-value convention. Verified: contracts build, web typecheck. * feat(analytics): distinguish onboarding build-DS vs skip-to-home (C2) C2 (tracking spec §3.1): the final onboarding step forks into "Build a design system" and going straight Home, but both reported completion_type=completed_without_design_system, so the skip rate was invisible. Parameterize runOnboardingCompletion so the build fork reports completed_with_design_system and the home forks report completed_without_design_system. Update the EntryShell onboarding test, which clicked "Build a design system" yet asserted the without value. Verified: web typecheck, EntryShell onboarding tests (24). * feat(analytics): instrument design-system edit lifecycle (E1-E3) Tracking spec §3.4/§3.6: - E3 (direct module edits, ui_click area=design_system_edit, all carrying edit_surface=direct_module + artifact_kind=design_system + the DS id): DesignSystemsTab general ops (edit-with-agent / refresh / download / reset), DesignKitView module buttons via the parent handlers (logo/font/image upload incl. 
paste, color edit, logo/image delete, design_md edit-save), and the BrandPreviewCard three actions (use-in-chat / open-project / delete). - E1 (agent-routed edits): a DS-project run editing an EXISTING design system now carries edit_surface — comment/mark from their entry_from, otherwise chat. First-generation runs get none. Contracts: DesignSystemEditClickProps (+ union); reuses the TrackingDesignSystemEditSurface added earlier and RunCreatedProps.edit_surface. Not wired (enum reserved / no handler today): kit_import, kit_open (external link in the stateless kit view), design_md_copy/upload. Verified: contracts build; web + daemon typecheck; DS/Brand/Kit web tests (89) + daemon run-analytics tests (44). * feat(analytics): AI-optimize result + enrichment status (C14/C15) Closes the programmatic-vs-ai_refined comparison gap (tracking spec §6): - C14: the AI-optimize run is tagged (analyticsHints.dsEnrichment) from the ProjectView banner CTA; when it settles, the daemon emits design_system_enrich_result (result / design_system_id / project_id / run_id / duration_ms / error_code). - C15: on a successful enrichment run the daemon flags the backing project metadata enrichmentStatus='ai_refined' + enrichmentCompletedAt, so the two DS cohorts can be split for retention/usage analysis. Contracts: ProjectMetadata.enrichmentStatus/enrichmentCompletedAt; reuses the design_system_enrich_result event + DesignSystemEnrichResultProps added earlier. ProjectChatSendMeta gains dsEnrichment. Verified: contracts build; web + daemon typecheck; daemon run-analytics + brand-extraction tests (70); web brand-enrichment tests. 
* fix(contracts): declare dsEnrichment hint in ChatAnalyticsHints The daemon run route reads analyticsHints.dsEnrichment and ProjectView's AI-optimize path sends it, but the shared ChatAnalyticsHints DTO did not declare the flag — leaving the web/daemon HTTP shape ahead of the contract layer so typed ChatRequest callers could not discover or type-check the hint that drives design_system_enrich_result and the ai_refined metadata stamp. Declare dsEnrichment?: boolean with analytics-only semantics and add a request-shape test that compile-fails if the field is dropped again. Addresses review feedback on PR #4740. * chore: ignore .playwright-mcp capture output Local Playwright MCP runs write YAML page snapshots and PNG screenshots to .playwright-mcp/. These are ad-hoc local visual-check output (including local UI/session state), not a maintained test fixture, and per the root AGENTS.md local runtime data must stay out of git. Add the ignore rule so the directory is never accidentally staged again. * fix: repair AI optimize enrichment tracking Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex) * fix: track design kit module edit clicks Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex) * fix: stabilize project P0 catalog loading Stub unrelated catalog endpoints in the scoped project workspace and runtime Playwright flows so the P0 checks assert their target behavior instead of waiting on large registry responses. Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex) * fix: guard design-system tracking repairs Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex) --------- Co-authored-by: free666799 <293857035+free666799@users.noreply.github.com> Co-authored-by: Looper <looper@noreply.github.com>	2026-06-26 02:57:18 +00:00
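The source mapping and the comma-joined `ds_source_origins` value described in commit 273bacfc78 can be sketched roughly as below. This is an illustration under assumptions, not the repository's code: the type and function names (`SelectionSource`, `toDesignSystemSource`, `toDesignSystemKind`, `joinSourceOrigins`) are hypothetical, the `user:` prefix test is one guess at how the "user:<id> shape" is checked, and the de-duplication step is an added assumption. Only the mapped string values and the `source_url,local_code` example come from the commit message.

```typescript
// Hypothetical names; only the string values are taken from the commit message.
type SelectionSource = "request" | "plugin" | "project" | "app-default";
type DesignSystemSource =
  | "user_selected"
  | "template_inherited"
  | "project_saved"
  | "default";

// request->user_selected, plugin->template_inherited,
// project->project_saved, app-default->default
const SOURCE_MAP: Record<SelectionSource, DesignSystemSource> = {
  request: "user_selected",
  plugin: "template_inherited",
  project: "project_saved",
  "app-default": "default",
};

function toDesignSystemSource(source: SelectionSource): DesignSystemSource {
  return SOURCE_MAP[source];
}

// Assumption: an id shaped like `user:<id>` marks a custom design system.
function toDesignSystemKind(designSystemId: string): "official" | "custom" {
  return designSystemId.startsWith("user:") ? "custom" : "official";
}

// Lists every source used for a create, comma-joined.
// Dropping repeats while preserving order is an assumption of this sketch.
function joinSourceOrigins(origins: string[]): string {
  return Array.from(new Set(origins)).join(",");
}
```

Keeping the singular `design_system_source` alongside the joined list means older queries keep working while multi-source creates become visible.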
Sid	48695090f8	Make WSL agent setup unambiguous for MCP installs (#4655) Windows users running agent CLIs inside WSL were landing on the native Windows guide and copying the README MCP command directly, which left Linux /usr/bin/od shadowing, daemon origin, Node ABI, and Codex config parse failures unexplained. Add a WSL2 guide and teach the Codex config normalizer to drop nested feature tables that current Codex parses as maps instead of boolean flags. Constraint: WSL agent CLIs need the same Linux environment for the wrapper, daemon, Node modules, and credentials Rejected: Docs-only workaround | would leave daemon-launched Codex runs failing on nested features tables Confidence: high Scope-risk: narrow Directive: Keep WSL-specific guidance separate from native Windows PowerShell troubleshooting Tested: pnpm --filter @open-design/daemon exec vitest run -c vitest.config.ts tests/codex-config-normalize.test.ts Tested: pnpm --filter @open-design/daemon typecheck Tested: pnpm guard Tested: git diff --check Not-tested: Manual WSL2 end-to-end MCP install on a Windows host Related: Fixes #4648	2026-06-23 09:56:22 +00:00
open-design-bot[bot]	f94e887d65	docs(readme): refresh contributors wall (#4653) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-06-23 08:55:14 +00:00
PerishFire	0bf1b6d6b8	[codex] converge release workflows and stable dry-runs (#4390 ) * fix(tools-pack): use junctions for Windows standalone peer deps * fix(desktop): expose IPC during startup * fix(tools-pack): preserve Windows inspect diagnostics * fix(tools-pack): report Windows inspect status errors * fix(packaged): use Electron net fetch for app protocol * fix(packaged): load Windows renderer from web sidecar * fix(desktop): show Windows packaged window during startup * fix(packaged): disable Windows GPU startup * fix(tools-pack): keep Windows core smoke observable * fix(packaged): remove Windows startup probes * fix(tools-pack): trace Windows desktop IPC status * fix(tools-pack): add Windows IPC diagnose loop * fix(release): default beta-s Windows updater feed * chore: clean merged test eof * refactor(release): unify prerelease channel model * chore(release): close prerelease doc escape hatches * refactor(release): converge release channel workflows * fix(release): install toolchain in metadata jobs * fix(release): build release package before contracts * chore(release): bump development version to 0.10.1 * fix(e2e): seed windows packaged smoke runtime config * fix(release): install toolchain for metadata publish * fix(release): materialize betas metadata checkout * chore(release): bump development version to 0.10.2 * fix(release): allow betas metadata cold start from s3 * fix(e2e): support betas packaged update scenarios * fix(release): pass betas channel into packaged smoke * fix(release): set betas channel during self-hosted builds * fix(release): verify counted channel reservations * fix(release): use pnpm cmd for betas windows publish * fix(release): add betas manifest artifact fallback * fix(release): skip beta-s public metadata fetch * fix(release): read beta-s manifests from storage * fix(release): cache beta windows tools-pack builds * fix(release): inline beta mac tools-pack builds * fix(pack): deep sign unsigned mac bundles * docs(pack): document 
payload-first beta updater validation * fix(release): align preview tools-pack cache flow * fix(release): align prerelease tools-pack cache flow * fix(release): pass github token to prerelease metadata * fix(release): setup pnpm before feishu notify * fix(release): add stable dry-run prepublish flow * fix(release): accept completed prerelease metadata gate * fix(release): require stable release branches * fix(release): converge r2 access checks * fix(updater): use release channel parser for defaults * fix(updater): harden windows payload relaunch * fix(release): converge updater smoke fixture contract * test(e2e): require silent updater fixture output * fix(release): align stable windows smoke build path * fix(ci): include release workspace in validation * fix(ci): repair release validation lanes Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex) * fix(ci): restore zero-install Feishu notification Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex) --------- Co-authored-by: Looper <looper@noreply.github.com>	2026-06-23 06:13:21 +00:00
PerishFire	fa544f2836	[codex] Prune unused automation and repair metrics publishing (#4612) * chore(workflows): prune unused automation * chore(workflows): update github app token action * chore(workflows): use github app client id	2026-06-23 04:01:49 +00:00
xne998808-ai	91e5207d72	docs(readme): fix 404 agent-install command, use `od mcp install <agent>` (#4649) The "install into your coding agent" sections pointed users at `curl -fsSL https://open-design.ai/install.sh | sh -s <agent>`, but that URL returns 404 — the bootstrap script was never published, and a standalone `od` CLI is not yet distributed. Every README (English + 12 i18n) shipped a copy-paste command that fails. Replace it with the real, shipped command `od mcp install <agent>`, matching the Platform Compatibility table already present in each file ("Once OD is installed, a single `od mcp install <agent>` ..."). The surrounding "one-line install" comment stays accurate and the install-OD-first prerequisite is covered by that table. Co-authored-by: Joey <276262049+xne998808-ai@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 00:19:59 +00:00
PerishFire	d3ebed564c	[codex] Update Open Design X account links (#4611) * fix: update Open Design X account links * test(web): assert settings X link target Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex) --------- Co-authored-by: Siri-Ray <2667192167@qq.com> Co-authored-by: Looper <looper@noreply.github.com>	2026-06-22 09:04:49 +00:00
Nagendhra Madishetti	37d52288b0	docs(windows): explain the SmartScreen 'unknown publisher' installer warning (#4554) New Windows users hit a blue Defender SmartScreen dialog the first time they run the packaged installer, with Run anyway hidden behind More info. The Windows troubleshooting guide only covered the dev/source setup, so there was no answer for the very first thing an end user sees. Add a section that explains why the warning appears (unsigned build, not a threat), how to proceed (More info then Run anyway), and how to verify the download source and SHA-256 checksum first. Co-authored-by: Nagendhra <nagendhra405@gmail.com>	2026-06-22 05:18:39 +00:00
open-design-bot[bot]	c958cdb7ae	docs(readme): refresh contributors wall (#4560) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-06-22 05:18:39 +00:00
Marc Chan	a0412c12e6	fix(docker): add opt-in API auth disable flag (#4541) * fix(docker): add opt-in API auth disable flag Generated-By: looper 0.9.10 (runner=worker, agent=opencode) * docs(docker): add .env setup to beginner guide Generated-By: looper 0.9.10 (runner=fixer, agent=opencode)	2026-06-19 03:24:14 +00:00
Carson Yang	11fb1ad6f0	docs: add Sealos deployment option (#4472) * docs: add Sealos deployment option * docs: update Sealos deploy slug * docs: point Sealos badge to app store page * docs: add Sealos deploy section to localized READMEs * docs: clarify Sealos authentication guidance	2026-06-18 06:38:10 +00:00
PerishFire	c782aeb3bb	ci: stop duplicate post-merge validation (#4469)	2026-06-17 10:04:03 +00:00
kokisanai	b02d20e6b9	fix: unify Discord invite links (#4452) Co-authored-by: koki yanlai xu <koki@kokideMacBook-Air.local>	2026-06-17 08:25:16 +00:00
Amy	8ff19c6b41	[codex] Add main-risk smoke and E2E coverage (#4322 ) * test(e2e): cover daemon reload artifact recovery * test(e2e): expand critical flow coverage * test(daemon): add startup and recovery smoke coverage * test(e2e): add visual and desktop chrome smoke coverage * ci: gate packaged onboarding smoke to nightly * fix(web): repair artifact replay type narrowing Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(ci): repair preflight smoke coverage checks Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(web): tighten artifact recovery polling Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(web): keep error recovery retries alive Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(e2e): relax brittle UI P0 assertions Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(web): recover pointer artifact outputs Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(e2e): stabilize manual edit reload smoke Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(e2e): avoid noisy win onboarding cleanup Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(web): stop failed artifact recovery polling Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(daemon): isolate startup smoke data root Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(web): recover standalone html artifacts Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(web): guard pointer artifact recovery by run time Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(ci): narrow strict visual gate to critical smoke Generated-By: looper 0.9.7 (runner=fixer, agent=codex) * fix(ci): limit PR visual capture to critical cases * fix(ci): keep PR visual capture complete * fix(web): keep reload artifact recovery retryable * fix(ci): clarify PR visual gate layering * fix(e2e): stabilize visual home P2 capture * fix(e2e): align visual gallery detail clicks * fix(e2e): preserve packaged install during 
onboarding cleanup * fix(ci): shard PR visual capture * fix(ci): remove duplicate nix validation from ci * fix(ci): keep visual capture screenshots flat * fix(ci): point nix hash autofix at ci-nix * fix(ci): stabilize PR P0 recovery gate * fix(web): ignore stale artifact pointers during recovery * fix(web): reset BYOK model after clearing key * ci: run critical UI P0 subset on PRs * ci: combine PR UI P0 critical shards * ci: combine PR visual capture shards * ci: preserve visual capture shard manifests * fix(web): clear ProjectView delayed timers on unmount * test(e2e): harden onboarding connect gate coverage * test(e2e): align AMR auth recovery and visual catalog timeout	2026-06-17 16:01:53 +08:00
Marc Chan	7abb7888df	fix(deploy): align Docker defaults with GHCR releases (#4327 ) * fix(deploy): align Docker defaults with GHCR releases Generated-By: looper 0.9.9 (runner=worker, agent=opencode) * fix(ci): publish stable Docker tags from release workflow Generated-By: looper 0.9.9 (runner=fixer, agent=opencode) * fix(ci): fold reusable workflow guard expression Generated-By: looper 0.9.9 (runner=fixer, agent=opencode) * fix(ci): gate Docker release publish Generated-By: looper 0.9.9 (runner=fixer, agent=opencode) * fix(ci): publish stable Docker tags after release Generated-By: looper 0.9.9 (runner=fixer, agent=opencode) * fix(ci): guard Docker latest tag enable expression Generated-By: looper 0.9.9 (runner=fixer, agent=opencode) * fix(deploy): update Helm chart GHCR defaults Generated-By: looper 0.9.9 (runner=fixer, agent=opencode) * fix(ci): publish latest from release workflow inputs Generated-By: looper 0.9.9 (runner=fixer, agent=opencode)	2026-06-17 12:25:18 +08:00
PerishFire	a0afc584bb	[codex] centralize daemon data directory docs (#4222) * docs: centralize daemon data directory contract * fix(e2e): allow slower artifact consistency navigation Generated-By: looper 0.9.5 (runner=fixer, agent=codex) * docs: localize daemon data directory pointers Generated-By: looper 0.9.5 (runner=fixer, agent=codex) --------- Co-authored-by: Looper <looper@noreply.github.com>	2026-06-15 02:52:05 +00:00
Denis Redozubov	4b3bf91f27	Model orchestrator scratch workspaces (#4263) * Model orchestrator scratch workspaces * Address scratch workspace contract review	2026-06-14 09:14:30 +00:00
Lucas Bento	d0cc28d339	feat(daemon): add Amp CLI as a coding-agent adapter (#3861 ) * feat(daemon): add Amp CLI as a coding-agent adapter Amp (ampcode.com) runs headless via `amp -x --stream-json`, which emits the Claude Code-compatible stream JSON format the daemon already parses, with the prompt delivered over stdin. Add it as a coding-agent adapter reusing the claude-stream-json parser; the model picker maps to Amp's agent --mode (smart/deep/rush). - new ampAgentDef + registry entry + install metadata - web agent label/alias - replay mock (renders via the claude renderer) - docs adapter-catalog row + README CLI count (21 -> 22) - adapter buildArgs/stream-format test * fix(daemon): order Amp adapter at P2, not first-run default Move ampAgentDef next to copilot (the other P2 local headless CLI adapter) instead of the head of BASE_AGENT_DEFS. The daemon preserves AGENT_DEFS order in /api/agents and the web app fills an empty config with agents.find((a) => a.available), so listing Amp first made it the default engine ahead of AMR/Claude/Codex for any user with amp on PATH and no persisted agentId — inconsistent with its P2 classification. Reordering restores the prior first-run default. Addresses PR review feedback. --------- Co-authored-by: Siri-Ray <2667192167@qq.com>	2026-06-11 15:17:07 +00:00
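The ordering issue fixed in commit d0cc28d339 follows from the first-available lookup quoted in the message, `agents.find((a) => a.available)`. A minimal sketch, assuming a simplified agent shape: the `AgentInfo` interface and `pickDefaultAgent` helper are hypothetical, and only the `find` expression and the order-preserving behavior come from the commit.

```typescript
// Hypothetical, simplified shape of an entry in the agent list.
interface AgentInfo {
  id: string;
  available: boolean;
}

// With no persisted agentId, the first available agent in list order wins,
// so the order of the definitions decides the first-run default.
function pickDefaultAgent(
  agents: AgentInfo[],
  persistedAgentId?: string,
): string | undefined {
  if (persistedAgentId) return persistedAgentId;
  return agents.find((a) => a.available)?.id;
}

// Amp listed first: it becomes the default for anyone with amp on PATH.
const ampFirst: AgentInfo[] = [
  { id: "amp", available: true },
  { id: "claude", available: true },
];

// Amp moved later in the list: the prior default is restored.
const ampLater: AgentInfo[] = [
  { id: "claude", available: true },
  { id: "amp", available: true },
];

const defaultWhenAmpFirst = pickDefaultAgent(ampFirst);
const defaultWhenAmpLater = pickDefaultAgent(ampLater);
```

Because the selection depends only on list position, reordering the definitions is enough to change the default without touching the lookup itself.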
elihahah666	1a10fa2600	docs(readme): announce 0.10.0 and add a permanent AMR entry across all READMEs (#4154 ) * docs(readme): announce 0.10.0 and add AMR entry across all READMEs * docs(readme): link Gemini CLI and Kimi CLI rows in agent compatibility table * docs(readme): expand AMR acronym to Open Design Model Router in the AMR blockquote * docs(readme): rename nav AMR link to Model Router and correct acronym to Agentic Model Router --------- Co-authored-by: qiongyu1999 <2694684348@qq.com>	2026-06-11 14:37:53 +00:00
lefarcen	6869b1208b	fix(plugins-home): correct deck/scroll preview capture + smoother gallery playback (#4044) * fix(bake): classify deck-vs-scroll by viewport, real-wheel pan, motion config Systematic audit of all 126 baked previews surfaced three capture bugs: - 2 vertical pages misread as decks (the input probe wheel-scrolled them and the scroll-driven animation looked like a slide change), so they got walked sideways. Classify by viewport height instead: a fixed-viewport page is a deck, a vertically-scrollable page is a landing page (pan it) even with a horizontal marquee/carousel sub-component. - 9 scroll-hijack landing pages (custom/transform scroll) that window.scrollTo can't move, so the pan was static. Pan those with REAL wheel events (page.mouse.wheel), which drive the page's own scroll handler. - single-screen pages now hold (static) instead of being forced down a deck path. Plus an opt-in override: authors can declare od.preview.motion ('scroll' | 'deck' | 'static') and the bake honors it, auto-detecting only when it's absent. Schema + plugins-spec document the field. (Also strips a stray NUL byte from the hash line that made the file read as binary.) BAKE_VERSION -> 4 re-bakes everything. * perf(plugins-home): only decode visible gallery clips + stream first frame Two cheap wins for the baked gallery videos: - Decouple mount from play. The tile mounts the <video> across the wider inView margin (so scroll-in/hover never remounts + reloads), but only PLAYS while truly visible — off-screen tiles in the mount margin hold their poster frame paused instead of all running a simultaneous decode. Adds a 0-margin visible observer in PreviewSurface alongside the existing near one. - preload=metadata instead of auto: paints the first frame off the +faststart header instead of eagerly buffering the whole clip up front, so tiles show fast and don't saturate the network. The idle hold buffers the pan before hover. 
* perf(plugins-home): keep baked clips mounted across a scroll window Scrolling a tile out of view and back re-showed a load even though the clip bytes are HTTP-cached (R2 immutable): the <video> unmounted at the tight 120px margin, so scroll-back remounted a fresh element that re-fetches metadata and re-decodes the first frame. Add a wide keep-mounted observer (~1500/1800px) so a clip stays mounted for a few screens — instant scroll-back — while iframes keep the tight margin and play stays gated to the truly-visible zone (paused, not unmounted, off screen). * fix(bake,contracts): probe scroll mechanism before recording; validate motion Address review: - Move the window.scrollTo probe before Page.startScreencast so its scrollTo 160 -> 0 jump isn't baked into the head of the pan as a visible lurch. - Type od.preview.motion in the Zod PluginManifestSchema (enum scroll|deck|static) so an invalid value fails doctor/install instead of silently parsing via passthrough and being ignored by the bake; add contract test coverage. * fix(bake): auto-detect single-screen fixed pages as static, not deck A fixed-viewport page is only a deck if an input actually advances it; probe the driver during auto-detect and fall back to 'static' (default viewport + a hold) when nothing moves it, instead of routing every non-scrollable page through the deck path where walkSlides(null) just held at the deck-sized capture. Extracted the arrow/wheel probe into probeDeckDriver(). Verified: a waitlist page now bakes a 2.5s static hold, guizang still walks, acreage still pans. --------- Co-authored-by: audit <a@b.c>	2026-06-10 05:43:50 +00:00
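The motion classification that commit 6869b1208b converges on (explicit `od.preview.motion` first, then viewport-based auto-detection, with `static` as the fallback) can be sketched as a small decision function. The `PageProbe` shape and `resolvePreviewMotion` name are hypothetical, and the sketch assumes the probe results have already been measured; it does not reproduce the bake's real probing, such as `probeDeckDriver()` or wheel-driven panning.

```typescript
type PreviewMotion = "scroll" | "deck" | "static";

// Hypothetical summary of what a bake-time probe might report about a page.
interface PageProbe {
  contentHeight: number;   // total document height in px
  viewportHeight: number;  // capture viewport height in px
  driverAdvances: boolean; // did an arrow/wheel input actually advance the page?
}

function resolvePreviewMotion(
  declared: PreviewMotion | undefined,
  probe: PageProbe,
): PreviewMotion {
  // An author-declared od.preview.motion is honored as-is.
  if (declared) return declared;
  // A vertically scrollable page is treated as a landing page and panned.
  if (probe.contentHeight > probe.viewportHeight) return "scroll";
  // A fixed-viewport page is a deck only if an input advances it;
  // otherwise it is held as a static single screen.
  return probe.driverAdvances ? "deck" : "static";
}
```

Checking the declared value first keeps the auto-detect heuristics out of the way whenever an author has already stated the intent.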
yinjialu	fcef49b342	feat: add AI Native observability trace diagnostics (#3714 ) * docs: spec ai native observability loop * docs: clarify observability loop gates Generated-By: looper 0.9.3 (runner=fixer, agent=codex) * docs: include evaluation runs in score model Generated-By: looper 0.9.3 (runner=fixer, agent=codex) * docs: broaden experiment score eligibility Generated-By: looper 0.9.3 (runner=fixer, agent=codex) * docs: define observability artifact storage boundary * docs: expand observability registry rollout * docs: complete artifact manifest example Generated-By: looper 0.9.3 (runner=fixer, agent=codex) * docs: align observability manifest and scoring examples Generated-By: looper 0.9.3 (runner=fixer, agent=codex) * docs: define fixed dataset trust gate * feat: add ai observability trace diagnostics * docs: clarify object storage issue compatibility * fix: tighten Langfuse event provenance and attachment caps Generated-By: looper 0.9.3 (runner=fixer, agent=codex)	2026-06-08 03:43:43 +00:00
chaoxiaoche	eabed76a4c	docs: plan design system 2.0 backfill (#3776) Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>	2026-06-06 08:02:31 +00:00
elihahah666	67077fd36f	chore(docs): move translated docs into docs/i18n/ (#3621 ) * chore(docs): move translated docs into docs/i18n/ Collect the translated README/QUICKSTART/CONTRIBUTING/MAINTAINERS files (including the Korean set) into docs/i18n/, leaving only the English sources in the repo root so the GitHub project home page file list stays clean. Rewrite internal links for the new layout (../../ for repo-root resources, sibling filenames between translations), update both switcher conventions, the i18n-check mixed-layout support, the contributors-wall workflow globs, TRANSLATIONS.md guidance, and drop now-dead root translation paths from the fork-PR docs allowlist. * fix(docs): correct root-relative links in Korean contribution guide Prefix repo-root targets (scripts/sync-design-systems.ts, TRANSLATIONS.md, package.json, .github/pull_request_template.md) with ../../ so they resolve from the new docs/i18n/ depth; sibling translated docs stay bare. Generated-By: looper 0.8.1 (runner=fixer, agent=claude-code) --------- Co-authored-by: qiongyu1999 <2694684348@qq.com>	2026-06-04 06:55:16 +00:00
Vladyslav Ovdeychuk	77fee5fe42	docs: document macOS Docker host networking workaround (#3417) Co-authored-by: Vladislav Ovdeychuk <ovdeychuk@trueconf.ru> Co-authored-by: Siri-Ray <2667192167@qq.com>	2026-06-04 04:04:32 +00:00
Amy	8fefebbbaf	test(e2e): add priority tiers and main UI alerts (#3574 ) * test(e2e): add priority tiers and stabilize p0 coverage * test(e2e): align restoration artifact reopen path * test(e2e): stabilize p1 workspace flows * ci(e2e): run extended UI on main and notify failures * fix(e2e): repair priority preflight checks Generated-By: looper 0.9.2 (runner=fixer, agent=codex) * fix(e2e): restore AMR login pill coverage Generated-By: looper 0.9.2 (runner=fixer, agent=codex) * fix(e2e): add full UI test script Generated-By: looper 0.9.2 (runner=fixer, agent=codex) * fix(e2e): remove disallowed ui script Generated-By: looper 0.9.2 (runner=fixer, agent=codex) * fix(e2e): allow full UI script in guard Generated-By: looper 0.9.2 (runner=fixer, agent=codex)	2026-06-03 12:08:06 +00:00
yinjialu	c848e53b6c	docs: plan run reliability optimizations (#3526)	2026-06-03 09:33:21 +00:00
PerishFire	98767fb302	chore: move safe large assets to R2 (#3503) * chore: tighten blob guard to 2MiB * chore: move safe 1MiB assets to R2	2026-06-02 11:10:07 +00:00
PerishFire	c3c356961b	chore: move large repository assets to r2 (#3492)	2026-06-02 10:19:12 +00:00
Shivam	3f165b5498	docs(deploy): add Azure Container Instances guide (#3163) * docs(deploy): add Azure Container Instances guide * docs(deploy): clarify Azure proxy topology * docs(deploy): keep Azure proxy streams unbuffered --------- Co-authored-by: Shivam <shivam2931120@users.noreply.github.com>	2026-06-02 08:14:21 +00:00
Denis Redozubov	3da33f92a1	Harden sandbox orchestration daemon chokepoints (#3420) * Harden sandbox orchestration chokepoints * Cover web app public copy in neutrality guard	2026-06-02 07:33:12 +00:00
Dhakshin V	484ec7c664	docs: add Alibaba Cloud (阿里云) deployment guide (#3275 ) * docs: add Alibaba Cloud (阿里云) deployment guide Adds docs/deployment/cloud/aliyun.md with: - ECS single-machine deployment using the existing Docker Compose stack - ACK (Kubernetes) reference manifest and multi-replica caveats - Image acceleration setup via Alibaba Cloud Container Registry - ICP filing (备案) overview for mainland China public hosting - Common pitfalls and references to existing Docker / install-guide docs Docs-only slice for #1025. Live ROS templates, one-click scripts, and verification screenshots are out of scope here and tracked as follow-up work in the issue. * docs: fix ACK manifest reachability and OD_ALLOWED_ORIGINS alias Address PerishCode review on PR #3275: - The daemon defaults to OD_BIND_HOST=127.0.0.1 (apps/daemon/src/server.ts), so the readinessProbe and ClusterIP Service in the previous manifest could never reach the Pod. Add OD_BIND_HOST=0.0.0.0 + OD_API_TOKEN (required by the bound-API-token guard for non-loopback binds) and a kubectl secret step. - The daemon reads OD_ALLOWED_ORIGINS, not OPEN_DESIGN_ALLOWED_ORIGINS. OPEN_DESIGN_* names are Compose-only aliases mapped in deploy/docker-compose.yml. Use OD_ALLOWED_ORIGINS for the direct-container ACK path and call out both names in the network-exposure section and the pitfalls table. Also adds an Ingress / Bearer-token note for operators fronting the Service externally. * docs: document OD_API_TOKEN bearer-token forwarding in Nginx block PerishCode follow-up review on PR #3275: The Path A Nginx block as written would 401 every UI call except the three open probes (/api/health, /api/version, /api/daemon/status). Same root cause as the ACK fix in `9d5f6ec` — the auth model affects both paths, not just direct-container deployments. Verified against source: - deploy/scripts/install.sh:386 always writes a generated OD_API_TOKEN into deploy/.env (no opt-out flag). 
- deploy/docker-compose.yml:18 requires OD_API_TOKEN (Compose ? syntax) and binds OD_BIND_HOST=0.0.0.0, so the daemon-side bearer middleware is always active for the Compose path. - apps/daemon/src/server.ts:3777 keys the loopback short-circuit on isLoopbackPeerAddress(req.socket?.remoteAddress) — the TCP peer, not X-Forwarded-For — so a reverse-proxied request from a Docker bridge IP never gets the localhost bypass. Adds proxy_set_header Authorization to the Nginx block, a paragraph explaining where OD_API_TOKEN comes from, and updates the pitfalls row that previously only mentioned CORS to also list the missing-bearer cause.	2026-06-02 05:53:06 +00:00
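The loopback short-circuit described in the commit above can be sketched as follows. This is a minimal illustration, not the daemon's code: the `RequestLike` shape and the `isAuthorized` helper are hypothetical, and the real `isLoopbackPeerAddress` in `apps/daemon/src/server.ts` may differ. The point it shows is that the decision uses the TCP peer address, so a spoofed or proxied `X-Forwarded-For` header never grants the localhost bypass.

```typescript
// Hypothetical request shape for illustration only.
interface RequestLike {
  peerAddress?: string; // stands in for req.socket?.remoteAddress
  headers: Record<string, string | undefined>;
}

// Sketch of a loopback check on the TCP peer address.
function isLoopbackPeerAddress(addr: string | undefined): boolean {
  if (!addr) return false;
  // Node reports IPv4 peers on dual-stack sockets as "::ffff:a.b.c.d".
  const normalized = addr.startsWith("::ffff:") ? addr.slice(7) : addr;
  return normalized === "::1" || normalized.startsWith("127.");
}

// Loopback peers skip the bearer check; everyone else must present the token.
// X-Forwarded-For is deliberately never consulted.
function isAuthorized(req: RequestLike, token: string): boolean {
  if (isLoopbackPeerAddress(req.peerAddress)) return true;
  return req.headers["authorization"] === `Bearer ${token}`;
}
```

Under this model a request proxied by Nginx arrives from a Docker bridge address, which is why the proxy has to forward the `Authorization` header itself.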
Amy	3083388a1a	Add launch review regression coverage (#3300 ) * Add main launch review E2E coverage * Add daemon launch review regression coverage * Tighten plugin authoring completion regressions * fix(web): preserve deck slide on preview switches Generated-By: looper 0.9.2 (runner=fixer, agent=codex) * Add project detail regression coverage	2026-06-01 14:19:33 +00:00
open-design-bot[bot]	e1f93a2f40	Update docs/assets/github-metrics.svg (#3376 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-31 14:48:58 +00:00
open-design-bot[bot]	e76eb6da63	Update docs/assets/github-metrics.svg (#3338 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-30 04:31:16 +00:00
open-design-bot[bot]	482e318afe	Update docs/assets/github-metrics.svg (#3267 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-29 14:12:36 +00:00
lefarcen	da19ff3ca0	feat(mocks): replay-based mock CLIs for 14 of OD's supported agents (opencode/codex/claude/gemini/cursor-agent/deepseek/qwen/grok + ACP family devin/hermes/kilo/kimi/kiro/vibe) (#3241 ) * feat(mocks): replay-based mock CLIs for opencode/claude/codex/deepseek/qwen/grok Drops in a `mocks/` top-level dir that pretends to be the real agent CLIs by streaming pre-recorded sessions in each CLI's native stdout protocol. Zero LLM tokens. ## Use cases - E2E tests in `apps/daemon/tests/` — exercise the full chat-server pipeline against a known trace, assert UI events / artifacts. - Self-validation during dev — iterate on `claude-stream.ts` / `json-event-stream.ts` parser changes without burning provider budget. - Regression harness — replay the same trace before and after a charter / parser change; diff the daemon events the UI surfaces. - Demo / onboarding — show what a 17-tool claude editing session looks like end-to-end, offline. ## How - 6 bash wrappers (`mocks/bin/`) shadow the real CLIs when PATH-overlaid. - `mocks/mock-agent.mjs` reads `mocks/recordings/<trace>.jsonl`, picks one via env var (`SYNCLO_EXPLORE_MOCK_TRACE` / `_POOL` / `_BY_PROMPT_HASH`), streams the trace in the requested format. 
- Each format renderer matches the EXACT JSON shape the OD daemon parser expects, verified line-by-line against `apps/daemon/src/{json-event-stream,claude-stream}.ts`:

| CLI | streamFormat | parser source |
| ------------------------- | ------------------------- | ------------------------------------------ |
| `opencode` | `json-event-stream` | `handleOpenCodeEvent` |
| `codex` | `json-event-stream` | `handleCodexEvent` |
| `claude` | `claude-stream-json` | `createClaudeStreamHandler` |
| `deepseek` `qwen` `grok` | `plain` | `server.ts` (raw stdout) |

## Quick start

```bash
export PATH="$PWD/mocks/bin:$PATH"
export SYNCLO_EXPLORE_MOCK_TRACE=04097377 # 8-char prefix OK
export SYNCLO_EXPLORE_MOCK_NO_DELAY=1
echo "any prompt" | opencode run
echo "any prompt" | claude -p --output-format=stream-json
echo "any prompt" | codex exec
```

The mock binary announces the picked trace id on stderr: `[mock-opencode] picked 04097377… via fixed`.

Recording selection (env, in priority order):
- `SYNCLO_EXPLORE_MOCK_TRACE=<id>` — fixed (prefix OK)
- `SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH=1` + stdin prompt — `sha256(prompt) % N`
- `SYNCLO_EXPLORE_MOCK_POOL=<tag>` — random within `agent:claude` / `skill:agent-browser` / `outcome:failed` / etc.
- (default) uniform random
- `SYNCLO_EXPLORE_MOCK_SEED=<str>` — reproducible "random"
- `SYNCLO_EXPLORE_MOCK_NO_DELAY=1` — skip inter-event waits

## Dataset

179 anonymized Langfuse traces from this project's own production telemetry:
- 9 agents: claude 57 · opencode 41 · codex 38 · gemini 25 · cursor-agent 11 · qwen 2 · copilot 2 · deepseek 2 · antigravity 1
- outcomes: succeeded 144 · failed 35
- skills: default 71 · ad-creative 50 · algorithmic-art 30 · agent-browser 22 · video-hyperframes 2 · plus magazine-web-ppt / brainstorming / data-report / penpot-flutter-design-source 1 each
- 124 multi-turn (sessions with ≥2 turns)
- 18 produce `<artifact>` output
- ~4.5 MB on disk total

Anonymization: `/Users/<name>/` → `${HOME}/`, `C:\Users\<name>\` → `%USERPROFILE%\`, project UUIDs → stable `proj-001`, `proj-002`, …. Tool input/output payloads preserved verbatim (templated UI, no cell-level PII).

## Smoke test

`bash mocks/scripts/smoke-test.sh` — 6 checks across all 6 agents. All pass on this branch (verified locally):

```
✓ opencode first event = step_start
✓ codex first event = thread.started
✓ claude first event = system
✓ deepseek emitted plain text (144 chars on first line)
✓ qwen emitted plain text (144 chars on first line)
✓ grok emitted plain text (144 chars on first line)
All mock CLIs working. ✅
```

## Adding more recordings

The exporter that produced this set lives in [nexu-io/agent-pr-explore](https://github.com/nexu-io/agent-pr-explore) (see `cli/src/local/orchestrator/langfuse-import.ts` + the `local langfuse-import` CLI command). Operators with the Langfuse keys can pull more by tag / outcome / artifact / multi-turn filter, then run `local recordings anonymize --out-dir ~/Documents/open-design/mocks/recordings`. `mocks/README.md` has the full instructions.
## Out of scope (follow-ups) - ACP agents (`devin`, `hermes`, `kilo`, `kimi`, `kiro`, `vibe`) need a JSON-RPC server on stdio rather than a one-shot stream — separate `format-acp.mjs` module not yet written. - Per-agent json-event-stream variants (`cursor-agent`, `gemini`, `qoder`, `copilot`, `pi`) currently fall back to the `plain` renderer; their parsers are in `apps/daemon/src/json-event-stream.ts` and follow the same template as `format-codex.mjs`. ## AGENTS.md updates - Added `mocks/` to the top-level content directories listing - Added a Validation strategy bullet pointing here for agent-stream / parser changes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(mocks): add opencode-cli/kiro-cli/vibe-acp bin aliases and unref ACP timeout - Add mocks/bin/opencode-cli, kiro-cli, vibe-acp wrappers for the primary RuntimeAgentDef bin names OD resolves before any fallback. Without these, a PATH-overlaid OD daemon run bypasses the mock entirely (opencode-cli, kiro-cli) or cannot find the mock at all (vibe-acp, which has no fallback). - Include opencode-cli, kiro-cli, vibe-acp in the smoke-test ACP/JSON loop so coverage is verified end-to-end. - Call .unref() on the 30s safety timeout in format-acp.mjs so a completed ACP session exits promptly instead of waiting the full 30 seconds. Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code) * feat(mocks): add vela (AMR) — login / models / ACP with strict set_model gate Extends mocks/ to cover OD's own AMR runtime. `vela` is the bin name `apps/daemon/src/runtimes/defs/amr.ts` specifies (`bin: 'vela'`, `streamFormat: 'acp-json-rpc'`). It's richer than the generic ACP agents — covers full login + models + chat-session lifecycle. ### What vela does (mirrored from apps/daemon/tests/fixtures/fake-vela.mjs) 1. 
`vela login` — writes ~/.amr/config.json with a fake profile (controlKey, runtimeKey, user{email,name,plan}, profile-specific apiUrl/linkUrl). The on-disk projection is what OD's daemon login route + AmrLoginPill poller read; production goes through device-auth, the mock skips straight to the file write. 2. `vela models` — prints the production-shaped public model catalog as newline-separated `public_model_* vela` lines. Override via FAKE_VELA_MODELS env. 3. `vela agent run --runtime opencode` — ACP JSON-RPC server with three vela-specific protocol extensions: a. `initialize` response carries `agentCapabilities` (`promptCapabilities.embeddedContext`) + `models` (`currentModelId` + `availableModels`). b. `session/new` response carries the same `models` block. c. Strict set_model gate: `session/prompt` is rejected with JSON-RPC -32602 ("session/set_model must be called before session/prompt") UNLESS `session/set_model` (or `session/set_config_option`) has been called for the current sessionId. Mirrors real vela 0.0.1 contract; catches regressions in `attachAcpSession` that silently skip set_model. 
### Error injection envs (in sync with fake-vela.mjs)

- FAKE_VELA_SESSION_ID - sessionId returned by session/new
- FAKE_VELA_TEXT - override assistant text
- FAKE_VELA_THOUGHT - optional thought_chunk before text
- FAKE_VELA_SESSION_NEW_ERROR - fail session/new
- FAKE_VELA_SET_MODEL_ERROR - fail session/set_model
- FAKE_VELA_PROMPT_ERROR - fail session/prompt
- FAKE_VELA_REQUIRE_SET_MODEL='0' - disable the strict gate (legacy)
- FAKE_VELA_LOGIN_USER_EMAIL - email written into config profile
- FAKE_VELA_LOGIN_USER_PLAN - plan written into config profile
- FAKE_VELA_LOGIN_DELAY_MS - sleep before write (test in-flight)
- FAKE_VELA_LOGIN_FAIL - print + exit 1
- FAKE_VELA_MODELS - override models stdout
- VELA_PROFILE - profile slot (prod | test | local)

### Components

`mocks/lib/format-vela.mjs` (~205 LOC)
- Full ACP server with vela protocol extensions
- Strict set_model gate
- Error injection plumbing

`mocks/lib/vela-subcommands.mjs` (~90 LOC)
- runVelaLogin() — writes ~/.amr/config.json
- runVelaModels() — prints catalog

`mocks/bin/vela` — dispatcher wrapper. Forwards `vela <subcmd>` to mock-agent.mjs which routes to login/models or falls through to ACP.

`mocks/mock-agent.mjs` — parseArgs now collects positionals so the vela dispatcher can read subcommand from there; switch case added for vela.

`mocks/scripts/smoke-test.sh` — +4 assertions:
- vela models prints ≥10 catalog lines
- vela login writes ~/.amr/config.json with the requested email
- vela agent run ACP roundtrip (initialize+models+set_model+stream+result)
- vela strict set_model gate rejects prompt without prior set_model

### Verified locally

- ✓ vela models printed 15 catalog lines
- ✓ vela login wrote ~/.amr/config.json with profile.prod.user.email
- ✓ vela agent run ACP roundtrip (initialize+models, set_model accepted, prompt streamed)
- ✓ vela strict set_model gate rejects session/prompt without prior set_model

All 21 smoke checks pass (up from 17 with previous P3 ACP commit).
### AGENTS.md + README updates AGENTS.md — mention `vela (AMR — vela CLI)` alongside ACP agents in the directory listing entry. mocks/README.md — protocol table row + dedicated vela section with subcommand contract, strict gate explanation, env-injection cheat sheet. Mock-tree listing updated. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(mocks): honor REPORT_FILE env when --report-file flag not given Harnesses that spawn the mock without translating their report-path contract to the mock's CLI flag (notably nexu-io/agent-pr-explore's orchestrator, which passes REPORT_FILE as env per the existing opencode/claude/codex agent launchers) wouldn't get a report file written, so the harness's "agent exit 0 but produced no report" check would always fire and mark mock runs as failure even though the stdout stream was complete. Fix: in mock-agent.mjs parseArgs, fall through to process.env.REPORT_FILE when --report-file wasn't provided on argv. Each format renderer already accepts opts.reportFile and writes the recording's final assistant text to it (`format-.mjs` already had this — only the wiring was missing). Verified: synclo-explore run with `mock=true, mock_trace=04097377` against the opencode wrapper now produces a plan.md with the recording's 17-tool claude editing session report. ~1.5s per run vs ~70s real opencode. mocks: move recordings to Cloudflare R2; PR→main→Action upload path The 179-recording corpus (~4.5 MB raw, ~280 KB after compression) has been moved off git into Cloudflare R2 at the bucket open-design-mocks under recordings/v1/. 
The repo now ships: - mocks/manifest.json — the canonical catalog (renamed from recordings/index.json) with sha256 + storage hints; consumers fetch this to discover what exists, then pull individual jsonl files on demand - mocks/scripts/fetch-recordings.sh — parallel, sha256-verified, idempotent puller for the public r2.dev URL - mocks/scripts/add-recording.sh — local maintainer helper that validates a new .jsonl and copies it into recordings-staging/ (no R2 calls; no credentials needed) - mocks/scripts/upload-to-r2.mjs — called only by the CI workflow - mocks/scripts/lib/manifest-utils.mjs — shared sha256/meta/ rebuild-histograms logic, used by both add-recording (preview) and upload-to-r2 (actual write) so the entry shape never drifts - .github/workflows/sync-mocks-to-r2.yml — fires on push to main when mocks/recordings-staging/ changes; uploads to R2, updates manifest, commits cleanup back; serialized via concurrency group Trust model: R2 write credentials (CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID) are repo secrets; nobody can push from a laptop. Read stays public via the r2.dev URL. Why not pnpm install integration: contributors who do not touch agent code do not pay the fetch cost. Fetch happens on first smoke-test run (auto-fallback) or when a mock spawn needs data. Repo size: -4.55 MB net (delete 179 jsonl, +280 KB manifest + scripts). Smoke test (21 checks) still green against the fetched corpus. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mocks: scope R2 write token to a dedicated secret name Use CLOUDFLARE_R2_MOCKS_TOKEN (instead of reusing the shared CLOUDFLARE_API_TOKEN that landing-page-.yml uses for Pages deploys) so the R2 write capability can be scoped to just the open-design-mocks bucket without bleeding extra capability into the Pages workflows. Also hardcode the powerformer CF account_id directly in the workflow (account IDs are not secret and the shared CLOUDFLARE_ACCOUNT_ID secret may point at a different account). 
Workflow now fails fast with an actionable error message + dashboard link if the secret is unset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> mocks: switch R2 sync to S3-compat API (wrangler getMemberships gate) wrangler 4.x calls /memberships before any r2 action, requiring user:read scope. R2 "Object Read & Write" tokens deliberately lack that scope (defense in depth — a leaked token should not enumerate account-level resources). The workflow now uses the aws CLI talking straight to the R2 S3-compatible endpoint with SigV4, no membership lookup. Secret rotation: CLOUDFLARE_R2_MOCKS_TOKEN (Bearer) is replaced by CLOUDFLARE_R2_MOCKS_AK / CLOUDFLARE_R2_MOCKS_SK (matching the existing CLOUDFLARE_R2_RELEASES_AK/SK naming convention). End-to-end tested locally: PUT recording → manifest rebuild → manifest PUT → staging cleanup all green. aws CLI is pre-installed on ubuntu-latest, so no install step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mocks: scrub synclo namespace; use OD_MOCKS_* env prefix throughout These mocks were copy-pasted from synclo-explore, where they originated, and inherited the SYNCLO_EXPLORE_MOCK_* env-var convention. That brand-bleed is not appropriate in OD: rename the public env surface to OD_MOCKS_* (matching OD-native prefixes like OD_MOCKS_CACHE_DIR, OD_TRACE_R2_UPLOAD, OD_EXPECT_TIMEOUT_SECONDS).

Renames:
- SYNCLO_EXPLORE_MOCK_TRACE → OD_MOCKS_TRACE
- SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH → OD_MOCKS_BY_PROMPT_HASH
- SYNCLO_EXPLORE_MOCK_POOL → OD_MOCKS_POOL
- SYNCLO_EXPLORE_MOCK_SEED → OD_MOCKS_SEED
- SYNCLO_EXPLORE_MOCK_NO_DELAY → OD_MOCKS_NO_DELAY
- SYNCLO_EXPLORE_MOCK_RECORDINGS_DIR → OD_MOCKS_RECORDINGS_DIR
- SYNCLO_EXPLORE_MOCK_SMOKE_TRACE → OD_MOCKS_SMOKE_TRACE
- SYNCLO_OD_MOCKS_I_KNOW_WHAT_IM_DOING → OD_MOCKS_ALLOW_LOCAL_UPLOAD

Also drop the inline harvester usage from README.
The harvester is an external CLI in nexu-io/agent-pr-explore — its README is the right place for langfuse-import flags, anonymization options, etc. OD only documents its own staging→PR→Action workflow. Smoke test (21 checks) still green; OD_MOCKS_TRACE end-to-end verified to route correctly. Consumers of the OLD env names (notably the orchestrator in nexu-io/agent-pr-explore) need a matching rename. No back-compat shim here — the explore side has zero external users today and a one-line follow-up is cleaner than a permanent deprecation layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * AGENTS.md: align mock env names with mocks/ rename (SYNCLO_* → OD_MOCKS_) Missed in the prior commit (`a30b868a`) — only grepped mocks/ subdir. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> mocks: drop staging dir + GH Action; back to local-script upload The staging-dir + Action design (added earlier in this PR) had a flaw the user caught: new recordings briefly entered the repo on their way through staging, leaving them in git history forever even after the Action cleanup commit removed them from HEAD. That defeats the whole point of moving recordings to R2. Replace with the simpler local-maintainer flow: bash mocks/scripts/upload-recording.sh /path/to/<trace>.jsonl # → validates, wrangler r2 put, updates manifest.json, wrangler r2 put manifest git add mocks/manifest.json && git commit && git push # → only the ~200B manifest delta enters git The wrangler-OAuth gate replaces the CI secret + Action duo. For a solo / small maintainer team this collapses the trust chain down to "do you have wrangler login to the powerformer account?" — no GH secrets to rotate, no concurrency window to worry about, no inevitable repo-history bloat. 
Deletes: - .github/workflows/sync-mocks-to-r2.yml - mocks/scripts/upload-to-r2.mjs (CI-only) - mocks/scripts/add-recording.sh (staging helper, now obsolete) - mocks/recordings-staging/ (empty dir, never to be repopulated) Adds: - mocks/scripts/upload-recording.sh Kept: - mocks/scripts/fetch-recordings.sh - mocks/scripts/lib/manifest-utils.mjs (still used by upload-recording.sh) - mocks/manifest.json (committed; the only mocks artifact in git) End-to-end tested locally: re-uploading an existing recording is idempotent, manifest math is stable, fetch + smoke test still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mocks: address review — guard allowlist + safe ~/.amr + loud OD_MOCKS_TRACE typo Three concrete issues raised across recent Siri-Ray (Looper) review threads on #3241: 1. scripts/guard.ts only allowlisted mocks/lib/ + mocks/mock-agent.mjs, leaving mocks/scripts/lib/manifest-utils.mjs outside the residual-JS guard. Result: Preflight failed on every push. Extend the allowlist to mocks/scripts/ — same precedent as the lib/ entry directly above. 2. mocks/scripts/smoke-test.sh moved the caller's real ~/.amr to ~/.amr-smoke-backup, ran vela login (which writes a fake config), then ran rm -rf on the .amr and restored the backup. Two failure modes: a crash mid-run loses the user's real config, and re-running before restore overwrites the backup with the fake login. Fix: sandbox vela login into a mktemp -d HOME via env (HOME=$amr_sandbox vela login). Never touches the real ~/.amr at all. trap cleans up. 3. mocks/lib/recording-picker.mjs silently fell through to prompt-hash → pool → random when OD_MOCKS_TRACE was set but did not match any recording (typo, prefix too short, corpus not fetched). Tests using a pinned trace would silently get a different trace, hiding regressions. Fix: throw an explicit error with the failing value + a pointer at fetch-recordings.sh. 
Verified locally: pnpm guard prints "Residual JavaScript check passed", smoke-test still 21/21, ~/.amr mtime unchanged after run, typo on OD_MOCKS_TRACE now produces "mock-agent: OD_MOCKS_TRACE=... set but no matching recording in <dir>" on stderr. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fetch-recordings: detect empty filter result before line-counting printf '%s\n' on an empty string emits a single empty line, so the previous TOTAL=$(printf ... | grep -c "") math returned 1 on an empty $ENTRIES_TSV — a typo like `--agent no-such-agent` printed "Fetching up to 1 recordings", downloaded zero, and exited 0 ("ready"). Check `-z $ENTRIES_TSV` first. Reproduced + fix verified per the reviewer thread. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * mocks: address mrcfps review — goldens + provenance + contract check Three durability improvements suggested in the PR #3241 top-level review: ## 1. Golden daemon-event snapshots (mocks/golden/.events.json + apps/daemon/tests/mocks-golden.test.ts) Smoke-test verified that mocks RUN; that catches crashes but not a parser change that semantically reshapes the events the daemon emits. Commit the daemon-event sequence for 3 representative traces: - claude 314d6833 — median-complexity agent-browser session - codex dcdff3b3 — 14-tool refactor - opencode 9a9522ec — 7-tool data-report apps/daemon/tests/mocks-golden.test.ts spawns the mock, feeds stdout through the real createClaudeStreamHandler / createJsonEventStreamHandler, normalizes per-spawn volatile fields (only sessionId today, only on claude), and deep-equals against the committed snapshot. A parser regression fails the test loudly. After an intentional parser change, regenerate: MOCKS_GOLDEN_UPDATE=1 pnpm --filter @open-design/daemon test mocks-golden git diff mocks/golden/ # eyeball; commit if shapes match intent ## 2. 
Provenance fields on every manifest entry (mocks/scripts/lib/manifest-utils.mjs + mocks/manifest.json) Augment inspectRecording() to write: captured_at — ISO 8601 from existing meta.timestamp cli_version — null until harvester writes it protocol_version — null until harvester writes it anonymization_version — null until harvester writes it captured_at is now populated for all 179 existing entries from the meta event the harvester already emits. The harvester in nexu-io/agent-pr-explore is the next step for cli_version / protocol_version / anonymization_version — once those are populated, consumers can detect when a recording is older than ~1 minor version behind the live CLI and flag for re-harvest. No matrix of (cli_version × agent) recordings — that explodes maintenance. Just metadata per recording so trust decay is visible. ## 3. Real-CLI contract check (mocks/scripts/contract-check.sh + docs/MOCKS-CONTRACT-CHECK.md) Mocks catch parser regressions against recordings; they do NOT catch recordings drifting away from the live agent CLI as that CLI evolves. The contract check spawns the real CLI alongside the mock with a fixed deterministic prompt + diffs top-level event-type distributions. Deliberately human-driven, not cron-scheduled: - costs real LLM tokens per invocation - requires real CLI auth - maintainer reads the output, not a regex Suggested triggers per doc: real-CLI release notes mentioning "output format" / "stream" / "JSON" / "events"; before a parser refactor; ad-hoc when something looks off. ## Coverage note README updated to position mocks as "deterministic protocol/parser coverage" (not "e2e replacement") per mrcfps framing. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(mocks-golden test): drop import of non-exported ParserKind Use plain string (the type alias is `string` anyway) — Preflight typecheck on `a31fa71a` failed: tests/mocks-golden.test.ts(29,8): error TS2459: Module "../src/json-event-stream.js" declares "ParserKind" locally, but it is not exported. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * recording-picker: structured OD_MOCKS_POOL + hard-fail no-match Siri-Ray review: `OD_MOCKS_POOL=outcome:failed` was documented as a supported selection knob, but the matcher only checked tags and `meta.agent` — so the negative-path pool found 0 candidates and silently fell through to global random, validating against any recording instead of a failed trace. Fix: - Parse `<dim>:<value>` shape and route each dim to the right meta field: `outcome` → `meta.outcome`, `agent` → `meta.agent`, `skill` → `tags[]`. Bare values still fall back to tag substring. - If the env was set and matched nothing, throw with the failing value and a jq one-liner for inspection. Same loud-fail policy as OD_MOCKS_TRACE — silent fallback was the original bug. Verified locally: outcome:failed, agent:codex, skill:agent-browser all route correctly; outcome:nonsense throws the explicit error. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * contract-check.sh: fix lost $PROMPT in mock invocation Siri-Ray review on `e576074a`: the mock side wrapped its pipeline in `bash -c "printf %s \"\$PROMPT\" | ..."` — but $PROMPT was a parent shell variable, not exported, so the child bash expanded it to an empty string. Result: the contract check sent the real prompt to the real CLI and an empty string to the mock, defeating the same-input invariant the whole script rests on. Also let the mock randomly select a different trace whenever a maintainer happens to have OD_MOCKS_BY_PROMPT_HASH=1 in their env. 
Fix: drop the inner bash -c entirely; use a subshell that scopes the PATH overlay and pipes printf into the PATH-resolved mock binary directly. The subshell limits the PATH change without var-passing. Verified locally: with prompt-A the mock picks trace 54ec02ee via hash; prompt-B → 2667e851 via hash; empty prompt (old broken behavior) → random — confirms the prompt is now actually reaching the mock under PATH overlay. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 07:17:20 +00:00
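The structured `OD_MOCKS_POOL` routing and loud-fail policy described in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `mocks/lib/recording-picker.mjs`: the `RecordingEntry` shape, the `matchesPool` and `selectPool` names, the storage of skills as plain tag strings, and the fallback for unknown dimensions are all hypothetical.

```typescript
// Hypothetical entry shape; the real manifest/recording schema may differ.
interface RecordingEntry {
  id: string;
  meta: { agent: string; outcome: string };
  tags: string[];
}

// Route a "<dim>:<value>" selector to the matching field.
function matchesPool(entry: RecordingEntry, pool: string): boolean {
  const sep = pool.indexOf(":");
  if (sep > 0) {
    const dim = pool.slice(0, sep);
    const value = pool.slice(sep + 1);
    if (dim === "outcome") return entry.meta.outcome === value;
    if (dim === "agent") return entry.meta.agent === value;
    // Assumption: skills are stored as plain tag strings such as "agent-browser".
    if (dim === "skill") return entry.tags.includes(value);
  }
  // Bare values (and, by assumption, unknown dims) fall back to a tag substring match.
  return entry.tags.some((tag) => tag.includes(pool));
}

// Return the candidate pool, or fail loudly instead of falling through to random.
function selectPool(entries: RecordingEntry[], pool: string): RecordingEntry[] {
  const candidates = entries.filter((entry) => matchesPool(entry, pool));
  if (candidates.length === 0) {
    throw new Error(`OD_MOCKS_POOL=${pool} set but no recording matched`);
  }
  return candidates;
}
```

Throwing on an empty candidate set is the design choice the commit argues for: a test pinned to `outcome:failed` should never silently validate against a succeeded trace.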
open-design-bot[bot]	49573f031a	Update docs/assets/github-metrics.svg (#3159 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-29 03:02:19 +00:00
Amy	1c2a1c4459	Add launch review regression coverage and stabilize daemon tests (#3207 ) * Add launch review E2E regression coverage * Harden daemon launch review regressions * Stabilize daemon runtime tests * fix(tests): restore e2e preflight typing Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * fix(tests): make fake plugin runtime ESM-safe Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * Stabilize e2e fake agent and regression tests * fix(tests): repair fake agent cjs runtime Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * fix(review): harden plugin authoring checks Generated-By: looper 0.9.2 (runner=fixer, agent=codex) * fix(tests): bind plugin authoring run to seeded conversation Generated-By: looper 0.9.2 (runner=fixer, agent=codex)	2026-05-29 02:39:33 +00:00
Denis Redozubov	f70fa0eb35	docs(media): describe external media composition (#3201 )	2026-05-28 10:41:02 +00:00
lefarcen	df8a0faff6	feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355 ) * feat(runtimes): register AMR (vela) as an ACP stdio agent AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode` speaks ACP JSON-RPC over stdio (see vela's `specs/current/runtime/manual-agent-run-openrouter.md`); per `docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat: 'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc. The new `defs/amr.ts` is the entire wiring — `buildArgs` returns `['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses `detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN` allowlist + install/docs URLs, so users can configure the per-agent env in Settings without leaking into other adapters. Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns the documented `initialize` / `session/new` / `session/set_model` / `session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via `child_process.spawn` and drives a full turn through `attachAcpSession` and `detectAcpModels`, so the ACP transport contract for AMR is end-to-end verified locally even before a real `vela` binary is installed. Validated: - pnpm guard - pnpm typecheck (all workspace projects) - pnpm --filter @open-design/daemon test (2881/2881) Deferred: real OpenRouter-backed turn through a built `vela` binary — the runtime def needs no changes for that path, only `VELA_RUNTIME_KEY` and `VELA_LINK_URL` in env (or Settings). * fix(runtimes/amr): pin a concrete default model and bare openai ids End-to-end validation against a freshly-built `vela` (nexu-io/vela@main) + OpenRouter surfaced two contract details the first AMR runtime def got wrong: 1. 
vela rejects `session/prompt` with `session/set_model must be called before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts skips set_model whenever the picked model is the synthetic 'default' id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The def now ships a concrete `gpt-5.4-mini` as both `fetchModels`' default option and `fallbackModels[0]`, which makes attachAcpSession always send a real `session/set_model` for AMR turns. 2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId it forwards to opencode's openai provider. With OpenRouter-style ids like `openai/gpt-5.4-mini`, opencode receives the double-prefixed `openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`. The new fallback list ships the bare ids opencode's openai registry actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.). Stub + tests: - tests/fixtures/fake-vela.mjs now enforces the set_model gate the same way real vela does, so a regression that silently goes back to model: 'default' would surface as a fatal error in tests instead of a hidden production failure. - tests/amr-acp-integration.test.ts pins both contracts: no 'default' / no 'openai/' prefix in fallbackModels, and a negative case that asserts session/prompt fails when no model is set. Adds `apps/daemon/scripts/verify-amr-real-vela.mjs` — a small dev-time runner that drives `attachAcpSession` against a real `vela` binary and prints the daemon's chat events, so future protocol drift can be checked against an actual OpenRouter call. Verified locally: `vela agent run --runtime opencode` + OpenRouter returns the prompted string ("AMR-E2E-PASS") through the full daemon pipeline; daemon test suite stays 2883/2883. 
* fix(runtimes/amr): substitute concrete model when chat run sends 'default' A plugin-driven AMR run from the UI surfaced a real-world hole in the prior commit: json-rpc id 3: session/set_model must be called before session/prompt The Default-design-router plugin (and any caller that doesn't pin a real model) sends `model: 'default'` straight through, which the AMR runtime def cannot accept — vela rejects `session/prompt` without `session/set_model` and attachAcpSession skips set_model whenever model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the adapter's `fallbackModels` is not enough: the chat-run handler in server.ts still forwarded 'default' verbatim. This adds `resolveModelForAgent(def, resolved, env?)` as the single source of truth for the substitution: 1. If the caller picked a real id, pass it through. 2. Else, if `def.defaultModelEnvVar` is set and the daemon process env has a non-empty value for it, return that (operator escape hatch — see below). 3. Else, if the def's `fallbackModels` does NOT contain a 'default' id, return `fallbackModels[0].id`. 4. Else, return the original value (the historic shape — defs that list 'default' themselves are untouched). AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when opencode's openai-provider registry deprecates `gpt-5.4-mini` upstream, an operator can swap the fallback id without a code change by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev / od. Worth noting the env var must live in the daemon's `process.env` (Settings-UI per-agent env values only reach the spawned child, not the daemon's resolver) — the new field's docblock spells this out. Coverage: - `tests/runtimes/resolve-model.test.ts` — 8 unit tests covering all four resolver branches plus the env-override happy path / fallback / ignore-when-user-picked-a-real-id case. - `pnpm --filter @open-design/daemon typecheck` clean. 
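The four resolver branches listed above can be sketched as below. This is a hedged approximation, not the daemon's implementation: the `AgentDef` and `ModelOption` shapes are reduced to the fields the commit message mentions, treating `null`/`undefined` like `'default'` is an assumption, and the `env` parameter defaults to an empty object here so the sketch stays self-contained (the commit says the real resolver reads the daemon's `process.env`).

```typescript
// Reduced, hypothetical shapes; the real RuntimeAgentDef carries more fields.
interface ModelOption { id: string }
interface AgentDef {
  fallbackModels: ModelOption[];
  defaultModelEnvVar?: string;
}

const DEFAULT_MODEL_ID = "default";

function resolveModelForAgent(
  def: AgentDef,
  resolved: string | null | undefined,
  env: Record<string, string | undefined> = {},
): string | null {
  // 1. A real id picked by the caller passes through untouched.
  if (resolved && resolved !== DEFAULT_MODEL_ID) return resolved;

  // 2. Operator escape hatch: a non-empty value in the def's env var wins.
  if (def.defaultModelEnvVar) {
    const override = env[def.defaultModelEnvVar]?.trim();
    if (override) return override;
  }

  // 3. Defs that do not list 'default' get their first concrete fallback.
  const listsDefault = def.fallbackModels.some((m) => m.id === DEFAULT_MODEL_ID);
  const first = def.fallbackModels[0];
  if (!listsDefault && first) return first.id;

  // 4. Historic shape: defs that list 'default' themselves are untouched.
  return resolved ?? null;
}
```

With a def shaped like AMR's (`fallbackModels[0]` set to `gpt-5.4-mini`, `defaultModelEnvVar` set to `VELA_DEFAULT_MODEL`), a `'default'` request would resolve to `gpt-5.4-mini` unless the env override is present.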
* chore(runtimes/amr): move AMR to the top of the base agent list So `AMR (vela)` shows up first in the agent picker / status views, ahead of claude / codex. Pure ordering change; no behavior delta. * feat(amr): Sign-in / Sign-out button on the AMR Settings card The first half of the AMR work assumed the operator would set VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never surfaced login state to users. This adds the missing UX so a fresh install can drive the full path from Settings: - GET /api/integrations/vela/status reads ~/.vela/config.json for the active profile and returns { loggedIn, profile, user } (without leaking the runtime/control keys themselves). - POST /api/integrations/vela/login spawns `vela login` once (409 if one is already in flight). The vela CLI opens the user's browser to the device-authorization page itself — Open Design only needs to kick the subprocess off. - POST /api/integrations/vela/logout removes ~/.vela/config.json so the next status read returns logged-out. `AmrAgentCard` is a dedicated agent-card component for AMR because the existing `<button>` row can't host an interactive sub-control (nested interactive elements). It polls /status after a login click until the daemon reports loggedIn=true (or 5 minutes elapse), and exposes a Sign-out action on hover. Other adapters (claude, codex, hermes, …) keep their existing `<button>` card. i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.) added to en + zh-CN. Other locales spread `en` and inherit the English copy until translations land. Coverage: - `tests/integrations/vela.test.ts` pins the config.json reader against a tmp HOME — including the negative case where a profile has user info but no runtimeKey (still logged-out), and the secret-leak guard ("rt-secret-" must not appear in the projection payload). 
- `tests/components/AmrAgentCard.test.tsx` covers all four UI states (logged-out, logging-in, logged-in, logging-out) plus the click-propagation invariant the divergent card was built to keep. `pnpm --filter @open-design/daemon test` 2901 / 2901 passing. `pnpm --filter @open-design/web test` 1719 / 1719 passing. `pnpm typecheck` + `pnpm guard` clean. Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs` no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if VELA_PROFILE is set, the vela CLI is allowed to resolve credentials from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to `scripts/guard.ts` allowlist with the executable-fixture / dev-runner rationale. fix(connection-test): substitute model for AMR before attachAcpSession The chat-run path in server.ts already routes the requested model through `resolveModelForAgent` so AMR / vela (whose CLI demands an explicit `session/set_model` before `session/prompt`) gets the def's first concrete fallback id when the chat run ships `model: 'default'`. `connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })` directly, which made the Test Connection button on the AMR Settings card deadlock with the same `session/set_model must be called before session/prompt` error the chat-run path already handles — surfaced as a permanent "Testing connection…" spinner in the UI. Reuse the same helper here so Test Connection mirrors chat-run behavior. * test(amr): three-layer end-to-end coverage for the AMR login + turn flow The PR up to this point shipped runtime + UI code with unit-level Vitest coverage. This commit adds the cross-layer regression net the live demo relied on: 1. 
apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest) Spins up the real daemon Express app via `startServer({port:0,...})`, persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json, and exercises every /api/integrations/vela/* endpoint against the extended fake-vela stub: - status reads ~/.vela/config.json under various states - login spawns the fake, waits for config.json to appear, returns pid + startedAt + profile - 409 already-running guard with the stub's delay knob - logout removes the file (idempotent) - secrets (runtimeKey / controlKey) never leak in the projection - login → status round-trip flips loggedIn=false → true 2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest) Boots a namespaced daemon + web pair through `createSmokeSuite`, inlines a self-contained fake `vela` binary that handles BOTH `vela login` (writes ~/.vela/config.json) and `vela agent run --runtime opencode` (ACP stdio with the `session/set_model must precede session/prompt` gate the real binary enforces), then drives a complete /api/runs lifecycle for `agentId: 'amr', model: 'default'` and asserts the assistant message captures the fake's streamed text. This is the test that would have surfaced today's plugin-default-model regression (the `set_model before prompt` error) at PR time instead of demo time. 3. e2e/ui/amr-login-pill.test.ts (Playwright) Mocks /api/agents + /api/integrations/vela/{status,login,logout} to drive the Settings AMR card through the full Sign in → Signed in → Sign out cycle. Pins the AmrLoginPill polling contract and the aria-label semantics (the pill's accessible name is "Sign out" once logged in, regardless of which label the hover-state text shows). fake-vela.mjs extensions: - Handles `vela login` argv by writing ~/.vela/config.json for the active VELA_PROFILE and exiting 0 — mirrors real vela's on-disk side-effect without the device-auth loop. 
- FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the in-flight state of the spawn lifecycle. - FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced user fields end-to-end. Validated: - `pnpm guard` + `pnpm typecheck` (all workspace projects) - `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing, including the new 8-test integration suite. - `cd e2e && pnpm test tests/amr`: 1 / 1 passing. - `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`: 1 / 1 passing (6.7s). * feat(amr): package native cli and refine login ui * feat(amr): wire vela cli beta packaging * docs(amr): document vela ci packaging review * docs(amr): refine vela ci integration review * fix(ci): refresh nix pnpm dependency hashes * fix(pack): clean up Vela CLI packaging * fix(pack): bundle Vela CLI support files * fix(amr): recover login attempts from stale auth state * test: expand AMR and automations coverage * fix(amr): address review follow-ups * test(web): align tasks fixtures with contracts * fix(daemon): type wildcard route params * fix(ci): refresh PR merge validation * fix(amr): clear env credentials on logout * feat(settings): inline local CLI model configuration * fix(amr): recognize daemon env credentials * [codex] Fix Vela companion packaging (#2979) * Fix Vela companion packaging * Update Nix pnpm dependency hashes * [codex] Surface AMR account failures (#2980) * fix: surface AMR account failures * fix: cover AMR recovery error guidance * chore: bump beta base version to 0.8.1 (#2990) * Fix AMR profile and packaged runtime review issues * Detect packaged AMR OpenCode companion tree * feat(web): polish AMR frontend flows * Polish AMR onboarding card * fix: read AMR login state from dot-amr config (#3048) * test: tighten AMR credential and packaging coverage * test: restore AMR executable test env helper * [codex] Fix packaged mac Dock identity and AMR label (#3076) * Fix packaged mac sidecar Dock identity * Rename AMR assistant label * Fix AMR live 
models and dot-amr login state (#3073) * fix: read AMR login state from dot-amr config * fix: load live AMR models before runs * fix: point AMR onboarding link to production wallet * fix: address AMR model review feedback * fix: persist live AMR model fallback * [codex] Fix AMR link catalog model ids (#3088) * Fix packaged mac sidecar Dock identity * Rename AMR assistant label * Fix AMR link catalog model ids * Fix AMR model normalization typecheck * Use live AMR model for default runs * fix: polish AMR runtime settings UI * Accelerate AMR startup defaults (#3092) * Surface AMR insufficient balance wallet URL (#3099) * fix(web): polish onboarding controls (#3112) * fix(web): show CLI scan loading state * Avoid duplicate AMR wallet recharge links (#3117) * Avoid duplicate AMR wallet recharge links * Use Vela CLI 0.0.3 test package * chore(nix): refresh pnpm deps hash * Fix AMR wallet guidance display --------- Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com> * chore(pack): pin Vela CLI 0.0.3-test.1 (#3127) * chore(nix): refresh pnpm deps hash * chore(pack): pin Vela CLI 0.0.3 * chore(nix): refresh pnpm deps hash * fix(web): suppress AMR exit 130 fallback (#3136) * feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083) * feat(web): nudge users to hosted AMR on model/auth/quota failures When a non-AMR agent run fails with an auth / quota / upstream model error, surface an inline nudge under the error pill linking to Open Design's hosted AMR gateway (https://open-design.ai/amr). The nudge fires `surface_view` (element=run_failed_toast) on impression and `ui_click` (element=go_amr) on the link. Also teach the daemon to classify CLI-agent auth/quota/upstream failures (Claude Code, codex, ...) into specific API error codes (AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of the generic AGENT_EXECUTION_FAILED, so both the error message and the nudge key off accurate codes. 
AMR's own runs are excluded from the nudge — they keep the dedicated sign-in / recharge affordances. * feat(web): rework failed-run AMR guidance into per-case error UI Replace the single inline nudge with a per-case failed-run experience driven by the run's error code + agent: - The error card is now neutral gray (was red) and always carries a retry button; it is driven by the persisted per-message error event so it survives a reload. - Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion card under the error card offers "switch to AMR & retry" — switches the run to AMR, opens Settings on the AMR card, and auto-retries once the account signs in (ProjectView polls vela login status, independent of the Settings pill lifecycle, with success / 5-min-timeout / unmount exits). - AMR agent unauthorized: clearer copy + an "authorize & retry" button. - AMR agent out of balance: clearer copy + a "top up" button to the AMR wallet, with manual retry. - Settings AMR card: when opened from the nudge, it scrolls into view and pulses, and an authorize-button coachmark (a fake hand cursor that rises in and dismisses on hover) points at the sign-in control when not yet authorized. analytics: surface_view (run_failed_toast) on the promotion card and ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.* and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall back to en) and drops the old chat.amrErrorGuidance keys. * fix(daemon): require status context for numeric service-failure codes Per review on #3083: the model-service classifier matched bare HTTP status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like `line 500`, `read 502 bytes`, or `exit code 401` could be misclassified as a provider outage / auth wall and wrongly surface the AMR nudge. 
Now a status number only counts when it carries explicit context (`HTTP 500`, `status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases (overloaded, bad gateway, service unavailable, rate limit, …) are unchanged. Adds fixtures proving unrelated numeric output stays null. * fix(web): keep error pill for failed runs ChatPane's card doesn't cover Per review on #3083: the per-message gray error pill was suppressed for every persisted error status event, but ChatPane only renders the replacement top-level error card for `retryableAssistantMessage` (the last failed assistant). So a failed turn that is no longer last (after a follow-up) or an older failed run in history showed neither the pill nor the card — its error detail vanished, undercutting reload/history survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose error the card represents); AssistantMessage suppresses only that one pill and keeps rendering StatusPill for all other error events. * fix(daemon): don't treat a process exit code as an HTTP status Follow-up to review on #3083: the status-context helper accepted a bare `code` prefix, so `exit code 401` / `process exited with code 429` still matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the very `exit code 401` case the comment calls out as noise). `code` now only counts when qualified (`status code` / `error code` / `response code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer matches. Adds fixtures for exit-code lines returning null. * chore(web): translate AMR card / error keys for 16 remaining locales PR #3083 added 10 new `chat.amrCard.` / `chat.amrError.` keys but only provided en/zh-CN/zh-TW translations; the other 16 locales fell back to English. Translate the card title/body, three chips, primary CTA, and the AMR self-error (auth / balance) messages and buttons for ar, de, es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk. 
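The status-context rule from the two classifier fixes above (a number counts only with explicit context, and a bare `exit code N` never counts) can be sketched with three regular expressions. This is an illustrative approximation, not the daemon's classifier: the function name `classifyStatusLine`, the exact set of status numbers and reason phrases, the handling of `exit code: N`, and the mapping of 5xx to `UPSTREAM_UNAVAILABLE` are assumptions. The textual provider phrases (overloaded, rate limit, and so on) are out of scope here.

```typescript
type FailureCode = 'AGENT_AUTH_REQUIRED' | 'RATE_LIMITED' | 'UPSTREAM_UNAVAILABLE';

// Context before the number: `HTTP 500`, `HTTP/1.1 502`, `status 503`,
// `status code 401`, `error code 429`, `response code 500`.
const STATUS_PREFIX =
  /\b(?:HTTP(?:\/\d(?:\.\d)?)?|status(?:\s+code)?|error\s+code|response\s+code)\s*[:=]?\s*(401|429|500|502|503)\b/i;

// Punctuation-bound `code: 401` / `code=429`. The lookbehind keeps
// `exit code: 401` out, which is a choice made for this sketch.
const CODE_PUNCT = /(?<!exit\s+)\bcode\s*[:=]\s*(401|429|500|502|503)\b/i;

// Context after the number: a standard HTTP reason phrase, e.g. `502 Bad Gateway`.
const REASON_SUFFIX =
  /\b(401|429|500|502|503)\s+(?:Unauthorized|Too Many Requests|Internal Server Error|Bad Gateway|Service Unavailable)\b/i;

function toFailureCode(status: string): FailureCode {
  if (status === '401') return 'AGENT_AUTH_REQUIRED';
  if (status === '429') return 'RATE_LIMITED';
  return 'UPSTREAM_UNAVAILABLE'; // assumed mapping for 500 / 502 / 503
}

// Returns a failure code only when the status number carries explicit context;
// unrelated numeric output (`line 500`, `read 502 bytes`, `exit code 401`) yields null.
function classifyStatusLine(line: string): FailureCode | null {
  for (const pattern of [STATUS_PREFIX, CODE_PUNCT, REASON_SUFFIX]) {
    const match = pattern.exec(line);
    const status = match?.[1];
    if (status) return toFailureCode(status);
  }
  return null;
}
```

With these patterns, `classifyStatusLine('502 Bad Gateway')` yields `UPSTREAM_UNAVAILABLE`, while `classifyStatusLine('process exited with code 429')` yields `null`.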
* fix(amr): address review feedback on #2355 Targeted fixes for the unresolved review threads on #2355. Each fix includes / updates a focused test. - runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now verifies the inner `opencode` executable exists + is runnable, not just the directory. This closes the false-positive availability path that let `detectAgents()` surface AMR as available even when the packaged companion was empty / partially copied (mrcfps, 4 threads). - runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a stale `opencode` on the user's PATH, so packaged AMR builds can't be hijacked by a global installation. - web/EntryShell.tsx: when the Local CLI scan returns an available agent and the previously-selected agent is AMR, switch the selection to the first available local agent so the runtime and persisted agent agree before Continue. - server.ts (model-probe branch): for AMR, check `readVelaLoginStatus` BEFORE rejecting on an empty live-model catalog — a signed-out user was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of the correct `AMR_AUTH_REQUIRED` (sign-in affordance). - server.ts (default model fallback): if the user asked for the AMR agent default and the cached id is no longer in the FRESH catalog, fall back to `liveModels[0]` from the probe instead of rejecting the run as `AMR_MODEL_UNAVAILABLE`. - integrations/vela.ts: route `vela login` through `createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat` shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with verbatim args (matches `execAgentFile` / chat-run spawning). - tools/pack/src/linux.ts: in containerized Linux builds, bind-mount the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env to the container-side path. 
The host path was being passed in as-is even though the default container only mounts /project, /tools-pack and cache/home — `copyOptionalVelaCliBinary` saw a missing path. Deferred (out of scope for this PR): - `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked for a separate focused PR. - Strict `--require-vela-cli` for Windows + mac-x64 beta builds: prematurely blocked — `@powerformer/vela-cli` only publishes the `darwin-arm64` platform binary today; adding the flag elsewhere would fail the builds. Revisit once win/x64/linux binaries ship. * fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ) The new signed-out AMR branch in the catalog preflight at server.ts:10875 calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the const declaration sat ~100 lines below at the outer function scope. Because `const` is TDZ-aware, that branch would have thrown `ReferenceError: Cannot access 'sendAmrAccountFailure' before initialization` for the exact users it tries to help — defeating the original intent. Hoist the helper to just above the AMR preflight block so it's available to every AMR code path in this function. Behavior elsewhere is unchanged. Also rerun the daemon test suite: `launch.test.ts > resolveAgentLaunch uses packaged built-in Vela for AMR` was creating the `<resourceRoot>/bin/libexec/opencode/` companion directory only, but this PR's earlier tightening of `packagedVelaOpenCodeCompanionTree` also requires the inner `opencode` executable. Add it to that fixture to match the new contract; the test was a sibling of the executables / env-and-detection fixtures already updated in `13fc4f4`. Addresses #2355 review (mrcfps, 2026-05-28). 
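The temporal-dead-zone failure behind the `sendAmrAccountFailure` hoist can be reproduced in a few lines. This is a simplified sketch with hypothetical names (`preflightBroken`, `preflightHoisted`, `sendFailure`), not the code from server.ts. It assumes the early branch reached the helper through a closure, which is one way such a use can get past the type checker, and it assumes an ES2015+ compile target where `const` keeps its native semantics.

```typescript
// Broken ordering: the early branch runs before the `const` helper is initialized.
function preflightBroken(signedOut: boolean): string {
  // The closure may mention `sendFailure` before its declaration, but calling it
  // while the binding is still in the temporal dead zone throws at runtime.
  const emitAuthRequired = (): string => sendFailure('AMR_AUTH_REQUIRED');

  if (signedOut) {
    return emitAuthRequired(); // ReferenceError on ES2015+ targets
  }

  const sendFailure = (code: string): string => code;
  return sendFailure('OK');
}

// Fixed ordering: the helper is declared above every branch that needs it.
function preflightHoisted(signedOut: boolean): string {
  const sendFailure = (code: string): string => code;

  if (signedOut) {
    return sendFailure('AMR_AUTH_REQUIRED');
  }
  return sendFailure('OK');
}
```

The broken variant only fails on the signed-out path, which matches the report above: the error hits exactly the users the new branch was meant to help, while every other path keeps working.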
* feat(web): add hover cancel for AMR login (#3158) * feat(web): add hover cancel for AMR login * fix(web): don't bounce AmrLoginPill back to 'Signing in…' after local cancel Both codex-connector (P2) and looper (CHANGES_REQUESTED) on this PR flagged the same race in the new local-cancel path: `handleCancelLogin` dispatches `notifyAmrLoginStatusChanged('login-canceled')` immediately after `/login/cancel` returns, but the `AMR_LOGIN_STATUS_EVENT` listener unconditionally re-enters `refresh()` and then restarts polling whenever `/api/integrations/vela/status` still reports `loginInFlight: true`. That is a real race because the daemon's `cancelVelaLogin()` only sends SIGTERM (escalating to SIGKILL after `LOGIN_CANCEL_KILL_GRACE_MS` = 2000 ms) and keeps the child in `activeLoginProcs` until it actually exits — so the first `/status` read after a successful cancel can legally still come back as in-flight. Under that window the pill flips back to 'Signing in…' and can later surface the timeout/error path even though the user already canceled, defeating the behavior promised in the PR description. Fix the listener instead of every dispatch site: in the `login-canceled` branch, after the local reset (stopPolling + setPending(null) + clear refs), optimistically mark every subscribed pill instance as not-in-flight (`setStatus((c) => c ? { ...c, loginInFlight: false } : c)`) and `return` — skip the refresh-and-reconcile branch below entirely. The next explicit refresh (component mount, user interaction, or a `status-changed` event) will pick up the daemon's confirmed state once the child has actually exited. Add a focused regression test that holds `/api/integrations/vela/status` at `loginInFlight: true` even after a successful `/login/cancel`, asserting that the pill stays at the Canceled → Authorize sequence and never bounces back to 'Signing in…'. 
This test fails on the pre-fix listener and passes on the new behavior; existing 'cancels an in-flight AMR sign-in…' and 'reconciles late AMR browser completion to Signed in after local cancel' tests continue to pass. Addresses review feedback on #3158 (chatgpt-codex-connector, nettee). --------- Co-authored-by: lefarcen <935902669@qq.com> --------- Co-authored-by: a1chzt <chizblank@gmail.com> Co-authored-by: Amy <1184569493@qq.com> Co-authored-by: Mason <jinmeihong0201@gmail.com> Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com> Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-28 05:09:55 +00:00
open-design-bot[bot]	4ddb8f9560	Update docs/assets/github-metrics.svg (#3075 ) Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>	2026-05-27 07:10:46 +00:00
mehmet turac	d70070fcbc	skills: add research decision room (#2949 ) * skills: add research decision room * skills: align research room example contract	2026-05-26 15:01:37 +00:00
Amy	5563e7eca6	test: expand home entry and html preview coverage (#2992 ) * test: cover entry topbar and hero flows * test: expand entry and html preview coverage * test: isolate mocked github stars in home entry e2e Generated-By: looper 0.8.1 (runner=fixer, agent=codex) * chore: retrigger CI for PR 2992	2026-05-26 14:48:35 +00:00
lefarcen	7312c64580	ci(landing): split landing deploy into staging gate + manual production (#2994 ) * ci(landing): split landing deploy into staging gate + manual production A merge to `main` previously published the landing page straight to production (open-design.ai) via `landing-page-deploy`. There was no buffer to review the rendered site, so a bad merge was live instantly. Split deploys across two Cloudflare Pages projects so production is only ever reached by an explicit human action: - `landing-page-staging` (push to main) -> staging project `open-design-landing-staging` -> staging.open-design.ai. - `landing-page-production` (manual workflow_dispatch only) -> production project `open-design-landing` -> open-design.ai. Only this workflow names the production project; gate it with required reviewers on the `production` GitHub environment. - `landing-page-ci` now also deploys a per-PR preview into the staging project (`--branch=pr-<n>`) for same-repo branches and comments the URL. Fork PRs (no secrets / read-only token) skip the deploy and keep just the build validation. Path filters already scope this to landing edits. Decouple search-engine indexing from staging: - `blog-indexing-on-deploy` now triggers on `landing-page-production` (not every main push), so the test environment is never submitted to Google/IndexNow. - It diffs from a new `blog-indexed-prod` tag (the last indexed prod commit) instead of `HEAD^`, and force-advances the tag after a successful run, so a manual promotion bundling several merged posts indexes all of them rather than only the last commit. Staging and PR-preview builds drop `PUBLIC_GA_MEASUREMENT_ID` so test traffic does not pollute the production GA property. * ci(landing): keep staging + PR previews out of the search index staging.open-design.ai mirrors production and is exposed via cert transparency logs, so search engines can discover it. Indexing the mirror competes with open-design.ai for the same content. 
Emit `<meta name="robots" content="noindex, nofollow">` whenever OD_LANDING_NOINDEX=1, and set that flag on the staging and PR-preview builds (production leaves it unset and stays indexable). noindex is used rather than a robots.txt Disallow so crawlers can still fetch the page and read both the tag and the canonical, which already points at the production origin. * fix(landing): make staging noindex actually take effect The previous commit read `process.env.OD_LANDING_NOINDEX` directly in `seo-head.astro`, but `.astro` frontmatter is transformed by Vite and does not see process.env, so the meta never rendered. Two fixes: - Inject the flag as the compile-time constant `__OD_LANDING_NOINDEX__` via `vite.define` in astro.config.ts (config runs in Node and can read process.env); SeoHead consumes that constant. - The homepage (`index.astro`) and `og.astro` build their own <head> and never use SeoHead, so a per-component meta can miss pages. Add an `astro:build:done` integration that appends a catch-all `/* X-Robots-Tag: noindex, nofollow` to the Cloudflare Pages `_headers` on staging/preview builds, covering every response (homepage, assets, any custom-head page) at the HTTP layer. Production builds leave `_headers` untouched. Verified: build with OD_LANDING_NOINDEX=1 emits the _headers block and the SeoHead <meta>; build without the flag emits neither; astro check clean. * fix(landing): address review — pin prod checkout to main, defer index pointer Two blockers from review: - landing-page-production: workflow_dispatch can be launched from any ref via the Actions "Use workflow from" dropdown, so an operator could ship an arbitrary branch to open-design.ai. Pin the checkout to `ref: main` so the deployed artifact always equals reviewed main. - blog-indexing-on-deploy: the `blog-indexed-prod` pointer was advanced right after sitemap submission, before Inspect / Search Analytics / Render status / Open status PR. 
A failure in any of those still moved the pointer, so the next production run skipped those posts. Move the advance to the very end, gated on `success()`, so a failure leaves the tag in place and the range is re-processed next run (submissions are idempotent). * fix(landing): gate production promotion to the main ref only Follow-up to the production-path review note: pinning checkout to main fixed the deployed content, but the workflow was still dispatchable from any ref, which records a non-main production run and would dodge blog-indexing's `workflow_run` `branches: [main]` filter. Gate the whole job on `github.ref == 'refs/heads/main'` so a dispatch from any other branch/tag is skipped outright.	2026-05-26 14:05:04 +00:00


208 Commits