Commit Graph

208 Commits

Author SHA1 Message Date
Aadi Jai Gupta
49a1c7bd79 fix daemon opencode headless permissions (#4957) 2026-07-02 08:49:44 +00:00
Amy
56c410e9ef [codex] strengthen daemon diagnostics coverage (#4948)
* test: expand automation coverage gaps

* test: cover additional automation gaps

* test: strengthen daemon diagnostics coverage

* fix(e2e): align media provider project modal helper
2026-06-30 16:25:17 +00:00
kokisanai
54de349c47 fix: replace expired Discord invite (#4849)
Co-authored-by: koki yanlai xu <koki@kokideMacBook-Air.local>
2026-06-29 15:00:45 +00:00
open-design-bot[bot]
dbd280610e docs(readme): refresh contributors wall (#4832)
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-06-29 07:16:33 +00:00
xne998808-ai
23704a9d54 feat(landing-page): add 5 design-agent alternative comparison pages (Stitch, Genspark, Figma Make, Qoder, Trae) (#4853)
* feat(landing-page): add 5 design-agent alternative comparison pages

Add /alternatives/{stitch,genspark,figma-make,qoder,trae}/ pages comparing
Open Design with five AI design/coding tools. Each page carries deeply
researched, fact-specific copy (pricing, usage limits, lock-in, honest
trade-offs), real competitor screenshots plus current Open Design product
screenshots, a feature table, and a decision matrix.

Rebuild the shared alternative-detail renderer onto an editorial band layout
(full-width alternating bands, one unified content column, dark closing CTA
band) so every /alternatives/ page is visually consistent. Render list and
table block text through set:html so inline <b> emphasis works.

Localise all five pages into the 11 landing locales (machine-translated then
native-audited) and re-shard into alternatives-i18n.part-*. Wire the new slugs
into the /compare/ hub and the footer. Save the reusable current Open Design
UI screenshots under docs/screenshots/current-2026-06/.

* fix(landing-page): address review on alternatives layout

- clip page-level horizontal overflow on .v4 pages so the 100vw full-bleed
  shell no longer jitters when a vertical scrollbar is present
- port the FAQ accordion refinements (border, padding, type scale) under .v4
  using the landing token set, since the rich wrapper moved off .solution-page

---------

Co-authored-by: Joey <276262049+xne998808-ai@users.noreply.github.com>
2026-06-28 13:17:43 +00:00
open-design-bot[bot]
59214704f6 docs(readme): refresh contributors wall (#4719)
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-06-26 13:55:47 +00:00
PerishFire
273bacfc78 [codex] Add design-system tracking analytics (#4783)
* feat(analytics): design-system tracking foundation (project_kind, entry source, brand picker)

Implements the first slices of the design-system tracking spec:

- §1 project_kind: brand-extraction backing projects now report
  project_kind=design_system so DS-project runs (creation + edits)
  drill down cleanly instead of collapsing into 'brand'.
- U3/U4: report design_system_source from the real selection source
  (request->user_selected, plugin->template_inherited, project->
  project_saved, app-default->default) instead of the hard-wired
  unknown/not_applicable; add design_system_kind (official/custom,
  derived from the user:<id> shape) and design_system_slug (official).
- §3.1 create entry source: thread the real entry (onboarding /
  design_systems_page / composer_picker / home_card / library /
  project_canvas) from each navigate() call site into the create
  page_view and create_result, replacing the binary heuristic.
- C7-C9 preset-brand picker: instrument the "Start from a brand"
  trigger click, picker surface_view, and brand_pick with
  preset_brand_category (never the domain).
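
A minimal sketch of the U3/U4 mapping described above, for illustration only. The string values come from this commit message; the type and function names (`SelectionSource`, `designSystemSource`, `designSystemKind`) are hypothetical and not taken from the repository.

```typescript
// Hypothetical names; only the string literals are taken from the commit message.
type SelectionSource = "request" | "plugin" | "project" | "app-default";
type DesignSystemSource =
  | "user_selected"
  | "template_inherited"
  | "project_saved"
  | "default";

// request->user_selected, plugin->template_inherited,
// project->project_saved, app-default->default
const SOURCE_MAP: Record<SelectionSource, DesignSystemSource> = {
  request: "user_selected",
  plugin: "template_inherited",
  project: "project_saved",
  "app-default": "default",
};

function designSystemSource(selection: SelectionSource): DesignSystemSource {
  return SOURCE_MAP[selection];
}

// Assumption: a custom design system id has the shape "user:<id>";
// anything else is treated as official.
function designSystemKind(designSystemId: string): "official" | "custom" {
  return designSystemId.indexOf("user:") === 0 ? "custom" : "official";
}
```

Keeping the mapping in a single typed table means a new selection source fails to compile until it is given a reported value.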

Contracts: add TrackingDesignSystemKind, TrackingDesignSystemEditSurface,
DesignSystemsPresetBrandPicker{Click,SurfaceView}Props; extend
RunCreatedProps with design_system_kind/slug/edit_surface and both
entry_from enums with composer_picker/project_canvas/library.

Adds docs/design-system-tracking-spec.md.

Verified: contracts build + 170 tests, web typecheck + full suite (3463),
daemon typecheck + run-analytics/brand-extraction tests (70).

* feat(analytics): track AI-optimize (design_system_enrich) click

C13 (tracking spec §3.3): instrument the "AI Optimize" banner CTA on a
programmatically-extracted DS-as-project so the AI-conversion rate
(clicked ÷ programmatic creates) is visible.

Contracts: add design_system_enrich area, DesignSystemEnrichClickProps,
DesignSystemEnrichResultProps, and the design_system_enrich_result event
name (result event reserved for the C14 follow-up).

Adds an implementation-status table to the spec. C14 (enrich_result) and
C15 (ProjectMetadata.enrichmentStatus) remain follow-ups since they need
correlation with the async enrichment run's completion.

Verified: contracts build, web typecheck, brand-enrichment + analytics
tests (89).

* feat(analytics): report all design-system create sources (ds_source_origins)

Comment ②: multi-source creates were flattened to a single `mixed`
design_system_source, hiding which sources combined. Add a comma-joined
`ds_source_origins` (e.g. `source_url,local_code`) on create_result that
lists every source actually used, keeping the singular field for
back-compat. Mirrors the target_platforms/connectors multi-value convention.
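
As a rough illustration of the multi-value convention, the sketch below derives both fields from the list of sources used. The function name `summarizeSources` and the `"unknown"` fallback for an empty list are assumptions made for this example, not the repository's code.

```typescript
// Hypothetical helper; field names follow the commit message.
interface SourceSummary {
  design_system_source: string; // singular field kept for back-compat
  ds_source_origins: string; // comma-joined list of every source used
}

function summarizeSources(used: readonly string[]): SourceSummary {
  // De-duplicate while preserving first-seen order.
  const unique = used.filter((s, i) => used.indexOf(s) === i);
  let single: string;
  if (unique.length > 1) {
    single = "mixed";
  } else if (unique.length === 1) {
    single = unique[0];
  } else {
    single = "unknown"; // assumption: fallback when nothing was recorded
  }
  return { design_system_source: single, ds_source_origins: unique.join(",") };
}

const example = summarizeSources(["source_url", "local_code"]);
// example.design_system_source is "mixed"
// example.ds_source_origins is "source_url,local_code"
```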

Verified: contracts build, web typecheck.

* feat(analytics): distinguish onboarding build-DS vs skip-to-home (C2)

C2 (tracking spec §3.1): the final onboarding step forks into "Build a
design system" and going straight Home, but both reported
completion_type=completed_without_design_system, so the skip rate was
invisible. Parameterize runOnboardingCompletion so the build fork reports
completed_with_design_system and the home forks report
completed_without_design_system. Update the EntryShell onboarding test,
which clicked "Build a design system" yet asserted the without value.

Verified: web typecheck, EntryShell onboarding tests (24).

* feat(analytics): instrument design-system edit lifecycle (E1-E3)

Tracking spec §3.4/§3.6:

- E3 (direct module edits, ui_click area=design_system_edit, all carrying
  edit_surface=direct_module + artifact_kind=design_system + the DS id):
  DesignSystemsTab general ops (edit-with-agent / refresh / download / reset),
  DesignKitView module buttons via the parent handlers (logo/font/image
  upload incl. paste, color edit, logo/image delete, design_md edit-save),
  and the BrandPreviewCard three actions (use-in-chat / open-project / delete).
- E1 (agent-routed edits): a DS-project run editing an EXISTING design system
  now carries edit_surface — comment/mark from their entry_from, otherwise
  chat. First-generation runs get none.

Contracts: DesignSystemEditClickProps (+ union); reuses the
TrackingDesignSystemEditSurface added earlier and RunCreatedProps.edit_surface.

Not wired (enum reserved / no handler today): kit_import, kit_open (external
link in the stateless kit view), design_md_copy/upload.

Verified: contracts build; web + daemon typecheck; DS/Brand/Kit web tests
(89) + daemon run-analytics tests (44).

* feat(analytics): AI-optimize result + enrichment status (C14/C15)

Closes the programmatic-vs-ai_refined comparison gap (tracking spec §6):

- C14: the AI-optimize run is tagged (analyticsHints.dsEnrichment) from the
  ProjectView banner CTA; when it settles, the daemon emits
  design_system_enrich_result (result / design_system_id / project_id /
  run_id / duration_ms / error_code).
- C15: on a successful enrichment run the daemon flags the backing project
  metadata enrichmentStatus='ai_refined' + enrichmentCompletedAt, so the two
  DS cohorts can be split for retention/usage analysis.

Contracts: ProjectMetadata.enrichmentStatus/enrichmentCompletedAt; reuses
the design_system_enrich_result event + DesignSystemEnrichResultProps added
earlier. ProjectChatSendMeta gains dsEnrichment.

Verified: contracts build; web + daemon typecheck; daemon run-analytics +
brand-extraction tests (70); web brand-enrichment tests.

* fix(contracts): declare dsEnrichment hint in ChatAnalyticsHints

The daemon run route reads analyticsHints.dsEnrichment and ProjectView's
AI-optimize path sends it, but the shared ChatAnalyticsHints DTO did not
declare the flag — leaving the web/daemon HTTP shape ahead of the contract
layer so typed ChatRequest callers could not discover or type-check the hint
that drives design_system_enrich_result and the ai_refined metadata stamp.

Declare dsEnrichment?: boolean with analytics-only semantics and add a
request-shape test that compile-fails if the field is dropped again.
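
One way such a compile-time guard can look, sketched with a simplified stand-in for the DTO. The real `ChatAnalyticsHints` has more fields, and the `HasKey` helper and variable names here are hypothetical.

```typescript
// Simplified stand-in for the shared DTO; the real interface has more fields.
interface ChatAnalyticsHints {
  /** Analytics-only flag marking the run as an AI-optimize enrichment run. */
  dsEnrichment?: boolean;
}

// Resolves to `true` only while K is a declared key of T.
type HasKey<T, K extends PropertyKey> = K extends keyof T ? true : false;

// If `dsEnrichment` is dropped from ChatAnalyticsHints, this type becomes
// `false` and the assignment below no longer compiles.
const hintIsDeclared: HasKey<ChatAnalyticsHints, "dsEnrichment"> = true;

// Typed callers can now set the hint and have it type-checked.
const hints: ChatAnalyticsHints = { dsEnrichment: true };
```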

Addresses review feedback on PR #4740.

* chore: ignore .playwright-mcp capture output

Local Playwright MCP runs write YAML page snapshots and PNG screenshots to
.playwright-mcp/. These are ad-hoc local visual-check output (including local
UI/session state), not a maintained test fixture, and per the root AGENTS.md
local runtime data must stay out of git. Add the ignore rule so the directory
is never accidentally staged again.

* fix: repair AI optimize enrichment tracking

Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex)

* fix: track design kit module edit clicks

Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex)

* fix: stabilize project P0 catalog loading

Stub unrelated catalog endpoints in the scoped project workspace and runtime Playwright flows so the P0 checks assert their target behavior instead of waiting on large registry responses.

Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex)

* fix: guard design-system tracking repairs

Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex)

---------

Co-authored-by: free666799 <293857035+free666799@users.noreply.github.com>
Co-authored-by: Looper <looper@noreply.github.com>
2026-06-26 02:57:18 +00:00
Sid
48695090f8 Make WSL agent setup unambiguous for MCP installs (#4655)
Windows users running agent CLIs inside WSL were landing on the native Windows guide and copying the README MCP command directly, which left Linux /usr/bin/od shadowing, daemon origin, Node ABI, and Codex config parse failures unexplained. Add a WSL2 guide and teach the Codex config normalizer to drop nested feature tables that current Codex parses as maps instead of boolean flags.
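
A rough sketch of the normalizer idea, assuming the Codex config has already been parsed into a plain object with a `features` section. The function name `dropNestedFeatureTables` and the sample keys are hypothetical; the actual normalizer in the daemon may work differently.

```typescript
// Hypothetical sketch: keep scalar feature flags, drop nested tables
// (which a parser would surface as plain objects).
function dropNestedFeatureTables(
  features: Record<string, unknown>
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(features)) {
    const value = features[key];
    const isTable =
      typeof value === "object" && value !== null && !Array.isArray(value);
    if (!isTable) {
      out[key] = value;
    }
  }
  return out;
}

// Made-up keys, for illustration only.
const normalized = dropNestedFeatureTables({
  some_flag: true,
  nested_feature: { enabled: true },
});
// normalized keeps `some_flag` and omits `nested_feature`
```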

Constraint: WSL agent CLIs need the same Linux environment for the wrapper, daemon, Node modules, and credentials

Rejected: Docs-only workaround | would leave daemon-launched Codex runs failing on nested features tables

Confidence: high

Scope-risk: narrow

Directive: Keep WSL-specific guidance separate from native Windows PowerShell troubleshooting

Tested: pnpm --filter @open-design/daemon exec vitest run -c vitest.config.ts tests/codex-config-normalize.test.ts

Tested: pnpm --filter @open-design/daemon typecheck

Tested: pnpm guard

Tested: git diff --check

Not-tested: Manual WSL2 end-to-end MCP install on a Windows host

Related: Fixes #4648
2026-06-23 09:56:22 +00:00
open-design-bot[bot]
f94e887d65 docs(readme): refresh contributors wall (#4653)
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-06-23 08:55:14 +00:00
PerishFire
0bf1b6d6b8 [codex] converge release workflows and stable dry-runs (#4390)
* fix(tools-pack): use junctions for Windows standalone peer deps

* fix(desktop): expose IPC during startup

* fix(tools-pack): preserve Windows inspect diagnostics

* fix(tools-pack): report Windows inspect status errors

* fix(packaged): use Electron net fetch for app protocol

* fix(packaged): load Windows renderer from web sidecar

* fix(desktop): show Windows packaged window during startup

* fix(packaged): disable Windows GPU startup

* fix(tools-pack): keep Windows core smoke observable

* fix(packaged): remove Windows startup probes

* fix(tools-pack): trace Windows desktop IPC status

* fix(tools-pack): add Windows IPC diagnose loop

* fix(release): default beta-s Windows updater feed

* chore: clean merged test eof

* refactor(release): unify prerelease channel model

* chore(release): close prerelease doc escape hatches

* refactor(release): converge release channel workflows

* fix(release): install toolchain in metadata jobs

* fix(release): build release package before contracts

* chore(release): bump development version to 0.10.1

* fix(e2e): seed windows packaged smoke runtime config

* fix(release): install toolchain for metadata publish

* fix(release): materialize betas metadata checkout

* chore(release): bump development version to 0.10.2

* fix(release): allow betas metadata cold start from s3

* fix(e2e): support betas packaged update scenarios

* fix(release): pass betas channel into packaged smoke

* fix(release): set betas channel during self-hosted builds

* fix(release): verify counted channel reservations

* fix(release): use pnpm cmd for betas windows publish

* fix(release): add betas manifest artifact fallback

* fix(release): skip beta-s public metadata fetch

* fix(release): read beta-s manifests from storage

* fix(release): cache beta windows tools-pack builds

* fix(release): inline beta mac tools-pack builds

* fix(pack): deep sign unsigned mac bundles

* docs(pack): document payload-first beta updater validation

* fix(release): align preview tools-pack cache flow

* fix(release): align prerelease tools-pack cache flow

* fix(release): pass github token to prerelease metadata

* fix(release): setup pnpm before feishu notify

* fix(release): add stable dry-run prepublish flow

* fix(release): accept completed prerelease metadata gate

* fix(release): require stable release branches

* fix(release): converge r2 access checks

* fix(updater): use release channel parser for defaults

* fix(updater): harden windows payload relaunch

* fix(release): converge updater smoke fixture contract

* test(e2e): require silent updater fixture output

* fix(release): align stable windows smoke build path

* fix(ci): include release workspace in validation

* fix(ci): repair release validation lanes

Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex)

* fix(ci): restore zero-install Feishu notification

Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex)

---------

Co-authored-by: Looper <looper@noreply.github.com>
2026-06-23 06:13:21 +00:00
PerishFire
fa544f2836 [codex] Prune unused automation and repair metrics publishing (#4612)
* chore(workflows): prune unused automation

* chore(workflows): update github app token action

* chore(workflows): use github app client id
2026-06-23 04:01:49 +00:00
xne998808-ai
91e5207d72 docs(readme): fix 404 agent-install command, use od mcp install <agent> (#4649)
The "install into your coding agent" sections pointed users at
`curl -fsSL https://open-design.ai/install.sh | sh -s <agent>`, but that
URL returns 404 — the bootstrap script was never published, and a
standalone `od` CLI is not yet distributed. Every README (English + 12
i18n) shipped a copy-paste command that fails.

Replace it with the real, shipped command `od mcp install <agent>`,
matching the Platform Compatibility table already present in each file
("Once OD is installed, a single `od mcp install <agent>` ..."). The
surrounding "one-line install" comment stays accurate and the
install-OD-first prerequisite is covered by that table.

Co-authored-by: Joey <276262049+xne998808-ai@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 00:19:59 +00:00
PerishFire
d3ebed564c [codex] Update Open Design X account links (#4611)
* fix: update Open Design X account links

* test(web): assert settings X link target

Generated-By: looper 0.9.10+codex.autoclean (runner=fixer, agent=codex)

---------

Co-authored-by: Siri-Ray <2667192167@qq.com>
Co-authored-by: Looper <looper@noreply.github.com>
2026-06-22 09:04:49 +00:00
Nagendhra Madishetti
37d52288b0 docs(windows): explain the SmartScreen 'unknown publisher' installer warning (#4554)
New Windows users hit a blue Defender SmartScreen dialog the first time they
run the packaged installer, with Run anyway hidden behind More info. The
Windows troubleshooting guide only covered the dev/source setup, so there was
no answer for the very first thing an end user sees. Add a section that
explains why the warning appears (unsigned build, not a threat), how to
proceed (More info then Run anyway), and how to verify the download source and
SHA-256 checksum first.

Co-authored-by: Nagendhra <nagendhra405@gmail.com>
2026-06-22 05:18:39 +00:00
open-design-bot[bot]
c958cdb7ae docs(readme): refresh contributors wall (#4560)
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-06-22 05:18:39 +00:00
Marc Chan
a0412c12e6 fix(docker): add opt-in API auth disable flag (#4541)
* fix(docker): add opt-in API auth disable flag

Generated-By: looper 0.9.10 (runner=worker, agent=opencode)

* docs(docker): add .env setup to beginner guide

Generated-By: looper 0.9.10 (runner=fixer, agent=opencode)
2026-06-19 03:24:14 +00:00
Carson Yang
11fb1ad6f0 docs: add Sealos deployment option (#4472)
* docs: add Sealos deployment option

* docs: update Sealos deploy slug

* docs: point Sealos badge to app store page

* docs: add Sealos deploy section to localized READMEs

* docs: clarify Sealos authentication guidance
2026-06-18 06:38:10 +00:00
PerishFire
c782aeb3bb ci: stop duplicate post-merge validation (#4469) 2026-06-17 10:04:03 +00:00
kokisanai
b02d20e6b9 fix: unify Discord invite links (#4452)
Co-authored-by: koki yanlai xu <koki@kokideMacBook-Air.local>
2026-06-17 08:25:16 +00:00
Amy
8ff19c6b41 [codex] Add main-risk smoke and E2E coverage (#4322)
* test(e2e): cover daemon reload artifact recovery

* test(e2e): expand critical flow coverage

* test(daemon): add startup and recovery smoke coverage

* test(e2e): add visual and desktop chrome smoke coverage

* ci: gate packaged onboarding smoke to nightly

* fix(web): repair artifact replay type narrowing

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(ci): repair preflight smoke coverage checks

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(web): tighten artifact recovery polling

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(web): keep error recovery retries alive

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(e2e): relax brittle UI P0 assertions

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(web): recover pointer artifact outputs

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(e2e): stabilize manual edit reload smoke

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(e2e): avoid noisy win onboarding cleanup

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(web): stop failed artifact recovery polling

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(daemon): isolate startup smoke data root

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(web): recover standalone html artifacts

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(web): guard pointer artifact recovery by run time

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(ci): narrow strict visual gate to critical smoke

Generated-By: looper 0.9.7 (runner=fixer, agent=codex)

* fix(ci): limit PR visual capture to critical cases

* fix(ci): keep PR visual capture complete

* fix(web): keep reload artifact recovery retryable

* fix(ci): clarify PR visual gate layering

* fix(e2e): stabilize visual home P2 capture

* fix(e2e): align visual gallery detail clicks

* fix(e2e): preserve packaged install during onboarding cleanup

* fix(ci): shard PR visual capture

* fix(ci): remove duplicate nix validation from ci

* fix(ci): keep visual capture screenshots flat

* fix(ci): point nix hash autofix at ci-nix

* fix(ci): stabilize PR P0 recovery gate

* fix(web): ignore stale artifact pointers during recovery

* fix(web): reset BYOK model after clearing key

* ci: run critical UI P0 subset on PRs

* ci: combine PR UI P0 critical shards

* ci: combine PR visual capture shards

* ci: preserve visual capture shard manifests

* fix(web): clear ProjectView delayed timers on unmount

* test(e2e): harden onboarding connect gate coverage

* test(e2e): align AMR auth recovery and visual catalog timeout
2026-06-17 16:01:53 +08:00
Marc Chan
7abb7888df fix(deploy): align Docker defaults with GHCR releases (#4327)
* fix(deploy): align Docker defaults with GHCR releases

Generated-By: looper 0.9.9 (runner=worker, agent=opencode)

* fix(ci): publish stable Docker tags from release workflow

Generated-By: looper 0.9.9 (runner=fixer, agent=opencode)

* fix(ci): fold reusable workflow guard expression

Generated-By: looper 0.9.9 (runner=fixer, agent=opencode)

* fix(ci): gate Docker release publish

Generated-By: looper 0.9.9 (runner=fixer, agent=opencode)

* fix(ci): publish stable Docker tags after release

Generated-By: looper 0.9.9 (runner=fixer, agent=opencode)

* fix(ci): guard Docker latest tag enable expression

Generated-By: looper 0.9.9 (runner=fixer, agent=opencode)

* fix(deploy): update Helm chart GHCR defaults

Generated-By: looper 0.9.9 (runner=fixer, agent=opencode)

* fix(ci): publish latest from release workflow inputs

Generated-By: looper 0.9.9 (runner=fixer, agent=opencode)
2026-06-17 12:25:18 +08:00
PerishFire
a0afc584bb [codex] centralize daemon data directory docs (#4222)
* docs: centralize daemon data directory contract

* fix(e2e): allow slower artifact consistency navigation

Generated-By: looper 0.9.5 (runner=fixer, agent=codex)

* docs: localize daemon data directory pointers

Generated-By: looper 0.9.5 (runner=fixer, agent=codex)

---------

Co-authored-by: Looper <looper@noreply.github.com>
2026-06-15 02:52:05 +00:00
Denis Redozubov
4b3bf91f27 Model orchestrator scratch workspaces (#4263)
* Model orchestrator scratch workspaces

* Address scratch workspace contract review
2026-06-14 09:14:30 +00:00
Lucas Bento
d0cc28d339 feat(daemon): add Amp CLI as a coding-agent adapter (#3861)
* feat(daemon): add Amp CLI as a coding-agent adapter

Amp (ampcode.com) runs headless via `amp -x --stream-json`, which emits
the Claude Code-compatible stream JSON format the daemon already parses,
with the prompt delivered over stdin. Add it as a coding-agent adapter
reusing the claude-stream-json parser; the model picker maps to Amp's
agent --mode (smart/deep/rush).

- new ampAgentDef + registry entry + install metadata
- web agent label/alias
- replay mock (renders via the claude renderer)
- docs adapter-catalog row + README CLI count (21 -> 22)
- adapter buildArgs/stream-format test

* fix(daemon): order Amp adapter at P2, not first-run default

Move ampAgentDef next to copilot (the other P2 local headless CLI adapter)
instead of the head of BASE_AGENT_DEFS. The daemon preserves AGENT_DEFS
order in /api/agents and the web app fills an empty config with
agents.find((a) => a.available), so listing Amp first made it the default
engine ahead of AMR/Claude/Codex for any user with amp on PATH and no
persisted agentId — inconsistent with its P2 classification. Reordering
restores the prior first-run default. Addresses PR review feedback.
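
The effect of list order on the first-run default can be shown with a small sketch. Only the `agents.find((a) => a.available)` expression comes from this commit message; the `AgentInfo` shape, the `pickDefaultAgent` helper, and the agent ids are hypothetical.

```typescript
// Hypothetical shape of an entry returned by /api/agents.
interface AgentInfo {
  id: string;
  available: boolean;
}

// With no persisted agentId, the first available agent in list order wins.
function pickDefaultAgent(
  agents: readonly AgentInfo[],
  persistedAgentId?: string
): string | undefined {
  if (persistedAgentId) return persistedAgentId;
  const first = agents.find((a) => a.available);
  return first ? first.id : undefined;
}

// Amp at the head of the list becomes the default for anyone with amp on PATH.
const ampFirst: AgentInfo[] = [
  { id: "amp", available: true },
  { id: "claude", available: true },
];

// Amp placed next to the other P2 adapter leaves the earlier entry as default.
const ampAtP2: AgentInfo[] = [
  { id: "claude", available: true },
  { id: "copilot", available: false },
  { id: "amp", available: true },
];

const before = pickDefaultAgent(ampFirst); // "amp"
const after = pickDefaultAgent(ampAtP2); // "claude"
```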

---------

Co-authored-by: Siri-Ray <2667192167@qq.com>
2026-06-11 15:17:07 +00:00
elihahah666
1a10fa2600 docs(readme): announce 0.10.0 and add a permanent AMR entry across all READMEs (#4154)
* docs(readme): announce 0.10.0 and add AMR entry across all READMEs

* docs(readme): link Gemini CLI and Kimi CLI rows in agent compatibility table

* docs(readme): expand AMR acronym to Open Design Model Router in the AMR blockquote

* docs(readme): rename nav AMR link to Model Router and correct acronym to Agentic Model Router

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-06-11 14:37:53 +00:00
lefarcen
6869b1208b fix(plugins-home): correct deck/scroll preview capture + smoother gallery playback (#4044)
* fix(bake): classify deck-vs-scroll by viewport, real-wheel pan, motion config

Systematic audit of all 126 baked previews surfaced three capture bugs:

- 2 vertical pages misread as decks (the input probe wheel-scrolled them and the
  scroll-driven animation looked like a slide change), so they got walked
  sideways. Classify by viewport height instead: a fixed-viewport page is a deck,
  a vertically-scrollable page is a landing page (pan it) even with a horizontal
  marquee/carousel sub-component.
- 9 scroll-hijack landing pages (custom/transform scroll) that window.scrollTo
  can't move, so the pan was static. Pan those with REAL wheel events
  (page.mouse.wheel), which drive the page's own scroll handler.
- single-screen pages now hold (static) instead of being forced down a deck path.

Plus an opt-in override: authors can declare od.preview.motion ('scroll' | 'deck'
| 'static') and the bake honors it, auto-detecting only when it's absent. Schema
+ plugins-spec document the field. (Also strips a stray NUL byte from the hash
line that made the file read as binary.) BAKE_VERSION -> 4 re-bakes everything.

* perf(plugins-home): only decode visible gallery clips + stream first frame

Two cheap wins for the baked gallery videos:

- Decouple mount from play. The tile mounts the <video> across the wider inView
  margin (so scroll-in/hover never remounts + reloads), but only PLAYS while
  truly visible — off-screen tiles in the mount margin hold their poster frame
  paused instead of all running a simultaneous decode. Adds a 0-margin visible
  observer in PreviewSurface alongside the existing near one.
- preload=metadata instead of auto: paints the first frame off the +faststart
  header instead of eagerly buffering the whole clip up front, so tiles show fast
  and don't saturate the network. The idle hold buffers the pan before hover.

* perf(plugins-home): keep baked clips mounted across a scroll window

Scrolling a tile out of view and back re-showed a load even though the clip
bytes are HTTP-cached (R2 immutable): the <video> unmounted at the tight 120px
margin, so scroll-back remounted a fresh element that re-fetches metadata and
re-decodes the first frame. Add a wide keep-mounted observer (~1500/1800px) so a
clip stays mounted for a few screens — instant scroll-back — while iframes keep
the tight margin and play stays gated to the truly-visible zone (paused, not
unmounted, off screen).

* fix(bake,contracts): probe scroll mechanism before recording; validate motion

Address review:
- Move the window.scrollTo probe before Page.startScreencast so its scrollTo
  160 -> 0 jump isn't baked into the head of the pan as a visible lurch.
- Type od.preview.motion in the Zod PluginManifestSchema (enum scroll|deck|
  static) so an invalid value fails doctor/install instead of silently parsing
  via passthrough and being ignored by the bake; add contract test coverage.

* fix(bake): auto-detect single-screen fixed pages as static, not deck

A fixed-viewport page is only a deck if an input actually advances it; probe the
driver during auto-detect and fall back to 'static' (default viewport + a hold)
when nothing moves it, instead of routing every non-scrollable page through the
deck path where walkSlides(null) just held at the deck-sized capture. Extracted
the arrow/wheel probe into probeDeckDriver(). Verified: a waitlist page now bakes
a 2.5s static hold, guizang still walks, acreage still pans.
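
Taken together, the motion override and the auto-detection rules in this PR can be summarized in a short sketch. The `PageProbe` shape and `resolveMotion` name are hypothetical, and the sketch leaves out the scroll-hijack case, where the pan is driven by real wheel events.

```typescript
type PreviewMotion = "scroll" | "deck" | "static";

// Hypothetical summary of what the bake learns about a page before recording.
interface PageProbe {
  declaredMotion?: PreviewMotion; // od.preview.motion, when the author sets it
  viewportHeight: number;
  scrollHeight: number;
  inputAdvances: boolean; // did an arrow or wheel probe change the page?
}

function resolveMotion(probe: PageProbe): PreviewMotion {
  // An explicit declaration wins; auto-detect only when it is absent.
  if (probe.declaredMotion) return probe.declaredMotion;
  // A vertically scrollable page is a landing page: pan it.
  if (probe.scrollHeight > probe.viewportHeight) return "scroll";
  // A fixed-viewport page is a deck only if an input actually advances it.
  return probe.inputAdvances ? "deck" : "static";
}
```

Under these assumptions, a single-screen waitlist page that ignores input resolves to `static`, while a fixed-viewport slide deck that responds to arrow keys resolves to `deck`.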

---------

Co-authored-by: audit <a@b.c>
2026-06-10 05:43:50 +00:00
yinjialu
fcef49b342 feat: add AI Native observability trace diagnostics (#3714)
* docs: spec ai native observability loop

* docs: clarify observability loop gates

Generated-By: looper 0.9.3 (runner=fixer, agent=codex)

* docs: include evaluation runs in score model

Generated-By: looper 0.9.3 (runner=fixer, agent=codex)

* docs: broaden experiment score eligibility

Generated-By: looper 0.9.3 (runner=fixer, agent=codex)

* docs: define observability artifact storage boundary

* docs: expand observability registry rollout

* docs: complete artifact manifest example

Generated-By: looper 0.9.3 (runner=fixer, agent=codex)

* docs: align observability manifest and scoring examples

Generated-By: looper 0.9.3 (runner=fixer, agent=codex)

* docs: define fixed dataset trust gate

* feat: add ai observability trace diagnostics

* docs: clarify object storage issue compatibility

* fix: tighten Langfuse event provenance and attachment caps

Generated-By: looper 0.9.3 (runner=fixer, agent=codex)
2026-06-08 03:43:43 +00:00
chaoxiaoche
eabed76a4c docs: plan design system 2.0 backfill (#3776)
Co-authored-by: chaoxiaoche <chaoxiaoche@chaoxiaochedeMacBook-Pro.local>
2026-06-06 08:02:31 +00:00
elihahah666
67077fd36f chore(docs): move translated docs into docs/i18n/ (#3621)
* chore(docs): move translated docs into docs/i18n/

Collect the translated README/QUICKSTART/CONTRIBUTING/MAINTAINERS files
(including the Korean set) into docs/i18n/, leaving only the English sources
in the repo root so the GitHub project home page file list stays clean.
Rewrite internal links for the new layout (../../ for repo-root resources,
sibling filenames between translations), update both switcher conventions,
the i18n-check mixed-layout support, the contributors-wall workflow globs,
TRANSLATIONS.md guidance, and drop now-dead root translation paths from the
fork-PR docs allowlist.

* fix(docs): correct root-relative links in Korean contribution guide

Prefix repo-root targets (scripts/sync-design-systems.ts, TRANSLATIONS.md,
package.json, .github/pull_request_template.md) with ../../ so they resolve
from the new docs/i18n/ depth; sibling translated docs stay bare.

Generated-By: looper 0.8.1 (runner=fixer, agent=claude-code)

---------

Co-authored-by: qiongyu1999 <2694684348@qq.com>
2026-06-04 06:55:16 +00:00
Vladyslav Ovdeychuk
77fee5fe42 docs: document macOS Docker host networking workaround (#3417)
Co-authored-by: Vladislav Ovdeychuk <ovdeychuk@trueconf.ru>
Co-authored-by: Siri-Ray <2667192167@qq.com>
2026-06-04 04:04:32 +00:00
Amy
8fefebbbaf test(e2e): add priority tiers and main UI alerts (#3574)
* test(e2e): add priority tiers and stabilize p0 coverage

* test(e2e): align restoration artifact reopen path

* test(e2e): stabilize p1 workspace flows

* ci(e2e): run extended UI on main and notify failures

* fix(e2e): repair priority preflight checks

Generated-By: looper 0.9.2 (runner=fixer, agent=codex)

* fix(e2e): restore AMR login pill coverage

Generated-By: looper 0.9.2 (runner=fixer, agent=codex)

* fix(e2e): add full UI test script

Generated-By: looper 0.9.2 (runner=fixer, agent=codex)

* fix(e2e): remove disallowed ui script

Generated-By: looper 0.9.2 (runner=fixer, agent=codex)

* fix(e2e): allow full UI script in guard

Generated-By: looper 0.9.2 (runner=fixer, agent=codex)
2026-06-03 12:08:06 +00:00
yinjialu
c848e53b6c docs: plan run reliability optimizations (#3526) 2026-06-03 09:33:21 +00:00
PerishFire
98767fb302 chore: move safe large assets to R2 (#3503)
* chore: tighten blob guard to 2MiB

* chore: move safe 1MiB assets to R2
2026-06-02 11:10:07 +00:00
PerishFire
c3c356961b chore: move large repository assets to r2 (#3492) 2026-06-02 10:19:12 +00:00
Shivam
3f165b5498 docs(deploy): add Azure Container Instances guide (#3163)
* docs(deploy): add Azure Container Instances guide

* docs(deploy): clarify Azure proxy topology

* docs(deploy): keep Azure proxy streams unbuffered

---------

Co-authored-by: Shivam <shivam2931120@users.noreply.github.com>
2026-06-02 08:14:21 +00:00
Denis Redozubov
3da33f92a1 Harden sandbox orchestration daemon chokepoints (#3420)
* Harden sandbox orchestration chokepoints

* Cover web app public copy in neutrality guard
2026-06-02 07:33:12 +00:00
Dhakshin V
484ec7c664 docs: add Alibaba Cloud (阿里云) deployment guide (#3275)
* docs: add Alibaba Cloud (阿里云) deployment guide

Adds docs/deployment/cloud/aliyun.md with:
- ECS single-machine deployment using the existing Docker Compose stack
- ACK (Kubernetes) reference manifest and multi-replica caveats
- Image acceleration setup via Alibaba Cloud Container Registry
- ICP filing (备案) overview for mainland China public hosting
- Common pitfalls and references to existing Docker / install-guide docs

Docs-only slice for #1025. Live ROS templates, one-click scripts, and
verification screenshots are out of scope here and tracked as follow-up
work in the issue.

* docs: fix ACK manifest reachability and OD_ALLOWED_ORIGINS alias

Address PerishCode review on PR #3275:

- The daemon defaults to OD_BIND_HOST=127.0.0.1 (apps/daemon/src/server.ts),
  so the readinessProbe and ClusterIP Service in the previous manifest could
  never reach the Pod. Add OD_BIND_HOST=0.0.0.0 + OD_API_TOKEN (required by
  the bound-API-token guard for non-loopback binds) and a kubectl secret step.

- The daemon reads OD_ALLOWED_ORIGINS, not OPEN_DESIGN_ALLOWED_ORIGINS.
  OPEN_DESIGN_* names are Compose-only aliases mapped in deploy/docker-compose.yml.
  Use OD_ALLOWED_ORIGINS for the direct-container ACK path and call out both
  names in the network-exposure section and the pitfalls table.

Also adds an Ingress / Bearer-token note for operators fronting the
Service externally.

* docs: document OD_API_TOKEN bearer-token forwarding in Nginx block

PerishCode follow-up review on PR #3275:

The Path A Nginx block as written would 401 every UI call except the
three open probes (/api/health, /api/version, /api/daemon/status). Same
root cause as the ACK fix in 9d5f6ec — the auth model affects both paths,
not just direct-container deployments.

Verified against source:
- deploy/scripts/install.sh:386 always writes a generated OD_API_TOKEN
  into deploy/.env (no opt-out flag).
- deploy/docker-compose.yml:18 requires OD_API_TOKEN (Compose ? syntax)
  and binds OD_BIND_HOST=0.0.0.0, so the daemon-side bearer middleware
  is always active for the Compose path.
- apps/daemon/src/server.ts:3777 keys the loopback short-circuit on
  isLoopbackPeerAddress(req.socket?.remoteAddress) — the TCP peer, not
  X-Forwarded-For — so a reverse-proxied request from a Docker bridge
  IP never gets the localhost bypass.
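
The loopback rule in the last bullet can be sketched as follows. This is a
minimal illustration, not the daemon's code: `isLoopbackPeer` is a
hypothetical stand-in for `isLoopbackPeerAddress`, and the matching rules
(127.0.0.0/8, `::1`, and the IPv4-mapped `::ffff:` form) are assumptions.
It only shows why a forwarded header cannot grant the localhost bypass.

```javascript
// Hypothetical stand-in for the daemon's isLoopbackPeerAddress; the real
// matching rules may differ.
function isLoopbackPeer(remoteAddress) {
  if (typeof remoteAddress !== 'string') return false;
  // Dual-stack sockets report IPv4 peers as "::ffff:a.b.c.d".
  const addr = remoteAddress.startsWith('::ffff:')
    ? remoteAddress.slice('::ffff:'.length)
    : remoteAddress;
  return addr === '::1' || addr.startsWith('127.');
}

// The bearer check keys on the TCP peer only. X-Forwarded-For is ignored on
// purpose, because any client can set it.
function needsBearerToken(req) {
  return !isLoopbackPeer(req.socket?.remoteAddress);
}

// A proxied request arrives from the Docker bridge, whatever the header says.
const proxied = {
  socket: { remoteAddress: '172.18.0.1' },
  headers: { 'x-forwarded-for': '127.0.0.1' },
};
console.log(needsBearerToken(proxied));
```

Under these assumptions, a request relayed by Nginx in another container
still needs the Authorization header, which is what the Nginx change adds.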

Adds proxy_set_header Authorization to the Nginx block, a paragraph
explaining where OD_API_TOKEN comes from, and updates the pitfalls row
that previously only mentioned CORS to also list the missing-bearer
cause.
2026-06-02 05:53:06 +00:00
Amy
3083388a1a Add launch review regression coverage (#3300)
* Add main launch review E2E coverage

* Add daemon launch review regression coverage

* Tighten plugin authoring completion regressions

* fix(web): preserve deck slide on preview switches

Generated-By: looper 0.9.2 (runner=fixer, agent=codex)

* Add project detail regression coverage
2026-06-01 14:19:33 +00:00
open-design-bot[bot]
e1f93a2f40 Update docs/assets/github-metrics.svg (#3376)
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-05-31 14:48:58 +00:00
open-design-bot[bot]
e76eb6da63 Update docs/assets/github-metrics.svg (#3338)
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-05-30 04:31:16 +00:00
open-design-bot[bot]
482e318afe Update docs/assets/github-metrics.svg (#3267)
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-05-29 14:12:36 +00:00
lefarcen
da19ff3ca0 feat(mocks): replay-based mock CLIs for 14 of OD's supported agents (opencode/codex/claude/gemini/cursor-agent/deepseek/qwen/grok + ACP family devin/hermes/kilo/kimi/kiro/vibe) (#3241)
* feat(mocks): replay-based mock CLIs for opencode/claude/codex/deepseek/qwen/grok

Drops in a `mocks/` top-level dir that pretends to be the real agent
CLIs by streaming pre-recorded sessions in each CLI's native stdout
protocol. Zero LLM tokens.

## Use cases

- **E2E tests** in `apps/daemon/tests/` — exercise the full chat-server
  pipeline against a known trace, assert UI events / artifacts.
- **Self-validation during dev** — iterate on `claude-stream.ts` /
  `json-event-stream.ts` parser changes without burning provider budget.
- **Regression harness** — replay the same trace before and after a
  charter / parser change; diff the daemon events the UI surfaces.
- **Demo / onboarding** — show what a 17-tool claude editing session
  looks like end-to-end, offline.

## How

- 6 bash wrappers (`mocks/bin/`) shadow the real CLIs when PATH-overlaid.
- `mocks/mock-agent.mjs` reads `mocks/recordings/<trace>.jsonl`, picks
  one via env var (`SYNCLO_EXPLORE_MOCK_TRACE` / `_POOL` /
  `_BY_PROMPT_HASH`), streams the trace in the requested format.
- Each format renderer matches the EXACT JSON shape the OD daemon
  parser expects, verified line-by-line against
  `apps/daemon/src/{json-event-stream,claude-stream}.ts`:

  | CLI                       | streamFormat              | parser source                              |
  | ------------------------- | ------------------------- | ------------------------------------------ |
  | `opencode`                | `json-event-stream`       | `handleOpenCodeEvent`                      |
  | `codex`                   | `json-event-stream`       | `handleCodexEvent`                         |
  | `claude`                  | `claude-stream-json`      | `createClaudeStreamHandler`                |
  | `deepseek` `qwen` `grok`  | `plain`                   | `server.ts` (raw stdout)                   |

## Quick start

```bash
export PATH="$PWD/mocks/bin:$PATH"
export SYNCLO_EXPLORE_MOCK_TRACE=04097377   # 8-char prefix OK
export SYNCLO_EXPLORE_MOCK_NO_DELAY=1

echo "any prompt" | opencode run
echo "any prompt" | claude -p --output-format=stream-json
echo "any prompt" | codex exec
```

The mock binary announces the picked trace id on stderr:
`[mock-opencode] picked 04097377… via fixed`.

Recording selection (env, in priority order):
- `SYNCLO_EXPLORE_MOCK_TRACE=<id>` — fixed (prefix OK)
- `SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH=1` + stdin prompt — `sha256(prompt) % N`
- `SYNCLO_EXPLORE_MOCK_POOL=<tag>` — random within `agent:claude` /
  `skill:agent-browser` / `outcome:failed` / etc.
- (default) uniform random
- `SYNCLO_EXPLORE_MOCK_SEED=<str>` — reproducible "random"
- `SYNCLO_EXPLORE_MOCK_NO_DELAY=1` — skip inter-event waits

## Dataset

179 anonymized Langfuse traces from this project's own production
telemetry:

- 9 agents: claude 57 · opencode 41 · codex 38 · gemini 25 ·
  cursor-agent 11 · qwen 2 · copilot 2 · deepseek 2 · antigravity 1
- outcomes: succeeded 144 · failed 35
- skills: default 71 · ad-creative 50 · algorithmic-art 30 ·
  agent-browser 22 · video-hyperframes 2 · plus magazine-web-ppt /
  brainstorming / data-report / penpot-flutter-design-source 1 each
- 124 multi-turn (sessions with ≥2 turns)
- 18 produce `<artifact>` output
- ~4.5 MB on disk total

Anonymization: `/Users/<name>/` → `${HOME}/`,
`C:\Users\<name>\` → `%USERPROFILE%\`, project UUIDs →
stable `proj-001`, `proj-002`, …. Tool input/output payloads
preserved verbatim (templated UI, no cell-level PII).
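
A rough sketch of the three rewrites listed above. The regular expressions,
the `createAnonymizer` name, and the rule that every UUID counts as a
project id are assumptions made for illustration; the actual exporter lives
outside this repo and may work differently.

```javascript
// Sketch only: regexes and the "any UUID is a project id" rule are assumed.
function createAnonymizer() {
  const projectAliases = new Map(); // lowercased UUID -> stable alias
  const uuid = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

  return function anonymize(text) {
    return text
      // Function replacements keep "$" in the output literal.
      .replace(/\/Users\/[^/\s]+\//g, () => '${HOME}/')
      .replace(/C:\\Users\\[^\\\s]+\\/g, () => '%USERPROFILE%\\')
      .replace(uuid, (id) => {
        const key = id.toLowerCase();
        if (!projectAliases.has(key)) {
          const n = String(projectAliases.size + 1).padStart(3, '0');
          projectAliases.set(key, `proj-${n}`);
        }
        return projectAliases.get(key);
      });
  };
}

const anonymize = createAnonymizer();
console.log(anonymize('/Users/alice/projects/demo/index.html'));
```

Keeping the alias map inside one anonymizer instance is what makes the
`proj-NNN` names stable across every line that instance processes.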

## Smoke test

`bash mocks/scripts/smoke-test.sh` — 6 checks across all 6 agents.
All pass on this branch (verified locally):

```
  ✓ opencode first event = step_start
  ✓ codex first event = thread.started
  ✓ claude first event = system
  ✓ deepseek emitted plain text (144 chars on first line)
  ✓ qwen emitted plain text (144 chars on first line)
  ✓ grok emitted plain text (144 chars on first line)
All mock CLIs working. 
```

## Adding more recordings

The exporter that produced this set lives in
[nexu-io/agent-pr-explore](https://github.com/nexu-io/agent-pr-explore)
(see `cli/src/local/orchestrator/langfuse-import.ts` + the `local
langfuse-import` CLI command). Operators with the Langfuse keys can pull
more by tag / outcome / artifact / multi-turn filter, then run
`local recordings anonymize --out-dir ~/Documents/open-design/mocks/recordings`.
`mocks/README.md` has the full instructions.

## Out of scope (follow-ups)

- **ACP agents** (`devin`, `hermes`, `kilo`, `kimi`, `kiro`, `vibe`) need
  a JSON-RPC server on stdio rather than a one-shot stream — separate
  `format-acp.mjs` module not yet written.
- **Per-agent json-event-stream variants** (`cursor-agent`, `gemini`,
  `qoder`, `copilot`, `pi`) currently fall back to the `plain` renderer;
  their parsers are in `apps/daemon/src/json-event-stream.ts` and follow
  the same template as `format-codex.mjs`.

## AGENTS.md updates

- Added `mocks/` to the top-level content directories listing
- Added a Validation strategy bullet pointing here for agent-stream /
  parser changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks): add opencode-cli/kiro-cli/vibe-acp bin aliases and unref ACP timeout

- Add mocks/bin/opencode-cli, kiro-cli, vibe-acp wrappers for the primary
  RuntimeAgentDef bin names OD resolves before any fallback. Without these,
  a PATH-overlaid OD daemon run bypasses the mock entirely (opencode-cli,
  kiro-cli) or cannot find the mock at all (vibe-acp, which has no fallback).
- Include opencode-cli, kiro-cli, vibe-acp in the smoke-test ACP/JSON loop
  so coverage is verified end-to-end.
- Call .unref() on the 30s safety timeout in format-acp.mjs so a completed
  ACP session exits promptly instead of waiting the full 30 seconds.

Generated-By: looper 0.9.2 (runner=fixer, agent=claude-code)

* feat(mocks): add vela (AMR) — login / models / ACP with strict set_model gate

Extends mocks/ to cover OD's own AMR runtime. `vela` is the bin name
`apps/daemon/src/runtimes/defs/amr.ts` specifies (`bin: 'vela'`,
`streamFormat: 'acp-json-rpc'`). It's richer than the generic ACP
agents — covers full login + models + chat-session lifecycle.

### What vela does (mirrored from apps/daemon/tests/fixtures/fake-vela.mjs)

1. `vela login` — writes ~/.amr/config.json with a fake profile (controlKey,
   runtimeKey, user{email,name,plan}, profile-specific apiUrl/linkUrl).
   The on-disk projection is what OD's daemon login route + AmrLoginPill
   poller read; production goes through device-auth, the mock skips
   straight to the file write.

2. `vela models` — prints the production-shaped public model catalog as
   newline-separated `public_model_*    vela` lines. Override via
   FAKE_VELA_MODELS env.

3. `vela agent run --runtime opencode` — ACP JSON-RPC server with three
   vela-specific protocol extensions:

   a. `initialize` response carries `agentCapabilities`
      (`promptCapabilities.embeddedContext`) + `models`
      (`currentModelId` + `availableModels`).
   b. `session/new` response carries the same `models` block.
   c. **Strict set_model gate**: `session/prompt` is rejected with
      JSON-RPC -32602 ("session/set_model must be called before
      session/prompt") UNLESS `session/set_model` (or
      `session/set_config_option`) has been called for the current
      sessionId. Mirrors real vela 0.0.1 contract; catches regressions
      in `attachAcpSession` that silently skip set_model.
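
The strict gate in (c) can be sketched like this. Only the method names,
the -32602 code, and the error message come from the description above;
`createGatedHandler`, the request shape, and the empty result payloads are
assumptions, not the contents of format-vela.mjs.

```javascript
// Minimal sketch of a per-session set_model gate (hypothetical handler shape).
function createGatedHandler() {
  const modelSelected = new Set(); // sessionIds that already picked a model

  return function handle(request) {
    const { id, method, params = {} } = request;
    const { sessionId } = params;

    if (method === 'session/set_model' || method === 'session/set_config_option') {
      modelSelected.add(sessionId);
      return { jsonrpc: '2.0', id, result: {} };
    }
    if (method === 'session/prompt' && !modelSelected.has(sessionId)) {
      return {
        jsonrpc: '2.0',
        id,
        error: {
          code: -32602,
          message: 'session/set_model must be called before session/prompt',
        },
      };
    }
    return { jsonrpc: '2.0', id, result: {} };
  };
}

const handle = createGatedHandler();
console.log(handle({ id: 1, method: 'session/prompt', params: { sessionId: 's1' } }));
```

Tracking the gate per sessionId means a model chosen for one session does
not unlock prompts on another.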

### Error injection envs (in sync with fake-vela.mjs)

  FAKE_VELA_SESSION_ID            - sessionId returned by session/new
  FAKE_VELA_TEXT                  - override assistant text
  FAKE_VELA_THOUGHT               - optional thought_chunk before text
  FAKE_VELA_SESSION_NEW_ERROR     - fail session/new
  FAKE_VELA_SET_MODEL_ERROR       - fail session/set_model
  FAKE_VELA_PROMPT_ERROR          - fail session/prompt
  FAKE_VELA_REQUIRE_SET_MODEL='0' - disable the strict gate (legacy)
  FAKE_VELA_LOGIN_USER_EMAIL      - email written into config profile
  FAKE_VELA_LOGIN_USER_PLAN       - plan written into config profile
  FAKE_VELA_LOGIN_DELAY_MS        - sleep before write (test in-flight)
  FAKE_VELA_LOGIN_FAIL            - print + exit 1
  FAKE_VELA_MODELS                - override models stdout
  VELA_PROFILE                    - profile slot (prod | test | local)

### Components

`mocks/lib/format-vela.mjs` (~205 LOC)
  - Full ACP server with vela protocol extensions
  - Strict set_model gate
  - Error injection plumbing

`mocks/lib/vela-subcommands.mjs` (~90 LOC)
  - runVelaLogin() — writes ~/.amr/config.json
  - runVelaModels() — prints catalog

`mocks/bin/vela` — dispatcher wrapper. Forwards `vela <subcmd>` to
mock-agent.mjs which routes to login/models or falls through to ACP.

`mocks/mock-agent.mjs` — parseArgs now collects positionals so the vela
dispatcher can read subcommand from there; switch case added for vela.

`mocks/scripts/smoke-test.sh` — +4 assertions:
  vela models prints ≥10 catalog lines
  vela login writes ~/.amr/config.json with the requested email
  vela agent run ACP roundtrip (initialize+models+set_model+stream+result)
  vela strict set_model gate rejects prompt without prior set_model

### Verified locally

  ✓ vela models printed 15 catalog lines
  ✓ vela login wrote ~/.amr/config.json with profile.prod.user.email
  ✓ vela agent run ACP roundtrip (initialize+models, set_model accepted, prompt streamed)
  ✓ vela strict set_model gate rejects session/prompt without prior set_model

All 21 smoke checks pass (up from 17 with previous P3 ACP commit).

### AGENTS.md + README updates

  AGENTS.md — mention `vela (AMR — vela CLI)` alongside ACP agents in
  the directory listing entry.
  mocks/README.md — protocol table row + dedicated vela section with
  subcommand contract, strict gate explanation, env-injection cheat
  sheet. Mock-tree listing updated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks): honor REPORT_FILE env when --report-file flag not given

Harnesses that spawn the mock without translating their report-path
contract to the mock's CLI flag (notably nexu-io/agent-pr-explore's
orchestrator, which passes REPORT_FILE as env per the existing
opencode/claude/codex agent launchers) wouldn't get a report file
written, so the harness's "agent exit 0 but produced no report"
check would always fire and mark mock runs as failure even though the
stdout stream was complete.

Fix: in mock-agent.mjs parseArgs, fall through to process.env.REPORT_FILE
when --report-file wasn't provided on argv. Each format renderer already
accepts opts.reportFile and writes the recording's final assistant text
to it (`format-*.mjs` already had this — only the wiring was missing).
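
The fallback order can be sketched as below. `resolveReportFile` is a
hypothetical name, and the space-separated `--report-file <path>` form is an
assumption; the real parseArgs in mock-agent.mjs may parse the flag
differently.

```javascript
// Sketch: argv wins, then the REPORT_FILE env var, else no report file.
function resolveReportFile(argv, env) {
  const i = argv.indexOf('--report-file');
  if (i !== -1 && argv[i + 1]) return argv[i + 1];
  return env.REPORT_FILE || null;
}

// A harness that only sets the env var still gets a report path.
console.log(resolveReportFile(['run'], { REPORT_FILE: '/tmp/plan.md' }));
```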

Verified: synclo-explore run with `mock=true, mock_trace=04097377`
against the opencode wrapper now produces a plan.md with the recording's
17-tool claude editing session report. ~1.5s per run vs ~70s real opencode.

* mocks: move recordings to Cloudflare R2; PR→main→Action upload path

The 179-recording corpus (~4.5 MB raw, ~280 KB after compression) has
been moved off git into Cloudflare R2 at the bucket open-design-mocks
under recordings/v1/. The repo now ships:

- mocks/manifest.json — the canonical catalog (renamed from
  recordings/index.json) with sha256 + storage hints; consumers
  fetch this to discover what exists, then pull individual jsonl
  files on demand
- mocks/scripts/fetch-recordings.sh — parallel, sha256-verified,
  idempotent puller for the public r2.dev URL
- mocks/scripts/add-recording.sh — local maintainer helper that
  validates a new .jsonl and copies it into recordings-staging/
  (no R2 calls; no credentials needed)
- mocks/scripts/upload-to-r2.mjs — called only by the CI workflow
- mocks/scripts/lib/manifest-utils.mjs — shared sha256/meta/
  rebuild-histograms logic, used by both add-recording (preview)
  and upload-to-r2 (actual write) so the entry shape never drifts
- .github/workflows/sync-mocks-to-r2.yml — fires on push to main
  when mocks/recordings-staging/ changes; uploads to R2, updates
  manifest, commits cleanup back; serialized via concurrency group

Trust model: R2 write credentials (CLOUDFLARE_API_TOKEN,
CLOUDFLARE_ACCOUNT_ID) are repo secrets; nobody can push from a
laptop. Read stays public via the r2.dev URL.

Why not pnpm install integration: contributors who do not touch
agent code do not pay the fetch cost. Fetch happens on first
smoke-test run (auto-fallback) or when a mock spawn needs data.

Repo size: -4.55 MB net (delete 179 jsonl, +280 KB manifest +
scripts). Smoke test (21 checks) still green against the fetched
corpus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: scope R2 write token to a dedicated secret name

Use CLOUDFLARE_R2_MOCKS_TOKEN (instead of reusing the shared
CLOUDFLARE_API_TOKEN that landing-page-*.yml uses for Pages deploys)
so the R2 write capability can be scoped to just the
open-design-mocks bucket without bleeding extra capability into the
Pages workflows.

Also hardcode the powerformer CF account_id directly in the workflow
(account IDs are not secret and the shared CLOUDFLARE_ACCOUNT_ID
secret may point at a different account).

Workflow now fails fast with an actionable error message + dashboard
link if the secret is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: switch R2 sync to S3-compat API (wrangler getMemberships gate)

wrangler 4.x calls /memberships before any r2 action, requiring
user:read scope. R2 "Object Read & Write" tokens deliberately lack
that scope (defense in depth — a leaked token should not enumerate
account-level resources). The workflow now uses the aws CLI talking
straight to the R2 S3-compatible endpoint with SigV4, no membership
lookup.

Secret rotation: CLOUDFLARE_R2_MOCKS_TOKEN (Bearer) is replaced by
CLOUDFLARE_R2_MOCKS_AK / CLOUDFLARE_R2_MOCKS_SK (matching the
existing CLOUDFLARE_R2_RELEASES_AK/SK naming convention). End-to-end
tested locally: PUT recording → manifest rebuild → manifest PUT →
staging cleanup all green.

aws CLI is pre-installed on ubuntu-latest, so no install step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: scrub synclo namespace; use OD_MOCKS_* env prefix throughout

These mocks were copy-pasted from synclo-explore, where they
originated, and inherited the SYNCLO_EXPLORE_MOCK_* env-var
convention. That brand-bleed is not appropriate in OD: rename the
public env surface to OD_MOCKS_* (matching OD-native prefixes like
OD_MOCKS_CACHE_DIR, OD_TRACE_R2_UPLOAD, OD_EXPECT_TIMEOUT_SECONDS).

Renames:
  SYNCLO_EXPLORE_MOCK_TRACE             → OD_MOCKS_TRACE
  SYNCLO_EXPLORE_MOCK_BY_PROMPT_HASH    → OD_MOCKS_BY_PROMPT_HASH
  SYNCLO_EXPLORE_MOCK_POOL              → OD_MOCKS_POOL
  SYNCLO_EXPLORE_MOCK_SEED              → OD_MOCKS_SEED
  SYNCLO_EXPLORE_MOCK_NO_DELAY          → OD_MOCKS_NO_DELAY
  SYNCLO_EXPLORE_MOCK_RECORDINGS_DIR    → OD_MOCKS_RECORDINGS_DIR
  SYNCLO_EXPLORE_MOCK_SMOKE_TRACE       → OD_MOCKS_SMOKE_TRACE
  SYNCLO_OD_MOCKS_I_KNOW_WHAT_IM_DOING  → OD_MOCKS_ALLOW_LOCAL_UPLOAD

Also drop the inline harvester usage from README. The harvester is an
external CLI in nexu-io/agent-pr-explore — its README is the right
place for langfuse-import flags, anonymization options, etc. OD only
documents its own staging→PR→Action workflow.

Smoke test (21 checks) still green; OD_MOCKS_TRACE end-to-end
verified to route correctly.

Consumers of the OLD env names (notably the orchestrator in
nexu-io/agent-pr-explore) need a matching rename. No back-compat
shim here — the explore side has zero external users today and a
one-line follow-up is cleaner than a permanent deprecation layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* AGENTS.md: align mock env names with mocks/ rename (SYNCLO_* → OD_MOCKS_*)

Missed in the prior commit (a30b868a) — only grepped mocks/ subdir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: drop staging dir + GH Action; back to local-script upload

The staging-dir + Action design (added earlier in this PR) had a flaw
the user caught: new recordings briefly entered the repo on their way
through staging, leaving them in git history forever even after the
Action cleanup commit removed them from HEAD. That defeats the whole
point of moving recordings to R2.

Replace with the simpler local-maintainer flow:

  bash mocks/scripts/upload-recording.sh /path/to/<trace>.jsonl
  # → validates, wrangler r2 put, updates manifest.json, wrangler r2 put manifest
  git add mocks/manifest.json && git commit && git push
  # → only the ~200B manifest delta enters git

The wrangler-OAuth gate replaces the CI secret + Action duo. For a
solo / small maintainer team this collapses the trust chain down to
"do you have wrangler login to the powerformer account?" — no GH
secrets to rotate, no concurrency window to worry about, and no
repo-history bloat.

Deletes:
- .github/workflows/sync-mocks-to-r2.yml
- mocks/scripts/upload-to-r2.mjs   (CI-only)
- mocks/scripts/add-recording.sh   (staging helper, now obsolete)
- mocks/recordings-staging/        (empty dir, never to be repopulated)

Adds:
- mocks/scripts/upload-recording.sh

Kept:
- mocks/scripts/fetch-recordings.sh
- mocks/scripts/lib/manifest-utils.mjs (still used by upload-recording.sh)
- mocks/manifest.json (committed; the only mocks artifact in git)

End-to-end tested locally: re-upload an existing recording is
idempotent, manifest math is stable, fetch + smoke test still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: address review — guard allowlist + safe ~/.amr + loud OD_MOCKS_TRACE typo

Three concrete issues raised across recent Siri-Ray (Looper) review
threads on #3241:

1. scripts/guard.ts only allowlisted mocks/lib/ + mocks/mock-agent.mjs,
   leaving mocks/scripts/lib/manifest-utils.mjs outside the residual-
   JS guard. Result: Preflight fail on every push. Extend the allowlist
   to mocks/scripts/ — same precedent as the lib/ entry directly above.

2. mocks/scripts/smoke-test.sh moved the caller's real ~/.amr to
   ~/.amr-smoke-backup, ran vela login (which writes a fake config),
   then removed ~/.amr with rm -rf and restored the backup. Two failure
   modes: a crash mid-run loses the user's real config, and re-running
   before the restore overwrites the backup with the fake login. Fix:
   sandbox vela login into a mktemp -d HOME via env (HOME=$amr_sandbox
   vela login). The real ~/.amr is never touched, and a trap cleans up
   the sandbox.

3. mocks/lib/recording-picker.mjs silently fell through to
   prompt-hash → pool → random when OD_MOCKS_TRACE was set but did
   not match any recording (typo, prefix too short, corpus not
   fetched). Tests using a pinned trace would silently get a
   different trace, hiding regressions. Fix: throw an explicit error
   with the failing value + a pointer at fetch-recordings.sh.
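
The loud-fail behavior in item 3 can be sketched as follows. The recording
shape (`{ id }`) and the `pickByTrace` name are assumptions; the error text
follows the message quoted in the verification note below.

```javascript
// Sketch: a pinned trace either matches by prefix or throws. There is no
// silent fallback to prompt-hash, pool, or random selection.
function pickByTrace(recordings, traceId, dir) {
  const match = recordings.find((r) => r.id.startsWith(traceId));
  if (!match) {
    throw new Error(
      `mock-agent: OD_MOCKS_TRACE=${traceId} set but no matching recording in ${dir}`,
    );
  }
  return match;
}

const sample = [{ id: '04097377aaaa' }, { id: '314d6833bbbb' }];
console.log(pickByTrace(sample, '04097377', 'mocks/recordings').id);
```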

Verified locally: pnpm guard prints "Residual JavaScript check
passed", smoke-test still 21/21, ~/.amr mtime unchanged after run,
typo on OD_MOCKS_TRACE now produces "mock-agent: OD_MOCKS_TRACE=...
set but no matching recording in <dir>" on stderr.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fetch-recordings: detect empty filter result before line-counting

printf '%s\n' on an empty string emits a single empty line, so the
previous TOTAL=$(printf ... | grep -c "") math returned 1 on an
empty $ENTRIES_TSV — a typo like `--agent no-such-agent` printed
"Fetching up to 1 recordings", downloaded zero, and exited 0
("ready"). Check `-z $ENTRIES_TSV` first.

Reproduced + fix verified per the reviewer thread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mocks: address mrcfps review — goldens + provenance + contract check

Three durability improvements suggested in the PR #3241 top-level
review:

## 1. Golden daemon-event snapshots (mocks/golden/*.events.json + apps/daemon/tests/mocks-golden.test.ts)

Smoke-test verified that mocks RUN; that catches crashes but not a
parser change that semantically reshapes the events the daemon emits.

Commit the daemon-event sequence for 3 representative traces:
- claude  314d6833 — median-complexity agent-browser session
- codex   dcdff3b3 — 14-tool refactor
- opencode 9a9522ec — 7-tool data-report

apps/daemon/tests/mocks-golden.test.ts spawns the mock, feeds stdout
through the real createClaudeStreamHandler / createJsonEventStreamHandler,
normalizes per-spawn volatile fields (only sessionId today, only on
claude), and deep-equals against the committed snapshot. A parser
regression fails the test loudly.
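
The normalize-then-compare step can be sketched as below. It assumes
`sessionId` is a top-level field on an event and that the committed golden
is stored already normalized; the real test uses a deep-equal assertion,
while this sketch compares serialized JSON, which is sensitive to key order.

```javascript
// Sketch: mask per-spawn volatile fields before comparing with the snapshot.
function normalizeEvents(events) {
  return events.map((event) =>
    'sessionId' in event ? { ...event, sessionId: '<sessionId>' } : event,
  );
}

function matchesGolden(events, golden) {
  return JSON.stringify(normalizeEvents(events)) === JSON.stringify(golden);
}

const golden = [
  { type: 'system', sessionId: '<sessionId>' },
  { type: 'text', text: 'hello' },
];
const run = [
  { type: 'system', sessionId: 'b7c1' },
  { type: 'text', text: 'hello' },
];
console.log(matchesGolden(run, golden));
```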

After an intentional parser change, regenerate:

  MOCKS_GOLDEN_UPDATE=1 pnpm --filter @open-design/daemon test mocks-golden
  git diff mocks/golden/
  # eyeball; commit if shapes match intent

## 2. Provenance fields on every manifest entry (mocks/scripts/lib/manifest-utils.mjs + mocks/manifest.json)

Augment inspectRecording() to write:

  captured_at         — ISO 8601 from existing meta.timestamp
  cli_version         — null until harvester writes it
  protocol_version    — null until harvester writes it
  anonymization_version — null until harvester writes it

captured_at is now populated for all 179 existing entries from the
meta event the harvester already emits. The harvester in
nexu-io/agent-pr-explore is the next step for cli_version /
protocol_version / anonymization_version — once those are
populated, consumers can detect when a recording is more than ~1
minor version behind the live CLI and flag it for re-harvest.

No matrix of (cli_version × agent) recordings — that explodes
maintenance. Just metadata per recording so trust decay is visible.

## 3. Real-CLI contract check (mocks/scripts/contract-check.sh + docs/MOCKS-CONTRACT-CHECK.md)

Mocks catch parser regressions against recordings; they do NOT
catch recordings drifting away from the live agent CLI as that CLI
evolves. The contract check spawns the real CLI alongside the mock
with a fixed deterministic prompt + diffs top-level event-type
distributions.
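
A sketch of the distribution diff, written in JavaScript for illustration
even though the contract check itself is a shell script. It assumes JSONL
output with a top-level `type` field on each event, so it does not apply to
the plain-text CLIs; the function names are hypothetical.

```javascript
// Count top-level event types in a JSONL stream.
function eventTypeCounts(jsonl) {
  const counts = {};
  for (const line of jsonl.split('\n')) {
    if (!line.trim()) continue;
    const type = JSON.parse(line).type ?? '(none)';
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Report only the types whose counts differ (mock minus real).
function diffCounts(real, mock) {
  const diff = {};
  for (const type of new Set([...Object.keys(real), ...Object.keys(mock)])) {
    const delta = (mock[type] || 0) - (real[type] || 0);
    if (delta !== 0) diff[type] = delta;
  }
  return diff;
}

const realOut = '{"type":"step_start"}\n{"type":"text"}\n{"type":"text"}\n';
const mockOut = '{"type":"step_start"}\n{"type":"text"}\n{"type":"tool_use"}\n';
console.log(diffCounts(eventTypeCounts(realOut), eventTypeCounts(mockOut)));
```

An empty diff does not prove the payloads match; it only shows the same
event types appear with the same frequency.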

Deliberately human-driven, not cron-scheduled:
- costs real LLM tokens per invocation
- requires real CLI auth
- maintainer reads the output, not a regex

Suggested triggers per doc: real-CLI release notes mentioning
"output format" / "stream" / "JSON" / "events"; before a parser
refactor; ad-hoc when something looks off.

## Coverage note

README updated to position mocks as "deterministic protocol/parser
coverage" (not "e2e replacement") per mrcfps framing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mocks-golden test): drop import of non-exported ParserKind

Use plain string (the type alias is `string` anyway) — Preflight
typecheck on a31fa71a failed:

  tests/mocks-golden.test.ts(29,8): error TS2459: Module
  "../src/json-event-stream.js" declares "ParserKind" locally, but
  it is not exported.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* recording-picker: structured OD_MOCKS_POOL + hard-fail no-match

Siri-Ray review: `OD_MOCKS_POOL=outcome:failed` was documented as a
supported selection knob, but the matcher only checked tags and
`meta.agent`, so the negative-path pool found 0 candidates and
silently fell through to global random, validating against any
recording instead of a failed trace.

Fix:
- Parse `<dim>:<value>` shape and route each dim to the right meta
  field: `outcome` → `meta.outcome`, `agent` → `meta.agent`,
  `skill` → `tags[]`. Bare values still fall back to tag substring.
- If the env was set and matched nothing, throw with the failing
  value and a jq one-liner for inspection. Same loud-fail policy as
  OD_MOCKS_TRACE — silent fallback was the original bug.
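
The routing described in the fix can be sketched as follows. The recording
shape, the `skill:<name>` tag format, the handling of unknown dimensions,
and the error text are all assumptions; only the dim-to-field mapping and
the throw-on-empty policy come from the description above.

```javascript
// Assumed recording shape: { id, tags: string[], meta: { agent, outcome } }.
// Assumed tag format for skills: "skill:<name>".
function matchesPool(recording, pool) {
  const sep = pool.indexOf(':');
  if (sep > 0) {
    const dim = pool.slice(0, sep);
    const value = pool.slice(sep + 1);
    if (dim === 'outcome') return recording.meta.outcome === value;
    if (dim === 'agent') return recording.meta.agent === value;
    if (dim === 'skill') return recording.tags.includes(pool);
  }
  // Bare values (and, in this sketch, unknown dims) use a tag substring match.
  return recording.tags.some((tag) => tag.includes(pool));
}

function poolCandidates(recordings, pool) {
  const candidates = recordings.filter((r) => matchesPool(r, pool));
  if (candidates.length === 0) {
    // Hypothetical message; the point is that an empty pool is an error.
    throw new Error(`mock-agent: OD_MOCKS_POOL=${pool} matched no recordings`);
  }
  return candidates;
}

const corpus = [
  { id: 'a1', tags: ['skill:agent-browser'], meta: { agent: 'claude', outcome: 'failed' } },
  { id: 'b2', tags: ['skill:data-report'], meta: { agent: 'codex', outcome: 'succeeded' } },
];
console.log(poolCandidates(corpus, 'outcome:failed').map((r) => r.id));
```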

Verified locally: outcome:failed, agent:codex, skill:agent-browser
all route correctly; outcome:nonsense throws the explicit error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* contract-check.sh: fix lost $PROMPT in mock invocation

Siri-Ray review on e576074a: the mock side wrapped its pipeline in
`bash -c "printf %s \"\$PROMPT\" | ..."` — but $PROMPT was a parent
shell variable, not exported, so the child bash expanded it to an
empty string. Result: the contract check sent the real prompt to the
real CLI and an empty string to the mock, defeating the
same-input invariant the whole script rests on. The bug also let the
mock randomly select a different trace whenever a maintainer happened
to have OD_MOCKS_BY_PROMPT_HASH=1 in their env.

Fix: drop the inner bash -c entirely; use a subshell that scopes the
PATH overlay and pipes printf into the PATH-resolved mock binary
directly. The subshell limits the PATH change without var-passing.

Verified locally: with prompt-A the mock picks trace 54ec02ee via
hash; prompt-B → 2667e851 via hash; empty prompt (old broken
behavior) → random — confirms the prompt is now actually reaching
the mock under PATH overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 07:17:20 +00:00
open-design-bot[bot]
49573f031a Update docs/assets/github-metrics.svg (#3159)
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-05-29 03:02:19 +00:00
Amy
1c2a1c4459 Add launch review regression coverage and stabilize daemon tests (#3207)
* Add launch review E2E regression coverage

* Harden daemon launch review regressions

* Stabilize daemon runtime tests

* fix(tests): restore e2e preflight typing

Generated-By: looper 0.8.1 (runner=fixer, agent=codex)

* fix(tests): make fake plugin runtime ESM-safe

Generated-By: looper 0.8.1 (runner=fixer, agent=codex)

* Stabilize e2e fake agent and regression tests

* fix(tests): repair fake agent cjs runtime

Generated-By: looper 0.8.1 (runner=fixer, agent=codex)

* fix(review): harden plugin authoring checks

Generated-By: looper 0.9.2 (runner=fixer, agent=codex)

* fix(tests): bind plugin authoring run to seeded conversation

Generated-By: looper 0.9.2 (runner=fixer, agent=codex)
2026-05-29 02:39:33 +00:00
Denis Redozubov
f70fa0eb35 docs(media): describe external media composition (#3201) 2026-05-28 10:41:02 +00:00
lefarcen
df8a0faff6 feat(runtimes): register AMR (vela) as an ACP stdio agent (#2355)
* feat(runtimes): register AMR (vela) as an ACP stdio agent

AMR is the vela CLI's ACP runtime mode. `vela agent run --runtime opencode`
speaks ACP JSON-RPC over stdio (see vela's
`specs/current/runtime/manual-agent-run-openrouter.md`); per
`docs/new-agent-runtime-acp.md` we expose it through the same `streamFormat:
'acp-json-rpc'` transport that already powers Hermes, Devin, Kimi, etc.

The new `defs/amr.ts` is the entire wiring — `buildArgs` returns
`['agent', 'run', '--runtime', 'opencode']`, `fetchModels` reuses
`detectAcpModels`, and the fallback list seeds the OpenRouter ids vela's
e2e baseline uses. `executables.ts`/`app-config.ts`/`metadata.ts` get the
matching `VELA_BIN`/`VELA_LINK_URL`/`VELA_RUNTIME_KEY`/`VELA_OPENCODE_BIN`
allowlist + install/docs URLs, so users can configure the per-agent env in
Settings without leaking into other adapters.

Coverage: `tests/fixtures/fake-vela.mjs` is a minimal ACP stub that returns
the documented `initialize` / `session/new` / `session/set_model` /
`session/prompt` shapes; `tests/amr-acp-integration.test.ts` spawns it via
`child_process.spawn` and drives a full turn through `attachAcpSession` and
`detectAcpModels`, so the ACP transport contract for AMR is end-to-end
verified locally even before a real `vela` binary is installed.

Validated:
- pnpm guard
- pnpm typecheck (all workspace projects)
- pnpm --filter @open-design/daemon test (2881/2881)

Deferred: real OpenRouter-backed turn through a built `vela` binary —
the runtime def needs no changes for that path, only `VELA_RUNTIME_KEY`
and `VELA_LINK_URL` in env (or Settings).

* fix(runtimes/amr): pin a concrete default model and bare openai ids

End-to-end validation against a freshly-built `vela` (nexu-io/vela@main)
+ OpenRouter surfaced two contract details the first AMR runtime def
got wrong:

1. vela rejects `session/prompt` with `session/set_model must be called
   before session/prompt`. attachAcpSession in apps/daemon/src/acp.ts
   skips set_model whenever the picked model is the synthetic 'default'
   id, so AMR's fallback list must NOT include DEFAULT_MODEL_OPTION. The
   def now ships a concrete `gpt-5.4-mini` as both `fetchModels`'
   default option and `fallbackModels[0]`, which makes attachAcpSession
   always send a real `session/set_model` for AMR turns.

2. `vela --runtime opencode` auto-prepends `openai/` to whatever modelId
   it forwards to opencode's openai provider. With OpenRouter-style ids
   like `openai/gpt-5.4-mini`, opencode receives the double-prefixed
   `openai/openai/gpt-5.4-mini` and replies `ProviderModelNotFoundError`.
   The new fallback list ships the bare ids opencode's openai registry
   actually knows about (gpt-5.4, gpt-5.4-mini, gpt-5.4-fast, etc.).
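
The double-prefix failure in item 2 can be reproduced in a few lines. This is only a sketch: `forwardToOpencode` models the auto-prepend described above, and `toBareModelId` is a hypothetical helper. Neither name is taken from vela or the daemon.

```typescript
// Models the behavior described above: vela prepends `openai/` to whatever
// model id it is given before handing it to opencode. (Illustrative only.)
function forwardToOpencode(modelId: string): string {
  return `openai/${modelId}`;
}

// Hypothetical helper: strip any leading `openai/` segments so the id that
// reaches opencode carries the prefix exactly once.
function toBareModelId(modelId: string): string {
  return modelId.replace(/^(?:openai\/)+/, '');
}

const openRouterStyle = 'openai/gpt-5.4-mini';

// What opencode sees when the OpenRouter-style id is passed through as-is.
const doublePrefixed = forwardToOpencode(openRouterStyle);

// What opencode sees when the bare id is used instead.
const singlePrefixed = forwardToOpencode(toBareModelId(openRouterStyle));
```

Shipping bare ids in the fallback list, as this commit does, avoids the problem at the source; a normalizer like the one above would only matter for ids that arrive from elsewhere.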

Stub + tests:
- tests/fixtures/fake-vela.mjs now enforces the set_model gate the same
  way real vela does, so a regression that silently goes back to
  model: 'default' would surface as a fatal error in tests instead of a
  hidden production failure.
- tests/amr-acp-integration.test.ts pins both contracts: no 'default' /
  no 'openai/' prefix in fallbackModels, and a negative case that
  asserts session/prompt fails when no model is set.
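
A minimal in-memory sketch of the set_model gate that the stub enforces. The real fixture speaks JSON-RPC over stdio; the request/response shapes, the `modelId` parameter name, and the `-32000` error code below are assumptions made for illustration.

```typescript
// Hypothetical message shapes; real ACP payloads carry more fields.
interface RpcRequest {
  id: number;
  method: string;
  params?: { modelId?: string };
}

type RpcResponse =
  | { id: number; result: Record<string, unknown> }
  | { id: number; error: { code: number; message: string } };

// Returns a handler that remembers whether a model was set for the session.
function createFakeSession(): (req: RpcRequest) => RpcResponse {
  let modelId: string | null = null;

  return (req) => {
    if (req.method === 'session/set_model') {
      modelId = req.params?.modelId ?? null;
      return { id: req.id, result: {} };
    }
    if (req.method === 'session/prompt') {
      // The gate: refuse to prompt until a concrete model has been set.
      if (modelId === null) {
        return {
          id: req.id,
          error: {
            code: -32000, // assumed implementation-defined error code
            message: 'session/set_model must be called before session/prompt',
          },
        };
      }
      return { id: req.id, result: { model: modelId } };
    }
    // -32601 is the JSON-RPC 2.0 "method not found" code.
    return { id: req.id, error: { code: -32601, message: `unknown method: ${req.method}` } };
  };
}
```

A caller that skips `session/set_model` (for example because it treated the model as `'default'`) gets the error response instead of a silent success, which is the regression the tests are meant to catch.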

Adds `apps/daemon/scripts/verify-amr-real-vela.mjs` — a small dev-time
runner that drives `attachAcpSession` against a real `vela` binary and
prints the daemon's chat events, so future protocol drift can be checked
against an actual OpenRouter call.

Verified locally: `vela agent run --runtime opencode` + OpenRouter
returns the prompted string ("AMR-E2E-PASS") through the full daemon
pipeline; daemon test suite stays 2883/2883.

* fix(runtimes/amr): substitute concrete model when chat run sends 'default'

A plugin-driven AMR run from the UI surfaced a real-world hole in the
prior commit:

  json-rpc id 3: session/set_model must be called before session/prompt

The Default-design-router plugin (and any caller that doesn't pin a
real model) sends `model: 'default'` straight through, which the AMR
runtime def cannot accept — vela rejects `session/prompt` without
`session/set_model` and attachAcpSession skips set_model whenever
model === 'default'. Just leaving DEFAULT_MODEL_OPTION out of the
adapter's `fallbackModels` is not enough: the chat-run handler in
server.ts still forwarded 'default' verbatim.

This adds `resolveModelForAgent(def, resolved, env?)` as the
single source of truth for the substitution:

  1. If the caller picked a real id, pass it through.
  2. Else, if `def.defaultModelEnvVar` is set and the daemon process
     env has a non-empty value for it, return that (operator escape
     hatch — see below).
  3. Else, if the def's `fallbackModels` does NOT contain a 'default'
     id, return `fallbackModels[0].id`.
  4. Else, return the original value (the historic shape — defs that
     list 'default' themselves are untouched).
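
The four branches above can be sketched as follows. This is a simplified reading of the description, not the daemon's actual code: the `AgentDef` shape is reduced to the fields mentioned here, `env` defaults to an empty object instead of the daemon's process env, and treating `null`/`undefined` like `'default'` is an assumption.

```typescript
// Reduced, hypothetical shapes; the real runtime def has more fields.
interface ModelOption {
  id: string;
}

interface AgentDef {
  id: string;
  fallbackModels: ModelOption[];
  defaultModelEnvVar?: string;
}

const DEFAULT_MODEL_ID = 'default';

function resolveModelForAgent(
  def: AgentDef,
  resolved: string | null | undefined,
  env: Record<string, string | undefined> = {},
): string | null | undefined {
  // 1. A real id picked by the caller passes through untouched.
  if (resolved && resolved !== DEFAULT_MODEL_ID) return resolved;

  // 2. Operator escape hatch: a non-empty env override wins.
  if (def.defaultModelEnvVar) {
    const override = env[def.defaultModelEnvVar];
    if (override) return override;
  }

  // 3. Defs that never list 'default' get their first concrete fallback.
  const listsDefault = def.fallbackModels.some((m) => m.id === DEFAULT_MODEL_ID);
  const first = def.fallbackModels[0];
  if (!listsDefault && first) return first.id;

  // 4. Historic shape: defs that list 'default' themselves are untouched.
  return resolved;
}
```

Keeping the substitution in one helper means every call site that feeds `attachAcpSession` can share the same rules instead of re-deriving them.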

AMR sets `defaultModelEnvVar: 'VELA_DEFAULT_MODEL'`, so when
opencode's openai-provider registry deprecates `gpt-5.4-mini`
upstream, an operator can swap the fallback id without a code change
by exporting `VELA_DEFAULT_MODEL=gpt-5.5` before launching tools-dev
/ od. Note that the env var must live in the daemon's `process.env`:
Settings-UI per-agent env values only reach the spawned child, not
the daemon's resolver. The new field's docblock spells this out.

Coverage:
- `tests/runtimes/resolve-model.test.ts`: 8 unit tests covering all
  four resolver branches plus the env-override happy-path, fallback, and
  ignore-when-user-picked-a-real-id cases.
- `pnpm --filter @open-design/daemon typecheck` clean.

* chore(runtimes/amr): move AMR to the top of the base agent list

So `AMR (vela)` shows up first in the agent picker / status views,
ahead of claude / codex. Pure ordering change; no behavior delta.

* feat(amr): Sign-in / Sign-out button on the AMR Settings card

The first half of the AMR work assumed the operator would set
VELA_RUNTIME_KEY / VELA_LINK_URL on the daemon process and never
surfaced login state to users. This adds the missing UX so a fresh
install can drive the full path from Settings:

  - GET  /api/integrations/vela/status   reads ~/.vela/config.json
    for the active profile and returns { loggedIn, profile, user }
    (without leaking the runtime/control keys themselves).
  - POST /api/integrations/vela/login    spawns `vela login` once
    (409 if one is already in flight). The vela CLI opens the user's
    browser to the device-authorization page itself — Open Design
    only needs to kick the subprocess off.
  - POST /api/integrations/vela/logout   removes ~/.vela/config.json
    so the next status read returns logged-out.
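
A sketch of the status projection, showing how the response can be built without ever copying the keys. The on-disk layout of `~/.vela/config.json` is not specified here, so the `VelaConfig` shape (an `activeProfile` name plus a `profiles` map) and the `email` / `plan` user fields are assumptions for illustration.

```typescript
// Assumed config shape; the real file may be laid out differently.
interface VelaProfile {
  runtimeKey?: string;
  controlKey?: string;
  user?: { email?: string; plan?: string };
}

interface VelaConfig {
  activeProfile?: string;
  profiles?: Record<string, VelaProfile>;
}

interface VelaStatus {
  loggedIn: boolean;
  profile: string | null;
  user: { email: string | null; plan: string | null } | null;
}

function projectVelaStatus(config: VelaConfig | null): VelaStatus {
  const name = config?.activeProfile ?? null;
  const profile = name ? config?.profiles?.[name] : undefined;

  // User info alone is not enough: without a runtime key the profile
  // still counts as logged out.
  if (!name || !profile || !profile.runtimeKey) {
    return { loggedIn: false, profile: name, user: null };
  }

  // Copy fields one by one instead of spreading the profile, so the
  // runtime/control keys can never end up in the payload by accident.
  const user = profile.user
    ? { email: profile.user.email ?? null, plan: profile.user.plan ?? null }
    : null;

  return { loggedIn: true, profile: name, user };
}
```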

`AmrAgentCard` is a dedicated agent-card component for AMR because
the existing `<button>` row can't host an interactive sub-control
(interactive elements must not be nested inside a `<button>`). It polls /status after a login click
until the daemon reports loggedIn=true (or 5 minutes elapse), and
exposes a Sign-out action on hover. Other adapters (claude, codex,
hermes, …) keep their existing `<button>` card.

i18n: 8 new keys (settings.amrLogin / Logout / LoggingIn / etc.)
added to en + zh-CN. Other locales spread `en` and inherit the
English copy until translations land.

Coverage:
- `tests/integrations/vela.test.ts` pins the config.json reader
  against a tmp HOME — including the negative case where a profile
  has user info but no runtimeKey (still logged-out), and the
  secret-leak guard ("rt-secret-*" must not appear in the projection
  payload).
- `tests/components/AmrAgentCard.test.tsx` covers all four UI
  states (logged-out, logging-in, logged-in, logging-out) plus the
  click-propagation invariant the divergent card was built to keep.

`pnpm --filter @open-design/daemon test` 2901 / 2901 passing.
`pnpm --filter @open-design/web test` 1719 / 1719 passing.
`pnpm typecheck` + `pnpm guard` clean.

Dev script side-effects: `apps/daemon/scripts/verify-amr-real-vela.mjs`
no longer requires both VELA_RUNTIME_KEY and VELA_LINK_URL — if
VELA_PROFILE is set, the vela CLI is allowed to resolve credentials
from `~/.vela/config.json`. Added the two AMR `.mjs` fixtures to
`scripts/guard.ts` allowlist with the executable-fixture / dev-runner
rationale.

* fix(connection-test): substitute model for AMR before attachAcpSession

The chat-run path in server.ts already routes the requested model through
`resolveModelForAgent` so AMR / vela (whose CLI demands an explicit
`session/set_model` before `session/prompt`) gets the def's first
concrete fallback id when the chat run ships `model: 'default'`.
`connectionTest.ts` was wiring `attachAcpSession({ ..., model: model ?? null })`
directly, so the Test Connection button on the AMR Settings card failed with
the same `session/set_model must be called before session/prompt` error the
chat-run path already handles. In the UI this surfaced as a permanent
"Testing connection…" spinner.

Reuse the same helper here so Test Connection mirrors chat-run behavior.

* test(amr): three-layer end-to-end coverage for the AMR login + turn flow

The PR up to this point shipped runtime + UI code with unit-level Vitest
coverage. This commit adds the cross-layer regression net the live demo
relied on:

1. apps/daemon/tests/integrations/vela.routes.test.ts (HTTP, Vitest)
   Spins up the real daemon Express app via `startServer({port:0,...})`,
   persists `agentCliEnv.amr.VELA_BIN = <fake>` into app-config.json,
   and exercises every /api/integrations/vela/* endpoint against the
   extended fake-vela stub:
     - status reads ~/.vela/config.json under various states
     - login spawns the fake, waits for config.json to appear, returns
       pid + startedAt + profile
     - 409 already-running guard with the stub's delay knob
     - logout removes the file (idempotent)
     - secrets (runtimeKey / controlKey) never leak in the projection
     - login → status round-trip flips loggedIn=false → true

2. e2e/tests/amr/turn.test.ts (tools-dev orchestrated, Vitest)
   Boots a namespaced daemon + web pair through `createSmokeSuite`,
   inlines a self-contained fake `vela` binary that handles BOTH
   `vela login` (writes ~/.vela/config.json) and
   `vela agent run --runtime opencode` (ACP stdio with the
   `session/set_model must precede session/prompt` gate the real binary
   enforces), then drives a complete /api/runs lifecycle for
   `agentId: 'amr', model: 'default'` and asserts the assistant message
   captures the fake's streamed text. This is the test that would have
   surfaced today's plugin-default-model regression (the `set_model
   before prompt` error) at PR time instead of demo time.

3. e2e/ui/amr-login-pill.test.ts (Playwright)
   Mocks /api/agents + /api/integrations/vela/{status,login,logout}
   to drive the Settings AMR card through the full Sign in → Signed in
   → Sign out cycle. Pins the AmrLoginPill polling contract and the
   aria-label semantics (the pill's accessible name is "Sign out" once
   logged in, regardless of which label the hover-state text shows).

fake-vela.mjs extensions:
   - Handles `vela login` argv by writing
     ~/.vela/config.json for the active VELA_PROFILE and exiting 0 —
     mirrors real vela's on-disk side-effect without the device-auth
     loop.
   - FAKE_VELA_LOGIN_DELAY_MS knob so route tests can observe the
     in-flight state of the spawn lifecycle.
   - FAKE_VELA_LOGIN_USER_EMAIL / _USER_PLAN to assert the surfaced
     user fields end-to-end.

Validated:
   - `pnpm guard` + `pnpm typecheck` (all workspace projects)
   - `pnpm --filter @open-design/daemon test`: 2998 / 2998 passing,
     including the new 8-test integration suite.
   - `cd e2e && pnpm test tests/amr`: 1 / 1 passing.
   - `cd e2e && pnpm exec playwright test ui/amr-login-pill.test.ts`:
     1 / 1 passing (6.7s).

* feat(amr): package native cli and refine login ui

* feat(amr): wire vela cli beta packaging

* docs(amr): document vela ci packaging review

* docs(amr): refine vela ci integration review

* fix(ci): refresh nix pnpm dependency hashes

* fix(pack): clean up Vela CLI packaging

* fix(pack): bundle Vela CLI support files

* fix(amr): recover login attempts from stale auth state

* test: expand AMR and automations coverage

* fix(amr): address review follow-ups

* test(web): align tasks fixtures with contracts

* fix(daemon): type wildcard route params

* fix(ci): refresh PR merge validation

* fix(amr): clear env credentials on logout

* feat(settings): inline local CLI model configuration

* fix(amr): recognize daemon env credentials

* [codex] Fix Vela companion packaging (#2979)

* Fix Vela companion packaging

* Update Nix pnpm dependency hashes

* [codex] Surface AMR account failures (#2980)

* fix: surface AMR account failures

* fix: cover AMR recovery error guidance

* chore: bump beta base version to 0.8.1 (#2990)

* Fix AMR profile and packaged runtime review issues

* Detect packaged AMR OpenCode companion tree

* feat(web): polish AMR frontend flows

* Polish AMR onboarding card

* fix: read AMR login state from dot-amr config (#3048)

* test: tighten AMR credential and packaging coverage

* test: restore AMR executable test env helper

* [codex] Fix packaged mac Dock identity and AMR label (#3076)

* Fix packaged mac sidecar Dock identity

* Rename AMR assistant label

* Fix AMR live models and dot-amr login state (#3073)

* fix: read AMR login state from dot-amr config

* fix: load live AMR models before runs

* fix: point AMR onboarding link to production wallet

* fix: address AMR model review feedback

* fix: persist live AMR model fallback

* [codex] Fix AMR link catalog model ids (#3088)

* Fix packaged mac sidecar Dock identity

* Rename AMR assistant label

* Fix AMR link catalog model ids

* Fix AMR model normalization typecheck

* Use live AMR model for default runs

* fix: polish AMR runtime settings UI

* Accelerate AMR startup defaults (#3092)

* Surface AMR insufficient balance wallet URL (#3099)

* fix(web): polish onboarding controls (#3112)

* fix(web): show CLI scan loading state

* Avoid duplicate AMR wallet recharge links (#3117)

* Avoid duplicate AMR wallet recharge links

* Use Vela CLI 0.0.3 test package

* chore(nix): refresh pnpm deps hash

* Fix AMR wallet guidance display

---------

Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>

* chore(pack): pin Vela CLI 0.0.3-test.1 (#3127)

* chore(nix): refresh pnpm deps hash

* chore(pack): pin Vela CLI 0.0.3

* chore(nix): refresh pnpm deps hash

* fix(web): suppress AMR exit 130 fallback (#3136)

* feat(web): nudge users to hosted AMR on model/auth/quota failures (#3083)

* feat(web): nudge users to hosted AMR on model/auth/quota failures

When a non-AMR agent run fails with an auth / quota / upstream model
error, surface an inline nudge under the error pill linking to Open
Design's hosted AMR gateway (https://open-design.ai/amr). The nudge
fires `surface_view` (element=run_failed_toast) on impression and
`ui_click` (element=go_amr) on the link.

Also teach the daemon to classify CLI-agent auth/quota/upstream failures
(Claude Code, codex, ...) into specific API error codes
(AGENT_AUTH_REQUIRED / RATE_LIMITED / UPSTREAM_UNAVAILABLE) instead of
the generic AGENT_EXECUTION_FAILED, so both the error message and the
nudge key off accurate codes. AMR's own runs are excluded from the
nudge — they keep the dedicated sign-in / recharge affordances.

* feat(web): rework failed-run AMR guidance into per-case error UI

Replace the single inline nudge with a per-case failed-run experience
driven by the run's error code + agent:

- The error card is now neutral gray (was red) and always carries a
  retry button; it is driven by the persisted per-message error event so
  it survives a reload.
- Non-AMR agent hitting a model/auth/quota wall: a theme-color promotion
  card under the error card offers "switch to AMR & retry" — switches the
  run to AMR, opens Settings on the AMR card, and auto-retries once the
  account signs in (ProjectView polls vela login status, independent of
  the Settings pill lifecycle, with success / 5-min-timeout / unmount
  exits).
- AMR agent unauthorized: clearer copy + an "authorize & retry" button.
- AMR agent out of balance: clearer copy + a "top up" button to the AMR
  wallet, with manual retry.
- Settings AMR card: when opened from the nudge, it scrolls into view and
  pulses, and an authorize-button coachmark (a fake hand cursor that
  rises in and dismisses on hover) points at the sign-in control when not
  yet authorized.

analytics: surface_view (run_failed_toast) on the promotion card and
ui_click (go_amr) on its action are retained. i18n adds chat.amrCard.*
and chat.amrError.* (en / zh-CN / zh-TW translated; other locales fall
back to en) and drops the old chat.amrErrorGuidance keys.

* fix(daemon): require status context for numeric service-failure codes

Per review on #3083: the model-service classifier matched bare HTTP
status numbers (`500`, `502`, `429`, `401`), so ordinary CLI output like
`line 500`, `read 502 bytes`, or `exit code 401` could be misclassified
as a provider outage / auth wall and wrongly surface the AMR nudge. Now
a status number only counts when it carries explicit context (`HTTP 500`,
`status 503`, `code: 401`, `502 Bad Gateway`); textual provider phrases
(overloaded, bad gateway, service unavailable, rate limit, …) are
unchanged. Adds fixtures proving unrelated numeric output stays null.

* fix(web): keep error pill for failed runs ChatPane's card doesn't cover

Per review on #3083: the per-message gray error pill was suppressed for
every persisted error status event, but ChatPane only renders the
replacement top-level error card for `retryableAssistantMessage` (the
last failed assistant). So a failed turn that is no longer last (after a
follow-up) or an older failed run in history showed neither the pill nor
the card — its error detail vanished, undercutting reload/history
survival. ChatPane now passes `errorCardOwnerId` (the assistant id whose
error the card represents); AssistantMessage suppresses only that one
pill and keeps rendering StatusPill for all other error events.

* fix(daemon): don't treat a process exit code as an HTTP status

Follow-up to review on #3083: the status-context helper accepted a bare
`code` prefix, so `exit code 401` / `process exited with code 429` still
matched and got classified as AGENT_AUTH_REQUIRED / RATE_LIMITED (the
very `exit code 401` case the comment calls out as noise). `code` now
only counts when qualified (`status code` / `error code` / `response
code`) or punctuation-bound (`code: 401`); bare `exit code N` no longer
matches. Adds fixtures for exit-code lines returning null.
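
The status-context rule from this commit and the earlier one can be sketched with a few regular expressions. This is an illustrative reconstruction, not the daemon's classifier: the function name, the exact patterns, the set of status numbers, and the choice to strip exit-code phrases up front are all assumptions. Textual provider phrases (overloaded, rate limit, …) are out of scope here.

```typescript
type ServiceFailureCode = 'AGENT_AUTH_REQUIRED' | 'RATE_LIMITED' | 'UPSTREAM_UNAVAILABLE';

const STATUS = '(401|429|500|502|503)';

// Process exit codes are noise, so drop them before looking for HTTP context.
const EXIT_CODE_NOISE = /\bexit(?:ed)?(?:\s+with)?\s+code\s*[:=]?\s*\d+/gi;

// `HTTP 500`, `status 503`, `status code 429`, `error code 401`, `response code 502`.
const PREFIXED = new RegExp(
  `\\b(?:http(?:\\/\\d(?:\\.\\d)?)?|status(?:\\s+code)?|(?:error|response)\\s+code)\\s*[:=]?\\s*${STATUS}\\b`,
  'i',
);

// A bare `code` only counts when punctuation binds it to the number: `code: 401`.
const PUNCTUATED_CODE = new RegExp(`\\bcode\\s*[:=]\\s*${STATUS}\\b`, 'i');

// A number followed by its reason phrase: `502 Bad Gateway`.
const REASON_PHRASE = new RegExp(
  `\\b${STATUS}\\s+(?:unauthorized|too many requests|internal server error|bad gateway|service unavailable)\\b`,
  'i',
);

function classifyStatusNumber(line: string): ServiceFailureCode | null {
  const text = line.replace(EXIT_CODE_NOISE, '');
  const match = PREFIXED.exec(text) ?? PUNCTUATED_CODE.exec(text) ?? REASON_PHRASE.exec(text);
  if (!match) return null;

  const status = Number(match[1]);
  if (status === 401) return 'AGENT_AUTH_REQUIRED';
  if (status === 429) return 'RATE_LIMITED';
  return 'UPSTREAM_UNAVAILABLE';
}
```

With these patterns, `line 500`, `read 502 bytes`, and `exit code 401` all return `null`, while `HTTP 500`, `status 503`, `code: 401`, and `502 Bad Gateway` are classified.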

* chore(web): translate AMR card / error keys for 16 remaining locales

PR #3083 added 10 new `chat.amrCard.*` / `chat.amrError.*` keys but only
provided en/zh-CN/zh-TW translations; the other 16 locales fell back to
English. Translate the card title/body, three chips, primary CTA, and
the AMR self-error (auth / balance) messages and buttons for ar, de,
es-ES, fa, fr, hu, id, it, ja, ko, pl, pt-BR, ru, th, tr, uk.

* fix(amr): address review feedback on #2355

Targeted fixes for the unresolved review threads on #2355. Each fix
includes / updates a focused test.

- runtimes/executables.ts: `packagedVelaOpenCodeCompanionTree` now
  verifies the inner `opencode` executable exists + is runnable, not
  just the directory. This closes the false-positive availability path
  that let `detectAgents()` surface AMR as available even when the
  packaged companion was empty / partially copied (mrcfps, 4 threads).

- runtimes/executables.ts: `resolveAmrOpenCodeExecutable` now prefers
  the bundled `<OD_RESOURCE_ROOT>/bin/libexec/opencode/opencode` over a
  stale `opencode` on the user's PATH, so packaged AMR builds can't be
  hijacked by a global installation.

- web/EntryShell.tsx: when the Local CLI scan returns an available
  agent and the previously-selected agent is AMR, switch the selection
  to the first available local agent so the runtime and persisted
  agent agree before Continue.

- server.ts (model-probe branch): for AMR, check `readVelaLoginStatus`
  BEFORE rejecting on an empty live-model catalog — a signed-out user
  was getting `AMR_MODEL_UNAVAILABLE` ("choose a model") instead of
  the correct `AMR_AUTH_REQUIRED` (sign-in affordance).

- server.ts (default model fallback): if the user asked for the AMR
  agent default and the cached id is no longer in the FRESH catalog,
  fall back to `liveModels[0]` from the probe instead of rejecting the
  run as `AMR_MODEL_UNAVAILABLE`.

- integrations/vela.ts: route `vela login` through
  `createCommandInvocation` so an npm/Node-style `vela.cmd` / `.bat`
  shim on Windows gets the correct `cmd.exe /d /s /c …` wrapping with
  verbatim args (matches `execAgentFile` / chat-run spawning).

- tools/pack/src/linux.ts: in containerized Linux builds, bind-mount
  the host directory of `OPEN_DESIGN_VELA_CLI_BIN` and rewrite the env
  to the container-side path. The host path was being passed in as-is
  even though the default container only mounts /project, /tools-pack
  and cache/home — `copyOptionalVelaCliBinary` saw a missing path.

Deferred (out of scope for this PR):
- `od amr status/login/logout/cancel` CLI subcommands (AGENTS.md
  UI/CLI dual-track rule, server.ts:5763) — sizable surface; tracked
  for a separate focused PR.
- Strict `--require-vela-cli` for Windows + mac-x64 beta builds:
  premature for now. `@powerformer/vela-cli` only publishes the
  `darwin-arm64` platform binary today, so adding the flag elsewhere
  would fail the builds. Revisit once win/x64/linux binaries ship.

* fix(amr): hoist sendAmrAccountFailure above the AMR catalog preflight (TDZ)

The new signed-out AMR branch in the catalog preflight at server.ts:10875
calls `sendAmrAccountFailure(...)` to emit AMR_AUTH_REQUIRED, but the
const declaration sat ~100 lines below at the outer function scope. Because
`const` bindings stay in the temporal dead zone (TDZ) until their declaration
runs, that branch would have thrown `ReferenceError: Cannot access
'sendAmrAccountFailure' before initialization` for the exact users it tries
to help, defeating the original intent.

Hoist the helper to just above the AMR preflight block so it's available
to every AMR code path in this function. Behavior elsewhere is unchanged.
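
A reduced illustration of the failure mode and the fix, with hypothetical function names standing in for the much larger handler in server.ts. It assumes an ES2015+ compile target, where `const` keeps its temporal-dead-zone semantics.

```typescript
// Broken ordering: the early branch runs before the `const` below has been
// initialized. The hoisted `emit` wrapper is callable, but the binding it
// closes over is still in its temporal dead zone.
function brokenPreflight(signedOut: boolean): string {
  if (signedOut) {
    try {
      return emit();
    } catch (err) {
      // On an ES2015+ target this is a ReferenceError.
      return err instanceof Error ? err.name : 'unknown';
    }
  }
  const sendFailure = (): string => 'AMR_AUTH_REQUIRED';
  function emit(): string {
    return sendFailure();
  }
  return 'ok';
}

// Fixed ordering: the helper is declared above every branch that uses it.
function hoistedPreflight(signedOut: boolean): string {
  const sendFailure = (): string => 'AMR_AUTH_REQUIRED';
  if (signedOut) {
    return sendFailure();
  }
  return 'ok';
}
```

The type checker does not flag the broken version because the premature access happens inside a function body, which is also why this kind of ordering bug tends to surface only at runtime.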

Also fix a fixture caught by rerunning the daemon test suite:
`launch.test.ts > resolveAgentLaunch uses packaged built-in Vela for AMR`
created only the `<resourceRoot>/bin/libexec/opencode/` companion
*directory*, but this PR's earlier tightening of
`packagedVelaOpenCodeCompanionTree` also requires the inner `opencode`
executable. Add the executable to that fixture to match the new contract;
the test is a sibling of the executables / env-and-detection fixtures
already updated in 13fc4f4.

Addresses #2355 review (mrcfps, 2026-05-28).

* feat(web): add hover cancel for AMR login (#3158)

* feat(web): add hover cancel for AMR login

* fix(web): don't bounce AmrLoginPill back to 'Signing in…' after local cancel

Both codex-connector (P2) and looper (CHANGES_REQUESTED) on this PR
flagged the same race in the new local-cancel path: `handleCancelLogin`
dispatches `notifyAmrLoginStatusChanged('login-canceled')` immediately
after `/login/cancel` returns, but the `AMR_LOGIN_STATUS_EVENT` listener
unconditionally re-enters `refresh()` and then restarts polling
whenever `/api/integrations/vela/status` still reports
`loginInFlight: true`.

That is a real race because the daemon's `cancelVelaLogin()` only sends
SIGTERM (escalating to SIGKILL after `LOGIN_CANCEL_KILL_GRACE_MS` =
2000 ms) and keeps the child in `activeLoginProcs` until it actually
exits — so the first `/status` read after a successful cancel can
legally still come back as in-flight. Under that window the pill flips
back to 'Signing in…' and can later surface the timeout/error path even
though the user already canceled, defeating the behavior promised in
the PR description.

Fix the listener instead of every dispatch site: in the
`login-canceled` branch, after the local reset (stopPolling +
setPending(null) + clear refs), optimistically mark every subscribed
pill instance as not-in-flight (`setStatus((c) => c ? { ...c,
loginInFlight: false } : c)`) and `return` — skip the
refresh-and-reconcile branch below entirely. The next explicit refresh
(component mount, user interaction, or a `status-changed` event) will
pick up the daemon's confirmed state once the child has actually
exited.
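
The listener change can be modeled as a pure function over the pill's status. The real code is a React effect with refs and polling; the `ListenerOutcome` type and function name below are hypothetical and only capture the decision being made.

```typescript
interface AmrLoginStatus {
  loggedIn: boolean;
  loginInFlight: boolean;
}

type LoginStatusEvent = 'login-canceled' | 'status-changed';

// Hypothetical result type: the next local status, plus whether the listener
// should go on to re-read /api/integrations/vela/status.
interface ListenerOutcome {
  status: AmrLoginStatus | null;
  shouldRefresh: boolean;
}

function onAmrLoginStatusEvent(
  current: AmrLoginStatus | null,
  event: LoginStatusEvent,
): ListenerOutcome {
  if (event === 'login-canceled') {
    // Optimistically mark the login as no longer in flight and skip the
    // refresh: the daemon may still report loginInFlight: true until the
    // child process has actually exited.
    return {
      status: current ? { ...current, loginInFlight: false } : current,
      shouldRefresh: false,
    };
  }
  // Every other event keeps the refresh-and-reconcile behavior.
  return { status: current, shouldRefresh: true };
}
```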

Add a focused regression test that holds `/api/integrations/vela/status`
at `loginInFlight: true` even after a successful `/login/cancel`,
asserting that the pill stays at the Canceled → Authorize sequence and
never bounces back to 'Signing in…'. This test fails on the pre-fix
listener and passes on the new behavior; existing
'cancels an in-flight AMR sign-in…' and 'reconciles late AMR browser
completion to Signed in after local cancel' tests continue to pass.

Addresses review feedback on #3158 (chatgpt-codex-connector, nettee).

---------

Co-authored-by: lefarcen <935902669@qq.com>

---------

Co-authored-by: a1chzt <chizblank@gmail.com>
Co-authored-by: Amy <1184569493@qq.com>
Co-authored-by: Mason <jinmeihong0201@gmail.com>
Co-authored-by: Caprika <56862773+alchemistklk@users.noreply.github.com>
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-05-28 05:09:55 +00:00
open-design-bot[bot]
4ddb8f9560 Update docs/assets/github-metrics.svg (#3075)
Co-authored-by: open-design-bot[bot] <282769551+open-design-bot[bot]@users.noreply.github.com>
2026-05-27 07:10:46 +00:00
mehmet turac
d70070fcbc skills: add research decision room (#2949)
* skills: add research decision room

* skills: align research room example contract
2026-05-26 15:01:37 +00:00
Amy
5563e7eca6 test: expand home entry and html preview coverage (#2992)
* test: cover entry topbar and hero flows

* test: expand entry and html preview coverage

* test: isolate mocked github stars in home entry e2e

Generated-By: looper 0.8.1 (runner=fixer, agent=codex)

* chore: retrigger CI for PR 2992
2026-05-26 14:48:35 +00:00
lefarcen
7312c64580 ci(landing): split landing deploy into staging gate + manual production (#2994)
* ci(landing): split landing deploy into staging gate + manual production

A merge to `main` previously published the landing page straight to
production (open-design.ai) via `landing-page-deploy`. There was no
buffer to review the rendered site, so a bad merge was live instantly.

Split deploys across two Cloudflare Pages projects so production is only
ever reached by an explicit human action:

- `landing-page-staging` (push to main) -> staging project
  `open-design-landing-staging` -> staging.open-design.ai.
- `landing-page-production` (manual workflow_dispatch only) -> production
  project `open-design-landing` -> open-design.ai. Only this workflow
  names the production project; gate it with required reviewers on the
  `production` GitHub environment.
- `landing-page-ci` now also deploys a per-PR preview into the staging
  project (`--branch=pr-<n>`) for same-repo branches and comments the URL.
  Fork PRs (no secrets / read-only token) skip the deploy and keep just
  the build validation. Path filters already scope this to landing edits.

Decouple search-engine indexing from staging:

- `blog-indexing-on-deploy` now triggers on `landing-page-production`
  (not every main push), so the test environment is never submitted to
  Google/IndexNow.
- It diffs from a new `blog-indexed-prod` tag (the last indexed prod
  commit) instead of `HEAD^`, and force-advances the tag after a
  successful run, so a manual promotion bundling several merged posts
  indexes all of them rather than only the last commit.

Staging and PR-preview builds drop `PUBLIC_GA_MEASUREMENT_ID` so test
traffic does not pollute the production GA property.

* ci(landing): keep staging + PR previews out of the search index

staging.open-design.ai mirrors production and is exposed via cert
transparency logs, so search engines can discover it. Indexing the
mirror competes with open-design.ai for the same content.

Emit `<meta name="robots" content="noindex, nofollow">` whenever
OD_LANDING_NOINDEX=1, and set that flag on the staging and PR-preview
builds (production leaves it unset and stays indexable). noindex is
used rather than a robots.txt Disallow so crawlers can still fetch the
page and read both the tag and the canonical, which already points at
the production origin.

* fix(landing): make staging noindex actually take effect

The previous commit read `process.env.OD_LANDING_NOINDEX` directly in
`seo-head.astro`, but `.astro` frontmatter is transformed by Vite and
does not see process.env, so the meta never rendered. Two fixes:

- Inject the flag as the compile-time constant `__OD_LANDING_NOINDEX__`
  via `vite.define` in astro.config.ts (config runs in Node and can read
  process.env); SeoHead consumes that constant.
- The homepage (`index.astro`) and `og.astro` build their own <head> and
  never use SeoHead, so a per-component meta can miss pages. Add an
  `astro:build:done` integration that appends a catch-all
  `/*  X-Robots-Tag: noindex, nofollow` to the Cloudflare Pages `_headers`
  on staging/preview builds, covering every response (homepage, assets,
  any custom-head page) at the HTTP layer. Production builds leave
  `_headers` untouched.
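
The `_headers` half of the fix boils down to a small string transformation, sketched below as a pure function. The real integration runs inside an `astro:build:done` hook and reads and writes the file on disk; the function name and the idempotency check are assumptions added for illustration.

```typescript
// Cloudflare Pages `_headers` syntax: a path pattern on its own line,
// followed by indented header lines that apply to matching responses.
const NOINDEX_BLOCK = '/*\n  X-Robots-Tag: noindex, nofollow\n';

function withNoindexHeaders(existing: string, noindex: boolean): string {
  // Production builds (flag unset) leave the file untouched.
  if (!noindex) return existing;

  // Assumed guard: do not append the block twice on repeated runs.
  if (existing.includes('X-Robots-Tag: noindex, nofollow')) return existing;

  const separator = existing.length > 0 && !existing.endsWith('\n') ? '\n' : '';
  return `${existing}${separator}${NOINDEX_BLOCK}`;
}
```

Because the header applies at the HTTP layer, it also covers pages that build their own `<head>` and assets that have no `<head>` at all.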

Verified: build with OD_LANDING_NOINDEX=1 emits the _headers block and
the SeoHead <meta>; build without the flag emits neither; astro check
clean.

* fix(landing): address review — pin prod checkout to main, defer index pointer

Two blockers from review:

- landing-page-production: workflow_dispatch can be launched from any ref
  via the Actions "Use workflow from" dropdown, so an operator could ship
  an arbitrary branch to open-design.ai. Pin the checkout to `ref: main`
  so the deployed artifact always equals reviewed main.

- blog-indexing-on-deploy: the `blog-indexed-prod` pointer was advanced
  right after sitemap submission, before Inspect / Search Analytics /
  Render status / Open status PR. A failure in any of those still moved
  the pointer, so the next production run skipped those posts. Move the
  advance to the very end, gated on `success()`, so a failure leaves the
  tag in place and the range is re-processed next run (submissions are
  idempotent).

* fix(landing): gate production promotion to the main ref only

Follow-up to the production-path review note: pinning checkout to main
fixed the deployed content, but the workflow was still dispatchable from
any ref, which records a non-main production run and would dodge
blog-indexing's `workflow_run` `branches: [main]` filter. Gate the whole
job on `github.ref == 'refs/heads/main'` so a dispatch from any other
branch/tag is skipped outright.
2026-05-26 14:05:04 +00:00