Commit Graph

  • 4a92407ae7 Describe the review design as current state, not as a delta Jesse Vincent 2026-06-10 08:28:28 -07:00
  • cc81ffe7f3 Spec: record iterations 2-3 results and final frozen-config matrix Jesse Vincent 2026-06-10 05:06:59 -07:00
  • a0dcb77596 Hand reviewers the diff as a file, not a paste Jesse Vincent 2026-06-10 03:44:19 -07:00
  • bc7d93de1a Reviewer skepticism covers the implementer's design rationales Jesse Vincent 2026-06-10 02:20:28 -07:00
  • 63a155692b Make diff-pasting non-optional for task reviewer dispatch Jesse Vincent 2026-06-10 02:10:34 -07:00
  • 4866fe8b2d Close the Minor-severity escape hatch Jesse Vincent 2026-06-10 02:09:10 -07:00
  • e45a8f2548 Spec: document cost iterations and the per-task review consolidation Jesse Vincent 2026-06-09 23:59:22 -07:00
  • fc75b0b3b4 Merge per-task reviews into one task reviewer (iteration 2) Jesse Vincent 2026-06-09 23:58:28 -07:00
  • da0a11f6d4 Cut review-cost drivers: turn-aware models, inline diffs, scoped evidence Jesse Vincent 2026-06-09 22:42:54 -07:00
  • b42846401f Add phrase-level pre-judging triggers to reviewer prompt rule Jesse Vincent 2026-06-09 21:49:51 -07:00
  • c087105ff3 Red Flags: never tell a reviewer what not to flag or pre-rate severity Jesse Vincent 2026-06-09 21:47:41 -07:00
  • 29e5842917 Close three review blind spots found by defect tracing Jesse Vincent 2026-06-09 21:19:08 -07:00
  • 1d94bc939d Require explicit model on subagent dispatch Jesse Vincent 2026-06-09 21:11:45 -07:00
  • 833ec4177e Forbid controllers pre-judging reviewer findings Jesse Vincent 2026-06-09 18:28:24 -07:00
  • c4abda336c Sync plan: escaped pre() pattern in Task 5 checks block Jesse Vincent 2026-06-09 18:19:00 -07:00
  • c874cf0cb3 Fix plan doc: correct Task 1 grep expectation; sync Task 5 story block Jesse Vincent 2026-06-09 17:21:06 -07:00
  • 08a2e7eed3 Sync plan's Task 5 blocks with review fixes Jesse Vincent 2026-06-09 17:13:03 -07:00
  • 077dd192a7 SDD controller: reviewer prompt budgets, ⚠️ handling, final-review pointer, model judgment Jesse Vincent 2026-06-09 16:59:05 -07:00
  • 441d22a2c0 Implementer prompt: re-run covering tests after fixing review findings Jesse Vincent 2026-06-09 16:56:28 -07:00
  • efcaa40f1f Scope spec reviewer's Your Job wording to the diff Jesse Vincent 2026-06-09 16:55:28 -07:00
  • 622a3887f3 Spec reviewer: judge from the diff, grounded skepticism, ⚠️ verdict channel Jesse Vincent 2026-06-09 16:53:30 -07:00
  • d3d6800b07 Use bare placeholder names in quality reviewer prompt body Jesse Vincent 2026-06-09 16:51:54 -07:00
  • 246b493db4 Make per-task quality reviewer prompt self-contained and task-scoped Jesse Vincent 2026-06-09 16:47:27 -07:00
  • 7dc323c28b Add implementation plan for task-scoped review dispatch Jesse Vincent 2026-06-09 16:42:50 -07:00
  • 55938589d3 Harden review-dispatch spec per adversarial review findings Jesse Vincent 2026-06-09 16:33:44 -07:00
  • 450b02a11b Add design spec: task-scoped review dispatch for SDD Jesse Vincent 2026-06-09 16:26:00 -07:00
  • 9eb452afe7 chore: bump evals submodule to claude transcript-capture fix Drew Ritter 2026-06-13 15:15:46 -07:00
  • 16856963f2 chore: bump evals submodule to claude transcript-capture fix [bump/evals-claude-transcript-capture] Drew Ritter 2026-06-13 15:15:46 -07:00
  • 9d2b0e971d writing-plans: task right-sizing, Global Constraints header, per-task Interfaces blocks [writing-plans-crisp] Jesse Vincent 2026-06-10 20:44:48 -07:00
  • 93f2ce91b8 Fix companion stop metadata and token permissions Drew Ritter 2026-06-11 10:25:19 -07:00
  • e9ee6c5b4d Harden Windows browser launcher Drew Ritter 2026-06-10 20:33:56 -07:00
  • 5415cb8ccf Fix Windows lifecycle validation Drew Ritter 2026-06-10 20:09:55 -07:00
  • 1c21a91e01 Align visual companion docs with shipped scope Drew Ritter 2026-06-10 19:41:28 -07:00
  • 441335ee3e Fix companion test cleanup and argv assertions Drew Ritter 2026-06-10 19:37:30 -07:00
  • 377192f7a1 Harden companion platform tests Drew Ritter 2026-06-10 19:26:53 -07:00
  • 5eea0d09d7 Fix companion lifecycle test ownership metadata Drew Ritter 2026-06-10 19:12:17 -07:00
  • a6a4cd85b9 Harden companion stop ownership proof Drew Ritter 2026-06-10 18:49:38 -07:00
  • 8034176801 Isolate companion fallback tokens Drew Ritter 2026-06-10 18:39:37 -07:00
  • 2bab677ba7 Fix server test fallback cleanup Drew Ritter 2026-06-10 18:33:38 -07:00
  • c4cde1eed9 Harden root screen containment Drew Ritter 2026-06-10 18:25:03 -07:00
  • 5f3b317741 Plan visual companion final hardening fixup Drew Ritter 2026-06-10 18:19:31 -07:00
  • 7bb6af2f67 Tighten visual companion hardening spec Drew Ritter 2026-06-10 18:13:18 -07:00
  • 4f88b89c75 Document visual companion final hardening fixup Drew Ritter 2026-06-10 18:05:55 -07:00
  • c7d7e3550f Harden companion Windows lifecycle coverage Drew Ritter 2026-06-10 16:23:13 -07:00
  • a2e67bbd9b Harden brainstorm companion auth regressions Drew Ritter 2026-06-10 14:58:16 -07:00
  • fe812c418f Document visual companion auth hardening plan Drew Ritter 2026-06-10 14:14:15 -07:00
  • f4d1788ffb fix(brainstorm-server): fix auth-integration bugs from full-branch review Jesse Vincent 2026-06-09 19:13:52 -07:00
  • 4341c3f4d5 test(brainstorm-server): thread session key through tests after auth merge Jesse Vincent 2026-06-09 18:33:00 -07:00
  • c64c4ea6f4 feat(brainstorm-server): gate every endpoint behind a per-session key Jesse Vincent 2026-06-09 12:22:53 -07:00
  • de05e020d8 docs(brainstorm): catalog visual companion issues; choose session-key for security Jesse Vincent 2026-06-09 12:13:54 -07:00
  • eee4f87471 fix(brainstorm-server): tie stop-server PID check to the session's port Jesse Vincent 2026-06-09 17:27:30 -07:00
  • bac46a5dcb fix(brainstorm-server): address adversarial review findings Jesse Vincent 2026-06-09 15:59:59 -07:00
  • daa41c0670 feat(brainstorming): offer the visual companion just-in-time; harden lifecycle guidance Jesse Vincent 2026-06-09 15:32:58 -07:00
  • 0d37ff6505 feat(brainstorm-server): opt-in auto-open of the browser on the first screen Jesse Vincent 2026-06-09 15:26:19 -07:00
  • 13da997ac7 feat(brainstorm-server): reuse the same port on session restart Jesse Vincent 2026-06-09 15:22:23 -07:00
  • 31a0de857b feat(brainstorm-companion): resilient reconnect, live status, paused overlay Jesse Vincent 2026-06-09 15:18:19 -07:00
  • c292421627 feat(brainstorm-server): 4h configurable idle timeout; close WS on shutdown Jesse Vincent 2026-06-09 15:08:09 -07:00
  • 9b00cc298d fix(brainstorm-server): verify PID ownership before stopping Jesse Vincent 2026-06-09 14:57:44 -07:00
  • 88fe1e7e15 fix(brainstorm-server): ignore macOS resource-fork dotfiles Jesse Vincent 2026-06-09 14:53:48 -07:00
  • e6c983888f chore(evals): bump submodule to SUP-333 boundary + plumbing scenarios (7f8e80c) Drew Ritter 2026-06-11 13:42:58 -07:00
  • 1280585826 chore(evals): bump submodule to SUP-333 boundary + plumbing scenarios (7f8e80c) [drew/bump-evals-boundary-scenarios] Drew Ritter 2026-06-11 13:42:58 -07:00
  • 35464d67c0 E27 stack: conditional impl tier + final-review tier pin + narration recipe + terse reviewer contract [sdd-e27-stack] Jesse Vincent 2026-06-10 23:34:18 -07:00
  • 90b5433f59 E03: cheapest-tier implementers when plan carries complete code (transcription hypothesis) Jesse Vincent 2026-06-10 22:13:18 -07:00
  • 420c234a2c Bump evals submodule: E29-E34 quality investigation + L2 gate results (af05326) Jesse Vincent 2026-06-11 13:17:09 -07:00
  • d1fcc9889a Strict-cost spec: L2 final — died at gates; explicit escalation holds at sonnet, implicit adjudication does not Jesse Vincent 2026-06-11 13:11:32 -07:00
  • 74f85a7709 fix(writing-skills): hang backfire mechanism on the separated prohibition-vs-recipe comparison (NEW-4); control comparison stated as trend Jesse Vincent 2026-06-11 11:30:31 -07:00
  • b148b648eb fix(writing-skills): scope empirical claims, honest noise reporting, conditionalize micro-test checklist line Jesse Vincent 2026-06-11 11:10:33 -07:00
  • 3e565ca2ad feat(writing-skills): form-selection table + micro-test wording method Jesse Vincent 2026-06-11 10:20:24 -07:00
  • ac11700642 Bump evals submodule: L1 elicitation + autoresearch scenarios and logs (649b1f8) Jesse Vincent 2026-06-11 11:37:41 -07:00
  • 710f031ad0 writing-plans: task right-sizing, Global Constraints header, per-task Interfaces blocks Jesse Vincent 2026-06-10 20:44:48 -07:00
  • 72cb21b82c Constraints block is the reviewer's attention lens: copy spec verbatim, never improvise process rules Jesse Vincent 2026-06-11 10:31:48 -07:00
  • 5c3af5f195 fix(skills): brainstorming gate exempts nothing-to-design requests; description exceptions are authoritative (SUP-333 C) [drew/sup-333-3-brainstorming-triviality-gate] Drew Ritter 2026-06-10 23:48:44 -07:00
  • f9d11b3c2f fix(skills): SDD review fanout scales with the change (SUP-333 B) [drew/sup-333-2-sdd-proportionality] Drew Ritter 2026-06-10 23:47:45 -07:00
  • e5f337b89e fix(skills): plans reference the spec instead of restating it — end to end (SUP-333 A) [drew/sup-333-1-plans-reference-spec] Drew Ritter 2026-06-10 23:45:30 -07:00
  • de1d35e5e7 Strict-cost spec: L1 final — cost win re-attributed to complete-code plans; guidance owns fidelity/variance Jesse Vincent 2026-06-10 21:44:23 -07:00
  • ec014e7a7f Bump evals submodule to merged superpowers-evals main (ac264b1) Jesse Vincent 2026-06-10 19:39:02 -07:00
  • eba16f6b91 Strict-cost spec: L2 recon n=2 (sonnet controller $6.68/$8.05, judgment clean, escalation points unstressed) Jesse Vincent 2026-06-10 17:11:26 -07:00
  • 27788fdef9 Strict-cost spec: record batch A-E rung verdicts (L1 validated, L2 recon positive, L3 dead) Jesse Vincent 2026-06-10 16:59:43 -07:00
  • 0cb1960068 chore(evals): bump submodule for Claude Haiku target Drew Ritter 2026-06-10 16:13:55 -07:00
  • 5cd1a9d5f2 chore(evals): bump submodule for Claude Haiku target [codex/pri-2158-bump-evals-submodule] Drew Ritter 2026-06-10 16:13:55 -07:00
  • 9a25a75bac Spec: strict-cost SDD experiment ladder — judgment as co-invariant, plan-side crispness first Jesse Vincent 2026-06-10 14:35:00 -07:00
  • 60fa4f6fc4 Record writing-plans micro-test result: resolved, no change needed Jesse Vincent 2026-06-10 14:31:50 -07:00
  • 43a6ee23f7 Spec: record iterations 4-5 (variance honesty, structural fixes, final validated ranges) Jesse Vincent 2026-06-10 13:08:40 -07:00
  • fe90d6c469 Adopt audited positive phrasings: evidence rule leads positive; fix-report completeness as checklist Jesse Vincent 2026-06-10 13:08:19 -07:00
  • b81f35bb1e Land eval-tuned combo: file handoffs, progress ledger, final-review package, REQUIRED model lines, reviewer risk budget Jesse Vincent 2026-06-10 13:08:06 -07:00
  • 926096a1d7 Spec: positive-instruction redesign — audit results, micro-test method, writing-plans variants Jesse Vincent 2026-06-10 12:32:06 -07:00
  • a995af2e24 Shared: unique review-package collateral names Jesse Vincent 2026-06-10 09:39:21 -07:00
  • d4dbf44162 Add review-package script; close fix-dispatch test gap Jesse Vincent 2026-06-10 08:51:16 -07:00
  • 2434ef7f35 Describe the review design as current state, not as a delta Jesse Vincent 2026-06-10 08:28:28 -07:00
  • 7cf78437e2 Spec: record iterations 2-3 results and final frozen-config matrix Jesse Vincent 2026-06-10 05:06:59 -07:00
  • e355795625 Hand reviewers the diff as a file, not a paste Jesse Vincent 2026-06-10 03:44:19 -07:00
  • 29ee4e8e44 Reviewer skepticism covers the implementer's design rationales Jesse Vincent 2026-06-10 02:20:28 -07:00
  • 28498a5cde Make diff-pasting non-optional for task reviewer dispatch Jesse Vincent 2026-06-10 02:10:34 -07:00
  • 5e2907fc4f Close the Minor-severity escape hatch Jesse Vincent 2026-06-10 02:09:10 -07:00
  • e532f24df7 Spec: document cost iterations and the per-task review consolidation Jesse Vincent 2026-06-09 23:59:22 -07:00
  • e3c74fc1c9 Merge per-task reviews into one task reviewer (iteration 2) Jesse Vincent 2026-06-09 23:58:28 -07:00
  • 3e3e1e701e Cut review-cost drivers: turn-aware models, inline diffs, scoped evidence Jesse Vincent 2026-06-09 22:42:54 -07:00
  • 853396e3ae Add phrase-level pre-judging triggers to reviewer prompt rule Jesse Vincent 2026-06-09 21:49:51 -07:00
  • 83d54f7ddd Red Flags: never tell a reviewer what not to flag or pre-rate severity Jesse Vincent 2026-06-09 21:47:41 -07:00
  • c7900f1698 Close three review blind spots found by defect tracing Jesse Vincent 2026-06-09 21:19:08 -07:00