Commit Graph

  • d7f47d350a Phase E: action-language tool vocabulary Jesse Vincent 2026-05-05 18:26:21 -07:00
  • 6ec8686477 Phase D: cross-runtime tweaks (visual-companion, executing-plans, test) Jesse Vincent 2026-05-05 18:26:01 -07:00
  • 1681f58a3f Phase C: alphabetize README platform listings + spec Jesse Vincent 2026-05-05 18:25:44 -07:00
  • 6b9f1b214a Phase B: config-file refs + per-platform tool refs + spec Jesse Vincent 2026-05-05 18:25:31 -07:00
  • f0e5117fa6 Phase A: agent-neutral prose + CSO → SDO + spec Jesse Vincent 2026-05-05 18:25:12 -07:00
  • 741c232768 Move eval harness to submodule (#1541) Drew Ritter 2026-05-13 12:25:41 -07:00
  • 9ea7e2b6cb fix(tdd): link testing anti-patterns reference (#1532) Drew Ritter 2026-05-12 17:22:42 -07:00
  • 0fad59e91f [codex] replace Circle K signal with generic review guidance (#1531) Drew Ritter 2026-05-12 17:22:19 -07:00
  • d00f4ad442 fix: remove global worktree path fallback (#1476) Drew Ritter 2026-05-12 10:24:45 -07:00
  • ce95985094 fix(using-git-worktrees): repair skipped Step 2 numbering (#1522) Drew Ritter 2026-05-11 17:50:01 -07:00
  • 98e39bd9e4 fix: remove stale Cursor plugin refs fuleinist 2026-05-12 00:10:05 +08:00
  • fb1dfe9a16 fix(writing-skills): use markdown link for testing methodology reference Stable Genius 2026-03-05 19:05:33 -08:00
  • bc2558c3f9 evals: use pre-commit hooks Drew Ritter 2026-05-06 15:41:52 -07:00
  • 9efbb7dd0d evals: add Gemini 2.5 Flash backend Drew Ritter 2026-05-06 15:09:59 -07:00
  • f7705f208e evals: drop drill source marker Drew Ritter 2026-05-06 14:55:14 -07:00
  • 74cddb5575 evals: remove unreleased wave scenarios Drew Ritter 2026-05-06 14:43:08 -07:00
  • a325106502 Address adversarial review findings Jesse Vincent 2026-05-06 12:41:28 -07:00
  • 0e7b967e69 docs: introduce evals/ as the canonical skill-behavior eval harness Jesse Vincent 2026-05-06 12:33:10 -07:00
  • 342ccf61d1 docs: annotate dated artifacts referencing lifted bash tests Jesse Vincent 2026-05-06 12:32:00 -07:00
  • 315ef09ebc tests: annotate three kept bash tests with drill coverage notes Jesse Vincent 2026-05-06 12:29:59 -07:00
  • 12ef68d55e tests: remove test-requesting-code-review.sh (covered by drill code-review-catches-planted-bugs) Jesse Vincent 2026-05-06 12:28:40 -07:00
  • ea8aad8764 tests: remove test-document-review-system.sh (covered by drill spec-reviewer-catches-planted-flaws) Jesse Vincent 2026-05-06 12:28:40 -07:00
  • 1f0ad3817d tests: remove subagent-driven-dev fixtures (covered by drill sdd-go-fractals + sdd-svelte-todo) Jesse Vincent 2026-05-06 12:27:31 -07:00
  • 7fd1ac7bfc tests: remove run-claude-describes-sdd.sh (covered by drill mid-conversation-skill-invocation) Jesse Vincent 2026-05-06 12:25:46 -07:00
  • 8611a4ea97 tests: remove skill-triggering bash prompts (covered by drill triggering-* scenarios) Jesse Vincent 2026-05-06 12:24:53 -07:00
  • 09046c046b evals: drop SUPERPOWERS_ROOT setup step from README/CLAUDE Jesse Vincent 2026-05-06 12:21:35 -07:00
  • 671ec3769d evals: drop SUPERPOWERS_ROOT from codex/gemini required_env Jesse Vincent 2026-05-06 12:20:47 -07:00
  • 03cc20d3b5 evals: default SUPERPOWERS_ROOT to parent of evals/ if unset Jesse Vincent 2026-05-06 12:19:39 -07:00
  • 6bc6f2279d Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b Jesse Vincent 2026-05-06 12:15:46 -07:00
  • 1a42ead98f Plan: lift drill into superpowers as evals/ Jesse Vincent 2026-05-06 12:08:58 -07:00
  • 09d2c1d39c Spec: address adversarial review findings Jesse Vincent 2026-05-06 12:03:24 -07:00
  • bce1267adb Spec: lift drill into superpowers as evals/ Jesse Vincent 2026-05-06 11:54:12 -07:00
  • 718cb1d78c docs: turned the dash in "- Jesse" into an escape sequence (#1474) robotsnh 2026-05-06 19:22:19 +01:00
  • 8c26f9456c Draft Superpowers 6 release notes (f/sp6-relnotes) Jesse Vincent 2026-06-15 15:04:58 -07:00
  • 3e20a04ae5 Job posting Jesse Vincent 2026-06-15 11:46:19 -07:00
  • 8cf3900614 Job posting Jesse Vincent 2026-06-15 11:46:19 -07:00
  • 71489c8160 E37: pre-flight plan review — surface plan conflicts as one batched question before Task 1 Jesse Vincent 2026-06-11 14:54:55 -07:00
  • 97c9ea3f7d Spec: L2b tested — opus structural win, sonnet transmission+attention gap (E35/E36); bump evals to 9919b27 Jesse Vincent 2026-06-11 14:06:28 -07:00
  • afecfcd239 L2b: plan-mandated defects are findings the human adjudicates Jesse Vincent 2026-06-11 13:41:21 -07:00
  • 2989810931 E27 stack: conditional impl tier + final-review tier pin + narration recipe + terse reviewer contract Jesse Vincent 2026-06-10 23:34:18 -07:00
  • 1588b949f2 E03: cheapest-tier implementers when plan carries complete code (transcription hypothesis) Jesse Vincent 2026-06-10 22:13:18 -07:00
  • 9b6be89aea E37: pre-flight plan review — surface plan conflicts as one batched question before Task 1 (sdd-l2b-plan-mandated) Jesse Vincent 2026-06-11 14:54:55 -07:00
  • eafa95b437 Spec: L2b tested — opus structural win, sonnet transmission+attention gap (E35/E36); bump evals to 9919b27 Jesse Vincent 2026-06-11 14:06:28 -07:00
  • 228f2cb8a9 L2b: plan-mandated defects are findings the human adjudicates Jesse Vincent 2026-06-11 13:41:21 -07:00
  • e161df5a9b E27 stack: conditional impl tier + final-review tier pin + narration recipe + terse reviewer contract Jesse Vincent 2026-06-10 23:34:18 -07:00
  • 5e204f1128 E03: cheapest-tier implementers when plan carries complete code (transcription hypothesis) Jesse Vincent 2026-06-10 22:13:18 -07:00
  • b1eb92ea72 Strict-cost spec: L2 final — died at gates; explicit escalation holds at sonnet, implicit adjudication does not Jesse Vincent 2026-06-11 13:11:32 -07:00
  • 6e9bbb7e3e writing-plans: task right-sizing, Global Constraints header, per-task Interfaces blocks Jesse Vincent 2026-06-10 20:44:48 -07:00
  • fe938ac86c Constraints block is the reviewer's attention lens: copy spec verbatim, never improvise process rules Jesse Vincent 2026-06-11 10:31:48 -07:00
  • 07dec9331f Strict-cost spec: L1 final — cost win re-attributed to complete-code plans; guidance owns fidelity/variance Jesse Vincent 2026-06-10 21:44:23 -07:00
  • afcbf8bacb Strict-cost spec: L2 recon n=2 (sonnet controller $6.68/$8.05, judgment clean, escalation points unstressed) Jesse Vincent 2026-06-10 17:11:26 -07:00
  • fa14c8d671 Strict-cost spec: record batch A-E rung verdicts (L1 validated, L2 recon positive, L3 dead) Jesse Vincent 2026-06-10 16:59:43 -07:00
  • 8b76932337 Spec: strict-cost SDD experiment ladder — judgment as co-invariant, plan-side crispness first Jesse Vincent 2026-06-10 14:35:00 -07:00
  • 0702ec2c6f Record writing-plans micro-test result: resolved, no change needed Jesse Vincent 2026-06-10 14:31:50 -07:00
  • 85a9324a53 Spec: record iterations 4-5 (variance honesty, structural fixes, final validated ranges) Jesse Vincent 2026-06-10 13:08:40 -07:00
  • 610b09874e Adopt audited positive phrasings: evidence rule leads positive; fix-report completeness as checklist Jesse Vincent 2026-06-10 13:08:19 -07:00
  • 6df501ea5d Land eval-tuned combo: file handoffs, progress ledger, final-review package, REQUIRED model lines, reviewer risk budget Jesse Vincent 2026-06-10 13:08:06 -07:00
  • 1585f40c8e Spec: positive-instruction redesign — audit results, micro-test method, writing-plans variants Jesse Vincent 2026-06-10 12:32:06 -07:00
  • 60c0b744b4 Shared: unique review-package collateral names Jesse Vincent 2026-06-10 09:39:21 -07:00
  • 6b3e4ad407 Add review-package script; close fix-dispatch test gap Jesse Vincent 2026-06-10 08:51:16 -07:00
  • a84bb0f52b Describe the review design as current state, not as a delta Jesse Vincent 2026-06-10 08:28:28 -07:00
  • 69d396a676 Spec: record iterations 2-3 results and final frozen-config matrix Jesse Vincent 2026-06-10 05:06:59 -07:00
  • e4457c970e Hand reviewers the diff as a file, not a paste Jesse Vincent 2026-06-10 03:44:19 -07:00
  • fac5888846 Reviewer skepticism covers the implementer's design rationales Jesse Vincent 2026-06-10 02:20:28 -07:00
  • 8ac14c0450 Make diff-pasting non-optional for task reviewer dispatch Jesse Vincent 2026-06-10 02:10:34 -07:00
  • 3ed554d557 Close the Minor-severity escape hatch Jesse Vincent 2026-06-10 02:09:10 -07:00
  • 4e8edca36e Spec: document cost iterations and the per-task review consolidation Jesse Vincent 2026-06-09 23:59:22 -07:00
  • d7726d99dc Merge per-task reviews into one task reviewer (iteration 2) Jesse Vincent 2026-06-09 23:58:28 -07:00
  • 4c1f1e5cc5 Cut review-cost drivers: turn-aware models, inline diffs, scoped evidence Jesse Vincent 2026-06-09 22:42:54 -07:00
  • 7288393773 Add phrase-level pre-judging triggers to reviewer prompt rule Jesse Vincent 2026-06-09 21:49:51 -07:00
  • 254a8e2e32 Red Flags: never tell a reviewer what not to flag or pre-rate severity Jesse Vincent 2026-06-09 21:47:41 -07:00
  • 7c11cee649 Close three review blind spots found by defect tracing Jesse Vincent 2026-06-09 21:19:08 -07:00
  • b36cf86afd Require explicit model on subagent dispatch Jesse Vincent 2026-06-09 21:11:45 -07:00
  • 06bec17a34 Forbid controllers pre-judging reviewer findings Jesse Vincent 2026-06-09 18:28:24 -07:00
  • 236524413b Sync plan: escaped pre() pattern in Task 5 checks block Jesse Vincent 2026-06-09 18:19:00 -07:00
  • 6e019e0316 Fix plan doc: correct Task 1 grep expectation; sync Task 5 story block Jesse Vincent 2026-06-09 17:21:06 -07:00
  • d4bb8d268f Sync plan's Task 5 blocks with review fixes Jesse Vincent 2026-06-09 17:13:03 -07:00
  • d519ba65fd SDD controller: reviewer prompt budgets, ⚠️ handling, final-review pointer, model judgment Jesse Vincent 2026-06-09 16:59:05 -07:00
  • d32a56dc32 Implementer prompt: re-run covering tests after fixing review findings Jesse Vincent 2026-06-09 16:56:28 -07:00
  • 994bc26d2a Scope spec reviewer's Your Job wording to the diff Jesse Vincent 2026-06-09 16:55:28 -07:00
  • d5850df1bc Spec reviewer: judge from the diff, grounded skepticism, ⚠️ verdict channel Jesse Vincent 2026-06-09 16:53:30 -07:00
  • b5edd40d2c Use bare placeholder names in quality reviewer prompt body Jesse Vincent 2026-06-09 16:51:54 -07:00
  • 6a02446953 Make per-task quality reviewer prompt self-contained and task-scoped Jesse Vincent 2026-06-09 16:47:27 -07:00
  • 042d238b26 Add implementation plan for task-scoped review dispatch Jesse Vincent 2026-06-09 16:42:50 -07:00
  • cf81ad2ac3 Harden review-dispatch spec per adversarial review findings Jesse Vincent 2026-06-09 16:33:44 -07:00
  • cb0dbeb095 Add design spec: task-scoped review dispatch for SDD Jesse Vincent 2026-06-09 16:26:00 -07:00
  • db6077bb21 Strict-cost spec: L2 final — died at gates; explicit escalation holds at sonnet, implicit adjudication does not (sdd-review-dispatch) Jesse Vincent 2026-06-11 13:11:32 -07:00
  • 65e702f92a writing-plans: task right-sizing, Global Constraints header, per-task Interfaces blocks Jesse Vincent 2026-06-10 20:44:48 -07:00
  • 7f126acda6 Constraints block is the reviewer's attention lens: copy spec verbatim, never improvise process rules Jesse Vincent 2026-06-11 10:31:48 -07:00
  • 06f7789487 Strict-cost spec: L1 final — cost win re-attributed to complete-code plans; guidance owns fidelity/variance Jesse Vincent 2026-06-10 21:44:23 -07:00
  • 330aba6dd6 Strict-cost spec: L2 recon n=2 (sonnet controller $6.68/$8.05, judgment clean, escalation points unstressed) Jesse Vincent 2026-06-10 17:11:26 -07:00
  • 5b9eb20f76 Strict-cost spec: record batch A-E rung verdicts (L1 validated, L2 recon positive, L3 dead) Jesse Vincent 2026-06-10 16:59:43 -07:00
  • 7e421713ac Spec: strict-cost SDD experiment ladder — judgment as co-invariant, plan-side crispness first Jesse Vincent 2026-06-10 14:35:00 -07:00
  • 8476908a1b Record writing-plans micro-test result: resolved, no change needed Jesse Vincent 2026-06-10 14:31:50 -07:00
  • e6118e02b9 Spec: record iterations 4-5 (variance honesty, structural fixes, final validated ranges) Jesse Vincent 2026-06-10 13:08:40 -07:00
  • 9a221229a5 Adopt audited positive phrasings: evidence rule leads positive; fix-report completeness as checklist Jesse Vincent 2026-06-10 13:08:19 -07:00
  • 7d8f0ce9e9 Land eval-tuned combo: file handoffs, progress ledger, final-review package, REQUIRED model lines, reviewer risk budget Jesse Vincent 2026-06-10 13:08:06 -07:00
  • f37c5e5115 Spec: positive-instruction redesign — audit results, micro-test method, writing-plans variants Jesse Vincent 2026-06-10 12:32:06 -07:00
  • 618698d9b3 Shared: unique review-package collateral names Jesse Vincent 2026-06-10 09:39:21 -07:00
  • 2d6e56ee90 Add review-package script; close fix-dispatch test gap Jesse Vincent 2026-06-10 08:51:16 -07:00