From b8982a748a27d808f5034887cc4b408053f111dd Mon Sep 17 00:00:00 2001
From: doquanghuy <doquanghuy91@gmail.com>
Date: Thu, 28 May 2026 22:50:38 +0700
Subject: [PATCH] =?UTF-8?q?docs(workflows):=20clarify=20continue=5Fon=5Fer?=
 =?UTF-8?q?ror=20scope=20=E2=80=94=20returned=20FAILED=20only?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Copilot finding on d0b9e00:

The README's "Error Handling" intro implied `continue_on_error` covers
"any other runtime error raised during step execution", but the engine
only consults the flag when a step returns `StepResult(status=FAILED, ...)`.
Exceptions raised out of `step_impl.execute()` propagate to
`WorkflowEngine.execute()`, where the catch-all logs `workflow_failed`
and re-raises — the step result is never recorded, and the flag is
never consulted.

Audited the whole PR diff for the same overclaim:

1. workflows/README.md — main fix. Reworded the Error Handling intro to
   "any step that returns StepResult(status=FAILED, ...)" and promoted
   the parenthetical structural-validation note into the Notes block.
   Added a new "Scope: returned failures only" note that names the
   exception path explicitly and tells step authors how to bring the
   flag into scope for exceptional code (catch internally and return
   FAILED with the failure encoded in `output`).

2. tests/test_workflows.py — section comment used "when an executable
   step fails", same ambiguity. Tightened to "when a step returns
   StepResult(status=FAILED, ...)" and added a sentence calling out
   that unhandled exceptions are out of scope.

3. src/specify_cli/workflows/engine.py — already correct ("any step
   that returns FAILED" in the validator comment; "lets the pipeline
   route around the failure" in the execute path). No change.

Engine semantics and test bodies are unchanged. Docs-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 tests/test_workflows.py | 15 +++++++++------
 workflows/README.md     | 29 ++++++++++++++++++-----------
 2 files changed, 27 insertions(+), 17 deletions(-)
diff --git a/tests/test_workflows.py b/tests/test_workflows.py
index 0a5e61c2a..3ab8871de 100644
--- a/tests/test_workflows.py
+++ b/tests/test_workflows.py
@@ -2336,12 +2336,15 @@ steps:
 # ===== continue_on_error Tests =====
 #
 # Locks the contract documented in workflows/README.md "Error Handling"
-# section: when an executable step fails and `continue_on_error: true`
-# is declared, the engine records the step's `output` (with `exit_code`
-# and `stderr` from the failure) and its `status` (sibling key on
-# `steps.<id>`, not nested under `output`) and continues to the next
-# sibling step instead of halting the run. Gate aborts
-# (`output.aborted`) still halt regardless of the flag.
+# section: when a step returns `StepResult(status=FAILED, ...)` and
+# `continue_on_error: true` is declared, the engine records the step's
+# `output` (with `exit_code` and `stderr` from the failure) and its
+# `status` (sibling key on `steps.<id>`, not nested under `output`)
+# and continues to the next sibling step instead of halting the run.
+# Gate aborts (`output.aborted`) still halt regardless of the flag.
+# Unhandled exceptions raised out of `step_impl.execute()` are out of
+# scope for this flag — they propagate to `WorkflowEngine.execute()`
+# and abort the run.
 
 
 class TestContinueOnError:
diff --git a/workflows/README.md b/workflows/README.md
index 40c593fb6..b0b7050fe 100644
--- a/workflows/README.md
+++ b/workflows/README.md
@@ -221,17 +221,13 @@ Aggregate results from fan-out steps:
 
 ## Error Handling
 
-By default, any step that ends in `StepStatus.FAILED` at runtime halts
-the entire run — most commonly a `shell` or `command` step exiting
-non-zero, but also any other runtime error raised during step
-execution. (Invalid workflow definitions are rejected up-front by
-`specify workflow run` before the run even starts, so structural
-validation failures never reach this code path.) Set
-`continue_on_error: true` on a step to record its result and continue
-to the next sibling step instead. When the failure was a non-zero
-exit, the exit code remains available on
-`steps.<id>.output.exit_code` so downstream `if`, `switch`, or `gate`
-steps can branch on it:
+By default, any step that returns `StepResult(status=FAILED, ...)`
+at runtime halts the entire run — most commonly a `shell` or
+`command` step exiting non-zero. Set `continue_on_error: true` on
+a step to record its result and continue to the next sibling step
+instead. When the failure was a non-zero exit, the exit code
+remains available on `steps.<id>.output.exit_code` so downstream
+`if`, `switch`, or `gate` steps can branch on it:
 
 ```yaml
 - id: heavy-thing
@@ -263,10 +259,21 @@ step in your own loop.
 
 - The field must be a literal boolean (`true` / `false`); coerced
   strings like `"true"` are rejected at validation time.
+- **Scope: returned failures only.** The flag applies to step results
+  with `status=FAILED`. Unhandled exceptions raised out of a step's
+  `execute()` method are caught one level up by `WorkflowEngine.execute()`,
+  logged as `workflow_failed`, and abort the run regardless of
+  `continue_on_error`. If a step author wants the flag to cover an
+  exceptional path, the step must catch the exception internally and
+  return `StepResult(status=FAILED, ...)` with the failure encoded in
+  `output` (e.g. `exit_code`, `stderr`, or a custom field).
 - Gate aborts (`on_reject: abort` chosen by the operator) always halt
   the run — `continue_on_error` does not override them. The flag is
   for transient/expected step failures, not for overriding deliberate
   operator decisions.
+- Structural validation runs up-front: `specify workflow run` rejects
+  invalid workflow definitions before the run is created, so
+  validation failures never reach this code path.
 - When the flag is omitted, behaviour is byte-equivalent to before
   this feature.