Skip to content

fix: execute review: blocks on type: string step outputs (#350)#351

Merged
nhorton merged 4 commits intomainfrom
fix/string-output-reviews-issue-350
Apr 9, 2026
Merged

fix: execute review: blocks on type: string step outputs (#350)#351
nhorton merged 4 commits intomainfrom
fix/string-output-reviews-issue-350

Conversation

@nhorton
Copy link
Copy Markdown
Contributor

@nhorton nhorton commented Apr 8, 2026

Summary

  • Fixes Bug: string-type output review blocks are silently not executed #350: review: blocks on type: string step outputs were silently ignored because the review pipeline only matched ReviewRule patterns against file paths.
  • String outputs with review blocks now produce synthetic ReviewTask objects carrying the value on a new ReviewTask.inline_content field, rendered into the instruction file as a "Content to Review" section.
  • Pass-caching still works: the review_id content hash mixes in the inline value, so distinct values get distinct cache keys (cache invalidates on change, persists when unchanged).

Why this matters

The job_yml DeepSchema actively directs authors toward type: string for transient inter-step data (no-temporary-files-for-transient-data). Authors following that guidance and also declaring a review: block on the string output were getting a misconfigured-but-silent quality gate. This fix closes that loophole so narrative summaries, counts, and status reports declared on string outputs are actually reviewed.

What changed

  • ReviewTask.inline_content: str | None field
  • build_string_output_review_tasks() in quality_gate.py, wired into run_quality_gate
  • Shared _build_preamble() helper (deduplicates common_job_info + inputs logic between file and string paths)
  • compute_review_id / _paths_component / _content_hash honor inline_content
  • build_instruction_file omits "Files to Review" for inline tasks and emits "Content to Review"
  • format_for_claude task names read review of inline content instead of review of 0 files (follow-up commit)
  • New requirements: JOBS-REQ-004.8, REVIEW-REQ-005.1.8, REVIEW-REQ-009.1.7
  • 19 new tests (unit + end-to-end via run_quality_gate, plus formatter rendering)
  • Bespoke smoke-test job at .deepwork/jobs/test_job/ exercising all four review-level combinations (file_path + output-ref review, string + output-ref review, string + arg-level review, string + no review)

Test plan

  • uv run pytest — 1250 passed, 1 skipped, 1 xfailed
  • uv run ruff check on changed files — clean
  • uv run mypy on changed source files — clean
  • New end-to-end tests verify: string review produces instructions, value is inlined in the instruction file, pass-caching honors unchanged values, cache invalidates on value change
  • Manual smoke test via .deepwork/jobs/test_job/test_review_system workflow — produced 3 review tasks (file_path + 2 string outputs) plus the control string output correctly skipped. Verified "Content to Review" sections contain the actual string values and distinct review_id hashes for the two inline tasks.
  • /deepwork review on the branch — 23 review tasks all passed cleanly (python_code_review, test_file_quality, requirements_file DeepSchema, CHANGELOG DeepSchema, python_lint, requirements_traceability, etc.)

🤖 Generated with Claude Code

nhorton and others added 4 commits April 8, 2026 17:54
Previously, review: blocks declared on type: string outputs were silently
dropped because the review pipeline only matched ReviewRule patterns against
file paths. Authors got no warning and the review appeared correctly
configured. This was especially problematic given that the job_yml
DeepSchema directs authors toward type: string for transient inter-step data.

String outputs with review blocks now produce synthetic ReviewTask objects
with the value carried on a new ReviewTask.inline_content field. The value
is rendered into the instruction file as a "Content to Review" section and
mixed into the review_id content hash so each distinct value gets its own
pass-cache key (cache invalidates on value change, persists when unchanged).

Adds:
- ReviewTask.inline_content field
- build_string_output_review_tasks() in quality_gate, wired into run_quality_gate
- Shared _build_preamble() helper to deduplicate common_job_info+inputs logic
- Requirements: JOBS-REQ-004.8, REVIEW-REQ-005.1.8, REVIEW-REQ-009.1.7
- 18 new tests (unit + end-to-end via run_quality_gate)

Fixes #350

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iles")

Follow-up to the issue #350 string-output review fix. The formatter's
_task_name() used len(files_to_review) to describe task scope, which fell
through to the multi-file branch for inline-content tasks and produced
"review of 0 files" — cosmetic wart visible in the parallel-task listing.

Also adds a bespoke test_job (.deepwork/jobs/test_job/) that exercises the
full quality gate end-to-end with four outputs covering every review-level
combination: file_path + output-ref review, string + output-ref review
(NEW), string + arg-level review (NEW, inherited), and string + no review
(control). Used as a manual smoke test when verifying the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI runs `ruff format --check`, which my local `uv run ruff check` calls
did not exercise (check runs the linter, not the formatter). Two lines
I had manually broken up were re-joined by ruff format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…o-fix)

Previously `make lint` ran `ruff format` and `ruff check --fix` — both in
auto-fix mode — so it silently mutated files instead of reporting issues.
CI, on the other hand, runs `ruff format --check` and `ruff check` in
check-only mode, which is what actually fails a PR. That asymmetry meant a
clean local `make lint` gave no guarantee that CI would pass: running it
would just reformat files in-place, and the developer might miss the
uncommitted changes.

Splits into two targets:

  - `make lint` now mirrors CI (ruff format --check, ruff check, mypy).
    A clean run guarantees the CI Lint job will pass.
  - `make lint-fix` is the previous behavior (auto-fix formatter + linter)
    for convenient local cleanup.

Caught while working on #350 — my local `ruff check` calls did not run the
formatter at all, and even `make lint` would have only silently reformatted
the files, leaving a surprise CI failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nhorton nhorton added this pull request to the merge queue Apr 9, 2026
Merged via the queue into main with commit ce50d05 Apr 9, 2026
5 checks passed
@nhorton nhorton deleted the fix/string-output-reviews-issue-350 branch April 9, 2026 00:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: string-type output review blocks are silently not executed

1 participant