fix: execute review: blocks on type: string step outputs (#350)#351
Merged
fix: execute review: blocks on type: string step outputs (#350)#351
Conversation
Previously, review: blocks declared on type: string outputs were silently dropped because the review pipeline only matched ReviewRule patterns against file paths. Authors got no warning and the review appeared correctly configured. This was especially problematic given that the job_yml DeepSchema directs authors toward type: string for transient inter-step data. String outputs with review blocks now produce synthetic ReviewTask objects with the value carried on a new ReviewTask.inline_content field. The value is rendered into the instruction file as a "Content to Review" section and mixed into the review_id content hash so each distinct value gets its own pass-cache key (cache invalidates on value change, persists when unchanged). Adds: - ReviewTask.inline_content field - build_string_output_review_tasks() in quality_gate, wired into run_quality_gate - Shared _build_preamble() helper to deduplicate common_job_info+inputs logic - Requirements: JOBS-REQ-004.8, REVIEW-REQ-005.1.8, REVIEW-REQ-009.1.7 - 18 new tests (unit + end-to-end via run_quality_gate) Fixes #350 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iles") Follow-up to the issue #350 string-output review fix. The formatter's _task_name() used len(files_to_review) to describe task scope, which fell through to the multi-file branch for inline-content tasks and produced "review of 0 files" — cosmetic wart visible in the parallel-task listing. Also adds a bespoke test_job (.deepwork/jobs/test_job/) that exercises the full quality gate end-to-end with four outputs covering every review-level combination: file_path + output-ref review, string + output-ref review (NEW), string + arg-level review (NEW, inherited), and string + no review (control). Used as a manual smoke test when verifying the fix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI runs `ruff format --check`, which my local `uv run ruff check` calls did not exercise (check runs the linter, not the formatter). Two lines I had manually broken up were re-joined by ruff format. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…o-fix)
Previously `make lint` ran `ruff format` and `ruff check --fix` — both in
auto-fix mode — so it silently mutated files instead of reporting issues.
CI, on the other hand, runs `ruff format --check` and `ruff check` in
check-only mode, which is what actually fails a PR. That asymmetry meant a
clean local `make lint` gave no guarantee that CI would pass: running it
would just reformat files in-place, and the developer might miss the
uncommitted changes.
Splits into two targets:
- `make lint` now mirrors CI (ruff format --check, ruff check, mypy).
A clean run guarantees the CI Lint job will pass.
- `make lint-fix` is the previous behavior (auto-fix formatter + linter)
for convenient local cleanup.
Caught while working on #350 — my local `ruff check` calls did not run the
formatter at all, and even `make lint` would have only silently reformatted
the files, leaving a surprise CI failure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
review:blocks ontype: stringstep outputs were silently ignored because the review pipeline only matchedReviewRulepatterns against file paths.ReviewTaskobjects carrying the value on a newReviewTask.inline_contentfield, rendered into the instruction file as a "Content to Review" section.Why this matters
The
job_ymlDeepSchema actively directs authors towardtype: stringfor transient inter-step data (no-temporary-files-for-transient-data). Authors following that guidance and also declaring areview:block on the string output were getting a misconfigured-but-silent quality gate. This fix closes that loophole so narrative summaries, counts, and status reports declared on string outputs are actually reviewed.What changed
ReviewTask.inline_content: str | Nonefieldbuild_string_output_review_tasks()inquality_gate.py, wired intorun_quality_gate_build_preamble()helper (deduplicatescommon_job_info+ inputs logic between file and string paths)compute_review_id/_paths_component/_content_hashhonorinline_contentbuild_instruction_fileomits "Files to Review" for inline tasks and emits "Content to Review"format_for_claudetask names readreview of inline contentinstead ofreview of 0 files(follow-up commit)JOBS-REQ-004.8,REVIEW-REQ-005.1.8,REVIEW-REQ-009.1.7run_quality_gate, plus formatter rendering).deepwork/jobs/test_job/exercising all four review-level combinations (file_path + output-ref review, string + output-ref review, string + arg-level review, string + no review)Test plan
uv run pytest— 1250 passed, 1 skipped, 1 xfaileduv run ruff checkon changed files — cleanuv run mypyon changed source files — clean.deepwork/jobs/test_job/test_review_systemworkflow — produced 3 review tasks (file_path + 2 string outputs) plus the control string output correctly skipped. Verified "Content to Review" sections contain the actual string values and distinct review_id hashes for the two inline tasks./deepwork reviewon the branch — 23 review tasks all passed cleanly (python_code_review, test_file_quality, requirements_file DeepSchema, CHANGELOG DeepSchema, python_lint, requirements_traceability, etc.)🤖 Generated with Claude Code