feat(models): add vLLM provider support#1860

Open
NmanQAQ wants to merge 2 commits into bytedance:main from NmanQAQ:fix/vllm-qwen-thinking-toggle

Conversation


NmanQAQ commented Apr 4, 2026

Summary

This PR adds DeerFlow support for vLLM 0.19.0 OpenAI-compatible chat endpoints and fixes the Qwen reasoning toggle so flash mode can actually disable thinking.

What Changed

  • Add deerflow.models.vllm_provider:VllmChatModel to preserve vLLM's non-standard reasoning field across non-streaming responses, streaming deltas, and follow-up tool-call turns
    • backend/packages/harness/deerflow/models/vllm_provider.py
    • backend/tests/test_vllm_provider.py
  • Extend the model factory so vLLM thinking-disable logic handles both legacy chat_template_kwargs.thinking configs and the Qwen/vLLM 0.19.0 chat_template_kwargs.enable_thinking format
    • backend/packages/harness/deerflow/models/factory.py
    • backend/tests/test_model_factory.py
  • Document the Qwen-specific toggle, add a VLLM_API_KEY example, and update the sample vLLM config
    • .env.example
    • config.example.yaml
    • README.md
    • backend/CLAUDE.md
  • Refresh backend/uv.lock to reflect the dependency state introduced with the provider addition
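The reasoning-preservation behavior described above can be sketched as a small conversion helper. This is an illustrative standalone function, not the PR's actual code (which subclasses ChatOpenAI); the function name and dict shapes are hypothetical:

```python
def convert_vllm_message(raw: dict) -> dict:
    """Convert a vLLM chat message dict into a LangChain-style message dict,
    carrying the non-standard `reasoning` field into additional_kwargs so it
    is not dropped by the default OpenAI conversion path (hypothetical sketch)."""
    additional_kwargs = {}
    reasoning = raw.get("reasoning")
    # Use an explicit None check so falsy-but-present values (e.g. "") survive.
    if reasoning is not None:
        additional_kwargs["reasoning"] = reasoning
    return {
        "role": raw.get("role", "assistant"),
        "content": raw.get("content", ""),
        "additional_kwargs": additional_kwargs,
    }
```

On a follow-up tool-call turn, the preserved `additional_kwargs["reasoning"]` can then be echoed back to vLLM instead of being silently lost.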

User-Visible Effect

  • DeerFlow can be configured against vLLM 0.19.0 using a first-class provider instead of a generic OpenAI adapter
  • vLLM reasoning content is preserved across tool-call turns instead of being dropped by the default LangChain OpenAI conversion path
  • For Qwen-style vLLM models, switching DeerFlow to flash mode now turns thinking off reliably via extra_body.chat_template_kwargs.enable_thinking
  • Existing configs that still use chat_template_kwargs.thinking continue to work because requests are normalized before they are sent
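The normalization of legacy configs can be sketched as follows. This is a minimal illustration of the idea, assuming the key names stated in the PR description (`chat_template_kwargs.thinking` legacy vs. `chat_template_kwargs.enable_thinking` for Qwen/vLLM 0.19.0); the function name is hypothetical:

```python
def normalize_thinking_kwargs(extra_body: dict) -> dict:
    """Rewrite the legacy `thinking` key to the Qwen/vLLM `enable_thinking`
    key inside chat_template_kwargs, without mutating the input
    (illustrative sketch of the normalization, not the PR's code)."""
    body = dict(extra_body)
    kwargs = dict(body.get("chat_template_kwargs", {}))
    if "thinking" in kwargs:  # legacy config format
        kwargs["enable_thinking"] = kwargs.pop("thinking")
    body["chat_template_kwargs"] = kwargs
    return body
```

With this in place, a config that sets `chat_template_kwargs.thinking: false` would produce the same request body as one that sets `enable_thinking: false` directly.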

Validation

Backend:

    cd backend
    uvx ruff check .
    uvx ruff format --check .
    uv run --with pytest python -m pytest tests/test_vllm_provider.py tests/test_model_factory.py -q
    uv run pytest tests/test_client_live.py -v -s

Notes

  • tests/test_client_live.py completed with 17 passed, 1 skipped, 1 failed in the local Windows environment. The remaining failure is a UnicodeEncodeError when the test prints a Unicode symbol during TestLiveMultiToolChain::test_write_then_read; the underlying write/read tool flow completed successfully before the console-print failure.
  • A broader local Windows run of uv run pytest tests --ignore=tests/test_client_live.py --ignore=tests/test_create_deerflow_agent_live.py -q still reports pre-existing local-sandbox path separator and symlink-permission failures unrelated to this PR.


CLAassistant commented Apr 4, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ WillemJiang
❌ NmanQAQ


NmanQAQ seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

NmanQAQ force-pushed the fix/vllm-qwen-thinking-toggle branch from b54d5a7 to 88e2f3b (April 5, 2026 00:09)
WillemJiang (Collaborator) commented:

@NmanQAQ thanks for your contribution. Please click the CLA button to sign the CLA before we merge your PR.

Copilot AI (Contributor) left a comment


Pull request overview

Adds first-class vLLM (OpenAI-compatible) chat provider support, ensuring vLLM’s non-standard reasoning field is preserved across full responses, streaming deltas, and follow-up tool-call turns, and improves the “thinking disabled” normalization for Qwen/vLLM chat template kwargs.

Changes:

  • Introduces VllmChatModel provider to preserve vLLM reasoning across requests and responses.
  • Extends the model factory to disable thinking for vLLM/Qwen via chat_template_kwargs (supporting both legacy thinking and new enable_thinking).
  • Updates documentation and examples for vLLM setup; refreshes the backend lockfile and adds tests.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.

File summary:

  • README.md — Adds vLLM/Qwen configuration example and explains the vLLM reasoning/thinking toggle behavior.
  • config.example.yaml — Documents a vLLM 0.19.0 example configuration and bumps config_version.
  • backend/uv.lock — Updates locked dependencies for the new provider-related dependency state.
  • backend/tests/test_vllm_provider.py — Adds unit tests covering vLLM reasoning preservation and thinking kwarg normalization.
  • backend/tests/test_model_factory.py — Adds regression tests for vLLM thinking-disable behavior in both thinking and enable_thinking formats.
  • backend/packages/harness/deerflow/models/vllm_provider.py — Implements VllmChatModel (ChatOpenAI subclass) to preserve vLLM reasoning and normalize chat template kwargs.
  • backend/packages/harness/deerflow/models/factory.py — Adds deep-merge helper and vLLM-specific disable logic for chat_template_kwargs.
  • backend/CLAUDE.md — Documents the new vLLM provider and Qwen reasoning toggle behavior.
  • .env.example — Adds VLLM_API_KEY example entry.
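The deep-merge helper mentioned for factory.py is a common pattern for layering `extra_body` overrides onto existing config without clobbering nested dicts. A generic sketch (the actual helper's name and signature in the PR may differ):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`, returning a new dict and
    leaving both inputs unmodified (generic sketch of a deep-merge helper)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into nested dicts
        else:
            merged[key] = value  # scalar or type mismatch: override wins
    return merged
```

This lets the factory inject `{"chat_template_kwargs": {"enable_thinking": False}}` into a model's `extra_body` while preserving any sibling keys the user already configured there.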

Comment on lines +107 to +112
    if reasoning := _dict.get("reasoning"):
        additional_kwargs["reasoning"] = reasoning
        reasoning_text = _reasoning_to_text(reasoning)
        if reasoning_text:
            additional_kwargs["reasoning_content"] = reasoning_text

Copilot AI Apr 5, 2026


In _convert_delta_to_message_chunk_with_reasoning, the reasoning field is only preserved when it is truthy (if reasoning := _dict.get("reasoning")). If vLLM emits an empty string / empty list as a valid intermediate delta value, this will silently drop the reasoning key and can break the “echo prior reasoning” requirement on subsequent turns. Consider checking is not None instead of truthiness and then separately deciding whether to compute/set reasoning_content text.
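The suggested fix can be sketched as below. The helper name and the string-only text extraction are illustrative simplifications (the PR's `_reasoning_to_text` may handle richer shapes):

```python
def preserve_reasoning(delta: dict, additional_kwargs: dict) -> None:
    """Preserve the `reasoning` key even when its value is falsy (empty
    string or list), per the review suggestion: check `is not None`
    instead of walrus truthiness (sketch, not the PR's code)."""
    reasoning = delta.get("reasoning")
    if reasoning is not None:  # NOT `if reasoning := ...`, which drops "" / []
        additional_kwargs["reasoning"] = reasoning
        # Separately decide whether there is any text worth exposing.
        text = reasoning if isinstance(reasoning, str) else ""
        if text:
            additional_kwargs["reasoning_content"] = text
```

This keeps the key present on empty intermediate deltas, so the "echo prior reasoning" requirement still holds on subsequent turns.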

    # Bump this number when the config schema changes.
    # Run `make config-upgrade` to merge new fields into your local config.yaml.
    -config_version: 5
    +config_version: 6
Copilot AI Apr 5, 2026


config_version was bumped from 5 → 6, but the PR appears to only add/adjust examples and provider behavior (no config schema changes). Since config_version is used to warn users that their config is outdated, consider keeping it at 5 unless there’s a concrete schema migration that make config-upgrade needs to apply.

Suggested change:

    -config_version: 6
    +config_version: 5

Comment on lines +145 to +151
    - name: qwen3-32b-vllm
      display_name: Qwen3 32B (vLLM)
      use: deerflow.models.vllm_provider:VllmChatModel
      model: Qwen/Qwen3-32B
      api_key: dummy
      base_url: http://localhost:8000/v1
      supports_thinking: true
Copilot AI Apr 5, 2026


The vLLM example uses api_key: dummy, but the PR also introduces VLLM_API_KEY in .env.example and uses $VLLM_API_KEY in config.example.yaml. To avoid confusing users, consider switching this README snippet to api_key: $VLLM_API_KEY (and optionally note that some local vLLM deployments accept any non-empty key).
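Applying that suggestion, the README snippet would read as follows (assuming VLLM_API_KEY is defined as in .env.example; many local vLLM deployments accept any non-empty key):

```yaml
- name: qwen3-32b-vllm
  display_name: Qwen3 32B (vLLM)
  use: deerflow.models.vllm_provider:VllmChatModel
  model: Qwen/Qwen3-32B
  api_key: $VLLM_API_KEY
  base_url: http://localhost:8000/v1
  supports_thinking: true
```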
