feat(models): add vLLM provider support (#1860)
Conversation
NmanQAQ seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Force-pushed from b54d5a7 to 88e2f3b.
@NmanQAQ thanks for your contribution. Please click the CLA button to sign the CLA before we merge your PR.
Pull request overview
Adds first-class vLLM (OpenAI-compatible) chat provider support, ensuring vLLM’s non-standard reasoning field is preserved across full responses, streaming deltas, and follow-up tool-call turns, and improves the “thinking disabled” normalization for Qwen/vLLM chat template kwargs.
Changes:
- Introduces `VllmChatModel` provider to preserve vLLM `reasoning` across requests and responses.
- Extends the model factory to disable thinking for vLLM/Qwen via `chat_template_kwargs` (supporting both legacy `thinking` and new `enable_thinking`).
- Updates documentation and examples for vLLM setup; refreshes the backend lockfile and adds tests.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `README.md` | Adds vLLM/Qwen configuration example and explains the vLLM reasoning/thinking toggle behavior. |
| `config.example.yaml` | Documents a vLLM 0.19.0 example configuration and bumps `config_version`. |
| `backend/uv.lock` | Updates locked dependencies for the new provider-related dependency state. |
| `backend/tests/test_vllm_provider.py` | Adds unit tests covering vLLM reasoning preservation and thinking kwarg normalization. |
| `backend/tests/test_model_factory.py` | Adds regression tests for vLLM thinking-disable behavior in both `thinking` and `enable_thinking` formats. |
| `backend/packages/harness/deerflow/models/vllm_provider.py` | Implements `VllmChatModel` (`ChatOpenAI` subclass) to preserve vLLM reasoning and normalize chat template kwargs. |
| `backend/packages/harness/deerflow/models/factory.py` | Adds deep-merge helper and vLLM-specific disable logic for `chat_template_kwargs`. |
| `backend/CLAUDE.md` | Documents the new vLLM provider and Qwen reasoning toggle behavior. |
| `.env.example` | Adds `VLLM_API_KEY` example entry. |
```python
if reasoning := _dict.get("reasoning"):
    additional_kwargs["reasoning"] = reasoning
    reasoning_text = _reasoning_to_text(reasoning)
    if reasoning_text:
        additional_kwargs["reasoning_content"] = reasoning_text
```
In `_convert_delta_to_message_chunk_with_reasoning`, the `reasoning` field is only preserved when it is truthy (`if reasoning := _dict.get("reasoning")`). If vLLM emits an empty string or empty list as a valid intermediate delta value, this will silently drop the `reasoning` key and can break the "echo prior reasoning" requirement on subsequent turns. Consider checking `is not None` instead of truthiness, then separately deciding whether to compute and set the `reasoning_content` text.
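A minimal sketch of the suggested `is not None` check, using the names from the snippet above (`_reasoning_to_text` is stubbed here purely for illustration; the PR's actual helper may differ):

```python
def _reasoning_to_text(reasoning):
    # Illustrative stub: vLLM may emit reasoning as a plain string
    # or as a list of content parts.
    if isinstance(reasoning, str):
        return reasoning
    if isinstance(reasoning, list):
        return "".join(
            part.get("text", "") for part in reasoning if isinstance(part, dict)
        )
    return ""

def preserve_reasoning(_dict, additional_kwargs):
    # Preserve the key even for empty intermediate deltas ("" or []),
    # dropping it only when the field is genuinely absent.
    reasoning = _dict.get("reasoning")
    if reasoning is not None:
        additional_kwargs["reasoning"] = reasoning
        reasoning_text = _reasoning_to_text(reasoning)
        if reasoning_text:
            additional_kwargs["reasoning_content"] = reasoning_text
    return additional_kwargs
```

With this shape, an empty-string delta still round-trips the `reasoning` key while `reasoning_content` is only set when there is actual text.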
```diff
 # Bump this number when the config schema changes.
 # Run `make config-upgrade` to merge new fields into your local config.yaml.
-config_version: 5
+config_version: 6
```
`config_version` was bumped from 5 → 6, but the PR appears to only add/adjust examples and provider behavior (no config schema changes). Since `config_version` is used to warn users that their config is outdated, consider keeping it at 5 unless there is a concrete schema migration that `make config-upgrade` needs to apply.
Suggested change:
```diff
-config_version: 6
+config_version: 5
```
```yaml
- name: qwen3-32b-vllm
  display_name: Qwen3 32B (vLLM)
  use: deerflow.models.vllm_provider:VllmChatModel
  model: Qwen/Qwen3-32B
  api_key: dummy
  base_url: http://localhost:8000/v1
  supports_thinking: true
```
The vLLM example uses `api_key: dummy`, but the PR also introduces `VLLM_API_KEY` in `.env.example` and uses `$VLLM_API_KEY` in `config.example.yaml`. To avoid confusing users, consider switching this README snippet to `api_key: $VLLM_API_KEY` (and optionally note that some local vLLM deployments accept any non-empty key).
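A corrected README snippet along the lines the review suggests, assuming the config loader expands `$VLLM_API_KEY` from the environment as `config.example.yaml` implies:

```yaml
- name: qwen3-32b-vllm
  display_name: Qwen3 32B (vLLM)
  use: deerflow.models.vllm_provider:VllmChatModel
  model: Qwen/Qwen3-32B
  api_key: $VLLM_API_KEY  # local vLLM deployments typically accept any non-empty key
  base_url: http://localhost:8000/v1
  supports_thinking: true
```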
Summary
This PR adds DeerFlow support for vLLM 0.19.0 OpenAI-compatible chat endpoints and fixes the Qwen reasoning toggle so flash mode can actually disable thinking.
What Changed
- Add `deerflow.models.vllm_provider:VllmChatModel` to preserve vLLM's non-standard `reasoning` field across non-streaming responses, streaming deltas, and follow-up tool-call turns (`backend/packages/harness/deerflow/models/vllm_provider.py`, `backend/tests/test_vllm_provider.py`)
- Support both legacy `chat_template_kwargs.thinking` configs and the Qwen/vLLM 0.19.0 `chat_template_kwargs.enable_thinking` format (`backend/packages/harness/deerflow/models/factory.py`, `backend/tests/test_model_factory.py`)
- Document vLLM setup, add a `VLLM_API_KEY` example, and update the sample vLLM config (`.env.example`, `config.example.yaml`, `README.md`, `backend/CLAUDE.md`)
- Refresh `backend/uv.lock` for the new harness dependency state included with the provider addition

User-Visible Effect
- Flash mode can now actually disable thinking on vLLM/Qwen via `extra_body.chat_template_kwargs.enable_thinking`
- Legacy `chat_template_kwargs.thinking` configs continue to work because requests are normalized before they are sent

Validation
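The request normalization described above could be sketched as follows — a hedged illustration of the idea, not the PR's actual implementation (function and parameter names are assumptions):

```python
def normalize_thinking_kwargs(chat_template_kwargs: dict, disable: bool) -> dict:
    """Collapse legacy 'thinking' and vLLM 0.19.0 'enable_thinking' toggles.

    Both spellings are normalized onto 'enable_thinking', and the flash-mode
    disable flag overrides whatever the config requested.
    """
    kwargs = dict(chat_template_kwargs)
    # Accept the legacy key and translate it to the newer one.
    if "thinking" in kwargs:
        kwargs["enable_thinking"] = kwargs.pop("thinking")
    if disable:
        kwargs["enable_thinking"] = False
    return kwargs
```

This keeps old `thinking`-based configs working while emitting only the key vLLM 0.19.0 expects.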
Backend:
Notes
- `tests/test_client_live.py` completed with `17 passed, 1 skipped, 1 failed` in the local Windows environment. The remaining failure is a `UnicodeEncodeError` when the test prints a Unicode symbol during `TestLiveMultiToolChain::test_write_then_read`; the underlying write/read tool flow completed successfully before the console-print failure.
- `uv run pytest tests --ignore=tests/test_client_live.py --ignore=tests/test_create_deerflow_agent_live.py -q` still reports pre-existing local-sandbox path separator and symlink-permission failures unrelated to this PR.