fix(memory): skip duplicate fact entries#1857

Closed
Gujiassh wants to merge 1 commit into bytedance:main from Gujiassh:fix/memory-fact-dedup-clean

Conversation


Gujiassh (Contributor) commented Apr 4, 2026

Summary

  • skip appending duplicate fact content when applying memory updates
  • preserve existing fact removal and max-facts trimming behavior
  • add focused regression coverage for existing duplicates, same-batch duplicates, and max-facts trimming

End-user benefit

Without this fix, repeated memory updates can accumulate the same fact content over and over, which bloats the stored memory and degrades later memory injection quality. This change keeps the fact list concise while preserving the existing update flow.

Testing

  • backend/.venv/bin/python -m pytest backend/tests/test_memory_updater.py -q

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@WillemJiang (Collaborator) commented:

@Gujiassh, thanks for your contribution. Please resolve the conflicts from the main branch.


Copilot AI left a comment

Pull request overview

Fixes long-term memory bloat by preventing duplicate fact entries (based on fact content) from being appended during memory updates, while preserving existing removal and max-facts trimming behavior.

Changes:

  • Add content-based deduplication for facts during _apply_updates().
  • Introduce a small helper to canonicalize fact content for dedupe comparisons.
  • Add regression tests covering existing duplicates, same-batch duplicates, and interaction with max-facts trimming.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files reviewed:

  • backend/packages/harness/deerflow/agents/memory/updater.py: Adds a content-key helper and uses it to skip adding duplicate facts during memory update application.
  • backend/tests/test_memory_updater.py: Adds focused tests validating dedupe behavior and ensuring removals/trimming still work.

Comment on lines 356 to 365

```diff
 if confidence >= config.fact_confidence_threshold:
     content = fact.get("content", "")
     fact_key = _fact_content_key(content)
     if fact_key is not None and fact_key in existing_fact_keys:
         continue

     fact_entry = {
         "id": f"fact_{uuid.uuid4().hex[:8]}",
-        "content": fact.get("content", ""),
+        "content": content,
         "category": fact.get("category", "context"),
```

Copilot AI Apr 5, 2026

content is taken directly from fact.get("content") and stored back into memory without type/emptiness normalization. If the LLM returns a non-string (e.g., null/list/dict), later _strip_upload_mentions_from_memory() calls _UPLOAD_SENTENCE_RE.search() on that value and will raise a TypeError, aborting the memory update. Consider coercing to str or skipping facts whose content is not a non-empty string (e.g., strip() to empty) and using the normalized value both for dedupe and storage.

Copilot uses AI. Check for mistakes.
Comment on lines +126 to +130
with patch(
"deerflow.agents.memory.updater.get_memory_config",
return_value=_memory_config(max_facts=2, fact_confidence_threshold=0.7),
):
result = updater._apply_updates(current_memory, update_data, thread_id="thread-9")
Copy link

Copilot AI Apr 5, 2026

This test constructs a MemoryConfig with max_facts=2, but MemoryConfig.max_facts is constrained with ge=10 (see memory_config.py). The helper bypasses validation via setattr, so the test is exercising an impossible production configuration and may become brittle if assignment validation is enabled later. Suggest keeping max_facts within valid bounds (>=10) and adjusting the fixture data accordingly (e.g., generate >10 facts and assert trimming).
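The bound the review cites could look like the following sketch. The real MemoryConfig reportedly enforces `ge=10` via its validation layer; this stdlib dataclass only mimics that constraint, and everything here beyond the `max_facts >= 10` rule is illustrative.

```python
# Hedged sketch of a config that rejects out-of-bounds max_facts at
# construction time, so tests cannot exercise impossible configurations.
from dataclasses import dataclass


@dataclass
class MemoryConfig:
    max_facts: int = 50
    fact_confidence_threshold: float = 0.7

    def __post_init__(self):
        # Mirrors the ge=10 constraint described in the review comment.
        if self.max_facts < 10:
            raise ValueError("max_facts must be >= 10")


# A test fixture built this way stays within valid bounds instead of
# bypassing validation with setattr:
config = MemoryConfig(max_facts=10, fact_confidence_threshold=0.7)
```

A trimming test would then generate more than 10 facts and assert the list is cut to `max_facts`, rather than relying on an unreachable `max_facts=2`.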


Gujiassh (Contributor, Author) commented Apr 5, 2026

I rechecked this while resolving the main-branch conflicts and found that the duplicate-fact fix is already present on current main, including the content-key guard and the focused dedupe regressions. The same fix was already merged earlier as #1193, so rebasing this branch now would only replay behavior that is already upstream. Closing this one as superseded to avoid duplicate review noise.


Gujiassh (Contributor, Author) commented Apr 5, 2026

Closing as superseded by the already-merged main-branch fix and tests from #1193.

@Gujiassh Gujiassh closed this Apr 5, 2026

Gujiassh (Contributor, Author) commented Apr 5, 2026

Follow-up clarification: current main already contains the duplicate-fact guard in backend/packages/harness/deerflow/agents/memory/updater.py and the focused dedupe regressions in backend/tests/test_memory_updater.py. The equivalent fix was merged earlier via #1193, so this PR is being closed as superseded rather than rebased.


3 participants