fix(memory): skip duplicate fact entries#1857
fix(memory): skip duplicate fact entries#1857Gujiassh wants to merge 1 commit intobytedance:mainfrom
Conversation
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
|
@Gujiassh, thanks for your contribution. Please resolve the conflicts from the main branch. |
There was a problem hiding this comment.
Pull request overview
Fixes long-term memory bloat by preventing duplicate fact entries (based on fact content) from being appended during memory updates, while preserving existing removal and max-facts trimming behavior.
Changes:
- Add content-based deduplication for facts during
_apply_updates(). - Introduce a small helper to canonicalize fact content for dedupe comparisons.
- Add regression tests covering existing duplicates, same-batch duplicates, and interaction with max-facts trimming.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
backend/packages/harness/deerflow/agents/memory/updater.py |
Adds content-key helper and uses it to skip adding duplicate facts during memory update application. |
backend/tests/test_memory_updater.py |
Adds focused tests validating dedupe behavior and ensuring removals/trimming still work. |
| if confidence >= config.fact_confidence_threshold: | ||
| content = fact.get("content", "") | ||
| fact_key = _fact_content_key(content) | ||
| if fact_key is not None and fact_key in existing_fact_keys: | ||
| continue | ||
|
|
||
| fact_entry = { | ||
| "id": f"fact_{uuid.uuid4().hex[:8]}", | ||
| "content": fact.get("content", ""), | ||
| "content": content, | ||
| "category": fact.get("category", "context"), |
There was a problem hiding this comment.
content is taken directly from fact.get("content") and stored back into memory without type/emptiness normalization. If the LLM returns a non-string (e.g., null/list/dict), later _strip_upload_mentions_from_memory() calls _UPLOAD_SENTENCE_RE.search() on that value and will raise a TypeError, aborting the memory update. Consider coercing to str or skipping facts whose content is not a non-empty string (e.g., strip() to empty) and using the normalized value both for dedupe and storage.
| with patch( | ||
| "deerflow.agents.memory.updater.get_memory_config", | ||
| return_value=_memory_config(max_facts=2, fact_confidence_threshold=0.7), | ||
| ): | ||
| result = updater._apply_updates(current_memory, update_data, thread_id="thread-9") |
There was a problem hiding this comment.
This test constructs a MemoryConfig with max_facts=2, but MemoryConfig.max_facts is constrained with ge=10 (see memory_config.py). The helper bypasses validation via setattr, so the test is exercising an impossible production configuration and may become brittle if assignment validation is enabled later. Suggest keeping max_facts within valid bounds (>=10) and adjusting the fixture data accordingly (e.g., generate >10 facts and assert trimming).
|
I rechecked this while resolving the main-branch conflicts and found the duplicate-fact fix is already present on current , including the content-key guard in \ and the focused dedupe regressions in . The same fix was already merged earlier as #1193, so rebasing this branch now would only replay behavior that is already upstream. Closing this one as superseded to avoid duplicate review noise. |
|
Closing as superseded by the already-merged main-branch fix and tests from #1193. |
|
Follow-up clarification: current main already contains the duplicate-fact guard in backend/packages/harness/deerflow/agents/memory/updater.py and the focused dedupe regressions in backend/tests/test_memory_updater.py. The equivalent fix was merged earlier via #1193, so this PR is being closed as superseded rather than rebased. |
Summary
End-user benefit
Without this fix, repeated memory updates can accumulate the same fact content over and over, which bloats the stored memory and degrades later memory injection quality. This change keeps the fact list concise while preserving the existing update flow.
Testing
backend/.venv/bin/python -m pytest backend/tests/test_memory_updater.py -q