
feat(client): add create_label/create_edge_type with SchemaMode#36

Open
polaz wants to merge 67 commits into main from feat/#29-schema-api-create-label

Conversation

@polaz
Member

@polaz polaz commented Apr 13, 2026

Summary

  • Add `create_label(name, properties, *, schema_mode)` and `create_edge_type(name, properties)` to `AsyncCoordinodeClient` + sync wrappers
  • Add `create_text_index()` / `drop_text_index()` methods with `TextIndexInfo` result type; validate Cypher identifiers before DDL interpolation
  • Add `text_search()` and `hybrid_text_vector_search()` via `TextService` gRPC; clarify that a `CREATE TEXT INDEX` DDL is required before searching
  • Update `proto` submodule: `SchemaMode` enum, `ComputedPropertyDefinition`, `TextService`, `DecommissionNode` RPC
  • Bump `coordinode-rs` submodule to v0.3.14 (5fc543b); regenerate proto stubs; pin Colab notebook install to matching coordinode-python commit
  • Expose `schema_mode: int` on `LabelInfo` and `EdgeTypeInfo`
  • Rewrite all 4 demo notebooks for Google Colab with `IN_COLAB` guard + pinned `coordinode-embedded` install
  • Add `client=` parameter to `CoordinodeGraph` and `CoordinodePropertyGraphStore`; `callable(getattr())` guards for optional methods
  • Update `refresh_schema()` in `CoordinodeGraph` to use structured `get_labels()` / `get_edge_types()` API; fall back to `_parse_schema()`
  • Remove MERGE split workarounds from `graph.py` and `base.py` — `SET r += {}` supported since v0.3.12
  • Extract `_validate_property_dict` helper to reduce `_build_property_definitions` cognitive complexity below SonarCloud threshold
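
The string-to-enum normalization behind `schema_mode` could look roughly like this; `_MODE_MAP` and the exact enum values are illustrative assumptions, not the SDK's actual source:

```python
# Hypothetical sketch of schema_mode normalization. The real SDK maps the
# keyword onto the proto SchemaMode enum; names and values here are assumed.
_MODE_MAP = {"strict": 0, "validated": 1, "flexible": 2}

def normalize_schema_mode(schema_mode: str = "strict") -> int:
    """Map a schema_mode keyword to an enum value, rejecting unknown inputs."""
    try:
        return _MODE_MAP[schema_mode]
    except KeyError:
        raise ValueError(
            f"invalid schema_mode {schema_mode!r}; expected one of {sorted(_MODE_MAP)}"
        ) from None
```

An unrecognized mode raises `ValueError` rather than silently defaulting, matching the validation approach the PR describes.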

Changes

| File | Change |
| --- | --- |
| `coordinode/coordinode/client.py` | `create_label`, `create_edge_type`, `create_text_index`, `drop_text_index`, `text_search`, `hybrid_text_vector_search`, `TextIndexInfo` type; `_validate_property_dict` helper; `_validate_cypher_identifier` for DDL safety; docstring: FTS requires explicit `CREATE TEXT INDEX` DDL |
| `langchain-coordinode/langchain_coordinode/graph.py` | `client=` param, `callable(getattr())` guards, structured schema API, `_PROPERTY_TYPE_NAME` mapping, fallback to `_parse_schema()`; remove MERGE split workaround |
| `llama-index-coordinode/.../base.py` | `client=` param, `callable(getattr())` guards; remove MERGE split in `upsert_relations()` |
| `proto` | Submodule → 381dbcd (SchemaMode, ComputedPropertyDefinition, TextService, DecommissionNode); stubs regenerated |
| `coordinode-rs` | Submodule → 5fc543b (v0.3.12-19-g5fc543b; adds percentileCont/percentileDisc query functions) |
| `demo/notebooks/` | 4 Colab-ready notebooks pinned to `@bd8d0dc`; embedded engine bundling coordinode-rs v0.3.14 |
| `demo/README.md` | Open in Colab badge table; port 37084 metrics/health entry; embedded version note v0.3.14 |
| `demo/Dockerfile.jupyter` | New file — builds Jupyter Lab image with SDK installed |
| `demo/docker-compose.yml` | Image → `coordinode:0.3.14`; HTTP `/health` readiness probe; Jupyter on `127.0.0.1:38888` |
| `docker-compose.yml` | Image → `coordinode:0.3.14` |
| `ruff.toml` | Exclude submodule + .ipynb_checkpoints; per-file-ignores for `**/*.ipynb` |
| `.gitignore` | `**/.ipynb_checkpoints/` |
| `tests/unit/test_schema_crud.py` | +8 tests: property validation, bool enforcement, schema_mode normalization |
| `tests/integration/test_sdk.py` | +12 tests: round-trips, FTS (with explicit index creation/cleanup), schema_mode, Cypher identifier validation; DETACH DELETE in cleanup (replaces two-step to avoid duplicate-edge error from undirected MATCH) |
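
A minimal sketch of the Cypher-identifier guard mentioned above; the actual regex and error text in `client.py` may differ:

```python
import re

# Whitelist check applied before interpolating a name into DDL such as
# CREATE TEXT INDEX; a hypothetical stand-in for _validate_cypher_identifier.
_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def validate_cypher_identifier(name: str) -> str:
    """Reject names that could inject Cypher when spliced into a DDL string."""
    if not _IDENT_RE.fullmatch(name):
        raise ValueError(f"invalid Cypher identifier: {name!r}")
    return name
```

Parameters (`$props`-style values) never need this guard; it exists only for positions Cypher cannot parameterize, such as label and index names in DDL.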

Test plan

  • Unit tests: `uv run pytest tests/unit/` — 54 passed, 0 failed
  • Integration tests: `COORDINODE_ADDR=localhost:7082 uv run pytest tests/integration/` against `ghcr.io/structured-world/coordinode:0.3.14` — 41 passed, 0 failed

Closes #29

…te proto

- Update proto submodule to 381dbcd: adds SchemaMode enum (STRICT/VALIDATED/
  FLEXIBLE), ComputedPropertyDefinition, schema_mode field on CreateLabelRequest
  and Label messages
- Add AsyncCoordinodeClient.create_label(name, properties, *, schema_mode) → LabelInfo
  with full _type_map / _mode_map conversion; schema_mode defaults to 'strict'
- Add AsyncCoordinodeClient.create_edge_type(name, properties) → EdgeTypeInfo
  (proto does not expose schema_mode for edge types)
- Add sync wrappers CoordinodeClient.create_label / create_edge_type
- Expose schema_mode: int on LabelInfo and EdgeTypeInfo (getattr-safe: defaults
  to 0 for older server images that omit the field)
- Revert refresh_schema() labels(a)[0] shortcut to labels(a) AS src_labels +
  Python _first_label() helper; published Docker image does not yet support
  subscript-on-function syntax (TODO comment added)
- Add 11 unit tests (test_schema_crud.py): schema_mode propagation on LabelInfo
  and EdgeTypeInfo, missing-field default, repr coverage
- Add 6 integration tests (test_sdk.py): create_label / create_edge_type round-
  trips, schema_mode=flexible acceptance, invalid schema_mode ValueError,
  appears-in-list verification (with documented workaround for G009)
- Mark test_traverse_inbound as xfail: INBOUND direction not supported by server

Known gaps recorded in GAPS.md:
  G008 - SchemaMode enforcement not active in published image
  G009 - ListLabels/ListEdgeTypes omits schema-only objects (no nodes yet)
  G010 - labels(n)[0] subscript not in published image (workaround: _first_label)
  G011 - traverse INBOUND direction unimplemented on server
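
The G010 workaround could be implemented along these lines; `first_label` is a hypothetical stand-in for the `_first_label()` helper:

```python
def first_label(labels: list[str]) -> str:
    """Pick one label deterministically from labels(n) results.

    Hypothetical sketch: the published image lacks labels(n)[0] subscript
    support, so queries return labels(n) AS src_labels and Python chooses a
    label; min() keeps the choice stable regardless of server-side ordering.
    """
    if not labels:
        return ""
    return min(labels)
```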
@coderabbitai

coderabbitai bot commented Apr 13, 2026

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Full-text search capabilities with configurable text indexes and fuzzy matching
    • Hybrid text-vector search combining both modalities
    • Schema management APIs for creating and managing labels and edge types with flexible/validated schema modes
    • Client dependency injection support for custom client integration
    • Comprehensive demo notebooks for LlamaIndex, LangChain, and LangGraph frameworks
  • Improvements

    • Enhanced vector search graceful fallback when unsupported
    • Improved client lifecycle management
  • Documentation

    • Added demo notebooks and setup guide for all supported frameworks

Walkthrough

Adds full-text/hybrid search and schema-registration APIs and types, Cypher identifier validation and property-definition validation, injectable client support for LangChain/LlamaIndex adapters, tests and demo notebooks/docker, dependency pins, and updated proto/submodule refs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Core client & public types**<br>`coordinode/coordinode/client.py`, `coordinode/coordinode/__init__.py` | Adds text gRPC stub, Cypher identifier validation, property-definition builders/validators, new types TextResult/HybridResult/TextIndexInfo, extends LabelInfo/EdgeTypeInfo with schema_mode, async schema APIs (create_label, create_edge_type) and full-text APIs (create_text_index, drop_text_index, text_search, hybrid_text_vector_search) plus sync wrappers. |
| **LangChain adapter**<br>`langchain-coordinode/langchain_coordinode/graph.py` | Adds injectable client with _owns_client semantics; refresh_schema() prefers structured get_labels/get_edge_types() with get_schema_text() fallback, tighter parsing and property-type mapping; similarity_search() now guards missing vector_search. |
| **LlamaIndex adapter**<br>`llama-index-coordinode/llama_index/graph_stores/coordinode/base.py` | Accepts injected client, derives supports_vector_queries from client capabilities, tolerates missing get_schema_text/vector_search, adjusts vector_query()/get_schema() behavior, and unifies relation upsert to always SET r += $props. |
| **Tests**<br>`tests/integration/test_sdk.py`, `tests/unit/test_schema_crud.py` | Adds integration tests for schema and full-text/hybrid search (including gRPC xfail handling) and unit tests for schema_mode, _build_property_definitions() validation, and schema-mode normalization; imports updated for new types. |
| **Demos & notebooks**<br>`demo/README.md`, `demo/notebooks/*`, `demo/docker-compose.yml` | Adds demo README, four notebooks (seed, LlamaIndex, LangChain, LangGraph agent) with embedded-client adapters and cleanup; adds demo Docker Compose for Coordinode + Jupyter. |
| **Docker & compose**<br>`docker-compose.yml`, `demo/docker-compose.yml` | Pins Coordinode image to ghcr.io/structured-world/coordinode:0.3.14; adds demo compose with coordinode and jupyter services and healthchecks. |
| **Config, stubs & misc**<br>`ruff.toml`, `.gitignore`, `coordinode-embedded/.../_coordinode_embedded.pyi` | Ignores Jupyter checkpoints in git and ruff; notebook-specific lint ignores; fixes `LocalClient.__enter__` return annotation in stub. |
| **Dependency & submodule updates**<br>`langchain-coordinode/pyproject.toml`, `llama-index-coordinode/pyproject.toml`, `proto`, `coordinode-rs` | Bumps coordinode min version to >=0.6.0; updates proto and Rust submodule commit refs. |
| **Public exports**<br>`coordinode/coordinode/__init__.py` | Re-exports new public types: TextResult, HybridResult, TextIndexInfo. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant Client as AsyncCoordinodeClient
    participant SchemaStub as SchemaServiceStub
    participant TextStub as TextServiceStub
    participant Server as CoordiNode Server

    rect rgba(200,150,100,0.5)
    Note over App,Server: Schema Registration Flow
    App->>Client: create_label(name, properties, schema_mode)
    Client->>Client: _validate_property_dict / _build_property_definitions
    Client->>SchemaStub: CreateLabel(proto)
    SchemaStub->>Server: RPC CreateLabel
    Server-->>SchemaStub: LabelInfo proto
    SchemaStub-->>Client: proto.LabelInfo
    Client-->>App: LabelInfo(name, schema_mode)
    end

    rect rgba(100,200,150,0.5)
    Note over App,Server: Text Search Flow
    App->>Client: text_search(query, label)
    Client->>TextStub: TextSearch(proto)
    TextStub->>Server: RPC TextSearch
    Server-->>TextStub: TextSearch response
    TextStub-->>Client: proto results
    Client-->>App: list[TextResult]
    end

    rect rgba(150,100,200,0.5)
    Note over App,Server: Hybrid Search Flow
    App->>Client: hybrid_text_vector_search(text, vector, label)
    Client->>Client: normalize vector -> floats
    Client->>TextStub: HybridTextVectorSearch(proto)
    TextStub->>Server: RPC HybridTextVectorSearch
    Server-->>TextStub: Hybrid response
    TextStub-->>Client: proto results
    Client-->>App: list[HybridResult]
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ⚠️ Warning | The PR substantially addresses #29 objectives: create_label/create_edge_type with property validation and schema_mode are implemented; text-search (create_text_index, drop_text_index, text_search, hybrid_text_vector_search) is added; LabelInfo/EdgeTypeInfo expose schema_mode; dependency injection (client parameter) supports optional schema APIs. However, traverse() and TraverseResult are not implemented in this PR. | Implement traverse(start_node_id, edge_type, direction, max_depth) → TraverseResult to fully close #29. Alternatively, clarify in the issue that traverse is deferred to a follow-up PR. |
| Out of Scope Changes check | ⚠️ Warning | The PR includes substantial scope expansion beyond #29: full-text search (create_text_index, drop_text_index, text_search, hybrid_text_vector_search, TextIndexInfo, TextIndexResult, HybridResult), Cypher identifier validation, dependency injection for CoordinodeGraph/CoordinodePropertyGraphStore, structured schema API integration, four demo notebooks, multiple refactoring/workaround removals, and Docker/CI updates. | Clarify PR scope in the title or description, or defer full-text search, dependency injection, and demo notebooks to separate PRs focused on those features alone. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 54.37%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'feat(client): add create_label/create_edge_type with SchemaMode' directly describes the main feature addition—schema creation APIs with SchemaMode support—which is a central part of the changeset. |
| Description check | ✅ Passed | The PR description comprehensively explains the changes: schema-creation APIs, text-search support, dependency injection, structured schema retrieval, and demo/test updates, all related to the changeset. |




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@coordinode/coordinode/client.py`:
- Around line 418-425: The current loop building PropertyDefinition uses
_type_map.get(..., PropertyType.PROPERTY_TYPE_STRING) which silently coerces
unknown types to STRING; change this to validate the resolved type by checking
if type_str exists in _type_map and if not raise a ValueError (include the
unknown type_str and p["name"] in the message) before constructing
PropertyDefinition, using _type_map[type_str] when present; apply the same
validation and error behavior to the other occurrence that builds
PropertyDefinition (the block around the second instance noted in the comment).
- Around line 398-408: Extract the duplicated _type_map into a single shared
constant or helper (e.g., PROPERTY_TYPE_MAP or get_property_type) so both create
methods reference that single source; remove the local _type_map dicts and
import/use the shared symbol instead; ensure the mapping keys and values remain
the same (mapping to PropertyType.PROPERTY_TYPE_*) and update both places where
_type_map was used so behavior is unchanged.

In `@langchain-coordinode/langchain_coordinode/graph.py`:
- Around line 270-273: The docstring/comment claims CoordiNode nodes have
exactly one label but the API (create_node(labels: list[str]) and
NodeResult.labels: list[str]) allows multiple labels; clarify intent and enforce
or document accordingly: either enforce the single-label constraint by
validating in create_node (raise on len(labels) != 1 and store/return a
single-label invariant) and update uses that call labels()[0] (or keep min()
deterministic) OR change the comment to state "nodes typically have one label in
this application; code uses min() to deterministically pick one when multiple
labels exist" and leave create_node/NodeResult.labels as list[str]; update the
docstring in graph.py near labels() and any comment referencing min() to match
the chosen approach.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8f002175-5aaf-4cab-8d74-37209a075a63

📥 Commits

Reviewing files that changed from the base of the PR and between d477417 and b550a03.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • coordinode/coordinode/client.py
  • langchain-coordinode/langchain_coordinode/graph.py
  • proto
  • tests/integration/test_sdk.py
  • tests/unit/test_schema_crud.py

polaz added 2 commits April 13, 2026 18:18
…upport

- Add client= parameter to CoordinodeGraph.__init__ and
  CoordinodePropertyGraphStore.__init__: accepts any object with .cypher()
  instead of creating a gRPC CoordinodeClient from addr. Enables passing
  LocalClient from coordinode-embedded.
- Rewrite all 4 demo notebooks for Google Colab:
  * 00_seed_data.ipynb — seeds tech-industry knowledge graph; Colab uses
    LocalClient(":memory:") directly (same .cypher() API)
  * 01_llama_index_property_graph.ipynb — PropertyGraphStore via
    _EmbeddedAdapter wrapping LocalClient; get_schema_text() via Cypher fallback
  * 02_langchain_graph_chain.ipynb — CoordinodeGraph with client= adapter;
    OpenAI GraphCypherQAChain section is optional (auto-skips without key)
  * 03_langgraph_agent.ipynb — LocalClient direct; LangGraph agent with
    mock demo first (no LLM key needed), LLM section optional
- Each notebook: IN_COLAB detection, installs coordinode-embedded from GitHub
  source (Rust build, ~5 min first run), Open in Colab badge in first cell
- Add demo/README.md with Colab badge table for all 4 notebooks

Colab links:
  https://colab.research.google.com/github/structured-world/coordinode-python/blob/main/demo/notebooks/00_seed_data.ipynb
  https://colab.research.google.com/github/structured-world/coordinode-python/blob/main/demo/notebooks/01_llama_index_property_graph.ipynb
  https://colab.research.google.com/github/structured-world/coordinode-python/blob/main/demo/notebooks/02_langchain_graph_chain.ipynb
  https://colab.research.google.com/github/structured-world/coordinode-python/blob/main/demo/notebooks/03_langgraph_agent.ipynb
…pe validation

- Extract shared _build_property_definitions() static method: both
  create_label and create_edge_type now call it, eliminating duplicated
  _type_map and property-building loop (addresses CodeRabbit #1)
- Raise ValueError for unknown property type in both methods instead of
  silently coercing to STRING; error includes property name and valid
  options (addresses CodeRabbit #2, already in previous commit)
- Clarify _first_label() docstring: nodes "typically have one label in
  this application" rather than asserting system-level single-label
  constraint (addresses CodeRabbit #3)
@polaz
Member Author

polaz commented Apr 13, 2026

CodeRabbit review addressed (3 threads):

#1 (shared _type_map): Extracted into _build_property_definitions() static method — both create_label and create_edge_type share it. Single source of truth for type validation.

#2 (unknown type coercion): Now raises ValueError with property name + valid options list instead of silently coercing to STRING. Applied via the shared helper.

#3 (_first_label docstring): Updated to say "nodes typically have one label in this application" rather than asserting a system-level constraint. The min() selection is documented as deterministic fallback when multiple labels exist.
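
The shared helper described in threads #1 and #2 might be sketched like this; the `_TYPE_MAP` keys and the tuple return shape are assumptions, since the real helper builds `PropertyDefinition` protos:

```python
# Hypothetical sketch of _build_property_definitions: one shared type map,
# and a ValueError for unknown types instead of silent coercion to STRING.
_TYPE_MAP = {"string": 0, "int": 1, "float": 2, "bool": 3}

def build_property_definitions(properties: dict[str, str]) -> list[tuple[str, int]]:
    """Validate each property's type and return (name, enum) pairs."""
    defs = []
    for name, type_str in properties.items():
        if type_str not in _TYPE_MAP:
            raise ValueError(
                f"unknown property type {type_str!r} for property {name!r}; "
                f"valid options: {sorted(_TYPE_MAP)}"
            )
        defs.append((name, _TYPE_MAP[type_str]))
    return defs
```

Both `create_label` and `create_edge_type` would call this one function, so the type map has a single source of truth.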


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@demo/notebooks/01_llama_index_property_graph.ipynb`:
- Line 128: The adapter's no-op close() prevents LocalClient resources from
being released; update the adapter's def close(self): pass to delegate to the
embedded client by calling the LocalClient.close() (or the stored client
instance's close method) so that store.close() triggers proper cleanup of
temporary dirs and other resources; locate the adapter class with the close
method and replace the no-op with a call to the underlying client's close()
(e.g., self._client.close() or whichever attribute holds the LocalClient
instance).

In `@demo/notebooks/02_langchain_graph_chain.ipynb`:
- Line 129: The adapter's close() method currently no-ops and must forward to
the embedded LocalClient to avoid leaking resources; update the adapter class
(the method named close) to call the wrapped LocalClient instance's close() (the
same object created in Colab mode and stored on the adapter) and propagate any
exceptions or suppress them consistently with existing shutdown semantics so
that graph.close() triggers LocalClient.close() and cleans up DB resources and
temp dirs.

In `@demo/notebooks/03_langgraph_agent.ipynb`:
- Around line 192-198: In find_related(entity_name: str, depth: int = 1) -> str,
depth is only clamped with min(depth, 3) which allows 0 or negative values and
leads to an invalid '*1..N' range; sanitize depth by computing a bounded_depth =
max(1, min(depth, 3)) (or equivalent) before building the Cypher fragment and
use bounded_depth in the string passed to client.cypher so the relationship
range is always between 1 and 3.
- Around line 173-181: The save_fact function currently interpolates the
tool-provided relation into the Cypher identifier (in client.cypher), allowing
injection; before building the query in save_fact, validate and sanitize
relation: normalize to upper(), replace spaces with underscores, then enforce a
strict whitelist regex (e.g., only [A-Z0-9_]+) and reject or map any value that
fails (raise an exception or use a safe fallback like DEFAULT_RELATION). Use the
sanitized token when concatenating into the Cypher relationship name and keep
all other values as params (subject, obj, SESSION) passed to client.cypher to
avoid further injection risks.

In `@langchain-coordinode/langchain_coordinode/graph.py`:
- Around line 48-54: The graph wrapper currently always closes self._client in
close(), which will close caller-owned clients passed via the client parameter;
change the constructor (where client: Any = None and self._client = client if
client is not None else CoordinodeClient(...)) to record ownership (e.g.
self._owns_client = client is None) and update the close() method to only call
close on self._client when self._owns_client is True, leaving externally
provided clients open.

In `@llama-index-coordinode/llama_index/graph_stores/coordinode/base.py`:
- Around line 71-75: The constructor currently takes a `client` parameter and
always assigns it to `self._client`, but `close()` unconditionally calls
`self._client.close()`, which can close externally managed clients; modify the
initializer (the function that sets `self._client`) to record ownership (e.g.,
set `self._owns_client = True` when you create `CoordinodeClient(addr,
timeout=timeout)` and `False` when a client is injected), and update the
`close()` method to only call `self._client.close()` if `self._owns_client` is
True; reference the initializer where `self._client` is set and the `close()`
method to implement this ownership check.
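
The ownership rule requested in both comments can be sketched as follows; the wrapper class and the stub client are illustrative, not the adapters' real API:

```python
class GraphWrapper:
    """Sketch of the _owns_client pattern: only close clients we created."""

    def __init__(self, addr: str = "localhost:7082", client=None):
        # Ownership is decided once, at construction time.
        self._owns_client = client is None
        self._client = client if client is not None else self._make_client(addr)

    @staticmethod
    def _make_client(addr):
        class _StubClient:  # stand-in for CoordinodeClient(addr)
            closed = False

            def close(self):
                self.closed = True

        return _StubClient()

    def close(self):
        # Injected clients stay open; their caller manages their lifecycle.
        if self._owns_client:
            self._client.close()
```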

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ca1cd5d3-f7f2-4ed9-994a-740d5bd812a8

📥 Commits

Reviewing files that changed from the base of the PR and between b550a03 and e9805bc.

📒 Files selected for processing (8)
  • coordinode/coordinode/client.py
  • demo/README.md
  • demo/notebooks/00_seed_data.ipynb
  • demo/notebooks/01_llama_index_property_graph.ipynb
  • demo/notebooks/02_langchain_graph_chain.ipynb
  • demo/notebooks/03_langgraph_agent.ipynb
  • langchain-coordinode/langchain_coordinode/graph.py
  • llama-index-coordinode/llama_index/graph_stores/coordinode/base.py

- ruff.toml: exclude coordinode-rs/ (submodule) and **/.ipynb_checkpoints/
  from lint+format; add per-file-ignores for *.ipynb (E401/E402/I001 — normal
  notebook patterns: multi-import on one line, import after pip install cell)
- Auto-format demo notebooks and coordinode-embedded .pyi stub
- Fix UP037 in _coordinode_embedded.pyi: remove quotes from type annotation
- Add .ipynb_checkpoints/ to .gitignore

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
demo/notebooks/02_langchain_graph_chain.ipynb (1)

132-133: ⚠️ Potential issue | 🟡 Minor

Delegate close() to the underlying LocalClient to release resources.

Same issue as in notebook 01 — the adapter's close() is a no-op while LocalClient.close() should be called to release database resources.

Suggested fix
```diff
     def close(self):
-        pass
+        self._lc.close()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/02_langchain_graph_chain.ipynb` around lines 132 - 133, The
adapter's close() method is currently a no-op; change it to delegate to the
underlying LocalClient to release resources by calling the client's close method
(e.g., invoke LocalClient.close() on the adapter's stored client instance from
within close()). Locate the adapter class with def close(self): pass and replace
the pass with a call to the underlying client (the attribute holding the
LocalClient instance) so the adapter's close() forwards cleanup to
LocalClient.close().
demo/notebooks/01_llama_index_property_graph.ipynb (1)

131-132: ⚠️ Potential issue | 🟡 Minor

Delegate close() to the underlying LocalClient to release resources.

The adapter's close() is a no-op, but LocalClient.close() releases database resources. When store.close() is called at the end of the notebook, resources remain uncleaned.

Suggested fix
```diff
     def close(self):
-        pass
+        self._lc.close()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/01_llama_index_property_graph.ipynb` around lines 131 - 132,
The adapter's close() currently does nothing; update its close() implementation
to call and await (if async) the underlying LocalClient.close() so resources are
released when store.close() is invoked; locate the adapter class that defines
def close(self): pass and replace it to delegate to the LocalClient instance
(e.g., self._local_client.close() or await self._local_client.close()) ensuring
correct sync/async semantics to mirror LocalClient.close().
demo/notebooks/03_langgraph_agent.ipynb (2)

203-212: ⚠️ Potential issue | 🟡 Minor

Clamp depth to a valid lower bound.

min(depth, 3) allows 0 or negative values, producing an invalid *1..0 or *1..-1 range that will fail.

Suggested fix
```diff
 @tool
 def find_related(entity_name: str, depth: int = 1) -> str:
     """Find all entities reachable from entity_name within the given number of hops (max 3)."""
+    safe_depth = max(1, min(depth, 3))
     rows = client.cypher(
-        "MATCH (n:Entity {name: $name, session: $sess})-[r*1.." + str(min(depth, 3)) + "]->(m) "
+        "MATCH (n:Entity {name: $name, session: $sess})-[r*1.." + str(safe_depth) + "]->(m) "
         "RETURN m.name AS related, type(last(r)) AS via LIMIT 20",
         params={"name": entity_name, "sess": SESSION},
     )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/03_langgraph_agent.ipynb` around lines 203 - 212, The function
find_related constructs a Cypher range using min(depth, 3) which can be zero or
negative; clamp depth to a valid lower bound by computing hops = max(1,
min(depth, 3)) (or equivalent) and use hops when building the pattern (e.g.,
"[r*1..{hops}]") so the range is always at least 1 and no larger than 3; update
the client.cypher call in find_related to use this hops variable when
concatenating the query string.

182-191: ⚠️ Potential issue | 🟠 Major

Sanitize relationship type before interpolating into Cypher.

The relation parameter from tool input is concatenated directly into the Cypher query. A crafted value could inject malicious Cypher (e.g., RELATED]->(x) DELETE (a) //).

Suggested fix
```diff
+import re
+
 @tool
 def save_fact(subject: str, relation: str, obj: str) -> str:
     """Save a fact (subject → relation → object) into the knowledge graph.
     Example: save_fact('Alice', 'WORKS_AT', 'Acme Corp')"""
+    rel_type = relation.upper().replace(" ", "_")
+    if not re.fullmatch(r'[A-Z_][A-Z0-9_]*', rel_type):
+        return f"Invalid relation type: {relation}"
     client.cypher(
         "MERGE (a:Entity {name: $s, session: $sess}) "
         "MERGE (b:Entity {name: $o, session: $sess}) "
-        "MERGE (a)-[r:" + relation.upper().replace(" ", "_") + "]->(b)",
+        f"MERGE (a)-[r:{rel_type}]->(b)",
         params={"s": subject, "o": obj, "sess": SESSION},
     )
     return f"Saved: {subject} -[{relation}]-> {obj}"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/03_langgraph_agent.ipynb` around lines 182 - 191, The
save_fact function concatenates the tool-provided relation directly into the
Cypher string (see save_fact, relation, client.cypher, SESSION), which allows
Cypher injection; sanitize and validate relation before interpolation by
normalizing (e.g., relation.upper().replace(" ", "_")) and enforcing a tight
whitelist regex such as only A-Z, 0-9 and underscores (e.g., r'^[A-Z0-9_]+$');
if validation fails, raise an error and do not call client.cypher—only then
interpolate the validated relation into the MERGE relationship token.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@demo/notebooks/01_llama_index_property_graph.ipynb`:
- Around line 108-139: Update the _EmbeddedAdapter class to document the stubbed
methods: add a brief note in the class docstring and/or inline comments inside
vector_search, get_labels, and get_edge_types stating these return empty results
because embedded mode does not implement vector search or label/edge
introspection; mention expected behavior (e.g., returns empty list) and point
users to the full CoordinodeClient implementation for production features so
it's clear these are intentional stubs.

---

Duplicate comments:
In `@demo/notebooks/01_llama_index_property_graph.ipynb`:
- Around line 131-132: The adapter's close() currently does nothing; update its
close() implementation to call and await (if async) the underlying
LocalClient.close() so resources are released when store.close() is invoked;
locate the adapter class that defines def close(self): pass and replace it to
delegate to the LocalClient instance (e.g., self._local_client.close() or await
self._local_client.close()) ensuring correct sync/async semantics to mirror
LocalClient.close().

In `@demo/notebooks/02_langchain_graph_chain.ipynb`:
- Around line 132-133: The adapter's close() method is currently a no-op; change
it to delegate to the underlying LocalClient to release resources by calling the
client's close method (e.g., invoke LocalClient.close() on the adapter's stored
client instance from within close()). Locate the adapter class with def
close(self): pass and replace the pass with a call to the underlying client (the
attribute holding the LocalClient instance) so the adapter's close() forwards
cleanup to LocalClient.close().

In `@demo/notebooks/03_langgraph_agent.ipynb`:
- Around line 203-212: The function find_related constructs a Cypher range using
min(depth, 3) which can be zero or negative; clamp depth to a valid lower bound
by computing hops = max(1, min(depth, 3)) (or equivalent) and use hops when
building the pattern (e.g., "[r*1..{hops}]") so the range is always at least 1
and no larger than 3; update the client.cypher call in find_related to use this
hops variable when concatenating the query string.
- Around line 182-191: The save_fact function concatenates the tool-provided
relation directly into the Cypher string (see save_fact, relation,
client.cypher, SESSION), which allows Cypher injection; sanitize and validate
relation before interpolation by normalizing (e.g., relation.upper().replace("
", "_")) and enforcing a tight whitelist regex such as only A-Z, 0-9 and
underscores (e.g., r'^[A-Z0-9_]+$'); if validation fails, raise an error and do
not call client.cypher—only then interpolate the validated relation into the
MERGE relationship token.
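The two notebook-03 guards described above can be sketched together. This is a minimal illustration; the `_REL_TYPE_RE` name follows the commit notes in this thread, and the exact bounds and regex are assumptions:

```python
import re

# Whitelist for relationship types: uppercase letters, digits, underscores only.
_REL_TYPE_RE = re.compile(r"^[A-Z0-9_]+$")

def normalize_relation(relation: str) -> str:
    """Normalize a tool-provided relation; reject anything unsafe to interpolate."""
    candidate = relation.upper().replace(" ", "_")
    if not _REL_TYPE_RE.match(candidate):
        raise ValueError(f"unsafe relation type: {relation!r}")
    return candidate

def clamp_hops(depth: int) -> int:
    """Clamp traversal depth to [1, 3] so a Cypher range like [r*1..hops] is valid."""
    return max(1, min(int(depth), 3))
```

Only the validated `candidate` would then be interpolated into the MERGE relationship token, and `clamp_hops` feeds the `[r*1..{hops}]` pattern.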

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 62214ae0-ea5f-4ac7-bc46-36478c863db6

📥 Commits

Reviewing files that changed from the base of the PR and between e9805bc and fc8f53b.

📒 Files selected for processing (7)
  • .gitignore
  • coordinode-embedded/python/coordinode_embedded/_coordinode_embedded.pyi
  • demo/notebooks/00_seed_data.ipynb
  • demo/notebooks/01_llama_index_property_graph.ipynb
  • demo/notebooks/02_langchain_graph_chain.ipynb
  • demo/notebooks/03_langgraph_agent.ipynb
  • ruff.toml

polaz added 2 commits April 13, 2026 20:15
…ebooks

- graph.py, base.py: add _owns_client flag — close() skips shutdown for
  externally-injected clients (client= kwarg); only closes clients it created
- notebooks 01/02: _EmbeddedAdapter.close() delegates to self._lc.close()
  instead of being a no-op
- notebook 03: validate rel_type against _REL_TYPE_RE before Cypher
  interpolation; clamp find_related depth to max(1, min(int(depth), 3))
vector_search(), get_labels(), get_edge_types() return empty results in
embedded mode — add inline comments explaining this limitation so users
understand the behavior without reading server-side code
@polaz
Member Author

polaz commented Apr 13, 2026

Fixed in commit 792afcc: _EmbeddedAdapter.close() now delegates to self._lc.close() in notebooks 01 and 02. Thread #5 (notebook 02) fixed in the same commit.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@demo/notebooks/01_llama_index_property_graph.ipynb`:
- Line 228: The notebook creates a CoordinodePropertyGraphStore with an injected
client (CoordinodePropertyGraphStore(client=client)), but store.close() won’t
close that injected client under the current ownership model; update the cleanup
to explicitly call client.close() after store.close() (or instead of relying on
store.close()) so the underlying connection is properly closed—ensure you modify
the cleanup block that calls store.close() (around the store creation/cleanup
code) to add client.close() as well.

In `@demo/notebooks/02_langchain_graph_chain.ipynb`:
- Line 231: Because the notebook injects an external client into CoordinodeGraph
(graph = CoordinodeGraph(client=client)), graph.close() will not close that
injected client; update the cleanup to explicitly call client.close() in
addition to graph.close() (e.g., after using graph, call graph.close() then
client.close()) so the injected client's resources are released; reference
CoordinodeGraph, graph, client, graph.close(), and client.close() when making
the change.

In `@demo/notebooks/03_langgraph_agent.ipynb`:
- Around line 134-150: The notebook starts a detached Docker container (proc ->
container_id, COORDINODE_ADDR) but never guarantees it is stopped on errors or
when re-running; wrap the docker run + port-wait logic in a try/finally (or
register an atexit handler) and in the finally call
subprocess.run(["docker","stop", container_id]) (checking container_id is set
and not empty) so the container is always stopped on exit or error; also update
the existing cleanup at the later block that attempted to stop the container
(lines referencing container_id/COORDINODE_ADDR) to reuse the same container_id
variable or guard it and call docker stop there if the try/finally path didn’t
run.
- Around line 201-205: query_facts currently executes arbitrary Cypher
(client.cypher) which allows destructive or cross-session operations; change it
to reject or sanitize input by enforcing read-only queries and session scoping:
validate the cypher string in query_facts to disallow dangerous
keywords/constructs (e.g., CREATE, DELETE, MERGE, CALL, LOAD, DROP, SET, ';')
and require that the query includes or is transformed to include a session
filter (params={"sess": SESSION} and WHERE n.session = $sess) or only allow a
small whitelist of safe patterns (e.g., STARTS WITH "MATCH" and no mutation
tokens); if validation fails, return an error instead of executing the query.
Ensure you reference and update the query_facts function and its use of
client.cypher to implement this validation/sanitization and fail-safe behavior.

In `@langchain-coordinode/langchain_coordinode/graph.py`:
- Around line 80-83: Replace the inline TODO at the comment above rows =
self._client.cypher with a non-TODO note or a tracked-issue reference: update
the comment that currently reads "TODO: simplify to labels(a)[0] once
subscript-on-function is in published image." to either reference the issue/PR
number (e.g., "see ISSUE-1234") or rewrite as a descriptive note (e.g., "Note:
can simplify to labels(a)[0] after subscript-on-function support is available")
so static analysis no longer flags a TODO; locate this comment adjacent to the
rows variable and self._client.cypher call in the Graph code to make the change.

Run ID: 6ffce079-c347-4845-a1ce-6a32f536ec9a

📥 Commits

Reviewing files that changed from the base of the PR and between fc8f53b and 6710ac6.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • demo/notebooks/01_llama_index_property_graph.ipynb
  • demo/notebooks/02_langchain_graph_chain.ipynb
  • demo/notebooks/03_langgraph_agent.ipynb
  • langchain-coordinode/langchain_coordinode/graph.py
  • llama-index-coordinode/llama_index/graph_stores/coordinode/base.py


Copilot AI left a comment


Pull request overview

Adds schema-creation APIs to the CoordiNode Python client (including SchemaMode support) and updates downstream integrations/demos to support injecting a pre-built client (e.g., embedded LocalClient) for easier Colab usage.

Changes:

  • Add create_label(..., schema_mode=...) and create_edge_type(...) to async + sync clients, plus schema_mode surfaced on LabelInfo/EdgeTypeInfo.
  • Allow LangChain/LlamaIndex graph stores to accept an externally-provided client= and only close internally-owned clients.
  • Refresh demo notebooks/README for Colab + embedded mode; update lint config and expand unit/integration tests.

Reviewed changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
coordinode/coordinode/client.py New schema CRUD APIs + schema_mode plumbing on info objects.
langchain-coordinode/langchain_coordinode/graph.py Accept injected client and adjust schema refresh notes/docs.
llama-index-coordinode/llama_index/graph_stores/coordinode/base.py Accept injected client and avoid closing caller-owned clients.
tests/unit/test_schema_crud.py Unit coverage for schema_mode defaults/propagation and repr.
tests/integration/test_sdk.py Integration coverage for create_label/create_edge_type and schema_mode validation.
demo/notebooks/00_seed_data.ipynb Colab-ready seeding notebook using embedded mode when available.
demo/notebooks/01_llama_index_property_graph.ipynb Colab-ready LlamaIndex demo using injected client adapter.
demo/notebooks/02_langchain_graph_chain.ipynb Colab-ready LangChain demo using injected client adapter.
demo/notebooks/03_langgraph_agent.ipynb Colab-ready LangGraph agent demo with embedded/local modes.
demo/README.md Colab badge index and local Docker Compose instructions.
ruff.toml Exclusions for submodules/checkpoints and notebook per-file ignores.
coordinode-embedded/python/coordinode_embedded/_coordinode_embedded.pyi Minor stub typing tweak for __enter__.
.gitignore Add ignore entry for .ipynb_checkpoints.
uv.lock Workspace version bump in lockfile.


…fix CONTAINER_ID scope

- notebooks 01/02: add explicit client.close() after store/graph.close() in
  cleanup cells — _owns_client prevents store from closing injected clients
- notebook 03: promote container_id to module-level CONTAINER_ID = None so
  cleanup cell can stop auto-started Docker containers on re-runs
- notebook 03: add write-keyword guard and $sess requirement to query_facts
  to prevent destructive or cross-session Cypher from agent-facing tool
- graph.py:82: replace TODO comment with descriptive note referencing G010/GAPS.md
  to silence SonarCloud warning
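The write-keyword guard and `$sess` requirement added to `query_facts` can be sketched as below; the `_WRITE_KEYWORDS` name comes from the commit note, while the exact token list and the session-scope regex are assumptions:

```python
import re

# Mutation and procedure tokens an agent-facing read tool must never execute.
_WRITE_KEYWORDS = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|CALL|LOAD)\b|;",
    re.IGNORECASE,
)
# $sess must actually constrain a node property inside a WHERE clause,
# not merely appear somewhere in the query text.
_SESSION_SCOPE = re.compile(
    r"\bWHERE\b.*\b\w+\.session\s*=\s*\$sess", re.IGNORECASE | re.DOTALL
)

def validate_read_query(q: str):
    """Return an error string, or None if the query is safe to run."""
    if _WRITE_KEYWORDS.search(q):
        return "error: write operations are not allowed"
    if not _SESSION_SCOPE.search(q):
        return "error: query must filter on <alias>.session = $sess"
    return None
```

Only when `validate_read_query` returns `None` would the tool go on to call `client.cypher(q, params={"sess": SESSION})`.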
…ync docstring types

- _build_property_definitions: raise ValueError with index when property entry
  is not a dict with a non-empty 'name' key (was KeyError)
- LabelInfo.schema_mode inline annotation: 0=unspecified (not 0=unspecified/strict)
- create_label docstring: add "list" and "map" to the type strings list
- _first_label docstring: clarify min() is intentional for determinism;
  labels(n)[0] is unstable due to non-guaranteed label ordering

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@coordinode/coordinode/client.py`:
- Around line 371-411: Rename the parameters in _build_property_definitions from
PropertyType and PropertyDefinition to snake_case names (e.g., property_type_cls
and property_definition_cls) to satisfy linters and clarify they are class
references; update all uses inside the function (type_map values and the call to
PropertyDefinition(...)) to refer to the new parameter names and adjust any call
sites that pass the proto classes into _build_property_definitions to use the
same positional or keyword names if necessary.

In `@demo/notebooks/01_llama_index_property_graph.ipynb`:
- Around line 397-402: The cleanup cell contains a duplicated/concatenated token
"callerclient.close()" — fix the typo by replacing the malformed trailing text
with a single call to client.close() and its comment; specifically update the
lines around store.delete(...), store.close(), and client.close() so the final
line reads a single client.close() and retains the comment "injected client —
owned by caller" (remove the extra "caller" concatenation and duplicate
client.close()).

In `@demo/notebooks/02_langchain_graph_chain.ipynb`:
- Around line 395-400: The cleanup cell contains duplicated text where the
client.close() call and its comment were concatenated; update the final lines in
the cell so the Neo4j calls are correct by leaving graph.query(...),
print("Cleaned up"), graph.close(), then a single client.close() with the
correct comment ("# injected client — owned by caller")—remove the duplicated
"client.close()  # injected client — owned by callerclient.close()  # injected
client — owned by caller" fragment.

Run ID: 3c8ff839-a7b7-41aa-88c0-c9831cf19dc9

📥 Commits

Reviewing files that changed from the base of the PR and between 6710ac6 and 6b69045.

📒 Files selected for processing (5)
  • coordinode/coordinode/client.py
  • demo/notebooks/01_llama_index_property_graph.ipynb
  • demo/notebooks/02_langchain_graph_chain.ipynb
  • demo/notebooks/03_langgraph_agent.ipynb
  • langchain-coordinode/langchain_coordinode/graph.py

polaz added 2 commits April 13, 2026 21:15
…bel visibility test

Server now enforces strict schema for labels registered via create_label().
The workaround node must include only declared properties, so add tag:string
to the schema alongside x:int64.
…case; fix duplicate client.close() in notebooks

- _build_property_definitions: PropertyType → property_type_cls,
  PropertyDefinition → property_definition_cls (SonarCloud naming rule)
- notebooks 01/02: remove duplicated client.close() line in cleanup cells
@polaz polaz requested a review from Copilot April 13, 2026 18:30
@polaz
Member Author

polaz commented Apr 13, 2026

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Apr 13, 2026

✅ Actions performed

Full review triggered.


Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 5 comments.



… dep to >=0.6.0

- CoordinodeGraph.refresh_schema() now uses get_labels() + get_edge_types()
  when available (coordinode>=0.6.0), bypassing the regex text parser;
  injected clients without these methods fall back to _parse_schema()
- Add _PROPERTY_TYPE_NAME dict mapping PropertyType proto ints to strings
  (INT64, FLOAT64, STRING, BOOL, BYTES, TIMESTAMP, VECTOR, LIST, MAP)
- Bump coordinode requirement to >=0.6.0 in both langchain-coordinode
  and llama-index-graph-stores-coordinode

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Line 25: Replace the single-root .ipynb_checkpoints/ ignore entry with a
recursive pattern so checkpoint directories are ignored at any depth; update the
.gitignore entry `.ipynb_checkpoints/` to use a recursive glob (e.g.,
`**/.ipynb_checkpoints/`) so Jupyter checkpoints alongside notebooks in
subdirectories are also excluded.

In `@coordinode/coordinode/client.py`:
- Around line 394-409: The loop that builds property_definition_cls currently
coerces 'required' and 'unique' with bool(...), allowing non-boolean values like
"false" or 1 to be treated as True; change the validation so that for each
property dict (in the properties loop inside the code that constructs
property_definition_cls) you explicitly check if 'required' or 'unique' keys are
present and, if so, assert they are instances of bool (raise ValueError with a
clear message referencing the property name/index), otherwise default them to
False; then pass the validated boolean values into
property_definition_cls(name=..., type=..., required=..., unique=...).

In `@demo/notebooks/01_llama_index_property_graph.ipynb`:
- Around line 332-337: The query passed to store.structured_query uses a
truncated tag in param_map ("prefix": f"Alice-{tag[:4]}) which can cause
cross-run collisions; update the f-string in the param_map for
store.structured_query to use the full tag (e.g., f"Alice-{tag}") so the WHERE
p.name STARTS WITH parameter is unique per run and preserves isolation.
- Around line 163-207: The notebook may leave a detached Docker container
running when you start CoordiNode (container_id created after subprocess.run)
because there is no teardown; modify the startup logic around the branch that
runs docker (where IMAGE, proc, container_id, and COORDINODE_ADDR are set) to
register a cleanup that stops the container (e.g., a try/finally or
atexit.register that calls subprocess.run(["docker","stop", container_id]))
and/or ensure any exception path (including the timeout else) always stops the
created container; ensure the cleanup uses the captured container_id and only
attempts to stop it if proc.returncode == 0 and container_id is non-empty so you
don’t try to stop a non-created container.

In `@demo/notebooks/02_langchain_graph_chain.ipynb`:
- Around line 288-293: The demo query uses a truncated prefix when calling
graph.query (params={"prefix": f"Turing-{tag[:4]}"}), which can collide with
other runs; change the params value to use the full tag (e.g., params={"prefix":
f"Turing-{tag}"} or otherwise include the entire unique tag variable) so the
MATCH filter on s.name is fully namespaced by the run's tag and prevents
cross-run matches.
- Around line 164-208: The cell can leave a started Docker container running
because when falling back to docker run you set container_id =
proc.stdout.strip() but never record or stop it on teardown; ensure you record
the started container id (e.g., in a module/global name like
STARTED_CONTAINER_ID) and update your cleanup logic to stop it
(subprocess.run(["docker","stop", STARTED_CONTAINER_ID])) only if that variable
is set and the container was created by this notebook; alternatively register an
atexit handler or try/finally around the startup sequence to call
subprocess.run(["docker","stop", container_id]) so
COORDINODE_ADDR/GRPC_PORT/_port_open logic still works and you avoid leaked
containers.

In `@demo/notebooks/03_langgraph_agent.ipynb`:
- Around line 207-217: The current check in query_facts only looks for the
literal "$sess" token but doesn't ensure it's used to restrict node access;
update query_facts to validate that $sess is used in a WHERE clause that
constrains a node property (e.g. pattern matching "WHERE <alias>.session =
$sess" or equivalent) rather than merely present in the query or returned;
implement a robust check (regex or simple parsing) on q to confirm presence of a
WHERE condition binding a node/alias session property to $sess and reject
queries that only return $sess or embed it outside a WHERE predicate; keep the
existing read-only keyword check (_WRITE_KEYWORDS) and still call
client.cypher(q, params={"sess": SESSION}) only after this stricter
session-scoping validation.

In `@langchain-coordinode/langchain_coordinode/graph.py`:
- Around line 50-55: The constructor currently accepts a prebuilt client but
assumes it has get_schema_text() and vector_search(), causing refresh_schema()
and similarity_search() to fail when passed the simple LocalClient (which only
exposes cypher() and close()); update __init__ of the Graph class to either
validate the passed client implements the required protocol (cypher,
get_schema_text, vector_search) and raise a clear error, or automatically wrap
the passed LocalClient in a small adapter that implements get_schema_text()
(calling cypher under the hood) and vector_search() (delegating to an embedded
vector search implementation), and update usages of
get_triplets()/relation_names handling to accept relation_names=None and extract
relationship type via type(r) instead of r.__type__ so callers that omit
relation_names still work with CoordiNode v0.3.2; reference Graph.__init__,
refresh_schema, similarity_search, LocalClient, CoordinodeClient, and
get_triplets when making the changes.

In `@llama-index-coordinode/llama_index/graph_stores/coordinode/base.py`:
- Around line 73-76: The constructor currently accepts an injected client but
doesn't validate its API, causing later calls in get_schema() and vector_query()
to raise AttributeError when a bare coordinode-embedded LocalClient (which only
implements cypher() and close()) is passed; update the constructor that sets
self._client / self._owns_client to validate that the injected client implements
the richer protocol expected by get_schema() and vector_query() (e.g., has
get_schema_text and vector_search methods) and raise a clear TypeError if not,
or alternatively accept an adapter parameter and wrap a LocalClient into an
adapter implementing get_schema_text and vector_search before assigning
self._client (referencing CoordinodeClient, LocalClient, get_schema(),
vector_query(), get_schema_text, and vector_search to locate the code).

In `@proto`:
- Line 1: Verify the proto submodule update points to commit 381dbcd7 and
contains the expected schema-mode additions by updating/fetching the submodule
(git submodule update --init --recursive / git fetch && git checkout 381dbcd7),
then inspect the .proto files for the new SchemaMode enum, the
ComputedPropertyDefinition message, and schema_mode fields in relevant messages,
and confirm the create_label and create_edge_type RPCs reference schema_mode; if
any piece is missing, update the submodule to the correct commit or add the
missing proto definitions and re-run the proto compilation to ensure generated
clients reflect the changes.

Run ID: a5821a64-2c69-4542-80fc-5c813932a1a1

📥 Commits

Reviewing files that changed from the base of the PR and between d477417 and fcf1166.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • .gitignore
  • coordinode-embedded/python/coordinode_embedded/_coordinode_embedded.pyi
  • coordinode/coordinode/client.py
  • demo/README.md
  • demo/notebooks/00_seed_data.ipynb
  • demo/notebooks/01_llama_index_property_graph.ipynb
  • demo/notebooks/02_langchain_graph_chain.ipynb
  • demo/notebooks/03_langgraph_agent.ipynb
  • langchain-coordinode/langchain_coordinode/graph.py
  • llama-index-coordinode/llama_index/graph_stores/coordinode/base.py
  • proto
  • ruff.toml
  • tests/integration/test_sdk.py
  • tests/unit/test_schema_crud.py

polaz added 3 commits April 15, 2026 02:16
- nb03: add client.health() verification before 'Connected' print in
  both gRPC branches (COORDINODE_ADDR env and _port_open fallback)
- nb03: scope find_related intermediate hops to session — use path
  variable p and WHERE ALL(x IN nodes(p) WHERE x.session = $sess);
  switch type(last(r)) to type(last(relationships(p)))
- tests: add test_create_label_schema_mode_validated asserting
  schema_mode == 2 (VALIDATED) to complement flexible/invalid tests
client.health() returns a bool instead of raising — the bare call ignored the result.
Now raise RuntimeError with a clear message if the health check fails.
@polaz
Member Author

polaz commented Apr 14, 2026

@coderabbitai re: coordinode-rs submodule SHA 8dc198f:

The SHA is intentional — it is pinned to the commit immediately following the v0.3.13 release commit in coordinode-rs (the changelog/release-notes commit). No v0.3.13 git tag has been created in coordinode-rs yet; the SHA is the closest stable point that includes the schema-mode additions shipped with v0.3.13.

Once the coordinode-rs team tags the release, the submodule will be updated to that tag. This is tracked.

@coderabbitai

coderabbitai bot commented Apr 14, 2026


@polaz Understood — pinning to the changelog commit immediately after the release commit is a reasonable interim approach when the upstream hasn't tagged the release yet. The schema-mode additions are included and the SHA is deterministic, so this is a safe holdover until v0.3.13 is tagged and the submodule can be updated to the official tag. Thanks for tracking it.


✏️ Learnings added
Learnt from: polaz
URL: https://github.com/structured-world/coordinode-python/pull/36

Timestamp: 2026-04-14T23:21:52.969Z
Learning: In `structured-world/coordinode-python`, the `coordinode-rs` submodule may be intentionally pinned to a post-release changelog commit (e.g., SHA `8dc198f`, the commit immediately following the v0.3.13 release commit) when the upstream coordinode-rs repository has not yet created the corresponding git tag. This is a deliberate interim pinning strategy by maintainer polaz; the submodule will be updated to the official release tag once it is created. Do not flag this as an issue requiring a change.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/coordinode-python PR: 25
File: tests/integration/adapters/test_llama_index.py:159-180
Timestamp: 2026-04-12T09:55:48.630Z
Learning: In `tests/integration/adapters/test_llama_index.py`, the `vector_search` RPC in CoordiNode does not return node properties. Therefore, integration tests for `vector_query()` must assert against the internal CoordiNode node ID (captured from a `CREATE ... RETURN n` Cypher response) using `ChunkNode.id_`, rather than application-level property fields like `id`. This is the correct pattern for seeded-node identity assertions in vector query tests.

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 18
File: llama-index-coordinode/pyproject.toml:25-25
Timestamp: 2026-04-09T19:03:31.417Z
Learning: In the `coordinode-python` repository, the Python package dependency `coordinode` (PyPI) is distinct from the CoordiNode Rust database server (`structured-world/coordinode` on GitHub). The Python SDK (`coordinode` on PyPI) reached v0.4.x stability (v0.4.0–v0.4.4 released April 9, 2026), while the Rust server's latest release is v0.3.0. Web searches for "coordinode" may return the Rust server GitHub repo instead of the PyPI package — always verify against PyPI directly for Python dependency constraints in this repo.

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 27
File: tests/integration/adapters/test_langchain.py:151-163
Timestamp: 2026-04-12T12:01:16.998Z
Learning: In `tests/integration/adapters/test_langchain.py`, `CoordinodeGraph.graph.query()` wraps `CoordinodeClient.cypher()` and returns raw dict values. A `CREATE (n:...) RETURN n` Cypher call yields the internal integer node ID directly (not a node object). Calling `.id_` on it would raise `AttributeError`. This is distinct from the LlamaIndex adapter pattern where `.id_` is used on node objects. For seeded-node identity assertions in `test_similarity_search_returns_results`, direct integer comparison with `r["id"]` from `similarity_search()` is correct.

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 15
File: tests/integration/adapters/test_langchain.py:22-25
Timestamp: 2026-04-09T17:03:28.232Z
Learning: In `tests/integration/adapters/test_langchain.py`, the `graph` fixture uses `scope="module"` intentionally to avoid per-test gRPC connection overhead with `CoordinodeGraph`. Test isolation is achieved via the `unique_tag` fixture, which generates a UUID-based suffix per test invocation to namespace all writes. This is a deliberate design decision by the maintainer (polaz).

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 36
File: langchain-coordinode/langchain_coordinode/graph.py:50-55
Timestamp: 2026-04-13T19:18:44.541Z
Learning: In `langchain-coordinode/langchain_coordinode/graph.py`, `CoordinodeGraph.__init__` accepts a `client: Any = None` parameter for dependency injection. Optional methods (`get_schema_text`, `vector_search`, `get_labels`, `get_edge_types`) are guarded with `hasattr` checks at each call site. Duck-typing is intentional — any object exposing `.cypher()` is valid (e.g. notebook `_EmbeddedAdapter`). No `isinstance` or Protocol checks are used because there is no shared base class. This is a deliberate design decision by maintainer polaz.

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 36
File: demo/notebooks/00_seed_data.ipynb:54-65
Timestamp: 2026-04-13T20:13:32.387Z
Learning: In the `coordinode-python` repository, `coordinode-embedded` is an alternative in-process transport (Rust engine via maturin), not a wrapper around the `coordinode` PyPI package. It declares no dependency on `coordinode`. Demo notebooks must explicitly install both packages when supporting both embedded and gRPC server-backed (`CoordinodeClient`) modes.

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 29
File: tests/integration/test_sdk.py:229-237
Timestamp: 2026-04-12T20:49:08.132Z
Learning: In the `structured-world/coordinode-python` repository, CoordiNode is schema-free. `CoordinodeClient.get_labels()` may return a `LabelInfo` with an empty `properties` list even when the corresponding nodes carry actual properties at runtime. Integration tests must NOT assert that specific property names appear in `LabelInfo.properties`; asserting `isinstance(found.properties, list)` is the correct and complete check.

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 36
File: tests/integration/test_sdk.py:568-639
Timestamp: 2026-04-13T22:25:11.413Z
Learning: In `tests/integration/test_sdk.py`, FTS (full-text search) indexing in CoordiNode is automatic for schema-free graphs. Calling `create_label()` before `text_search()` or `hybrid_text_vector_search()` is NOT required. The FTS integration tests (`test_text_search_returns_results`, `test_text_search_fuzzy`, `test_hybrid_text_vector_search_returns_results`) intentionally exercise the schema-free path where nodes are created via plain Cypher without schema registration, and the server automatically indexes text properties. The schema-strict path (with `create_label()`) is covered separately by `test_create_label_*` tests.


Copilot AI left a comment


Pull request overview

Copilot reviewed 19 out of 21 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

tests/integration/test_sdk.py:712

  • Cleanup here uses DETACH DELETE even though the PR description says DETACH DELETE is a server no-op. If that’s still true on the target image, this test will leak nodes across runs. Prefer DELETE (no detach needed for single nodes) or the two-step relationship+node delete pattern used elsewhere in the demos/tests.
        assert r.score > 0
    finally:
        client.cypher(f"MATCH (n:{label} {{tag: $tag}}) DETACH DELETE n", params={"tag": tag})



polaz added 3 commits April 15, 2026 02:56
…assumptions

FTS indexing is NOT automatic — a CREATE TEXT INDEX DDL is required
before text_search() can return results.  The previous code and tests
incorrectly assumed automatic indexing.

SDK changes:
- Add TextIndexInfo result class (name, label, properties, documents_indexed)
- Add AsyncCoordinodeClient.create_text_index() and drop_text_index()
- Add CoordinodeClient sync wrappers for both methods
- Export TextIndexInfo from coordinode package
- Fix text_search() docstring: document DDL requirement, remove false
  claim about automatic indexing

Test changes:
- Fix test_text_search_returns_results: create/drop TEXT INDEX around
  the search call; drop empty-results xfail (now a hard assertion)
- Fix test_text_search_fuzzy: create/drop TEXT INDEX; assert results
  are non-empty (was: just verify no exception)
- Fix test_hybrid_text_vector_search_returns_results: add TEXT INDEX
- Update header comment: remove 'automatic indexing' statement
DETACH DELETE may not reliably remove connected edges when nodes have
relationships. Use explicit MATCH-DELETE for edges then nodes to ensure
clean test isolation.
Colab notebooks use coordinode-embedded bundling coordinode-rs v0.3.13.
Docker Compose stack uses the CoordiNode server image v0.3.14.
These are separate version lines — make the distinction explicit.
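For reference, the `TextIndexInfo` result class added in these commits carries the four fields named above (name, label, properties, documents_indexed); a minimal sketch, assuming plain field types that the commit message does not actually specify:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextIndexInfo:
    """Sketch of the result type named in the commit above.

    Field names come from the commit message; the exact types and any
    defaults are assumptions, not the committed definition.
    """

    name: str
    label: str
    properties: list[str]
    documents_indexed: int
```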
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@coordinode/coordinode/client.py`:
- Around line 583-593: The drop_text_index method currently interpolates the
index name directly into the Cypher string (await self.cypher(f"DROP TEXT INDEX
{name}")), which repeats the unsafe pattern from create_text_index; update
drop_text_index to apply the same identifier validation/sanitization used by
create_text_index (e.g., call the existing validate/sanitize function or inline
the same safe-check logic) before interpolating, and then pass the validated
identifier into self.cypher to avoid injection and ensure consistent behavior
with create_text_index.
- Around line 414-471: The _build_property_definitions function exceeds
cognitive complexity — extract small validation helpers to simplify the loop:
implement a helper like _validate_property_dict(p, idx) that verifies p is a
dict, extracts and validates the non-empty name and normalizes/validates the
type string, and another helper like _validate_property_flags(name, required,
unique) that validates boolean flags (or combine into one helper that returns
name, type_str, required, unique); then update _build_property_definitions to
call these helpers inside the for loop and use their returned values when
constructing property_definition_cls(name=..., type=..., required=...,
unique=...), keeping the same error messages and the existing type_map and
result building logic.
- Around line 571-581: The create_text_index block directly interpolates user
inputs (name, label, properties/prop_list) into the cypher string; add
validation or safe-escaping for Cypher identifiers before building props_expr
and cypher to avoid syntax errors or injection. Implement or call a helper like
_validate_cypher_identifier(value, param_name) to validate name and label, and
validate each property in prop_list (or alternatively apply backtick-escaping
for each identifier) and raise ValueError on invalid identifiers; then construct
props_expr and lang_clause and run self.cypher(cypher) as before (references:
create_text_index function, variables name, label, properties, prop_list,
props_expr, cypher).

In `@demo/README.md`:
- Around line 30-34: Add the missing metrics/health port entry to the ports
table in demo/README.md: insert a row for port 37084 and label the Service as
"Metrics / Health" (to match demo/docker-compose.yml which exposes 37084 for
/metrics and /health) so the README stays consistent with the compose file and
aids local ops/troubleshooting.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ae099c38-bd20-429b-815c-b5721a5ecd3c

📥 Commits

Reviewing files that changed from the base of the PR and between e06be1f and 0c677c9.

📒 Files selected for processing (4)
  • coordinode/coordinode/__init__.py
  • coordinode/coordinode/client.py
  • demo/README.md
  • tests/integration/test_sdk.py

polaz added 5 commits April 15, 2026 03:13
Updates embedded engine to include all fixes after v0.3.13:
- percentileCont/percentileDisc functions
- Documentation and API reference improvements
Aligns embedded engine with CoordiNode server image v0.3.14.
…identifier validation

- Extract per-property validation into _validate_property_dict static method,
  reducing _build_property_definitions cognitive complexity from 16 → ~10
  (fixes SonarCloud threshold of 15)
- Add _CYPHER_IDENT_RE regex and _validate_cypher_identifier() module helper;
  validate name, label, and each property in create_text_index(), and name in
  drop_text_index() before interpolating into DDL Cypher strings
- Add missing port 37084 (metrics/health) to demo/README.md port table to
  match docker-compose.yml
Update @b23c833 install reference in all 4 Colab notebooks to @e3ff279,
which includes the coordinode-rs v0.3.14 submodule; update README note
from v0.3.13 to v0.3.14.
Previous pointer (5fc543b) was a local-only commit not pushed to
structured-world/coordinode, causing CI submodule fetch to fail.
Switch to 4c80fbe — the official release commit tagged v0.3.14
that exists on GitHub.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

♻️ Duplicate comments (1)
coordinode/coordinode/client.py (1)

685-710: ⚠️ Potential issue | 🟡 Minor

The text_search() docs still contradict the schema-free behavior exercised by tests.

The Args section says schema-free text is auto-indexed on current servers, but the Note still says indexing is "not automatic" and requires a CREATE TEXT INDEX first. That will mislead callers into an unnecessary setup step.

✏️ Suggested wording
-        Note:
-            Text indexing is **not** automatic.  Before calling this method,
-            create a full-text index with the Cypher DDL statement::
+        Note:
+            Some deployments require an explicit full-text index. On current
+            schema-free CoordiNode servers, text properties are indexed
+            automatically; create a text index only when your server requires
+            explicit index registration::

Based on learnings: In tests/integration/test_sdk.py, FTS indexing in CoordiNode is automatic for schema-free graphs, so create_label() is not required before text_search() or hybrid_text_vector_search().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@coordinode/coordinode/client.py` around lines 685 - 710, Update the docstring
for text_search() to remove the misleading "Text indexing is not automatic" Note
and instead state that on current CoordiNode servers schema-free graphs are
auto-indexed (so create_label() or explicit CREATE TEXT INDEX is not required
for schema-free nodes), while retaining guidance that users can create an
explicit text index via create_text_index or the Cypher DDL (CREATE TEXT INDEX
...) when they need a named/indexed property for schema-based graphs or explicit
control; also mirror this clarification for hybrid_text_vector_search() if
present and remove any contradictory phrasing about requiring CREATE TEXT INDEX
before using text_search().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@coordinode-rs`:
- Line 1: The submodule is pinned to a non-existent SHA
(5fc543b04f0a732ec394eae84b130f4542754378); update the submodule pointer to the
v0.3.14 release (commit 4c80fbef57eeee16efa471dc544704d592972b4a). Replace the
bad gitlink/commit reference for the coordinode submodule with the tag v0.3.14
(or the commit 4c80fbef57ee...) in the repository’s submodule entry and commit
the updated submodule reference so CI and fresh clones fetch the correct commit.

In `@coordinode/coordinode/client.py`:
- Around line 594-603: The code builds a Cypher DDL that interpolates language
verbatim and allows empty properties; update the index-creation logic to first
reject empty property lists (raise ValueError if properties is empty after
normalizing) and validate each property as already done with
_validate_cypher_identifier(prop, "property"); additionally validate the
language parameter before interpolation—either require language to be None or a
non-empty string that passes safe identifier validation (e.g., call
_validate_cypher_identifier(language, "language") or check against an explicit
whitelist) and only then append the language clause, so malformed or injectable
DDL cannot be produced when composing cypher (variables to touch:
properties/prop_list, language, _validate_cypher_identifier, name, label,
cypher).

In `@demo/notebooks/01_llama_index_property_graph.ipynb`:
- Line 42: The Colab-only block that downloads and builds coordinode-embedded
(guarded by IN_COLAB) should skip the Rust download/build when a remote
CoordiNode address is provided; add a guard that checks
os.environ.get("COORDINODE_ADDR") (or similar env var used elsewhere) and only
run the rustup/maturin/git+coordinode-embedded install when IN_COLAB is true AND
COORDINODE_ADDR is not set, leaving the subsequent pip installs and
nest_asyncio.apply() unchanged.

In `@demo/notebooks/02_langchain_graph_chain.ipynb`:
- Around line 280-283: The cleanup step currently uses a single DETACH DELETE
via graph.query(...) which fails on servers that require explicit relationship
removal; replace that single query with the two-step cleanup used elsewhere:
first run a graph.query that MATCHes and DELETEs relationships for nodes with
n.name ENDS WITH $tag, then run a second graph.query that MATCHes and DELETEs
the nodes themselves (using the same params={"tag": tag}); keep the surrounding
print("Cleaned up"), graph.close(), and client.close() calls intact so ownership
semantics remain unchanged.
- Line 39: When IN_COLAB is true, the cell currently always bootstraps Rust and
builds/installs coordinode-embedded; change the guard to skip that entire
embedded-build block if an external server is specified by COORDINODE_ADDR
(check os.environ.get("COORDINODE_ADDR")). Concretely, update the condition
using IN_COLAB and the env var (e.g., if IN_COLAB and not
os.environ.get("COORDINODE_ADDR")) so the rustup/maturin/pip install of
coordinode-embedded and the cargo PATH modification are only executed when Colab
is used standalone; leave the subsequent pip installs (coordinode, langchain,
etc.) untouched.

In `@demo/notebooks/03_langgraph_agent.ipynb`:
- Line 127: SESSION is currently set to uuid.uuid4().hex[:8], which truncates
the UUID to 8 hex chars and reduces entropy; change this to use the full UUID
(e.g., uuid.uuid4().hex without slicing) so SESSION retains full uniqueness and
avoids cross-run collisions and accidental data deletion—update the assignment
where SESSION is defined (referencing SESSION and uuid.uuid4().hex[:8]) to use
the full uuid.uuid4().hex value.
- Around line 168-180: The function query_facts currently slices rows[:20] after
executing client.cypher, which can materialize huge results; before calling
client.cypher(q, ...), ensure the cypher string q includes a LIMIT 20 (or a
configurable safe max) if it doesn't already—e.g., detect an existing LIMIT via
a simple regex and only append " LIMIT 20" when absent—then execute
client.cypher with the modified q (preserving the existing read-only and
session-scoping checks using _WRITE_CLAUSE_RE, _SESSION_WHERE_SCOPE_RE, and
_SESSION_NODE_SCOPE_RE) so the database enforces the result bound instead of
materializing everything in query_facts.

In `@demo/README.md`:
- Around line 9-17: The Colab badge links currently point to blob/main so the
notebooks float with the branch while only the embedded install is actually
pinned; update the README note and/or the links in the table: either change each
Colab URL to reference a specific commit SHA (replace blob/main with
blob/<commit-sha>) so notebook content is truly pinned, or modify the Note text
to explicitly state that only the embedded install inside the notebooks
(coordinode-embedded) is commit-pinned and that the notebook links reference the
main branch; update the phrasing where it currently says “Notebooks are pinned
to a specific commit” to reflect which part is pinned.

---

Duplicate comments:
In `@coordinode/coordinode/client.py`:
- Around line 685-710: Update the docstring for text_search() to remove the
misleading "Text indexing is not automatic" Note and instead state that on
current CoordiNode servers schema-free graphs are auto-indexed (so
create_label() or explicit CREATE TEXT INDEX is not required for schema-free
nodes), while retaining guidance that users can create an explicit text index
via create_text_index or the Cypher DDL (CREATE TEXT INDEX ...) when they need a
named/indexed property for schema-based graphs or explicit control; also mirror
this clarification for hybrid_text_vector_search() if present and remove any
contradictory phrasing about requiring CREATE TEXT INDEX before using
text_search().
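Putting the `create_text_index` requests above together (reject empty property lists, validate each identifier, whitelist the language), the validated DDL composition might look like the following sketch; the exact DDL grammar, the `LANGUAGE` clause syntax, and the whitelist contents are assumptions for illustration:

```python
import re

_CYPHER_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_ALLOWED_LANGUAGES = {"english", "german", "french"}  # illustrative whitelist


def _validate_cypher_identifier(value: str, param_name: str) -> str:
    if not isinstance(value, str) or not _CYPHER_IDENT_RE.match(value):
        raise ValueError(
            f"{param_name} is not a valid Cypher identifier: {value!r}"
        )
    return value


def build_text_index_ddl(name, label, properties, language=None):
    """Compose the CREATE TEXT INDEX DDL from validated pieces only."""
    if not properties:
        raise ValueError("properties must not be empty")
    _validate_cypher_identifier(name, "name")
    _validate_cypher_identifier(label, "label")
    for prop in properties:
        _validate_cypher_identifier(prop, "property")
    props_expr = ", ".join(properties)
    lang_clause = ""
    if language is not None:
        if language not in _ALLOWED_LANGUAGES:
            raise ValueError(f"unsupported language: {language!r}")
        lang_clause = f" LANGUAGE {language}"  # clause syntax is an assumption
    return f"CREATE TEXT INDEX {name} ON :{label}({props_expr}){lang_clause}"
```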
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 884059ed-6081-42b6-89f3-322bcf8f4bae

📥 Commits

Reviewing files that changed from the base of the PR and between 0c677c9 and 0e2fde1.

📒 Files selected for processing (7)
  • coordinode-rs
  • coordinode/coordinode/client.py
  • demo/README.md
  • demo/notebooks/00_seed_data.ipynb
  • demo/notebooks/01_llama_index_property_graph.ipynb
  • demo/notebooks/02_langchain_graph_chain.ipynb
  • demo/notebooks/03_langgraph_agent.ipynb

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (5)
demo/notebooks/02_langchain_graph_chain.ipynb (2)

280-283: ⚠️ Potential issue | 🟠 Major

Use two-step cleanup instead of DETACH DELETE for server compatibility.

Line 280 reintroduces DETACH DELETE, while this PR already uses explicit relationship+node cleanup in demo/notebooks/00_seed_data.ipynb for broader compatibility.

Suggested fix
-graph.query("MATCH (n) WHERE n.name ENDS WITH $tag DETACH DELETE n", params={"tag": tag})
+graph.query(
+    "MATCH (n) WHERE n.name ENDS WITH $tag OPTIONAL MATCH (n)-[r]-() DELETE r",
+    params={"tag": tag},
+)
+graph.query("MATCH (n) WHERE n.name ENDS WITH $tag DELETE n", params={"tag": tag})
 print("Cleaned up")
 graph.close()
 client.close()  # injected client — owned by caller

Run this read-only check to confirm cleanup-query inconsistency across notebooks:

#!/bin/bash
python3 <<'PY'
import json

for path in [
    "demo/notebooks/00_seed_data.ipynb",
    "demo/notebooks/02_langchain_graph_chain.ipynb",
]:
    nb = json.load(open(path))
    print(f"\n== {path} ==")
    for i, cell in enumerate(nb["cells"]):
        if cell.get("cell_type") != "code":
            continue
        src = "".join(cell.get("source", []))
        if "DELETE" in src and ("DETACH DELETE" in src or "demo_tag" in src or "ENDS WITH $tag" in src):
            print(f"[cell {i}]")
            print(src.strip())
PY

Expected: 00_seed_data.ipynb shows explicit two-step deletes, while 02_langchain_graph_chain.ipynb still shows DETACH DELETE.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/02_langchain_graph_chain.ipynb` around lines 280 - 283,
Replace the one-step DETACH DELETE used in the graph.query call with the
explicit two-step cleanup used elsewhere: first delete relationships that
involve the target nodes, then delete the nodes themselves; update the call that
currently uses graph.query("MATCH (n) WHERE n.name ENDS WITH $tag DETACH DELETE
n", params={"tag": tag}) to run two queries via graph.query—one removing
relationships for nodes matching the ENDS WITH $tag predicate (DELETE
relationships), and a second deleting those nodes (DELETE nodes); keep the same
params={"tag": tag} and ensure graph.close() and client.close() behavior remains
unchanged.

39-39: ⚠️ Potential issue | 🟠 Major

Skip embedded Rust build when COORDINODE_ADDR is already set.

Line 39 still compiles coordinode-embedded for all Colab runs, even when the notebook will use an external server. This adds avoidable latency/failure risk.

Suggested fix
-# Install coordinode-embedded in Colab only (requires Rust build).
-if IN_COLAB:
+# Install coordinode-embedded in Colab only when embedded fallback is needed.
+if IN_COLAB and not os.environ.get("COORDINODE_ADDR"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/02_langchain_graph_chain.ipynb` at line 39, The Colab-only
Rust install and coordinode-embedded pip build should be skipped when an
external server is configured via COORDINODE_ADDR; wrap the existing IN_COLAB
installation block (the code that creates _ctx, downloads sh.rustup.rs, runs
subprocess.run to install rustup, sets _cargo_bin into PATH, and pip-installs
maturin and the git+...coordinode-embedded package) with an extra guard like: if
IN_COLAB and not os.environ.get("COORDINODE_ADDR"): ...; ensure any variables
and subprocess.run calls inside that block (e.g., _rustup_path handling,
_cargo_bin PATH update, pip install of maturin and coordinode-embedded) are only
executed when COORDINODE_ADDR is unset so Colab runs that point to an external
coordinode server skip the embedded Rust build.
demo/notebooks/01_llama_index_property_graph.ipynb (1)

42-42: ⚠️ Potential issue | 🟠 Major

Avoid forcing embedded build in Colab when using a remote server.

Line 42 still runs the Rust/embedded install path whenever IN_COLAB is true. If COORDINODE_ADDR is provided, this is unnecessary and can fail before the gRPC path is used.

Suggested fix
-# Install coordinode-embedded in Colab only (requires Rust build).
-if IN_COLAB:
+# Install coordinode-embedded in Colab only when embedded fallback is needed.
+if IN_COLAB and not os.environ.get("COORDINODE_ADDR"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/01_llama_index_property_graph.ipynb` at line 42, The
Colab-specific Rust/embedded install block guarded by IN_COLAB should be skipped
when a remote coordinode server is intended; check os.getenv("COORDINODE_ADDR")
(or os.environ) before running the rustup/maturin/coordinode-embedded install
and only run that block if IN_COLAB is true AND COORDINODE_ADDR is not
set/empty. Update the conditional around the rustup/maturin/coordinode-embedded
code (the block using IN_COLAB, _rustup_path, subprocess.run for rustup and pip
install git+https://...coordinode-embedded) so it falls back to the normal pip
installs when COORDINODE_ADDR is provided.
demo/notebooks/03_langgraph_agent.ipynb (2)

168-180: ⚠️ Potential issue | 🟠 Major

Enforce LIMIT before executing query_facts() to avoid large materialization.

Line 180 truncates after client.cypher(...) already fetched rows. Add/validate LIMIT in the query string first so the database enforces bounds.

🔧 Proposed fix
+_LIMIT_RE = re.compile(r"\bLIMIT\s+(\d+)\b", re.IGNORECASE)
+
 `@tool`
 def query_facts(cypher: str) -> str:
@@
-    q = cypher.strip()
+    q = cypher.strip().rstrip(";")
@@
     if not (_SESSION_WHERE_SCOPE_RE.search(q) or _SESSION_NODE_SCOPE_RE.search(q)):
         return "Query must scope reads to the current session with either WHERE <alias>.session = $sess or {session: $sess}"
-    rows = client.cypher(q, params={"sess": SESSION})
-    return str(rows[:20]) if rows else "No results"
+    m = _LIMIT_RE.search(q)
+    if not m:
+        q = f"{q} LIMIT 20"
+    elif int(m.group(1)) > 20:
+        return "Query must include LIMIT <= 20."
+    rows = client.cypher(q, params={"sess": SESSION})
+    return str(rows) if rows else "No results"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/03_langgraph_agent.ipynb` around lines 168 - 180, The
query_facts function currently executes client.cypher(q, ...) after validation
but before bounding results; ensure the DB enforces a LIMIT by inspecting q for
an existing LIMIT and, if absent, appending a safe LIMIT (e.g., LIMIT 100) to
the query string before calling client.cypher; keep the existing read-only check
(_WRITE_CLAUSE_RE) and session-scoping checks (_SESSION_WHERE_SCOPE_RE or
_SESSION_NODE_SCOPE_RE) intact, and call client.cypher with the modified q and
params={"sess": SESSION} so large materializations are prevented.

127-127: ⚠️ Potential issue | 🟠 Major

Use full UUID for SESSION to avoid cross-run collisions.

Line 127 truncates to 8 hex chars (32 bits), which makes collisions realistic across repeated runs and can mix/cleanup another run’s data.

🔧 Proposed fix
-SESSION = uuid.uuid4().hex[:8]  # isolates this demo's data from other sessions
+SESSION = uuid.uuid4().hex  # isolates this demo's data from other sessions
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/03_langgraph_agent.ipynb` at line 127, Replace the truncated
session identifier with a full UUID: change the SESSION assignment (currently
"SESSION = uuid.uuid4().hex[:8]") to use the full UUID (e.g., uuid.uuid4().hex
or str(uuid.uuid4())) so runs are isolated; also scan for any code that assumes
an 8-char SESSION and update those assumptions (cleanup, naming, or lookup
logic) to accept the full UUID value.
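The entropy concern above is easy to quantify with the birthday bound: 8 hex chars give only 32 bits of ID space, so collisions become realistic after tens of thousands of runs, while the full `uuid4().hex` (about 122 random bits) is safe at any plausible scale.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Birthday-bound approximation: P(collision) ~= 1 - exp(-n(n-1) / 2N)."""
    space = 2.0 ** bits
    return 1.0 - math.exp(-n_ids * (n_ids - 1) / (2.0 * space))


# 8 hex chars (32 bits): roughly a 1% collision chance after 10,000 runs.
p_truncated = collision_probability(10_000, 32)
# Full uuid4().hex (~122 random bits): effectively zero at the same scale.
p_full = collision_probability(10_000, 122)
```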
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@demo/notebooks/01_llama_index_property_graph.ipynb`:
- Line 42: The Colab-specific Rust/embedded install block guarded by IN_COLAB
should be skipped when a remote coordinode server is intended; check
os.getenv("COORDINODE_ADDR") (or os.environ) before running the
rustup/maturin/coordinode-embedded install and only run that block if IN_COLAB
is true AND COORDINODE_ADDR is not set/empty. Update the conditional around the
rustup/maturin/coordinode-embedded code (the block using IN_COLAB, _rustup_path,
subprocess.run for rustup and pip install git+https://...coordinode-embedded) so
it falls back to the normal pip installs when COORDINODE_ADDR is provided.

In `@demo/notebooks/02_langchain_graph_chain.ipynb`:
- Around line 280-283: Replace the one-step DETACH DELETE used in the
graph.query call with the explicit two-step cleanup used elsewhere: first delete
relationships that involve the target nodes, then delete the nodes themselves;
update the call that currently uses graph.query("MATCH (n) WHERE n.name ENDS
WITH $tag DETACH DELETE n", params={"tag": tag}) to run two queries via
graph.query—one removing relationships for nodes matching the ENDS WITH $tag
predicate (DELETE relationships), and a second deleting those nodes (DELETE
nodes); keep the same params={"tag": tag} and ensure graph.close() and
client.close() behavior remains unchanged.
- Line 39: The Colab-only Rust install and coordinode-embedded pip build should
be skipped when an external server is configured via COORDINODE_ADDR; wrap the
existing IN_COLAB installation block (the code that creates _ctx, downloads
sh.rustup.rs, runs subprocess.run to install rustup, sets _cargo_bin into PATH,
and pip-installs maturin and the git+...coordinode-embedded package) with an
extra guard like: if IN_COLAB and not os.environ.get("COORDINODE_ADDR"): ...;
ensure any variables and subprocess.run calls inside that block (e.g.,
_rustup_path handling, _cargo_bin PATH update, pip install of maturin and
coordinode-embedded) are only executed when COORDINODE_ADDR is unset so Colab
runs that point to an external coordinode server skip the embedded Rust build.

In `@demo/notebooks/03_langgraph_agent.ipynb`:
- Around line 168-180: The query_facts function currently executes
client.cypher(q, ...) after validation but before bounding results; ensure the
DB enforces a LIMIT by inspecting q for an existing LIMIT and, if absent,
appending a safe LIMIT (e.g., LIMIT 100) to the query string before calling
client.cypher; keep the existing read-only check (_WRITE_CLAUSE_RE) and
session-scoping checks (_SESSION_WHERE_SCOPE_RE or _SESSION_NODE_SCOPE_RE)
intact, and call client.cypher with the modified q and params={"sess": SESSION}
so large materializations are prevented.
- Line 127: Replace the truncated session identifier with a full UUID: change
the SESSION assignment (currently "SESSION = uuid.uuid4().hex[:8]") to use the
full UUID (e.g., uuid.uuid4().hex or str(uuid.uuid4())) so runs are isolated;
also scan for any code that assumes an 8-char SESSION and update those
assumptions (cleanup, naming, or lookup logic) to accept the full UUID value.
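The guard requested in the notebook comments above reduces to a single predicate; `COORDINODE_ADDR` as the environment-variable name follows the review text, and the helper name itself is hypothetical:

```python
import os


def should_build_embedded(in_colab: bool, environ=None) -> bool:
    """Build coordinode-embedded only for standalone Colab runs.

    When COORDINODE_ADDR points at an external server, the multi-minute
    Rust build adds only latency and failure risk, so it is skipped.
    """
    env = os.environ if environ is None else environ
    return in_colab and not env.get("COORDINODE_ADDR")
```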

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8dc482ed-0817-4ff8-9dfe-58d924e99e51

📥 Commits

Reviewing files that changed from the base of the PR and between 0e2fde1 and 7534b14.

📒 Files selected for processing (5)
  • coordinode-rs
  • demo/notebooks/00_seed_data.ipynb
  • demo/notebooks/01_llama_index_property_graph.ipynb
  • demo/notebooks/02_langchain_graph_chain.ipynb
  • demo/notebooks/03_langgraph_agent.ipynb

polaz added 2 commits April 15, 2026 03:55
…n create_text_index

- raise ValueError when properties list is empty
- validate language as Cypher identifier before DDL interpolation
- replace two-step edge+node DELETE with DETACH DELETE in integration test
  (undirected MATCH returns each edge twice, causing duplicate DELETE failure)

fix(notebooks): skip embedded install when COORDINODE_ADDR is set

- 01_llama, 02_langchain: guard Rust build with COORDINODE_ADDR check
  so Docker/server stacks skip the redundant ~5 min Colab build step
- 03_langgraph: use full uuid4().hex (32 chars) for session ID to avoid
  collision in concurrent demo sessions
- 03_langgraph: auto-append LIMIT 20 in query_facts when query has no
  LIMIT clause, bounding LLM-generated query result size
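The auto-append described in this commit can be sketched as a small pre-execution rewrite; lowering an explicit oversized LIMIT as well closes the remaining gap the reviews point at. The regex mirrors the notebook's `_LIMIT_RE` as described, but the helper name is hypothetical:

```python
import re

_LIMIT_RE = re.compile(r"\bLIMIT\s+(\d+)\b", re.IGNORECASE)


def cap_limit(query: str, max_rows: int = 20) -> str:
    """Guarantee the query reaches the server with LIMIT <= max_rows."""
    q = query.strip().rstrip(";")
    m = _LIMIT_RE.search(q)
    if m is None:
        return f"{q} LIMIT {max_rows}"
    # Lower an oversized explicit LIMIT; keep a smaller one as-is.
    capped = min(int(m.group(1)), max_rows)
    return _LIMIT_RE.sub(f"LIMIT {capped}", q)
```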
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (2)
coordinode/coordinode/client.py (1)

689-715: ⚠️ Potential issue | 🟡 Minor

Align the text_search() docstring with actual FTS behavior.

The Args section now says schema-free graphs are indexed automatically, but Returns and Note still say callers must create a text index first. That contradiction will send users down an unnecessary setup path.

Suggested wording
-        Returns:
-            List of :class:`TextResult` ordered by BM25 score descending.
-            Returns ``[]`` if no text index exists for *label*.
-
-        Note:
-            Text indexing is **not** automatic.  Before calling this method,
-            create a full-text index with the Cypher DDL statement::
+        Returns:
+            List of :class:`TextResult` ordered by BM25 score descending.
+            Labels with no searchable text content simply return ``[]``.
+
+        Note:
+            On current CoordiNode servers, schema-free graphs index text
+            automatically. Use :meth:`create_text_index` when you want explicit
+            DDL-managed index lifecycle/control::
Based on learnings: In `tests/integration/test_sdk.py`, FTS indexing in CoordiNode is automatic for schema-free graphs. Calling `create_label()` before `text_search()` or `hybrid_text_vector_search()` is NOT required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@coordinode/coordinode/client.py` around lines 689 - 715, The docstring for
text_search() is inconsistent about whether full-text indexing is automatic;
update the Args/Returns/Note to reflect actual behavior: for schema-free graphs
CoordiNode automatically indexes text properties so callers do not need to call
create_label() or create_text_index() before text_search() or
hybrid_text_vector_search(); keep a short note explaining that explicit CREATE
TEXT INDEX or create_text_index() is only required for non-schema-free
deployments or when custom index configuration is desired, and ensure references
to create_label(), create_text_index(), text_search(), and
hybrid_text_vector_search() are used so readers can find related APIs.
demo/notebooks/02_langchain_graph_chain.ipynb (1)

40-40: 🧹 Nitpick | 🔵 Trivial

Pin the LangChain versions behind this demo.

This notebook installs whatever the latest langchain* packages are, then imports GraphCypherQAChain from a specific module path. That surface moves often enough that a clean rerun can break without any repo change.

In the official LangChain Python docs, what is the current import path and constructor usage for `GraphCypherQAChain` (including whether `allow_dangerous_requests=True` is required), and which `langchain` package/version documents that API?

Also applies to: 265-265

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/02_langchain_graph_chain.ipynb` at line 40, The notebook
installs floating langchain packages which can break imports like
GraphCypherQAChain; update the pip install subprocess.run call that lists
"langchain", "langchain-coordinode", "langchain-community", and
"langchain-openai" to use explicit version pins (e.g. "langchain==X.Y.Z" etc.)
and lock any transitive langchain-related packages used by the demo, and then
verify and update the demo import/constructor for GraphCypherQAChain to match
the official LangChain docs (adjust the import path and whether
allow_dangerous_requests=True is required) before choosing the versions;
specifically modify the subprocess.run install arguments and the demo cells that
import/construct GraphCypherQAChain so they are consistent with the pinned
langchain package versions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@demo/notebooks/01_llama_index_property_graph.ipynb`:
- Line 160: The demo truncates the UUID when creating the run tag ("tag =
uuid.uuid4().hex[:6]"), which increases collision risk; change the assignment to
keep the full UUID (e.g., use uuid.uuid4().hex or str(uuid.uuid4())) so the
"tag" variable retains full uniqueness across runs and shared environments.

In `@demo/notebooks/02_langchain_graph_chain.ipynb`:
- Line 158: The tag currently uses a truncated value uuid.uuid4().hex[:6] which
reduces entropy and risks collisions; change the assignment of tag (where
uuid.uuid4().hex[:6] appears) to use the full UUID hex (e.g., uuid.uuid4().hex)
so run-scoped names remain unique across demo environments while leaving the
rest of the per-run namespacing logic unchanged.
- Around line 87-94: CoordinodeGraph's embedded adapter should not implement
get_labels() or get_edge_types() (or any schema RPC-like methods) because
returning [] falsely signals schema support; remove these two methods from the
embedded adapter (or change them so they are not discoverable via hasattr, e.g.,
raise AttributeError or NotImplementedError instead of returning an empty list)
so callers using hasattr/callable checks will fall back to
get_schema_text()/refresh_schema() as intended; ensure other optional methods
(get_schema_text, vector_search) remain untouched and continue to be checked via
hasattr where used.

In `@demo/notebooks/03_langgraph_agent.ipynb`:
- Line 124: The query_facts function currently only appends LIMIT 20 when no
LIMIT is present, leaving queries like "LIMIT 100000" to execute and rely on
rows[:20] after materialization; fix by capping any explicit LIMIT before
execution: update query_facts to replace any match of _LIMIT_RE
(r"\bLIMIT\s+\d+\b") with "LIMIT 20" (e.g., via re.sub) on q prior to calling
client.cypher, also ensure you strip trailing semicolons as already done; this
guarantees client.cypher never receives a larger LIMIT and avoids relying on
rows[:20] after large results are materialized.
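The `hasattr`/`callable` guard pattern the adapter comment relies on can be sketched with a minimal duck-typed example (the class and function names here are illustrative, not the actual notebook or adapter code): a caller that probes optional methods with `callable(getattr(...))` falls back to `get_schema_text()` only when the method is genuinely absent, which is exactly why a stub that returns `[]` short-circuits the fallback.

```python
class EmbeddedAdapter:
    """Illustrative stand-in for the notebook's embedded adapter (hypothetical name)."""

    def get_schema_text(self):
        return "(:Entity)-[:RELATES_TO]->(:Entity)"

    # Deliberately no get_labels()/get_edge_types(): defining them and returning []
    # would make the capability probe below succeed spuriously.


def describe_schema(adapter):
    # Guard pattern: prefer the structured schema API when the client exposes
    # it, otherwise fall back to schema text (hypothetical helper).
    get_labels = getattr(adapter, "get_labels", None)
    if callable(get_labels):
        return get_labels()
    return adapter.get_schema_text()


print(describe_schema(EmbeddedAdapter()))  # falls through to get_schema_text()
```

Because the probe uses attribute lookup, simply not defining the method is the cleanest way to make the fallback fire.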

---

Duplicate comments:
In `@coordinode/coordinode/client.py`:
- Around line 689-715: The docstring for text_search() is inconsistent about
whether full-text indexing is automatic; update the Args/Returns/Note to reflect
actual behavior: for schema-free graphs CoordiNode automatically indexes text
properties so callers do not need to call create_label() or create_text_index()
before text_search() or hybrid_text_vector_search(); keep a short note
explaining that explicit CREATE TEXT INDEX or create_text_index() is only
required for non-schema-free deployments or when custom index configuration is
desired, and ensure references to create_label(), create_text_index(),
text_search(), and hybrid_text_vector_search() are used so readers can find
related APIs.

In `@demo/notebooks/02_langchain_graph_chain.ipynb`:
- Line 40: The notebook installs floating langchain packages which can break
imports like GraphCypherQAChain; update the pip install subprocess.run call that
lists "langchain", "langchain-coordinode", "langchain-community", and
"langchain-openai" to use explicit version pins (e.g. "langchain==X.Y.Z" etc.)
and lock any transitive langchain-related packages used by the demo, and then
verify and update the demo import/constructor for GraphCypherQAChain to match
the official LangChain docs (adjust the import path and whether
allow_dangerous_requests=True is required) before choosing the versions;
specifically modify the subprocess.run install arguments and the demo cells that
import/construct GraphCypherQAChain so they are consistent with the pinned
langchain package versions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e49edf9c-9103-42f5-95dd-98e2f10335d1

📥 Commits

Reviewing files that changed from the base of the PR and between 7534b14 and 0181894.

📒 Files selected for processing (5)
  • coordinode/coordinode/client.py
  • demo/notebooks/01_llama_index_property_graph.ipynb
  • demo/notebooks/02_langchain_graph_chain.ipynb
  • demo/notebooks/03_langgraph_agent.ipynb
  • tests/integration/test_sdk.py

…from embedded adapter

- 01_llama, 02_langchain: use full uuid4().hex for `tag` variable
  (was [:6] = 24 bits entropy, risked cross-run collisions on shared envs)
- 01_llama, 02_langchain: remove get_labels()/get_edge_types() from
  _EmbeddedAdapter — returning [] made adapter look schema-RPC-capable,
  silently short-circuiting the get_schema_text() path in CoordinodeGraph
- 03_langgraph: replace explicit LIMIT N > 20 in query_facts via re.sub
  instead of only appending when absent; server no longer materializes
  oversized result sets from LLM-generated queries

fix(client): remove contradictory "indexed automatically" claim from
text_search() Args.label docstring; Note already says indexing requires
explicit CREATE TEXT INDEX DDL
@polaz
Member Author

polaz commented Apr 15, 2026

@coderabbitai Re: text_search() docstring (client.py:689-715) — the docstring was indeed internally inconsistent: Args.label said "indexed automatically", while Note said "not automatic". Fixed by removing the incorrect auto-indexing claim from Args.label; the Note is correct and unchanged (explicit CREATE TEXT INDEX DDL is required, as confirmed by integration tests in test_text_search_returns_results).

Re: LangChain version pinning (02_langchain.ipynb:40) — won't pin. These are demo notebooks intended to work with current LangChain releases. Pinning specific versions would require ongoing maintenance on every LangChain release and would break Colab installs as packages age out of PyPI availability windows.

@coderabbitai

coderabbitai bot commented Apr 15, 2026


@polaz Thanks for the follow-up on both threads.

Docstring fix — that's the right resolution. The Note correctly reflects the actual behavior (explicit CREATE TEXT INDEX DDL required, as the integration tests confirm), and removing the conflicting claim from Args.label makes the docstring internally consistent. ✅

LangChain version pinning — understood; tracking current releases is the right call for Colab-oriented demo notebooks where pinning quickly becomes a maintenance liability.


✏️ Learnings added
Learnt from: polaz
URL: https://github.com/structured-world/coordinode-python/pull/36

Timestamp: 2026-04-15T05:54:14.419Z
Learning: In `demo/notebooks/02_langchain_graph_chain.ipynb` (and other demo notebooks in `structured-world/coordinode-python`), LangChain dependency versions are intentionally NOT pinned. These notebooks are designed to work with current LangChain releases, and pinning would require ongoing maintenance on every release and could break Colab installs as packages age out of PyPI availability windows. This is a deliberate decision by maintainer polaz.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/coordinode-python PR: 0
File: :0-0
Timestamp: 2026-04-14T23:21:52.969Z
Learning: In `structured-world/coordinode-python`, the `coordinode-rs` submodule may be intentionally pinned to a post-release changelog commit (e.g., SHA `8dc198f`, the commit immediately following the v0.3.13 release commit) when the upstream coordinode-rs repository has not yet created the corresponding git tag. This is a deliberate interim pinning strategy by maintainer polaz; the submodule will be updated to the official release tag once it is created. Do not flag this as an issue requiring a change.

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 25
File: tests/integration/adapters/test_llama_index.py:159-180
Timestamp: 2026-04-12T09:55:48.630Z
Learning: In `tests/integration/adapters/test_llama_index.py`, the `vector_search` RPC in CoordiNode does not return node properties. Therefore, integration tests for `vector_query()` must assert against the internal CoordiNode node ID (captured from a `CREATE ... RETURN n` Cypher response) using `ChunkNode.id_`, rather than application-level property fields like `id`. This is the correct pattern for seeded-node identity assertions in vector query tests.

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 27
File: tests/integration/adapters/test_langchain.py:151-163
Timestamp: 2026-04-12T12:01:16.998Z
Learning: In `tests/integration/adapters/test_langchain.py`, `CoordinodeGraph.graph.query()` wraps `CoordinodeClient.cypher()` and returns raw dict values. A `CREATE (n:...) RETURN n` Cypher call yields the internal integer node ID directly (not a node object). Calling `.id_` on it would raise `AttributeError`. This is distinct from the LlamaIndex adapter pattern where `.id_` is used on node objects. For seeded-node identity assertions in `test_similarity_search_returns_results`, direct integer comparison with `r["id"]` from `similarity_search()` is correct.

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 36
File: langchain-coordinode/langchain_coordinode/graph.py:50-55
Timestamp: 2026-04-13T19:18:44.541Z
Learning: In `langchain-coordinode/langchain_coordinode/graph.py`, `CoordinodeGraph.__init__` accepts a `client: Any = None` parameter for dependency injection. Optional methods (`get_schema_text`, `vector_search`, `get_labels`, `get_edge_types`) are guarded with `hasattr` checks at each call site. Duck-typing is intentional — any object exposing `.cypher()` is valid (e.g. notebook `_EmbeddedAdapter`). No `isinstance` or Protocol checks are used because there is no shared base class. This is a deliberate design decision by maintainer polaz.

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 29
File: tests/integration/test_sdk.py:229-237
Timestamp: 2026-04-12T20:49:08.132Z
Learning: In the `structured-world/coordinode-python` repository, CoordiNode is schema-free. `CoordinodeClient.get_labels()` may return a `LabelInfo` with an empty `properties` list even when the corresponding nodes carry actual properties at runtime. Integration tests must NOT assert that specific property names appear in `LabelInfo.properties`; asserting `isinstance(found.properties, list)` is the correct and complete check.

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 15
File: tests/integration/adapters/test_langchain.py:22-25
Timestamp: 2026-04-09T17:03:28.232Z
Learning: In `tests/integration/adapters/test_langchain.py`, the `graph` fixture uses `scope="module"` intentionally to avoid per-test gRPC connection overhead with `CoordinodeGraph`. Test isolation is achieved via the `unique_tag` fixture, which generates a UUID-based suffix per test invocation to namespace all writes. This is a deliberate design decision by the maintainer (polaz).

Learnt from: polaz
Repo: structured-world/coordinode-python PR: 36
File: tests/integration/test_sdk.py:568-639
Timestamp: 2026-04-13T22:25:11.413Z
Learning: In `tests/integration/test_sdk.py`, FTS (full-text search) indexing in CoordiNode is automatic for schema-free graphs. Calling `create_label()` before `text_search()` or `hybrid_text_vector_search()` is NOT required. The FTS integration tests (`test_text_search_returns_results`, `test_text_search_fuzzy`, `test_hybrid_text_vector_search_returns_results`) intentionally exercise the schema-free path where nodes are created via plain Cypher without schema registration, and the server automatically indexes text properties. The schema-strict path (with `create_label()`) is covered separately by `test_create_label_*` tests.

@sonarqubecloud

Quality Gate failed

Failed conditions
Security Rating on New Code: D (required ≥ A)

See analysis details on SonarQube Cloud



Copilot AI left a comment


Pull request overview

Copilot reviewed 19 out of 21 changed files in this pull request and generated 4 comments.



Comment on lines +80 to +99
"i",
"m",
"p",
"o",
"r",
"t",
" ",
"o",
"s",
",",
" ",
"s",
"o",
"c",
"k",
"e",
"t",
"\n",
"\n",
"\n",

Copilot AI Apr 15, 2026


This code cell’s source is serialized as one character per list element (e.g., "i", "m", "p", ...), which will run as invalid Python in Jupyter. Re-save/re-export the notebook so source is a single string or a list of full lines (each typically ending with \n).

Suggested change
"i",
"m",
"p",
"o",
"r",
"t",
" ",
"o",
"s",
",",
" ",
"s",
"o",
"c",
"k",
"e",
"t",
"\n",
"\n",
"\n",
"import os, socket\n",
"\n",
"\n",

Comment on lines +1497 to +1518
"source": [
"i",
"f",
" ",
"n",
"o",
"t",
" ",
"o",
"s",
".",
"e",
"n",
"v",
"i",
"r",
"o",
"n",
".",
"g",
"e",
"t",

Copilot AI Apr 15, 2026


This code cell’s source is also serialized character-by-character, which will fail to execute as Python. Re-export the notebook so the cell source is stored as a single string or a list of full code lines.

Suggested change
"source": [
"i",
"f",
" ",
"n",
"o",
"t",
" ",
"o",
"s",
".",
"e",
"n",
"v",
"i",
"r",
"o",
"n",
".",
"g",
"e",
"t",
"source": "if not os.environ.get(\"OPENAI_API_KEY\"):\n print(\"Skipping GraphCypherQAChain example because OPENAI_API_KEY is not set.\")\nelse:\n from langchain.chains import GraphCypherQAChain\n from langchain_openai import ChatOpenAI\n\n chain = GraphCypherQAChain.from_llm(\n ChatOpenAI(temperature=0),\n graph=graph,\n verbose=False,\n allow_dangerous_requests=True,\n )\n\n result = chain.invoke({\"query\": f\"Which nodes are connected to {tag}?\"})\n print(result)\n"
"t",

Comment on lines +83 to +102
"i",
"m",
"p",
"o",
"r",
"t",
" ",
"o",
"s",
",",
" ",
"s",
"o",
"c",
"k",
"e",
"t",
"\n",
"\n",
"\n",

Copilot AI Apr 15, 2026


This code cell’s source is serialized one character per list element, which will execute as invalid Python in Jupyter. Please re-save/re-export the notebook so the source is a single string or a list of full lines (with \n).

Suggested change
"i",
"m",
"p",
"o",
"r",
"t",
" ",
"o",
"s",
",",
" ",
"s",
"o",
"c",
"k",
"e",
"t",
"\n",
"\n",
"\n",
"import os, socket\n",
"\n",
"\n",

"id": "a1b2c3d4-0000-0000-0000-000000000007",
"metadata": {},
"outputs": [],
"source": "import uuid\n\nDEMO_TAG = os.environ.get(\"COORDINODE_DEMO_TAG\") or f\"seed_data_{uuid.uuid4().hex[:8]}\"\nprint(\"Using DEMO_TAG:\", DEMO_TAG)\n# Delete relationships first, then nodes — two-step ensures correct cleanup across\n# all CoordiNode versions (embedded and server).\nclient.cypher(\n \"MATCH (n {demo: true, demo_tag: $tag})-[r]-() DELETE r\",\n params={\"tag\": DEMO_TAG},\n)\nclient.cypher(\n \"MATCH (n {demo: true, demo_tag: $tag}) DELETE n\",\n params={\"tag\": DEMO_TAG},\n)\nprint(\"Previous demo data removed\")"

Copilot AI Apr 15, 2026


The cleanup query uses an undirected pattern (-[r]-()) to delete relationships before deleting nodes. In CoordiNode this can match the same relationship from both endpoints and cause duplicate-delete/cleanup failures (this PR already switched other cleanups to DETACH DELETE for that reason). Consider replacing both statements with a single MATCH (n {demo: true, demo_tag: $tag}) DETACH DELETE n, or at least WITH DISTINCT r DELETE r to de-duplicate relationships.

Suggested change
"source": "import uuid\n\nDEMO_TAG = os.environ.get(\"COORDINODE_DEMO_TAG\") or f\"seed_data_{uuid.uuid4().hex[:8]}\"\nprint(\"Using DEMO_TAG:\", DEMO_TAG)\n# Delete relationships first, then nodes — two-step ensures correct cleanup across\n# all CoordiNode versions (embedded and server).\nclient.cypher(\n \"MATCH (n {demo: true, demo_tag: $tag})-[r]-() DELETE r\",\n params={\"tag\": DEMO_TAG},\n)\nclient.cypher(\n \"MATCH (n {demo: true, demo_tag: $tag}) DELETE n\",\n params={\"tag\": DEMO_TAG},\n)\nprint(\"Previous demo data removed\")"
"source": "import uuid\n\nDEMO_TAG = os.environ.get(\"COORDINODE_DEMO_TAG\") or f\"seed_data_{uuid.uuid4().hex[:8]}\"\nprint(\"Using DEMO_TAG:\", DEMO_TAG)\n# Remove prior demo nodes and any attached relationships in one step to avoid\n# duplicate relationship matches during cleanup.\nclient.cypher(\n \"MATCH (n {demo: true, demo_tag: $tag}) DETACH DELETE n\",\n params={\"tag\": DEMO_TAG},\n)\nprint(\"Previous demo data removed\")"

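Why the undirected `-[r]-()` pattern double-matches can be shown with a small pure-Python simulation (no real Cypher engine involved; `edges` and `undirected_matches` are made-up names for illustration): an undirected pattern is expanded from both endpoints, so every relationship whose two endpoints both match appears twice unless de-duplicated.

```python
# Simulate undirected relationship matching over a tiny in-memory edge list.
edges = [("a", "b"), ("b", "c")]  # each tuple stands for one stored relationship


def undirected_matches(nodes, edges):
    # Every (matched node, incident edge) pair is one result row, so an edge
    # whose endpoints are both in `nodes` is reported twice — once per side.
    return [e for n in sorted(nodes) for e in edges if n in e]


rows = undirected_matches({"a", "b", "c"}, edges)
assert len(rows) == 4          # two rows per relationship
deduped = set(rows)            # the `WITH DISTINCT r` equivalent
assert len(deduped) == 2       # one row per relationship, safe to delete
```

In the notebook itself, the single-statement `DETACH DELETE n` form sidesteps the duplicate-match problem entirely.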

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@demo/notebooks/01_llama_index_property_graph.ipynb`:
- Around line 272-585: The startup eagerly converts COORDINODE_PORT into
GRPC_PORT using int(...) before choosing the connection path, so a malformed
port will crash even when COORDINODE_ADDR is present; change the logic in the
module that defines GRPC_PORT (the int(os.environ.get("COORDINODE_PORT",
"7080")) expression) to delay or guard parsing: read the raw env string (e.g.,
coord_port_str = os.environ.get("COORDINODE_PORT", "7080")) and only call
int(...) when needed for the branch that probes the port (_port_open or
similar), or wrap the int(...) in a try/except and fall back to the default 7080
so a bad COORDINODE_PORT does not abort startup when COORDINODE_ADDR is used.
Ensure references to GRPC_PORT/_port_open and COORDINODE_ADDR remain consistent
after the change.

In `@demo/notebooks/02_langchain_graph_chain.ipynb`:
- Around line 269-582: The code eagerly parses COORDINODE_PORT into GRPC_PORT
with int(...) before checking COORDINODE_ADDR, causing startup to fail on
malformed COORDINODE_PORT; change GRPC_PORT to keep the raw string (e.g.,
GRPC_PORT = os.environ.get("COORDINODE_PORT", "7080")) or set it to None and
only convert to int when actually needed (wrap conversion in try/except at the
point _port_open or the branch that uses it), and ensure the branch that uses
COORDINODE_ADDR (and the CoordinodeClient(...) construction) does not trigger
int(...) on a bad value.

In `@demo/notebooks/03_langgraph_agent.ipynb`:
- Line 124: query_facts currently unconditionally replaces any explicit "LIMIT
<n>" with "LIMIT 20", breaking caller intent (e.g. LIMIT 1 -> 20); update the
logic in query_facts that uses _LIMIT_RE so it only lowers a numeric limit
when the captured n > 20 and otherwise preserves the original LIMIT; implement
by matching the LIMIT number via _LIMIT_RE (capture the digits), parse to int,
and if >20 substitute "LIMIT 20" (or leave the original match untouched), and if
no LIMIT exists append " LIMIT 20" as before; reference symbols: function
query_facts and regex _LIMIT_RE.

In `@tests/integration/test_sdk.py`:
- Around line 613-637: Wrap the index creation and test actions in a try/finally
so cleanup always runs even if create_text_index or later steps raise;
specifically, move the call to client.create_text_index(idx_name, label, "body")
inside the try, and in the finally first attempt to drop_text_index(idx_name)
inside its own try/except (so a failing drop doesn't prevent node deletion),
then always run client.cypher(... DELETE ...) to remove the seeded node; apply
the same pattern to the fuzzy and hybrid tests that call create_text_index /
drop_text_index.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 489df211-a20a-46f0-b061-6e6becd47698

📥 Commits

Reviewing files that changed from the base of the PR and between 7534b14 and d75026e.

📒 Files selected for processing (5)
  • coordinode/coordinode/client.py
  • demo/notebooks/01_llama_index_property_graph.ipynb
  • demo/notebooks/02_langchain_graph_chain.ipynb
  • demo/notebooks/03_langgraph_agent.ipynb
  • tests/integration/test_sdk.py

Comment on lines +272 to +585
"G",
"R",
"P",
"C",
"_",
"P",
"O",
"R",
"T",
" ",
"=",
" ",
"i",
"n",
"t",
"(",
"o",
"s",
".",
"e",
"n",
"v",
"i",
"r",
"o",
"n",
".",
"g",
"e",
"t",
"(",
"\"",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"P",
"O",
"R",
"T",
"\"",
",",
" ",
"\"",
"7",
"0",
"8",
"0",
"\"",
")",
")",
"\n",
"\n",
"i",
"f",
" ",
"o",
"s",
".",
"e",
"n",
"v",
"i",
"r",
"o",
"n",
".",
"g",
"e",
"t",
"(",
"\"",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"A",
"D",
"D",
"R",
"\"",
")",
":",
"\n",
" ",
" ",
" ",
" ",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"A",
"D",
"D",
"R",
" ",
"=",
" ",
"o",
"s",
".",
"e",
"n",
"v",
"i",
"r",
"o",
"n",
"[",
"\"",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"A",
"D",
"D",
"R",
"\"",
"]",
"\n",
" ",
" ",
" ",
" ",
"f",
"r",
"o",
"m",
" ",
"c",
"o",
"o",
"r",
"d",
"i",
"n",
"o",
"d",
"e",
" ",
"i",
"m",
"p",
"o",
"r",
"t",
" ",
"C",
"o",
"o",
"r",
"d",
"i",
"n",
"o",
"d",
"e",
"C",
"l",
"i",
"e",
"n",
"t",
"\n",
"\n",
" ",
" ",
" ",
" ",
"c",
"l",
"i",
"e",
"n",
"t",
" ",
"=",
" ",
"C",
"o",
"o",
"r",
"d",
"i",
"n",
"o",
"d",
"e",
"C",
"l",
"i",
"e",
"n",
"t",
"(",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"A",
"D",
"D",
"R",
")",
"\n",
" ",
" ",
" ",
" ",
"p",
"r",
"i",
"n",
"t",
"(",
"f",
"\"",
"C",
"o",
"n",
"n",
"e",
"c",
"t",
"e",
"d",
" ",
"t",
"o",
" ",
"{",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"A",
"D",
"D",
"R",
"}",
"\"",
")",
"\n",
"e",
"l",
"i",
"f",
" ",
"_",
"p",
"o",
"r",
"t",
"_",
"o",
"p",
"e",
"n",
"(",
"G",
"R",
"P",
"C",
"_",
"P",
"O",
"R",
"T",
")",
":",


⚠️ Potential issue | 🟠 Major

Avoid eager COORDINODE_PORT parsing before connection-path selection.

int(os.environ.get("COORDINODE_PORT", "7080")) is evaluated before the COORDINODE_ADDR branch. A malformed port value will crash startup even when Line 332 has a valid COORDINODE_ADDR.

Suggested fix
-GRPC_PORT = int(os.environ.get("COORDINODE_PORT", "7080"))
-
-if os.environ.get("COORDINODE_ADDR"):
+if os.environ.get("COORDINODE_ADDR"):
     COORDINODE_ADDR = os.environ["COORDINODE_ADDR"]
     from coordinode import CoordinodeClient
     client = CoordinodeClient(COORDINODE_ADDR)
     print(f"Connected to {COORDINODE_ADDR}")
-elif _port_open(GRPC_PORT):
-    COORDINODE_ADDR = f"localhost:{GRPC_PORT}"
+else:
+    try:
+        grpc_port = int(os.environ.get("COORDINODE_PORT", "7080"))
+    except ValueError as exc:
+        raise RuntimeError("COORDINODE_PORT must be an integer") from exc
+
+    if _port_open(grpc_port):
+        COORDINODE_ADDR = f"localhost:{grpc_port}"
+        from coordinode import CoordinodeClient
+        client = CoordinodeClient(COORDINODE_ADDR)
+        print(f"Connected to {COORDINODE_ADDR}")
+    else:
+        # embedded fallback...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/01_llama_index_property_graph.ipynb` around lines 272 - 585,
The startup eagerly converts COORDINODE_PORT into GRPC_PORT using int(...)
before choosing the connection path, so a malformed port will crash even when
COORDINODE_ADDR is present; change the logic in the module that defines
GRPC_PORT (the int(os.environ.get("COORDINODE_PORT", "7080")) expression) to
delay or guard parsing: read the raw env string (e.g., coord_port_str =
os.environ.get("COORDINODE_PORT", "7080")) and only call int(...) when needed
for the branch that probes the port (_port_open or similar), or wrap the
int(...) in a try/except and fall back to the default 7080 so a bad
COORDINODE_PORT does not abort startup when COORDINODE_ADDR is used. Ensure
references to GRPC_PORT/_port_open and COORDINODE_ADDR remain consistent after
the change.
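The deferred parsing the comment describes can be sketched as a small helper (`resolve_grpc_port` is a hypothetical name — the notebook inlines this logic): the raw environment string is only converted with `int(...)` when the port-probe branch actually needs it, and a malformed value falls back to the default instead of aborting startup.

```python
import os


def resolve_grpc_port(default=7080):
    """Parse COORDINODE_PORT lazily; fall back to `default` on a malformed
    value instead of crashing before the COORDINODE_ADDR branch is chosen
    (illustrative helper, not the actual notebook code)."""
    raw = os.environ.get("COORDINODE_PORT", str(default))
    try:
        return int(raw)
    except ValueError:
        return default


os.environ["COORDINODE_PORT"] = "not-a-port"
assert resolve_grpc_port() == 7080   # bad value no longer aborts startup
os.environ["COORDINODE_PORT"] = "7090"
assert resolve_grpc_port() == 7090
```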

Comment on lines +269 to +582
"G",
"R",
"P",
"C",
"_",
"P",
"O",
"R",
"T",
" ",
"=",
" ",
"i",
"n",
"t",
"(",
"o",
"s",
".",
"e",
"n",
"v",
"i",
"r",
"o",
"n",
".",
"g",
"e",
"t",
"(",
"\"",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"P",
"O",
"R",
"T",
"\"",
",",
" ",
"\"",
"7",
"0",
"8",
"0",
"\"",
")",
")",
"\n",
"\n",
"i",
"f",
" ",
"o",
"s",
".",
"e",
"n",
"v",
"i",
"r",
"o",
"n",
".",
"g",
"e",
"t",
"(",
"\"",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"A",
"D",
"D",
"R",
"\"",
")",
":",
"\n",
" ",
" ",
" ",
" ",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"A",
"D",
"D",
"R",
" ",
"=",
" ",
"o",
"s",
".",
"e",
"n",
"v",
"i",
"r",
"o",
"n",
"[",
"\"",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"A",
"D",
"D",
"R",
"\"",
"]",
"\n",
" ",
" ",
" ",
" ",
"f",
"r",
"o",
"m",
" ",
"c",
"o",
"o",
"r",
"d",
"i",
"n",
"o",
"d",
"e",
" ",
"i",
"m",
"p",
"o",
"r",
"t",
" ",
"C",
"o",
"o",
"r",
"d",
"i",
"n",
"o",
"d",
"e",
"C",
"l",
"i",
"e",
"n",
"t",
"\n",
"\n",
" ",
" ",
" ",
" ",
"c",
"l",
"i",
"e",
"n",
"t",
" ",
"=",
" ",
"C",
"o",
"o",
"r",
"d",
"i",
"n",
"o",
"d",
"e",
"C",
"l",
"i",
"e",
"n",
"t",
"(",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"A",
"D",
"D",
"R",
")",
"\n",
" ",
" ",
" ",
" ",
"p",
"r",
"i",
"n",
"t",
"(",
"f",
"\"",
"C",
"o",
"n",
"n",
"e",
"c",
"t",
"e",
"d",
" ",
"t",
"o",
" ",
"{",
"C",
"O",
"O",
"R",
"D",
"I",
"N",
"O",
"D",
"E",
"_",
"A",
"D",
"D",
"R",
"}",
"\"",
")",
"\n",
"e",
"l",
"i",
"f",
" ",
"_",
"p",
"o",
"r",
"t",
"_",
"o",
"p",
"e",
"n",
"(",
"G",
"R",
"P",
"C",
"_",
"P",
"O",
"R",
"T",
")",
":",


⚠️ Potential issue | 🟠 Major

Avoid eager COORDINODE_PORT parsing before branch on COORDINODE_ADDR.

Line 269 is evaluated before Line 329. If COORDINODE_PORT is malformed, notebook startup fails even when COORDINODE_ADDR is valid and should be used directly.

Suggested fix
-GRPC_PORT = int(os.environ.get("COORDINODE_PORT", "7080"))
-
-if os.environ.get("COORDINODE_ADDR"):
+if os.environ.get("COORDINODE_ADDR"):
     COORDINODE_ADDR = os.environ["COORDINODE_ADDR"]
     from coordinode import CoordinodeClient
     client = CoordinodeClient(COORDINODE_ADDR)
     print(f"Connected to {COORDINODE_ADDR}")
-elif _port_open(GRPC_PORT):
-    COORDINODE_ADDR = f"localhost:{GRPC_PORT}"
+else:
+    try:
+        grpc_port = int(os.environ.get("COORDINODE_PORT", "7080"))
+    except ValueError as exc:
+        raise RuntimeError("COORDINODE_PORT must be an integer") from exc
+
+    if _port_open(grpc_port):
+        COORDINODE_ADDR = f"localhost:{grpc_port}"
+        from coordinode import CoordinodeClient
+        client = CoordinodeClient(COORDINODE_ADDR)
+        print(f"Connected to {COORDINODE_ADDR}")
+    else:
+        # embedded fallback...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/02_langchain_graph_chain.ipynb` around lines 269 - 582, The
code eagerly parses COORDINODE_PORT into GRPC_PORT with int(...) before checking
COORDINODE_ADDR, causing startup to fail on malformed COORDINODE_PORT; change
GRPC_PORT to keep the raw string (e.g., GRPC_PORT =
os.environ.get("COORDINODE_PORT", "7080")) or set it to None and only convert to
int when actually needed (wrap conversion in try/except at the point _port_open
or the branch that uses it), and ensure the branch that uses COORDINODE_ADDR
(and the CoordinodeClient(...) construction) does not trigger int(...) on a bad
value.

"metadata": {},
"outputs": [],
"source": [
"import os, re, uuid\nfrom langchain_core.tools import tool\n\nSESSION = uuid.uuid4().hex # isolates this demo's data from other sessions\n\n_REL_TYPE_RE = re.compile(r\"[A-Z_][A-Z0-9_]*\")\n# Regex guards for query_facts (demo safety guard).\n_WRITE_CLAUSE_RE = re.compile(\n r\"\\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|CALL|LOAD)\\b\",\n re.IGNORECASE | re.DOTALL,\n)\n# NOTE: this guard checks that AT LEAST ONE node pattern carries session scope.\n# A Cartesian-product query such as `MATCH (n), (m {session: $sess}) RETURN n`\n# would pass yet return unscoped rows for `n`. A complete per-alias check would\n# require parsing the Cypher AST, which is out of scope for a demo safety guard.\n# In production code, use server-side row-level security instead of client regex.\n_SESSION_WHERE_SCOPE_RE = re.compile(\n r\"WHERE\\b[^;{}]*\\.session\\s*=\\s*\\$sess\",\n re.IGNORECASE | re.DOTALL,\n)\n_SESSION_NODE_SCOPE_RE = re.compile(\n r\"\\([^)]*\\{[^}]*session\\s*:\\s*\\$sess[^}]*\\}[^)]*\\)\",\n re.IGNORECASE | re.DOTALL,\n)\n\n\n@tool\ndef save_fact(subject: str, relation: str, obj: str) -> str:\n \"\"\"Save a fact (subject → relation → object) into the knowledge graph.\n Example: save_fact('Alice', 'WORKS_AT', 'Acme Corp')\"\"\"\n rel_type = relation.upper().replace(\" \", \"_\")\n # Validate rel_type before interpolating into Cypher to prevent injection.\n if not _REL_TYPE_RE.fullmatch(rel_type):\n return f\"Invalid relation type {relation!r}: only letters, digits, and underscores allowed\"\n client.cypher(\n f\"MERGE (a:Entity {{name: $s, session: $sess}}) \"\n f\"MERGE (b:Entity {{name: $o, session: $sess}}) \"\n f\"MERGE (a)-[r:{rel_type}]->(b)\",\n params={\"s\": subject, \"o\": obj, \"sess\": SESSION},\n )\n return f\"Saved: {subject} -[{rel_type}]-> {obj}\"\n\n\n@tool\ndef query_facts(cypher: str) -> str:\n \"\"\"Run a read-only Cypher MATCH query against the knowledge graph.\n Must scope reads via either WHERE <alias>.session = $sess\n or a node pattern 
{session: $sess}.\"\"\"\n q = cypher.strip()\n if _WRITE_CLAUSE_RE.search(q):\n return \"Only read-only Cypher is allowed in query_facts.\"\n # Require $sess in a WHERE clause or node pattern, not just anywhere.\n # Accepts both: WHERE n.session = $sess and MATCH (n {session: $sess})\n if not (_SESSION_WHERE_SCOPE_RE.search(q) or _SESSION_NODE_SCOPE_RE.search(q)):\n return \"Query must scope reads to the current session with either WHERE <alias>.session = $sess or {session: $sess}\"\n # Enforce a result cap so agents cannot dump the entire graph.\n # Cap or append LIMIT 20 before execution so the server never materializes\n # more than 20 rows, even when the LLM writes an explicit large LIMIT.\n _LIMIT_RE = re.compile(r\"\\bLIMIT\\s+\\d+\\b\", re.IGNORECASE)\n if _LIMIT_RE.search(q):\n q = _LIMIT_RE.sub(\"LIMIT 20\", q)\n else:\n q = q.rstrip().rstrip(\";\") + \" LIMIT 20\"\n rows = client.cypher(q, params={\"sess\": SESSION})\n return str(rows) if rows else \"No results\"\n\n\n@tool\ndef find_related(entity_name: str, depth: int = 1) -> str:\n \"\"\"Find all entities reachable from entity_name within the given number of hops (max 3).\"\"\"\n safe_depth = max(1, min(int(depth), 3))\n rows = client.cypher(\n f\"MATCH p=(n:Entity {{name: $name, session: $sess}})-[*1..{safe_depth}]->(m:Entity {{session: $sess}}) \"\n \"WHERE ALL(x IN nodes(p) WHERE x.session = $sess) \"\n \"RETURN m.name AS related, type(last(relationships(p))) AS via LIMIT 20\",\n params={\"name\": entity_name, \"sess\": SESSION},\n )\n if not rows:\n return f\"No related entities found for {entity_name}\"\n return \"\\n\".join(f\"{r['via']} -> {r['related']}\" for r in rows)\n\n\n@tool\ndef list_all_facts() -> str:\n \"\"\"List every fact stored in the current session's knowledge graph.\"\"\"\n rows = client.cypher(\n \"MATCH (a:Entity {session: $sess})-[r]->(b:Entity {session: $sess}) \"\n \"RETURN a.name AS subject, type(r) AS relation, b.name AS object\",\n params={\"sess\": SESSION},\n )\n if 
not rows:\n return \"No facts stored yet\"\n return \"\\n\".join(f\"{r['subject']} -[{r['relation']}]-> {r['object']}\" for r in rows)\n\n\ntools = [save_fact, query_facts, find_related, list_all_facts]\nprint(f\"Session: {SESSION}\")\nprint(\"Tools:\", [t.name for t in tools])"
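The relation-type guard that save_fact applies before interpolating into Cypher can be exercised in isolation (`normalize_rel_type` is an illustrative wrapper, not part of the notebook; the regex matches the one quoted above):

```python
import re

# Uppercase identifier or reject — the precondition for safe interpolation
# of a relationship type into a Cypher MERGE.
_REL_TYPE_RE = re.compile(r"[A-Z_][A-Z0-9_]*")


def normalize_rel_type(relation: str):
    """Return the normalized relationship type, or None if it would be unsafe
    to interpolate (hypothetical helper mirroring save_fact's check)."""
    rel_type = relation.upper().replace(" ", "_")
    return rel_type if _REL_TYPE_RE.fullmatch(rel_type) else None


assert normalize_rel_type("works at") == "WORKS_AT"
assert normalize_rel_type("r]->(x) DELETE x //") is None  # injection attempt rejected
```

Because parameters cannot stand in for relationship types in Cypher, this allow-list check is the only thing standing between LLM output and the query text.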


⚠️ Potential issue | 🟡 Minor

query_facts() currently expands smaller explicit LIMITs to 20.

At Line 124, replacing all LIMIT <n> with LIMIT 20 changes caller intent (e.g., LIMIT 1 becomes 20). Cap only when n > 20, and preserve smaller limits.

🔧 Proposed fix
-    _LIMIT_RE = re.compile(r"\bLIMIT\s+\d+\b", re.IGNORECASE)
-    if _LIMIT_RE.search(q):
-        q = _LIMIT_RE.sub("LIMIT 20", q)
+    _LIMIT_RE = re.compile(r"\bLIMIT\s+(\d+)\b", re.IGNORECASE)
+    def _cap_limit(m):
+        return f"LIMIT {min(int(m.group(1)), 20)}"
+    if _LIMIT_RE.search(q):
+        q = _LIMIT_RE.sub(_cap_limit, q)
     else:
         q = q.rstrip().rstrip(";") + " LIMIT 20"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/notebooks/03_langgraph_agent.ipynb` at line 124, query_facts currently
unconditionally replaces any explicit "LIMIT <n>" with "LIMIT 20", breaking
caller intent (e.g. LIMIT 1 -> 20); update the logic in query_facts that uses
_LIMIT_RE so it only increases a numeric limit when the captured n > 20 and
otherwise preserves the original LIMIT; implement by matching the LIMIT number
via _LIMIT_RE (capture the digits), parse to int, and if >20 substitute "LIMIT
20" (or leave the original match untouched), and if no LIMIT exists append "
LIMIT 20" as before; reference symbols: function query_facts and regex
_LIMIT_RE.
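The proposed fix can also be packaged as a small standalone helper. A sketch — `cap_limit` and the 20-row default are illustrative names, not part of the SDK:

```python
import re

_LIMIT_RE = re.compile(r"\bLIMIT\s+(\d+)\b", re.IGNORECASE)


def cap_limit(cypher: str, max_rows: int = 20) -> str:
    """Clamp an explicit LIMIT to max_rows; append one when absent."""
    q = cypher.strip().rstrip(";")

    def _clamp(m: re.Match) -> str:
        # Preserve limits at or below the cap; shrink only larger ones.
        return f"LIMIT {min(int(m.group(1)), max_rows)}"

    if _LIMIT_RE.search(q):
        return _LIMIT_RE.sub(_clamp, q)
    return q + f" LIMIT {max_rows}"
```

`cap_limit("MATCH (n) RETURN n LIMIT 1")` keeps `LIMIT 1`, `LIMIT 500` is rewritten to `LIMIT 20`, and a query with no LIMIT gets one appended.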

Comment on lines +613 to +637
    rows = client.cypher(
        f"CREATE (n:{label} {{tag: $tag, body: 'machine learning and neural networks'}}) RETURN id(n) AS node_id",
        params={"tag": tag},
    )
    seed_id = rows[0]["node_id"]
    # The text index must be created explicitly; nodes written before index
    # creation are indexed immediately at DDL time.
    idx_info = client.create_text_index(idx_name, label, "body")
    assert isinstance(idx_info, TextIndexInfo)
    try:
        results = client.text_search(label, "machine learning", limit=5)
        assert isinstance(results, list)
        assert results, "text_search returned no results after index creation"
        assert any(r.node_id == seed_id for r in results), (
            f"seeded node {seed_id} not found in text_search results: {results}"
        )
        r = results[0]
        assert isinstance(r, TextResult)
        assert isinstance(r.node_id, int)
        assert isinstance(r.score, float)
        assert r.score > 0
        assert isinstance(r.snippet, str)
    finally:
        client.drop_text_index(idx_name)
        client.cypher(f"MATCH (n:{label} {{tag: $tag}}) DELETE n", params={"tag": tag})

⚠️ Potential issue | 🟡 Minor

Make the FTS cleanup unconditional.

create_text_index() runs before the try, so any exception there skips cleanup entirely. drop_text_index() also runs before node deletion, so a failed drop leaves the seeded node behind. On shared/reused test servers, this leaks data across runs.

🧹 Safer cleanup pattern
-    client.create_text_index(idx_name, label, "body")
-    try:
+    idx_created = False
+    try:
+        client.create_text_index(idx_name, label, "body")
+        idx_created = True
         results = client.text_search(label, "machine learning", limit=5)
         ...
     finally:
-        client.drop_text_index(idx_name)
-        client.cypher(f"MATCH (n:{label} {{tag: $tag}}) DELETE n", params={"tag": tag})
+        try:
+            if idx_created:
+                client.drop_text_index(idx_name)
+        finally:
+            client.cypher(f"MATCH (n:{label} {{tag: $tag}}) DELETE n", params={"tag": tag})

Apply the same pattern to the fuzzy and hybrid tests.

Also applies to: 673-685, 696-722

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_sdk.py` around lines 613-637, wrap the index
creation and test actions in a try/finally so cleanup always runs even if
create_text_index or later steps raise; specifically, move the call to
client.create_text_index(idx_name, label, "body") inside the try, and in the
finally first attempt to drop_text_index(idx_name) inside its own try/except (so
a failing drop doesn't prevent node deletion), then always run client.cypher(...
DELETE ...) to remove the seeded node; apply the same pattern to the fuzzy and
hybrid tests that call create_text_index / drop_text_index.
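The suggested try/finally nesting can also be factored into a reusable context manager shared by the exact, fuzzy, and hybrid tests. A sketch with hypothetical names — `temp_text_index` is not an SDK API:

```python
from contextlib import contextmanager


@contextmanager
def temp_text_index(client, idx_name: str, label: str, prop: str, tag: str):
    """Create a text index for the duration of the block and always clean up,
    even when index creation, the test body, or the drop itself raises."""
    idx_created = False
    try:
        client.create_text_index(idx_name, label, prop)
        idx_created = True
        yield
    finally:
        try:
            if idx_created:
                client.drop_text_index(idx_name)
        finally:
            # Seeded-node deletion runs even when drop_text_index fails.
            client.cypher(
                f"MATCH (n:{label} {{tag: $tag}}) DELETE n", params={"tag": tag}
            )
```

Each test body then becomes `with temp_text_index(client, idx_name, label, "body", tag): ...`, and the nested finally guarantees the node cleanup runs even if the drop raises.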
