A command-line interface for querying the SPOKE (Scalable Precision medicine Open Knowledge Engine) biomedical knowledge graph via Neo4j/Cypher.
Author: Wanjun Gu · wanjun.gu@ucsf.edu
SPOKE is a large-scale biomedical knowledge graph developed at UCSF that integrates data from dozens of public databases — connecting diseases, genes, proteins, compounds, pathways, symptoms, variants, anatomy, and more into a unified graph. spoke-cli provides a simple terminal interface to run read-only Cypher queries against SPOKE and export results as JSON or CSV.
- Rust (edition 2024, via rustup)
- Network access to the SPOKE Neo4j instance
git clone https://github.com/BaranziniLab/spoke-cli
cd spoke-cli
cargo build --release

The compiled binary will be at target/release/spoke-cli. Optionally move it onto your PATH:
cp target/release/spoke-cli /usr/local/bin/spoke-cli

Credentials are loaded from a .env file in the current working directory:
KNOWLEDGE_GRAPH_URI=bolt://<host>:<port>
KNOWLEDGE_GRAPH_USERNAME=<username>
KNOWLEDGE_GRAPH_PASSWORD=<password>
KNOWLEDGE_GRAPH_DATABASE=<database>

| Variable | Description | Example |
|---|---|---|
| KNOWLEDGE_GRAPH_URI | Bolt URI to the Neo4j instance | bolt://<host>:<port> |
| KNOWLEDGE_GRAPH_USERNAME | Neo4j username | <username> |
| KNOWLEDGE_GRAPH_PASSWORD | Neo4j password | <password> |
| KNOWLEDGE_GRAPH_DATABASE | Target database name | <database> |
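
To make the .env format concrete, here is a minimal standard-library sketch of reading the four variables into a struct. It is not spoke-cli's code: the tool loads the file with dotenvy, and the names `GraphConfig`, `parse_env`, and `load_config` are hypothetical. The parser is deliberately simplified and assumes plain `KEY=VALUE` lines with no quoting or variable expansion; the sample values are placeholders.

```rust
use std::collections::HashMap;

/// Connection settings, one field per variable in the table above.
/// (Hypothetical type; not taken from the spoke-cli source.)
#[derive(Debug, PartialEq)]
struct GraphConfig {
    uri: String,
    username: String,
    password: String,
    database: String,
}

/// Parse `KEY=VALUE` lines, skipping blank lines and `#` comments.
/// Simplified on purpose: no quoting, escaping, or variable expansion.
fn parse_env(text: &str) -> HashMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        // Split at the first '=' only, so values may contain '='.
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Build a config, reporting the first missing variable by name.
fn load_config(text: &str) -> Result<GraphConfig, String> {
    let vars = parse_env(text);
    let get = |key: &str| {
        vars.get(key)
            .cloned()
            .ok_or_else(|| format!("missing {key} in .env"))
    };
    Ok(GraphConfig {
        uri: get("KNOWLEDGE_GRAPH_URI")?,
        username: get("KNOWLEDGE_GRAPH_USERNAME")?,
        password: get("KNOWLEDGE_GRAPH_PASSWORD")?,
        database: get("KNOWLEDGE_GRAPH_DATABASE")?,
    })
}

fn main() {
    // Placeholder values for illustration only.
    let sample = "\
KNOWLEDGE_GRAPH_URI=bolt://localhost:7687
KNOWLEDGE_GRAPH_USERNAME=reader
KNOWLEDGE_GRAPH_PASSWORD=example
KNOWLEDGE_GRAPH_DATABASE=spoke
";
    match load_config(sample) {
        Ok(config) => println!(
            "would connect to {} (database {}) as {}",
            config.uri, config.database, config.username
        ),
        Err(message) => eprintln!("{message}"),
    }
}
```

Returning the name of the missing variable keeps a configuration error actionable, which matters because all four values are required before a connection can be attempted.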
Verifies connectivity and credentials to the Neo4j instance.
spoke-cli test-connection

Connecting to bolt://<host>:<port> ... OK
uri : bolt://<host>:<port>
database : <database>
user : <username>
Introspects the database schema — node labels, relationship types, and property keys — and returns the result as JSON.
# Print schema to stdout
spoke-cli glimpse-knowledge-graph
# Save schema to a file
spoke-cli glimpse-knowledge-graph --output schema.json

| Output Field | Description |
|---|---|
| node_labels | All node types in the graph (e.g. Gene, Disease) |
| relationship_types | All edge types (e.g. ASSOCIATES_DaG) |
| property_keys | All property names used across the graph |
| node_type_properties | Per-label property schemas with types |
| relationship_type_properties | Per-relationship property schemas |
Executes a read-only Cypher query. Results are saved to a file by default; use --stdout to print instead.
spoke-cli query '<CYPHER>' [OPTIONS]

| Flag | Description | Default |
|---|---|---|
| --output <FILE> | Output file name (extension auto-appended if missing) | <random-hash>.<format> |
| --format <FMT> | Output format: json or csv | json |
| --stdout | Print results to stdout instead of saving to a file | off |
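
The interaction between --output and --format can be sketched as a small function. This is an illustration of the documented behavior, not the tool's implementation: `Format`, `resolve_output`, and the `fallback_stem` parameter (standing in for the random hash, which spoke-cli generates with the rand crate) are hypothetical names. The sketch assumes that a name which already has any extension is left unchanged; the README does not say what happens when the extension and the format disagree.

```rust
use std::path::Path;

/// Output formats accepted by --format.
#[derive(Clone, Copy)]
enum Format {
    Json,
    Csv,
}

impl Format {
    fn extension(self) -> &'static str {
        match self {
            Format::Json => "json",
            Format::Csv => "csv",
        }
    }
}

/// Decide the output file name.
/// `fallback_stem` stands in for the random hash used when --output is omitted.
fn resolve_output(name: Option<&str>, fmt: Format, fallback_stem: &str) -> String {
    match name {
        // Assumption: any existing extension is kept as given.
        Some(name) if Path::new(name).extension().is_some() => name.to_string(),
        // No extension: append the one matching the chosen format.
        Some(name) => format!("{name}.{}", fmt.extension()),
        // No --output at all: <random-hash>.<format>.
        None => format!("{fallback_stem}.{}", fmt.extension()),
    }
}

fn main() {
    println!("{}", resolve_output(Some("ms_genes"), Format::Csv, "a1b2c3"));
    println!("{}", resolve_output(None, Format::Json, "a1b2c3"));
}
```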
Write operations (CREATE, MERGE, SET, DELETE, DROP, etc.) are blocked regardless of credentials.
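
One way such a guard could work is a keyword scan over the query text before it is sent. The sketch below is an assumption about the general technique, not spoke-cli's code: the tool uses the regex crate for its write guard, while this version uses only the standard library, and `WRITE_KEYWORDS` and `first_write_keyword` are hypothetical names. The keyword list contains only the five keywords named above; the tool's full list is not documented here. Text inside quoted string literals is skipped so that a filter such as `CONTAINS 'set'` is not rejected.

```rust
/// Keywords treated as writes in this sketch (the five named in the README;
/// the tool's complete list may be longer).
const WRITE_KEYWORDS: [&str; 5] = ["CREATE", "MERGE", "SET", "DELETE", "DROP"];

/// Return the first write keyword found outside string literals, if any.
fn first_write_keyword(query: &str) -> Option<&'static str> {
    let mut word = String::new();
    let mut quote: Option<char> = None;
    // A trailing space flushes the final word.
    let mut chars = query.chars().chain(std::iter::once(' '));
    while let Some(c) = chars.next() {
        if let Some(q) = quote {
            if c == '\\' {
                chars.next(); // skip the escaped character
            } else if c == q {
                quote = None; // closing quote
            }
            continue;
        }
        if c.is_alphanumeric() || c == '_' {
            word.push(c);
            continue;
        }
        // Word boundary: compare the whole word, case-insensitively.
        if let Some(keyword) = WRITE_KEYWORDS
            .iter()
            .find(|k| k.eq_ignore_ascii_case(&word))
        {
            return Some(*keyword);
        }
        word.clear();
        if c == '\'' || c == '"' {
            quote = Some(c);
        }
    }
    None
}

fn main() {
    let queries = [
        "MATCH (d:Disease) RETURN d.name LIMIT 10",
        "MATCH (n) DETACH DELETE n",
    ];
    for query in queries {
        match first_write_keyword(query) {
            Some(keyword) => println!("blocked ({keyword}): {query}"),
            None => println!("allowed: {query}"),
        }
    }
}
```

Matching whole words rather than substrings avoids rejecting identifiers such as `offset`, which contains `set`. A client-side scan like this is a convenience check; read-only database credentials remain the stronger safeguard.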
Check connectivity
spoke-cli test-connection

Explore available node types
spoke-cli query "CALL db.labels() YIELD label RETURN label ORDER BY label" --stdout

Query disease nodes
spoke-cli query "MATCH (d:Disease) RETURN d.name, d.identifier LIMIT 10" --stdout

Multiple sclerosis subnetwork — find the disease node
spoke-cli query \
"MATCH (d:Disease) WHERE d.name =~ '(?i).*multiple sclerosis.*' RETURN d.name, d.identifier" \
--stdout

Multiple sclerosis subnetwork — all direct neighbors (1-hop)
spoke-cli query \
"MATCH (d:Disease)-[r]-(n)
WHERE d.name =~ '(?i).*multiple sclerosis.*'
RETURN d.name AS disease, type(r) AS rel_type, labels(n)[0] AS neighbor_type, n.name AS neighbor
LIMIT 200" \
--output ms_subnetwork.json

MS-associated genes
spoke-cli query \
"MATCH (d:Disease)-[r]-(g:Gene)
WHERE d.name =~ '(?i).*multiple sclerosis.*'
RETURN d.name AS disease, type(r) AS relationship, g.name AS gene
LIMIT 100" \
--format csv --output ms_genes.csv

MS-associated compounds
spoke-cli query \
"MATCH (d:Disease)-[r]-(c:Compound)
WHERE d.name =~ '(?i).*multiple sclerosis.*'
RETURN d.name AS disease, type(r) AS relationship, c.name AS compound
LIMIT 100" \
--output ms_compounds.json

Gene–protein associations
spoke-cli query \
"MATCH (g:Gene)-[r]-(p:Protein) WHERE g.name = 'BRCA1'
RETURN g.name AS gene, type(r) AS rel, p.name AS protein" \
--stdout

Immune pathways
spoke-cli query \
"MATCH (p:Pathway)-[r]-(g:Gene)
WHERE p.name CONTAINS 'immune'
RETURN p.name AS pathway, g.name AS gene LIMIT 50" \
--format csv --output immune_pathways.csv

Save full schema
spoke-cli glimpse-knowledge-graph --output spoke_schema.json

SPOKE integrates data across 42+ node types:
| Category | Node Types |
|---|---|
| Molecular | Gene, Protein, Compound, MiRNA, Complex, ProteinDomain, ProteinFamily |
| Disease/Health | Disease, Symptom, SideEffect, PharmacologicClass |
| Biological | BiologicalProcess, MolecularFunction, CellularComponent, Pathway, Reaction |
| Cellular | Anatomy, CellType, AnatomyCellType, CellLine |
| Genomic | Variant, Chromosome, Haplotype, PanGene |
| Dietary | Food, Nutrient, DietarySupplement |
| Other | Organism, EC, Location, SDoH, Environment |
JSON — array of objects, one per row, keyed by RETURN column names or aliases:
[
{ "disease": "multiple sclerosis", "rel_type": "ASSOCIATES_DaG", "neighbor_type": "Gene", "neighbor": "HLA-DRB1" },
{ "disease": "multiple sclerosis", "rel_type": "TREATS_CtD", "neighbor_type": "Compound", "neighbor": "interferon beta-1a" }
]

CSV — standard comma-separated with a header row:
disease,rel_type,neighbor_type,neighbor
"multiple sclerosis","ASSOCIATES_DaG","Gene","HLA-DRB1"
"multiple sclerosis","TREATS_CtD","Compound","interferon beta-1a"
| Crate | Purpose |
|---|---|
| neo4rs | Async Neo4j Bolt driver |
| tokio | Async runtime |
| clap | CLI argument parsing |
| dotenvy | .env file loading |
| serde_json | JSON serialization |
| regex | Cypher write-guard & column parsing |
| rand | Default output filename generation |
For research and educational use at UCSF. See the SPOKE project for data licensing terms.