
Add CLI Filtering for CVs and Introduce --dry-run Option for Deployment #57

Open
anshuman9468 wants to merge 2 commits into dbpedia:main from anshuman9468:fix/cv-filtering-and-dryrun

Conversation


@anshuman9468 anshuman9468 commented Apr 10, 2026

Description

This PR introduces two key enhancements to improve usability and safety:

  • CLI Filtering for CVs: Adds support for filtering CVs directly via the command-line interface, enabling more efficient and targeted processing.
  • --dry-run Deployment Option: Introduces a --dry-run flag for deployment commands, allowing users to simulate the deployment process without making any actual changes. This helps validate configurations and prevent unintended updates.

These changes aim to improve developer experience by providing better control over CV handling and safer deployment workflows.

Related Issues
Closes #12

Summary by CodeRabbit

  • New Features

    • Download filtering via pipe-separated filters on URIs to select content variants, formats, and compression.
    • --dry-run option for deploy to generate and print metadata without performing deployment.
    • completion subcommand to generate shell completions (bash, zsh, fish).
  • Documentation

    • README updated with filter syntax, matching rules, and dry-run usage examples.
  • Tests

    • Added unit tests covering download filter matching and URL extraction.
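
The pipe-separated filter syntax summarized above could be parsed with something as simple as the following sketch. The function name `split_download_uri` is a hypothetical stand-in, not the actual databusclient API:

```python
# Hypothetical sketch: split a Databus download URI of the form
# 'https://.../artifact|filter1|filter2' into its base URI and filter list.
# The real implementation in databusclient/api/download.py may differ.
def split_download_uri(uri: str) -> tuple[str, list[str]]:
    base, *filters = uri.split("|")
    return base, filters
```

A URI without any `|` separators simply yields an empty filter list, so unfiltered downloads keep working unchanged.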


coderabbitai bot commented Apr 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7951de90-5aef-4aab-aade-e06cd21cdd32

📥 Commits

Reviewing files that changed from the base of the PR and between 03c820b and d7a3a3a.

📒 Files selected for processing (1)
  • tests/test_filter.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_filter.py

📝 Walkthrough

Adds pipe-separated download filtering to Databus URIs, implements filter-matching logic for JSON-LD Part nodes (extensions, compression, content variants), extends download flow to apply filters, adds a CLI --dry-run for deploy and a completion subcommand, and includes unit tests for filtering.

Changes

  • Documentation (README.md): Documented the new download filter URI syntax ($URI|filter1|filter2), matching rules for content variants (key=value or value), file extensions (.ext), compression (..gz), example usage, and the --dry-run deploy flag.
  • Download & Filtering (databusclient/api/download.py): Added parsing of pipe-separated filters in download(), propagated an optional filters: List[str] into _download_version, _download_artifact, _download_group, and _get_file_download_urls_from_artifact_jsonld. Implemented _matches_filters(node, filters) to match compression (..name), extensions (.ext), and content-variant key/value or bare-value checks against dcv:/dataid-cv: fields; updated URL extraction to skip non-matching Part nodes.
  • CLI Enhancements (databusclient/cli.py): Added a --dry-run flag to the deploy command and plumbing to generate/print JSON-LD without performing the upload/deploy across Classic, Metadata-file, and WebDAV modes. Added a completion subcommand accepting bash, zsh, or fish.
  • Tests (tests/test_filter.py): New unit tests for _matches_filters and _get_file_download_urls_from_artifact_jsonld, covering empty/None filters, extension matching, compression matching, CV key/value and bare-value matching, combined filters, and negative cases.
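
The matching rules described in the walkthrough can be sketched as follows. This is an illustrative simplification of the documented behavior, not the code from download.py: it only checks `dcv:`-prefixed keys, whereas the real implementation also handles `dataid-cv:` fields:

```python
# Simplified sketch of the documented filter-matching rules:
#   '..name'    -> compression must equal name
#   '.ext'      -> format extension must equal ext
#   'key=value' -> content variant dcv:key must equal value
#   'value'     -> some dcv:* field must equal value
def matches_filters(node: dict, filters: list[str]) -> bool:
    for f in filters:
        if f.startswith(".."):  # compression filter
            if str(node.get("compression", "")).lower() != f[2:].lower():
                return False
        elif f.startswith("."):  # format-extension filter
            if str(node.get("formatExtension", "")).lower() != f[1:].lower():
                return False
        elif "=" in f:  # content-variant key=value filter
            key, value = f.split("=", 1)
            if str(node.get(f"dcv:{key}", "")) != value:
                return False
        else:  # bare content-variant value filter
            cv_values = {str(v) for k, v in node.items() if k.startswith("dcv:")}
            if f not in cv_values:
                return False
    return True  # empty filter list matches everything
```

All filters must match for a Part node to be kept, so `$URI|.ttl|..gz|lang=en` narrows the download to English Turtle files compressed with gzip.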

Sequence Diagram

sequenceDiagram
    autonumber
    actor User
    participant CLI
    participant Download
    participant JSONLD
    participant Filesystem

    User->>CLI: download("base_uri|filter1|filter2")
    CLI->>Download: invoke download(base_uri, filters)
    Download->>Download: parse URI -> base_uri + filters
    Download->>JSONLD: fetch artifact/version/group JSON-LD
    JSONLD-->>Download: return `@graph` with Part nodes
    Download->>Download: for each Part -> _matches_filters(node, filters)
    alt match
        Download->>Download: collect file URLs
    else no match
        Download-->>Download: skip Part
    end
    Download->>Filesystem: download collected URLs
    Filesystem-->>Download: files saved
    Download-->>CLI: results
    CLI-->>User: report complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 4

❌ Failed checks (2 warnings, 2 inconclusive)

  • Out of Scope Changes check (⚠️ Warning): The PR includes a new CLI completion subcommand that is not mentioned in the PR description or issue objectives, representing out-of-scope functionality. Resolution: either remove the completion subcommand or clarify its relevance to issue #12 and update the PR description accordingly.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 58.82%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (❓ Inconclusive): The PR description covers both features and their benefits, but the required checklist items are not completed or checked off. Resolution: complete the checklist items to confirm code style compliance, self-review, testing, and documentation updates have been performed.
  • Linked Issues check (❓ Inconclusive): The PR partially addresses issue #12 by adding filtering support and dry-run functionality, but does not implement all the core download features (vault token authentication, Docker image) mentioned in the linked issue. Resolution: clarify whether this PR is intended as a partial implementation of issue #12 or if additional features are expected in separate PRs.

✅ Passed checks (1 passed)

  • Title check (✅ Passed): The title accurately describes the two main features added in the PR: CLI filtering for CVs and a --dry-run deployment option.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@databusclient/api/download.py`:
- Around line 903-915: The filter logic in the for-loop that checks
f.startswith(".") and f.startswith("..") currently reads formatExtension and
compression from node.get(...) only, causing valid Part nodes with only a "file"
field to fail; update the checks in that loop to fall back to extracting the
extension and compression from node["file"] when node.get("formatExtension") or
node.get("compression") are empty—use the file name (node.get("file") or
node["file"]) to derive the format extension (e.g., the suffix after the last
'.') and the compression (e.g., a trailing .gz/.bz2/etc.), then compare those
derived values (lowercased) against expected in the existing f.startswith(".")
and f.startswith("..") branches so the filters match when metadata fields are
absent.

In `@databusclient/cli.py`:
- Around line 162-171: The current dry-run branch returns before building the
dataset/DataID preview so `--dry-run --webdav-url` prints an empty list; fix by
ensuring metadata is constructed even in dry-run mode: extract or reuse the same
metadata-generation logic used for real deploys (the code that produces
`metadata` from `distributions`, `remote`, `path`, and `webdav_url`) and call it
regardless of `dry_run`, but only skip the actual upload side-effect when
`dry_run` is true (i.e., call `webdav.upload_to_webdav` or a new
`build_metadata` helper in both cases or pass a dry_run flag to
`webdav.upload_to_webdav`), and remove the premature `return` so the
DataID/metadata preview is printed for dry runs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d7b368af-d320-4e10-8d56-42702b89db04

📥 Commits

Reviewing files that changed from the base of the PR and between ca7ac2d and 03c820b.

📒 Files selected for processing (4)
  • README.md
  • databusclient/api/download.py
  • databusclient/cli.py
  • tests/test_filter.py

Comment on lines +903 to +915
    for f in filters:
        if f.startswith(".."):
            # Compression filter
            expected = f[2:].lower()
            actual = str(node.get("compression", "")).lower()
            if actual != expected:
                return False
        elif f.startswith("."):
            # Format extension filter
            expected = f[1:].lower()
            actual = str(node.get("formatExtension", "")).lower()
            if actual != expected:
                return False


⚠️ Potential issue | 🟠 Major

Fall back to the file name when formatExtension or compression is missing.

file_format and compression are optional in published metadata, so .ttl / ..gz currently fail on otherwise valid Part nodes that only expose file. That makes the new filter syntax silently skip matching files on those datasets.

💡 Localized fix
     for f in filters:
         if f.startswith(".."):
             # Compression filter
             expected = f[2:].lower()
-            actual = str(node.get("compression", "")).lower()
+            actual = str(node.get("compression", "")).lower()
+            if not actual:
+                actual = _detect_compression_format(str(node.get("file", ""))) or ""
             if actual != expected:
                 return False
         elif f.startswith("."):
             # Format extension filter
             expected = f[1:].lower()
-            actual = str(node.get("formatExtension", "")).lower()
+            actual = str(node.get("formatExtension", "")).lower()
+            if not actual:
+                path = urlparse(str(node.get("file", ""))).path
+                basename = os.path.basename(path).lower()
+                basename = re.sub(r"\.(bz2|gz|xz)$", "", basename)
+                actual = basename.rsplit(".", 1)[-1] if "." in basename else ""
             if actual != expected:
                 return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@databusclient/api/download.py` around lines 903 - 915, The filter logic in
the for-loop that checks f.startswith(".") and f.startswith("..") currently
reads formatExtension and compression from node.get(...) only, causing valid
Part nodes with only a "file" field to fail; update the checks in that loop to
fall back to extracting the extension and compression from node["file"] when
node.get("formatExtension") or node.get("compression") are empty—use the file
name (node.get("file") or node["file"]) to derive the format extension (e.g.,
the suffix after the last '.') and the compression (e.g., a trailing
.gz/.bz2/etc.), then compare those derived values (lowercased) against expected
in the existing f.startswith(".") and f.startswith("..") branches so the filters
match when metadata fields are absent.
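
The fallback the review asks for could look like the helper below. The function name `derive_from_file` and the supported compression suffixes are assumptions for illustration; the actual fix in download.py may be shaped differently:

```python
import os
import re
from urllib.parse import urlparse

def derive_from_file(file_url: str) -> tuple[str, str]:
    """Derive (format_extension, compression) from a file URL when the
    formatExtension/compression metadata fields are absent.

    Assumes gz/bz2/xz are the compression suffixes of interest."""
    basename = os.path.basename(urlparse(file_url).path).lower()
    compression = ""
    m = re.search(r"\.(gz|bz2|xz)$", basename)
    if m:
        compression = m.group(1)
        basename = basename[: m.start()]  # strip the compression suffix
    extension = basename.rsplit(".", 1)[-1] if "." in basename else ""
    return extension, compression
```

Wired into `_matches_filters`, the derived values would only be consulted when `node.get("formatExtension")` or `node.get("compression")` is empty, so Part nodes with explicit metadata keep their current behavior.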

Comment on lines +162 to +171
if dry_run:
    click.echo("[DRY-RUN] Skipping WebDAV upload")
    metadata = []
else:
    metadata = webdav.upload_to_webdav(distributions, remote, path, webdav_url)

if dry_run:
    click.echo("[DRY-RUN] Generated metadata (partial):")
    click.echo(json.dumps(metadata, indent=2))
    return

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

--dry-run in WebDAV mode never builds the DataID preview.

This branch returns after printing [], so it skips the same dataset-construction path that real metadata deploys use. The result is that deploy --dry-run --webdav-url ... does not actually simulate the deploy or surface metadata/DataID errors until a real run.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@databusclient/cli.py` around lines 162 - 171, The current dry-run branch
returns before building the dataset/DataID preview so `--dry-run --webdav-url`
prints an empty list; fix by ensuring metadata is constructed even in dry-run
mode: extract or reuse the same metadata-generation logic used for real deploys
(the code that produces `metadata` from `distributions`, `remote`, `path`, and
`webdav_url`) and call it regardless of `dry_run`, but only skip the actual
upload side-effect when `dry_run` is true (i.e., call `webdav.upload_to_webdav`
or a new `build_metadata` helper in both cases or pass a dry_run flag to
`webdav.upload_to_webdav`), and remove the premature `return` so the
DataID/metadata preview is printed for dry runs.
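
The restructuring the prompt describes amounts to always building the metadata and gating only the upload side effect on the flag. In this sketch, `build_metadata` and `upload_to_webdav` are hypothetical stand-ins for the real cli.py/webdav code, shown only to illustrate the control flow:

```python
import json

def build_metadata(distributions, remote, path, webdav_url):
    # Hypothetical stand-in for the dataset/DataID construction step.
    return [{"distribution": d, "remote": remote, "path": path} for d in distributions]

def upload_to_webdav(metadata, webdav_url):
    # Hypothetical stand-in for the real upload side effect.
    raise RuntimeError("network side effect; never reached in dry-run")

def deploy_webdav(distributions, remote, path, webdav_url, dry_run=False):
    # Build the metadata unconditionally so dry runs surface the same
    # DataID/metadata errors a real deploy would.
    metadata = build_metadata(distributions, remote, path, webdav_url)
    if dry_run:
        print("[DRY-RUN] Skipping WebDAV upload")
        print("[DRY-RUN] Generated metadata (partial):")
        print(json.dumps(metadata, indent=2))
    else:
        upload_to_webdav(metadata, webdav_url)  # side effect only on real runs
    return metadata
```

With this shape, `--dry-run --webdav-url ...` prints the constructed metadata instead of an empty list, and there is no early return that skips validation.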

