Add runHMMER function for HMMER analysis by AbhirupaGhosh · Pull Request #18 · JRaviLab/amRdata

AbhirupaGhosh · 2026-03-30T22:12:08Z

These functions will be added to data_processing.R once approved.

The scripts are modifications of @epbrenner's hmmering and rhmmer.

Description

What kind of change(s) are included?

Feature (adds or updates new capabilities)
Bug fix (fixes an issue).
Enhancement (adds functionality).
Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

I have read and followed the CONTRIBUTING.md guidelines.
I have searched for existing content to ensure this is not a duplicate.
I have performed a self-review of these additions (including spelling, grammar, and related).
I have added comments to my code to help provide understanding.
I have added a test which covers the code changes found within this PR.
I have deleted all non-relevant text in this pull request template.
Reviewer assignment: Tag a relevant team member to review and approve the changes.

@epbrenner

jananiravi

It looks like you moved a line -- unless I'm missing something, it's good to merge!

jananiravi

I see that I commented on one commit earlier -- sorry about that!

In principle, it looks good. I would like to ask @eboyer221 or @epbrenner to run this locally and suggest non-alpine placeholders so that this works for everyone!

jananiravi · 2026-04-03T00:02:45Z

R/data_curation.R

  combined_drug_data <- unlist(batch_drug_data, use.names = FALSE)
-  if (length(combined_drug_data) == 0) { message("No drug data returned."); return(NULL) }
+  if (length(combined_drug_data) == 0) {
+    message("No drug data returned.")


Should this message say "found" or "returned"?

jananiravi · 2026-04-03T00:03:03Z

R/data_curation.R

  combined_genome_data <- unlist(batch_genome_data, use.names = FALSE)
-  if (length(combined_genome_data) == 0) { message("No genome data returned."); return(NULL) }
+  if (length(combined_genome_data) == 0) {
+    message("No genome data returned.")


Should this say "found", "returned", or "retrieved"? Same question as for the drug data message.

jananiravi · 2026-04-03T00:05:58Z

R/runHMMER.R

+    chunk_size <- ceiling(length(records) / chunk_count)
+    chunks <- split(records, ceiling(seq_along(records) / chunk_size))
+
+    purrr::walk2(chunks, seq_along(chunks), function(chunk, i) {


https://tidyverse.org/blog/2023/05/purrr-walk-this-way/ nice!
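For anyone reading along, here is a minimal base-R sketch of what the two chunking lines in this hunk do. The helper name `chunk_records()` and the record IDs are made up for illustration; only the `ceiling()`/`split()` logic comes from the diff.

```r
# Hypothetical helper: split `records` into at most `chunk_count` chunks,
# mirroring the two lines quoted from R/runHMMER.R.
chunk_records <- function(records, chunk_count) {
  stopifnot(chunk_count >= 1)
  if (length(records) == 0) {
    return(list())
  }
  chunk_size <- ceiling(length(records) / chunk_count)
  # Each record gets a chunk id 1, 2, ...; split() groups records by that id.
  unname(split(records, ceiling(seq_along(records) / chunk_size)))
}

records <- paste0("protein_", 1:10) # made-up record IDs
chunks <- chunk_records(records, chunk_count = 3)

# Iterate over the chunks together with their index, as walk2() does in the diff.
for (i in seq_along(chunks)) {
  message("chunk ", i, ": ", length(chunks[[i]]), " records")
}
```

One thing worth noting: because `chunk_size` is rounded up, the number of chunks can come out smaller than `chunk_count` (for example, 5 records with `chunk_count = 4` gives 3 chunks).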

jananiravi · 2026-04-03T00:06:45Z

R/runHMMER.R

+      "exec",
+      "-B", paste0(mount_host, ":", mount_cont),
+      "-B", paste0(db_host_dir, ":", db_cont_dir),
+      "/scratch/alpine/aghosh5@xsede.org/software/hmmer_latest.sif",


⚠️ hardcoded path alert
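One possible way around the hardcoded image path, sketched under assumptions: the function names, the `sif_path` parameter, and the `AMRDATA_HMMER_SIF` environment variable below are all hypothetical and not part of this PR. Only the `exec` / `-B` argument layout is taken from the diff.

```r
# Hypothetical helper: take the HMMER container image from an argument,
# falling back to an environment variable (the variable name is an assumption).
resolve_sif_path <- function(sif_path = Sys.getenv("AMRDATA_HMMER_SIF")) {
  if (!nzchar(sif_path)) {
    stop("Set `sif_path` or the AMRDATA_HMMER_SIF environment variable.")
  }
  if (!file.exists(sif_path)) {
    stop("Container image not found: ", sif_path)
  }
  normalizePath(sif_path)
}

# Build the argument vector from the diff, with the image path as a parameter.
build_exec_args <- function(sif_path, mount_host, mount_cont,
                            db_host_dir, db_cont_dir) {
  c(
    "exec",
    "-B", paste0(mount_host, ":", mount_cont),
    "-B", paste0(db_host_dir, ":", db_cont_dir),
    sif_path
  )
}

# Usage with a placeholder image file so the sketch runs on any machine.
fake_sif <- tempfile(fileext = ".sif")
file.create(fake_sif)
exec_args <- build_exec_args(
  sif_path = resolve_sif_path(fake_sif),
  mount_host = "/path/to/work", mount_cont = "/work",
  db_host_dir = "/path/to/db", db_cont_dir = "/db"
)
print(exec_args)
```

Failing early with a clear message when the image is missing should make this easier to run outside the original cluster.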

jananiravi · 2026-04-03T00:07:13Z

R/runHMMER.R

+
+  message("Combined parquet written")
+
+  # arrow::read_parquet("/scratch/alpine/aghosh5@xsede.org/AMR/data/Campylobacter_jejuni/protein_COG_count.parquet") |> DBI::dbWriteTable(conn=con, name="protein_COG_count")


Hardcoded path alert: this cannot be part of the public amRdata repo.

jananiravi · 2026-04-03T00:08:20Z

vignettes/intro.Rmd

+  cdhit_extra_args = c("-g", "1"),
+  cdhit_output_prefix = "cdhit_out",
+  # InterPro
+  ipr_appl = c("Pfam"),


Can the user switch this, e.g. Pfam vs. something else? @AbhirupaGhosh @epbrenner

charmvang · 2026-04-10T21:24:05Z

R/runHMMER.R

+
+.runHMMER <- function(duckdb_path,
+                      output_path,
+                      threads = 0,


Suggested change

-                      threads = 0,
+                      threads = 1,
+                      n_workers = 1,

charmvang · 2026-04-10T21:24:40Z

R/runHMMER.R

+  # number of parallel jobs (NOT threads per hmmscan)
+  n_workers <- 4
+
+  # threads per hmmscan
+  threads <- 8


Suggested change

-  # number of parallel jobs (NOT threads per hmmscan)
-  n_workers <- 4
-
-  # threads per hmmscan
-  threads <- 8
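To illustrate why the two settings are kept separate, here is a rough sketch of how they multiply. `check_cpu_budget()` and `hmmscan_cpu_args()` are hypothetical helpers, not functions in this PR; the only external fact relied on is that hmmscan accepts a `--cpu` option for its per-job thread count.

```r
# Hypothetical check: n_workers parallel hmmscan jobs, each using `threads`
# CPUs, ask for n_workers * threads cores in total.
check_cpu_budget <- function(n_workers = 1, threads = 1,
                             available = parallel::detectCores()) {
  stopifnot(n_workers >= 1, threads >= 1)
  requested <- n_workers * threads
  if (!is.na(available) && requested > available) {
    warning(
      "Requested ", requested, " cores (", n_workers, " workers x ",
      threads, " threads) but only ", available, " detected."
    )
  }
  requested
}

# hmmscan takes its per-job thread count through the --cpu option.
hmmscan_cpu_args <- function(threads) c("--cpu", as.character(threads))

total <- check_cpu_budget(n_workers = 4, threads = 8, available = 32)
print(total)
print(hmmscan_cpu_args(8))
```

With the previously hardcoded values (4 workers, 8 threads each) the function would request 32 cores, which is why exposing both as arguments with a default of 1 seems like the safer choice.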

charmvang · 2026-04-10T21:32:24Z

Just had a thought: do we have to run each HMMER database separately within the function and then combine the outputs later?
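If the answer is yes, one way it could look is sketched below. Everything here is an assumption for discussion: `run_all_dbs()`, the `run_one_db` callback, the database names, and the paths are placeholders, and the stand-in runner returns made-up hits instead of calling hmmscan.

```r
# Hypothetical driver: run one search per HMM database, tag each result with
# the database name, then stack the per-database tables.
run_all_dbs <- function(db_paths, run_one_db) {
  results <- lapply(names(db_paths), function(db_name) {
    hits <- run_one_db(db_paths[[db_name]]) # expected to return a data frame
    hits$database <- rep(db_name, nrow(hits))
    hits
  })
  do.call(rbind, results)
}

# Stand-in runner returning made-up hits; a real one would call hmmscan.
fake_runner <- function(db_path) {
  data.frame(
    query = c("protein_1", "protein_2"),
    target = paste0(basename(db_path), "_model"),
    stringsAsFactors = FALSE
  )
}

# Placeholder database names and paths.
db_paths <- c(Pfam = "/path/to/Pfam-A.hmm", COG = "/path/to/COG.hmm")
combined <- run_all_dbs(db_paths, fake_runner)
print(combined)
```

Keeping a `database` column in the combined table would let downstream steps filter by source without re-running anything.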

Add runHMMER function for HMMER analysis

423b920

These functions will be added to data_processing.R once approved. The scripts are modifications of @epbrenner's hmmering and rhmmer.

AbhirupaGhosh self-assigned this Mar 30, 2026

AbhirupaGhosh requested review from charmvang, epbrenner and jananiravi March 30, 2026 22:12

AbhirupaGhosh and others added 3 commits March 30, 2026 22:16

Style code (GHA)

f030ea1

Remove duplicate line reading in runHMMER.R

aee23f2

Removed redundant line reading from file as it was already handled earlier in the code.

Style code (GHA)

c272d3a

jananiravi approved these changes Apr 3, 2026

jananiravi reviewed Apr 3, 2026

charmvang approved these changes Apr 10, 2026

AbhirupaGhosh wants to merge 4 commits into main from add-hmmer

3 participants

