
Add autotrack — autonomous MOT tracker optimization loop [rebase&merge] #346

Draft
Borda wants to merge 28 commits into develop from bemch/auto-research

Borda (Member) commented Apr 3, 2026

Summary

MOT tracker quality depends on two largely independent axes: algorithm design and hyperparameter tuning. Most published improvements conflate them; a well-tuned weaker algorithm routinely beats a poorly tuned stronger one, making it hard to isolate what actually matters. This PR separates the axes by adding autotrack/, an autonomous optimization loop for SORT, ByteTrack, and OC-SORT on MOT17.

The goal is both practical (better trackers, reproducible tuning) and scientific (the experiment log — including every reverted change — is itself a research artifact).

Approach

Three progressive layers build on each other:

Layer 1 — SOTA trackers with solid defaults. The existing trackers/core/ implementations of SORT, ByteTrack, and OC-SORT are already competitive out of the box. This layer is the foundation; autotrack/ does not replace it.

Layer 2 — Optuna extracts the best from the existing parameter surface. optimize_tracking.py runs an Optuna study over the tracker's exposed hyperparameters (Kalman noise scales, confidence thresholds, buffer sizes). No code changes, pure tuning. With FRCNN detections, HOTA improves by 1.4–2.5 points; with SDP detections, by 2.2–4.4 points. This layer works as a standalone tuning tool and can be adopted without running the agent loop.

Layer 3 — autotrack goes beyond tuning by making algorithmic improvements. This is the novel contribution. An autonomous agent iterates over structural code changes (state representation, association strategy, camera motion compensation, Kalman mechanics), measures HOTA at fixed default parameters after each change, keeps improvements, and reverts regressions. After each kept change, an Optuna run acts as a second-pass validator, confirming that the improvement holds after re-tuning and is not a tuning artifact. Every attempt, kept or reverted, is recorded in a JSONL iteration log.

Human defines:  research question · metric · hard boundaries
Agent decides:  what to change · what to try next
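The keep/revert cycle described above can be sketched as follows. This is not the agent's actual code: each candidate is a hypothetical `(description, evaluate)` pair, the 0.1% relative keep threshold is borrowed from the giou_blend commit later in this PR, and reverting is left as a comment because the PR does not say how the agent undoes a change.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Tuple

# Hypothetical: evaluate() applies one code change and returns HOTA at fixed default params.
Candidate = Tuple[str, Callable[[], float]]


def run_campaign(
    baseline_hota: float,
    candidates: Iterable[Candidate],
    log_path: Path,
    min_rel_gain: float = 0.001,  # assumed keep threshold: +0.1% relative HOTA
) -> float:
    """Keep changes that beat the current best by min_rel_gain, revert the rest,
    and append one JSON line per attempt, kept or reverted."""
    best = baseline_hota
    with log_path.open("a", encoding="utf-8") as log:
        for description, evaluate in candidates:
            hota = evaluate()
            rel_gain = (hota - best) / best
            kept = rel_gain >= min_rel_gain
            record = {"change": description, "hota": hota, "rel_gain": rel_gain, "kept": kept}
            log.write(json.dumps(record) + "\n")
            if kept:
                best = hota  # the change stays and becomes the new baseline
            # Otherwise the real loop would revert the edit before the next attempt.
    return best
```

Logging before the keep/revert decision is what makes the reverted attempts part of the research artifact.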

Two tools govern the loop:

| Tool | Role |
|---|---|
| optimize_tracking.py --n-trials 1 | Campaign metric: default params, clean code-change signal |
| optimize_tracking.py --n-trials N | Optuna study: warm-starts from best_config.json, validates tuned ceiling |

The agent is explicitly permitted to update optimize_tracking.py as the tracker architecture evolves — adding parameters that newly exist, removing ones absorbed into the implementation, tightening search ranges as knowledge accumulates.

Benchmarks

MOT17-val, full 7-sequence evaluation. Defaults: fixed parameters from default_config.json, no tuning. + Optuna: 500-trial study. + autotrack + Optuna: in progress.

FRCNN public detections (bundled, no GPU)

| Config | ByteTrack | OC-SORT | SORT |
|---|---|---|---|
| Defaults (HOTA) | 50.36 | 49.69 | 49.95 |
| + Optuna (HOTA) | 51.76 | 52.22 | 51.49 |
| + autotrack + Optuna (HOTA) | (pending) | (pending) | (pending) |

SDP public detections (bundled, no GPU)

| Config | ByteTrack | OC-SORT | SORT |
|---|---|---|---|
| Defaults (HOTA) | 53.94 | 53.35 | 53.22 |
| + Optuna (HOTA) | 56.12 | 57.75 | 56.08 |
| + autotrack + Optuna (HOTA) | (pending) | (pending) | (pending) |

Estimated ceiling with code improvements + Optuna on FRCNN: ~61.9 HOTA (vs ~56.0 for tuning alone), derived from the DetA/AssA decomposition, since HOTA is approximately the geometric mean √(DetA · AssA). DetA is bounded by the detector (~0.57–0.62 for FRCNN), but AssA has substantial headroom, from ~0.55 to ~0.65, via better association logic.
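HOTA is computed as √(DetA · AssA) at each localization threshold α and then averaged over α, so taking the geometric mean of already-averaged DetA and AssA is only an approximation. The PR does not state which pairs it used; the pairs below are picked from within the quoted ranges only because they reproduce the ~56.0 and ~61.9 figures.

```python
import math


def approx_hota(det_a: float, ass_a: float) -> float:
    """Geometric-mean approximation HOTA ~ sqrt(DetA * AssA), in points (0-100)."""
    return 100.0 * math.sqrt(det_a * ass_a)


print(round(approx_hota(0.57, 0.55), 1))  # 56.0: AssA at its current level
print(round(approx_hota(0.59, 0.65), 1))  # 61.9: AssA raised to ~0.65
```

Because the mean is geometric, a 0.10 gain in AssA moves HOTA by several points even with DetA nearly fixed by the detector.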

Hard guarantees

Three invariants are enforced by program.md and cannot be relaxed by the agent:

  • No GT leakage. The tracker sees only det/det.txt. gt/gt.txt is never accessed at inference time.
  • Reproducible detections. FRCNN and SDP detections are bundled with the MOT17 benchmark. Generated detections (RF-DETR, YOLO World X) are written to content-addressed sibling directories before any agent run — they are frozen inputs, not live inference.
  • Metrics via trackers.eval only. trackers/eval/ is out of scope for agent edits. The metric computation is identical across all iterations; the agent cannot move the goalposts.
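One way to make generated detections content-addressed is to name their directory after a hash of the file contents. The sketch below is an assumption about the mechanism: the PR does not document its directory naming, so `freeze_detections` and the `det_<tag>_<hash>` layout are hypothetical.

```python
import hashlib
from pathlib import Path


def freeze_detections(det_file: Path, seq_dir: Path, tag: str) -> Path:
    """Copy a generated det.txt into a sibling directory whose name includes a content hash.

    Identical detections always map to the same directory; any change to the file
    produces a new directory, so earlier runs keep pointing at their exact inputs.
    """
    data = det_file.read_bytes()
    digest = hashlib.sha256(data).hexdigest()[:12]  # short content address
    target = seq_dir / f"det_{tag}_{digest}" / "det.txt"
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return target
```

Writing once and never overwriting is what keeps the inputs frozen across agent iterations.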

Quick start

# 1. Install the optimize dependency group
uv sync --group optimize

# 2. Download MOT17-val (bundled detections, no GPU needed)
trackers download mot17 --split val --asset annotations,detections

# 3. Baseline: measure defaults
cd autotrack
uv run python optimize_tracking.py bytetrack frcnn --n-trials 1   # ~50 HOTA

# 4. Tune: Optuna over the parameter surface
uv run python optimize_tracking.py bytetrack frcnn --n-trials 500  # ~52 HOTA

To run the autonomous agent loop, point any coding agent at program.md:

claude
> Read program.md and start the experiment loop.

References

  • Bewley et al., SORT, ICIP 2016
  • Zhang et al., ByteTrack, ECCV 2022
  • Cao et al., OC-SORT, CVPR 2023
  • Luiten et al., HOTA, IJCV 2021
  • Akiba et al., Optuna, KDD 2019

Borda and others added 13 commits April 2, 2026 13:51
- experiments/program.md: autoresearch contract — research question, HOTA≥60 target, hard boundaries, 7 research starting points (Kalman P/R init, two-threshold association, velocity attenuation, etc.)
- experiments/optimize_tracking.py: Optuna-based metric runner; n_trials=1 evaluates defaults; multi-core via multiprocessing+SQLite; agent updates search space as architecture evolves
- experiments/README.md: motivation, approach, target analysis (HOTA ceiling derivation), pre-flight checks, references
- pyproject.toml: add `optimize` dependency group (optuna[rdb], fire)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
  autotrack/optimize_tracking.py
  - --det-tag TAG CLI arg: overrides the directory suffix for any custom detector without touching _DET_SOURCE_TO_TAG; _validate_args and _resolve_sequences both accept it
  - Multiprocessing progress bar: replaced pool.starmap with starmap_async + a polling loop that loads the SQLite study every 2 s and feeds a Rich Progress bar showing completed trials and live best HOTA (mirrors the existing single-worker callback approach)
  - Module docstring updated with --det-tag usage example

  autotrack/README.md
  - Fixed cd experiments → cd autotrack; old --tracker sort --fast → positional syntax
  - YOLO section replaced with YOLOX section (correct weights filename)
  - RF-DETR section added as a standalone step
  - New Custom detections section: dir layout, MOT format, --det-tag usage
  - Pre-flight checks table updated (removed API key row, fixed commands)
  - Fixed /optimize campaign experiments/ → autotrack/
  - Fixed broken Files table row for optimize_tracking.py

  autotrack/program.md
  - generate_detections.py added to scope_files
  - Weights filename corrected (yolox_x.pth → bytetrack_x_mot17.pth.tar)
  - RF-DETR and custom detector quickstart notes added below pre-flight table

---
Co-authored-by: Claude Code <noreply@anthropic.com>
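The starmap_async-plus-polling pattern from this commit can be sketched with the standard library alone. Assumptions: `ThreadPool` stands in for `multiprocessing.Pool` (both expose the same `starmap_async`/`AsyncResult` API, and threads avoid pickling concerns in a short example), a plain `trials` table replaces Optuna's SQLite storage schema, and `fake_trial` is a hypothetical placeholder for a real trial. The 2 s default poll interval matches the commit message.

```python
import sqlite3
import time
from contextlib import closing
from multiprocessing.pool import ThreadPool
from typing import List, Optional, Tuple


def fake_trial(db_path: str, trial_id: int) -> None:
    """Stand-in for one trial: compute a fake HOTA and persist it to SQLite."""
    time.sleep(0.05)  # pretend to run tracking + evaluation
    hota = 50.0 + 0.1 * trial_id
    with closing(sqlite3.connect(db_path, timeout=30)) as conn:
        conn.execute("INSERT INTO trials (id, hota) VALUES (?, ?)", (trial_id, hota))
        conn.commit()


def run_with_progress(
    db_path: str, n_trials: int, workers: int = 4, poll_s: float = 2.0
) -> List[Tuple[int, Optional[float]]]:
    """Dispatch trials with starmap_async and poll the shared SQLite file for progress."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS trials (id INTEGER PRIMARY KEY, hota REAL)")
        conn.commit()
    snapshots: List[Tuple[int, Optional[float]]] = []
    with ThreadPool(workers) as pool:
        result = pool.starmap_async(fake_trial, [(db_path, i) for i in range(n_trials)])
        while True:
            result.wait(poll_s)  # returns early once every trial has finished
            finished = result.ready()  # checked before reading so the last snapshot is complete
            with closing(sqlite3.connect(db_path)) as conn:
                done, best = conn.execute("SELECT COUNT(*), MAX(hota) FROM trials").fetchone()
            snapshots.append((done, best))
            print(f"{done}/{n_trials} trials, best HOTA {best}")  # a Rich Progress bar would update here
            if finished:
                break
        result.get()  # re-raise any exception raised inside a worker
    return snapshots
```

For real multi-core work, switch to `multiprocessing.Pool` and keep the worker a module-level function so it can be pickled.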
- generate_detections.py: remove YOLOX backend (loader, predictor, frame processing); add YOLO-World via inference-models with center→top-left coord conversion; rename rfdetr-l → rfdetr/l to match yolo_world/l slash notation
- optimize_tracking.py: swap yolox→yoloworld in _DET_SOURCE_TO_TAG; extract _run_parallel_study; fix multiline ternaries to if/else; use setattr() for dynamic Kalman attrs (mypy); pass >3 args as kwargs
- best_config.json: drop broken yolox entry (HOTA=7.7); add real Optuna results for yoloworld, rfdetr, dpm across all three trackers
- pyproject.toml: remove YOLOX git source + no-build-isolation; add inference-models>=0.19.0

---
Co-authored-by: Claude Code <noreply@anthropic.com>
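The center-to-top-left conversion mentioned for the YOLO-World backend is a half-width/half-height shift. This sketch covers only that arithmetic and the MOTChallenge det.txt row layout; it assumes the backend yields pixel-space (cx, cy, w, h), and the helper name and decimal precision are hypothetical.

```python
def center_box_to_mot_row(frame: int, cx: float, cy: float, w: float, h: float, conf: float) -> str:
    """Convert a center-format box (cx, cy, w, h) into a MOTChallenge det.txt row.

    Row layout: frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z,
    with id = -1 for raw detections and the unused world coordinates set to -1.
    """
    bb_left = cx - w / 2.0  # center -> top-left corner
    bb_top = cy - h / 2.0
    return f"{frame},-1,{bb_left:.2f},{bb_top:.2f},{w:.2f},{h:.2f},{conf:.4f},-1,-1,-1"


print(center_box_to_mot_row(1, 150.0, 200.0, 40.0, 100.0, 0.9))
# 1,-1,130.00,150.00,40.00,100.00,0.9000,-1,-1,-1
```

Skipping this shift would offset every box by half its size, which is the kind of silent error that shows up only as a very low HOTA.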
- search_space.json: expand 16 boundary-hugging parameters across all three trackers (lost_track_buffer, track_activation_threshold, minimum_iou_threshold, high_conf_det_threshold, q_scale/r_scale/p_scale, velocity_decay, q_miss_alpha, max_interpolation_gap, p_reset_threshold, direction_consistency_weight); add log=true to lost_track_buffer (all trackers) and minimum_iou_threshold (all trackers)
- optimize_tracking.py: pass log= to suggest_int so log-scale int parameters are respected
- best_config.json: bytetrack/rfdetr updated to HOTA 45.08 from new run
- uv.lock: regenerated after yolox removal

---
Co-authored-by: Claude Code <noreply@anthropic.com>
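Forwarding `log=` matters because `trial.suggest_int(name, low, high, log=True)` samples in log space, spending more trials on small values such as short buffers. The sketch assumes a hypothetical spec layout for search_space.json; `trial` can be a real `optuna.trial.Trial` or any object with the same two methods.

```python
from typing import Any, Dict


def suggest_params(trial: Any, space: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Translate a search-space mapping into Optuna-style trial.suggest_* calls.

    The spec layout {"type", "low", "high", "log"} is an assumed schema, not the
    documented format of search_space.json.
    """
    params: Dict[str, Any] = {}
    for name, spec in space.items():
        # Optuna rejects log=True unless low > 0 (low >= 1 for integer parameters).
        log = bool(spec.get("log", False))
        if spec["type"] == "int":
            # Passing log= through is the fix: without it, log-scale int ranges are sampled uniformly.
            params[name] = trial.suggest_int(name, spec["low"], spec["high"], log=log)
        else:
            params[name] = trial.suggest_float(name, spec["low"], spec["high"], log=log)
    return params
```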
…mation (ORU)

- Add oru_enabled parameter to ByteTrackKalmanBoxTracker: on re-detection after occlusion, replay virtual predict+update cycles along linearly interpolated trajectory to re-estimate velocity
- Expose oru_enabled in optimize_tracking.py _build_tracker and _define_search_space
- Add oru_enabled to default_config.json and search_space.json

---
Co-authored-by: Claude Code <noreply@anthropic.com>
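Observation-centric re-update, as introduced by OC-SORT, rewinds the filter to its state at the last real observation and replays predict+update steps along a straight line to the new detection, so velocity is re-estimated across the whole gap instead of from one correction. Below is a one-coordinate, pure-Python sketch; ByteTrackKalmanBoxTracker works on full boxes, and the class, noise values and function names here are hypothetical.

```python
import copy
from dataclasses import dataclass


@dataclass
class CVKalman1D:
    """Constant-velocity Kalman filter for one box coordinate: state (x, v), symmetric P."""

    x: float
    v: float = 0.0
    p00: float = 10.0
    p01: float = 0.0
    p11: float = 10.0
    q: float = 0.01  # simplified process noise added to both diagonal terms
    r: float = 1.0  # measurement noise variance

    def predict(self) -> None:
        # F = [[1, 1], [0, 1]]: x <- x + v, P <- F P F^T + Q
        self.x += self.v
        self.p00, self.p01, self.p11 = (
            self.p00 + 2 * self.p01 + self.p11 + self.q,
            self.p01 + self.p11,
            self.p11 + self.q,
        )

    def update(self, z: float) -> None:
        # H = [1, 0]: innovation y = z - x, gain K = P H^T / (P00 + r)
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p01 / s
        y = z - self.x
        self.x += k0 * y
        self.v += k1 * y
        # P <- (I - K H) P, computed from the old entries
        self.p00, self.p01, self.p11 = (1 - k0) * self.p00, (1 - k0) * self.p01, self.p11 - k1 * self.p01


def oru_reupdate(snapshot: CVKalman1D, last_z: float, new_z: float, gap: int) -> CVKalman1D:
    """Rewind to the state saved at the last real observation and replay `gap`
    predict+update steps along a straight line from last_z to new_z."""
    replay = copy.deepcopy(snapshot)  # leave the saved state untouched
    for t in range(1, gap + 1):
        replay.predict()
        # Virtual observation; at t == gap it equals the real re-detection new_z.
        replay.update(last_z + (new_z - last_z) * t / gap)
    return replay
```

If the object moved faster during occlusion than the saved velocity suggests, each virtual update pulls the velocity estimate toward the interpolated speed.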
…0.05)

- Add stage2_iou_threshold=0.05 param to ByteTrackTracker; stage-1 keeps minimum_iou_threshold=0.1
- Lower stage-2 threshold recovers more low-confidence detections without breaking high-conf stage
- Expose to Optuna via search_space.json; add to default_config.json and optimize_tracking.py

---
Co-authored-by: OpenAI Codex <codex@openai.com>
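The two thresholds split ByteTrack's association into a strict stage for high-confidence detections and a looser stage for low-confidence detections against still-unmatched tracks. This sketch uses a greedy matcher as a stand-in for the Hungarian solver, and `high_conf=0.6` is an illustrative value; only the 0.1 and 0.05 IoU thresholds come from the commit.

```python
from typing import List, Sequence, Tuple

Match = Tuple[int, int]  # (track index, detection index)


def greedy_match(sim: Sequence[Sequence[float]], tracks: Sequence[int], dets: Sequence[int], min_sim: float) -> List[Match]:
    """Greedy highest-similarity-first matching; a simple stand-in for the Hungarian solver."""
    candidates = sorted(((sim[t][d], t, d) for t in tracks for d in dets), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for s, t, d in candidates:
        if s < min_sim:
            break  # every remaining pair is below the gate
        if t in used_t or d in used_d:
            continue
        used_t.add(t)
        used_d.add(d)
        matches.append((t, d))
    return matches


def two_stage_associate(iou, confs, high_conf=0.6, stage1_iou=0.1, stage2_iou=0.05):
    """Stage 1: high-confidence detections vs all tracks, gated at stage1_iou.
    Stage 2: low-confidence detections vs tracks left unmatched, gated at stage2_iou."""
    high = [d for d, c in enumerate(confs) if c >= high_conf]
    low = [d for d, c in enumerate(confs) if c < high_conf]
    stage1 = greedy_match(iou, range(len(iou)), high, stage1_iou)
    matched = {t for t, _ in stage1}
    unmatched_tracks = [t for t in range(len(iou)) if t not in matched]
    stage2 = greedy_match(iou, unmatched_tracks, low, stage2_iou)
    return stage1, stage2
```

With these values, a low-confidence detection overlapping a track at IoU 0.07 is recovered in stage 2, while a high-confidence detection at the same IoU is still rejected in stage 1.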
…larity

- Add iou_age_weight=0.03: scale stage-1 IoU similarity by 1/(1+w*lost_frames) for each track
- Biases Hungarian assignment toward recently-seen tracks; reduces stale-prediction false matches
- iou_age_weight=0.03 is active at default params; Optuna range [0.0, 0.2] log-scale

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- Apply age discount only to cost matrix (not threshold check): raw IoU used for min-threshold gate, discount only biases solver assignment toward active tracks
- Tighten Optuna search range [0.0, 0.2] -> [0.0, 0.1]
- Fix pre-existing bug: optimize_tracking.py final re-eval now applies _apply_kalman_patch

---
Co-authored-by: Claude Code <noreply@anthropic.com>
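The fix above separates ranking from gating: the age discount 1/(1 + w·lost_frames) shapes only the matrix the solver sees, while the minimum-IoU check still uses raw IoU. A pure-Python sketch of the two matrices, with a hypothetical function name:

```python
from typing import List, Sequence, Tuple


def stage1_matrices(
    iou: Sequence[Sequence[float]],
    lost_frames: Sequence[int],
    iou_age_weight: float = 0.03,
    minimum_iou_threshold: float = 0.1,
) -> Tuple[List[List[float]], List[List[bool]]]:
    """Return (solver_sim, gate) for stage-1 association.

    solver_sim scales each track's IoU row by 1 / (1 + w * lost_frames) so the solver
    prefers recently seen tracks; gate is computed on the raw IoU, so the discount
    never pushes an otherwise valid pair below minimum_iou_threshold.
    """
    solver_sim, gate = [], []
    for row, lost in zip(iou, lost_frames):
        discount = 1.0 / (1.0 + iou_age_weight * lost)
        solver_sim.append([v * discount for v in row])
        gate.append([v >= minimum_iou_threshold for v in row])
    return solver_sim, gate
```

A track lost for 10 frames with raw IoU 0.11 ranks below fresher tracks (0.11 / 1.3 ≈ 0.085) but still passes the 0.1 gate.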
Apply Optuna-found parameter values as new defaults: lost_track_buffer 30→62,
track_activation_threshold 0.7→0.314, q_scale 0.01→0.00246, r_scale 0.1→0.292,
p_scale 1.0→7.34, velocity_decay 0.95→0.817, q_miss_alpha 0.1→0.461,
max_interpolation_gap 20→30, p_reset_threshold 5→13; HOTA 56.781→57.424 (+1.13%)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 3, 2026 07:14
@Borda Borda marked this pull request as draft April 3, 2026 07:14
@Borda Borda added the enhancement New feature or request label Apr 3, 2026

Copilot AI left a comment


Pull request overview

This PR introduces the new autotrack/ workflow for autonomous + Optuna-based optimization of MOT17 trackers, and updates core tracker internals to support additional post-processing and association/Kalman behaviors that the optimization loop can tune and validate.

Changes:

  • Added autotrack/ tooling: Optuna runner (optimize_tracking.py), detection generation (generate_detections.py), visualization utilities, and configuration/artifact files (default_config.json, search_space.json, best_config.json, program.md).
  • Extended ByteTrack and SORT utilities with new association / Kalman mechanics and MOT-gap interpolation.
  • Added an optimize dependency group and adjusted repo formatting/ignore configs to support the new workflow.

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| trackers/core/sort/utils.py | Adds MOT-format short-gap interpolation helper used by autotrack evaluation output. |
| trackers/core/bytetrack/tracker.py | Adds stage-2 IoU threshold and IoU age discount for stage-1 ranking; updates association gating logic. |
| trackers/core/bytetrack/kalman.py | Adds velocity decay, miss-noise inflation, P-reset, and ORU mechanics to ByteTrack Kalman tracker. |
| README.md | Badge formatting change (single-line). |
| pyproject.toml | Adds optimize dependency group and uv git source for onnx-simplifier. |
| docs/trackers/ocsort.md | Reflowed paragraph formatting. |
| docs/trackers/comparison.md | Reflowed admonition formatting. |
| CODE_OF_CONDUCT.md | Reflowed paragraph formatting. |
| autotrack/visualize_detections.py | New utility to render MOT detections on frames. |
| autotrack/search_space.json | New Optuna parameter search space definitions per tracker. |
| autotrack/README.md | New documentation for the autotrack workflow and benchmarks. |
| autotrack/program.md | New campaign contract/spec for the autonomous optimization loop. |
| autotrack/optimize_tracking.py | New Optuna study runner + evaluation harness using trackers.eval. |
| autotrack/generate_detections.py | New script to generate MOT17 detections via RF-DETR / YOLO-World backends. |
| autotrack/default_config.json | New baseline/default parameter set for --n-trials 1 runs. |
| autotrack/best_config.json | New committed “best known” tuned configs used for warm-starting/guarding. |
| .pre-commit-config.yaml | mdformat configured with --wrap=no (drives markdown reflow behavior). |
| .gitignore | Adjusts ignores (including .python-version) and adds autotrack output/cache patterns. |


@Borda Borda force-pushed the bemch/auto-research branch from 07f7488 to a62024a Compare April 3, 2026 08:36
Borda and others added 5 commits April 3, 2026 12:54
…ecovery

Short occlusions (1-4 frames) are handled well by velocity decay alone; ORU
trajectory replay is beneficial only for longer gaps where velocity has drifted.
HOTA 57.424→57.813 (+0.686%), IDF1 69.573→70.009

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- bytetrack/sdp Optuna result: 58.753 (was 56.115 before i10-i11)
- New optimal params include oru_threshold=14, q_scale/r_scale/p_scale all ~10x lower

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- q_scale 0.00246→0.000202, r_scale 0.292→0.0441, p_scale 7.34→0.731 (tighter Kalman — trust measurements more)
- oru_threshold 5→14, velocity_decay 0.817→0.774, q_miss_alpha 0.461→0.282
- stage2_iou_threshold 0.05→0.233, lost_track_buffer 62→52, p_reset_threshold 13→26
- HOTA 57.813→58.753 (+1.30%)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- Confidence boost in Hungarian cost: solver_iou *= (1 + w * conf[det])
- Neutral at all tested defaults (0.0–0.5); added to Optuna search space [0.0, 1.0]
- IDSW improved 297→293 at w=0.3 but HOTA regressed; w=0.1 exactly neutral

---
Co-authored-by: Claude Code <noreply@anthropic.com>
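The confidence boost multiplies each detection's column of the solver matrix by (1 + w · conf). A minimal sketch with a hypothetical function name; with equal IoU, the higher-confidence detection now ranks first, and w = 0 leaves the matrix unchanged:

```python
from typing import List, Sequence


def conf_boosted(
    solver_sim: Sequence[Sequence[float]], confs: Sequence[float], conf_cost_weight: float
) -> List[List[float]]:
    """Scale each detection column d by (1 + w * confs[d]); applied to the solver matrix only."""
    return [[s * (1.0 + conf_cost_weight * confs[d]) for d, s in enumerate(row)] for row in solver_sim]
```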
- Mature-track-only stage-2: only tracks with >= N updates participate in low-conf recovery
- Neutral at N=0,1; regresses at N>=2 — ghost exclusion hurts legitimate young tracks
- Added to Optuna search space [0, 5] for future joint optimisation

---
Co-authored-by: Claude Code <noreply@anthropic.com>
@Borda Borda force-pushed the bemch/auto-research branch from 699e62f to 1bc1138 Compare April 3, 2026 21:50
Borda and others added 5 commits April 4, 2026 00:22
…disabled)

- Add _giou_matrix() helper and giou_blend param to ByteTrackTracker stage-1 cost
- giou_blend=0.0 default keeps metric at 58.753 (best found 0.32 gave +0.092%, below 0.1% threshold)
- Add giou_blend to search_space.json [0.0, 1.0] and optimize_tracking.py wiring
- Fix best_config.json trailing newline

---
Co-authored-by: Claude Code <noreply@anthropic.com>
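GIoU extends IoU with a penalty based on the smallest box enclosing both inputs, so non-overlapping pairs still get a graded score instead of a flat IoU of 0. The PR does not show how giou_blend mixes the two, so the blend below, which rescales GIoU from [-1, 1] to [0, 1] before a linear mix, is an assumption.

```python
from typing import Sequence, Tuple

Box = Sequence[float]  # (x1, y1, x2, y2)


def iou_and_giou(a: Box, b: Box) -> Tuple[float, float]:
    """IoU and Generalized IoU for two axis-aligned boxes with positive area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union
    # Smallest box enclosing both; GIoU penalises the part of it not covered by the union
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    giou = iou - (hull - union) / hull
    return iou, giou


def blended_similarity(a: Box, b: Box, giou_blend: float) -> float:
    """Hypothetical blend: mix IoU with GIoU rescaled from [-1, 1] to [0, 1]."""
    iou, giou = iou_and_giou(a, b)
    return (1.0 - giou_blend) * iou + giou_blend * (giou + 1.0) / 2.0
```

With giou_blend > 0, a near-miss prediction scores higher than a distant one even though both have IoU 0.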
…earch)

- 1000-trial Optuna search over expanded search space (new: conf_cost_weight, stage2_min_updates, giou_blend)
- HOTA 58.753→58.862 (+0.185%), IDSW 297→269 (-9.4%)
- Key changes: high_conf_det_threshold 0.608→0.795, oru_threshold 14→0, Kalman looser (q_scale/r_scale ~14x), minimum_consecutive_frames 2→1, stage2_min_updates 5, giou_blend 0.396, conf_cost_weight 0.170

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- HOTA 58.862→58.961 (+0.168%), IDSW 269→266, IDF1 71.365→71.730
- Optuna search was capped at stage2_min_updates≤5; manual scan found peak at 12 (cliff at 14+)
- Widen search_space.json high: 5→15 so future guard runs can explore the full range

---
Co-authored-by: Claude Code <noreply@anthropic.com>
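The stage2_min_updates case, where the optimum sat beyond the searched range, suggests a simple post-study check for boundary-hugging parameters, in the spirit of the earlier search_space.json expansion. A hypothetical sketch; the schema and the 5% tolerance are arbitrary choices:

```python
from typing import Any, Dict, List


def boundary_hugging(best: Dict[str, float], space: Dict[str, Dict[str, Any]], tol: float = 0.05) -> List[str]:
    """Return parameters whose best value lies within tol * range of either bound.

    Such parameters hint that the optimum may lie outside the searched range.
    Margins are linear; log-scale ranges would need the margin computed in log space.
    """
    flagged = []
    for name, value in best.items():
        low, high = space[name]["low"], space[name]["high"]
        margin = tol * (high - low)
        if value <= low + margin or value >= high - margin:
            flagged.append(name)
    return flagged
```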
---
Co-authored-by: Claude Code <noreply@anthropic.com>
- HOTA 58.961→59.031 (+0.119%), IDSW 266→262, IDF1 71.730→71.852
- max_interpolation_gap 45→48 (Optuna undershoot, true peak at 48)
- giou_blend 0.3963→0.42 (refined from 0.396 Optuna result)
- velocity_decay 0.827→0.82 (slight tightening of decay)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
Borda and others added 3 commits April 4, 2026 12:13
… vel=0.82)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- HOTA 59.031→59.092 (+0.103%), IDSW 262→259, IDF1 71.852→71.993
- minimum_iou_threshold 0.1545→0.146 (Optuna undershoot near discontinuity)
- p_scale 1.756→2.5, q_scale 0.002819→0.003 (Kalman covariance fine-tuning)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…5, q_scale=0.003)

- bytetrack/sdp HOTA 59.031→59.092 (+0.103%): i19 default params confirmed
- Guard passed: bytetrack 59.031 (-0.000%), sort -0.000%, ocsort -0.208% (all within 0.5% threshold)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
@Borda Borda force-pushed the bemch/auto-research branch 2 times, most recently from adbf00a to acb15b0 Compare April 4, 2026 21:57
- Add Journal › ByteTrack section: 10-row experiment table (kept iterations), collapsed descriptions block, code features table, failed experiments list, key lesson
- Fill SDP + autotrack + Optuna row: HOTA 59.092, IDF1 71.993, MOTA 66.977, IDSW 259

---
Co-authored-by: Claude Code <noreply@anthropic.com>
@Borda Borda force-pushed the bemch/auto-research branch from 15fce38 to cf03dd4 Compare April 4, 2026 22:35