Add `autotrack` — autonomous MOT tracker optimization loop [rebase&merge] by Borda · Pull Request #346 · roboflow/trackers

Borda · 2026-04-03T07:14:23Z

Summary

MOT tracker quality depends on two largely independent axes: algorithm design and hyperparameter tuning. Most published improvements conflate them — a well-tuned weaker algorithm routinely beats a poorly-tuned stronger one, making it hard to isolate what actually matters. This PR separates the axes by adding autotrack/, an autonomous optimization loop for SORT, ByteTrack, and OC-SORT on MOT17.

The goal is both practical (better trackers, reproducible tuning) and scientific (the experiment log — including every reverted change — is itself a research artifact).

Approach

Three progressive layers build on each other:

Layer 1 — SOTA trackers with solid defaults. The existing trackers/core/ implementations of SORT, ByteTrack, and OC-SORT are already competitive out of the box. This layer is the foundation; autotrack/ does not replace it.

Layer 2 — Optuna extracts the best from the existing parameter surface. optimize_tracking.py runs an Optuna study over the tracker's exposed hyperparameters (Kalman noise scales, confidence thresholds, buffer sizes). No code changes — pure tuning. FRCNN results gain 1–2.5 HOTA points; SDP gains 2–4 points. This layer alone is useful as a standalone tuning tool and can be adopted without running the agent loop.

Layer 3 — autotrack goes beyond tuning by making algorithmic improvements. This is the novel contribution. An autonomous agent iterates over structural code changes (state representation, association strategy, camera motion compensation, Kalman mechanics), measures HOTA at fixed default parameters after each change, keeps improvements, and reverts regressions. Optuna acts as a second-pass validator after each kept change to confirm the improvement is real and not a tuning artifact. The iteration log is JSONL and captures every attempt, kept or reverted.
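The keep/revert cycle described for Layer 3 can be sketched roughly as follows. This is a minimal illustration, not the code in `program.md` or `optimize_tracking.py`: `run_campaign`, `evaluate`, `min_gain`, and the JSONL field names are all hypothetical, and `evaluate` stands in for "apply the change, run the tracker at default parameters, return HOTA".

```python
import json
from typing import Callable, Iterable


def run_campaign(
    candidates: Iterable[str],
    evaluate: Callable[[str], float],
    baseline_hota: float,
    log_path: str,
    min_gain: float = 0.0,
) -> float:
    """Greedy keep/revert loop: a change is kept only if HOTA improves."""
    best = baseline_hota
    with open(log_path, "a", encoding="utf-8") as log:
        for change in candidates:
            # Hypothetical stand-in for: apply change, evaluate at default params.
            hota = evaluate(change)
            kept = hota > best + min_gain
            # Every attempt is logged as one JSON object per line, kept or not.
            record = {"change": change, "hota": hota, "kept": kept}
            log.write(json.dumps(record) + "\n")
            if kept:
                best = hota
            # A real loop would revert the code change here when kept is False.
    return best
```

The Optuna second pass after each kept change is deliberately left out of this sketch; it would sit inside the `if kept:` branch.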

Human defines:  research question · metric · hard boundaries
Agent decides:  what to change · what to try next

Two tools govern the loop:

Tool	Role
`optimize_tracking.py --n-trials 1`	Campaign metric — default params, clean code-change signal
`optimize_tracking.py --n-trials N`	Optuna study — warm-starts from `best_config.json`, validates tuned ceiling

The agent is explicitly permitted to update optimize_tracking.py as the tracker architecture evolves — adding parameters that newly exist, removing ones absorbed into the implementation, tightening search ranges as knowledge accumulates.

Benchmarks

MOT17-val, full 7-sequence eval. Defaults = fixed params from default_config.json, no tuning. +Optuna = n=500 trials. +autotrack + Optuna = in progress.

FRCNN public detections (bundled, no GPU)

Config	ByteTrack	OC-SORT	SORT
Defaults (HOTA)	50.36	49.69	49.95
+ Optuna (HOTA)	51.76	52.22	51.49
+ autotrack + Optuna (HOTA)	(pending)	(pending)	(pending)

SDP public detections (bundled, no GPU)

Config	ByteTrack	OC-SORT	SORT
Defaults (HOTA)	53.94	53.35	53.22
+ Optuna (HOTA)	56.12	57.75	56.08
+ autotrack + Optuna (HOTA)	(pending)	(pending)	(pending)

Estimated ceiling with code improvements + Optuna on FRCNN: ~61.9 HOTA (vs ~56.0 for tuning alone), derived from the DetA/AssA decomposition — DetA is bounded by the detector (~0.57–0.62 for FRCNN), but AssA has substantial headroom from ~0.55 to ~0.65 via better association logic.
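As a rough illustration of the DetA/AssA reasoning: HOTA at a single localization threshold is the geometric mean of DetA and AssA (Luiten et al.), and the reported score averages over thresholds, so applying the geometric mean to aggregate DetA and AssA values is only an approximation. The exact inputs behind the ~61.9 and ~56.0 figures are not stated here; the sketch below simply evaluates the endpoints of the quoted ranges, and `approx_hota` is a hypothetical helper name.

```python
import math


def approx_hota(det_a: float, ass_a: float) -> float:
    """Approximate HOTA (in points) as the geometric mean of DetA and AssA."""
    return 100.0 * math.sqrt(det_a * ass_a)


if __name__ == "__main__":
    # Endpoints of the DetA and AssA ranges quoted in the text.
    for det_a in (0.57, 0.62):
        for ass_a in (0.55, 0.65):
            print(f"DetA={det_a:.2f} AssA={ass_a:.2f} -> {approx_hota(det_a, ass_a):.1f}")
```

With DetA held fixed, moving AssA from 0.55 to 0.65 raises the approximation by several points, which is the headroom the estimate attributes to better association logic.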

Hard guarantees

Three invariants are enforced by program.md and cannot be relaxed by the agent:

No GT leakage. The tracker sees only det/det.txt. gt/gt.txt is never accessed at inference time.
Reproducible detections. FRCNN and SDP detections are bundled with the MOT17 benchmark. Generated detections (RF-DETR, YOLO World X) are written to content-addressed sibling directories before any agent run — they are frozen inputs, not live inference.
Metrics via trackers.eval only. trackers/eval/ is out of scope for agent edits. The metric computation is identical across all iterations; the agent cannot move the goalposts.
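One way to picture "content-addressed" detection directories is to derive the directory suffix from a hash of the frozen detection file, so any change to the detections yields a different directory. The sketch below is an assumption about the general idea only: `content_tag` and `frozen_dir_name` are hypothetical names, and the actual scheme in `generate_detections.py` may key on something else (for example the detector configuration).

```python
import hashlib


def content_tag(data: bytes, length: int = 12) -> str:
    """Short, deterministic tag derived from the bytes of a detection file."""
    return hashlib.sha256(data).hexdigest()[:length]


def frozen_dir_name(base: str, data: bytes) -> str:
    """Hypothetical sibling-directory name: same bytes always map to the same name."""
    return f"{base}-{content_tag(data)}"
```

Because the name depends only on the bytes, an agent run cannot silently swap in different detections without the directory name changing.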

Quick start

# 1. Install the optimize dependency group
uv sync --group optimize

# 2. Download MOT17-val (bundled detections, no GPU needed)
trackers download mot17 --split val --asset annotations,detections

# 3. Baseline: measure defaults
cd autotrack
uv run python optimize_tracking.py bytetrack frcnn --n-trials 1   # ~50 HOTA

# 4. Tune: Optuna over the parameter surface
uv run python optimize_tracking.py bytetrack frcnn --n-trials 500  # ~52 HOTA

To run the autonomous agent loop, point any coding agent at program.md:

claude
> Read program.md and start the experiment loop.

References

Bewley et al., SORT, ICIP 2016
Zhang et al., ByteTrack, ECCV 2022
Cao et al., OC-SORT, CVPR 2023
Luiten et al., HOTA, IJCV 2021
Akiba et al., Optuna, KDD 2019

- experiments/program.md: autoresearch contract covering the research question, HOTA≥60 target, hard boundaries, and 7 research starting points (Kalman P/R init, two-threshold association, velocity attenuation, etc.)
- experiments/optimize_tracking.py: Optuna-based metric runner; n_trials=1 evaluates defaults; multi-core via multiprocessing+SQLite; agent updates search space as architecture evolves
- experiments/README.md: motivation, approach, target analysis (HOTA ceiling derivation), pre-flight checks, references
- pyproject.toml: add `optimize` dependency group (optuna[rdb], fire)

---
Co-authored-by: Claude Code <noreply@anthropic.com>

…usion

autotrack/optimize_tracking.py
- --det-tag TAG CLI arg: overrides the directory suffix for any custom detector without touching _DET_SOURCE_TO_TAG; _validate_args and _resolve_sequences both accept it
- Multiprocessing progress bar: replaced pool.starmap with starmap_async + a polling loop that loads the SQLite study every 2 s and feeds a Rich Progress bar showing completed trials and live best HOTA (mirrors the existing single-worker callback approach)
- Module docstring updated with --det-tag usage example

autotrack/README.md
- Fixed cd experiments → cd autotrack; old --tracker sort --fast → positional syntax
- YOLO section replaced with YOLOX section (correct weights filename)
- RF-DETR section added as a standalone step
- New Custom detections section: dir layout, MOT format, --det-tag usage
- Pre-flight checks table updated (removed API key row, fixed commands)
- Fixed /optimize campaign experiments/ → autotrack/
- Fixed broken Files table row for optimize_tracking.py

autotrack/program.md
- generate_detections.py added to scope_files
- Weights filename corrected (yolox_x.pth → bytetrack_x_mot17.pth.tar)
- RF-DETR and custom detector quickstart notes added below pre-flight table

---
Co-authored-by: Claude Code <noreply@anthropic.com>

- generate_detections.py: remove YOLOX backend (loader, predictor, frame processing); add YOLO-World via inference-models with center→top-left coord conversion; rename rfdetr-l → rfdetr/l to match yolo_world/l slash notation
- optimize_tracking.py: swap yolox→yoloworld in _DET_SOURCE_TO_TAG; extract _run_parallel_study; fix multiline ternaries to if/else; use setattr() for dynamic Kalman attrs (mypy); pass >3 args as kwargs
- best_config.json: drop broken yolox entry (HOTA=7.7); add real Optuna results for yoloworld, rfdetr, dpm across all three trackers
- pyproject.toml: remove YOLOX git source + no-build-isolation; add inference-models>=0.19.0

---
Co-authored-by: Claude Code <noreply@anthropic.com>

- search_space.json: expand 16 boundary-hugging parameters across all three trackers (lost_track_buffer, track_activation_threshold, minimum_iou_threshold, high_conf_det_threshold, q_scale/r_scale/p_scale, velocity_decay, q_miss_alpha, max_interpolation_gap, p_reset_threshold, direction_consistency_weight); add log=true to lost_track_buffer (all trackers) and minimum_iou_threshold (all trackers)
- optimize_tracking.py: pass log= to suggest_int so log-scale int parameters are respected
- best_config.json: bytetrack/rfdetr updated to HOTA 45.08 from new run
- uv.lock: regenerated after yolox removal

---
Co-authored-by: Claude Code <noreply@anthropic.com>

…mation (ORU)

- Add oru_enabled parameter to ByteTrackKalmanBoxTracker: on re-detection after occlusion, replay virtual predict+update cycles along linearly interpolated trajectory to re-estimate velocity
- Expose oru_enabled in optimize_tracking.py _build_tracker and _define_search_space
- Add oru_enabled to default_config.json and search_space.json

---
Co-authored-by: Claude Code <noreply@anthropic.com>
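The "linearly interpolated trajectory" used for the ORU replay can be pictured with a small sketch. It only generates the virtual observations for the missed frames; feeding each one through the Kalman filter's predict and update steps is not shown, and `virtual_observations` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration rather than the ByteTrackKalmanBoxTracker API.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def virtual_observations(last_box: Box, new_box: Box, gap: int) -> List[Box]:
    """Linearly interpolate boxes for the frames missed during an occlusion.

    `gap` is the number of frames between the last real observation and the
    re-detection, so gap - 1 intermediate boxes are returned.
    """
    if gap < 1:
        raise ValueError("gap must be >= 1")
    boxes: List[Box] = []
    for k in range(1, gap):
        t = k / gap  # fraction of the way from last_box to new_box
        boxes.append(tuple(a + (b - a) * t for a, b in zip(last_box, new_box)))
    return boxes
```

In a replay, each returned box would be treated as if it had been observed on its frame, so that the filter's velocity estimate reflects the straight-line path to the re-detection instead of the drifted prediction.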

…0.05)

- Add stage2_iou_threshold=0.05 param to ByteTrackTracker; stage-1 keeps minimum_iou_threshold=0.1
- Lower stage-2 threshold recovers more low-confidence detections without breaking high-conf stage
- Expose to Optuna via search_space.json; add to default_config.json and optimize_tracking.py

---
Co-authored-by: OpenAI Codex <codex@openai.com>

…larity

- Add iou_age_weight=0.03: scale stage-1 IoU similarity by 1/(1+w*lost_frames) for each track
- Biases Hungarian assignment toward recently-seen tracks; reduces stale-prediction false matches
- iou_age_weight=0.03 is active at default params; Optuna range [0.0, 0.2] log-scale

---
Co-authored-by: Claude Code <noreply@anthropic.com>

- Apply age discount only to cost matrix (not threshold check): raw IoU used for min-threshold gate, discount only biases solver assignment toward active tracks
- Tighten Optuna search range [0.0, 0.2] -> [0.0, 0.1]
- Fix pre-existing bug: optimize_tracking.py final re-eval now applies _apply_kalman_patch

---
Co-authored-by: Claude Code <noreply@anthropic.com>
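The split described in these two commits, a discounted similarity for the solver and a raw-IoU gate for acceptance, can be sketched as below. This is an illustration under assumptions rather than the code in `trackers/core/bytetrack/tracker.py`: `age_discounted_inputs` is a hypothetical helper, rows are tracks, columns are detections, and the assignment solver itself is omitted.

```python
from typing import List, Sequence, Tuple


def age_discounted_inputs(
    iou: Sequence[Sequence[float]],
    lost_frames: Sequence[int],
    weight: float = 0.03,
    min_iou: float = 0.1,
) -> Tuple[List[List[float]], List[List[bool]]]:
    """Return (similarity for the solver, raw-IoU gate) for a track x detection matrix."""
    similarity: List[List[float]] = []
    gate: List[List[bool]] = []
    for row, lost in zip(iou, lost_frames):
        # Tracks that have been lost longer get a smaller scale factor.
        scale = 1.0 / (1.0 + weight * lost)
        similarity.append([value * scale for value in row])
        # The threshold check uses the undiscounted IoU.
        gate.append([value >= min_iou for value in row])
    return similarity, gate
```

With two tracks that overlap the same detection equally, the one seen more recently keeps the higher similarity and wins the assignment, while a stale track is not rejected by the gate merely because it is stale.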

Apply Optuna-found parameter values as new defaults: lost_track_buffer 30→62, track_activation_threshold 0.7→0.314, q_scale 0.01→0.00246, r_scale 0.1→0.292, p_scale 1.0→7.34, velocity_decay 0.95→0.817, q_miss_alpha 0.1→0.461, max_interpolation_gap 20→30, p_reset_threshold 5→13; HOTA 56.781→57.424 (+1.13%) --- Co-authored-by: Claude Code <noreply@anthropic.com>

Copilot

Pull request overview

This PR introduces the new autotrack/ workflow for autonomous + Optuna-based optimization of MOT17 trackers, and updates core tracker internals to support additional post-processing and association/Kalman behaviors that the optimization loop can tune and validate.

Changes:

Added autotrack/ tooling: Optuna runner (optimize_tracking.py), detection generation (generate_detections.py), visualization utilities, and configuration/artifact files (default_config.json, search_space.json, best_config.json, program.md).
Extended ByteTrack and SORT utilities with new association / Kalman mechanics and MOT-gap interpolation.
Added an optimize dependency group and adjusted repo formatting/ignore configs to support the new workflow.

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 7 comments.

File	Description
trackers/core/sort/utils.py	Adds MOT-format short-gap interpolation helper used by autotrack evaluation output.
trackers/core/bytetrack/tracker.py	Adds stage-2 IoU threshold and IoU age discount for stage-1 ranking; updates association gating logic.
trackers/core/bytetrack/kalman.py	Adds velocity decay, miss-noise inflation, P-reset, and ORU mechanics to ByteTrack Kalman tracker.
README.md	Badge formatting change (single-line).
pyproject.toml	Adds `optimize` dependency group and uv git source for onnx-simplifier.
docs/trackers/ocsort.md	Reflowed paragraph formatting.
docs/trackers/comparison.md	Reflowed admonition formatting.
CODE_OF_CONDUCT.md	Reflowed paragraph formatting.
autotrack/visualize_detections.py	New utility to render MOT detections on frames.
autotrack/search_space.json	New Optuna parameter search space definitions per tracker.
autotrack/README.md	New documentation for the autotrack workflow and benchmarks.
autotrack/program.md	New campaign contract/spec for the autonomous optimization loop.
autotrack/optimize_tracking.py	New Optuna study runner + evaluation harness using `trackers.eval`.
autotrack/generate_detections.py	New script to generate MOT17 detections via RF-DETR / YOLO-World backends.
autotrack/default_config.json	New baseline/default parameter set for `--n-trials 1` runs.
autotrack/best_config.json	New committed “best known” tuned configs used for warm-starting/guarding.
.pre-commit-config.yaml	mdformat configured with `--wrap=no` (drives markdown reflow behavior).
.gitignore	Adjusts ignores (including `.python-version`) and adds autotrack output/cache patterns.

…ecovery

Short occlusions (1-4 frames) are handled well by velocity decay alone; ORU trajectory replay is beneficial only for longer gaps where velocity has drifted.

HOTA 57.424→57.813 (+0.686%), IDF1 69.573→70.009

---
Co-authored-by: Claude Code <noreply@anthropic.com>

- bytetrack/sdp Optuna result: 58.753 (was 56.115 before i10-i11)
- New optimal params include oru_threshold=14, q_scale/r_scale/p_scale all ~10x lower

---
Co-authored-by: Claude Code <noreply@anthropic.com>

- q_scale 0.00246→0.000202, r_scale 0.292→0.0441, p_scale 7.34→0.731 (tighter Kalman: trust measurements more)
- oru_threshold 5→14, velocity_decay 0.817→0.774, q_miss_alpha 0.461→0.282
- stage2_iou_threshold 0.05→0.233, lost_track_buffer 62→52, p_reset_threshold 13→26
- HOTA 57.813→58.753 (+1.30%)

---
Co-authored-by: Claude Code <noreply@anthropic.com>

- Confidence boost in Hungarian cost: solver_iou *= (1 + w * conf[det])
- Neutral at all tested defaults (0.0–0.5); added to Optuna search space [0.0, 1.0]
- IDSW improved 297→293 at w=0.3 but HOTA regressed; w=0.1 exactly neutral

---
Co-authored-by: Claude Code <noreply@anthropic.com>

- Mature-track-only stage-2: only tracks with >= N updates participate in low-conf recovery
- Neutral at N=0,1; regresses at N>=2: ghost exclusion hurts legitimate young tracks
- Added to Optuna search space [0, 5] for future joint optimisation

---
Co-authored-by: Claude Code <noreply@anthropic.com>

…disabled)

- Add _giou_matrix() helper and giou_blend param to ByteTrackTracker stage-1 cost
- giou_blend=0.0 default keeps metric at 58.753 (best found 0.32 gave +0.092%, below 0.1% threshold)
- Add giou_blend to search_space.json [0.0, 1.0] and optimize_tracking.py wiring
- Fix best_config.json trailing newline

---
Co-authored-by: Claude Code <noreply@anthropic.com>

…earch)

- 1000-trial Optuna search over expanded search space (new: conf_cost_weight, stage2_min_updates, giou_blend)
- HOTA 58.753→58.862 (+0.185%), IDSW 297→269 (-9.4%)
- Key changes: high_conf_det_threshold 0.608→0.795, oru_threshold 14→0, Kalman looser (q_scale/r_scale ~14x), minimum_consecutive_frames 2→1, stage2_min_updates 5, giou_blend 0.396, conf_cost_weight 0.170

---
Co-authored-by: Claude Code <noreply@anthropic.com>

- HOTA 58.862→58.961 (+0.168%), IDSW 269→266, IDF1 71.365→71.730
- Optuna search was capped at stage2_min_updates≤5; manual scan found peak at 12 (cliff at 14+)
- Widen search_space.json high: 5→15 so future guard runs can explore the full range

---
Co-authored-by: Claude Code <noreply@anthropic.com>

--- Co-authored-by: Claude Code <noreply@anthropic.com>

- HOTA 58.961→59.031 (+0.119%), IDSW 266→262, IDF1 71.730→71.852
- max_interpolation_gap 45→48 (Optuna undershoot, true peak at 48)
- giou_blend 0.3963→0.42 (refined from 0.396 Optuna result)
- velocity_decay 0.827→0.82 (slight tightening of decay)

---
Co-authored-by: Claude Code <noreply@anthropic.com>

… vel=0.82) --- Co-authored-by: Claude Code <noreply@anthropic.com>

- HOTA 59.031→59.092 (+0.103%), IDSW 262→259, IDF1 71.852→71.993
- minimum_iou_threshold 0.1545→0.146 (Optuna undershoot near discontinuity)
- p_scale 1.756→2.5, q_scale 0.002819→0.003 (Kalman covariance fine-tuning)

---
Co-authored-by: Claude Code <noreply@anthropic.com>

…5, q_scale=0.003)

- bytetrack/sdp HOTA 59.031→59.092 (+0.103%): i19 default params confirmed
- Guard passed: bytetrack 59.031 (-0.000%), sort -0.000%, ocsort -0.208% (all within 0.5% threshold)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
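The guard check reported in this commit appears to compare each tracker's HOTA before and after a change and tolerate regressions up to a relative threshold. A minimal sketch of that arithmetic, with hypothetical function names and the 0.5% threshold taken from the message above, might look like this:

```python
from typing import Dict


def relative_change_pct(before: float, after: float) -> float:
    """Relative change in percent; negative values are regressions."""
    return 100.0 * (after - before) / before


def guard_passes(
    before: Dict[str, float],
    after: Dict[str, float],
    max_regression_pct: float = 0.5,
) -> bool:
    """True if no tracker regresses by more than max_regression_pct percent."""
    return all(
        relative_change_pct(before[name], after[name]) >= -max_regression_pct
        for name in before
    )
```

For example, `relative_change_pct(59.031, 59.092)` rounds to the +0.103% quoted for the ByteTrack/SDP step.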

- Add Journal › ByteTrack section: 10-row experiment table (kept iterations), collapsed descriptions block, code features table, failed experiments list, key lesson
- Fill SDP + autotrack + Optuna row: HOTA 59.092, IDF1 71.993, MOTA 66.977, IDSW 259

---
Co-authored-by: Claude Code <noreply@anthropic.com>

Borda and others added 13 commits April 2, 2026 13:51

experiment(optimize/i2): velocity decay beta=0.95 during lost frames

fbe0b8d

experiment(optimize/i3): Q inflation on missed frames, alpha=0.1

93261e7

experiment(optimize/i9B): post-processing gap interpolation max_gap=20

6c680c8

experiment(optimize/i11): covariance reset on re-detection after occl…

34b3702

…usion

Copilot AI review requested due to automatic review settings April 3, 2026 07:14

Borda marked this pull request as draft April 3, 2026 07:14

Borda added the enhancement New feature or request label Apr 3, 2026

Copilot started reviewing on behalf of Borda April 3, 2026 07:15

Copilot AI reviewed Apr 3, 2026

Borda force-pushed the bemch/auto-research branch from 07f7488 to a62024a Compare April 3, 2026 08:36

Borda and others added 5 commits April 3, 2026 12:54

guard/i11: persist Optuna best_config update (HOTA 58.753)

ba7584a

Borda force-pushed the bemch/auto-research branch from 699e62f to 1bc1138 Compare April 3, 2026 21:50

Borda and others added 5 commits April 4, 2026 00:22

guard/i17: persist best_config with stage2_min_updates=12 (HOTA 58.961)

971c0f4

Borda and others added 3 commits April 4, 2026 12:13

guard/i18: persist best_config HOTA 59.031 (max_interp=48, giou=0.42,…

e19de58

… vel=0.82) --- Co-authored-by: Claude Code <noreply@anthropic.com>

Borda force-pushed the bemch/auto-research branch 2 times, most recently from adbf00a to acb15b0 Compare April 4, 2026 21:57

Borda force-pushed the bemch/auto-research branch from 15fce38 to cf03dd4 Compare April 4, 2026 22:35

fix(pre_commit): 🎨 auto format pre-commit hooks

34d440c

Borda wants to merge 28 commits into develop from bemch/auto-research
