# Releases: BlockRunAI/runcode
## v2.5.29 — AskUser Fix

### What's new

**Fix: AskUser tool no longer crashes runcode**
When the agent used the AskUser tool in Ink UI mode, runcode crashed. Root cause: `readline.createInterface` on stdin conflicted with Ink's raw-mode ownership of stdin.
Fix: Added an `onAskUser` callback to `ExecutionScope` and `AgentConfig`. In Ink UI mode, AskUser questions are now routed through a proper Ink dialog (a cyan question box with `TextInput`) instead of readline.
What it looks like now:
```
╭─ Question ────────────────────────────
│ What language should I use?
│ 1. TypeScript
│ 2. Python
│ 3. Go
╰───────────────────────────────────────
answer>
```
The main input box is blocked while the dialog is active (same pattern as the permission dialog).
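The routing described above can be sketched roughly as follows. Only `onAskUser` and `AgentConfig` are named in the release notes; every other identifier here is a hypothetical stand-in:

```typescript
// Minimal sketch, assuming hypothetical names beyond onAskUser/AgentConfig.
type AskUserFn = (question: string, options: string[]) => Promise<string>;

interface AgentConfig {
  onAskUser?: AskUserFn; // injected by the Ink UI layer when it owns stdin
}

// Prefer the injected callback so the tool never opens readline while Ink
// holds stdin in raw mode; otherwise fall back to a non-interactive default.
async function askUser(
  cfg: AgentConfig,
  question: string,
  options: string[],
): Promise<string> {
  if (cfg.onAskUser) {
    return cfg.onAskUser(question, options); // Ink dialog path
  }
  // Piped/scripted fallback: Ink is not running, so stdin is uncontested.
  return options[0] ?? "";
}
```

The key design point: the tool never touches stdin directly when a UI callback is present, so there is no second reader competing for the terminal.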
### Also in this release
- AskUser added to auto-allow list — no permission prompt before asking a question
- Non-interactive fallback preserved for piped/scripted mode
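The auto-allow check amounts to a set-membership test; a minimal sketch, where the set's name and any members beyond AskUser are assumptions:

```typescript
// Hypothetical auto-allow list; only AskUser is confirmed by the release notes.
const AUTO_ALLOWED_TOOLS = new Set(["AskUser"]);

// Tools on the list skip the permission dialog entirely, so asking the user
// a question never itself triggers a "may I run this?" prompt.
function needsPermissionPrompt(toolName: string): boolean {
  return !AUTO_ALLOWED_TOOLS.has(toolName);
}
```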
## v2.4.0 — Context Compression (15-40% Token Savings)

### 7-Layer Context Compression
Integrated BlockRun's compression library directly into the agent loop. It saves 15-40% of tokens automatically:
| Layer | Method | Savings |
|---|---|---|
| 1 | Deduplication | 2-5% |
| 2 | Whitespace normalization | 3-8% |
| 3 | Dictionary encoding (41 codes) | 4-8% |
| 4 | Path shortening | 1-3% |
| 5 | JSON compaction | 2-4% |
| 6 | Observation compression | 15-97% |
| 7 | Dynamic codebook | 5-15% |
How it works:
- Runs automatically when conversation has >10 messages
- Adds a header explaining codes to the model
- Layer 6 (observation) is the biggest win — summarizes large tool results to ~300 chars
- All layers are LLM-safe (model can still understand compressed text)
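The shape of such a pipeline can be sketched as below. These layer bodies are illustrative stand-ins, not BlockRun's actual library, and only the >10-message gate and the ~300-char observation cap come from the notes above:

```typescript
// Illustrative layered pass; each layer is a pure string transform.
type Layer = (text: string) => string;

const dedupeLines: Layer = (t) => [...new Set(t.split("\n"))].join("\n"); // Layer 1
const normalizeWhitespace: Layer = (t) => t.replace(/[ \t]+/g, " ");      // Layer 2
const compressObservation: Layer = (t) =>                                 // Layer 6
  t.length > 300 ? t.slice(0, 300) + "...[truncated]" : t;

function compressContext(
  messages: string[],
  layers: Layer[] = [dedupeLines, normalizeWhitespace, compressObservation],
): { messages: string[]; savings: number } {
  // Gate: short conversations are left untouched.
  if (messages.length <= 10) return { messages, savings: 0 };
  const before = messages.join("").length;
  const out = messages.map((m) => layers.reduce((acc, layer) => layer(acc), m));
  const after = out.join("").length;
  return { messages: out, savings: 1 - after / before };
}
```

Running the layers in order matters: cheap lossless passes go first, and the lossy observation summarization (the biggest win) applies last, only to oversized tool results.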
This is the same compression used by BlockRun's backend, now running client-side for immediate token reduction.
## v2.3.2

## v2.3.1

## v2.3.0 — Token Management Overhaul

### Token Reduction Improvements

Based on a deep comparison with Claude Code's token management:
#### Smarter Pipeline
- Microcompact only runs when history >15 messages (was every loop)
- Circuit breaker: 3 failures → stop retrying compaction
- Token estimates padded by 33% to stay conservative (previously under-counted)
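The three rules above reduce to a couple of small checks; a sketch, with constant names and the chars-per-token heuristic as assumptions:

```typescript
// Hypothetical constants; the 15-message floor, 3-failure breaker, and 33%
// padding come from the release notes.
const MICROCOMPACT_MIN_MESSAGES = 15;
const MAX_COMPACTION_FAILURES = 3;   // circuit breaker threshold
const ESTIMATE_PADDING = 1.33;       // pad estimates 33% to stay conservative

function shouldMicrocompact(messageCount: number, failures: number): boolean {
  // Skip short histories entirely, and stop retrying after repeated failures.
  return messageCount > MICROCOMPACT_MIN_MESSAGES &&
         failures < MAX_COMPACTION_FAILURES;
}

function estimateTokens(charCount: number): number {
  // Rough ~4 chars/token heuristic, padded so we over- rather than under-count.
  return Math.round((charCount / 4) * ESTIMATE_PADDING);
}
```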
#### Selective Thinking
- Keeps last 2 turns' thinking blocks (was only latest)
- Preserves recent reasoning while reducing old bloat
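Keeping the last two turns' thinking is a backwards walk over the history; a sketch, where the message shape is an assumption:

```typescript
// Hypothetical message shape for illustration.
interface Msg {
  role: "user" | "assistant";
  text: string;
  thinking?: string;
}

function pruneThinking(history: Msg[], keepTurns = 2): Msg[] {
  let kept = 0;
  // Walk newest-to-oldest so the most recent assistant turns keep their thinking.
  return history
    .slice()
    .reverse()
    .map((m) => {
      if (m.role !== "assistant" || !m.thinking) return m;
      if (kept < keepTurns) {
        kept++;
        return m;
      }
      return { ...m, thinking: undefined }; // strip older reasoning bloat
    })
    .reverse();
}
```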
#### Per-Model Budgets
- Default max_tokens: 8K → 16K
- Model-specific caps: Opus 32K, Sonnet 64K, Haiku 16K, etc.
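A per-model cap lookup might look like this; the table values come from the notes above, while the lookup shape and substring matching on model ids are assumptions:

```typescript
// Caps per the release notes; matching strategy is hypothetical.
const MODEL_MAX_TOKENS: Record<string, number> = {
  opus: 32_000,
  sonnet: 64_000,
  haiku: 16_000,
};
const DEFAULT_MAX_TOKENS = 16_000; // raised from 8K in this release

function maxTokensFor(modelId: string): number {
  const key = Object.keys(MODEL_MAX_TOKENS).find((k) => modelId.includes(k));
  return key ? MODEL_MAX_TOKENS[key] : DEFAULT_MAX_TOKENS;
}
```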
#### Cheaper Compaction
- Compaction model tiers down further: Haiku → Gemini Flash
- Free models can serve as the compaction target
### New: /tokens command

```
Estimated: ~45,200 tokens (API-anchored)
Context: 200k window (22.6% used)
Messages: 47
Tool results: 12 (340KB)
Thinking: 3 blocks
✓ Healthy
```
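The 22.6% figure in the sample output is just the estimate divided by the 200k window; a quick sketch of the math:

```typescript
// 45,200 / 200,000 = 0.226, formatted to one decimal place.
function contextUsage(estimatedTokens: number, windowTokens: number): string {
  return `${((estimatedTokens / windowTokens) * 100).toFixed(1)}% used`;
}
```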
## v2.2.7

## v2.2.6
- Proxy: parse SSE JSON properly for token extraction (previously regex-based)
- LLM: wallet cache refreshes after a 30-minute TTL
- Grep: native fallback handles nested glob patterns better
- Optimize: clock skew protection on time-based cleanup
- WebFetch: strip SVG and form elements from HTML
- Config: version fallback updated to 2.0.0
- Index: cleaner default command detection
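The SSE fix above swaps regex scraping for parsing each `data:` line as JSON. A minimal sketch; the `usage.total_tokens` field name follows the common OpenAI-style response shape and is an assumption here:

```typescript
// Parse SSE "data:" lines as JSON instead of regexing the raw stream.
function extractTotalTokens(sseChunk: string): number | null {
  for (const line of sseChunk.split("\n")) {
    if (!line.startsWith("data:")) continue; // ignore comments/keep-alives
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") continue;
    try {
      const parsed = JSON.parse(payload);
      const total = parsed?.usage?.total_tokens;
      if (typeof total === "number") return total;
    } catch {
      // Incomplete JSON across chunk boundaries: skip rather than misparse.
    }
  }
  return null;
}
```

Parsing the payload as JSON handles nested braces and escaped quotes that a regex silently mangles.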