gpucomm/core

Low-level GPU compute runtime for Apple Silicon, focused on memory, synchronization, and data movement using Metal.

Goals

  • Minimal runtime: device/queue/pipeline + buffer utilities
  • Kernel experiments: communication patterns + memory behavior
  • Built-in benchmarks: bandwidth, latency, scaling

Dependencies

  • Apple frameworks: Metal (compute), plus Foundation for CLI/utilities
  • License: MIT (see LICENSE)
  • Workflows: docs/workflows.md

Reliability On Hardware

This repo aims for reliability on real hardware, built from measurable experiments plus correctness tests.

What’s solid now:

  • Experiments are measurable: bandwidth/transfer/scan/matmul/latency + sweeps + --reps + p50/p95
  • Correctness checks exist where they matter (scan, matmul for small sizes, plus gpucomm selftest)
  • CI keeps it buildable and the CLI usable (swift build -c release + gpucomm --help)
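
The CI check in the last bullet can be approximated locally with a short script. This is a sketch, not the repo's actual workflow file: the `smoke_check` helper name is made up for illustration, and it only confirms that a binary starts and exits cleanly on `--help`.

```shell
# Hypothetical helper: succeed only if "<binary> --help" exits with status 0.
smoke_check() {
    bin="$1"
    if "$bin" --help >/dev/null 2>&1; then
        echo "ok: $bin --help"
    else
        echo "FAIL: $bin --help" >&2
        return 1
    fi
}

# Usage (build first, as CI does):
#   swift build -c release && smoke_check ./.build/release/gpucomm
```

A passing check says nothing about Metal correctness or timing; that is what `gpucomm selftest` and the benches are for.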

Caveats:

  • GitHub Actions macOS runners aren’t Apple Silicon GPUs you control, so CI can’t validate “real” Metal timings; it only checks the build and basic CLI behavior
  • “Reliability on hardware” still depends on running these benches on target machines and tracking regressions (record chip/macOS version + commit + outputs)
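
Tracking regressions comes down to comparing a new measurement against a recorded baseline, with some tolerance for run-to-run noise. A minimal sketch, assuming a higher-is-better metric such as bandwidth; the `check_regression` name and the tolerance argument are illustrative choices, not something the repo defines.

```shell
# Hypothetical helper: compare a current measurement against a baseline.
# Args: baseline current tolerance_percent (higher value = better).
# Prints "ok" or "regression"; returns 1 on regression.
check_regression() {
    awk -v base="$1" -v cur="$2" -v tol="$3" 'BEGIN {
        floor = base * (1 - tol / 100)   # lowest acceptable value
        if (cur < floor) { print "regression"; exit 1 }
        print "ok"
    }'
}

# Usage: check_regression <baseline> <current> <tolerance_percent>
```

For lower-is-better metrics such as latency, the comparison would need to be flipped. Comparisons only make sense between runs recorded on the same chip and macOS version.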

Repro / Reporting

When you post results (issues/comments), include:

  • Hardware: chip + GPU (and whether on battery/low-power mode)
  • OS/tooling: macOS version + Xcode/Swift version
  • Repo state: commit SHA + command line + --format json/jsonl output

Quick metadata + sanity check:

git rev-parse HEAD
sw_vers
xcodebuild -version
system_profiler SPHardwareDataType | head -n 30
system_profiler SPDisplaysDataType | head -n 80

swift build -c release
./.build/release/gpucomm selftest --format json
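
The metadata commands above can be bundled so every report carries the same fields. The sketch below writes them to one text file; the `one_line` and `collect_meta` names and the `key: value` layout are assumptions for illustration, and any command that is missing or fails is recorded as `unavailable` rather than aborting.

```shell
# Hypothetical helper: run a command and print its output on a single line,
# or "unavailable" if the command is missing or exits non-zero.
one_line() {
    if out_text=$("$@" 2>/dev/null); then
        printf '%s\n' "$out_text" | tr '\n' ' '
        echo
    else
        echo "unavailable"
    fi
}

# Hypothetical helper: write commit and toolchain metadata to the given file.
collect_meta() {
    {
        echo "commit: $(one_line git rev-parse HEAD)"
        echo "macos: $(one_line sw_vers)"
        echo "xcode: $(one_line xcodebuild -version)"
    } > "$1"
}

# Usage: collect_meta report-meta.txt
```

The `system_profiler` output is long and multi-line, so it is probably easier to attach it in full than to flatten it this way.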

Example benchmark report (JSONL, p50/p95 via --reps):

./.build/release/gpucomm bench transfer-sweep --sizes-kib 1,4,64 --iters 5000 --warmup 200 --reps 5 --direction both --mode both --format jsonl
./.build/release/gpucomm bench bandwidth-sweep --sizes-mib 1,4,16,64 --iters 200 --reps 5 --mode private --format jsonl
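
For readers unfamiliar with p50/p95, the sketch below computes percentiles from a handful of per-repetition timings using the nearest-rank method. Whether gpucomm uses nearest-rank or an interpolating definition is not stated here, so treat this as an illustration of the idea only; the `percentile` helper and the sample values are made up.

```shell
# Hypothetical helper: nearest-rank percentile.
# Args: p (integer, 1..100) followed by one or more numeric samples.
percentile() {
    p="$1"; shift
    printf '%s\n' "$@" | LC_ALL=C sort -n | awk -v p="$p" '
        { v[NR] = $1 }
        END {
            rank = int((p * NR + 99) / 100)   # ceil(p/100 * NR) for integer p
            if (rank < 1) rank = 1
            print v[rank]
        }'
}

percentile 50 12.1 11.8 12.4 30.2 12.0   # prints 12.1 (median of five samples)
percentile 95 12.1 11.8 12.4 30.2 12.0   # prints 30.2 (the outlier sets the tail)
```

With only five repetitions, p95 is simply the slowest run, which is worth keeping in mind when reading `--reps 5` output.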

Roadmap Progress

Primary tracking issue: #1

Milestone                  Roadmap Comment  Commit
Transfer benchmark         #1 (comment)     eefc5bb
Scan (1024)                #1 (comment)     93d5627
Scan (multi-block)         #1 (comment)     c7c9f9e
Matmul (naive+tiled)       #1 (comment)     d42298c
Matmul sweep               #1 (comment)     b70fde7
Matmul tiled variants      #1 (comment)     731c33f
Output formats (--format)  #1 (comment)     7d30ac8
Scan sweep                 #1 (comment)     9be8a34
Bandwidth sweep            #1 (comment)     37fdb5b
Transfer sweep             #1 (comment)     f000bc7
Percentiles for sweeps     #1 (comment)     9054a8b
--reps for single benches  #1 (comment)     c637e3b
macOS CI build             #1 (comment)     466795e
CI help smoke              #1 (comment)     25c6f84
Latency benchmark          #1 (comment)     8382a5c
Hardware selftest          #1 (comment)     c101034

Build

swift build -c release

Run

.build/release/gpucomm bench bandwidth --size-mib 64 --iters 200 --mode shared
.build/release/gpucomm bench bandwidth --size-mib 64 --iters 200 --mode private
.build/release/gpucomm bench bandwidth-sweep --sizes-mib 1,4,16,64 --iters 200 --mode private --format jsonl
.build/release/gpucomm bench scan --n 1024 --iters 200 --warmup 20
.build/release/gpucomm bench scan --n 65536 --iters 50 --warmup 10
.build/release/gpucomm bench scan-sweep --ns 1024,4096,65536,1048576 --iters 50 --warmup 10 --format jsonl
.build/release/gpucomm bench latency --kind kernel --iters 2000 --warmup 200 --reps 5 --format json
.build/release/gpucomm bench matmul --m 256 --n 256 --k 256 --iters 50 --warmup 10 --variant tiled16
.build/release/gpucomm bench matmul-sweep --m 512 --n 512 --k 512 --iters 10 --warmup 3
.build/release/gpucomm bench transfer --size-kib 4 --iters 10000 --warmup 100 --direction h2d --mode private --strategy blit
.build/release/gpucomm bench transfer --size-kib 4 --iters 10000 --warmup 100 --direction d2h --mode private --strategy blit --format json
.build/release/gpucomm bench transfer-sweep --sizes-kib 1,4,64 --iters 5000 --warmup 200 --direction both --mode both --format jsonl
.build/release/gpucomm run reduction --n 1024
.build/release/gpucomm selftest
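
When comparing bandwidth runs by hand, it can help to convert a buffer size, iteration count, and elapsed time into GiB/s. The sketch below shows that arithmetic. The `gib_per_s` helper is hypothetical, the elapsed time is a made-up input rather than real output, and the calculation counts each buffer once per iteration, which may differ from how gpucomm accounts for reads and writes.

```shell
# Hypothetical helper: effective throughput in GiB/s.
# Args: size_mib iters elapsed_seconds
gib_per_s() {
    awk -v mib="$1" -v iters="$2" -v secs="$3" 'BEGIN {
        gib = mib * iters / 1024          # total payload in GiB
        printf "%.2f\n", gib / secs
    }'
}

gib_per_s 64 200 0.25   # 12.5 GiB over 0.25 s; prints 50.00
```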

Layout

  • Sources/GPUCommCore: runtime + benchmarks
  • Sources/GPUCommCore/Resources/Kernels: Metal kernels (compiled at runtime)
  • Sources/gpucomm: CLI
