gpucomm/core

Low-level GPU compute runtime for Apple Silicon, focused on memory, synchronization, and data movement using Metal.

Goals

Minimal runtime: device/queue/pipeline + buffer utilities
Kernel experiments: communication patterns + memory behavior
Built-in benchmarks: bandwidth, latency, scaling

Dependencies

Apple frameworks: Metal (compute), plus Foundation for CLI/utilities
License: MIT (see LICENSE)
Workflows: docs/workflows.md

Reliability On Hardware

This repo treats reliability on real hardware as the product of two things: measurable experiments and correctness tests.

What’s solid now:

Experiments are measurable: bandwidth, transfer, scan, matmul, and latency benchmarks, plus sweeps, repeated runs (--reps), and p50/p95 percentiles
Correctness checks exist where they matter (scan, matmul for small sizes, plus gpucomm selftest)
CI keeps it buildable and the CLI usable (swift build -c release + gpucomm --help)

Caveats:

GitHub Actions macOS runners aren’t Apple Silicon machines you control, so CI can’t validate “real” Metal timings; it only checks that the project builds and that basic CLI behavior works
“Reliability on hardware” still depends on running these benches on target machines and tracking regressions (record chip/macOS version + commit + outputs)
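
Tracking regressions comes down to comparing a new measurement against a recorded baseline with some tolerance. The sketch below is one way to do that by hand for a lower-is-better metric such as a latency p50. The function name `exceeds_baseline` and its arguments are hypothetical; nothing in this repo provides such a command.

```shell
# Hypothetical helper, not part of gpucomm.
# exceeds_baseline BASELINE CURRENT TOLERANCE_PCT
# Exits 0 when CURRENT is more than TOLERANCE_PCT percent above BASELINE,
# and exits 1 otherwise. Intended for lower-is-better metrics (times).
exceeds_baseline() {
  awk -v base="$1" -v cur="$2" -v tol="$3" \
    'BEGIN { exit !(cur > base * (1 + tol / 100)) }'
}
```

For a higher-is-better metric such as bandwidth, swap the baseline and current arguments so the check fires when the new value has dropped. Take the baseline and the current value from runs on the same chip and macOS version, since timings from different machines are not comparable.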

Repro / Reporting

When you post results (issues/comments), include:

Hardware: chip + GPU (and whether on battery/low-power mode)
OS/tooling: macOS version + Xcode/Swift version
Repo state: commit SHA + command line + --format json/jsonl output

Quick metadata + sanity check:

git rev-parse HEAD
sw_vers
xcodebuild -version
system_profiler SPHardwareDataType | head -n 30
system_profiler SPDisplaysDataType | head -n 80

swift build -c release
./.build/release/gpucomm selftest --format json
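
To attach the metadata to an issue as a single file, the commands above can be wrapped in a small script. This is a minimal sketch: the `section` helper and the `report.txt` file name are assumptions made for illustration, and the `head` truncation used above is left out for brevity.

```shell
# Hypothetical helper: print a labeled section with a command's output,
# or a note when the tool is missing (for example when not on macOS).
section() {
  printf '## %s\n' "$*"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || printf '(exit status %s)\n' "$?"
  else
    printf '(%s not found)\n' "$1"
  fi
  printf '\n'
}

# Collect everything into one file; report.txt is an arbitrary name.
{
  section git rev-parse HEAD
  section sw_vers
  section xcodebuild -version
  section system_profiler SPHardwareDataType
  section system_profiler SPDisplaysDataType
} > report.txt
```

Each section records the exact command that produced it, so a reader of the report can rerun the same command on their own machine.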

Example benchmark report (JSONL, p50/p95 via --reps):

./.build/release/gpucomm bench transfer-sweep --sizes-kib 1,4,64 --iters 5000 --warmup 200 --reps 5 --direction both --mode both --format jsonl
./.build/release/gpucomm bench bandwidth-sweep --sizes-mib 1,4,16,64 --iters 200 --reps 5 --mode private --format jsonl
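
When reading p50/p95 from a small number of repetitions, it helps to know how a percentile behaves on so few samples. The sketch below uses the nearest-rank definition, which is an assumption: this README does not state which definition gpucomm uses. The `percentile` function is hypothetical and reads one number per line from stdin.

```shell
# Hypothetical helper, not part of gpucomm.
# percentile P: print the nearest-rank P-th percentile (integer P, 1..100)
# of the numbers read from stdin, one per line.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

Under this definition, five samples give a p50 equal to the third-smallest value and a p95 equal to the largest value, because ceil(0.95 × 5) = 5. With --reps 5, p95 would therefore be the worst repetition, so more repetitions are needed before p95 says anything about the tail.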

Roadmap Progress

Primary tracking issue: #1

| Milestone | Roadmap Comment | Commit |
| --- | --- | --- |
| Transfer benchmark | #1 (comment) | `eefc5bb` |
| Scan (1024) | #1 (comment) | `93d5627` |
| Scan (multi-block) | #1 (comment) | `c7c9f9e` |
| Matmul (naive+tiled) | #1 (comment) | `d42298c` |
| Matmul sweep | #1 (comment) | `b70fde7` |
| Matmul tiled variants | #1 (comment) | `731c33f` |
| Output formats (`--format`) | #1 (comment) | `7d30ac8` |
| Scan sweep | #1 (comment) | `9be8a34` |
| Bandwidth sweep | #1 (comment) | `37fdb5b` |
| Transfer sweep | #1 (comment) | `f000bc7` |
| Percentiles for sweeps | #1 (comment) | `9054a8b` |
| `--reps` for single benches | #1 (comment) | `c637e3b` |
| macOS CI build | #1 (comment) | `466795e` |
| CI help smoke | #1 (comment) | `25c6f84` |
| Latency benchmark | #1 (comment) | `8382a5c` |
| Hardware selftest | #1 (comment) | `c101034` |

Build

swift build -c release

Run

.build/release/gpucomm bench bandwidth --size-mib 64 --iters 200 --mode shared
.build/release/gpucomm bench bandwidth --size-mib 64 --iters 200 --mode private
.build/release/gpucomm bench bandwidth-sweep --sizes-mib 1,4,16,64 --iters 200 --mode private --format jsonl
.build/release/gpucomm bench scan --n 1024 --iters 200 --warmup 20
.build/release/gpucomm bench scan --n 65536 --iters 50 --warmup 10
.build/release/gpucomm bench scan-sweep --ns 1024,4096,65536,1048576 --iters 50 --warmup 10 --format jsonl
.build/release/gpucomm bench latency --kind kernel --iters 2000 --warmup 200 --reps 5 --format json
.build/release/gpucomm bench matmul --m 256 --n 256 --k 256 --iters 50 --warmup 10 --variant tiled16
.build/release/gpucomm bench matmul-sweep --m 512 --n 512 --k 512 --iters 10 --warmup 3
.build/release/gpucomm bench transfer --size-kib 4 --iters 10000 --warmup 100 --direction h2d --mode private --strategy blit
.build/release/gpucomm bench transfer --size-kib 4 --iters 10000 --warmup 100 --direction d2h --mode private --strategy blit --format json
.build/release/gpucomm bench transfer-sweep --sizes-kib 1,4,64 --iters 5000 --warmup 200 --direction both --mode both --format jsonl
.build/release/gpucomm run reduction --n 1024
.build/release/gpucomm selftest

Layout

Sources/GPUCommCore: runtime + benchmarks
Sources/GPUCommCore/Resources/Kernels: Metal kernels (compiled at runtime)
Sources/gpucomm: CLI
