Overview
Extend the IL kernel generator's SIMD unary operation support beyond the current Negate/Abs/Sqrt.
Parent issue: #545
Current State
| Operation | SIMD Status | Implementation |
|---|---|---|
| Negate | ✅ SIMD | Vector256.op_UnaryNegation |
| Abs | ✅ SIMD | Vector256.Abs() |
| Sqrt | ✅ SIMD | Vector256.Sqrt() |
| Floor | ❌ Scalar | Math.Floor per-element |
| Ceil | ❌ Scalar | Math.Ceiling per-element |
| Round | ❌ Scalar | Math.Round per-element |
| Exp | ❌ Scalar | Math.Exp per-element |
| Log | ❌ Scalar | Math.Log per-element |
| Sin/Cos/Tan | ❌ Scalar | Math.Sin/Cos/Tan per-element |
SIMD eligibility check: ILKernelGenerator.cs:2086
Vector operation dispatch: ILKernelGenerator.cs:2980-3014
Task List
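Tier 1: Quick Wins (Vector256 methods exist in .NET)
- SIMD Floor: Vector256.Floor(), CanUseUnarySimd() eligibility, EmitUnaryVectorOperation() dispatch
- SIMD Ceiling: Vector256.Ceiling()
- SIMD Truncate: Vector256.Truncate()
Tier 2: Medium Effort
- Vector256.Round() or composition
Tier 3: Transcendentals (Complex)
- SIMD Exp/Log (research): Vector256.Exp() in .NET BCL
- SIMD Sin/Cos/Tan (research)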
Implementation Details
Floor/Ceil (Tier 1)
// In CanUseUnarySimd (line ~2086), add:
|| key.Op == UnaryOp.Floor
|| key.Op == UnaryOp.Ceil
// In EmitUnaryVectorOperation (line ~2980), add.
// Vector256.Floor/Ceiling only have float and double overloads, so GetMethod
// returns null for any other element type.
case UnaryOp.Floor:
    var floorMethod = typeof(Vector256).GetMethod("Floor",
        new[] { typeof(Vector256<>).MakeGenericType(GetClrType(type)) });
    il.EmitCall(OpCodes.Call, floorMethod, null);
    break;
case UnaryOp.Ceil:
    var ceilMethod = typeof(Vector256).GetMethod("Ceiling",
        new[] { typeof(Vector256<>).MakeGenericType(GetClrType(type)) });
    il.EmitCall(OpCodes.Call, ceilMethod, null);
    break;
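Round (Tier 2) - Composition Sketch
If Vector256.Round() is not available on the target framework, one possible composition is the add-and-subtract 2^52 trick, which yields round-half-to-even (the default behavior of Math.Round). The sketch below is an illustration only: RoundSketch and RoundHalfEven are hypothetical names, not part of ILKernelGenerator, and it covers double only.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class RoundSketch
{
    // Hypothetical helper: round-half-to-even for doubles, built only from
    // add/subtract/compare/bitwise vector operations.
    public static Vector256<double> RoundHalfEven(Vector256<double> x)
    {
        var magic = Vector256.Create(4503599627370496.0); // 2^52
        var signBit = Vector256.Create(-0.0);              // only the sign bit set

        var abs = Vector256.Abs(x);

        // Doubles in [2^52, 2^53) have a spacing of 1, so adding 2^52 rounds
        // the fraction away using the default round-to-nearest-even mode.
        var rounded = (abs + magic) - magic;

        // Values with |x| >= 2^52 are already integral. NaN fails the
        // comparison and is passed through unchanged.
        var small = Vector256.LessThan(abs, magic);
        var magnitude = Vector256.ConditionalSelect(small, rounded, abs);

        // Restore the original sign (inputs such as -0.3 produce -0.0).
        return magnitude | (x & signBit);
    }
}
```

The trick relies on IEEE 754 arithmetic being evaluated without reassociation, so it would need to be re-checked if emitted through a different code path. A float variant would use 2^23 instead of 2^52.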
Transcendentals (Tier 3) - Research Notes
Exp approximation approach:
// Exp(x) via range reduction + polynomial
// 1. Clamp x to avoid overflow
// 2. n = round(x / ln2), r = x - n*ln2
// 3. exp(r) ≈ polynomial (|r| < ln2/2)
// 4. result = 2^n * exp(r)
public static Vector256<float> Exp(Vector256<float> x)
{
var ln2 = Vector256.Create(0.693147180559945f);
var invLn2 = Vector256.Create(1.44269504088896f);
// ... polynomial coefficients ...
}
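The four steps above can be filled in as follows. This is a minimal sketch under stated assumptions, not a tuned implementation: the polynomial uses plain Taylor coefficients 1/k! up to degree 6 (an assumption, a minimax fit would likely do better), round() is approximated as floor(v + 0.5), and inputs are clamped to [-87, 88] so results saturate instead of reaching 0 or infinity. ExpSketch is a hypothetical class name.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class ExpSketch
{
    public static Vector256<float> Exp(Vector256<float> x)
    {
        var ln2 = Vector256.Create(0.693147180559945f);
        var invLn2 = Vector256.Create(1.44269504088896f);

        // 1. Clamp so that n stays in [-126, 127] and 2^n is a normal float.
        x = Vector256.Min(Vector256.Max(x, Vector256.Create(-87.0f)),
                          Vector256.Create(88.0f));

        // 2. n = round(x / ln2), computed here as floor(x / ln2 + 0.5),
        //    then r = x - n * ln2 with |r| close to or below ln2 / 2.
        var n = Vector256.Floor(x * invLn2 + Vector256.Create(0.5f));
        var r = x - n * ln2;

        // 3. exp(r) by Horner evaluation of the degree-6 Taylor polynomial
        //    (coefficients 1/k! are an assumption of this sketch).
        var p = Vector256.Create(1.0f / 720.0f);
        p = p * r + Vector256.Create(1.0f / 120.0f);
        p = p * r + Vector256.Create(1.0f / 24.0f);
        p = p * r + Vector256.Create(1.0f / 6.0f);
        p = p * r + Vector256.Create(0.5f);
        p = p * r + Vector256.Create(1.0f);
        p = p * r + Vector256.Create(1.0f);

        // 4. Build 2^n by writing the biased exponent (n + 127) into the
        //    exponent field of an IEEE 754 single, then scale.
        var biased = Vector256.ConvertToInt32(n) + Vector256.Create(127);
        var pow2n = Vector256.ShiftLeft(biased, 23).AsSingle();
        return p * pow2n;
    }
}
```

Because r is computed with a single float ln2 constant, the relative error grows with |x|. Splitting ln2 into a high and a low part is a common refinement and would be worth evaluating before comparing accuracy against Math.Exp.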
Files to Modify
| File | Changes |
|---|---|
| ILKernelGenerator.cs:2086 | Add Floor/Ceil/Round to eligibility |
| ILKernelGenerator.cs:2980 | Add vector operation dispatch |
| SimdKernels.cs (optional) | C# fallback implementations |
Benchmarks
[Benchmark] public NDArray Floor_10M() => np.floor(_array);
[Benchmark] public NDArray Ceil_10M() => np.ceil(_array);
[Benchmark] public NDArray Exp_10M() => np.exp(_array);
[Benchmark] public NDArray Sin_10M() => np.sin(_array);
NumPy Baseline (10M float64)
| Operation | NumPy Time |
|---|---|
| np.floor | ~8 ms |
| np.ceil | ~8 ms |
| np.exp | ~20 ms |
| np.sin | ~50 ms |
Success Criteria
- Floor/Ceil SIMD: ≥1.5× faster than current scalar
- All existing unary tests pass
- No accuracy regression vs scalar implementation
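
One way to check the "no accuracy regression" criterion for Floor/Ceil is to compare the vector results lane by lane against Math.Floor and Math.Ceiling, including signed zero, NaN, and infinities. The helper below is a hypothetical standalone sketch (FloorCeilCheck is not an existing test class) and does not go through the IL kernel generator itself.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class FloorCeilCheck
{
    // Counts lanes where Vector256.Floor/Ceiling disagree with the scalar
    // Math.Floor/Math.Ceiling results. Trailing elements that do not fill a
    // whole vector are ignored in this sketch.
    public static int CountMismatches(double[] values)
    {
        int width = Vector256<double>.Count;
        int mismatches = 0;
        for (int i = 0; i + width <= values.Length; i += width)
        {
            var v = Vector256.Create(values, i);
            var floor = Vector256.Floor(v);
            var ceil = Vector256.Ceiling(v);
            for (int lane = 0; lane < width; lane++)
            {
                double x = values[i + lane];
                if (!Same(floor.GetElement(lane), Math.Floor(x))) mismatches++;
                if (!Same(ceil.GetElement(lane), Math.Ceiling(x))) mismatches++;
            }
        }
        return mismatches;
    }

    // Bitwise equality so that +0.0 and -0.0 are distinguished; any NaN is
    // treated as equal to any other NaN.
    private static bool Same(double a, double b) =>
        (double.IsNaN(a) && double.IsNaN(b)) ||
        BitConverter.DoubleToInt64Bits(a) == BitConverter.DoubleToInt64Bits(b);
}
```

A call such as FloorCeilCheck.CountMismatches(samples) returning 0 over a representative sample would support the criterion. The same pattern could be extended to float and to Round once those paths exist.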