tl;dr
ML inference stack. WASM + WebGPU. one build, runs everywhere uwu
status: WIP
The Rune
ᚲ - kenaz (torch)
- meaning: torch, knowledge, illumination
- element: controlled fire
- transforms weights into thought
- the fire that runs the mind
What This Is
COMPLETE ML INFERENCE STACK:
- written from scratch
- WASM + WASI 0.3 core
- WebGPU compute
- optimized for ONE architecture
- runs EVERYWHERE
NOT:
- pytorch wrapper
- cuda dependent
- onnx runtime
- generic framework
The Stack
core/ - WASM tensor ops, memory, INT4
gpu/ - WebGPU compute shaders (WGSL)
arch/ - the ONE target model architecture, nothing else
runtime/ - WASI 0.3 async, streaming
api/ - OpenAI-compatible endpoint
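to make the zero-copy claim concrete: a minimal sketch of what core/'s tensor-over-arena design could look like. names here (Tensor, Arena, DType) are illustrative, not the actual API:

```rust
// illustrative sketch only -- not the actual core/ API.
// a tensor is a view into arena memory: shape + dtype + offset, no owned buffer.

#[derive(Clone, Copy, Debug)]
enum DType {
    F32,
    F16,
    Int4, // packed, two values per byte
}

#[derive(Debug)]
struct Tensor {
    shape: [usize; 4], // fixed rank for simplicity
    dtype: DType,
    offset: usize,     // byte offset into the arena
    len: usize,        // element count
}

/// bump allocator over one slab: alloc is a pointer bump, free is "reset all"
struct Arena {
    buf: Vec<u8>,
    used: usize,
}

impl Arena {
    fn new(capacity: usize) -> Self {
        Self { buf: vec![0; capacity], used: 0 }
    }

    fn alloc(&mut self, bytes: usize) -> usize {
        let offset = self.used;
        self.used += bytes;
        assert!(self.used <= self.buf.len(), "arena exhausted");
        offset
    }

    /// reclaim everything at once, e.g. between forward passes
    fn reset(&mut self) {
        self.used = 0;
    }
}

fn main() {
    let mut arena = Arena::new(1 << 20); // 1 MiB slab
    // a 128x128 INT4 tensor: two elements per byte
    let elems = 128 * 128;
    let t = Tensor {
        shape: [1, 1, 128, 128],
        dtype: DType::Int4,
        offset: arena.alloc(elems / 2),
        len: elems,
    };
    println!("{t:?} (arena used: {} bytes)", arena.used);
}
```

views into the same arena share the slab, which is what makes slicing and reshaping zero-copy.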
Why WASM + WebGPU
ONE BUILD RUNS:
- browser (chrome/firefox/safari)
- native (wasmtime + wgpu)
- runix (full GPU access)
- edge (cloudflare workers)
- embedded (wherever WASI runs)
NO:
- separate cuda build
- separate metal build
- separate cpu build
- platform-specific code
- dependency hell
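why one build is enough: wgpu (the rust WebGPU implementation that wasmtime-hosted and browser builds both sit on) picks the backend at runtime. a minimal sketch; exact signatures vary between wgpu releases (this follows roughly the 0.20 API), and pollster is used only to block on the async adapter request:

```rust
// sketch: runtime backend selection via wgpu.
// the same code path lands on vulkan, metal, dx12, or the browser's
// WebGPU at runtime -- no per-platform builds, no cfg(target) forks.

fn main() {
    pollster::block_on(async {
        // Backends::all() lets wgpu probe whatever the host actually has
        let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
            backends: wgpu::Backends::all(),
            ..Default::default()
        });
        let adapter = instance
            .request_adapter(&wgpu::RequestAdapterOptions::default())
            .await
            .expect("no GPU adapter found");
        println!("running on: {:?}", adapter.get_info().backend);
    });
}
```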
Tech Choices
CORE: rust → wasm32-wasip2
COMPUTE: WGSL (WebGPU native)
ASYNC: WASI 0.3 (component model)
GPU: WebGPU → vulkan/metal/dx12
MEMORY: arena allocator, zero-copy
QUANT: INT4 symmetric
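what INT4 symmetric means in practice: one scale per group, weights clamped to a signed 4-bit range, two weights packed per byte. the sketch below uses the common absmax variant with range [-7, 7]; whether uwu.ᚲ uses this exact variant (group size, range, per-channel vs per-group) is an assumption:

```rust
// one common INT4 symmetric scheme (absmax scaling), as an illustration:
//   scale = max|w| / 7
//   q     = clamp(round(w / scale), -7, 7)
// two 4-bit two's-complement values pack into one byte.

fn quantize_int4(weights: &[f32]) -> (Vec<u8>, f32) {
    let max_abs = weights.iter().fold(0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };

    let mut packed = vec![0u8; (weights.len() + 1) / 2];
    for (i, w) in weights.iter().enumerate() {
        let q = (w / scale).round().clamp(-7.0, 7.0) as i8; // [-7, 7]
        let nibble = (q as u8) & 0x0F; // low 4 bits, two's complement
        if i % 2 == 0 {
            packed[i / 2] = nibble;
        } else {
            packed[i / 2] |= nibble << 4;
        }
    }
    (packed, scale)
}

fn dequantize_nibble(byte: u8, high: bool, scale: f32) -> f32 {
    let nibble = if high { byte >> 4 } else { byte & 0x0F };
    // sign-extend the 4-bit two's-complement value to i8
    let q = ((nibble << 4) as i8) >> 4;
    q as f32 * scale
}

fn main() {
    let w = [0.42_f32, -1.3, 0.0, 0.7];
    let (packed, scale) = quantize_int4(&w);
    for (i, orig) in w.iter().enumerate() {
        let deq = dequantize_nibble(packed[i / 2], i % 2 == 1, scale);
        println!("{orig:+.3} -> {deq:+.3}");
    }
}
```

symmetric means zero maps exactly to zero, so dequant is a single multiply: cheap in a WGSL kernel.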
Philosophy
TRADITIONAL:
- support every model
- support every gpu
- support every platform
- RESULT: bloated, slow, complex
uwu.ᚲ:
- ONE architecture
- ONE codebase
- ALL platforms via WASM
- RESULT: fast, simple, optimized
Status
WIP uwu
PHASES:
1. foundation (tensors, memory, ops)
2. GPU (WebGPU shaders)
3. attention (MLA, KV-cache)
4. MoE (expert routing; sketch after this list)
5. full model (all layers)
6. vision (image encoder)
7. API (OpenAI-compatible)
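for a feel of phase 4's expert routing: the textbook top-k softmax router. the actual gating function, k, and load-balancing strategy are not specified here:

```rust
// hedged sketch of top-k expert routing: score all experts, keep the k
// best, softmax-normalize their gates. illustrative, not uwu.ᚲ's router.

/// pick the k highest-scoring experts and softmax-normalize their gates
fn route_top_k(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].total_cmp(&logits[a])); // descending
    idx.truncate(k);

    // softmax over the selected logits only
    let max = idx.iter().map(|&i| logits[i]).fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = idx.iter().map(|&i| (logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    idx.into_iter().zip(exps).map(|(i, e)| (i, e / sum)).collect()
}

fn main() {
    // one token's gate logits over 8 experts
    let logits = [0.1, 2.3, -0.5, 1.7, 0.0, 0.2, -1.0, 0.9];
    for (expert, weight) in route_top_k(&logits, 2) {
        println!("expert {expert}: gate {weight:.3}");
    }
}
```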
“the torch that transforms weights into thought”
uwu.ᚲ kenaz - the torch