uwu-kenaz-101

2026-01-29 17:00


tl;dr

ML inference stack. WASM + WebGPU. one build, runs everywhere uwu

status: WIP


The Rune

ᚲ - kenaz (torch)
- meaning: torch, knowledge, illumination
- element: controlled fire
- transforms weights into thought
- the fire that runs the mind

What This Is

COMPLETE ML INFERENCE STACK:
- written from scratch
- WASM + WASI 0.3 core
- WebGPU compute
- ONE architecture, optimized
- runs EVERYWHERE

NOT:
- pytorch wrapper
- cuda-dependent
- onnx runtime
- generic framework

The Stack

core/     - WASM tensor ops, memory, INT4
gpu/      - WebGPU compute shaders (WGSL; sketched below)
arch/     - target architecture ONLY
runtime/  - WASI 0.3 async, streaming
api/      - OpenAI-compatible endpoint
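
a feel for the gpu/ layer: one elementwise WGSL kernel, held as a Rust constant. a minimal sketch; the shader, binding layout, and workgroup size are illustrative assumptions, not the actual source.

```rust
// gpu/: a minimal elementwise add kernel in WGSL, embedded as a Rust constant.
// binding layout and workgroup size are illustrative assumptions.
const ADD_WGSL: &str = r#"
@group(0) @binding(0) var<storage, read>       a:   array<f32>;
@group(0) @binding(1) var<storage, read>       b:   array<f32>;
@group(0) @binding(2) var<storage, read_write> out: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let i = gid.x;
    // the final workgroup is padded; guard against out-of-bounds lanes
    if (i < arrayLength(&out)) {
        out[i] = a[i] + b[i];
    }
}
"#;
```

the same WGSL runs in the browser and under native wgpu unchanged. that is the whole trick.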

Why WASM + WebGPU

ONE BUILD RUNS:
- browser (chrome/firefox/safari)
- native (wasmtime + wgpu)
- runix (full GPU access)
- edge (cloudflare workers)
- embedded (wherever WASI runs)

NO:
- separate cuda build
- separate metal build
- separate cpu build
- platform-specific code
- dependency hell

Tech Choices

CORE:     rust → wasm32-wasip2
COMPUTE:  WGSL (WebGPU native)
ASYNC:    WASI 0.3 (component model)
GPU:      WebGPU → vulkan/metal/dx12
MEMORY:   arena allocator, zero-copy
QUANT:    INT4 symmetric
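
making QUANT concrete: a minimal Rust sketch of symmetric INT4. one f32 scale per group, zero-point pinned at zero, two weights per byte. group size, nibble order, and the -7..=7 clamp are assumptions, not the actual format.

```rust
/// quantize one group of f32 weights to symmetric INT4.
/// symmetric = zero-point fixed at 0, so dequant is just q * scale.
fn quantize_int4(group: &[f32]) -> (f32, Vec<u8>) {
    let max_abs = group.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };

    // two 4-bit values per byte, low nibble first (packing order is an assumption)
    let q = |w: f32| (((w / scale).round().clamp(-7.0, 7.0) as i8) & 0x0F) as u8;
    let packed = group
        .chunks(2)
        .map(|pair| q(pair[0]) | (pair.get(1).map_or(0, |&w| q(w)) << 4))
        .collect();
    (scale, packed)
}

/// dequantize back to f32 (e.g. for checking round-trip error)
fn dequantize_int4(scale: f32, packed: &[u8], n: usize) -> Vec<f32> {
    let mut out = Vec::with_capacity(n);
    for &byte in packed {
        for nib in [byte & 0x0F, byte >> 4] {
            if out.len() == n { break; }
            let q = ((nib << 4) as i8) >> 4; // sign-extend the low 4 bits
            out.push(q as f32 * scale);
        }
    }
    out
}
```

symmetric means dequant is a single multiply, and INT4 weights are 8x smaller than f32 (plus one scale per group).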

Philosophy

TRADITIONAL:
- support every model
- support every gpu
- support every platform
- RESULT: bloated, slow, complex

uwu.ᚲ:
- ONE architecture
- ONE codebase
- ALL platforms via WASM
- RESULT: fast, simple, optimized

Status

WIP uwu

PHASES:
1. foundation (tensors, memory, ops)
2. GPU (WebGPU shaders)
3. attention (MLA, KV-cache)
4. MoE (expert routing)
5. full model (all layers)
6. vision (image encoder)
7. API (OpenAI-compatible)
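
what phase 7 speaks on the wire: the chat-completions schema. field names follow the public OpenAI API; the structs are a sketch of how they might be modeled here, not actual project code.

```rust
// minimal shapes for POST /v1/chat/completions, the OpenAI-compatible
// endpoint. field names follow the public schema; the structs themselves
// are a sketch (assumes the serde crate with its derive feature).
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct ChatRequest {
    model: String,
    messages: Vec<Message>,
    #[serde(default)]
    stream: bool,              // true = stream tokens as server-sent events
    #[serde(default)]
    max_tokens: Option<u32>,
}

#[derive(Serialize, Deserialize)]
struct Message {
    role: String,              // "system" | "user" | "assistant"
    content: String,
}

#[derive(Serialize)]
struct ChatResponse {
    id: String,
    object: &'static str,      // always "chat.completion"
    model: String,
    choices: Vec<Choice>,
}

#[derive(Serialize)]
struct Choice {
    index: u32,
    message: Message,
    finish_reason: String,     // "stop" | "length"
}
```

match this schema and any existing OpenAI client points at the endpoint unchanged.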

“the torch that transforms weights into thought”


