TinyTPU is a real 4×4 weight-stationary systolic array written in synthesizable SystemVerilog. It is compiled to WebAssembly with Verilator and Emscripten, so every number on screen is a live signal from the actual RTL. Nothing is faked.
Live Telemetry
Every clock cycle of the 4×4 systolic array is captured directly from the WebAssembly-compiled SystemVerilog: weight loading, input streaming, and output draining. Every signal comes from the RTL.
tiny_tpu_top
Real synthesizable RTL. Real WASM execution. Real signals. No teaching animations and no JavaScript reimplementation of the math.
always_ff · always_comb · zero inferred latches. Real SystemVerilog you can drop into a standard FPGA synthesis flow.
Matrix B loads into each PE as stationary weights. Authentic TPU-v1 dataflow, not a textbook approximation.
Verilator compiles the RTL to C++, and Emscripten compiles that C++ to WASM. The browser runs a cycle-accurate model of the actual hardware.
RTL output must bit-match a NumPy reference model before it earns the right to appear on screen.
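A bit-exact check against a golden model can be sketched as follows. The project's reference is a NumPy model; this version uses plain Python integers so it has no dependencies. The 32-bit two's-complement accumulator width and the helper names (wrap, reference_matmul, mismatches) are assumptions for illustration, not the project's code. Values read back from the RTL are treated as raw bus words and reinterpreted as signed before comparison.

```python
ACC_BITS = 32  # hypothetical accumulator width; the source does not state it


def wrap(value, bits=ACC_BITS):
    """Reduce an integer to a two's-complement value of the given bit width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def reference_matmul(A, B, bits=ACC_BITS):
    """Golden model: exact integer A @ B, wrapped to the accumulator width."""
    return [[wrap(sum(a * b for a, b in zip(row, col)), bits) for col in zip(*B)]
            for row in A]


def mismatches(rtl_out, golden, bits=ACC_BITS):
    """Every (row, col, rtl_value, golden_value) where the RTL differs bit for bit."""
    if len(rtl_out) != len(golden) or any(
            len(r) != len(g) for r, g in zip(rtl_out, golden)):
        raise ValueError("RTL output and golden model have different shapes")
    # RTL words may arrive unsigned, so reinterpret them as signed first.
    return [(i, j, r, g)
            for i, (r_row, g_row) in enumerate(zip(rtl_out, golden))
            for j, (r, g) in enumerate(zip(r_row, g_row))
            if wrap(r, bits) != g]
```

An empty list from mismatches is the "earns the right to appear on screen" condition; any entry pinpoints the exact output element that diverged.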
PE weights, activations, partial sums, and FSM phase are exposed through a stable top-level debug bus, with no hacks.
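One way a front end might model a single cycle of that debug bus, as a sketch only: the source names the signals (PE weights, activations, partial sums, FSM phase) but not their encoding, so the Phase values, the IDLE state, and the DebugSnapshot field names here are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """Hypothetical FSM phases, named after weight loading, streaming, and draining."""
    IDLE = 0
    LOAD_WEIGHTS = 1
    STREAM = 2
    DRAIN = 3


@dataclass(frozen=True)
class DebugSnapshot:
    """One cycle of debug-bus state for an n x n array (illustrative field names)."""
    cycle: int
    phase: Phase
    weights: list       # weights[k][j]: stationary weight held in PE(k, j)
    activations: list   # activations[k][j]: activation register in PE(k, j)
    partial_sums: list  # partial_sums[k][j]: partial-sum register in PE(k, j)

    def __post_init__(self):
        # Every per-PE field must be a square grid of the same size.
        n = len(self.weights)
        for grid in (self.weights, self.activations, self.partial_sums):
            if len(grid) != n or any(len(row) != n for row in grid):
                raise ValueError("per-PE fields must all be n x n")

    def pe(self, k, j):
        """Return (weight, activation, partial_sum) for PE(k, j)."""
        return self.weights[k][j], self.activations[k][j], self.partial_sums[k][j]
```

Keeping one immutable snapshot per cycle lets a UI scrub backward and forward through a run without re-executing the hardware model.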
L1: a single MAC cell → L2: the 4×4 array → L3: tiling matrices larger than the hardware. One concept at a time.
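Level L3 (tiling) can be sketched in plain Python. The sketch assumes the hardware computes one (rows × 4) @ (4 × 4) pass per weight load, zero-pads both matrices to multiples of 4, and accumulates partial results across reduction tiles on the host; the source does not say where that accumulation happens. array_pass is a hypothetical stand-in for one run through the RTL, not its actual interface.

```python
TILE = 4  # hardware array dimension (4x4)


def array_pass(a_block, w_tile):
    """Stand-in for one pass through the array: (rows x 4) @ (4 x 4)."""
    return [[sum(a_row[k] * w_tile[k][j] for k in range(TILE)) for j in range(TILE)]
            for a_row in a_block]


def pad(mat, rows, cols):
    """Return a zero-padded rows x cols copy of mat."""
    out = [[0] * cols for _ in range(rows)]
    for i, row in enumerate(mat):
        out[i][:len(row)] = row
    return out


def tiled_matmul(A, B):
    """Compute A @ B for any shapes by reusing a 4x4 weight-stationary array."""
    m, kdim, n = len(A), len(B), len(B[0])
    K = -(-kdim // TILE) * TILE   # round reduction dim up to a multiple of TILE
    N = -(-n // TILE) * TILE      # round output width up to a multiple of TILE
    A_p, B_p = pad(A, m, K), pad(B, K, N)
    C = [[0] * N for _ in range(m)]
    for j0 in range(0, N, TILE):          # each block of output columns
        for k0 in range(0, K, TILE):      # each block of the reduction dimension
            w_tile = [row[j0:j0 + TILE] for row in B_p[k0:k0 + TILE]]  # load weights
            a_block = [row[k0:k0 + TILE] for row in A_p]               # stream A slice
            partial = array_pass(a_block, w_tile)
            for i in range(m):
                for j in range(TILE):
                    C[i][j0 + j] += partial[i][j]  # accumulate across k tiles
    return [row[:n] for row in C]         # drop padding columns
```

Each (j0, k0) pair corresponds to one weight load, so a 5×6 by 6×7 product needs 2 × 2 = 4 passes through the 4×4 array.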
Execution chain
The RTL is the single source of truth. Verilator compiles it to a cycle-accurate C++ model, and Emscripten compiles that model to WebAssembly. The React island reads hardware state out of the compiled binary; it never reimplements the math in JavaScript.
rtl/*.sv: PEs, controller FSM, debug output bus
verilator --cc: cycle-accurate compiled hardware model
em++ -O3: embind surface, MODULARIZE, ES6 export
React island: SVG pixels from live hardware state
Datapath
Matrix B loads as stationary weights. Matrix A streams in from the west edge with row skew: row i is delayed i cycles so that each activation meets the correct weight on the right clock edge. Partial sums accumulate downward and drain from the south edge, also skewed.
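The skewed dataflow can be checked with a small cycle-level model in Python. This is a sketch under assumptions, not a transcription of the RTL: it assumes PE(k, j) holds B[k][j], that array row k carries column k of A (so "row i delayed i cycles" means array row k starts k cycles late), and that every PE registers its outputs once per cycle. The function name systolic_matmul and the skew flag are hypothetical.

```python
def systolic_matmul(A, B, skew=True):
    """Cycle-level sketch of an N x N weight-stationary array computing A @ B.

    Assumed mapping (not taken from the RTL): PE(k, j) holds weight B[k][j],
    array row k carries column k of A, activations move east one PE per cycle,
    and partial sums move south one PE per cycle.
    """
    n = len(B)                                # array size; B must be n x n
    m_rows = len(A)                           # A is m_rows x n
    a_reg = [[0] * n for _ in range(n)]       # activation register in each PE
    p_reg = [[0] * n for _ in range(n)]       # partial-sum register in each PE
    C = [[0] * n for _ in range(m_rows)]
    for t in range(m_rows + 2 * n):
        new_a = [[0] * n for _ in range(n)]
        new_p = [[0] * n for _ in range(n)]
        for k in range(n):
            # West edge: with skew, A[m][k] enters array row k on cycle m + k.
            m = t - k if skew else t
            west = A[m][k] if 0 <= m < m_rows else 0
            for j in range(n):
                a_in = west if j == 0 else a_reg[k][j - 1]
                p_in = 0 if k == 0 else p_reg[k - 1][j]
                new_a[k][j] = a_in                   # pass activation east
                new_p[k][j] = p_in + a_in * B[k][j]  # MAC, pass sum south
        a_reg, p_reg = new_a, new_p                  # clock edge
        # South edge: C[m][j] leaves column j on cycle m + (n - 1) + j (skewed).
        for j in range(n):
            m = t - (n - 1) - j
            if 0 <= m < m_rows:
                C[m][j] = p_reg[n - 1][j]
    return C
```

With skew=False every array row receives its input on the same cycle, so partial sums combine elements from different rows of A and the result no longer equals A @ B in general; an identity B makes the error easy to see.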
If the diagonal is wrong, the multiply is wrong. That is why the interface foregrounds phase, flow, and per-PE state.
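The diagonal itself is easy to print. This sketch assumes array row k carries column k of A and is delayed k cycles, and lists what enters each array row from the west on every cycle; west_edge_schedule is a hypothetical helper, and None marks an idle slot.

```python
def west_edge_schedule(A, n=4):
    """Per cycle, the value entering each of the n array rows from the west.

    Assumes array row k carries column k of A and is delayed k cycles,
    so A[m][k] enters on cycle m + k. None marks an idle (bubble) slot.
    """
    m_rows = len(A)
    schedule = []
    for t in range(m_rows + n - 1):  # last element A[m_rows-1][n-1] enters at m_rows + n - 2
        schedule.append([A[t - k][k] if 0 <= t - k < m_rows else None
                         for k in range(n)])
    return schedule
```

For A = [[1, 2, 3, 4], [5, 6, 7, 8]] the first three cycles are [1, None, None, None], [5, 2, None, None], and [None, 6, 3, None]: each cycle carries one anti-diagonal of A, so a single misplaced delay shifts a whole column of A onto the wrong weights.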
Open the instrument. Enter two matrices. Watch actual RTL execute in your browser.