Monorepo layout, state contract, and build flow

TinyTPU is a single monorepo with four distinct layers: the RTL, the WASM bridge, the web frontend, and the simulation harness. This page documents the boundaries between them and the data contract that keeps the visualizer honest.

Monorepo layout

Each directory has a single responsibility. Nothing crosses these boundaries except through the defined interfaces documented here.

The state data contract

The C++ harness and the React visualizer communicate through a single shared contract: the CycleState object emitted once per clock tick. This contract is defined in two places that must stay in sync: docs/STATE_SCHEMA.md (prose) and web/src/lib/state-schema.ts (TypeScript types).

Every field in CycleState maps to either a direct hardware signal or a documented derivation from hardware signals. There are no fabricated values.

CycleState top-level fields

Field            TypeScript type                                        Hardware source                  Notes
cycle            number                                                 counter in harness               Starts at 0 when start fires
fsmState         "IDLE" | "LOAD_WEIGHTS" | "STREAM" | "DRAIN" | "DONE"  dbg_fsm_state (3-bit)            "DONE" is harness-derived
pes[16]          PEState[]                                              dbg_weight / dbg_act / dbg_psum  Row-major: index i×4+j
westInputs[4]    number[]                                               dbg_west                         Signed int8, current cycle
southOutputs[4]  SouthOutput[]                                          dbg_south + harness validity     valid is harness-computed
done             boolean                                                done port of tiny_tpu_top        Single-cycle pulse
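
The table above could be mirrored in TypeScript roughly as follows. This is a sketch, not the contents of web/src/lib/state-schema.ts: the SouthOutput shape (a value plus the harness-computed valid flag), the FsmState alias, and the checkCycleState helper are assumptions introduced for illustration. PEState uses the fields listed in the next table.

```typescript
// Sketch of the CycleState contract; field names follow the tables on this page.
type FsmState = "IDLE" | "LOAD_WEIGHTS" | "STREAM" | "DRAIN" | "DONE";

interface PEState {
  row: number;     // 0 to 3
  col: number;     // 0 to 3
  weight: number;  // signed int8, stationary
  actIn: number;   // derived, not a direct register read
  psum: number;    // signed int32
  active: boolean; // harness-computed
}

// Assumed shape: the page only says that `valid` is harness-computed.
interface SouthOutput {
  value: number;
  valid: boolean;
}

interface CycleState {
  cycle: number;
  fsmState: FsmState;
  pes: PEState[];              // 16 entries, row-major
  westInputs: number[];        // 4 entries, signed int8
  southOutputs: SouthOutput[]; // 4 entries
  done: boolean;
}

const DIM = 4;

// Hypothetical helper: returns a list of contract violations (empty when well-formed).
function checkCycleState(s: CycleState): string[] {
  const errors: string[] = [];
  if (s.pes.length !== DIM * DIM) errors.push(`pes has ${s.pes.length} entries, expected ${DIM * DIM}`);
  if (s.westInputs.length !== DIM) errors.push(`westInputs has ${s.westInputs.length} entries, expected ${DIM}`);
  if (s.southOutputs.length !== DIM) errors.push(`southOutputs has ${s.southOutputs.length} entries, expected ${DIM}`);
  s.pes.forEach((pe, idx) => {
    // Row-major layout: PE (i, j) lives at index i*4 + j.
    if (pe.row * DIM + pe.col !== idx) errors.push(`pes[${idx}] carries row/col (${pe.row}, ${pe.col})`);
  });
  for (const w of s.westInputs) {
    if (!Number.isInteger(w) || w < -128 || w > 127) errors.push(`westInputs value ${w} is not a signed int8`);
  }
  return errors;
}
```

A check like this could run in a frontend unit test against the first few states returned by the simulator, so that drift between the harness and the TypeScript types surfaces early.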

PEState fields

Field     Hardware source                       Notes
row, col  PE index (i, j)                       0–3
weight    dbg_weight[i][j]                      Signed int8, stationary
actIn     Derived (not a direct register read)  j==0 ? dbg_west[i] : dbg_act[i][j-1]
psum      dbg_psum[i][j]                        Signed int32, registered psum_out
active    fsmState==STREAM && actIn!=0          Harness-computed boolean

The actIn derivation: the debug bus exposes dbg_act[i][j], which is pe[i][j].act_out. That is the registered output of the passthrough, not the input of the current cycle. The harness therefore computes actIn as (j==0) ? dbg_west[i] : dbg_act[i][j-1], which is the activation entering PE[i][j] on this cycle. This derivation is documented in both STATE_SCHEMA.md and rtl/README.md.
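
The harness itself is C++, but the derivation can be restated in TypeScript for illustration. In this sketch the DebugBus container and the function names are hypothetical; only the expressions come from the tables above.

```typescript
// Snapshot of the relevant debug ports for one cycle (container type is an assumption).
interface DebugBus {
  dbgWest: number[];  // dbg_west[i], one entry per row
  dbgAct: number[][]; // dbg_act[i][j] = pe[i][j].act_out, the registered passthrough output
}

// Row-major index of PE (i, j) in the pes array of a 4x4 grid.
function peIndex(i: number, j: number): number {
  return i * 4 + j;
}

// Activation entering PE (i, j) this cycle: the west input for column 0,
// otherwise the registered output of the PE immediately to the left.
function deriveActIn(bus: DebugBus, i: number, j: number): number {
  return j === 0 ? bus.dbgWest[i] : bus.dbgAct[i][j - 1];
}

// The active flag as defined in the PEState table.
function deriveActive(fsmState: string, actIn: number): boolean {
  return fsmState === "STREAM" && actIn !== 0;
}
```

Reading dbgAct[i][j] directly would report the value leaving the PE, which on any given cycle is the activation that entered it one cycle earlier.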

Build flow

All RTL tooling (Verilator, Emscripten) runs inside WSL2 Ubuntu. The web frontend can run on either WSL2 or Windows; prefer WSL2 for consistency.

1. RTL lint (always clean)

verilator --lint-only -Wall rtl/*.sv

Run this before every commit touching RTL. Any warning is a build failure. UNOPTFLAT (combinational loop), BLKANDNBLK (mixed blocking/non-blocking), and inferred latches are all blocking conditions.

2. RTL simulation (golden verification)

source ~/.venvs/tinytpu/bin/activate
pytest sim/golden.py -q
cd sim && make MODULE=test_top TOPLEVEL=tiny_tpu_top \
  VERILOG_SOURCES="../rtl/pe.sv ../rtl/systolic_array.sv \
                   ../rtl/controller.sv ../rtl/tiny_tpu_top.sv"

The cocotb test suite drives the Verilator-simulated RTL from Python and asserts bit-exact equality with sim/golden.py for 20+ random int8 matrix pairs. A failing test means the RTL is wrong; do not proceed until it passes.

3. WASM build

bash wasm/build.sh
# outputs web/public/tiny_tpu.mjs + web/public/tiny_tpu.wasm

The build script runs verilator --cc on the RTL (generates C++) then invokes em++ to compile the C++ harness + Verilator model to WebAssembly. Artifacts land in web/public/ so they are served from the web root in both dev and production.

Rebuild WASM after any RTL change. The WASM binary is a compiled snapshot of the RTL at build time. If you modify rtl/*.sv, re-run bash wasm/build.sh before testing the frontend; otherwise the browser is running stale hardware.

4. Frontend (web/)

cd web
pnpm dev          # dev server at localhost:4321
pnpm lint         # eslint
pnpm typecheck    # astro check && tsc --noEmit
pnpm build        # production build → web/dist/

Full pre-PR check

verilator --lint-only -Wall rtl/*.sv
cd sim && pytest golden.py -q && make MODULE=test_top \
  TOPLEVEL=tiny_tpu_top \
  VERILOG_SOURCES="../rtl/pe.sv ../rtl/systolic_array.sv \
                   ../rtl/controller.sv ../rtl/tiny_tpu_top.sv"
cd ../web && pnpm lint && pnpm typecheck && pnpm build

Key design decisions

RTL is the single source of truth

The frontend never reimplements the matmul in JavaScript for the animation. It reads state out of the WASM binary, which is itself compiled from the SystemVerilog. If the RTL is wrong, the visualizer shows the wrong thing. If the RTL is right (golden-verified), the visualizer is right.

Debug output bus, not public_flat

Verilator's public_flat metacomment can expose internal signals by name, but it is Verilator-specific and the exposed names can change across tools and Verilator versions. TinyTPU instead exposes a stable, explicitly typed debug output bus on tiny_tpu_top. The harness reads these ports after each eval(), the same way any downstream module would read outputs. This keeps the viz interface stable and the RTL synthesizable.

WASM is client-only, always

Astro renders pages on the server at build time. Importing WASM during SSR fails with a "window is not defined" error, because the WASM module requires a browser environment. Every React island that touches WASM uses client:only="react" and loads the module inside a useEffect behind a typeof window !== "undefined" guard. Both guards must be present; either alone is insufficient.
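
The window guard can be kept in one small, framework-free loader. In this sketch the name loadSimInBrowser and the importer parameter are hypothetical; importer stands in for a dynamic import of the generated glue module (for example () => import("/tiny_tpu.mjs")), whose exported shape is not specified on this page.

```typescript
// Cached so that repeated effect runs share a single module load.
let cachedModule: Promise<unknown> | null = null;

// Returns null outside a browser; otherwise returns the (cached) module promise.
function loadSimInBrowser(importer: () => Promise<unknown>): Promise<unknown> | null {
  // Guard: during server-side rendering there is no window, so never touch the WASM module.
  if (typeof window === "undefined") return null;
  if (cachedModule === null) cachedModule = importer();
  return cachedModule;
}
```

An island mounted with client:only="react" would call this from inside its useEffect callback and ignore a null result, which keeps the second guard in one place instead of repeating it in every component.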

Run-once, not per-frame

sim.run() steps the WASM through the full 14-cycle matmul and returns the complete CycleState[] array once. The visualizer animates by indexing into this pre-computed array. The WASM does not execute on every animation frame. This decouples rendering performance from simulation throughput.
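
Indexing into the precomputed array can be as simple as mapping elapsed time to a clamped cycle number. The helper below is a sketch; its name and the msPerCycle parameter are illustrative assumptions, not part of the documented API.

```typescript
// Map elapsed animation time to an index into the precomputed CycleState[] array.
// The result is clamped so the animation holds on the first or last cycle
// instead of reading past either end of the array.
function cycleIndexAt(elapsedMs: number, msPerCycle: number, totalCycles: number): number {
  if (totalCycles <= 0 || msPerCycle <= 0) return 0;
  const raw = Math.floor(elapsedMs / msPerCycle);
  return Math.min(Math.max(raw, 0), totalCycles - 1);
}
```

A render loop would call cycleIndexAt with the time since playback started and the length of the array returned by sim.run(), then draw the state at that index. No simulation work happens inside the loop.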