Matrix last extended 2026-05-23 · 629 measured iterations · 210 cells · 5 hosts · higher RTF = faster (audio-s / wall-s).
Rows = hosts sorted by release year, annotated with detected GPU and the
GPU class the wizard uses (legacy = 1× multiplier, integrated = 2×,
discrete = 4×; see ADR 0028). Columns = model family × language.
Each cell runs the wizard's full picker on the (host, family, language)
niche: filter by gates (batch RTF ≥ 2.0 · peak RSS ≤ 1024 MiB · model
not in AccuracyBucket::Inaccurate per the registry's
wer_by_lang["en"]), group by quant, take Vulkan over CPU at
≥ 1.2× CPU, prefer q8 → q5 → fp16. Accuracy is decided from
the registry (Open-ASR-Leaderboard means) — not from the worst
single fixture in our equivalence set — to mirror what the Rust wizard
does (crates/fono/src/wizard.rs:1442-1445). Our ~13-clip
set is too small to reliably gate on the worst fixture. The displayed
quant and backend are what the wizard would actually use; the ⚡ badge
fires when Vulkan wins the tiebreak.
One panel per host. Bars within a model variant show fp16 / q8 / q5. Empty slots = not measured. Y axis is log by default; toggle "Piecewise RTF" in the filter bar to give the decision zone (0–10×) more vertical room. The horizontal line marks batch RTF = 2.0 (the wizard's batch gate).
Ratio = (quant batch RTF) ÷ (fp16 batch RTF), aggregated across all
measured (host, model) pairs that ran on the CPU build. Bars show
the median; thin whiskers span min→max; n is the sample count.
Hosts with AVX-VNNI (Alder Lake+, Ryzen Zen3+) execute integer-quant kernels
in vector units; pre-VNNI hosts fall back to scalar paths. Vulkan rows are
excluded here — see chart 4 for the CPU-vs-Vulkan comparison.
Paired bars per (host, model, quant) where both backends have data.
Speedup = Vulkan RTF ÷ CPU RTF (>1 means Vulkan faster).
One point per measured (host, build, model, quant, language)
cell. X = English WER (lower is better, leftmost is most accurate);
Y = batch RTF (higher is faster, log scale). The top-left
is the sweet spot. Colour = host; circle = CPU build, triangle = Vulkan
build. Marker size grows with model family (tiny→turbo). A horizontal
dashed green line marks the batch RTF ≥ 2.0 comfortable
gate used by the wizard. Hover any point for full details including
Δ accuracy vs that cell's fp16 baseline. Click legend
entries to hide/show individual host/build pairs.
One scatter per host. X-axis is the bench's English WER
mean across all measured fixtures
(accuracy_en_mean); the worst-fixture value
(accuracy_en_max) is shown only in the tooltip — our
~13-clip equivalence set is too small to gate on the worst row
without one bad transcript flipping the verdict (same reasoning as
section 1; see comment in wizardVerdict). Y-axis is
batch RTF (higher=better). Cells with bench mean WER above the
0.30 display cap are dropped from the plot; the
per-host header notes how many are hidden.
Gates are evaluated against the registry, not the bench.
The accuracy gate is REGISTRY_WER[model] ≤ 0.15
(AccuracyBucket::Inaccurate in
crates/fono/src/wizard.rs:1442-1445) — Open-ASR-Leaderboard
means published per model, sourced from the much larger fixture
pool than ours. Bench RTF / RSS still gate live perf.
The green diamonds are the
Pareto frontier in the bench-mean / RTF plane —
cells where no other cell on that host is both more accurate AND
faster. The blue reticle (⊕)
marks the recommendation the wizard would actually pick:
it runs the same wizardPickFor simulator as the section-1
heatmap on every shipped (family · language) niche, keeps those that
pass the gates, and sorts by Rust shortlist order
(wizard.rs:1506-1517): registry accuracy bucket asc,
then largest approx_mb first. Vulkan beats CPU only at
≥ 1.2×; quant inside a niche follows q8 → q5 → fp16.
Dashed horizontal line shows the batch RTF gate.
Three orthogonal capability axes. Solid dots are measured; the
grey dashed line is what the wizard's
affordability estimator would predict (hwcheck.rs:228-259
and ADR 0028). Where dots sit above the line the wizard is
under-promising; below = over-promising. All three charts
respect the filter bar above.
Drag the sliders to set your accuracy ceiling (max WER) and speed floor
(min batch RTF). Every measured cell falls into one of four quadrants:
Ship it (low WER + high RTF, top-left),
Fast but inaccurate (top-right),
Accurate but slow (bottom-left), or
Unusable (bottom-right) — shown
as faint background tints. Dot shape and colour encode the
model family: ● tiny ·
▲ base ·
◆ small ·
★ turbo (also reflected in the legend).
The lists below show exactly which (host, build, model, quant)
cells land in each quadrant.
Rows = hosts. Columns = (backend × family × language × quant). Green ≥2 iterations, yellow = 1, grey = not measured. Use this to plan the next bench session.
Click headers to sort. Filters above apply here too.
| Host | Build | Fam | Lang | Model | Quant | Batch RTF | Stream RTF | TTFF s | RSS MiB | Acc (EN↓) | Δacc | Disk MiB | Iters | Verdict |
|---|