Gemma-4 26B-A4B-it
Q4_K_M · 26B params · 256K ctx · GGUF
vision · tool-calling · llamacpp
checkpoint: unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_M
commit: b68961b3c96e
weights 16.82 GiB · on-disk 16.90 GiB
All runs (20)
| Hardware | Backend | Shape | Conc. | Gen tok/s ↓ | TTFT | TPOT (ms) | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | codegen | 1 | 101.4 | 238ms | 9.7 | 1000 | 9.86s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | chat | 1 | 100.5 | 100ms | 9.3 | 100 | 995ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 1 | 94.3 | 494ms | 9.8 | 500 | 5.30s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | rag | 1 | 74.7 | 708ms | 9.6 | 200 | 2.68s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | codegen | 1 | 47.9 | 296ms | 20.7 | 1000 | 20.88s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | chat | 1 | 47.7 | 244ms | 19.2 | 100 | 2.10s | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | codegen | 1 | 46.0 | 263ms | 21.5 | 997 | 21.64s | 0.002 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 4 | 45.2 | 1.54s | 16.7 | 310 | 6.64s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | agent | 1 | 44.8 | 712ms | 21.1 | 500 | 11.15s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | chat | 1 | 43.7 | 209ms | 20.9 | 97 | 2.22s | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | rag | 1 | 43.2 | 590ms | 21.2 | 200 | 4.63s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | agent | 1 | 43.0 | 830ms | 21.7 | 497 | 11.57s | 0.006 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | rag | 1 | 40.7 | 626ms | 22.2 | 197 | 4.84s | 0.004 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | agent | 4 | 18.3 | 7.29s | 40.8 | 500 | 27.38s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | agent | 4 | 12.3 | 5.18s | 68.5 | 497 | 40.49s | 0.015 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (cpu) | codegen | 1 | 8.5 | 1.89s | 115.5 | 1000 | 117.58s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (cpu) | chat | 1 | 8.3 | 1.41s | 110.4 | 100 | 12.08s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (cpu) | agent | 1 | 7.3 | 9.60s | 118.0 | 500 | 68.94s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (cpu) | rag | 1 | 6.9 | 6.29s | 120.6 | 200 | 28.91s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (cpu) | agent | 4 | 3.1 | 13.28s | 298.7 | 500 | 162.96s | 0.000 GiB |
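The source does not define its columns, but most rows are consistent with Gen tok/s being output tokens divided by total request time. The sketch below checks that assumption against four rows copied from the table; the helper name `gen_tok_per_s` and the row labels are illustrative and not part of the benchmark harness. A few rows deviate slightly (for example, the RTX 3090 agent run at concurrency 4 gives 310 / 6.64 ≈ 46.7 against a reported 45.2), which may reflect aggregation across repeated runs.

```python
# Rows copied from the table above:
# (label, output tokens, total seconds, reported Gen tok/s)
rows = [
    ("RTX 3090 cuda codegen c=1", 1000, 9.86, 101.4),
    ("RTX 3090 cuda chat c=1", 100, 0.995, 100.5),
    ("Strix Halo vulkan agent c=4", 500, 27.38, 18.3),
    ("Strix Halo cpu agent c=4", 500, 162.96, 3.1),
]


def gen_tok_per_s(out_tokens: int, total_seconds: float) -> float:
    """Per-request throughput, ASSUMING tok/s = output tokens / wall time."""
    return out_tokens / total_seconds


for label, out_tok, total_s, reported in rows:
    derived = gen_tok_per_s(out_tok, total_s)
    print(f"{label}: derived {derived:.1f} tok/s, reported {reported}")
```

Read this way, the concurrency-4 rows appear to report per-request throughput, not aggregate throughput across the four concurrent requests.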
Environment
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b8940 (cpu)
server: lemonade 10.4.0
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b1203 (rocm)
server: lemonade 10.4.0
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b8940 (vulkan)
server: lemonade 10.4.0
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true
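
All four environments stream responses from /v1/chat/completions, which is what makes TTFT and TPOT measurable on the client side. The source does not say how its harness computes them, so the following is only a minimal sketch using the common definitions: TTFT is the delay from sending the request to the first streamed token, and TPOT is the mean gap between subsequent tokens. `StreamTiming` and `summarize_stream` are hypothetical names, and the timestamps are synthetic, not benchmark data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamTiming:
    ttft_s: float   # time to first token, seconds
    tpot_ms: float  # mean time per output token after the first, milliseconds
    total_s: float  # request start to last token, seconds


def summarize_stream(request_start: float, token_times: Sequence[float]) -> StreamTiming:
    """Derive latency metrics from per-token arrival timestamps (seconds).

    ASSUMPTION: one timestamp per generated token, in arrival order.
    """
    if not token_times:
        raise ValueError("no tokens received")
    first, last = token_times[0], token_times[-1]
    n = len(token_times)
    # With a single token there are no inter-token gaps to average.
    tpot_ms = 1000.0 * (last - first) / (n - 1) if n > 1 else 0.0
    return StreamTiming(first - request_start, tpot_ms, last - request_start)


# Synthetic example: first token after 0.25 s, then one token every 10 ms.
times = [0.25 + 0.01 * i for i in range(100)]
timing = summarize_stream(0.0, times)
print(timing)
```

Under these definitions, Total ≈ TTFT + TPOT × (Out tok − 1). The table's rows agree with that only approximately, so the actual harness may define or aggregate the metrics differently.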