Gemma-4 26B-A4B-it

Q4_K_M·26B params·256K ctx·GGUF
visiontool-callinghottool-callingvisionllamacpp
checkpoint: unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_M
commit: b68961b3c96e
weights 16.82 GiB · on-disk 16.90 GiB

All runs (20)

HardwareBackendShapeConc.Gen tok/sTTFTTPOT (ms)Out tokTotalVRAM Δ
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)codegen1
101.4
238ms9.710009.86s0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)chat1
100.5
100ms9.3100995ms0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)agent1
94.3
494ms9.85005.30s0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)rag1
74.7
708ms9.62002.68s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (vulkan)codegen1
47.9
296ms20.7100020.88s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (vulkan)chat1
47.7
244ms19.21002.10s0.001 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b1203 (rocm)codegen1
46.0
263ms21.599721.64s0.002 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)agent4
45.2
1.54s16.73106.64s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (vulkan)agent1
44.8
712ms21.150011.15s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b1203 (rocm)chat1
43.7
209ms20.9972.22s0.001 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (vulkan)rag1
43.2
590ms21.22004.63s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b1203 (rocm)agent1
43.0
830ms21.749711.57s0.006 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b1203 (rocm)rag1
40.7
626ms22.21974.84s0.004 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (vulkan)agent4
18.3
7.29s40.850027.38s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b1203 (rocm)agent4
12.3
5.18s68.549740.49s0.015 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (cpu)codegen1
8.5
1.89s115.51000117.58s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (cpu)chat1
8.3
1.41s110.410012.08s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (cpu)agent1
7.3
9.60s118.050068.94s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (cpu)rag1
6.9
6.29s120.620028.91s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (cpu)agent4
3.1
13.28s298.7500162.96s0.000 GiB

Environment

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpuAMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpuAMD Radeon 8060S
archStrix Halo (gfx1151)
vram96 GiB (system 31.1 GiB, unified)
backendllama.cpp b8940 (cpu)
serverlemonade 10.4.0
osUbuntu 24.04.4 LTS
kernel7.0.2-2-pve
python3.12.3
containerizedtrue
runs/cell3
warmups1
endpoint/v1/chat/completions
streamingtrue
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power200 W / 450 W max(44% cap)
backendllama.cpp 59778f0 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
containerizedtrue
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpuAMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpuAMD Radeon 8060S
archStrix Halo (gfx1151)
vram96 GiB (system 31.1 GiB, unified)
backendllama.cpp b1203 (rocm)
serverlemonade 10.4.0
osUbuntu 24.04.4 LTS
kernel7.0.2-2-pve
python3.12.3
containerizedtrue
runs/cell3
warmups1
endpoint/v1/chat/completions
streamingtrue
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpuAMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpuAMD Radeon 8060S
archStrix Halo (gfx1151)
vram96 GiB (system 31.1 GiB, unified)
backendllama.cpp b8940 (vulkan)
serverlemonade 10.4.0
osUbuntu 24.04.4 LTS
kernel7.0.2-2-pve
python3.12.3
containerizedtrue
runs/cell3
warmups1
endpoint/v1/chat/completions
streamingtrue