Gemma-4 31B-it

Q4_K_M · 31B params · GGUF
checkpoint: unsloth/gemma-4-31B-it-GGUF:Q4_K_M

All runs (10)

| Hardware | Backend | Shape | Conc. | Gen tok/s | TTFT | TPOT (ms) | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | chat | 1 | 19.4 | 348ms | 48.3 | 100 | 5.15s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | codegen | 1 | 19.3 | 915ms | 51.1 | 1000 | 51.89s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 1 | 18.2 | 1.89s | 52.0 | 500 | 27.49s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | rag | 1 | 15.2 | 2.87s | 51.5 | 200 | 13.18s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | codegen | 1 | 10.2 | 989ms | 97.0 | 996 | 97.56s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | chat | 1 | 9.9 | 740ms | 94.4 | 97 | 9.80s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | agent | 1 | 9.6 | 3.26s | 98.5 | 497 | 51.86s | 0.010 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | rag | 1 | 9.1 | 2.34s | 99.5 | 197 | 21.55s | 0.007 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 4 | 8.8 | 6.05s | 94.4 | 310 | 35.15s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | agent | 4 | 3.1 | 14.37s | 306.3 | 497 | 162.75s | 0.016 GiB |
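
The page does not define its columns, but the reported figures appear consistent with Gen tok/s being roughly Out tok divided by Total. The sketch below checks that reading against the ten rows above. The formula is an assumption inferred from the numbers, not a documented definition, and the helper names are hypothetical.

```python
# (out_tok, total_s, reported_gen_tok_s) copied from the "All runs" table.
ROWS = [
    (100, 5.15, 19.4),     # RTX 3090, chat, conc 1
    (1000, 51.89, 19.3),   # RTX 3090, codegen, conc 1
    (500, 27.49, 18.2),    # RTX 3090, agent, conc 1
    (200, 13.18, 15.2),    # RTX 3090, rag, conc 1
    (996, 97.56, 10.2),    # Strix Halo, codegen, conc 1
    (97, 9.80, 9.9),       # Strix Halo, chat, conc 1
    (497, 51.86, 9.6),     # Strix Halo, agent, conc 1
    (197, 21.55, 9.1),     # Strix Halo, rag, conc 1
    (310, 35.15, 8.8),     # RTX 3090, agent, conc 4
    (497, 162.75, 3.1),    # Strix Halo, agent, conc 4
]


def gen_tok_per_s(out_tok: int, total_s: float) -> float:
    """Assumed definition: output tokens divided by total request time."""
    return out_tok / total_s


def max_abs_error(rows) -> float:
    """Largest gap between the assumed formula and the reported value."""
    return max(abs(gen_tok_per_s(o, t) - r) for o, t, r in rows)


if __name__ == "__main__":
    for out_tok, total_s, reported in ROWS:
        print(f"{out_tok:5d} tok / {total_s:7.2f} s = "
              f"{gen_tok_per_s(out_tok, total_s):5.2f} (reported {reported})")
```

Under this assumption every row agrees with the reported value to within one-decimal rounding, which suggests that Gen tok/s here includes time to first token rather than measuring decode speed alone.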

Environment

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b1203 (rocm)
server: lemonade 10.4.0
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true
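
Both environments report a streaming endpoint plus warmup and run counts, but the page does not say how TTFT and TPOT are derived or how runs are aggregated. A minimal sketch of one common approach follows. It assumes TTFT is the delay to the first streamed token, TPOT is the mean gap between later tokens, and each cell is the mean of the runs after the warmups are discarded. The function names are hypothetical and are not taken from the benchmark harness.

```python
from statistics import mean


def ttft_and_tpot(request_start: float, token_times: list[float]) -> tuple[float, float]:
    """Return (ttft_seconds, tpot_milliseconds) from arrival timestamps in seconds.

    Assumed definitions: TTFT is first-token arrival minus request start;
    TPOT is the average interval between consecutive tokens after the first.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    if len(token_times) == 1:
        return ttft, 0.0  # a single token has no inter-token interval
    decode_span = token_times[-1] - token_times[0]
    tpot_ms = decode_span / (len(token_times) - 1) * 1000.0
    return ttft, tpot_ms


def summarize(samples: list[float], warmups: int) -> float:
    """Mean of the measured runs, skipping the leading warmup runs (assumed aggregation)."""
    measured = samples[warmups:]
    if not measured:
        raise ValueError("no measured runs left after warmups")
    return mean(measured)


if __name__ == "__main__":
    # Synthetic timestamps for illustration only, not benchmark data.
    ttft, tpot = ttft_and_tpot(10.0, [10.5, 10.6, 10.7])
    print(f"TTFT {ttft:.3f}s, TPOT {tpot:.1f}ms")
    print(summarize([9.0, 1.0, 2.0, 3.0], warmups=1))
```

In a real harness the timestamps would come from the arrival time of each streamed chunk on `/v1/chat/completions`. One chunk can carry more than one token, so per-chunk timing is only an approximation of per-token timing.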