Gemma-4 E4B-it

Q4_K_M·4B params·128K ctx·GGUF
vision · tool-calling · llamacpp
checkpoint: unsloth/gemma-4-E4B-it-GGUF:Q4_K_M
commit: ce152932ac27
weights 5.56 GiB · on-disk 5.00 GiB

All runs (25)

| Hardware | Backend | Shape | Conc. | Gen tok/s | TTFT | TPOT (ms) | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | codegen | 1 | 126.8 | 67ms | 7.8 | 1000 | 7.89s | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | chat | 1 | 124.3 | 59ms | 7.7 | 100 | 804ms | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | agent | 1 | 123.2 | 59ms | 8.0 | 500 | 4.06s | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | rag | 1 | 120.5 | 118ms | 7.9 | 200 | 1.66s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | chat | 1 | 118.4 | 66ms | 8.1 | 100 | 845ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | codegen | 1 | 117.2 | 101ms | 8.4 | 1000 | 8.53s | 0.010 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 1 | 111.7 | 171ms | 8.6 | 500 | 4.48s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | rag | 1 | 101.9 | 307ms | 8.4 | 200 | 1.96s | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | agent | 4 | 59.0 | 1.88s | 13.3 | 500 | 8.48s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | codegen | 1 | 53.8 | 170ms | 18.5 | 1000 | 18.58s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | chat | 1 | 52.9 | 148ms | 18.1 | 100 | 1.89s | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | codegen | 1 | 52.4 | 164ms | 18.9 | 996 | 18.98s | 0.001 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 4 | 52.2 | 1.77s | 15.6 | 500 | 9.58s | -0.010 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | agent | 1 | 51.6 | 446ms | 18.6 | 500 | 9.69s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | rag | 1 | 50.4 | 347ms | 18.8 | 200 | 3.97s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | chat | 1 | 50.3 | 141ms | 18.6 | 97 | 1.93s | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | agent | 1 | 49.8 | 561ms | 19.1 | 497 | 9.98s | 0.005 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | rag | 1 | 48.1 | 382ms | 19.2 | 197 | 4.09s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | agent | 4 | 25.5 | 5.18s | 28.5 | 500 | 19.63s | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | agent | 4 | 19.0 | 3.28s | 47.0 | 497 | 26.21s | 0.009 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (cpu) | codegen | 1 | 10.2 | 1.19s | 96.6 | 1000 | 98.34s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (cpu) | chat | 1 | 10.0 | 844ms | 94.3 | 100 | 9.99s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (cpu) | agent | 1 | 9.1 | 6.49s | 97.5 | 500 | 55.13s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (cpu) | rag | 1 | 8.6 | 4.16s | 99.3 | 200 | 23.25s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (cpu) | agent | 4 | 6.0 | 8.71s | 149.9 | 500 | 83.66s | 0.000 GiB |
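
The page does not define its columns, but the reported figures appear consistent with two simple relations: Gen tok/s ≈ Out tok / Total, and Total ≈ TTFT + Out tok × TPOT. The sketch below checks both on four rows copied from the runs above. The `Run` class and the helper names are illustrative, and the relations are an inference from the numbers, not a stated methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    label: str
    gen_tok_s: float  # reported "Gen tok/s"
    ttft_s: float     # reported TTFT, converted to seconds
    tpot_ms: float    # reported TPOT in milliseconds
    out_tok: int      # reported "Out tok"
    total_s: float    # reported Total, in seconds


# Values copied from the runs listed above (conc. = concurrency).
RUNS = [
    Run("RTX 5070 cuda codegen conc=1", 126.8, 0.067, 7.8, 1000, 7.89),
    Run("RTX 5070 cuda agent conc=4", 59.0, 1.88, 13.3, 500, 8.48),
    Run("Strix Halo rocm agent conc=4", 19.0, 3.28, 47.0, 497, 26.21),
    Run("Strix Halo cpu agent conc=4", 6.0, 8.71, 149.9, 500, 83.66),
]


def derived_tok_s(run: Run) -> float:
    """Assumed relation: throughput is output tokens over total wall time."""
    return run.out_tok / run.total_s


def derived_total_s(run: Run) -> float:
    """Assumed relation: total time is TTFT plus per-token decode time."""
    return run.ttft_s + run.out_tok * run.tpot_ms / 1000.0


if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.label}: tok/s reported {run.gen_tok_s}, "
            f"derived {derived_tok_s(run):.1f}; "
            f"total reported {run.total_s}s, derived {derived_total_s(run):.2f}s"
        )
```

If the relations hold, Gen tok/s at concurrency 4 is a per-request figure that includes the queueing visible in TTFT, which would explain why it is lower than 1000 / TPOT in those rows.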

Environment

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b8940 (cpu)
server: lemonade 10.4.0
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b1203 (rocm)
server: lemonade 10.4.0
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 5070 · 11.94 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 250 W / 300 W max (83% cap)
backend: llama.cpp b9174 (cuda)
server: lemonade unknown
os: CachyOS
kernel: 7.0.0-1-cachyos
driver: 595.58.03
python: 3.14.4
containerized: false
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b8940 (vulkan)
server: lemonade 10.4.0
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true
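
Every environment above streams from /v1/chat/completions, so latency metrics such as TTFT and TPOT can in principle be derived from the arrival times of streamed tokens. The sketch below shows one common way to do that from a list of timestamps. The definitions used (TTFT as time to the first token, TPOT as mean gap between subsequent tokens) and the names `StreamMetrics` and `stream_metrics` are assumptions for illustration; the benchmark's actual measurement code is not shown on this page.

```python
from typing import NamedTuple, Sequence


class StreamMetrics(NamedTuple):
    ttft_s: float     # time from request start to first token
    tpot_ms: float    # mean time per output token after the first
    total_s: float    # time from request start to last token
    gen_tok_s: float  # output tokens divided by total time


def stream_metrics(request_start_s: float,
                   token_times_s: Sequence[float]) -> StreamMetrics:
    """Derive latency metrics from token arrival timestamps (seconds).

    Assumes one timestamp per streamed output token, in arrival order.
    """
    if not token_times_s:
        raise ValueError("need at least one token timestamp")
    n = len(token_times_s)
    first, last = token_times_s[0], token_times_s[-1]
    ttft_s = first - request_start_s
    total_s = last - request_start_s
    if total_s <= 0:
        raise ValueError("token timestamps must be after request start")
    # With a single token there are no inter-token gaps to average.
    tpot_ms = (last - first) / (n - 1) * 1000.0 if n > 1 else 0.0
    return StreamMetrics(ttft_s, tpot_ms, total_s, n / total_s)


if __name__ == "__main__":
    # Synthetic timestamps: first token at 0.1 s, then one every 0.1 s.
    m = stream_metrics(0.0, [0.1, 0.2, 0.3, 0.4, 0.5])
    print(f"TTFT {m.ttft_s * 1000:.0f} ms, TPOT {m.tpot_ms:.1f} ms, "
          f"total {m.total_s:.2f} s, {m.gen_tok_s:.1f} tok/s")
```

In a real harness the timestamps would come from a monotonic clock read as each streamed chunk arrives, and warmup runs would be discarded before aggregating the remaining runs per cell.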