Gemma-4 E2B-it

Q4_K_M·2B params·GGUF
checkpoint: unsloth/gemma-4-E2B-it-GGUF:Q4_K_M

All runs (15)

HardwareBackendShapeConc.Gen tok/sTTFTTPOT (ms)Out tokTotalVRAM Δ
GeForce RTX 5070 · 11.94 GiBllama.cpp b9174 (cuda)codegen1
216.7
38ms4.610004.62s0.000 GiB
GeForce RTX 5070 · 11.94 GiBllama.cpp b9174 (cuda)chat1
211.9
34ms4.5100472ms0.000 GiB
GeForce RTX 5070 · 11.94 GiBllama.cpp b9174 (cuda)agent1
209.5
30ms4.75002.39s0.000 GiB
GeForce RTX 5070 · 11.94 GiBllama.cpp b9174 (cuda)rag1
206.5
63ms4.6200969ms0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)codegen1
195.1
59ms5.110005.13s0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)chat1
193.1
44ms4.9100518ms0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)agent1
189.3
84ms5.15002.64s0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)rag1
181.1
116ms5.02001.10s0.000 GiB
GeForce RTX 5070 · 11.94 GiBllama.cpp b9174 (cuda)agent4
97.6
987ms8.25005.12s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (rocm)codegen1
87.6
103ms11.3100011.41s0.001 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (rocm)chat1
86.2
87ms11.21001.16s0.002 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (rocm)agent1
83.7
101ms11.75005.97s0.003 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)agent4
83.5
1.04s9.85005.99s0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (rocm)rag1
76.3
364ms11.52002.62s0.003 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)llama.cpp b8940 (rocm)agent4
29.4
2.06s30.150017.02s0.008 GiB

Environment

GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power200 W / 450 W max(44% cap)
backendllama.cpp 59778f0 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
containerizedtrue
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpuAMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpuAMD Radeon 8060S
archStrix Halo (gfx1151)
vram96 GiB (system 31.1 GiB, unified)
backendllama.cpp b8940 (rocm)
serverlemonade 10.4.0
osUbuntu 24.04.4 LTS
kernel7.0.2-2-pve
python3.12.3
containerizedtrue
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
GeForce RTX 5070 · 11.94 GiB
cpuAMD Ryzen 9 7900 12-Core Processor
gpuNVIDIA GeForce RTX 5070
archNVIDIA
vram11.94 GiB (system 30.4 GiB)
power250 W / 300 W max(83% cap)
backendllama.cpp b9174 (cuda)
serverlemonade unknown
osCachyOS
kernel7.0.0-1-cachyos
driver595.58.03
python3.14.4
containerizedfalse
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue