Gemma-4 26B-A4B-it

Q4_K_M · 26B params · 256K ctx · GGUF

vision · tool-calling · hot · llamacpp

checkpoint: unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_M

commit: b68961b3c96e

weights 16.82 GiB · on-disk 16.90 GiB

All runs (20)


legacy	stack comparable	GeForce RTX 3090 · 24 GiB · cap 300 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline	chat	1	119.5	104.1	370.0	—	103ms	8.4	—	36	100	960ms	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB · cap 300 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline	codegen	1	117.3	109.6	313.2	—	235ms	8.5	—	71	1000	9.12s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB · cap 300 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline	rag	1	116.4	84.9	1004.8	—	634ms	8.6	—	853	200	2.35s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB · cap 300 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline	agent	1	116.3	100.1	1229.9	—	426ms	8.6	—	618	500	5.00s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB · cap 300 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline	agent	4	116.0	42.4	85.0	—	7.71s	8.6	—	618	500	12.21s	0.000 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)	llama.cpp b8940 (vulkan)	baseline	chat	1	52.0	47.7	151.7	—	244ms	19.2	—	37	100	2.10s	0.001 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)	llama.cpp b8940 (vulkan)	baseline	codegen	1	48.3	47.9	277.2	—	296ms	20.7	—	71	1000	20.88s	0.000 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)	llama.cpp b8940 (vulkan)	baseline	agent	1	47.3	44.8	744.6	—	712ms	21.1	—	618	500	11.15s	0.000 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)	llama.cpp b8940 (vulkan)	baseline	rag	1	47.2	43.2	1212.2	—	590ms	21.2	—	1012	200	4.63s	0.000 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)	llama.cpp b8940 (vulkan)	baseline	agent	4	24.5	18.3	84.6	—	7.29s	40.8	—	618	500	27.38s	0.000 GiB
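
The columns in the runs table carry no labels here. Reading the numbers, two relationships appear to hold: the second rate column matches output tokens divided by total latency, and the per-token latency column matches 1000 divided by the first rate column. The sketch below checks both for a few rows copied from the table. The field names (`decode_tps`, `e2e_tps`, `ms_per_token`, `out_tokens`, `total_s`) are assumptions made for illustration, not labels from the source.

```python
# Rows copied from the runs table; field names are assumed, not from the source.
rows = [
    # (label, decode_tps, e2e_tps, ms_per_token, out_tokens, total_s)
    ("3090 chat c1", 119.5, 104.1, 8.4, 100, 0.960),
    ("3090 codegen c1", 117.3, 109.6, 8.5, 1000, 9.12),
    ("Strix chat c1", 52.0, 47.7, 19.2, 100, 2.10),
    ("Strix agent c4", 24.5, 18.3, 40.8, 500, 27.38),
]


def check_row(decode_tps, e2e_tps, ms_per_token, out_tokens, total_s, tol=0.02):
    """Return (e2e_ok, itl_ok): is each derived value within `tol` relative error?"""
    derived_e2e = out_tokens / total_s   # tokens per second over the whole request
    derived_itl = 1000.0 / decode_tps    # milliseconds per generated token
    e2e_ok = abs(derived_e2e - e2e_tps) / e2e_tps <= tol
    itl_ok = abs(derived_itl - ms_per_token) / ms_per_token <= tol
    return e2e_ok, itl_ok


results = {label: check_row(*vals) for label, *vals in rows}
for label, (e2e_ok, itl_ok) in results.items():
    print(f"{label}: e2e matches={e2e_ok}, per-token latency matches={itl_ok}")
```

The match is approximate rather than exact. The RTX 3090 agent row at concurrency 4, for example, gives 500 / 12.21 ≈ 41.0 against a reported 42.4, which may reflect averaging across runs.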

Environment

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)

cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S

gpu: AMD Radeon 8060S

arch: Strix Halo (gfx1151)

vram: 96 GiB (system 31.1 GiB, unified)

hardware probes

copy 41% of theory · FP16 peak 30.3 TF

256-bit · 8000 MHz · 20 SM/CU

Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.

cap	theory	copy	fp16	bf16
fixed	256 GB/s	106 GB/s	30.3 TF	-

compute: 11.5

backend: llama.cpp b8940 (cpu)

os: Ubuntu 24.04.4 LTS

kernel: 7.0.2-2-pve

python: 3.12.3

runs/cell: 3

warmups: 1

endpoint: /v1/chat/completions

streaming: true

GeForce RTX 3090 · 24 GiB

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090

arch: NVIDIA

vram: 24 GiB (system 64.0 GiB)

power: 300 W / 450 W max (67% cap)

pcie: Gen 4 x16 / Gen 4 x16 max

clocks: gfx 1920/2100 MHz · mem 9501 MHz

temp: 52°C idle · 67°C peak

peak draw: 295 W

hardware probes

copy 42% of theory · FP16 peak 65.4 TF · copy/math flat across caps

384-bit · 9751 MHz · 82 SM/CU

Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.

cap	theory	copy	fp16	bf16
200 W	936 GB/s	391 GB/s	65.4 TF	65.4 TF
300 W	936 GB/s	391 GB/s	65.4 TF	65.3 TF
450 W	936 GB/s	391 GB/s	65.4 TF	65.4 TF
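
The "theory" and "copy % of theory" figures in the probe tables can be reproduced from the bus width and memory clock listed with each probe. This is a minimal sketch assuming theoretical bandwidth equals bus width in bytes times the effective transfer rate. The `transfers_per_clock` factor is an assumption chosen to match the reported values: 1 for the Strix Halo figure, which appears to already be a transfer rate, and 2 for the RTX 3090 figure.

```python
def theoretical_gbps(bus_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_bits / 8
    transfers_per_sec = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_sec / 1e9


def copy_efficiency(copy_gbps: float, theory_gbps: float) -> float:
    """Measured copy bandwidth as a percentage of the theoretical peak."""
    return 100.0 * copy_gbps / theory_gbps


# transfers_per_clock values are assumptions fitted to the reported "theory" column.
strix_theory = theoretical_gbps(256, 8000, 1)
rtx_theory = theoretical_gbps(384, 9751, 2)

print(f"Strix Halo: {strix_theory:.0f} GB/s theory, "
      f"{copy_efficiency(106, 256):.0f}% achieved by copy")
print(f"RTX 3090:   {rtx_theory:.0f} GB/s theory, "
      f"{copy_efficiency(391, 936):.0f}% achieved by copy")
```

Both systems land near 41 to 42% of theoretical bandwidth in the copy probe, so the gap in decode speed between them tracks the roughly 3.7x difference in absolute copy bandwidth (391 GB/s against 106 GB/s) more than any difference in efficiency.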

compute: 8.6

backend: llama.cpp cuda-4f13cb7 (cuda)

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: NVIDIA 590.48.01 + CUDA 13.1

libc: 2.39

python: 3.12.3

llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)

cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S

gpu: AMD Radeon 8060S

arch: Strix Halo (gfx1151)

vram: 96 GiB (system 31.1 GiB, unified)

backend: llama.cpp b1203 (rocm)

os: Ubuntu 24.04.4 LTS

kernel: 7.0.2-2-pve

python: 3.12.3

runs/cell: 3

warmups: 1

endpoint: /v1/chat/completions

streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)

cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S

gpu: AMD Radeon 8060S

arch: Strix Halo (gfx1151)

vram: 96 GiB (system 31.1 GiB, unified)

backend: llama.cpp b8940 (vulkan)

os: Ubuntu 24.04.4 LTS

kernel: 7.0.2-2-pve

python: 3.12.3

runs/cell: 3

warmups: 1

endpoint: /v1/chat/completions

streaming: true
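
Every environment lists a streaming `/v1/chat/completions` endpoint, which is what makes time-to-first-token and decode rate separable from total latency. The harness's own measurement code is not shown here, so the following is only a minimal sketch of how such metrics could be derived from token arrival timestamps. The names `StreamMetrics` and `stream_metrics` are hypothetical, and the timestamps are synthetic rather than taken from any run above.

```python
from dataclasses import dataclass


@dataclass
class StreamMetrics:
    ttft_s: float      # time from request start to the first streamed token
    decode_tps: float  # tokens per second after the first token arrives
    e2e_tps: float     # tokens per second over the whole request


def stream_metrics(request_start: float, token_times: list[float]) -> StreamMetrics:
    """Derive latency and throughput from per-token arrival timestamps (seconds)."""
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to compute a decode rate")
    n = len(token_times)
    ttft = token_times[0] - request_start
    total = token_times[-1] - request_start
    decode_window = token_times[-1] - token_times[0]
    return StreamMetrics(
        ttft_s=ttft,
        decode_tps=(n - 1) / decode_window,  # n - 1 intervals between n tokens
        e2e_tps=n / total,
    )


# Synthetic example: first token after 0.1 s, then one token every 10 ms.
times = [0.1 + 0.01 * i for i in range(100)]
m = stream_metrics(0.0, times)
print(f"TTFT {m.ttft_s * 1000:.0f} ms, decode {m.decode_tps:.1f} tok/s, "
      f"end-to-end {m.e2e_tps:.1f} tok/s")
```

Separating the two rates matters for long prompts: prefill time inflates time-to-first-token and lowers the end-to-end rate while leaving the decode rate largely unchanged, which is the pattern the rag and agent rows show relative to chat.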