Qwen3.6 27B

Q5_K_M·27B params·GGUF

reasoning

intelligence: see on Artificial Analysis →

checkpoint: unsloth/Qwen3.6-27B-GGUF:Q5_K_M

commit: 82d411acf4a0

weights 18.17 GiB

All runs (19)


legacy	stack comparable	GeForce RTX 3090 · 24 GiB450 W maxdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	chat	1	38.1	34.2	117.8	—	256ms	26.2	—	30	100	2.93s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB450 W maxdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	rag	1	37.8	32.7	1218.8	—	751ms	26.4	—	842	200	6.12s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB450 W maxdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	codegen	1	37.7	36.1	206.6	—	348ms	26.5	—	62	1000	27.70s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB450 W maxdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	1	37.6	35.1	1124.9	—	533ms	26.6	—	599	500	14.23s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB450 W maxdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	4	37.6	14.6	32.7	—	21.75s	26.6	—	599	500	35.52s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	chat	1	37.4	33.5	119.4	—	251ms	26.7	—	30	100	2.99s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	rag	1	37.1	31.1	1316.7	—	895ms	26.9	—	842	200	6.42s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	codegen	1	37.1	35.7	193.1	—	323ms	27.0	—	62	1000	28.01s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	1	37.0	34.8	1161.2	—	516ms	27.0	—	599	500	14.37s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	4	37.0	14.6	33.6	—	21.54s	27.0	—	599	500	35.39s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	chat	1	19.9	18.6	112.9	—	268ms	50.1	—	30	100	5.38s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	rag	1	19.3	17.3	756.4	—	1.20s	51.8	—	842	200	11.56s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	codegen	1	19.1	18.9	187.0	—	375ms	52.4	—	62	1000	52.77s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	agent	1	19.0	18.2	517.0	—	1.16s	52.7	—	599	500	27.44s	0.000 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7	llama.cpp rocm-4f13cb7 (rocm)	baseline	chat	1	10.7	10.4	83.9	—	358ms	93.5	—	30	100	9.63s	0.004 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7	llama.cpp rocm-4f13cb7 (rocm)	baseline	codegen	1	10.7	10.6	145.5	—	440ms	93.8	—	62	1000	94.20s	0.017 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7	llama.cpp rocm-4f13cb7 (rocm)	baseline	rag	1	10.7	9.9	452.5	—	1.61s	93.8	—	842	200	20.26s	0.007 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7	llama.cpp rocm-4f13cb7 (rocm)	baseline	agent	4	10.7	4.4	8.8	—	70.91s	93.8	—	599	500	117.83s	0.037 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7	llama.cpp rocm-4f13cb7 (rocm)	baseline	agent	1	10.7	10.6	2036.6	—	294ms	93.8	—	599	500	47.21s	0.010 GiB

Environment

GeForce RTX 3090 · 24 GiB

cpuAMD EPYC 7302P 16-Core Processor

gpuNVIDIA GeForce RTX 3090

archNVIDIA

vram24 GiB (system 64.0 GiB)

power350 W / 450 W max(78% cap)

pcieGen 4 x16 / Gen 4 x16 max

clocksgfx 1965/2100 MHz · mem 9501 MHz

temp44°C idle · 64°C peak

peak draw337 W

hardware probes

copy 42% of theoryFP16 peak 65.4 TFcopy/math flat across caps

384-bit9751 MHz82 SM/CU

Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.

cap	theory	copy	fp16	bf16
200 W	936 GB/s	391 GB/s	65.4 TF	65.4 TF
300 W	936 GB/s	391 GB/s	65.4 TF	65.3 TF
450 W	936 GB/s	391 GB/s	65.4 TF	65.4 TF

compute: 8.6

backendllama.cpp cuda-4f13cb7 (cuda)

osUbuntu 24.04 LTS

kernel6.17.13-7-pve

driverNVIDIA 590.48.01 + CUDA 13.1

libc2.39

python3.12.3

llama.cppversion: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flagsGGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue

GeForce RTX 3090 · 24 GiB

cpuAMD EPYC 7302P 16-Core Processor

gpuNVIDIA GeForce RTX 3090

archNVIDIA

vram24 GiB (system 64.0 GiB)

power450 W / 450 W max

pcieGen 4 x16 / Gen 4 x16 max

clocksgfx 1965/2100 MHz · mem 9501 MHz

temp43°C idle · 83°C peak

peak draw430 W

backendllama.cpp cuda-4f13cb7 (cuda)

osUbuntu 24.04 LTS

kernel6.17.13-7-pve

driverNVIDIA 590.48.01 + CUDA 13.1

libc2.39

python3.12.3

llama.cppversion: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flagsGGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue

GeForce RTX 3090 · 24 GiB

cpuAMD EPYC 7302P 16-Core Processor

gpuNVIDIA GeForce RTX 3090

archNVIDIA

vram24 GiB (system 64.0 GiB)

power200 W / 450 W max(44% cap)

backendllama.cpp 59778f0 (cuda)

osUbuntu 24.04 LTS

kernel6.17.13-7-pve

driver590.48.01

python3.12.3

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)

cpuAMD RYZEN AI MAX+ 395 w/ Radeon 8060S

gpuAMD Radeon 8060S

archStrix Halo (gfx1151)

vram96 GiB (system 31.1 GiB, unified)

pcieGen 4 x16 / Gen 4 x16 max

clocksgfx 1320 MHz · mem 1000 MHz

temp47°C idle · 75°C peak

peak draw99 W

hardware probes

copy 41% of theoryFP16 peak 30.3 TF

256-bit8000 MHz20 SM/CU

Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.

cap	theory	copy	fp16	bf16
fixed	256 GB/s	106 GB/s	30.3 TF	-

compute: 11.5

backendllama.cpp rocm-4f13cb7 (rocm)

osUbuntu 24.04 LTS

kernel7.0.2-2-pve

driverROCm 7.2.3

libc2.39

python3.12.3

llama.cppversion: 1 (4f13cb7) built with Clang 22.0.0 for Linux x86_64

build flagsGGML_HIP=ON AMDGPU_TARGETS=gfx1151 CMAKE_BUILD_TYPE=Release

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue