Qwen3.6 27B

Q4_K_XL·27B params·GGUF

reasoning

intelligence: see on Artificial Analysis →

checkpoint: unsloth/Qwen3.6-27B-GGUF:UD-Q4_K_XL

commit: 82d411acf4a0

weights 16.40 GiB

All runs (25)


legacy	stack comparable	GeForce RTX 3090 · 24 GiB450 W maxdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	chat	1	41.1	36.0	120.4	—	255ms	24.3	—	30	100	2.77s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB450 W maxdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	rag	1	40.8	34.6	1303.2	—	760ms	24.5	—	842	200	5.77s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB450 W maxdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	codegen	1	40.7	38.8	205.3	—	360ms	24.6	—	62	1000	25.77s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB450 W maxdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	1	40.6	37.5	1187.2	—	514ms	24.6	—	599	500	13.33s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiB450 W maxdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	4	40.5	15.9	36.3	—	19.92s	24.7	—	599	500	32.64s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	chat	1	40.3	36.7	127.8	—	235ms	24.8	—	30	100	2.73s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	rag	1	40.0	34.0	1367.1	—	784ms	25.0	—	842	200	5.88s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	codegen	1	40.0	38.3	223.3	—	311ms	25.0	—	62	1000	26.12s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	1	39.9	36.5	1188.3	—	504ms	25.1	—	599	500	13.70s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	4	39.9	15.5	36.5	—	20.43s	25.1	—	599	500	33.40s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	chat	1	22.9	21.1	118.8	—	253ms	43.6	—	30	100	4.73s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	rag	1	21.7	19.7	1038.0	—	919ms	46.0	—	842	200	10.17s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	codegen	1	21.3	21.2	192.9	—	358ms	47.0	—	62	1000	47.28s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	agent	1	21.2	20.7	986.9	—	607ms	47.2	—	599	500	24.14s	0.000 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified	llama.cpp b8940 (vulkan)	baseline	chat	1	12.0	11.6	86.0	—	360ms	83.2	—	31	100	8.61s	0.000 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified	llama.cpp b8940 (vulkan)	baseline	codegen	1	12.0	11.9	123.5	—	481ms	83.6	—	63	1000	84.14s	0.000 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified	llama.cpp b8940 (vulkan)	baseline	rag	1	11.9	10.8	268.0	—	1.95s	83.8	—	1005	200	18.60s	0.000 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified	llama.cpp b8940 (vulkan)	baseline	agent	1	11.8	11.4	245.5	—	1.98s	84.8	—	599	500	44.00s	0.000 GiB
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	agent	4	10.9	9.7	—	—	4.02s	91.7	—	—	341	35.46s	0.030 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified	llama.cpp b8940 (vulkan)	baseline	agent	4	7.8	7.2	157.2	—	3.82s	128.2	—	599	500	69.24s	0.000 GiB

Environment

GeForce RTX 3090 · 24 GiB

cpuAMD EPYC 7302P 16-Core Processor

gpuNVIDIA GeForce RTX 3090

archNVIDIA

vram24 GiB (system 64.0 GiB)

power350 W / 450 W max(78% cap)

pcieGen 4 x16 / Gen 4 x16 max

clocksgfx 1965/2100 MHz · mem 9501 MHz

temp43°C idle · 65°C peak

peak draw336 W

hardware probes

copy 42% of theoryFP16 peak 65.4 TFcopy/math flat across caps

384-bit9751 MHz82 SM/CU

Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.

cap	theory	copy	fp16	bf16
200 W	936 GB/s	391 GB/s	65.4 TF	65.4 TF
300 W	936 GB/s	391 GB/s	65.4 TF	65.3 TF
450 W	936 GB/s	391 GB/s	65.4 TF	65.4 TF

compute: 8.6

backendllama.cpp cuda-4f13cb7 (cuda)

osUbuntu 24.04 LTS

kernel6.17.13-7-pve

driverNVIDIA 590.48.01 + CUDA 13.1

libc2.39

python3.12.3

llama.cppversion: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flagsGGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue

GeForce RTX 3090 · 24 GiB

cpuAMD EPYC 7302P 16-Core Processor

gpuNVIDIA GeForce RTX 3090

archNVIDIA

vram24 GiB (system 64.0 GiB)

power450 W / 450 W max

pcieGen 4 x16 / Gen 4 x16 max

clocksgfx 1965/2100 MHz · mem 9501 MHz

temp45°C idle · 83°C peak

peak draw441 W

backendllama.cpp cuda-4f13cb7 (cuda)

osUbuntu 24.04 LTS

kernel6.17.13-7-pve

driverNVIDIA 590.48.01 + CUDA 13.1

libc2.39

python3.12.3

llama.cppversion: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flagsGGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue

GeForce RTX 3090 · 24 GiB

cpuAMD EPYC 7302P 16-Core Processor

gpuNVIDIA GeForce RTX 3090

archNVIDIA

vram24 GiB (system 64.0 GiB)

power200 W / 450 W max(44% cap)

backendllama.cpp 59778f0 (cuda)

osUbuntu 24.04 LTS

kernel6.17.13-7-pve

driver590.48.01

python3.12.3

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)

cpuAMD RYZEN AI MAX+ 395 w/ Radeon 8060S

gpuAMD Radeon 8060S

archStrix Halo (gfx1151)

vram96 GiB (system 31.1 GiB, unified)

hardware probes

copy 41% of theoryFP16 peak 30.3 TF

256-bit8000 MHz20 SM/CU

Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.

cap	theory	copy	fp16	bf16
fixed	256 GB/s	106 GB/s	30.3 TF	-

compute: 11.5

backendllama.cpp b1203 (rocm)

osUbuntu 24.04.4 LTS

kernel7.0.2-2-pve

python3.12.3

runs/cell3

warmups1

endpoint/v1/chat/completions

streamingtrue

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)

cpuAMD RYZEN AI MAX+ 395 w/ Radeon 8060S

gpuAMD Radeon 8060S

archStrix Halo (gfx1151)

vram96 GiB (system 31.1 GiB, unified)

backendllama.cpp b8940 (vulkan)

osUbuntu 24.04.4 LTS

kernel7.0.2-2-pve

python3.12.3

runs/cell3

warmups1

endpoint/v1/chat/completions

streamingtrue