Qwen3.6 27B
Q4_K_XL · 27B params · GGUF
reasoning
checkpoint: unsloth/Qwen3.6-27B-GGUF:UD-Q4_K_XL
commit: 82d411acf4a0
weights: 16.40 GiB
All runs (25)
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | chat | 1 | 41.1 | 36.0 | 120.4 | — | 255ms | 24.3 | — | 30 | 100 | 2.77s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | rag | 1 | 40.8 | 34.6 | 1303.2 | — | 760ms | 24.5 | — | 842 | 200 | 5.77s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | codegen | 1 | 40.7 | 38.8 | 205.3 | — | 360ms | 24.6 | — | 62 | 1000 | 25.77s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 1 | 40.6 | 37.5 | 1187.2 | — | 514ms | 24.6 | — | 599 | 500 | 13.33s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 4 | 40.5 | 15.9 | 36.3 | — | 19.92s | 24.7 | — | 599 | 500 | 32.64s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | chat | 1 | 40.3 | 36.7 | 127.8 | — | 235ms | 24.8 | — | 30 | 100 | 2.73s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | rag | 1 | 40.0 | 34.0 | 1367.1 | — | 784ms | 25.0 | — | 842 | 200 | 5.88s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | codegen | 1 | 40.0 | 38.3 | 223.3 | — | 311ms | 25.0 | — | 62 | 1000 | 26.12s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 1 | 39.9 | 36.5 | 1188.3 | — | 504ms | 25.1 | — | 599 | 500 | 13.70s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 4 | 39.9 | 15.5 | 36.5 | — | 20.43s | 25.1 | — | 599 | 500 | 33.40s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | chat | 1 | 22.9 | 21.1 | 118.8 | — | 253ms | 43.6 | — | 30 | 100 | 4.73s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | rag | 1 | 21.7 | 19.7 | 1038.0 | — | 919ms | 46.0 | — | 842 | 200 | 10.17s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | codegen | 1 | 21.3 | 21.2 | 192.9 | — | 358ms | 47.0 | — | 62 | 1000 | 47.28s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 1 | 21.2 | 20.7 | 986.9 | — | 607ms | 47.2 | — | 599 | 500 | 24.14s | 0.000 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (vulkan) | baseline | chat | 1 | 12.0 | 11.6 | 86.0 | — | 360ms | 83.2 | — | 31 | 100 | 8.61s | 0.000 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (vulkan) | baseline | codegen | 1 | 12.0 | 11.9 | 123.5 | — | 481ms | 83.6 | — | 63 | 1000 | 84.14s | 0.000 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (vulkan) | baseline | rag | 1 | 11.9 | 10.8 | 268.0 | — | 1.95s | 83.8 | — | 1005 | 200 | 18.60s | 0.000 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (vulkan) | baseline | agent | 1 | 11.8 | 11.4 | 245.5 | — | 1.98s | 84.8 | — | 599 | 500 | 44.00s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 4 | 10.9 | 9.7 | — | — | 4.02s | 91.7 | — | — | 341 | 35.46s | 0.030 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (vulkan) | baseline | agent | 4 | 7.8 | 7.2 | 157.2 | — | 3.82s | 128.2 | — | 599 | 500 | 69.24s | 0.000 GiB |
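
The column headers did not survive in this extract, so the reading below is an inference from the numbers rather than the page's own definition. In the first table row, the 24.3 value matches 1000 divided by the 41.1 figure (consistent with milliseconds per token derived from tokens per second), and the 36.0 value is close to the 100 output tokens divided by the 2.77 s total time. A minimal Python sketch of that arithmetic, with illustrative function names:

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Convert a throughput in tokens/s to a per-token latency in milliseconds."""
    return 1000.0 / tokens_per_second


def end_to_end_tokens_per_second(output_tokens: int, total_seconds: float) -> float:
    """Output tokens divided by wall-clock request time (prompt processing included)."""
    return output_tokens / total_seconds


# Values copied from the first table row (RTX 3090, baseline-pl-450w, chat).
print(f"{ms_per_token(41.1):.1f} ms/token")
print(f"{end_to_end_tokens_per_second(100, 2.77):.1f} tok/s")
```

The second result lands slightly above the listed 36.0, which would be expected if the table values are aggregated over several runs before rounding; that explanation is a guess.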
Environment
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 350 W / 450 W max (78% cap)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1965/2100 MHz · mem 9501 MHz
temp: 43°C idle · 65°C peak
peak draw: 336 W
hardware probes
copy 42% of theory · FP16 peak 65.4 TF · copy/math flat across caps
384-bit · 9751 MHz · 82 SM/CU
The probes are microbenchmarks for memory copy and tensor math; the raw-engine decode and API workload rows measure model-serving speed.
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| 200 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
| 300 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.3 TF |
| 450 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
compute: 8.6
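
The `theory` and `copy` columns can be related with a few lines of arithmetic. The sketch below assumes the theoretical figure is bus width times memory clock times transfers per clock. The factor of 2 for the RTX 3090 is an assumption, chosen because it reproduces the listed 936 GB/s from the 384-bit, 9751 MHz probe line; the function names are illustrative.

```python
def theoretical_bandwidth_gbs(bus_width_bits: int, clock_mhz: float,
                              transfers_per_clock: int = 1) -> float:
    """Peak memory bandwidth in GB/s from bus width and memory clock."""
    bytes_per_transfer = bus_width_bits / 8
    # bytes * MHz gives MB/s; divide by 1000 for GB/s.
    return bytes_per_transfer * clock_mhz * transfers_per_clock / 1000.0


def copy_fraction(copy_gbs: float, theory_gbs: float) -> float:
    """Measured copy bandwidth as a fraction of the theoretical peak."""
    return copy_gbs / theory_gbs


# RTX 3090 probe values from this page (transfers_per_clock=2 is assumed).
print(f"{theoretical_bandwidth_gbs(384, 9751, transfers_per_clock=2):.0f} GB/s")
print(f"{copy_fraction(391, 936):.0%} of theory")

# Strix Halo probe values from the same page (256-bit, 8000 MHz, 106 GB/s copy).
print(f"{theoretical_bandwidth_gbs(256, 8000):.0f} GB/s")
print(f"{copy_fraction(106, 256):.0%} of theory")
```

Both devices reach a similar share of their theoretical bandwidth in the copy probe, matching the "copy 42% of theory" and "copy 41% of theory" summaries.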
backend: llama.cpp cuda-4f13cb7 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
llama.cpp: version: 18 (4f13cb7), built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
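
The `runs/cell` and `warmups` settings suggest that each table cell comes from a few discarded warmup requests followed by several measured ones. How the harness actually schedules and aggregates them is not stated on this page, so the following is only a sketch under that assumption. `run_cell` and the `measure` callable are hypothetical names, and the median is just one plausible aggregate.

```python
from statistics import median
from typing import Callable, List


def run_cell(measure: Callable[[], float], runs: int = 5, warmups: int = 2) -> List[float]:
    """Call `measure` warmups + runs times and keep only the measured samples."""
    for _ in range(warmups):
        measure()  # warmup result is discarded
    return [measure() for _ in range(runs)]


# Stand-in for a real request: returns its own call index, not a benchmark value.
call_count = 0


def fake_measure() -> float:
    global call_count
    call_count += 1
    return float(call_count)


samples = run_cell(fake_measure)  # defaults mirror runs/cell 5, warmups 2
print(samples)          # the two warmup calls (1.0, 2.0) are not in the list
print(median(samples))  # assumed aggregate; the page does not say which one is used
```

In a real harness, `measure` would issue one streaming request to `/v1/chat/completions` and return a single timing or throughput value.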
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1965/2100 MHz · mem 9501 MHz
temp: 45°C idle · 83°C peak
peak draw: 441 W
backend: llama.cpp cuda-4f13cb7 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
llama.cpp: version: 18 (4f13cb7), built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
hardware probes
copy 41% of theory · FP16 peak 30.3 TF
256-bit · 8000 MHz · 20 SM/CU
The probes are microbenchmarks for memory copy and tensor math; the raw-engine decode and API workload rows measure model-serving speed.
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| fixed | 256 GB/s | 106 GB/s | 30.3 TF | - |
compute: 11.5
backend: llama.cpp b1203 (rocm)
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b8940 (vulkan)
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true