Qwen3.6 27B
Q4_K_XL · 27B params · GGUF
checkpoint: unsloth/Qwen3.6-27B-GGUF:Qwen3.6-27B-UD-Q4_K_XL.gguf
All runs (15)
| Hardware | Backend | Shape | Conc. | Gen tok/s ↓ | TTFT | TPOT (ms) | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | codegen | 1 | 21.2 | 358ms | 47.0 | 1000 | 47.28s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | chat | 1 | 21.1 | 253ms | 43.6 | 100 | 4.73s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 1 | 20.7 | 607ms | 47.2 | 500 | 24.14s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | rag | 1 | 19.7 | 919ms | 46.0 | 200 | 10.17s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | codegen | 1 | 11.9 | 481ms | 83.6 | 1000 | 84.14s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | chat | 1 | 11.6 | 360ms | 83.2 | 100 | 8.61s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | codegen | 1 | 11.5 | 426ms | 86.7 | 1000 | 87.08s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | agent | 1 | 11.4 | 1.98s | 84.8 | 500 | 44.00s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | chat | 1 | 11.2 | 351ms | 86.3 | 100 | 8.93s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | agent | 1 | 11.1 | 1.78s | 87.0 | 500 | 45.13s | 0.007 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | rag | 1 | 10.8 | 1.95s | 83.8 | 200 | 18.60s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | rag | 1 | 10.5 | 1.73s | 87.1 | 200 | 19.09s | 0.004 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 4 | 9.7 | 4.02s | 91.7 | 341 | 35.46s | 0.030 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b8940 (vulkan) | agent | 4 | 7.2 | 3.82s | 128.2 | 500 | 69.24s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | agent | 4 | 5.3 | 3.44s | 178.9 | 500 | 93.77s | -0.002 GiB |
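The throughput columns can be cross-checked against one another. The sketch below assumes that Gen tok/s is roughly Out tok divided by Total, and that Total is roughly TTFT plus one TPOT interval for each token after the first. The page does not state these definitions, so both formulas and both function names are assumptions made for illustration. The sample values come from the RTX 3090 codegen row.

```python
def gen_tok_per_s(out_tokens: int, total_s: float) -> float:
    """Assumed definition: generated tokens divided by end-to-end time."""
    return out_tokens / total_s


def estimated_total_s(ttft_s: float, tpot_ms: float, out_tokens: int) -> float:
    """Assumed model: first token after TTFT, then one TPOT interval per later token."""
    return ttft_s + (out_tokens - 1) * tpot_ms / 1000.0


# RTX 3090, codegen, concurrency 1: TTFT 358 ms, TPOT 47.0 ms, 1000 tokens, 47.28 s total.
rate = gen_tok_per_s(1000, 47.28)
est = estimated_total_s(0.358, 47.0, 1000)
print(f"{rate:.1f} tok/s, estimated total {est:.2f} s")
```

Under these assumptions the concurrency-1 rows agree with the table to within rounding. The concurrency-4 rows match less closely, which may mean the reported figures are averaged across runs or requests rather than taken from a single request.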
Environment
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b1203 (rocm)
server: lemonade 10.4.0
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b8940 (vulkan)
server: lemonade 10.4.0
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 3
warmups: 1
endpoint: /v1/chat/completions
streaming: true
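The environment blocks list a streaming endpoint together with a runs/cell count and a warmup count. The sketch below shows one way per-request TTFT and TPOT could be derived from token arrival timestamps and then averaged per cell. The page does not describe its harness, so the function names, the choice to drop warmups from the front of the sample list, and the use of a plain mean are all assumptions. The timestamps and samples are synthetic.

```python
from statistics import mean


def ttft_and_tpot(start_s: float, token_times_s: list[float]) -> tuple[float, float]:
    """Return (TTFT in seconds, TPOT in milliseconds) for one streamed response."""
    if not token_times_s:
        raise ValueError("no tokens received")
    ttft_s = token_times_s[0] - start_s
    if len(token_times_s) == 1:
        return ttft_s, 0.0
    # Average gap between consecutive tokens after the first one.
    decode_s = token_times_s[-1] - token_times_s[0]
    tpot_ms = 1000.0 * decode_s / (len(token_times_s) - 1)
    return ttft_s, tpot_ms


def cell_average(samples: list[float], warmups: int) -> float:
    """Assumption: the first `warmups` samples are discarded, the rest averaged."""
    measured = samples[warmups:]
    if not measured:
        raise ValueError("no measured runs left after discarding warmups")
    return mean(measured)


# Synthetic stream: first token at 0.25 s, then one token every 50 ms.
times = [0.25 + 0.05 * i for i in range(5)]
ttft, tpot = ttft_and_tpot(0.0, times)

# Synthetic per-run throughput samples: one warmup followed by three measured runs.
avg = cell_average([30.0, 21.0, 21.4, 21.2], warmups=1)
print(f"TTFT {ttft * 1000:.0f} ms, TPOT {tpot:.1f} ms, cell mean {avg:.1f} tok/s")
```

Discarding warmups matters because the first request often pays one-time costs such as model loading or kernel compilation, which would otherwise inflate TTFT.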