Qwen3.6 27B
Q5_K_M · 27B params · GGUF
reasoning
checkpoint: unsloth/Qwen3.6-27B-GGUF:Q5_K_M
commit: 82d411acf4a0
weights: 18.17 GiB
All runs (19)
| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s ↓ | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | codegen | 1 | 36.1 | 206.6 | 348ms | 26.5 | 62 | 1000 | 27.70s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | codegen | 1 | 35.7 | 193.1 | 323ms | 27.0 | 62 | 1000 | 28.01s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 1 | 35.1 | 1124.9 | 533ms | 26.6 | 599 | 500 | 14.23s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 1 | 34.8 | 1161.2 | 516ms | 27.0 | 599 | 500 | 14.37s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | chat | 1 | 34.2 | 117.8 | 256ms | 26.2 | 30 | 100 | 2.93s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | chat | 1 | 33.5 | 119.4 | 251ms | 26.7 | 30 | 100 | 2.99s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | rag | 1 | 32.7 | 1218.8 | 751ms | 26.4 | 842 | 200 | 6.12s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | rag | 1 | 31.1 | 1316.7 | 895ms | 26.9 | 842 | 200 | 6.42s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | codegen | 1 | 18.9 | 187.0 | 375ms | 52.4 | 62 | 1000 | 52.77s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | chat | 1 | 18.6 | 112.9 | 268ms | 50.1 | 30 | 100 | 5.38s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 1 | 18.2 | 517.0 | 1.16s | 52.7 | 599 | 500 | 27.44s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | rag | 1 | 17.3 | 756.4 | 1.20s | 51.8 | 842 | 200 | 11.56s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 4 | 14.6 | 32.7 | 21.75s | 26.6 | 599 | 500 | 35.52s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 4 | 14.6 | 33.6 | 21.54s | 27.0 | 599 | 500 | 35.39s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | codegen | 1 | 10.6 | 145.5 | 440ms | 93.8 | 62 | 1000 | 94.20s | 0.017 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | agent | 1 | 10.6 | 2036.6 | 294ms | 93.8 | 599 | 500 | 47.21s | 0.010 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | chat | 1 | 10.4 | 83.9 | 358ms | 93.5 | 30 | 100 | 9.63s | 0.004 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | rag | 1 | 9.9 | 452.5 | 1.61s | 93.8 | 842 | 200 | 20.26s | 0.007 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | agent | 4 | 4.4 | 8.8 | 70.91s | 93.8 | 599 | 500 | 117.83s | 0.037 GiB |
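
As a quick sanity check on how the columns relate, the sketch below recomputes two throughput figures from the first row (RTX 3090, 450 W, codegen, concurrency 1). It assumes, based only on the numbers in that row, that Gen tok/s is roughly Out tok divided by Total, and that 1000 / TPOT gives a decode-only rate that excludes TTFT. The page does not state how its columns are derived, and the function names are illustrative.

```python
# Values copied from the first table row (RTX 3090, 450 W, codegen, conc. 1).
OUT_TOKENS = 1000   # "Out tok"
TOTAL_S = 27.70     # "Total", in seconds
TPOT_MS = 26.5      # "TPOT (ms)", time per output token


def end_to_end_tok_per_s(out_tokens: int, total_s: float) -> float:
    """Output tokens divided by wall-clock time (assumed to include TTFT)."""
    return out_tokens / total_s


def decode_only_tok_per_s(tpot_ms: float) -> float:
    """Steady-state decode rate implied by the per-token latency."""
    return 1000.0 / tpot_ms


e2e = end_to_end_tok_per_s(OUT_TOKENS, TOTAL_S)
decode = decode_only_tok_per_s(TPOT_MS)

print(f"end-to-end:  {e2e:.1f} tok/s")     # 36.1, matches the Gen tok/s column
print(f"decode-only: {decode:.1f} tok/s")  # 37.7, slightly higher
```

For this row the end-to-end figure reproduces the reported 36.1 tok/s. Other rows agree only approximately, which would be consistent with the reported values being aggregates over the 5 runs per cell, though that is also an assumption.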
Environment
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 350 W / 450 W max (78% cap)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1965/2100 MHz · mem 9501 MHz
temp: 44°C idle · 64°C peak
peak draw: 337 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1965/2100 MHz · mem 9501 MHz
temp: 43°C idle · 83°C peak
peak draw: 430 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1320 MHz · mem 1000 MHz
temp: 47°C idle · 75°C peak
peak draw: 99 W
backend: llama.cpp rocm-4f13cb7 (rocm)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 7.0.2-2-pve
driver: ROCm 7.2.3
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version: 1 (4f13cb7) built with Clang 22.0.0 for Linux x86_64
build flags: GGML_HIP=ON AMDGPU_TARGETS=gfx1151 CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true