Qwen3.6 27B
Q3_K_M·27B params·GGUF
reasoning
intelligence: see on Artificial Analysis →
checkpoint:
unsloth/Qwen3.6-27B-GGUF:Q3_K_Mcommit:
82d411acf4a0weights 12.65 GiB
All runs (20)
| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s ↓ | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
GeForce RTX 3090 · 24 GiB450 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | codegen | 1 | 37.2 | 219.7 | 296ms | 25.9 | 62 | 1000 | 26.89s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB450 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 1 | 35.9 | 1227.1 | 488ms | 26.0 | 599 | 500 | 13.91s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB450 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | chat | 1 | 35.8 | 122.7 | 244ms | 25.1 | 30 | 100 | 2.79s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB350 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | codegen | 1 | 35.7 | 190.8 | 344ms | 27.0 | 62 | 1000 | 28.01s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB350 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 1 | 34.7 | 1186.5 | 505ms | 27.1 | 599 | 500 | 14.41s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB350 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | chat | 1 | 33.9 | 128.9 | 235ms | 26.5 | 30 | 100 | 2.95s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB450 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | rag | 1 | 33.3 | 1361.4 | 778ms | 25.5 | 842 | 200 | 6.00s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB350 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | rag | 1 | 31.9 | 1319.7 | 802ms | 26.9 | 842 | 200 | 6.28s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | llama.cpp 59778f0 (cuda) | baseline | codegen | 1 | 20.9 | 188.7 | 356ms | 47.3 | 62 | 1000 | 47.80s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 1 | 20.5 | 1007.2 | 595ms | 47.4 | 599 | 500 | 24.34s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | llama.cpp 59778f0 (cuda) | baseline | chat | 1 | 20.5 | 117.1 | 259ms | 44.9 | 30 | 100 | 4.88s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | llama.cpp 59778f0 (cuda) | baseline | rag | 1 | 19.3 | 1009.6 | 938ms | 46.8 | 842 | 200 | 10.35s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB450 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 4 | 15.0 | 37.3 | 21.12s | 26.0 | 599 | 500 | 34.63s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB350 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 4 | 14.5 | 38.5 | 21.69s | 27.1 | 599 | 500 | 35.78s | 0.000 GiB |
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | codegen | 1 | 14.2 | 154.9 | 413ms | 69.7 | 62 | 1000 | 70.20s | 0.016 GiB |
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | agent | 1 | 14.2 | 2163.9 | 277ms | 70.0 | 599 | 500 | 35.25s | 0.009 GiB |
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | chat | 1 | 13.8 | 91.0 | 333ms | 69.4 | 30 | 100 | 7.23s | 0.004 GiB |
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | rag | 1 | 12.9 | 463.1 | 1.53s | 70.0 | 842 | 200 | 15.47s | 0.007 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 4 | 10.1 | — | 3.89s | 87.3 | — | 341 | 33.93s | 0.030 GiB |
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | agent | 4 | 5.9 | 14.3 | 52.94s | 69.9 | 599 | 500 | 87.89s | 0.037 GiB |
Environment
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power350 W / 450 W max(78% cap)
pcieGen 4 x16 / Gen 4 x16 max
clocksgfx 1980/2100 MHz · mem 9501 MHz
temp44°C idle · 65°C peak
peak draw340 W
backendllama.cpp cuda-4f13cb7 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driverNVIDIA 590.48.01 + CUDA 13.1
libc2.39
python3.12.3
containerizedtrue
llama.cppversion: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flagsGGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power450 W / 450 W max
pcieGen 4 x16 / Gen 4 x16 max
clocksgfx 1980/2100 MHz · mem 9501 MHz
temp44°C idle · 81°C peak
peak draw433 W
backendllama.cpp cuda-4f13cb7 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driverNVIDIA 590.48.01 + CUDA 13.1
libc2.39
python3.12.3
containerizedtrue
llama.cppversion: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flagsGGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power200 W / 450 W max(44% cap)
backendllama.cpp 59778f0 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
containerizedtrue
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpuAMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpuAMD Radeon 8060S
archStrix Halo (gfx1151)
vram96 GiB (system 31.1 GiB, unified)
pcieGen 4 x16 / Gen 4 x16 max
clocksgfx 1037 MHz · mem 1000 MHz
temp49°C idle · 76°C peak
peak draw99 W
backendllama.cpp rocm-4f13cb7 (rocm)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel7.0.2-2-pve
driverROCm 7.2.3
libc2.39
python3.12.3
containerizedtrue
llama.cppversion: 1 (4f13cb7) built with Clang 22.0.0 for Linux x86_64
build flagsGGML_HIP=ON AMDGPU_TARGETS=gfx1151 CMAKE_BUILD_TYPE=Release
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue