Qwen3.6 27B
Q4_K_M·27B params·GGUF
reasoning
intelligence: see on Artificial Analysis →
checkpoint:
unsloth/Qwen3.6-27B-GGUF:Q4_K_Mcommit:
82d411acf4a0weights 15.66 GiB
All runs (20)
| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s ↓ | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
GeForce RTX 3090 · 24 GiB450 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | codegen | 1 | 40.3 | 189.2 | 312ms | 23.7 | 62 | 1000 | 24.82s | 0.010 GiB |
GeForce RTX 3090 · 24 GiB350 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | codegen | 1 | 39.3 | 206.7 | 302ms | 24.2 | 62 | 1000 | 25.45s | 0.010 GiB |
GeForce RTX 3090 · 24 GiB450 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 1 | 39.2 | 1213.7 | 519ms | 23.8 | 599 | 500 | 12.76s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB350 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 1 | 38.3 | 1220.5 | 500ms | 24.3 | 599 | 500 | 13.06s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB450 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | chat | 1 | 37.4 | 127.5 | 235ms | 23.5 | 30 | 100 | 2.67s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB350 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | chat | 1 | 36.3 | 126.8 | 237ms | 24.0 | 30 | 100 | 2.75s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB450 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | rag | 1 | 35.8 | 1361.9 | 759ms | 23.7 | 842 | 200 | 5.59s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB350 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | rag | 1 | 33.8 | 1192.9 | 857ms | 24.2 | 842 | 200 | 5.91s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | llama.cpp 59778f0 (cuda) | baseline | codegen | 1 | 21.6 | 195.0 | 375ms | 45.9 | 62 | 1000 | 46.39s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | llama.cpp 59778f0 (cuda) | baseline | chat | 1 | 21.1 | 117.8 | 255ms | 43.5 | 30 | 100 | 4.73s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 1 | 21.1 | 957.3 | 626ms | 46.0 | 599 | 500 | 23.75s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | llama.cpp 59778f0 (cuda) | baseline | rag | 1 | 20.0 | 1031.2 | 931ms | 45.0 | 842 | 200 | 9.99s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB450 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 4 | 16.3 | 36.3 | 19.50s | 23.8 | 599 | 500 | 31.84s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB350 Wdrv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 4 | 16.1 | 37.8 | 19.80s | 24.3 | 599 | 500 | 32.36s | 0.000 GiB |
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | codegen | 1 | 12.0 | 146.3 | 435ms | 82.7 | 62 | 1000 | 83.14s | 0.017 GiB |
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | agent | 1 | 12.0 | 2076.2 | 289ms | 82.8 | 599 | 500 | 41.70s | 0.010 GiB |
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | chat | 1 | 11.7 | 85.5 | 352ms | 82.4 | 30 | 100 | 8.54s | 0.004 GiB |
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | rag | 1 | 11.1 | 466.7 | 1.53s | 82.9 | 842 | 200 | 18.02s | 0.007 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 4 | 10.0 | — | 3.93s | 87.5 | — | 341 | 33.98s | 0.040 GiB |
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | agent | 4 | 5.0 | 12.6 | 62.69s | 82.8 | 599 | 500 | 104.09s | 0.037 GiB |
Environment
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power350 W / 450 W max(78% cap)
pcieGen 4 x16 / Gen 4 x16 max
clocksgfx 1965/2100 MHz · mem 9501 MHz
temp44°C idle · 65°C peak
peak draw329 W
backendllama.cpp cuda-4f13cb7 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driverNVIDIA 590.48.01 + CUDA 13.1
libc2.39
python3.12.3
containerizedtrue
llama.cppversion: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flagsGGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power450 W / 450 W max
pcieGen 4 x16 / Gen 4 x16 max
clocksgfx 1965/2100 MHz · mem 9501 MHz
temp43°C idle · 82°C peak
peak draw434 W
backendllama.cpp cuda-4f13cb7 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driverNVIDIA 590.48.01 + CUDA 13.1
libc2.39
python3.12.3
containerizedtrue
llama.cppversion: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flagsGGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power200 W / 450 W max(44% cap)
backendllama.cpp 59778f0 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
containerizedtrue
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpuAMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpuAMD Radeon 8060S
archStrix Halo (gfx1151)
vram96 GiB (system 31.1 GiB, unified)
pcieGen 4 x16 / Gen 4 x16 max
clocksgfx 1106 MHz · mem 1000 MHz
temp48°C idle · 75°C peak
peak draw100 W
backendllama.cpp rocm-4f13cb7 (rocm)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel7.0.2-2-pve
driverROCm 7.2.3
libc2.39
python3.12.3
containerizedtrue
llama.cppversion: 1 (4f13cb7) built with Clang 22.0.0 for Linux x86_64
build flagsGGML_HIP=ON AMDGPU_TARGETS=gfx1151 CMAKE_BUILD_TYPE=Release
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue