Qwen3.6 27B-MTP
Q8_0 · 27B params · GGUF
reasoning
checkpoint: unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0
All runs (43)
| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s ↓ | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | codegen | 1 | 57.1 | 192.4 | 383ms | 0.1 | 62 | 1000 | 17.50s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | chat | 1 | 55.9 | 111.5 | 287ms | 0.1 | 30 | 100 | 1.79s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | agent | 1 | 55.0 | 1189.9 | 503ms | 0.1 | 599 | 500 | 9.10s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | rag | 1 | 53.8 | 1285.1 | 857ms | 0.1 | 842 | 200 | 3.72s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | codegen | 1 | 53.2 | 205.8 | 317ms | 0.1 | 62 | 1000 | 18.79s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | agent | 1 | 51.2 | 1210.6 | 495ms | 0.1 | 599 | 500 | 9.77s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | codegen | 1 | 50.6 | 193.1 | 380ms | 0.1 | 62 | 1000 | 19.75s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | chat | 1 | 50.5 | 118.2 | 265ms | 0.1 | 30 | 100 | 1.98s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | agent | 1 | 49.9 | 961.4 | 623ms | 0.1 | 599 | 500 | 10.01s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | chat | 1 | 49.6 | 120.8 | 258ms | 0.1 | 30 | 100 | 2.02s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | codegen | 1 | 48.7 | 190.1 | 390ms | 0.1 | 62 | 1000 | 20.51s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | rag | 1 | 47.8 | 1293.5 | 854ms | 0.1 | 842 | 200 | 4.18s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | chat | 1 | 47.4 | 115.1 | 275ms | 0.0 | 30 | 100 | 2.11s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | agent | 1 | 47.2 | 927.1 | 646ms | 0.1 | 599 | 500 | 10.59s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | rag | 1 | 45.1 | 891.0 | 1.05s | 0.1 | 842 | 200 | 4.44s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | rag | 1 | 42.5 | 929.2 | 1.13s | 0.1 | 842 | 200 | 4.71s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | codegen | 1 | 27.0 | 236.9 | 328ms | 35.7 | 62 | 1000 | 37.04s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | agent | 1 | 26.2 | 1160.3 | 516ms | 35.7 | 599 | 500 | 19.12s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | codegen | 1 | 26.1 | 198.6 | 337ms | 37.2 | 62 | 1000 | 38.31s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | chat | 1 | 25.7 | 131.2 | 236ms | 35.6 | 30 | 100 | 3.89s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | agent | 1 | 25.3 | 957.8 | 625ms | 37.4 | 599 | 500 | 19.73s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | chat | 1 | 25.1 | 126.0 | 238ms | 37.0 | 30 | 100 | 3.98s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | rag | 1 | 24.6 | 1088.0 | 810ms | 35.7 | 842 | 200 | 8.13s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | agent | 4 | 24.3 | 45.7 | 13.08s | 0.1 | 599 | 500 | 21.22s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | rag | 1 | 23.5 | 1243.4 | 911ms | 37.2 | 842 | 200 | 8.50s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | agent | 4 | 22.3 | 43.0 | 14.11s | 0.1 | 599 | 500 | 22.99s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | agent | 1 | 18.7 | 1808.0 | 331ms | 0.0 | 599 | 500 | 26.81s | 0.023 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | chat | 1 | 18.1 | 60.1 | 509ms | 0.0 | 30 | 100 | 5.51s | 0.007 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | codegen | 1 | 17.4 | 127.7 | 486ms | 0.1 | 62 | 1000 | 57.37s | 0.041 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | rag | 1 | 17.1 | 419.2 | 1.87s | 0.0 | 842 | 200 | 11.71s | 0.011 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | agent | 1 | 16.1 | 2070.4 | 292ms | 0.0 | 599 | 500 | 31.11s | 0.025 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | codegen | 1 | 15.7 | 132.6 | 484ms | 0.1 | 62 | 1000 | 63.67s | 0.044 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | chat | 1 | 15.7 | 63.3 | 501ms | 0.0 | 30 | 100 | 6.37s | 0.008 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | rag | 1 | 15.4 | 438.1 | 1.68s | 0.0 | 842 | 200 | 13.01s | 0.011 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | agent | 4 | 15.3 | — | 3.11s | 55.0 | — | 341 | 22.31s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | agent | 4 | 11.1 | 26.3 | 28.39s | 35.7 | 599 | 500 | 46.78s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | agent | 4 | 7.9 | 19.1 | 39.55s | 0.0 | 599 | 500 | 65.22s | 0.088 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | agent | 1 | 7.7 | 2096.0 | 286ms | 129.7 | 599 | 500 | 65.13s | 0.010 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | codegen | 1 | 7.7 | 142.8 | 434ms | 129.7 | 62 | 1000 | 130.50s | 0.017 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | chat | 1 | 7.4 | 66.4 | 455ms | 129.5 | 30 | 100 | 13.44s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | rag | 1 | 7.3 | 476.3 | 1.53s | 129.8 | 842 | 200 | 27.40s | 0.006 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | agent | 4 | 6.8 | 15.8 | 44.75s | 0.0 | 599 | 500 | 75.79s | 0.096 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp 4f13cb7-mtp (rocm) | baseline | agent | 4 | 3.2 | 8.1 | 97.82s | 129.8 | 599 | 500 | 162.65s | 0.039 GiB |
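To read the MTP rows against the baseline rows, it can help to turn Gen tok/s into a speedup ratio per shape. The sketch below does this for the 2× RTX 3090 runs at 450 W and concurrency 1, using figures copied from the table. It also checks an observation that the page itself does not document: for single-request rows, Gen tok/s appears to equal Out tok divided by Total. The names `GEN_TOK_S`, `speedup`, and `derived_gen_tok_s` are illustrative, not part of the benchmark tooling.

```python
# Gen tok/s at concurrency 1, copied from the results table
# (2x GeForce RTX 3090, 450 W x 2, llama.cpp cuda-4f13cb7).
GEN_TOK_S = {
    "baseline": {"codegen": 27.0, "chat": 25.7, "agent": 26.2, "rag": 24.6},
    "MTP n=2": {"codegen": 53.2, "chat": 49.6, "agent": 51.2, "rag": 47.8},
    "MTP n=3": {"codegen": 57.1, "chat": 55.9, "agent": 55.0, "rag": 53.8},
}


def speedup(mode: str, shape: str) -> float:
    """Ratio of a mode's Gen tok/s to the baseline Gen tok/s for one shape."""
    return GEN_TOK_S[mode][shape] / GEN_TOK_S["baseline"][shape]


def speedup_table() -> dict[str, dict[str, float]]:
    """Speedup over baseline for every non-baseline mode, rounded to 2 places."""
    return {
        mode: {shape: round(speedup(mode, shape), 2) for shape in shapes}
        for mode, shapes in GEN_TOK_S.items()
        if mode != "baseline"
    }


def derived_gen_tok_s(out_tok: int, total_s: float) -> float:
    """Assumed relation (inferred from the table, not documented):
    Gen tok/s ~= Out tok / Total for single-request rows."""
    return out_tok / total_s


if __name__ == "__main__":
    for mode, row in speedup_table().items():
        print(mode, row)
    # MTP n=3 codegen row: 1000 output tokens in 17.50 s, reported as 57.1 tok/s.
    print(round(derived_gen_tok_s(1000, 17.50), 1))
```

On these figures, MTP n=3 lands at roughly 2.1x to 2.2x the baseline across the four shapes, and MTP n=2 at roughly 1.9x to 2.0x.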
Environment
2× GeForce RTX 3090 · 24 GiB each
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090 × 2
arch: NVIDIA
vram: 48 GiB (system 64.0 GiB)
power: 200 W × 2 / 450 W × 2 max (44% cap)
backend: llama.cpp 4f13cb7-mtp (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
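The power line above reports a 200 W per-GPU limit against a 450 W maximum. As a worked example, the sketch below recomputes that cap percentage and the share of Gen tok/s retained under the cap, pairing each 450 W row with its `-pl-200w` counterpart from the results table (concurrency 1). It assumes the two sets of runs are otherwise comparable, which the page does not state explicitly; the names `PAIRS` and `retained` are illustrative.

```python
CAP_W = 200   # per-GPU power limit in the capped runs
MAX_W = 450   # per-GPU maximum power limit

# Fraction of the maximum power limit allowed by the cap.
cap_fraction = CAP_W / MAX_W

# (Gen tok/s at 450 W, Gen tok/s at 200 W), copied from the results table.
PAIRS = {
    ("MTP n=3", "codegen"): (57.1, 50.6),
    ("MTP n=3", "chat"): (55.9, 50.5),
    ("MTP n=3", "agent"): (55.0, 49.9),
    ("MTP n=3", "rag"): (53.8, 45.1),
    ("baseline", "codegen"): (27.0, 26.1),
    ("baseline", "chat"): (25.7, 25.1),
    ("baseline", "agent"): (26.2, 25.3),
    ("baseline", "rag"): (24.6, 23.5),
}


def retained(full: float, capped: float) -> float:
    """Share of uncapped Gen tok/s that survives the power cap."""
    return capped / full


if __name__ == "__main__":
    print(f"cap: {cap_fraction:.0%} of max power limit")
    for (mode, shape), (full, capped) in PAIRS.items():
        print(f"{mode:9s} {shape:8s} {retained(full, capped):.1%}")
```

Note that the cap is a limit, not a measured draw, so these ratios describe throughput under the limit rather than energy efficiency.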
2× GeForce RTX 3090 · 24 GiB each
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090 × 2
arch: NVIDIA
vram: 48 GiB (system 64.0 GiB)
power: 450 W × 2 / 450 W × 2 max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1800/2100 MHz · mem 9501 MHz
temp: 60°C idle · 69°C peak
peak draw: 294 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp 4f13cb7-mtp (rocm)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true