Qwen3.6 27B-MTP

Q8_0 · 27B params · GGUF · reasoning
checkpoint: unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0
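A rough estimate shows why this checkpoint fits on both test machines. GGML's Q8_0 format stores weights in blocks of 32 int8 values plus one fp16 scale, which is 34 bytes per 32 weights, or 8.5 bits per weight. The sketch below assumes that all of the nominal 27B parameters are stored as Q8_0. It ignores the KV cache, compute buffers, and any tensors kept in other types, so treat the result as approximate.

```python
# Approximate weight memory for a Q8_0 GGUF.
# Assumption: every parameter is stored as Q8_0. Real files may keep some
# tensors in other types, and the KV cache and compute buffers come on top.

Q8_0_BLOCK_WEIGHTS = 32     # weights per Q8_0 block
Q8_0_BLOCK_BYTES = 32 + 2   # 32 int8 quants plus one fp16 scale


def q8_0_weight_gib(n_params: float) -> float:
    """Approximate storage for n_params Q8_0 weights, in GiB."""
    n_blocks = n_params / Q8_0_BLOCK_WEIGHTS
    return n_blocks * Q8_0_BLOCK_BYTES / 2**30


weights = q8_0_weight_gib(27e9)  # nominal "27B params"
print(f"~{weights:.1f} GiB of weights ({Q8_0_BLOCK_BYTES * 8 / Q8_0_BLOCK_WEIGHTS} bits/weight)")
for name, vram_gib in [("2x RTX 3090", 48), ("Strix Halo", 96)]:
    print(f"{name}: ~{vram_gib - weights:.1f} GiB left for KV cache and buffers")
```

By this estimate the weights take roughly 26.7 GiB. If split evenly across the two 3090s, that comes to about 13.4 GiB per card.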

All runs (43)

| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | codegen | 1 | 57.1 | 192.4 | 383 ms | 0.1 | 62 | 1000 | 17.50 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | chat | 1 | 55.9 | 111.5 | 287 ms | 0.1 | 30 | 100 | 1.79 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | agent | 1 | 55.0 | 1189.9 | 503 ms | 0.1 | 599 | 500 | 9.10 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | rag | 1 | 53.8 | 1285.1 | 857 ms | 0.1 | 842 | 200 | 3.72 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | codegen | 1 | 53.2 | 205.8 | 317 ms | 0.1 | 62 | 1000 | 18.79 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | agent | 1 | 51.2 | 1210.6 | 495 ms | 0.1 | 599 | 500 | 9.77 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | codegen | 1 | 50.6 | 193.1 | 380 ms | 0.1 | 62 | 1000 | 19.75 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | chat | 1 | 50.5 | 118.2 | 265 ms | 0.1 | 30 | 100 | 1.98 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | agent | 1 | 49.9 | 961.4 | 623 ms | 0.1 | 599 | 500 | 10.01 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | chat | 1 | 49.6 | 120.8 | 258 ms | 0.1 | 30 | 100 | 2.02 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | codegen | 1 | 48.7 | 190.1 | 390 ms | 0.1 | 62 | 1000 | 20.51 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | rag | 1 | 47.8 | 1293.5 | 854 ms | 0.1 | 842 | 200 | 4.18 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | chat | 1 | 47.4 | 115.1 | 275 ms | 0.0 | 30 | 100 | 2.11 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | agent | 1 | 47.2 | 927.1 | 646 ms | 0.1 | 599 | 500 | 10.59 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-3-pl-200w | rag | 1 | 45.1 | 891.0 | 1.05 s | 0.1 | 842 | 200 | 4.44 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | mtp-2-pl-200w | rag | 1 | 42.5 | 929.2 | 1.13 s | 0.1 | 842 | 200 | 4.71 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | codegen | 1 | 27.0 | 236.9 | 328 ms | 35.7 | 62 | 1000 | 37.04 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | agent | 1 | 26.2 | 1160.3 | 516 ms | 35.7 | 599 | 500 | 19.12 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | codegen | 1 | 26.1 | 198.6 | 337 ms | 37.2 | 62 | 1000 | 38.31 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | chat | 1 | 25.7 | 131.2 | 236 ms | 35.6 | 30 | 100 | 3.89 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | agent | 1 | 25.3 | 957.8 | 625 ms | 37.4 | 599 | 500 | 19.73 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | chat | 1 | 25.1 | 126.0 | 238 ms | 37.0 | 30 | 100 | 3.98 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | rag | 1 | 24.6 | 1088.0 | 810 ms | 35.7 | 842 | 200 | 8.13 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | agent | 4 | 24.3 | 45.7 | 13.08 s | 0.1 | 599 | 500 | 21.22 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | rag | 1 | 23.5 | 1243.4 | 911 ms | 37.2 | 842 | 200 | 8.50 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | agent | 4 | 22.3 | 43.0 | 14.11 s | 0.1 | 599 | 500 | 22.99 s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | agent | 1 | 18.7 | 1808.0 | 331 ms | 0.0 | 599 | 500 | 26.81 s | 0.023 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | chat | 1 | 18.1 | 60.1 | 509 ms | 0.0 | 30 | 100 | 5.51 s | 0.007 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | codegen | 1 | 17.4 | 127.7 | 486 ms | 0.1 | 62 | 1000 | 57.37 s | 0.041 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | rag | 1 | 17.1 | 419.2 | 1.87 s | 0.0 | 842 | 200 | 11.71 s | 0.011 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | agent | 1 | 16.1 | 2070.4 | 292 ms | 0.0 | 599 | 500 | 31.11 s | 0.025 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | codegen | 1 | 15.7 | 132.6 | 484 ms | 0.1 | 62 | 1000 | 63.67 s | 0.044 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | chat | 1 | 15.7 | 63.3 | 501 ms | 0.0 | 30 | 100 | 6.37 s | 0.008 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | rag | 1 | 15.4 | 438.1 | 1.68 s | 0.0 | 842 | 200 | 13.01 s | 0.011 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 200 W × 2 · drv 590 | llama.cpp 4f13cb7-mtp (cuda) | baseline-pl-200w | agent | 4 | 15.3 | n/a | 3.11 s | 55.0 | n/a | 341 | 22.31 s | 0.000 GiB |
| 2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | agent | 4 | 11.1 | 26.3 | 28.39 s | 35.7 | 599 | 500 | 46.78 s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | agent | 4 | 7.9 | 19.1 | 39.55 s | 0.0 | 599 | 500 | 65.22 s | 0.088 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | baseline | agent | 1 | 7.7 | 2096.0 | 286 ms | 129.7 | 599 | 500 | 65.13 s | 0.010 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | baseline | codegen | 1 | 7.7 | 142.8 | 434 ms | 129.7 | 62 | 1000 | 130.50 s | 0.017 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | baseline | chat | 1 | 7.4 | 66.4 | 455 ms | 129.5 | 30 | 100 | 13.44 s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | baseline | rag | 1 | 7.3 | 476.3 | 1.53 s | 129.8 | 842 | 200 | 27.40 s | 0.006 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | agent | 4 | 6.8 | 15.8 | 44.75 s | 0.0 | 599 | 500 | 75.79 s | 0.096 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | baseline | agent | 4 | 3.2 | 8.1 | 97.82 s | 129.8 | 599 | 500 | 162.65 s | 0.039 GiB |
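The main effect in the table is MTP's decode speedup over the baseline at concurrency 1. The sketch below recomputes that speedup from the Gen tok/s column. The values were copied by hand from the 450 W RTX 3090 and Strix Halo rows, and the dictionary layout and hardware labels are only illustrative.

```python
# Conc. 1 Gen tok/s copied from the table above.
# Keys are (hardware, shape); the labels are illustrative shorthands.
GEN_TOK_S = {
    ("3090 @ 450 W", "codegen"): {"baseline": 27.0, "MTP n=2": 53.2, "MTP n=3": 57.1},
    ("3090 @ 450 W", "chat"):    {"baseline": 25.7, "MTP n=2": 49.6, "MTP n=3": 55.9},
    ("3090 @ 450 W", "agent"):   {"baseline": 26.2, "MTP n=2": 51.2, "MTP n=3": 55.0},
    ("3090 @ 450 W", "rag"):     {"baseline": 24.6, "MTP n=2": 47.8, "MTP n=3": 53.8},
    ("Strix Halo", "codegen"):   {"baseline": 7.7, "MTP n=2": 15.7, "MTP n=3": 17.4},
    ("Strix Halo", "chat"):      {"baseline": 7.4, "MTP n=2": 15.7, "MTP n=3": 18.1},
    ("Strix Halo", "agent"):     {"baseline": 7.7, "MTP n=2": 16.1, "MTP n=3": 18.7},
    ("Strix Halo", "rag"):       {"baseline": 7.3, "MTP n=2": 15.4, "MTP n=3": 17.1},
}


def speedup(row: dict[str, float], mode: str) -> float:
    """MTP throughput divided by baseline throughput for the same cell."""
    return row[mode] / row["baseline"]


for (hw, shape), row in GEN_TOK_S.items():
    print(f"{hw:12s} {shape:8s} "
          f"n=2: {speedup(row, 'MTP n=2'):.2f}x  n=3: {speedup(row, 'MTP n=3'):.2f}x")
```

On these numbers, MTP n=3 runs about 2.1 to 2.2× faster than the baseline on the 450 W RTX 3090s and about 2.3 to 2.4× faster on Strix Halo.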

Environment

2× GeForce RTX 3090 · 24 GiB each (200 W power limit)
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090 × 2
arch: NVIDIA
vram: 48 GiB (system 64.0 GiB)
power: 200 W × 2 / 450 W × 2 max (44% cap)
backend: llama.cpp 4f13cb7-mtp (cuda)
server: lemonade (version unknown)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
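The 44% figure is the ratio of the configured limit to the maximum, 200 W / 450 W. Pairing the conc. 1 rows at the two limits shows how much throughput the cap costs. The following minimal sketch assumes that the 450 W runs (backend label cuda-4f13cb7) and the 200 W runs (backend label 4f13cb7-mtp) use equivalent builds of the same commit. It compares configured power limits, not measured draw.

```python
# Conc. 1 Gen tok/s on 2x RTX 3090, copied from the table: (450 W, 200 W).
# Assumption: the "cuda-4f13cb7" and "4f13cb7-mtp" builds behave the same.
PAIRS = {
    ("MTP n=3", "codegen"): (57.1, 50.6),
    ("MTP n=3", "chat"): (55.9, 50.5),
    ("MTP n=3", "agent"): (55.0, 49.9),
    ("MTP n=3", "rag"): (53.8, 45.1),
    ("MTP n=2", "codegen"): (53.2, 48.7),
    ("MTP n=2", "chat"): (49.6, 47.4),
    ("MTP n=2", "agent"): (51.2, 47.2),
    ("MTP n=2", "rag"): (47.8, 42.5),
    ("baseline", "codegen"): (27.0, 26.1),
    ("baseline", "chat"): (25.7, 25.1),
    ("baseline", "agent"): (26.2, 25.3),
    ("baseline", "rag"): (24.6, 23.5),
}

POWER_CAP_W, POWER_MAX_W = 200, 450
cap_fraction = POWER_CAP_W / POWER_MAX_W  # the "44% cap"


def retention(full: float, capped: float) -> float:
    """Fraction of full-power throughput kept under the cap."""
    return capped / full


print(f"power limit: {cap_fraction:.0%} of max")
for (mode, shape), (full, capped) in PAIRS.items():
    print(f"{mode:9s} {shape:8s} keeps {retention(full, capped):.0%} of 450 W throughput")
```

On these values, the MTP rows keep about 84% to 96% of their 450 W throughput under the cap, while the baseline rows keep about 96% to 98%.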
2× GeForce RTX 3090 · 24 GiB each (450 W power limit)
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090 × 2
arch: NVIDIA
vram: 48 GiB (system 64.0 GiB)
power: 450 W × 2 / 450 W × 2 max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1800/2100 MHz · mem 9501 MHz
temp: 60°C idle · 69°C peak
peak draw: 294 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade (version unknown)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version 18 (4f13cb7), built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
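Every cell was measured through the OpenAI-compatible /v1/chat/completions endpoint with streaming enabled. The page does not say how the timing columns are derived. However, Gen tok/s × Total ≈ Out tok holds throughout (for example, 57.1 tok/s × 17.50 s ≈ 999 tokens for a 1000-token codegen run), which suggests that Gen tok/s is output tokens divided by total request time. Below is a hedged sketch of plausible definitions, assuming one arrival timestamp per output token. A real client would also have to map streamed chunks to tokens, which this sketch ignores. The function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamMetrics:
    ttft_s: float      # time to first token, seconds
    tpot_ms: float     # mean time per output token after the first, ms
    gen_tok_s: float   # output tokens / total request time
    total_s: float     # wall time of the whole request, seconds


def metrics_from_timestamps(request_start: float,
                            token_times: list[float],
                            request_end: float) -> StreamMetrics:
    """Derive per-request metrics from one arrival timestamp per output token.

    These definitions are assumptions for illustration; the benchmark
    harness behind the table may compute its columns differently.
    """
    if not token_times:
        raise ValueError("no tokens received")
    total = request_end - request_start
    ttft = token_times[0] - request_start
    n = len(token_times)
    # Mean inter-token gap: (last - first) spread over the n - 1 gaps.
    tpot_ms = (token_times[-1] - token_times[0]) / (n - 1) * 1000 if n > 1 else 0.0
    return StreamMetrics(ttft, tpot_ms, n / total, total)


# Hypothetical request: first token after 0.4 s, then one token every 35 ms.
times = [0.4 + 0.035 * i for i in range(100)]
example = metrics_from_timestamps(0.0, times, times[-1])
print(f"TTFT {example.ttft_s * 1000:.0f} ms, TPOT {example.tpot_ms:.1f} ms, "
      f"{example.gen_tok_s:.1f} tok/s over {example.total_s:.2f} s")
```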
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp 4f13cb7-mtp (rocm)
server: lemonade (version unknown)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
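Each configuration lists runs/cell 5 and warmups 2. The page does not say how the five measured runs are combined, so the following is only a sketch. It assumes that the warmup requests come first and are discarded, and it reports both the mean and the median because either could underlie the table. The sample values are hypothetical.

```python
from statistics import mean, median

WARMUPS = 2        # from "warmups 2" above
RUNS_PER_CELL = 5  # from "runs/cell 5" above


def aggregate_cell(samples: list[float],
                   warmups: int = WARMUPS,
                   runs: int = RUNS_PER_CELL) -> dict[str, float]:
    """Discard warmup samples, then summarise the measured runs of one cell."""
    if len(samples) != warmups + runs:
        raise ValueError(f"expected {warmups + runs} samples, got {len(samples)}")
    measured = samples[warmups:]  # warmups come first and are dropped
    return {
        "mean": mean(measured),
        "median": median(measured),
        "min": min(measured),
        "max": max(measured),
    }


# Hypothetical Gen tok/s samples for one cell: two slow warmups, then five runs.
samples = [12.0, 16.5, 17.2, 17.5, 17.4, 17.6, 17.3]
summary = aggregate_cell(samples)
print({k: round(v, 2) for k, v in summary.items()})
```

Keeping the min and max alongside the central value makes run-to-run spread visible, which matters when neighbouring rows differ by only a few tenths of a token per second.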