Skip to content

Qwen3.6 27B-MTP

Q8_0·27B params·GGUF
reasoning
checkpoint: unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0

All runs (43)

legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)MTP n=3codegen1
57.1
57.1192.4383ms0.162100017.50s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)MTP n=3chat1
55.9
55.9111.5287ms0.1301001.79s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)MTP n=3agent1
55.0
55.01189.9503ms0.15995009.10s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)MTP n=3rag1
53.8
53.81285.1857ms0.18422003.72s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)MTP n=2codegen1
53.2
53.2205.8317ms0.162100018.79s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)MTP n=2agent1
51.2
51.21210.6495ms0.15995009.77s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)mtp-3-pl-200wcodegen1
50.6
50.6193.1380ms0.162100019.75s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)mtp-3-pl-200wchat1
50.5
50.5118.2265ms0.1301001.98s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)mtp-3-pl-200wagent1
49.9
49.9961.4623ms0.159950010.01s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)MTP n=2chat1
49.6
49.6120.8258ms0.1301002.02s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)mtp-2-pl-200wcodegen1
48.7
48.7190.1390ms0.162100020.51s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)MTP n=2rag1
47.8
47.81293.5854ms0.18422004.18s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)mtp-2-pl-200wchat1
47.4
47.4115.1275ms0.0301002.11s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)mtp-2-pl-200wagent1
47.2
47.2927.1646ms0.159950010.59s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)mtp-3-pl-200wrag1
45.1
45.1891.01.05s0.18422004.44s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)mtp-2-pl-200wrag1
42.5
42.5929.21.13s0.18422004.71s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baselinechat1
28.1
25.7131.2236ms35.6301003.89s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baselinecodegen1
28.0
27.0236.9328ms35.762100037.04s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baselineagent1
28.0
26.21160.3516ms35.759950019.12s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baselinerag1
28.0
24.61088.0810ms35.78422008.13s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)baselineagent4
28.0
11.126.328.39s35.759950046.78s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)baseline-pl-200wchat1
27.1
25.1126.0238ms37.0301003.98s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)baseline-pl-200wcodegen1
26.9
26.1198.6337ms37.262100038.31s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)baseline-pl-200wrag1
26.9
23.51243.4911ms37.28422008.50s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)baseline-pl-200wagent1
26.8
25.3957.8625ms37.459950019.73s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)MTP n=3agent4
24.3
24.345.713.08s0.159950021.22s0.000 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB each450 W × 2 maxdrv 590
llama.cpp cuda-4f13cb7 (cuda)MTP n=2agent4
22.3
22.343.014.11s0.159950022.99s0.000 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)MTP n=3agent1
18.7
18.71808.0331ms0.059950026.81s0.023 GiB
legacystack comparable
2× GeForce RTX 3090 · 24 GiB eachcap 200 W × 2drv 590
llama.cpp 4f13cb7-mtp (cuda)baseline-pl-200wagent4
18.2
15.33.11s55.034122.31s0.000 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)MTP n=3chat1
18.1
18.160.1509ms0.0301005.51s0.007 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)MTP n=3codegen1
17.4
17.4127.7486ms0.162100057.37s0.041 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)MTP n=3rag1
17.1
17.1419.21.87s0.084220011.71s0.011 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)MTP n=2agent1
16.1
16.12070.4292ms0.059950031.11s0.025 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)MTP n=2codegen1
15.7
15.7132.6484ms0.162100063.67s0.044 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)MTP n=2chat1
15.7
15.763.3501ms0.0301006.37s0.008 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)MTP n=2rag1
15.4
15.4438.11.68s0.084220013.01s0.011 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)MTP n=3agent4
7.9
7.919.139.55s0.059950065.22s0.088 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)baselinechat1
7.7
7.466.4455ms129.53010013.44s0.003 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)baselinecodegen1
7.7
7.7142.8434ms129.7621000130.50s0.017 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)baselineagent1
7.7
7.72096.0286ms129.759950065.13s0.010 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)baselineagent4
7.7
3.28.197.82s129.8599500162.65s0.039 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)baselinerag1
7.7
7.3476.31.53s129.884220027.40s0.006 GiB
legacystack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unified
llama.cpp 4f13cb7-mtp (rocm)MTP n=2agent4
6.8
6.815.844.75s0.059950075.79s0.096 GiB

Environment

2× GeForce RTX 3090 · 24 GiB each
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090 × 2
archNVIDIA
vram48 GiB (system 64.0 GiB)
power200 W × 2 / 450 W × 2 max(44% cap)
hardware probes
copy 42% of theoryFP16 peak 65.4 TFcopy/math flat across caps
384-bit9751 MHz82 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
captheorycopyfp16bf16
200 W936 GB/s391 GB/s65.4 TF65.4 TF
300 W936 GB/s391 GB/s65.4 TF65.3 TF
450 W936 GB/s391 GB/s65.4 TF65.4 TF
compute: 8.6
backendllama.cpp 4f13cb7-mtp (cuda)
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
2× GeForce RTX 3090 · 24 GiB each
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090 × 2
archNVIDIA
vram48 GiB (system 64.0 GiB)
power450 W × 2 / 450 W × 2 max
pcieGen 4 x16 / Gen 4 x16 max
clocksgfx 1800/2100 MHz · mem 9501 MHz
temp60°C idle · 69°C peak
peak draw294 W
backendllama.cpp cuda-4f13cb7 (cuda)
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driverNVIDIA 590.48.01 + CUDA 13.1
libc2.39
python3.12.3
llama.cppversion: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flagsGGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpuAMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpuAMD Radeon 8060S
archStrix Halo (gfx1151)
vram96 GiB (system 31.1 GiB, unified)
hardware probes
copy 41% of theoryFP16 peak 30.3 TF
256-bit8000 MHz20 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
captheorycopyfp16bf16
fixed256 GB/s106 GB/s30.3 TF-
compute: 11.5
backendllama.cpp 4f13cb7-mtp (rocm)
osUbuntu 24.04 LTS
kernel7.0.2-2-pve
python3.12.3
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue