Qwen3.6 27B-MTP

Q8_0·27B params·GGUF

reasoning

checkpoint: unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0

All runs (43)


legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	MTP n=3	codegen	1	57.1	57.1	192.4	—	383ms	0.1	—	62	1000	17.50s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	MTP n=3	chat	1	55.9	55.9	111.5	—	287ms	0.1	—	30	100	1.79s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	MTP n=3	agent	1	55.0	55.0	1189.9	—	503ms	0.1	—	599	500	9.10s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	MTP n=3	rag	1	53.8	53.8	1285.1	—	857ms	0.1	—	842	200	3.72s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	MTP n=2	codegen	1	53.2	53.2	205.8	—	317ms	0.1	—	62	1000	18.79s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	MTP n=2	agent	1	51.2	51.2	1210.6	—	495ms	0.1	—	599	500	9.77s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	mtp-3-pl-200w	codegen	1	50.6	50.6	193.1	—	380ms	0.1	—	62	1000	19.75s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	mtp-3-pl-200w	chat	1	50.5	50.5	118.2	—	265ms	0.1	—	30	100	1.98s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	mtp-3-pl-200w	agent	1	49.9	49.9	961.4	—	623ms	0.1	—	599	500	10.01s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	MTP n=2	chat	1	49.6	49.6	120.8	—	258ms	0.1	—	30	100	2.02s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	mtp-2-pl-200w	codegen	1	48.7	48.7	190.1	—	390ms	0.1	—	62	1000	20.51s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	MTP n=2	rag	1	47.8	47.8	1293.5	—	854ms	0.1	—	842	200	4.18s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	mtp-2-pl-200w	chat	1	47.4	47.4	115.1	—	275ms	0.0	—	30	100	2.11s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	mtp-2-pl-200w	agent	1	47.2	47.2	927.1	—	646ms	0.1	—	599	500	10.59s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	mtp-3-pl-200w	rag	1	45.1	45.1	891.0	—	1.05s	0.1	—	842	200	4.44s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	mtp-2-pl-200w	rag	1	42.5	42.5	929.2	—	1.13s	0.1	—	842	200	4.71s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline	chat	1	28.1	25.7	131.2	—	236ms	35.6	—	30	100	3.89s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline	codegen	1	28.0	27.0	236.9	—	328ms	35.7	—	62	1000	37.04s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline	agent	1	28.0	26.2	1160.3	—	516ms	35.7	—	599	500	19.12s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline	rag	1	28.0	24.6	1088.0	—	810ms	35.7	—	842	200	8.13s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline	agent	4	28.0	11.1	26.3	—	28.39s	35.7	—	599	500	46.78s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	baseline-pl-200w	chat	1	27.1	25.1	126.0	—	238ms	37.0	—	30	100	3.98s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	baseline-pl-200w	codegen	1	26.9	26.1	198.6	—	337ms	37.2	—	62	1000	38.31s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	baseline-pl-200w	rag	1	26.9	23.5	1243.4	—	911ms	37.2	—	842	200	8.50s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	baseline-pl-200w	agent	1	26.8	25.3	957.8	—	625ms	37.4	—	599	500	19.73s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	MTP n=3	agent	4	24.3	24.3	45.7	—	13.08s	0.1	—	599	500	21.22s	0.000 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · 450 W × 2 max · drv 590	llama.cpp cuda-4f13cb7 (cuda)	MTP n=2	agent	4	22.3	22.3	43.0	—	14.11s	0.1	—	599	500	22.99s	0.000 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	MTP n=3	agent	1	18.7	18.7	1808.0	—	331ms	0.0	—	599	500	26.81s	0.023 GiB
legacy	stack comparable	2× GeForce RTX 3090 · 24 GiB each · cap 200 W × 2 · drv 590	llama.cpp 4f13cb7-mtp (cuda)	baseline-pl-200w	agent	4	18.2	15.3	—	—	3.11s	55.0	—	—	341	22.31s	0.000 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	MTP n=3	chat	1	18.1	18.1	60.1	—	509ms	0.0	—	30	100	5.51s	0.007 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	MTP n=3	codegen	1	17.4	17.4	127.7	—	486ms	0.1	—	62	1000	57.37s	0.041 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	MTP n=3	rag	1	17.1	17.1	419.2	—	1.87s	0.0	—	842	200	11.71s	0.011 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	MTP n=2	agent	1	16.1	16.1	2070.4	—	292ms	0.0	—	599	500	31.11s	0.025 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	MTP n=2	codegen	1	15.7	15.7	132.6	—	484ms	0.1	—	62	1000	63.67s	0.044 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	MTP n=2	chat	1	15.7	15.7	63.3	—	501ms	0.0	—	30	100	6.37s	0.008 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	MTP n=2	rag	1	15.4	15.4	438.1	—	1.68s	0.0	—	842	200	13.01s	0.011 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	MTP n=3	agent	4	7.9	7.9	19.1	—	39.55s	0.0	—	599	500	65.22s	0.088 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	baseline	chat	1	7.7	7.4	66.4	—	455ms	129.5	—	30	100	13.44s	0.003 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	baseline	codegen	1	7.7	7.7	142.8	—	434ms	129.7	—	62	1000	130.50s	0.017 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	baseline	agent	1	7.7	7.7	2096.0	—	286ms	129.7	—	599	500	65.13s	0.010 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	baseline	agent	4	7.7	3.2	8.1	—	97.82s	129.8	—	599	500	162.65s	0.039 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	baseline	rag	1	7.7	7.3	476.3	—	1.53s	129.8	—	842	200	27.40s	0.006 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp 4f13cb7-mtp (rocm)	MTP n=2	agent	4	6.8	6.8	15.8	—	44.75s	0.0	—	599	500	75.79s	0.096 GiB
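
As a rough cross-check on the rows above, the sketch below recomputes the MTP n=3 speedup over baseline for the 2× RTX 3090 runs at the 450 W limit with one concurrent request. The table carries no header row, so the column meanings are assumptions: the first numeric cell after the concurrency value is read as decode tokens per second, and the cells `1000` and `17.50s` in the codegen row are read as output tokens and wall-clock time. The variable names are illustrative only.

```python
# Values copied from the 2x RTX 3090, 450 W, concurrency-1 rows.
# Assumption: the first numeric column after concurrency is decode tokens/s.
mtp3_tps = {"codegen": 57.1, "chat": 55.9, "agent": 55.0, "rag": 53.8}
baseline_tps = {"codegen": 28.0, "chat": 28.1, "agent": 28.0, "rag": 28.0}

# Speedup of MTP n=3 over the non-speculative baseline, per workload.
speedup = {w: mtp3_tps[w] / baseline_tps[w] for w in mtp3_tps}
for workload, ratio in speedup.items():
    print(f"{workload:8s} MTP n=3 vs baseline: {ratio:.2f}x")

# Consistency check on the MTP n=3 codegen row, assuming the cells
# "1000" and "17.50s" are output tokens and wall-clock seconds.
codegen_tps = 1000 / 17.50
print(f"codegen tokens/s from tokens and wall time: {codegen_tps:.1f}")
```

Under these assumptions the four ratios fall between roughly 1.9× and 2.0×, and the recomputed codegen throughput agrees with the 57.1 reported in the row.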

Environment

2× GeForce RTX 3090 · 24 GiB each

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090 × 2

arch: NVIDIA

vram: 48 GiB (system 64.0 GiB)

power: 200 W × 2 / 450 W × 2 max (44% cap)

hardware probes

copy 42% of theory · FP16 peak 65.4 TF · copy/math flat across caps

384-bit · 9751 MHz · 82 SM/CU

Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.

cap	theory	copy	fp16	bf16
200 W	936 GB/s	391 GB/s	65.4 TF	65.4 TF
300 W	936 GB/s	391 GB/s	65.4 TF	65.3 TF
450 W	936 GB/s	391 GB/s	65.4 TF	65.4 TF
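
The two percentages quoted in this panel can be reproduced from the figures shown. The sketch below is only an arithmetic check; it assumes "copy 42% of theory" means measured copy bandwidth divided by theoretical bandwidth, and "44% cap" means the configured power limit divided by the maximum limit. The variable names are illustrative.

```python
# Figures from the probe table and the power line of this panel.
theory_gbps = 936.0   # "theory" column, GB/s
copy_gbps = 391.0     # "copy" column, GB/s (identical at 200, 300 and 450 W)
cap_w = 200           # configured power limit per GPU, W
max_w = 450           # maximum power limit per GPU, W

# Assumed definitions: measured/theoretical bandwidth and cap/max power.
copy_fraction = copy_gbps / theory_gbps
cap_fraction = cap_w / max_w

print(f"copy bandwidth: {copy_fraction:.0%} of theory")
print(f"power cap: {cap_fraction:.0%} of max")
```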

compute: 8.6

backend: llama.cpp 4f13cb7-mtp (cuda)

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: 590.48.01

python: 3.12.3

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

2× GeForce RTX 3090 · 24 GiB each

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090 × 2

arch: NVIDIA

vram: 48 GiB (system 64.0 GiB)

power: 450 W × 2 / 450 W × 2 max

pcie: Gen 4 x16 / Gen 4 x16 max

clocks: gfx 1800/2100 MHz · mem 9501 MHz

temp: 60°C idle · 69°C peak

peak draw: 294 W

backend: llama.cpp cuda-4f13cb7 (cuda)

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: NVIDIA 590.48.01 + CUDA 13.1

libc: 2.39

python: 3.12.3

llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)

cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S

gpu: AMD Radeon 8060S

arch: Strix Halo (gfx1151)

vram: 96 GiB (system 31.1 GiB, unified)

hardware probes

copy 41% of theory · FP16 peak 30.3 TF

256-bit · 8000 MHz · 20 SM/CU

Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.

cap	theory	copy	fp16	bf16
fixed	256 GB/s	106 GB/s	30.3 TF	-
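
The "theory" bandwidth figures in the probe tables appear to follow from the bus width and memory clock listed in each panel. The sketch below shows one way to arrive at them; the `transfers_per_clock` factor is an assumption chosen so the result matches the listed value (1 for the Strix Halo figure, treating 8000 as a transfer rate, and 2 for the RTX 3090 figure), and the function name is hypothetical.

```python
def peak_bandwidth_gbps(bus_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9

# Strix Halo panel: 256-bit, 8000 MHz (assumed to already be a transfer rate).
strix_halo = peak_bandwidth_gbps(256, 8000, 1)
# RTX 3090 panel: 384-bit, 9751 MHz (assumed two transfers per reported clock).
rtx_3090 = peak_bandwidth_gbps(384, 9751, 2)

print(f"Strix Halo theory: {strix_halo:.0f} GB/s")
print(f"RTX 3090 theory:   {rtx_3090:.0f} GB/s")
```

With those factors the results round to the 256 GB/s and 936 GB/s shown in the "theory" columns.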

compute: 11.5

backend: llama.cpp 4f13cb7-mtp (rocm)

os: Ubuntu 24.04 LTS

kernel: 7.0.2-2-pve

python: 3.12.3

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true