Qwen3.6 35B-A3B-MTP
Q4_K_M · 35B params · GGUF
reasoning
checkpoint: unsloth/Qwen3.6-35B-A3B-MTP-GGUF:UD-Q4_K_M
All runs (30)
| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | codegen | 1 | 169.0 | 371.9 | 172ms | 0.1 | 62 | 1000 | 5.92s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | agent | 1 | 162.7 | 2760.7 | 231ms | 0.1 | 599 | 500 | 3.07s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | agent | 1 | 161.4 | 2506.1 | 239ms | 0.1 | 599 | 500 | 3.10s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | codegen | 1 | 160.4 | 332.9 | 177ms | 0.1 | 62 | 1000 | 6.24s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | chat | 1 | 148.3 | 208.5 | 149ms | 0.1 | 30 | 100 | 674ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | chat | 1 | 143.2 | 213.6 | 140ms | 0.1 | 30 | 100 | 698ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | rag | 1 | 136.6 | 2351.2 | 399ms | 0.1 | 842 | 200 | 1.46s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | codegen | 1 | 135.6 | 424.9 | 148ms | 6.7 | 62 | 1000 | 7.38s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | agent | 1 | 126.9 | 2637.4 | 227ms | 6.8 | 599 | 500 | 3.94s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | rag | 1 | 122.1 | 2678.8 | 400ms | 0.1 | 842 | 200 | 1.64s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | chat | 1 | 120.0 | 236.4 | 127ms | 6.7 | 30 | 100 | 834ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | rag | 1 | 110.6 | 2719.8 | 338ms | 6.8 | 842 | 200 | 1.81s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | agent | 1 | 70.6 | 4976.2 | 122ms | 0.0 | 599 | 500 | 7.08s | 0.014 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | agent | 1 | 70.0 | 5621.5 | 107ms | 0.0 | 599 | 500 | 7.14s | 0.017 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | codegen | 1 | 69.9 | 282.7 | 227ms | 0.0 | 62 | 1000 | 14.30s | 0.025 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | codegen | 1 | 69.4 | 313.9 | 207ms | 0.1 | 62 | 1000 | 14.41s | 0.030 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | chat | 1 | 69.4 | 195.1 | 158ms | 0.0 | 30 | 100 | 1.44s | 0.007 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=2 | agent | 4 | 68.4 | 140.4 | 4.79s | 0.1 | 599 | 500 | 7.56s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | MTP n=3 | agent | 4 | 68.4 | 177.6 | 4.57s | 0.1 | 599 | 500 | 7.52s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | chat | 1 | 65.4 | 200.3 | 157ms | 0.0 | 30 | 100 | 1.53s | 0.007 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | rag | 1 | 63.4 | 1129.2 | 627ms | 0.0 | 842 | 200 | 3.15s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | rag | 1 | 60.8 | 1153.1 | 610ms | 0.0 | 842 | 200 | 3.29s | 0.002 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline | agent | 4 | 55.0 | 147.2 | 5.62s | 6.8 | 599 | 500 | 9.37s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | baseline | codegen | 1 | 52.3 | 331.9 | 189ms | 18.9 | 62 | 1000 | 19.12s | 0.014 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | baseline | agent | 1 | 52.1 | 5654.6 | 106ms | 19.0 | 599 | 500 | 9.59s | 0.009 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | baseline | chat | 1 | 49.5 | 216.8 | 139ms | 19.0 | 30 | 100 | 2.02s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | baseline | rag | 1 | 46.2 | 1279.9 | 538ms | 19.0 | 842 | 200 | 4.33s | 0.020 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=3 | agent | 4 | 30.1 | 89.5 | 10.24s | 0.0 | 599 | 500 | 17.10s | 0.010 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | MTP n=2 | agent | 4 | 29.7 | 66.0 | 10.48s | 0.0 | 599 | 500 | 17.41s | 0.065 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp 4f13cb7-mtp (rocm) | baseline | agent | 4 | 21.8 | 50.0 | 14.42s | 19.0 | 599 | 500 | 23.90s | 0.034 GiB |
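
The runs table lists MTP and baseline rows separately, so the relative gain has to be worked out by hand. The sketch below is one way to do that arithmetic: it copies the Gen tok/s values from the table and divides each MTP figure by the baseline for the same hardware, shape, and concurrency. The dictionary layout, the short hardware labels, and the function name `mtp_speedups` are illustrative choices, not part of the benchmark tooling.

```python
# Gen tok/s values copied from the runs table.
# Key: (hardware, shape, concurrency); value: {mode: gen tok/s}.
GEN_TOK_S = {
    ("RTX 3090", "codegen", 1): {"baseline": 135.6, "MTP n=2": 169.0, "MTP n=3": 160.4},
    ("RTX 3090", "agent", 1): {"baseline": 126.9, "MTP n=2": 162.7, "MTP n=3": 161.4},
    ("RTX 3090", "chat", 1): {"baseline": 120.0, "MTP n=2": 143.2, "MTP n=3": 148.3},
    ("RTX 3090", "rag", 1): {"baseline": 110.6, "MTP n=2": 122.1, "MTP n=3": 136.6},
    ("RTX 3090", "agent", 4): {"baseline": 55.0, "MTP n=2": 68.4, "MTP n=3": 68.4},
    ("Strix Halo", "codegen", 1): {"baseline": 52.3, "MTP n=2": 69.4, "MTP n=3": 69.9},
    ("Strix Halo", "agent", 1): {"baseline": 52.1, "MTP n=2": 70.0, "MTP n=3": 70.6},
    ("Strix Halo", "chat", 1): {"baseline": 49.5, "MTP n=2": 65.4, "MTP n=3": 69.4},
    ("Strix Halo", "rag", 1): {"baseline": 46.2, "MTP n=2": 60.8, "MTP n=3": 63.4},
    ("Strix Halo", "agent", 4): {"baseline": 21.8, "MTP n=2": 29.7, "MTP n=3": 30.1},
}


def mtp_speedups(table):
    """Return {key: {mode: speedup}} for every non-baseline mode.

    Speedup is the mode's gen tok/s divided by the baseline gen tok/s
    measured on the same hardware, shape, and concurrency.
    """
    result = {}
    for key, modes in table.items():
        base = modes["baseline"]
        result[key] = {
            mode: value / base for mode, value in modes.items() if mode != "baseline"
        }
    return result


if __name__ == "__main__":
    for (hw, shape, conc), ratios in mtp_speedups(GEN_TOK_S).items():
        cells = ", ".join(f"{mode}: {ratio:.2f}x" for mode, ratio in ratios.items())
        print(f"{hw:10s} {shape:8s} conc={conc}  {cells}")
```

On these numbers the generation speedup from MTP ranges from roughly 1.10x (RTX 3090, rag, n=2) to roughly 1.40x (Strix Halo, chat, n=3).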
Environment
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1980/2100 MHz · mem 9501 MHz
temp: 42°C idle · 69°C peak
peak draw: 383 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp 4f13cb7-mtp (rocm)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
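
Both environments report a streaming `/v1/chat/completions` endpoint, but the page does not say how the latency columns are computed. The sketch below shows one common way to derive TTFT, TPOT, and generation throughput from the arrival times of streamed tokens. Everything in it is an assumption for illustration: the `RunMetrics` type, the `summarize_run` function, and the metric definitions themselves. Dividing output tokens by total wall time is chosen because it roughly matches the table (for example, 1000 tokens / 5.92 s ≈ 169 tok/s); the harness may define TPOT differently.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    ttft_s: float      # time from request start to first streamed token
    tpot_ms: float     # mean gap between consecutive tokens after the first
    gen_tok_s: float   # output tokens divided by total wall time (assumed)
    total_s: float     # time from request start to last streamed token


def summarize_run(start: float, token_times: list[float]) -> RunMetrics:
    """Summarize one streamed request.

    `start` is the time the request was sent and `token_times` holds the
    arrival time of each output token, in seconds on the same clock.
    """
    if not token_times:
        raise ValueError("no tokens were streamed")
    n = len(token_times)
    ttft = token_times[0] - start
    total = token_times[-1] - start
    # With a single token there is no inter-token gap to average.
    tpot_ms = 1000.0 * (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else 0.0
    return RunMetrics(ttft_s=ttft, tpot_ms=tpot_ms, gen_tok_s=n / total, total_s=total)


if __name__ == "__main__":
    # Hypothetical timestamps: first token after 200 ms, then one every 100 ms.
    print(summarize_run(0.0, [0.2, 0.3, 0.4, 0.5, 0.6]))
```

In a real client the timestamps would come from something like `time.perf_counter()` recorded as each streamed chunk arrives, and a chunk may carry more than one token, so per-chunk timing only approximates per-token timing.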