Qwen3.5 27B
Q4_K_XL · 27B params · GGUF
checkpoint: unsloth/Qwen3.5-27B-GGUF:Qwen3.5-27B-UD-Q4_K_XL.gguf

All runs (10)
| Hardware | Backend | Shape | Concurrency | Gen tok/s ↓ | TTFT | TPOT (ms) | Output tokens | Total time | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | codegen | 1 | 21.4 | 346ms | 46.3 | 1000 | 46.81s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | chat | 1 | 21.2 | 253ms | 43.7 | 100 | 4.72s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 1 | 20.9 | 611ms | 46.5 | 500 | 23.89s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | rag | 1 | 19.8 | 912ms | 45.8 | 200 | 10.12s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | codegen | 1 | 11.9 | 417ms | 83.5 | 1000 | 84.02s | 0.003 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | chat | 1 | 11.6 | 330ms | 83.5 | 100 | 8.59s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | agent | 1 | 11.5 | 1.75s | 83.6 | 500 | 43.53s | 0.008 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | rag | 1 | 10.9 | 1.73s | 83.6 | 200 | 18.39s | 0.005 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 4 | 9.7 | 3.91s | 91.5 | 341 | 35.41s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | llama.cpp b1203 (rocm) | agent | 4 | 5.4 | 2.12s | 178.9 | 500 | 93.01s | -0.005 GiB |
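For the single-stream (concurrency 1) rows, the columns are internally consistent: total time ≈ TTFT + output tokens × TPOT, and generation throughput ≈ 1000 / TPOT (ms). A quick sanity-check sketch against three rows picked from the table above:

```python
# Each tuple: (label, TTFT in s, TPOT in ms, output tokens, measured total in s),
# copied from the results table.
runs = [
    ("3090 codegen",   0.346, 46.3, 1000, 46.81),
    ("3090 chat",      0.253, 43.7,  100,  4.72),
    ("Strix Halo rag", 1.730, 83.6,  200, 18.39),
]

for name, ttft_s, tpot_ms, out_tok, total_s in runs:
    # Total wall time should be roughly first-token latency plus
    # one per-token interval for every generated token.
    est = ttft_s + out_tok * tpot_ms / 1000.0
    print(f"{name}: estimated {est:.2f}s vs measured {total_s:.2f}s")
```

The small residuals (well under 1%) come from rounding in the reported TPOT values; the same check explains why, e.g., the 3090 codegen row shows 21.4 tok/s ≈ 1000 / 46.3 ms.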
Environment

GeForce RTX 3090 · 24 GiB
- cpu: AMD EPYC 7302P 16-Core Processor
- gpu: NVIDIA GeForce RTX 3090
- arch: NVIDIA
- vram: 24 GiB (system 64.0 GiB)
- power: 200 W / 450 W max (44% cap)
- backend: llama.cpp 59778f0 (cuda)
- server: lemonade unknown
- os: Ubuntu 24.04 LTS
- kernel: 6.17.13-7-pve
- driver: 590.48.01
- python: 3.12.3
- containerized: true
- runs/cell: 5
- warmups: 2
- endpoint: /v1/chat/completions
- streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
- cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
- gpu: AMD Radeon 8060S
- arch: Strix Halo (gfx1151)
- vram: 96 GiB (system 31.1 GiB, unified)
- backend: llama.cpp b1203 (rocm)
- server: lemonade 10.4.0
- os: Ubuntu 24.04.4 LTS
- kernel: 7.0.2-2-pve
- python: 3.12.3
- containerized: true
- runs/cell: 3
- warmups: 1
- endpoint: /v1/chat/completions
- streaming: true
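Both hosts were benchmarked through the OpenAI-compatible /v1/chat/completions endpoint with streaming enabled, so responses arrive as server-sent-event chunks whose deltas must be concatenated client-side. A minimal parsing sketch; the sample chunks below are illustrative, not captured from these runs:

```python
import json

# Lines as they appear in an OpenAI-style SSE stream ("stream": true).
# Real chunks carry more fields (id, model, finish_reason, ...); only the
# content deltas matter for reassembling the text.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]

text = ""
for line in sample:
    payload = line[len("data: "):]
    if payload == "[DONE]":  # sentinel marking end of stream
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    text += delta.get("content", "")  # role-only deltas have no content

print(text)  # prints "Hello"
```

TTFT in the table corresponds to the arrival of the first such content chunk; TPOT is the mean interval between subsequent chunks.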