Qwen3.5 27B

Q4_K_M · 27B params · GGUF

reasoning

checkpoint: unsloth/Qwen3.5-27B-GGUF:Q4_K_M

commit: 3221f178a6b8

weights: 15.59 GiB

All runs (20)

Hardware	Backend	Mode	Shape	Conc.	Gen tok/s ↓	Prefill tok/s	TTFT	TPOT (ms)	Prompt tok	Out tok	Total	VRAM Δ
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	codegen	1	40.5	219.0	358ms	23.7	62	1000	24.70s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	codegen	1	39.8	202.4	305ms	24.2	62	1000	25.14s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	1	39.1	1180.7	507ms	23.7	599	500	12.79s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	1	38.4	1182.5	506ms	24.2	599	500	13.01s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	chat	1	37.9	127.3	236ms	24.0	30	100	2.64s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	chat	1	37.8	116.4	258ms	23.4	30	100	2.65s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	rag	1	35.2	1316.8	785ms	24.1	842	200	5.68s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	rag	1	34.1	1311.6	762ms	23.6	842	200	5.86s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	codegen	1	21.4	199.6	346ms	46.3	62	1000	46.81s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	chat	1	21.2	126.7	253ms	43.7	30	100	4.72s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	agent	1	20.9	980.5	611ms	46.5	599	500	23.89s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	rag	1	19.8	1037.8	912ms	45.8	842	200	10.12s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	4	16.5	41.1	19.29s	23.8	599	500	31.61s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	4	16.1	31.4	19.67s	24.2	599	500	32.24s	0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp b1203 (rocm)	baseline	codegen	1	11.9	172.4	417ms	83.5	63	1000	84.02s	0.003 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp b1203 (rocm)	baseline	chat	1	11.6	96.9	330ms	83.5	31	100	8.59s	0.002 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp b1203 (rocm)	baseline	agent	1	11.5	279.9	1.75s	83.6	599	500	43.53s	0.008 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp b1203 (rocm)	baseline	rag	1	10.9	294.6	1.73s	83.6	1005	200	18.39s	0.005 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	agent	4	9.7	—	3.91s	91.5	—	341	35.41s	0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp b1203 (rocm)	baseline	agent	4	5.4	232.7	2.12s	178.9	599	500	93.01s	-0.005 GiB
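
The page does not define how its columns relate to each other. As a rough cross-check, the sketch below assumes that Gen tok/s is output tokens divided by total wall time, and that Total is approximately TTFT plus Out tok times TPOT. Both relations are inferred from the numbers, not stated by the source. The three rows are copied from the table above; the `Run` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One table row, reduced to the columns needed for the cross-check."""
    label: str
    ttft_s: float            # TTFT column, in seconds
    tpot_ms: float           # TPOT (ms) column
    out_tok: int             # Out tok column
    total_s: float           # Total column, in seconds
    reported_gen_tps: float  # Gen tok/s column as shown on the page

    def gen_tps(self) -> float:
        # Assumed definition: generated tokens over end-to-end wall time.
        return self.out_tok / self.total_s

    def estimated_total_s(self) -> float:
        # Assumed model: time to first token plus a constant per-token cost.
        return self.ttft_s + self.out_tok * self.tpot_ms / 1000.0


RUNS = [
    Run("RTX 3090 450 W, codegen, conc 1", 0.358, 23.7, 1000, 24.70, 40.5),
    Run("RTX 3090 350 W, rag, conc 1", 0.785, 24.1, 200, 5.68, 35.2),
    Run("Strix Halo, codegen, conc 1", 0.417, 83.5, 1000, 84.02, 11.9),
]

for run in RUNS:
    print(
        f"{run.label}: derived {run.gen_tps():.1f} tok/s "
        f"(reported {run.reported_gen_tps}), "
        f"estimated total {run.estimated_total_s():.2f}s "
        f"(reported {run.total_s}s)"
    )
```

For these concurrency-1 rows the derived rate matches the reported Gen tok/s to one decimal place. The concurrency-4 rows fit less well (500 tokens over 31.61 s is about 15.8 tok/s against a reported 16.5), so the page may aggregate per-request rates differently there.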

Environment

GeForce RTX 3090 · 24 GiB · 350 W

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090

arch: NVIDIA

vram: 24 GiB (system 64.0 GiB)

power: 350 W / 450 W max (78% cap)

pcie: Gen 4 x16 / Gen 4 x16 max

clocks: gfx 1965/2100 MHz · mem 9501 MHz

temp: 45°C idle · 64°C peak

peak draw: 334 W

backend: llama.cpp cuda-4f13cb7 (cuda)

server: lemonade (version unknown)

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: NVIDIA 590.48.01 + CUDA 13.1

libc: 2.39

python: 3.12.3

containerized: true

llama.cpp: version 18 (4f13cb7), built with GNU 13.3.0 for Linux x86_64

build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

GeForce RTX 3090 · 24 GiB · 450 W

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090

arch: NVIDIA

vram: 24 GiB (system 64.0 GiB)

power: 450 W / 450 W max

pcie: Gen 4 x16 / Gen 4 x16 max

clocks: gfx 1980/2100 MHz · mem 9501 MHz

temp: 40°C idle · 83°C peak

peak draw: 435 W

backend: llama.cpp cuda-4f13cb7 (cuda)

server: lemonade (version unknown)

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: NVIDIA 590.48.01 + CUDA 13.1

libc: 2.39

python: 3.12.3

containerized: true

llama.cpp: version 18 (4f13cb7), built with GNU 13.3.0 for Linux x86_64

build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true
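
The two power-limit environments above report both a peak draw and, in the results table, a codegen throughput at concurrency 1. A rough energy-per-token comparison can be sketched from those figures. It assumes the reported peak draw is sustained for the whole generation, which overstates average draw, so the results are upper bounds. `joules_per_token` is a hypothetical helper, not something the benchmark itself reports.

```python
def joules_per_token(draw_watts: float, gen_tok_per_s: float) -> float:
    """Energy per generated token, treating power draw as constant (assumption)."""
    if gen_tok_per_s <= 0:
        raise ValueError("throughput must be positive")
    return draw_watts / gen_tok_per_s


# Peak draw from the environment blocks; Gen tok/s from the codegen,
# concurrency-1 rows of the results table.
capped = joules_per_token(334.0, 39.8)    # 350 W power limit
uncapped = joules_per_token(435.0, 40.5)  # 450 W power limit

throughput_loss = 1.0 - 39.8 / 40.5
energy_saving = 1.0 - capped / uncapped

print(f"350 W limit: {capped:.2f} J/token")
print(f"450 W limit: {uncapped:.2f} J/token")
print(f"throughput loss: {throughput_loss:.1%}, energy saving: {energy_saving:.1%}")
```

On these numbers the 350 W limit costs under 2% of codegen throughput while the peak-draw-based energy estimate per token falls by roughly a fifth. A measurement of average draw over the run would be needed to confirm the size of the saving.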

GeForce RTX 3090 · 24 GiB · 200 W

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090

arch: NVIDIA

vram: 24 GiB (system 64.0 GiB)

power: 200 W / 450 W max (44% cap)

backend: llama.cpp 59778f0 (cuda)

server: lemonade (version unknown)

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: 590.48.01

python: 3.12.3

containerized: true

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)

cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S

gpu: AMD Radeon 8060S

arch: Strix Halo (gfx1151)

vram: 96 GiB (system 31.1 GiB, unified)

backend: llama.cpp b1203 (rocm)

server: lemonade 10.4.0

os: Ubuntu 24.04.4 LTS

kernel: 7.0.2-2-pve

python: 3.12.3

containerized: true

runs/cell: 3

warmups: 1

endpoint: /v1/chat/completions

streaming: true
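
Every environment streams from /v1/chat/completions, so TTFT and TPOT are presumably derived from chunk arrival times. The page does not document its method. The sketch below shows one common definition: TTFT is the first token's arrival minus the request start, and TPOT is the mean gap between subsequent tokens. `stream_metrics` is a hypothetical helper and the timestamps are synthetic, not taken from any run above.

```python
from typing import Dict, Sequence


def stream_metrics(request_start: float, token_times: Sequence[float]) -> Dict[str, float]:
    """Derive latency metrics from per-token arrival timestamps (in seconds)."""
    if not token_times:
        raise ValueError("no tokens received")
    n = len(token_times)
    ttft_s = token_times[0] - request_start
    # Decode phase: time between the first and last token, spread over n - 1 gaps.
    decode_s = token_times[-1] - token_times[0]
    tpot_ms = decode_s / (n - 1) * 1000.0 if n > 1 else 0.0
    total_s = token_times[-1] - request_start
    return {
        "ttft_s": ttft_s,
        "tpot_ms": tpot_ms,
        "total_s": total_s,
        "gen_tok_per_s": n / total_s,
    }


# Synthetic stream: first token after 0.358 s, then one token every 24 ms.
times = [0.358 + i * 0.024 for i in range(100)]
metrics = stream_metrics(0.0, times)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a real client the timestamps would come from `time.perf_counter()` calls made as each streamed chunk arrives. Note that one chunk may not correspond to exactly one token, which is a further assumption of this sketch.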