Qwen2.5-Coder 32B-Instruct

AWQ·32B params·safetensors

intelligence: see on Artificial Analysis →

checkpoint: Qwen/Qwen2.5-Coder-32B-Instruct-AWQ

commit: 1ed0a6145da0

weights 18.00 GiB

All runs (15)


legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590	vLLM 0.21.0 (cuda)	baseline-pl-450w	chat	1	41.9	41.0	839.5	—	61ms	23.8	—	49	100	2.39s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590	vLLM 0.21.0 (cuda)	baseline-pl-350w	chat	1	41.9	41.0	878.1	—	58ms	23.8	—	49	100	2.39s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590	vLLM 0.21.0 (cuda)	baseline-pl-350w	rag	1	41.6	40.1	16203.0	—	53ms	24.0	—	855	62	1.77s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590	vLLM 0.21.0 (cuda)	baseline-pl-450w	rag	1	41.6	40.0	16188.0	—	53ms	24.0	—	855	62	1.77s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590	vLLM 0.21.0 (cuda)	baseline-pl-350w	codegen	1	41.5	41.3	815.9	—	96ms	24.1	—	81	720	17.46s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590	vLLM 0.21.0 (cuda)	baseline-pl-450w	codegen	1	41.5	41.3	817.0	—	96ms	24.1	—	81	720	17.46s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590	vLLM 0.21.0 (cuda)	baseline-pl-350w	agent	1	41.4	41.1	11373.2	—	53ms	24.2	—	606	295	7.56s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590	vLLM 0.21.0 (cuda)	baseline-pl-450w	agent	1	41.3	41.2	11391.4	—	53ms	24.2	—	606	295	7.56s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590	vLLM 0.21.0 (cuda)	baseline-pl-350w	agent	4	39.0	38.8	7704.1	—	99ms	25.7	—	606	295	7.61s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590	vLLM 0.21.0 (cuda)	baseline-pl-450w	agent	4	38.9	38.7	8186.8	—	100ms	25.7	—	606	295	7.62s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	vLLM 0.21.0 (cuda)	baseline	chat	1	20.0	19.5	452.6	—	112ms	49.9	—	49	100	4.76s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	vLLM 0.21.0 (cuda)	baseline	codegen	1	19.4	19.3	417.2	—	187ms	51.4	—	81	720	37.27s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	vLLM 0.21.0 (cuda)	baseline	agent	1	19.4	19.2	5080.7	—	119ms	51.6	—	606	363	18.90s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	vLLM 0.21.0 (cuda)	baseline	rag	1	19.2	18.9	8013.5	—	111ms	52.2	—	855	62	3.73s
legacy	stack comparable	GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590	vLLM 0.21.0 (cuda)	baseline	agent	4	19.0	18.8	3535.8	—	175ms	52.6	—	606	295	15.71s

Environment

GeForce RTX 3090 · 24 GiB

cpuAMD EPYC 7302P 16-Core Processor

gpuNVIDIA GeForce RTX 3090

archNVIDIA

vram24 GiB (system 64.0 GiB)

power420 W / 450 W max(93% cap)

hardware probes

copy 42% of theoryFP16 peak 65.4 TFcopy/math flat across caps

384-bit9751 MHz82 SM/CU

Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.

cap	theory	copy	fp16	bf16
200 W	936 GB/s	391 GB/s	65.4 TF	65.4 TF
300 W	936 GB/s	391 GB/s	65.4 TF	65.3 TF
450 W	936 GB/s	391 GB/s	65.4 TF	65.4 TF

compute: 8.6

backendvLLM 0.21.0 (cuda)

osUbuntu 24.04 LTS

kernel6.17.13-7-pve

driver590.48.01

python3.12.3

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue