Qwen2.5 32B-Instruct

AWQ · 32B params · safetensors
checkpoint: Qwen/Qwen2.5-32B-Instruct-AWQ
commit: 5c7cb76a268f
weights: 18.00 GiB

All runs (15)

| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | codegen | 1 | 41.3 | 824.4 | 95ms | 24.1 | 81 | 646 | 15.63s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | codegen | 1 | 41.3 | 820.3 | 95ms | 24.1 | 81 | 646 | 15.63s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | chat | 1 | 41.0 | 835.8 | 61ms | 23.8 | 49 | 87 | 2.08s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | chat | 1 | 41.0 | 842.0 | 61ms | 23.8 | 49 | 87 | 2.08s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 1 | 41.0 | 11422.7 | 54ms | 24.2 | 606 | 292 | 7.49s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 1 | 41.0 | 11335.8 | 54ms | 24.2 | 606 | 292 | 7.50s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | rag | 1 | 39.9 | 16170.4 | 53ms | 24.0 | 855 | 55 | 1.60s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | rag | 1 | 39.9 | 16214.9 | 53ms | 24.0 | 855 | 55 | 1.60s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 4 | 38.5 | 8763.4 | 95ms | 25.8 | 606 | 301 | 7.79s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 4 | 38.5 | 7334.1 | 102ms | 25.8 | 606 | 318 | 8.25s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | codegen | 1 | 19.3 | 413.1 | 189ms | 51.4 | 81 | 625 | 32.42s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | chat | 1 | 19.2 | 455.7 | 112ms | 51.3 | 49 | 87 | 4.20s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 1 | 19.2 | 5096.9 | 119ms | 51.5 | 606 | 301 | 15.70s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | rag | 1 | 18.8 | 8194.7 | 114ms | 52.6 | 855 | 55 | 3.41s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 4 | 18.7 | 4615.3 | 155ms | 53.4 | 606 | 301 | 16.11s | 0.000 GiB |
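
The latency columns in the runs above are related by simple arithmetic. As a rough sanity check, and assuming the common definitions (TTFT is time to first token, TPOT is the mean time per output token after the first; this page does not define them), decode throughput is about 1000 / TPOT and total latency is about TTFT plus one TPOT per remaining token. The function names below are illustrative and not part of any benchmark tool.

```python
def gen_tok_per_s(tpot_ms: float) -> float:
    """Decode throughput implied by a time-per-output-token in milliseconds."""
    return 1000.0 / tpot_ms


def estimated_total_s(ttft_ms: float, tpot_ms: float, out_tokens: int) -> float:
    """Approximate end-to-end latency: first token, then one TPOT per remaining token."""
    return (ttft_ms + (out_tokens - 1) * tpot_ms) / 1000.0


# Values copied from the codegen, concurrency 1 rows (350 W mode row and 200 W baseline row).
rows = {
    "420 W": {"ttft_ms": 95.0, "tpot_ms": 24.1, "out_tokens": 646, "total_s": 15.63},
    "200 W": {"ttft_ms": 189.0, "tpot_ms": 51.4, "out_tokens": 625, "total_s": 32.42},
}

for label, r in rows.items():
    est = estimated_total_s(r["ttft_ms"], r["tpot_ms"], r["out_tokens"])
    print(f"{label}: {gen_tok_per_s(r['tpot_ms']):.1f} tok/s implied, "
          f"{est:.2f} s estimated vs {r['total_s']:.2f} s reported")
```

Small gaps between the estimate and the reported figures are expected, since each cell is presumably an aggregate over several runs and the displayed values are rounded.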

Environment

GeForce RTX 3090 · 24 GiB · 420 W
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 420 W / 450 W max (93% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 3090 · 24 GiB · 200 W
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
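
The environment lists a streaming endpoint, 5 runs per cell, and 2 warmups, but not how the harness turns streamed chunks into TTFT and TPOT. The following is a minimal sketch of one common approach, assuming one arrival timestamp per streamed token and assuming the warmup runs come first and are discarded before averaging. `ttft_and_tpot_ms` and `summarize` are hypothetical helper names, not functions from vLLM or the benchmark tool.

```python
from statistics import mean


def ttft_and_tpot_ms(request_start: float, chunk_times: list[float]) -> tuple[float, float]:
    """Return (TTFT, TPOT) in ms from a request start time and token arrival times in seconds."""
    if len(chunk_times) < 2:
        raise ValueError("need at least two streamed tokens to compute TPOT")
    ttft = chunk_times[0] - request_start
    # Mean gap between consecutive tokens, excluding the wait for the first one.
    tpot = (chunk_times[-1] - chunk_times[0]) / (len(chunk_times) - 1)
    return ttft * 1000.0, tpot * 1000.0


def summarize(runs: list[tuple[float, float]], warmups: int = 2) -> tuple[float, float]:
    """Drop the first `warmups` runs (assumed ordering), then average TTFT and TPOT."""
    measured = runs[warmups:]
    if not measured:
        raise ValueError("no measured runs left after discarding warmups")
    return mean(r[0] for r in measured), mean(r[1] for r in measured)


# Synthetic timestamps for illustration only: first token at 100 ms, then one every 25 ms.
example = ttft_and_tpot_ms(0.0, [0.100, 0.125, 0.150])
print(example)
```

In practice the timestamps would come from a monotonic clock such as `time.perf_counter()` read as each chunk of the `/v1/chat/completions` stream arrives.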