Qwen/Qwen2.5-Coder-32B-Instruct

unknown · 32B params · unknown
checkpoint: Qwen/Qwen2.5-Coder-32B-Instruct-AWQ
commit: 1ed0a6145da0
weights: 18.00 GiB

All runs (5)

Hardware                  | Backend            | Shape   | Conc. | Gen tok/s | TTFT   | TPOT (ms) | Out tok | Total   | VRAM Δ
GeForce RTX 3090 · 24 GiB | vLLM 0.21.0 (cuda) | chat    | 1     | 19.5      | 112 ms | 49.9      | 100     | 4.76 s  | 0.000 GiB
GeForce RTX 3090 · 24 GiB | vLLM 0.21.0 (cuda) | codegen | 1     | 19.3      | 187 ms | 51.4      | 720     | 37.27 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB | vLLM 0.21.0 (cuda) | agent   | 1     | 19.2      | 119 ms | 51.6      | 363     | 18.90 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB | vLLM 0.21.0 (cuda) | rag     | 1     | 18.9      | 111 ms | 52.2      | 62      | 3.73 s  | 0.000 GiB
GeForce RTX 3090 · 24 GiB | vLLM 0.21.0 (cuda) | agent   | 4     | 18.8      | 175 ms | 52.6      | 295     | 15.71 s | 0.000 GiB
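The latency columns above can be derived from per-token arrival timestamps of a streamed response. The sketch below shows one plausible set of definitions (TTFT = first-token delay, TPOT = mean gap between consecutive tokens, Gen tok/s = output tokens over total wall time); the dashboard's exact formulas are not documented here and may differ, e.g. in whether TTFT is included in throughput. All names are illustrative.

```python
def latency_metrics(request_start: float, token_times: list[float]) -> dict:
    """Derive TTFT, mean TPOT, and generation throughput for one request.

    request_start: wall-clock time the request was sent (seconds)
    token_times:   wall-clock arrival time of each streamed output token
    """
    ttft = token_times[0] - request_start          # time to first token
    total = token_times[-1] - request_start        # total request wall time
    n_out = len(token_times)
    # TPOT averages the gaps between consecutive tokens (excludes TTFT).
    decode_time = token_times[-1] - token_times[0]
    tpot = decode_time / (n_out - 1) if n_out > 1 else 0.0
    return {
        "ttft_s": ttft,
        "tpot_s": tpot,
        "out_tok": n_out,
        "total_s": total,
        "gen_tok_per_s": n_out / total,
    }
```

For example, four tokens arriving at 0.10, 0.15, 0.20, and 0.25 s after the request yield a TTFT of 100 ms, a TPOT of 50 ms, and 16 tok/s.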

Environment

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
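With streaming enabled, the /v1/chat/completions endpoint emits OpenAI-style server-sent events: lines of the form `data: {json chunk}` terminated by `data: [DONE]`, each chunk carrying a `choices[0].delta` fragment. A minimal sketch of reassembling the generated text from those lines (the function name is illustrative):

```python
import json

def collect_stream(lines):
    """Reassemble streamed text from /v1/chat/completions SSE lines.

    `lines` is an iterable of decoded lines, e.g. 'data: {...}'.
    Non-data lines (keep-alives, blanks) are skipped; the stream
    ends at the 'data: [DONE]' sentinel.
    """
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        # The first chunk usually carries only the role; later chunks
        # carry incremental content, so default to an empty string.
        parts.append(delta.get("content", ""))
    return "".join(parts)
```

A client would feed this the decoded line iterator of a streaming HTTP response; the TTFT/TPOT figures above are measured from the arrival times of these chunks.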