Qwen2.5-Coder 14B-Instruct

AWQ · 14B params · safetensors
checkpoint: Qwen/Qwen2.5-Coder-14B-Instruct-AWQ
commit: eb3172f06a6d
weights: 9.29 GiB

All runs (15)

| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | codegen | 1 | 81.2 | 1772.1 | 44 ms | 12.3 | 81 | 554 | 6.83 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | codegen | 1 | 81.1 | 1754.9 | 44 ms | 12.3 | 81 | 554 | 6.84 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | chat | 1 | 80.7 | 1541.2 | 33 ms | 12.1 | 49 | 93 | 1.14 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | chat | 1 | 80.7 | 1577.0 | 32 ms | 12.1 | 49 | 93 | 1.14 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 1 | 80.2 | 21337.3 | 31 ms | 12.3 | 606 | 236 | 3.10 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 1 | 80.2 | 21252.2 | 30 ms | 12.4 | 606 | 236 | 3.10 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | rag | 1 | 77.3 | 29814.2 | 34 ms | 12.3 | 855 | 68 | 894 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | rag | 1 | 77.2 | 30194.7 | 36 ms | 12.3 | 855 | 68 | 895 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 4 | 73.6 | 10112.3 | 66 ms | 13.4 | 606 | 236 | 3.19 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 4 | 73.4 | 14834.0 | 58 ms | 13.5 | 606 | 236 | 3.18 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | chat | 1 | 42.6 | 873.7 | 58 ms | 22.9 | 49 | 93 | 2.06 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | codegen | 1 | 41.1 | 807.6 | 97 ms | 24.3 | 81 | 554 | 13.49 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 1 | 40.6 | 10574.6 | 57 ms | 24.3 | 606 | 236 | 6.04 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | rag | 1 | 39.9 | 15660.6 | 55 ms | 24.6 | 855 | 68 | 1.76 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 4 | 38.9 | 8209.5 | 105 ms | 25.3 | 606 | 236 | 6.06 s | 0.000 GiB |
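
As a sanity check on the table, single-stream Gen tok/s should be roughly the reciprocal of TPOT, and Total should be close to TTFT plus Out tok times TPOT. The sketch below applies both relations to the two codegen rows at concurrency 1. The function names are illustrative, and the relations are assumptions about how the columns relate, not the benchmark tool's documented definitions.

```python
def tok_per_s_from_tpot(tpot_ms: float) -> float:
    """Approximate decode throughput (tok/s) implied by time per output token."""
    return 1000.0 / tpot_ms


def estimated_total_s(ttft_ms: float, tpot_ms: float, out_tok: int) -> float:
    """Approximate end-to-end latency: first-token wait plus per-token decode time."""
    return (ttft_ms + tpot_ms * out_tok) / 1000.0


# Values copied from the codegen rows of the table (concurrency 1).
rows = {
    "420 W (baseline-pl-350w)": {
        "ttft_ms": 44, "tpot_ms": 12.3, "out_tok": 554,
        "gen_tok_s": 81.2, "total_s": 6.83,
    },
    "200 W (baseline)": {
        "ttft_ms": 97, "tpot_ms": 24.3, "out_tok": 554,
        "gen_tok_s": 41.1, "total_s": 13.49,
    },
}

for label, r in rows.items():
    implied = tok_per_s_from_tpot(r["tpot_ms"])
    total = estimated_total_s(r["ttft_ms"], r["tpot_ms"], r["out_tok"])
    print(
        f"{label}: implied {implied:.1f} tok/s (reported {r['gen_tok_s']}), "
        f"estimated total {total:.2f} s (reported {r['total_s']})"
    )
```

For both rows the implied figures land within about 1% of the reported ones, which supports reading the 200 W runs as decoding at roughly half the speed of the 420 W runs.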

Environment

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 420 W / 450 W max (93% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
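
The environment lists a streaming endpoint with 2 warmups and 5 runs per cell. A minimal sketch of how TTFT and TPOT could be derived from token arrival times under that scheme follows. The harness behind these numbers is not shown here, so `stream_metrics`, `run_cell`, and the median aggregation are assumptions for illustration, and the timestamps are synthetic rather than measured.

```python
from statistics import median
from typing import Callable, Sequence


def stream_metrics(start_s: float, token_times_s: Sequence[float]) -> dict:
    """Derive TTFT and TPOT (both in ms) from the arrival time of each streamed token."""
    if not token_times_s:
        raise ValueError("no tokens were streamed")
    ttft_ms = (token_times_s[0] - start_s) * 1000.0
    n_decode = len(token_times_s) - 1  # tokens produced after the first one
    decode_ms = (token_times_s[-1] - token_times_s[0]) * 1000.0
    tpot_ms = decode_ms / n_decode if n_decode else 0.0
    return {"ttft_ms": ttft_ms, "tpot_ms": tpot_ms}


def run_cell(one_request: Callable[[], dict], warmups: int = 2, runs: int = 5) -> dict:
    """Discard warmup requests, then take the median of each metric over the timed runs."""
    for _ in range(warmups):
        one_request()
    samples = [one_request() for _ in range(runs)]
    return {key: median(s[key] for s in samples) for key in samples[0]}


# Synthetic stand-in for a real streaming request (assumption, not measured data):
# first token after 44 ms, then one token every 12.3 ms, 554 tokens in total.
def fake_request() -> dict:
    times = [0.044 + 0.0123 * i for i in range(554)]
    return stream_metrics(0.0, times)


print(run_cell(fake_request))
```

In a real harness, `one_request` would issue a streaming call to `/v1/chat/completions` and record a timestamp as each chunk arrives; it is injected as a callable here so the timing logic can be checked without a running server.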