Qwen2.5-Coder 7B-Instruct

AWQ · 7B params · safetensors
checkpoint: Qwen/Qwen2.5-Coder-7B-Instruct-AWQ
commit: 8e8ed243bbe6
weights: 5.19 GiB

All runs (15)

| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | codegen | 1 | 148.6 | 2920.5 | 27 ms | 6.7 | 81 | 506 | 3.42 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | codegen | 1 | 148.4 | 2884.7 | 27 ms | 6.7 | 81 | 506 | 3.42 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | chat | 1 | 147.5 | 2469.6 | 21 ms | 6.6 | 49 | 68 | 461 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | chat | 1 | 147.2 | 2346.4 | 22 ms | 6.6 | 49 | 68 | 459 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 1 | 146.6 | 29395.3 | 21 ms | 6.7 | 606 | 442 | 2.99 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 1 | 145.9 | 28717.5 | 23 ms | 6.7 | 606 | 442 | 2.99 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 4 | 141.7 | 19177.8 | 45 ms | 7.0 | 606 | 442 | 3.12 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 4 | 141.4 | 13125.3 | 46 ms | 7.0 | 606 | 442 | 3.13 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | rag | 1 | 138.3 | 43974.4 | 25 ms | 6.7 | 855 | 32 | 282 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | rag | 1 | 136.8 | 41641.0 | 27 ms | 6.7 | 855 | 32 | 281 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | chat | 1 | 85.8 | 1399.9 | 35 ms | 11.1 | 49 | 68 | 752 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | rag | 1 | 78.7 | 31764.5 | 31 ms | 11.8 | 855 | 32 | 499 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | codegen | 1 | 77.4 | 1456.9 | 54 ms | 12.9 | 81 | 506 | 6.54 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 1 | 76.9 | 19988.1 | 31 ms | 12.9 | 606 | 442 | 5.73 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 4 | 76.1 | 15152.2 | 55 ms | 13.0 | 606 | 442 | 5.81 s | 0.000 GiB |
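
As a rough sanity check on the runs listed above, the sketch below recomputes end-to-end time from TTFT, TPOT, and output tokens, and compares generation throughput between the 200 W and 420 W configurations for the codegen shape. It assumes Total ≈ TTFT + Out tok × TPOT; the page does not define how Total is measured, so treat that relation as an assumption. The function names are illustrative only.

```python
def estimated_total_s(ttft_ms: float, tpot_ms: float, out_tok: int) -> float:
    """Approximate end-to-end latency in seconds.

    Assumption (not stated by the benchmark page):
    Total ~= TTFT + out_tok * TPOT.
    """
    return (ttft_ms + tpot_ms * out_tok) / 1000.0


def throughput_ratio(low_power_tok_s: float, high_power_tok_s: float) -> float:
    """Fraction of generation throughput retained at the lower power limit."""
    return low_power_tok_s / high_power_tok_s


# codegen shape, concurrency 1; inputs copied from the runs above
total_420w = estimated_total_s(ttft_ms=27, tpot_ms=6.7, out_tok=506)
total_200w = estimated_total_s(ttft_ms=54, tpot_ms=12.9, out_tok=506)
ratio = throughput_ratio(low_power_tok_s=77.4, high_power_tok_s=148.6)

print(f"420 W estimate: {total_420w:.2f} s (reported: 3.42 s)")
print(f"200 W estimate: {total_200w:.2f} s (reported: 6.54 s)")
print(f"200 W / 420 W generation throughput: {ratio:.2f}")
```

With the rounded values from the listing, both estimates land within about 1% of the reported totals, which suggests (but does not prove) that Total is dominated by decode time for this shape.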

Environment

GeForce RTX 3090 · 24 GiB · 420 W
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 420 W / 450 W max (93% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 3090 · 24 GiB · 200 W
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
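
The environment lists a streaming /v1/chat/completions endpoint, so TTFT and TPOT are presumably derived from the arrival times of streamed chunks. The sketch below shows one common way to compute them from timestamps. It is not the benchmark harness itself: `latency_metrics`, `LatencyMetrics`, and the synthetic timestamps are hypothetical, and it assumes one token per chunk, which real servers do not guarantee.

```python
from dataclasses import dataclass


@dataclass
class LatencyMetrics:
    ttft_ms: float  # time to first token, milliseconds
    tpot_ms: float  # mean time per output token after the first, milliseconds
    total_s: float  # request start to last chunk, seconds


def latency_metrics(request_start: float, chunk_times: list[float]) -> LatencyMetrics:
    """Derive TTFT, TPOT, and total latency from chunk arrival times (seconds).

    Assumption: each streamed chunk carries exactly one token.
    """
    if not chunk_times:
        raise ValueError("no chunks received")
    ttft = chunk_times[0] - request_start
    total = chunk_times[-1] - request_start
    decode_steps = len(chunk_times) - 1
    # TPOT excludes the first token, whose latency is counted as TTFT.
    tpot = (chunk_times[-1] - chunk_times[0]) / decode_steps if decode_steps else 0.0
    return LatencyMetrics(ttft_ms=ttft * 1000, tpot_ms=tpot * 1000, total_s=total)


# Synthetic example: first chunk at 25 ms, then one chunk every 7 ms.
times = [0.025 + 0.007 * i for i in range(5)]
m = latency_metrics(request_start=0.0, chunk_times=times)
print(m)
```

In a real measurement, `request_start` and `chunk_times` would come from a monotonic clock such as `time.perf_counter()` sampled around the HTTP request and on each received chunk.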