Qwen2.5 14B-Instruct

AWQ · 14B params · safetensors
checkpoint: Qwen/Qwen2.5-14B-Instruct-AWQ
commit: 539535859b13
weights: 9.29 GiB

All runs (15)

| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | codegen | 1 | 81.0 | 1765.7 | 44 ms | 12.3 | 81 | 622 | 7.63 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | codegen | 1 | 81.0 | 1763.5 | 44 ms | 12.3 | 81 | 622 | 7.63 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | chat | 1 | 80.7 | 1533.3 | 33 ms | 12.1 | 49 | 74 | 908 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | chat | 1 | 80.7 | 1509.8 | 34 ms | 12.1 | 49 | 74 | 908 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 1 | 80.1 | 21761.7 | 29 ms | 12.4 | 606 | 355 | 4.41 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 1 | 80.1 | 21655.7 | 29 ms | 12.4 | 606 | 355 | 4.41 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | rag | 1 | 76.3 | 30203.1 | 35 ms | 12.3 | 855 | 45 | 675 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | rag | 1 | 76.0 | 30227.2 | 36 ms | 12.3 | 855 | 45 | 675 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 4 | 74.2 | 10789.7 | 63 ms | 13.4 | 606 | 355 | 4.77 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 4 | 74.0 | 11489.7 | 65 ms | 13.4 | 606 | 355 | 4.78 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | chat | 1 | 42.6 | 859.6 | 59 ms | 22.5 | 49 | 74 | 1.60 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 1 | 40.6 | 10787.6 | 56 ms | 24.5 | 606 | 355 | 8.73 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | codegen | 1 | 40.0 | 803.9 | 97 ms | 24.9 | 81 | 622 | 15.46 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 4 | 39.3 | 7270.7 | 93 ms | 25.2 | 606 | 355 | 9.04 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | rag | 1 | 38.9 | 16420.1 | 53 ms | 24.5 | 855 | 45 | 1.36 s | 0.000 GiB |
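
The latency columns above can be sanity-checked against each other. The page does not say how each column is derived, so the sketch below rests on an assumption: that total time is roughly TTFT plus one TPOT per output token, and that decode rate is roughly the reciprocal of TPOT. The function names are hypothetical and are not part of the benchmark tool.

```python
def approx_total_s(ttft_ms: float, tpot_ms: float, out_tokens: int) -> float:
    """Rough end-to-end latency: wait for the first token, then one TPOT per output token."""
    return (ttft_ms + tpot_ms * out_tokens) / 1000.0


def approx_decode_tok_s(tpot_ms: float) -> float:
    """Decode rate implied by a per-token latency given in milliseconds."""
    return 1000.0 / tpot_ms


# Values copied from the 420 W codegen row: TTFT 44 ms, TPOT 12.3 ms, 622 output tokens.
total = approx_total_s(44, 12.3, 622)
rate = approx_decode_tok_s(12.3)
print(f"estimated total {total:.2f} s, implied decode rate {rate:.1f} tok/s")
```

For that row the estimate is about 7.69 s against a reported 7.63 s, and about 81.3 tok/s against a reported 81.0. The rounded TPOT accounts for part of the gap. Rows with very short outputs, such as the rag shape, deviate more from this simple model.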

Environment

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 420 W / 450 W max (93% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
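
Since the runs stream from /v1/chat/completions, TTFT and TPOT are presumably taken from chunk arrival times. The following is a minimal sketch of one common way to compute them, not the harness's actual method. It assumes one token per streamed chunk, and it assumes warmup requests are simply dropped before averaging. The helper names `ttft_tpot_ms` and `mean_after_warmup` are hypothetical.

```python
from statistics import mean


def ttft_tpot_ms(request_start: float, chunk_times: list[float]) -> tuple[float, float]:
    """Return (TTFT, TPOT) in milliseconds from timestamps given in seconds.

    Assumes each entry in chunk_times is the arrival time of one generated token.
    """
    if not chunk_times:
        raise ValueError("no chunks received")
    ttft = (chunk_times[0] - request_start) * 1000.0
    if len(chunk_times) < 2:
        return ttft, 0.0
    # Average gap between consecutive tokens after the first one.
    tpot = (chunk_times[-1] - chunk_times[0]) / (len(chunk_times) - 1) * 1000.0
    return ttft, tpot


def mean_after_warmup(samples: list[float], warmups: int = 2) -> float:
    """Average per-request samples after discarding the first `warmups` entries (assumed policy)."""
    measured = samples[warmups:]
    if not measured:
        raise ValueError("no measured runs left after dropping warmups")
    return mean(measured)


# Illustrative timestamps only, not measured data.
ttft, tpot = ttft_tpot_ms(10.0, [10.044, 10.056, 10.068])
print(f"TTFT {ttft:.1f} ms, TPOT {tpot:.1f} ms")
```

In practice the timestamps would come from a monotonic clock such as `time.perf_counter()`, read once when the request is sent and again as each chunk arrives.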