Qwen2.5-Coder 14B-Instruct
AWQ·14B params·safetensors
intelligence: see on Artificial Analysis →
checkpoint:
Qwen/Qwen2.5-Coder-14B-Instruct-AWQcommit:
eb3172f06a6dweights 9.29 GiB
All runs (15)
| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s ↓ | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
GeForce RTX 3090 · 24 GiB420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | codegen | 1 | 81.2 | 1772.1 | 44ms | 12.3 | 81 | 554 | 6.83s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | codegen | 1 | 81.1 | 1754.9 | 44ms | 12.3 | 81 | 554 | 6.84s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | chat | 1 | 80.7 | 1541.2 | 33ms | 12.1 | 49 | 93 | 1.14s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | chat | 1 | 80.7 | 1577.0 | 32ms | 12.1 | 49 | 93 | 1.14s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 1 | 80.2 | 21337.3 | 31ms | 12.3 | 606 | 236 | 3.10s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 1 | 80.2 | 21252.2 | 30ms | 12.4 | 606 | 236 | 3.10s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | rag | 1 | 77.3 | 29814.2 | 34ms | 12.3 | 855 | 68 | 894ms | 0.000 GiB |
GeForce RTX 3090 · 24 GiB420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | rag | 1 | 77.2 | 30194.7 | 36ms | 12.3 | 855 | 68 | 895ms | 0.000 GiB |
GeForce RTX 3090 · 24 GiB420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 4 | 73.6 | 10112.3 | 66ms | 13.4 | 606 | 236 | 3.19s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 4 | 73.4 | 14834.0 | 58ms | 13.5 | 606 | 236 | 3.18s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | chat | 1 | 42.6 | 873.7 | 58ms | 22.9 | 49 | 93 | 2.06s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | codegen | 1 | 41.1 | 807.6 | 97ms | 24.3 | 81 | 554 | 13.49s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 1 | 40.6 | 10574.6 | 57ms | 24.3 | 606 | 236 | 6.04s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | rag | 1 | 39.9 | 15660.6 | 55ms | 24.6 | 855 | 68 | 1.76s | 0.000 GiB |
GeForce RTX 3090 · 24 GiB200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 4 | 38.9 | 8209.5 | 105ms | 25.3 | 606 | 236 | 6.06s | 0.000 GiB |
Environment
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power420 W / 450 W max(93% cap)
backendvLLM 0.21.0 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
containerizedtrue
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power200 W / 450 W max(44% cap)
backendvLLM 0.21.0 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
containerizedtrue
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue