Qwen2.5 32B-Instruct
AWQ·32B params·safetensors
checkpoint: Qwen/Qwen2.5-32B-Instruct-AWQ
commit: 5c7cb76a268f
weights: 18.00 GiB
All runs (15)
| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s ↓ | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | codegen | 1 | 41.3 | 824.4 | 95ms | 24.1 | 81 | 646 | 15.63s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | codegen | 1 | 41.3 | 820.3 | 95ms | 24.1 | 81 | 646 | 15.63s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | chat | 1 | 41.0 | 835.8 | 61ms | 23.8 | 49 | 87 | 2.08s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | chat | 1 | 41.0 | 842.0 | 61ms | 23.8 | 49 | 87 | 2.08s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 1 | 41.0 | 11422.7 | 54ms | 24.2 | 606 | 292 | 7.49s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 1 | 41.0 | 11335.8 | 54ms | 24.2 | 606 | 292 | 7.50s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | rag | 1 | 39.9 | 16170.4 | 53ms | 24.0 | 855 | 55 | 1.60s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | rag | 1 | 39.9 | 16214.9 | 53ms | 24.0 | 855 | 55 | 1.60s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 4 | 38.5 | 8763.4 | 95ms | 25.8 | 606 | 301 | 7.79s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 4 | 38.5 | 7334.1 | 102ms | 25.8 | 606 | 318 | 8.25s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | codegen | 1 | 19.3 | 413.1 | 189ms | 51.4 | 81 | 625 | 32.42s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | chat | 1 | 19.2 | 455.7 | 112ms | 51.3 | 49 | 87 | 4.20s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 1 | 19.2 | 5096.9 | 119ms | 51.5 | 606 | 301 | 15.70s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | rag | 1 | 18.8 | 8194.7 | 114ms | 52.6 | 855 | 55 | 3.41s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 4 | 18.7 | 4615.3 | 155ms | 53.4 | 606 | 301 | 16.11s | 0.000 GiB |
Environment
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 420 W / 450 W max (93% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
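
The page does not say how its TTFT, TPOT, and Gen tok/s columns are computed. As a rough guide for readers reproducing the runs against a streaming endpoint, the sketch below uses one common set of definitions: TTFT is the delay until the first token arrives, TPOT is the mean gap between later tokens, and generation throughput is decode tokens divided by decode time. The names `StreamMetrics` and `stream_metrics` are hypothetical, and the benchmark's own aggregation over its 5 runs per cell may well differ.

```python
from dataclasses import dataclass


@dataclass
class StreamMetrics:
    ttft_s: float         # time to first token, seconds
    tpot_ms: float        # mean time per output token after the first, milliseconds
    gen_tok_per_s: float  # decode throughput, tokens per second
    total_s: float        # request start to last token, seconds


def stream_metrics(request_start: float, token_times: list[float]) -> StreamMetrics:
    """Derive latency metrics from one streamed response.

    Assumed definitions (not taken from the benchmark page):
    request_start and token_times are timestamps in seconds, for example
    from time.perf_counter(), with one entry per generated token.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")

    ttft = token_times[0] - request_start
    total = token_times[-1] - request_start

    # Tokens after the first are attributed to the decode phase.
    n_decode = len(token_times) - 1
    if n_decode == 0:
        return StreamMetrics(ttft, 0.0, 0.0, total)

    decode_time = token_times[-1] - token_times[0]
    tpot_ms = 1000.0 * decode_time / n_decode
    gen = n_decode / decode_time if decode_time > 0 else float("inf")
    return StreamMetrics(ttft, tpot_ms, gen, total)


if __name__ == "__main__":
    # Synthetic timestamps: first token after 0.1 s, then one token every 25 ms.
    m = stream_metrics(0.0, [0.1, 0.125, 0.15, 0.175, 0.2])
    print(m)
```

Under these definitions TPOT and generation throughput are reciprocals of each other for a single request (a TPOT of 25 ms corresponds to 40 tok/s). Values averaged over several runs, as in the table above, need not satisfy that relation exactly.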