Qwen2.5 7B-Instruct
AWQ · 7B params · safetensors
checkpoint: Qwen/Qwen2.5-7B-Instruct-AWQ
commit: b25037543e93
weights: 5.19 GiB
All runs (15)
| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s ↓ | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | codegen | 1 | 148.8 | 2931.4 | 27ms | 6.7 | 81 | 699 | 4.70s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | codegen | 1 | 148.3 | 2898.2 | 27ms | 6.7 | 81 | 699 | 4.73s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | chat | 1 | 148.0 | 2531.6 | 20ms | 6.6 | 49 | 99 | 663ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | chat | 1 | 147.2 | 2349.9 | 22ms | 6.6 | 49 | 99 | 667ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 1 | 147.1 | 29319.8 | 22ms | 6.7 | 606 | 395 | 2.67s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 1 | 146.3 | 29823.7 | 23ms | 6.7 | 606 | 395 | 2.68s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | rag | 1 | 141.6 | 40827.1 | 24ms | 6.7 | 855 | 53 | 422ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 4 | 140.7 | 12233.0 | 50ms | 7.0 | 606 | 395 | 2.81s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 4 | 140.5 | 14201.9 | 48ms | 7.0 | 606 | 395 | 2.81s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | rag | 1 | 140.1 | 45003.9 | 28ms | 6.7 | 855 | 53 | 418ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | chat | 1 | 85.2 | 1460.0 | 35ms | 11.4 | 49 | 99 | 1.14s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | rag | 1 | 77.9 | 31596.5 | 32ms | 12.0 | 855 | 53 | 780ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 1 | 77.0 | 20154.2 | 31ms | 13.0 | 606 | 395 | 5.10s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | codegen | 1 | 77.0 | 1423.9 | 55ms | 12.9 | 81 | 699 | 9.03s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 4 | 75.7 | 11745.5 | 58ms | 13.0 | 606 | 395 | 5.22s | 0.000 GiB |
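
The TTFT, TPOT, Out tok, and Total columns can be cross-checked against one another. The page does not state how each column is computed, so the sketch below rests on two assumptions: that Total is roughly TTFT + Out tok × TPOT, and that decode throughput is roughly 1000 / TPOT. The helper names are hypothetical, and small mismatches with the reported values are expected (rounding, or TPOT averaged over a slightly different token count).

```python
def estimated_total_s(ttft_ms: float, tpot_ms: float, out_tokens: int) -> float:
    """Estimate end-to-end latency in seconds.

    Assumption (not stated on the page): total ~= TTFT + out_tokens * TPOT.
    """
    return (ttft_ms + tpot_ms * out_tokens) / 1000.0


def decode_rate_tok_s(tpot_ms: float) -> float:
    """Approximate decode throughput in tokens/s from time per output token."""
    return 1000.0 / tpot_ms


# (label, TTFT ms, TPOT ms, Out tok, reported Total s, reported Gen tok/s),
# copied from two rows of the table above.
rows = [
    ("420 W, baseline-pl-350w, codegen", 27, 6.7, 699, 4.70, 148.8),
    ("200 W, baseline, chat", 35, 11.4, 99, 1.14, 85.2),
]

for label, ttft, tpot, out_tok, total_s, gen_tok_s in rows:
    est_total = estimated_total_s(ttft, tpot, out_tok)
    est_rate = decode_rate_tok_s(tpot)
    print(f"{label}: total est {est_total:.2f}s vs reported {total_s:.2f}s, "
          f"rate est {est_rate:.1f} vs reported {gen_tok_s:.1f} tok/s")
```

Under these assumptions the estimates land within a few percent of the reported Total and Gen tok/s values, which suggests the columns are internally consistent.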
Environment
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 420 W / 450 W max (93% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
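
The "93% cap" and "44% cap" figures appear to be the configured power limit divided by the 450 W maximum, rounded to a whole percent. That reading is an assumption, as are the hypothetical helper names in this short Python sketch, which also compares chat throughput at concurrency 1 between the two environments. The power values are configured limits, not measured draw, so the ratio says nothing definite about energy efficiency.

```python
def cap_percent(limit_w: float, max_w: float) -> int:
    """Power limit as a whole-number percentage of the board maximum."""
    return round(100 * limit_w / max_w)


def speedup(fast_tok_s: float, slow_tok_s: float) -> float:
    """Ratio of two generation throughputs."""
    return fast_tok_s / slow_tok_s


# Power limits and the 450 W maximum from the two environment blocks.
print(cap_percent(420, 450), cap_percent(200, 450))

# Gen tok/s for the chat shape at concurrency 1:
# 148.0 (420 W, baseline-pl-350w) vs 85.2 (200 W, baseline).
print(f"{speedup(148.0, 85.2):.2f}x")
```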