Qwen/Qwen2.5-14B-Instruct
unknown·14B params·unknown
checkpoint: Qwen/Qwen2.5-14B-Instruct-AWQ
commit: 539535859b13
weights: 9.29 GiB
All runs (5)
| Hardware | Backend | Shape | Concurrency | Gen tok/s | TTFT | TPOT (ms) | Output tokens | Total time | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB | vLLM 0.21.0 (cuda) | chat | 1 | 42.6 | 59ms | 22.5 | 74 | 1.60s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | vLLM 0.21.0 (cuda) | agent | 1 | 40.6 | 56ms | 24.5 | 355 | 8.73s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | vLLM 0.21.0 (cuda) | codegen | 1 | 40.0 | 97ms | 24.9 | 622 | 15.46s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | vLLM 0.21.0 (cuda) | agent | 4 | 39.3 | 93ms | 25.2 | 355 | 9.04s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | vLLM 0.21.0 (cuda) | rag | 1 | 38.9 | 53ms | 24.5 | 45 | 1.36s | 0.000 GiB |
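The throughput and latency columns are related, and a rough cross-check can be sketched as below. The page does not state how its metrics are defined, so the formulas here are assumptions: generation throughput is taken as output tokens divided by total time, and TPOT as the time after the first token divided by the remaining tokens. The function name `derived_metrics` is hypothetical.

```python
def derived_metrics(ttft_s: float, total_s: float, out_tokens: int) -> dict:
    """Estimate throughput and per-token latency from one table row.

    Assumed definitions (not confirmed by the page):
      gen tok/s = out_tokens / total_s
      TPOT      = (total_s - ttft_s) / (out_tokens - 1)
    """
    if out_tokens < 2 or total_s <= ttft_s:
        raise ValueError("need at least two tokens and total_s > ttft_s")
    decode_s = total_s - ttft_s  # time spent after the first token arrived
    return {
        "gen_tok_per_s": out_tokens / total_s,
        "tpot_ms": 1000.0 * decode_s / (out_tokens - 1),
    }


# Row "agent", concurrency 1: TTFT 56 ms, total 8.73 s, 355 output tokens.
row = derived_metrics(ttft_s=0.056, total_s=8.73, out_tokens=355)
print(f"{row['gen_tok_per_s']:.1f} tok/s, {row['tpot_ms']:.1f} ms/token")
```

For the agent and codegen rows these estimates land close to the reported values. The short chat and rag rows do not reconcile as well, so the assumed formulas should be read as an approximation rather than the benchmark's actual method.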
Environment
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: vLLM 0.21.0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
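Because the runs use a streaming /v1/chat/completions endpoint, TTFT and TPOT can in principle be timed on the client from the server-sent event lines. The sketch below is not the harness used for these results. It assumes the usual OpenAI-style stream format (`data: {json}` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`) and counts content chunks, which may not map one-to-one to tokens. The function name `time_stream` and the injectable `clock` parameter are hypothetical.

```python
import json
import time
from typing import Callable, Iterable


def time_stream(lines: Iterable[str],
                clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time content chunks in an OpenAI-style SSE stream.

    `lines` is any iterable of decoded text lines from the response body.
    Timing starts when this function is called, so call it immediately
    after sending the request (assumption of this sketch).
    """
    start = clock()
    stamps = []  # arrival time of every chunk that carries text
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        delta = choices[0].get("delta", {}) if choices else {}
        if delta.get("content"):
            stamps.append(clock())
    if not stamps:
        raise ValueError("stream carried no content chunks")
    n = len(stamps)
    # Mean gap between consecutive chunks; undefined for a single chunk.
    tpot = (stamps[-1] - stamps[0]) / (n - 1) if n > 1 else float("nan")
    return {"ttft_s": stamps[0] - start, "tpot_s": tpot, "chunks": n}
```

Passing the clock as a parameter keeps the function deterministic under test: a fake clock and a list of canned lines are enough to exercise it without a running server.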