Qwen2.5 14B-Instruct
AWQ·14B params·safetensors
intelligence: see on Artificial Analysis →
checkpoint:
Qwen/Qwen2.5-14B-Instruct-AWQcommit:
539535859b13weights 9.29 GiB
All runs (15)
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | chat | 1 | 82.3 | 80.7 | 1509.8 | — | 34ms | 12.1 | — | 49 | 74 | 908ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | chat | 1 | 82.3 | 80.7 | 1533.3 | — | 33ms | 12.1 | — | 49 | 74 | 908ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | rag | 1 | 81.5 | 76.0 | 30227.2 | — | 36ms | 12.3 | — | 855 | 45 | 675ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | rag | 1 | 81.5 | 76.3 | 30203.1 | — | 35ms | 12.3 | — | 855 | 45 | 675ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | codegen | 1 | 81.4 | 81.0 | 1765.7 | — | 44ms | 12.3 | — | 81 | 622 | 7.63s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | codegen | 1 | 81.4 | 81.0 | 1763.5 | — | 44ms | 12.3 | — | 81 | 622 | 7.63s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 1 | 80.8 | 80.1 | 21761.7 | — | 29ms | 12.4 | — | 606 | 355 | 4.41s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 1 | 80.8 | 80.1 | 21655.7 | — | 29ms | 12.4 | — | 606 | 355 | 4.41s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 4 | 74.6 | 74.0 | 11489.7 | — | 65ms | 13.4 | — | 606 | 355 | 4.78s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 4 | 74.6 | 74.2 | 10789.7 | — | 63ms | 13.4 | — | 606 | 355 | 4.77s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | chat | 1 | 44.5 | 42.6 | 859.6 | — | 59ms | 22.5 | — | 49 | 74 | 1.60s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | rag | 1 | 40.9 | 38.9 | 16420.1 | — | 53ms | 24.5 | — | 855 | 45 | 1.36s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 1 | 40.9 | 40.6 | 10787.6 | — | 56ms | 24.5 | — | 606 | 355 | 8.73s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | codegen | 1 | 40.1 | 40.0 | 803.9 | — | 97ms | 24.9 | — | 81 | 622 | 15.46s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 4 | 39.6 | 39.3 | 7270.7 | — | 93ms | 25.2 | — | 606 | 355 | 9.04s | 0.000 GiB |
Environment
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power420 W / 450 W max(93% cap)
hardware probes
copy 42% of theoryFP16 peak 65.4 TFcopy/math flat across caps
384-bit9751 MHz82 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| 200 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
| 300 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.3 TF |
| 450 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
compute: 8.6
backendvLLM 0.21.0 (cuda)
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power200 W / 450 W max(44% cap)
backendvLLM 0.21.0 (cuda)
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue