Qwen2.5-Coder 7B-Instruct
AWQ · 7B params · safetensors
checkpoint: Qwen/Qwen2.5-Coder-7B-Instruct-AWQ
commit: 8e8ed243bbe6
weights: 5.19 GiB
All runs (15)
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | chat | 1 | 150.9 | 147.5 | 2469.6 | — | 21ms | 6.6 | — | 49 | 68 | 461ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | chat | 1 | 150.8 | 147.2 | 2346.4 | — | 22ms | 6.6 | — | 49 | 68 | 459ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | rag | 1 | 149.6 | 138.3 | 43974.4 | — | 25ms | 6.7 | — | 855 | 32 | 282ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | rag | 1 | 149.6 | 136.8 | 41641.0 | — | 27ms | 6.7 | — | 855 | 32 | 281ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | codegen | 1 | 149.5 | 148.6 | 2920.5 | — | 27ms | 6.7 | — | 81 | 506 | 3.42s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | codegen | 1 | 149.3 | 148.4 | 2884.7 | — | 27ms | 6.7 | — | 81 | 506 | 3.42s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 1 | 148.4 | 145.9 | 28717.5 | — | 23ms | 6.7 | — | 606 | 442 | 2.99s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 1 | 148.3 | 146.6 | 29395.3 | — | 21ms | 6.7 | — | 606 | 442 | 2.99s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 4 | 142.7 | 141.7 | 19177.8 | — | 45ms | 7.0 | — | 606 | 442 | 3.12s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 420 W · drv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 4 | 142.7 | 141.4 | 13125.3 | — | 46ms | 7.0 | — | 606 | 442 | 3.13s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | chat | 1 | 89.9 | 85.8 | 1399.9 | — | 35ms | 11.1 | — | 49 | 68 | 752ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | rag | 1 | 84.5 | 78.7 | 31764.5 | — | 31ms | 11.8 | — | 855 | 32 | 499ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | codegen | 1 | 77.6 | 77.4 | 1456.9 | — | 54ms | 12.9 | — | 81 | 506 | 6.54s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 1 | 77.2 | 76.9 | 19988.1 | — | 31ms | 12.9 | — | 606 | 442 | 5.73s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 4 | 76.9 | 76.1 | 15152.2 | — | 55ms | 13.0 | — | 606 | 444 | 5.81s | 0.000 GiB |
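The run rows carry no column header, so the reading used here is an assumption: the first rate column (150.9 in the top row) looks like decode throughput in tokens/s, and the 6.6 column looks like its reciprocal in ms/token. A minimal sketch that checks this reading against two of the chat rows and compares the 200 W cap with the 420 W cap; the function names are hypothetical:

```python
# Values copied from the two chat, concurrency-1 rows in the table.
# ASSUMPTION: column meanings are inferred from the numbers, not stated by the source.

def ms_per_token(tokens_per_s: float) -> float:
    """Convert a decode rate in tokens/s to per-token latency in ms."""
    return 1000.0 / tokens_per_s


def relative_throughput(capped_tps: float, reference_tps: float) -> float:
    """Fraction of the reference throughput retained under a lower power cap."""
    return capped_tps / reference_tps


chat_420w_tps = 150.9  # cap 420 W row
chat_200w_tps = 89.9   # cap 200 W row

print(round(ms_per_token(chat_420w_tps), 1))   # 6.6, matches the table
print(round(ms_per_token(chat_200w_tps), 1))   # 11.1, matches the table
print(round(relative_throughput(chat_200w_tps, chat_420w_tps), 2))  # 0.6
```

Under this reading, the 200 W cap (about 48% of 420 W) retains roughly 60% of chat decode throughput. The cap is a limit, not measured draw, so this says nothing direct about energy per token.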
Environment
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 420 W / 450 W max (93% cap)
hardware probes
copy 42% of theory · FP16 peak 65.4 TF · copy/math flat across caps
384-bit · 9751 MHz · 82 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| 200 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
| 300 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.3 TF |
| 450 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
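The theory and copy columns can be cross-checked with a little arithmetic. The sketch below assumes that the 9751 MHz figure is the memory clock, that data moves twice per clock, and that the 384-bit figure is the memory bus width; none of this is stated by the page, and the function name is hypothetical:

```python
def theoretical_bandwidth_gbs(bus_width_bits: int,
                              mem_clock_mhz: float,
                              transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s (decimal) from bus width and memory clock.

    ASSUMPTION: two transfers per clock; the source does not say how its
    'theory' figure was computed.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mem_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


theory = theoretical_bandwidth_gbs(384, 9751)
copy_measured = 391.0  # GB/s, copy column of the table

print(round(theory))                        # 936
print(round(100 * copy_measured / theory))  # 42
```

Both results agree with the reported 936 GB/s theoretical figure and the "copy 42% of theory" summary.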
compute: 8.6
backend: vLLM 0.21.0 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: vLLM 0.21.0 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
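With streaming enabled, latency figures such as time to first token and decode rate are typically derived from the arrival times of streamed chunks. The sketch below illustrates that idea on plain timestamps and is not the harness used for these runs. It assumes one token per chunk and that the 2 warmups run in addition to the 5 measured runs; all names are hypothetical:

```python
from statistics import median


def stream_metrics(request_start: float, chunk_times: list[float]) -> tuple[float, float]:
    """Return (time to first token in s, decode rate in tokens/s).

    ASSUMPTION: each streamed chunk carries exactly one token.
    """
    if not chunk_times:
        raise ValueError("no chunks received")
    ttft = chunk_times[0] - request_start
    decode_window = chunk_times[-1] - chunk_times[0]
    # Tokens after the first one, divided by the time they took to arrive.
    decode_tps = (len(chunk_times) - 1) / decode_window if decode_window > 0 else float("nan")
    return ttft, decode_tps


def summarize(samples: list[float], warmups: int = 2) -> float:
    """Drop the leading warmup samples and report the median of the rest."""
    measured = samples[warmups:]
    if not measured:
        raise ValueError("no measured runs left after discarding warmups")
    return median(measured)


# Synthetic timestamps: first chunk after 20 ms, then one chunk every 10 ms.
ttft, tps = stream_metrics(0.0, [0.02, 0.03, 0.04, 0.05])
print(f"ttft={ttft * 1000:.0f}ms decode={tps:.0f} tok/s")
print(summarize([9.0, 8.0, 1.0, 2.0, 3.0, 4.0, 5.0]))  # median of the last five samples
```

The median is used here because it resists a single slow run; whether the reported figures are medians or means is not stated.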