Qwen2.5-Coder 32B-Instruct
AWQ·32B params·safetensors
intelligence: see on Artificial Analysis →
checkpoint:
Qwen/Qwen2.5-Coder-32B-Instruct-AWQcommit:
1ed0a6145da0weights 18.00 GiB
All runs (15)
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | chat | 1 | 41.9 | 41.0 | 839.5 | — | 61ms | 23.8 | — | 49 | 100 | 2.39s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | chat | 1 | 41.9 | 41.0 | 878.1 | — | 58ms | 23.8 | — | 49 | 100 | 2.39s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | rag | 1 | 41.6 | 40.1 | 16203.0 | — | 53ms | 24.0 | — | 855 | 62 | 1.77s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | rag | 1 | 41.6 | 40.0 | 16188.0 | — | 53ms | 24.0 | — | 855 | 62 | 1.77s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | codegen | 1 | 41.5 | 41.3 | 815.9 | — | 96ms | 24.1 | — | 81 | 720 | 17.46s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | codegen | 1 | 41.5 | 41.3 | 817.0 | — | 96ms | 24.1 | — | 81 | 720 | 17.46s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 1 | 41.4 | 41.1 | 11373.2 | — | 53ms | 24.2 | — | 606 | 295 | 7.56s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 1 | 41.3 | 41.2 | 11391.4 | — | 53ms | 24.2 | — | 606 | 295 | 7.56s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-350w | agent | 4 | 39.0 | 38.8 | 7704.1 | — | 99ms | 25.7 | — | 606 | 295 | 7.61s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 420 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline-pl-450w | agent | 4 | 38.9 | 38.7 | 8186.8 | — | 100ms | 25.7 | — | 606 | 295 | 7.62s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | chat | 1 | 20.0 | 19.5 | 452.6 | — | 112ms | 49.9 | — | 49 | 100 | 4.76s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | codegen | 1 | 19.4 | 19.3 | 417.2 | — | 187ms | 51.4 | — | 81 | 720 | 37.27s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 1 | 19.4 | 19.2 | 5080.7 | — | 119ms | 51.6 | — | 606 | 363 | 18.90s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | rag | 1 | 19.2 | 18.9 | 8013.5 | — | 111ms | 52.2 | — | 855 | 62 | 3.73s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiBcap 200 Wdrv 590 | vLLM 0.21.0 (cuda) | baseline | agent | 4 | 19.0 | 18.8 | 3535.8 | — | 175ms | 52.6 | — | 606 | 295 | 15.71s | 0.000 GiB |
Environment
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power420 W / 450 W max(93% cap)
hardware probes
copy 42% of theoryFP16 peak 65.4 TFcopy/math flat across caps
384-bit9751 MHz82 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| 200 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
| 300 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.3 TF |
| 450 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
compute: 8.6
backendvLLM 0.21.0 (cuda)
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power200 W / 450 W max(44% cap)
backendvLLM 0.21.0 (cuda)
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue