granite-4.1 30b
Q4_K_M · 30B params · GGUF
checkpoint: unsloth/granite-4.1-30b-GGUF:Q4_K_M
commit: 6cb34f31b11c
weights: 16.29 GiB
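As a rough sanity check on the checkpoint size, the sketch below converts the listed weight size into average bits per weight and into the VRAM left over on a 24 GiB card. It assumes "30B params" means exactly 30e9 parameters, which is a rounded label and not an exact count, so the result is only approximate; the function name is illustrative.

```python
GIB = 2**30  # bytes per GiB


def bits_per_weight(weights_gib: float, n_params: float) -> float:
    """Average storage cost per parameter, in bits."""
    return weights_gib * GIB * 8 / n_params


# 16.29 GiB is the weight size listed on the page.
# 30e9 is an assumed parameter count taken from the rounded "30B" label.
bpw = bits_per_weight(16.29, 30e9)

# VRAM remaining for KV cache and runtime buffers on the 24 GiB card.
headroom_gib = 24 - 16.29

print(f"{bpw:.2f} bits/weight, {headroom_gib:.2f} GiB headroom")
```

Under that assumption the result is about 4.66 bits per weight, with roughly 7.7 GiB left for everything other than the weights.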
All runs (15)
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | chat | 1 | 43.1 | 40.4 | 378.2 | — | 74ms | 23.2 | — | 28 | 100 | 2.37s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | codegen | 1 | 42.6 | 40.9 | 338.2 | — | 214ms | 23.5 | — | 59 | 884 | 21.67s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | rag | 1 | 42.5 | 36.5 | 1925.2 | — | 361ms | 23.5 | — | 695 | 116 | 3.03s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | chat | 1 | 42.4 | 39.8 | 370.3 | — | 76ms | 23.6 | — | 28 | 100 | 2.42s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 1 | 42.3 | 39.9 | 1982.6 | — | 280ms | 23.6 | — | 555 | 479 | 11.97s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 4 | 42.2 | 18.3 | 36.0 | — | 15.37s | 23.7 | — | 555 | 479 | 27.46s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | codegen | 1 | 41.9 | 40.2 | 363.9 | — | 205ms | 23.9 | — | 59 | 884 | 21.94s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | rag | 1 | 41.7 | 36.5 | 1855.7 | — | 339ms | 24.0 | — | 695 | 116 | 3.17s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 1 | 41.6 | 38.7 | 1973.4 | — | 281ms | 24.0 | — | 555 | 479 | 12.30s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 4 | 41.6 | 13.5 | 42.6 | — | 15.64s | 24.0 | — | 555 | 479 | 24.66s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | chat | 1 | 21.5 | 21.0 | 306.1 | — | 92ms | 46.5 | — | 28 | 100 | 4.59s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | rag | 1 | 21.0 | 19.5 | 1193.7 | — | 428ms | 47.6 | — | 695 | 138 | 6.91s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | codegen | 1 | 21.0 | 20.9 | 359.1 | — | 217ms | 47.6 | — | 59 | 884 | 42.46s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 1 | 20.8 | 20.3 | 1999.8 | — | 278ms | 48.1 | — | 555 | 377 | 18.94s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 4 | 12.3 | 11.4 | 154.4 | — | 4.48s | 81.2 | — | 555 | 499 | 40.13s | 0.000 GiB |
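The run table has no header row, so the sketch below rests on an assumed reading of the columns: the first number after the concurrency column is taken as decode throughput in tokens/s. That reading is supported by the later column that approximately equals 1000 divided by it (for example 23.2 for 43.1), which looks like milliseconds per token. Under that assumption, the code compares the 350 W runs against the 450 W runs, which share the same llama.cpp build. The 200 W rows are left out because they use a different build (59778f0), so cap and build effects cannot be separated.

```python
# Values copied from the run table. "Decode tokens/s" is an assumed column name.
DECODE_TPS = {
    # (workload, concurrency): (450 W run, 350 W run)
    ("chat", 1): (43.1, 42.4),
    ("codegen", 1): (42.6, 41.9),
    ("rag", 1): (42.5, 41.7),
    ("agent", 1): (42.3, 41.6),
    ("agent", 4): (42.2, 41.6),
}


def ms_per_token(tokens_per_s: float) -> float:
    """Convert a throughput in tokens/s to latency per token in ms."""
    return 1000.0 / tokens_per_s


# Fraction of the 450 W throughput retained under the 350 W cap.
ratios = {key: capped / full for key, (full, capped) in DECODE_TPS.items()}

for (workload, concurrency), ratio in ratios.items():
    print(f"{workload:8s} c={concurrency}  350W/450W = {ratio:.3f}")
```

With the values above, every ratio falls between 0.98 and 0.99: under this reading, the 350 W cap costs roughly 1 to 2 percent of decode throughput.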
Environment
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 350 W / 450 W max (78% cap)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1965/2100 MHz · mem 9501 MHz
temp: 44°C idle · 64°C peak
peak draw: 332 W
hardware probes
copy 42% of theory · FP16 peak 65.4 TF · copy/math flat across caps
384-bit · 9751 MHz · 82 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| 200 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
| 300 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.3 TF |
| 450 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
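The "theory" column and the "copy 42% of theory" figure can be reproduced from the probe line (384-bit bus, 9751 MHz memory clock). The sketch below assumes two transfers per listed memory clock. That factor is not stated on the page; it is chosen because it reproduces the 936 GB/s value. The function name is illustrative.

```python
def theoretical_bandwidth_gbs(
    bus_bits: int,
    mem_clock_mhz: float,
    transfers_per_clock: int = 2,  # assumption: double data rate on the listed clock
) -> float:
    """Peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9


theory = theoretical_bandwidth_gbs(384, 9751)  # bus width and clock from the probe line
measured_copy = 391.0                          # GB/s, from the copy column
efficiency = measured_copy / theory

print(f"theory {theory:.0f} GB/s, copy efficiency {efficiency:.0%}")
```

With these inputs the function returns about 936 GB/s and the ratio rounds to 42%, matching the figures on the page.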
compute: 8.6
backend: llama.cpp cuda-4f13cb7 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
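The settings above (streaming responses, 2 warmups, 5 runs per cell) suggest how per-request figures such as time to first token and decode rate could be derived. The following is a minimal sketch of one plausible method, not the harness actually used. It makes three assumptions the page does not confirm: the metric definitions, that warmups are extra runs on top of the 5 measured ones, and that the median is the aggregate. The function names are hypothetical.

```python
from statistics import median


def stream_metrics(t_request: float, token_times: list[float]) -> tuple[float, float]:
    """Return (time to first token in s, decode tokens/s) for one streamed response.

    token_times holds the arrival time of each streamed token, in seconds.
    The decode rate excludes the first token, so prefill time is not counted.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure a decode rate")
    ttft = token_times[0] - t_request
    decode_window = token_times[-1] - token_times[0]
    return ttft, (len(token_times) - 1) / decode_window


def summarize(samples: list[float], warmups: int = 2) -> float:
    """Drop the leading warmup samples and report the median of the rest.

    Median is an assumption; the page does not say how runs are aggregated.
    """
    return median(samples[warmups:])


# Synthetic timestamps for illustration only (not measured data).
ttft, tps = stream_metrics(0.0, [0.5, 1.0, 1.5, 2.0])
print(ttft, tps)
```

In a real client, `token_times` would be filled by recording a monotonic timestamp as each streamed chunk arrives from `/v1/chat/completions`.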
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1980/2100 MHz · mem 9501 MHz
temp: 44°C idle · 82°C peak
peak draw: 430 W
backend: llama.cpp cuda-4f13cb7 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true