NVIDIA-Nemotron-3-Nano-Omni 30B-A3B-Reasoning

Q4_K_M·30B params·GGUF
checkpoint: unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF:Q4_K_M

All runs (5)

HardwareBackendShapeConc.Gen tok/sTTFTTPOT (ms)Out tokTotalVRAM Δ
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)codegen1
134.2
217ms7.010007.45s0.010 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)chat1
123.5
156ms6.5100810ms0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)agent1
121.5
462ms7.05004.12s0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)rag1
109.6
439ms6.72001.82s0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)agent4
1.2
678ms0.011.08s0.000 GiB

Environment

GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power200 W / 450 W max(44% cap)
backendllama.cpp 59778f0 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
containerizedtrue
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue