LFM2.5-350M

Q4_K_M · 350M params · GGUF
checkpoint: LiquidAI/LFM2.5-350M-GGUF:Q4_K_M

All runs (10)

| Hardware | Backend | Shape | Conc. | Gen tok/s | TTFT | TPOT (ms) | Out tok + Total (unsplit) | VRAM Δ |
|---|---|---|---|---|---|---|---|---|
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | codegen | 1 | 861.4 | 8 ms | 1.16 | 72763ms | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | agent | 1 | 824.4 | 7 ms | 1.21 | 80218ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | codegen | 1 | 813.8 | 12 ms | 1.25 | 27633ms | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | rag | 1 | 792.4 | 13 ms | 1.16 | 481ms | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | chat | 1 | 761.9 | 6 ms | 1.24 | 963ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | chat | 1 | 722.7 | 9 ms | 1.24 | 157ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 1 | 715.9 | 28 ms | 1.21 | 41197ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | rag | 1 | 632.0 | 23 ms | 1.25 | 993ms | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | agent | 4 | 340.3 | 132 ms | 2.11 | 97551ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 4 | 316.9 | 206 ms | 2.11 | 86617ms | 0.000 GiB |

Environment

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 5070 · 11.94 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 250 W / 300 W max (83% cap)
backend: llama.cpp b9174 (cuda)
server: lemonade unknown
os: CachyOS
kernel: 7.0.0-1-cachyos
driver: 595.58.03
python: 3.14.4
containerized: false
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
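
The page does not say how TTFT, TPOT and Gen tok/s are derived from the streamed responses. The sketch below shows one common way to compute them from per-token arrival times, and how warmup runs could be discarded before aggregating. The definitions used here (TPOT as the mean inter-token gap after the first token, Gen tok/s as its reciprocal, median across measured runs) are assumptions, and the names `stream_metrics` and `summarize` are hypothetical, not part of the benchmark harness.

```python
from statistics import median


def stream_metrics(t_request, token_times):
    """Latency metrics for one streamed response.

    t_request:   time the request was sent, in seconds.
    token_times: arrival time of each output token, in seconds, ascending.
    """
    if not token_times:
        raise ValueError("no tokens received")

    n = len(token_times)
    ttft_ms = (token_times[0] - t_request) * 1000.0
    total_ms = (token_times[-1] - t_request) * 1000.0

    # Assumed definition: TPOT is the mean gap between consecutive
    # tokens, excluding the wait for the first token.
    tpot_ms = (total_ms - ttft_ms) / (n - 1) if n > 1 else 0.0

    # Assumed definition: generation throughput is the reciprocal of TPOT.
    gen_tok_s = 1000.0 / tpot_ms if tpot_ms > 0 else 0.0

    return {
        "ttft_ms": ttft_ms,
        "tpot_ms": tpot_ms,
        "out_tok": n,
        "total_ms": total_ms,
        "gen_tok_s": gen_tok_s,
    }


def summarize(runs, warmups=2):
    """Drop the first `warmups` runs and take the median of each metric."""
    measured = runs[warmups:]
    if not measured:
        raise ValueError("no measured runs left after discarding warmups")
    return {key: median(r[key] for r in measured) for key in measured[0]}
```

With `warmups: 2` and `runs/cell: 5` as listed above, a harness following this sketch would issue seven requests per cell and pass all seven results to `summarize`, which aggregates only the last five.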