LFM2 1.2B-Tool

Q4_K_M · 1.2B params · GGUF
checkpoint: LiquidAI/LFM2-1.2B-Tool-GGUF:Q4_K_M

All runs (10)

| Hardware | Backend | Shape | Conc. | Gen tok/s | TTFT | TPOT (ms) | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | codegen | 1 | 522.1 | 11 ms | 1.9 | 659 | 1.26 s | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | agent | 1 | 504.5 | 11 ms | 1.9 | 438 | 850 ms | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | chat | 1 | 499.9 | 10 ms | 1.9 | 100 | 198 ms | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | rag | 1 | 475.8 | 26 ms | 1.9 | 101 | 222 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | codegen | 1 | 465.3 | 20 ms | 2.1 | 633 | 1.37 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | chat | 1 | 459.9 | 13 ms | 2.1 | 100 | 206 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 1 | 444.0 | 23 ms | 2.2 | 314 | 707 ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | rag | 1 | 423.6 | 37 ms | 2.1 | 98 | 246 ms | 0.000 GiB |
| GeForce RTX 5070 · 11.94 GiB | llama.cpp b9174 (cuda) | agent | 4 | 220.3 | 290 ms | 3.9 | 435 | 1.98 s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB | llama.cpp 59778f0 (cuda) | agent | 4 | 200.4 | 304 ms | 4.2 | 362 | 1.72 s | 0.000 GiB |
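
The TTFT, TPOT, Out tok, and Total columns can be cross-checked against each other. The sketch below assumes the common relation total ≈ TTFT + TPOT × (out tokens − 1); the page itself does not state how Total is derived, so this formula and the helper names (`estimated_total_ms`, `relative_error`) are illustrative assumptions. Because the reported figures are rounded and probably aggregated over several runs, only approximate agreement should be expected.

```python
# Rows copied from the results table:
# (hardware, shape, concurrency, ttft_ms, tpot_ms, out_tok, reported_total_ms)
ROWS = [
    ("RTX 5070", "codegen", 1, 11, 1.9, 659, 1260),
    ("RTX 5070", "agent", 1, 11, 1.9, 438, 850),
    ("RTX 5070", "chat", 1, 10, 1.9, 100, 198),
    ("RTX 5070", "rag", 1, 26, 1.9, 101, 222),
    ("RTX 3090", "codegen", 1, 20, 2.1, 633, 1370),
    ("RTX 3090", "chat", 1, 13, 2.1, 100, 206),
    ("RTX 3090", "agent", 1, 23, 2.2, 314, 707),
    ("RTX 3090", "rag", 1, 37, 2.1, 98, 246),
    ("RTX 5070", "agent", 4, 290, 3.9, 435, 1980),
    ("RTX 3090", "agent", 4, 304, 4.2, 362, 1720),
]


def estimated_total_ms(ttft_ms, tpot_ms, out_tok):
    """Assumed model: the first token arrives after TTFT, and each
    subsequent token takes TPOT milliseconds."""
    return ttft_ms + tpot_ms * (out_tok - 1)


def relative_error(row):
    """Relative gap between the estimated and the reported total time."""
    _, _, _, ttft_ms, tpot_ms, out_tok, reported_ms = row
    estimate = estimated_total_ms(ttft_ms, tpot_ms, out_tok)
    return abs(estimate - reported_ms) / reported_ms


for row in ROWS:
    hardware, shape, conc = row[0], row[1], row[2]
    estimate = estimated_total_ms(row[3], row[4], row[5])
    print(f"{hardware:9s} {shape:8s} conc={conc}  "
          f"estimated={estimate:7.1f} ms  reported={row[6]} ms  "
          f"gap={relative_error(row):.1%}")
```

If a row's gap were large, that would suggest either a different definition of Total or a mis-parsed cell in the table.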

Environment

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 5070 · 11.94 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 250 W / 300 W max (83% cap)
backend: llama.cpp b9174 (cuda)
server: lemonade unknown
os: CachyOS
kernel: 7.0.0-1-cachyos
driver: 595.58.03
python: 3.14.4
containerized: false
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
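
Both environments stream responses from /v1/chat/completions with 2 warmups and 5 runs per cell. The benchmark harness itself is not shown here, so the following is only a minimal sketch of how per-request metrics could be derived from token arrival timestamps on a streamed response. The metric definitions, the use of a median across runs, and the names `stream_metrics` and `summarize` are all assumptions, not the tool's actual method.

```python
from statistics import median


def stream_metrics(start_s, token_times_s):
    """Derive latency metrics for one streamed request.

    start_s       -- time the request was sent, in seconds
    token_times_s -- arrival time of each output token, in seconds

    Assumed definitions (not confirmed by the source):
      TTFT  = first token arrival - request start
      TPOT  = mean gap between consecutive tokens after the first
      total = last token arrival - request start
    """
    if not token_times_s:
        raise ValueError("no tokens received")
    n = len(token_times_s)
    first, last = token_times_s[0], token_times_s[-1]
    total_s = last - start_s
    if total_s <= 0:
        raise ValueError("token timestamps must be later than start_s")
    tpot_ms = (last - first) / (n - 1) * 1000.0 if n > 1 else 0.0
    return {
        "ttft_ms": (first - start_s) * 1000.0,
        "tpot_ms": tpot_ms,
        "out_tok": n,
        "total_s": total_s,
        "gen_tok_s": n / total_s,
    }


def summarize(runs, warmups=2):
    """Discard the leading warmup runs and take the per-metric median
    of the remaining runs (median is an assumed aggregation)."""
    measured = runs[warmups:]
    if not measured:
        raise ValueError("no measured runs left after discarding warmups")
    return {key: median(run[key] for run in measured) for key in measured[0]}


# Synthetic example: 100 tokens, first after 10 ms, then one every 2 ms.
example_times = [0.010 + 0.002 * i for i in range(100)]
example = stream_metrics(0.0, example_times)
print(example)

# Seven identical runs stand in for 2 warmups + 5 measured runs.
print(summarize([example] * 7, warmups=2))
```

In a real harness the timestamps would come from a monotonic clock such as `time.perf_counter()`, sampled as each streamed chunk arrives.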