LFM2 2.6B

Q4_K_M·2.6B params·GGUF
checkpoint: LiquidAI/LFM2-2.6B-GGUF:Q4_K_M

All runs (10)

HardwareBackendShapeConc.Gen tok/sTTFTTPOT (ms)Out tokTotalVRAM Δ
GeForce RTX 5070 · 11.94 GiBllama.cpp b9174 (cuda)codegen1
268.6
22ms3.79263.46s0.000 GiB
GeForce RTX 5070 · 11.94 GiBllama.cpp b9174 (cuda)agent1
262.7
16ms3.73701.44s0.000 GiB
GeForce RTX 5070 · 11.94 GiBllama.cpp b9174 (cuda)chat1
259.5
19ms3.7100382ms0.000 GiB
GeForce RTX 5070 · 11.94 GiBllama.cpp b9174 (cuda)rag1
249.9
50ms3.790357ms0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)chat1
238.9
21ms3.9100408ms0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)codegen1
232.4
35ms4.38323.60s0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)rag1
222.6
66ms4.0121508ms0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)agent1
221.1
90ms4.35002.26s0.000 GiB
GeForce RTX 5070 · 11.94 GiBllama.cpp b9174 (cuda)agent4
133.9
369ms6.83692.82s0.000 GiB
GeForce RTX 3090 · 24 GiBllama.cpp 59778f0 (cuda)agent4
111.7
434ms7.95003.94s0.000 GiB

Environment

GeForce RTX 3090 · 24 GiB
cpuAMD EPYC 7302P 16-Core Processor
gpuNVIDIA GeForce RTX 3090
archNVIDIA
vram24 GiB (system 64.0 GiB)
power200 W / 450 W max(44% cap)
backendllama.cpp 59778f0 (cuda)
serverlemonade unknown
osUbuntu 24.04 LTS
kernel6.17.13-7-pve
driver590.48.01
python3.12.3
containerizedtrue
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue
GeForce RTX 5070 · 11.94 GiB
cpuAMD Ryzen 9 7900 12-Core Processor
gpuNVIDIA GeForce RTX 5070
archNVIDIA
vram11.94 GiB (system 30.4 GiB)
power250 W / 300 W max(83% cap)
backendllama.cpp b9174 (cuda)
serverlemonade unknown
osCachyOS
kernel7.0.0-1-cachyos
driver595.58.03
python3.14.4
containerizedfalse
runs/cell5
warmups2
endpoint/v1/chat/completions
streamingtrue