LFM2 2.6B
Q4_K_M · 2.6B params · GGUF
checkpoint: LiquidAI/LFM2-2.6B-GGUF:Q4_K_M
commit: a759abdc5955
weights: 1.46 GiB
All runs (20)
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | chat | 1 | 332.6 | 309.0 | 1849.8 | — | 17ms | 3.0 | — | 31 | 100 | 318ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | rag | 1 | 329.2 | 262.9 | 13059.1 | — | 55ms | 3.0 | — | 752 | 103 | 374ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | codegen | 1 | 328.2 | 314.4 | 2408.8 | — | 32ms | 3.0 | — | 65 | 1000 | 3.18s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 1 | 326.6 | 299.8 | 29757.0 | — | 28ms | 3.1 | — | 602 | 500 | 1.61s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 4 | 325.8 | 121.0 | 394.2 | — | 1.57s | 3.1 | — | 602 | 500 | 3.28s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | chat | 1 | 324.9 | 297.0 | 1727.6 | — | 18ms | 3.1 | — | 31 | 100 | 337ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | rag | 1 | 322.4 | 253.0 | 12718.6 | — | 57ms | 3.1 | — | 752 | 103 | 395ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | codegen | 1 | 321.9 | 306.7 | 2048.2 | — | 33ms | 3.1 | — | 65 | 1000 | 3.20s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 1 | 320.5 | 295.2 | 14347.7 | — | 42ms | 3.1 | — | 602 | 500 | 1.64s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 4 | 319.2 | 134.4 | 290.3 | — | 1.98s | 3.1 | — | 602 | 500 | 3.76s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | chat | 1 | 273.7 | 259.5 | 1613.7 | — | 19ms | 3.7 | — | 31 | 100 | 382ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | codegen | 1 | 271.5 | 268.6 | 2955.2 | — | 22ms | 3.7 | — | 65 | 926 | 3.46s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | rag | 1 | 268.9 | 249.9 | 14823.6 | — | 50ms | 3.7 | — | 752 | 90 | 357ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | agent | 1 | 266.8 | 262.7 | 37212.1 | — | 16ms | 3.7 | — | 602 | 370 | 1.44s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | chat | 1 | 254.0 | 238.9 | 1453.5 | — | 21ms | 3.9 | — | 31 | 100 | 408ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | rag | 1 | 251.1 | 222.6 | 9361.0 | — | 66ms | 4.0 | — | 752 | 121 | 508ms | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | codegen | 1 | 234.6 | 232.4 | 2028.5 | — | 35ms | 4.3 | — | 65 | 832 | 3.60s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 1 | 234.4 | 221.1 | 6452.0 | — | 90ms | 4.3 | — | 602 | 500 | 2.26s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | agent | 4 | 147.2 | 133.9 | 1634.8 | — | 369ms | 6.8 | — | 602 | 369 | 2.82s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 4 | 126.2 | 111.7 | 1464.0 | — | 434ms | 7.9 | — | 602 | 500 | 3.94s | 0.000 GiB |
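The run table carries no column headers in this extract. The figures are consistent with reading the eighth column as decode throughput in tokens/s, the thirteenth as milliseconds per token, the twelfth as time to first token, the fifteenth as prompt tokens, and the tenth as prefill throughput. That reading is an assumption, not something the page states; the sketch below only checks that the arithmetic lines up for a few rows. The helper names are hypothetical.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Convert a throughput in tokens/s into a per-token latency in ms."""
    return 1000.0 / tokens_per_second


def prefill_rate(prompt_tokens: int, ttft_ms: float) -> float:
    """Rough prefill throughput in tokens/s, assuming TTFT is mostly prefill."""
    return prompt_tokens / (ttft_ms / 1000.0)


# (assumed decode tok/s, assumed ms/token) copied from three table rows.
rows = [
    (332.6, 3.0),  # RTX 3090, 450 W, chat, concurrency 1
    (273.7, 3.7),  # RTX 5070, vulkan, chat, concurrency 1
    (126.2, 7.9),  # RTX 3090, 200 W cap, agent, concurrency 4
]
for tps, reported_ms in rows:
    print(f"{tps} tok/s -> {ms_per_token(tps):.1f} ms/token (table: {reported_ms})")

# First row: 31 prompt tokens and a 17 ms first-token time (rounded in the table).
print(f"chat prefill estimate: {prefill_rate(31, 17):.0f} tok/s (table: 1849.8)")
```

The prefill estimate lands within a few percent of the tabulated value rather than exactly on it, which is what one would expect if the displayed first-token time is rounded to whole milliseconds.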
Environment
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 350 W / 450 W max (78% cap)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1980/2100 MHz · mem 9501 MHz
temp: 44°C idle · 59°C peak
peak draw: 336 W
hardware probes
copy 42% of theory · FP16 peak 65.4 TF · copy/math flat across caps
384-bit · 9751 MHz · 82 SM/CU
These probes are microbenchmarks for memory copy and tensor math; the raw-engine decode and API workload rows measure model-serving speed.
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| 200 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
| 300 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.3 TF |
| 450 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
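The "theory" column and the "copy 42% of theory" summary can be reproduced from the bus width and memory clock listed with the probes. A minimal sketch, assuming the theoretical figure is bus width in bytes times the reported memory clock times two transfers per clock; the factor of two is an assumption chosen because it matches the page's numbers, and the function name is hypothetical.

```python
def theoretical_bandwidth_gbs(
    bus_width_bits: int,
    mem_clock_mhz: float,
    transfers_per_clock: int = 2,  # assumption: reproduces the page's "theory" value
) -> float:
    """Peak memory bandwidth in GB/s from bus width and memory clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9


# RTX 3090 probe values: 384-bit bus, 9751 MHz memory clock, 391 GB/s measured copy.
theory = theoretical_bandwidth_gbs(384, 9751)
copy_gbs = 391.0

print(f"theory: {theory:.0f} GB/s")
print(f"copy efficiency: {copy_gbs / theory:.0%}")
```

The same formula with the RTX 5070 probe values (192-bit, 14001 MHz) gives about 672 GB/s, which matches that card's table.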
compute: 8.6
backend: llama.cpp cuda-4f13cb7 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1800/2100 MHz · mem 9501 MHz
temp: 53°C idle · 75°C peak
peak draw: 431 W
backend: llama.cpp cuda-4f13cb7 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 5070 · 12 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 250 W / 300 W max (83% cap)
hardware probes
copy 40% of theory · FP16 peak 69.6 TF · copy/math spread 2.5%
192-bit · 14001 MHz · 48 SM/CU
These probes are microbenchmarks for memory copy and tensor math; the raw-engine decode and API workload rows measure model-serving speed.
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| 200 W | 672 GB/s | 271 GB/s | 67.9 TF | 68.4 TF |
| 250 W | 672 GB/s | 271 GB/s | 69.5 TF | 68.2 TF |
| 300 W | 672 GB/s | 270 GB/s | 69.6 TF | 68.4 TF |
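The page does not say how the "copy/math spread 2.5%" summary is computed. One definition that reproduces it is the largest (max − min) / min across power caps over the copy, fp16, and bf16 columns; the sketch below applies that definition to the table values. Both the definition and the function name are assumptions.

```python
def spread_pct(values: list[float]) -> float:
    """Relative spread of a probe across power caps, as a percentage of the minimum."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100.0


# RTX 5070 probe results at the 200 W, 250 W and 300 W caps, copied from the table.
probes = {
    "copy GB/s": [271, 271, 270],
    "fp16 TF": [67.9, 69.5, 69.6],
    "bf16 TF": [68.4, 68.2, 68.4],
}

spreads = {name: spread_pct(vals) for name, vals in probes.items()}
worst = max(spreads, key=spreads.get)

for name, pct in spreads.items():
    print(f"{name}: {pct:.1f}%")
print(f"largest spread: {worst} at {spreads[worst]:.1f}%")
```

Under this definition the fp16 column dominates, while copy bandwidth and bf16 throughput move by well under one percent between caps.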
compute: 12
backend: llama.cpp b9174 (vulkan)
os: CachyOS
kernel: 7.0.0-1-cachyos
driver: 595.58.03
python: 3.14.4
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
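Each environment lists 5 runs per cell, 2 warmups, and a streaming endpoint. As a rough illustration of how first-token latency and decode throughput might be derived from a streamed response under those settings, the sketch below works from token arrival timestamps. The function names, the use of the median as the aggregate, and the synthetic timings are all assumptions; the page does not describe its harness.

```python
from statistics import median


def stream_metrics(request_start: float, token_times: list[float]) -> dict[str, float]:
    """Derive latency and throughput from streamed-token arrival times (seconds)."""
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure a decode rate")
    decode_window = token_times[-1] - token_times[0]
    return {
        "ttft_s": token_times[0] - request_start,
        # Decode rate excludes the first token, whose latency is mostly prefill.
        "decode_tok_s": (len(token_times) - 1) / decode_window,
        "total_s": token_times[-1] - request_start,
    }


def summarize(runs: list[dict[str, float]], warmups: int = 2) -> dict[str, float]:
    """Drop the warmup runs, then take the median of each metric (assumed aggregate)."""
    measured = runs[warmups:]
    return {key: median(run[key] for run in measured) for key in measured[0]}


def synthetic_run(first_token_s: float, gap_s: float, n_tokens: int = 100) -> dict[str, float]:
    """Build a fake run with evenly spaced tokens; stands in for a real stream."""
    times = [first_token_s + i * gap_s for i in range(n_tokens)]
    return stream_metrics(0.0, times)


# Two slower warmup runs followed by five measured runs.
runs = [synthetic_run(0.050, 0.006) for _ in range(2)]
runs += [synthetic_run(0.017, 0.003) for _ in range(5)]

summary = summarize(runs)
print(f"ttft: {summary['ttft_s'] * 1000:.0f} ms")
print(f"decode: {summary['decode_tok_s']:.1f} tok/s")
print(f"total: {summary['total_s'] * 1000:.0f} ms")
```

In a real measurement the timestamps would come from reading the streamed chunks of a `/v1/chat/completions` response with a monotonic clock; that part is left out here so the example stays runnable without a server.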