LFM2 1.2B
Q4_K_M · 1.2B params · GGUF
checkpoint: LiquidAI/LFM2-1.2B-GGUF:Q4_K_M
commit: 5399e76c648f
weights: 0.68 GiB
All runs (25)
| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s ↓ | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | codegen | 1 | 579.5 | 3614.2 | 19ms | 1.6 | 65 | 579 | 1.01s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | codegen | 1 | 565.0 | 4603.1 | 16ms | 1.7 | 65 | 579 | 1.10s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 1 | 548.7 | 35599.3 | 23ms | 1.7 | 602 | 441 | 759ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 1 | 539.9 | 40478.1 | 20ms | 1.6 | 602 | 441 | 817ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | chat | 1 | 536.2 | 2470.9 | 12ms | 1.6 | 31 | 100 | 184ms | 0.000 GiB |
| GeForce RTX 5070 · 12 GiB · 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | codegen | 1 | 529.6 | 5711.1 | 12ms | 1.9 | 65 | 536 | 1.03s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | rag | 1 | 516.1 | 25021.7 | 34ms | 1.6 | 752 | 164 | 301ms | 0.000 GiB |
| GeForce RTX 5070 · 12 GiB · 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | agent | 1 | 513.2 | 50474.1 | 12ms | 1.9 | 602 | 500 | 964ms | 0.000 GiB |
| GeForce RTX 5070 · 12 GiB · 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | chat | 1 | 508.7 | 2900.0 | 11ms | 1.9 | 31 | 100 | 196ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | chat | 1 | 507.6 | 2418.5 | 13ms | 1.6 | 31 | 100 | 182ms | 0.000 GiB |
| GeForce RTX 5070 · 12 GiB · 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | rag | 1 | 485.5 | 25831.5 | 27ms | 1.9 | 752 | 76 | 209ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | codegen | 1 | 471.0 | 3714.3 | 22ms | 2.1 | 65 | 733 | 1.57s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | chat | 1 | 458.1 | 2270.7 | 14ms | 2.1 | 31 | 100 | 211ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | rag | 1 | 454.9 | 15947.2 | 47ms | 1.7 | 752 | 164 | 354ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 1 | 446.9 | 10918.7 | 48ms | 2.1 | 602 | 500 | 1.12s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | rag | 1 | 426.4 | 24414.9 | 37ms | 2.1 | 752 | 76 | 225ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 4 | 250.6 | 600.1 | 1.06s | 1.6 | 602 | 441 | 1.91s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 4 | 235.2 | 531.3 | 1.16s | 1.7 | 602 | 441 | 1.99s | 0.000 GiB |
| GeForce RTX 5070 · 12 GiB · 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | agent | 4 | 223.8 | 1865.5 | 352ms | 4.0 | 602 | 500 | 2.17s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 4 | 214.0 | 1919.4 | 328ms | 4.0 | 602 | 497 | 2.07s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (rocm) | baseline | codegen | 1 | 208.8 | 2854.3 | 24ms | 4.7 | 65 | 637 | 3.05s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (rocm) | baseline | chat | 1 | 204.5 | 1754.1 | 18ms | 4.7 | 31 | 100 | 488ms | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (rocm) | baseline | rag | 1 | 194.7 | 68755.5 | 16ms | 4.8 | 892 | 131 | 640ms | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (rocm) | baseline | agent | 1 | 194.6 | 7831.4 | 77ms | 5.0 | 602 | 434 | 2.25s | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (rocm) | baseline | agent | 4 | 109.4 | 1725.9 | 404ms | 8.2 | 602 | 435 | 4.00s | -0.005 GiB |
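
The page does not define how TTFT, TPOT, and Gen tok/s are computed. As a rough sketch, assuming the common definitions (TTFT is the delay from request start to the first streamed token, TPOT is the mean gap between subsequent tokens, and generation throughput is its reciprocal), the columns could be derived from per-token arrival timestamps as follows. `StreamMetrics` and `summarize_stream` are hypothetical names for illustration, not part of the benchmark tooling.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamMetrics:
    ttft_ms: float        # time to first token, milliseconds
    tpot_ms: float        # mean time per output token after the first, milliseconds
    gen_tok_per_s: float  # decode throughput, tokens per second
    total_s: float        # request start to last token, seconds


def summarize_stream(request_start: float, token_times: Sequence[float]) -> StreamMetrics:
    """Summarize one streamed response.

    Assumed definitions (not confirmed by the source): timestamps are in
    seconds on a monotonic clock, one timestamp per streamed token.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")
    first, last = token_times[0], token_times[-1]
    decode_time = last - first          # time spent after the first token
    n_decode = len(token_times) - 1     # tokens produced during decode_time
    return StreamMetrics(
        ttft_ms=(first - request_start) * 1000.0,
        tpot_ms=decode_time / n_decode * 1000.0,
        gen_tok_per_s=n_decode / decode_time,
        total_s=last - request_start,
    )
```

Under these definitions, total time is approximately TTFT plus TPOT times the number of output tokens; the reported columns may be aggregated differently across the 5 runs per cell.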
Environment
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 350 W / 450 W max (78% cap)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1800/2100 MHz · mem 9501 MHz
temp: 42°C idle · 56°C peak
peak draw: 322 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade (version unknown)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1800/2100 MHz · mem 9501 MHz
temp: 46°C idle · 66°C peak
peak draw: 410 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade (version unknown)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
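
The 350 W and 450 W RTX 3090 environments above report peak draw (322 W and 410 W), and the runs table reports codegen throughput at concurrency 1 for both modes (565.0 and 579.5 tok/s). A back-of-the-envelope efficiency comparison can be sketched from those figures. Note the assumption: peak draw is used as a stand-in for average draw during generation, which the page does not report, so the results are indicative only. `tokens_per_joule` is a hypothetical helper.

```python
# Figures copied from the codegen, concurrency-1 rows and the two
# RTX 3090 environment blocks (350 W cap and 450 W cap).
RUNS = {
    "baseline-pl-350w": {"gen_tok_s": 565.0, "peak_draw_w": 322.0},
    "baseline-pl-450w": {"gen_tok_s": 579.5, "peak_draw_w": 410.0},
}


def tokens_per_joule(gen_tok_s: float, draw_w: float) -> float:
    """tok/s divided by W (J/s) gives tokens per joule."""
    return gen_tok_s / draw_w


# ASSUMPTION: peak draw approximates draw during generation. Real average
# draw is at most the peak, so each value is a lower bound on efficiency.
efficiency = {
    mode: tokens_per_joule(r["gen_tok_s"], r["peak_draw_w"])
    for mode, r in RUNS.items()
}

for mode, value in efficiency.items():
    print(f"{mode}: {value:.2f} tok/J (lower bound)")
```

On these numbers the 350 W cap gives up about 2.5% of codegen throughput while drawing noticeably less power at peak, which is the trade-off the power-limited modes appear designed to expose.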
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
server: lemonade (version unknown)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b8940 (rocm)
server: lemonade 10.4.0
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 5070 · 12 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 250 W / 300 W max (83% cap)
backend: llama.cpp b9174 (vulkan)
server: lemonade (version unknown)
os: CachyOS
kernel: 7.0.0-1-cachyos
driver: 595.58.03
python: 3.14.4
containerized: false
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
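
Every environment lists 2 warmups and 5 runs per cell. A minimal sketch of that measurement pattern follows, assuming warmup results are discarded and the measured runs are reduced to a median; the page does not say which aggregate it reports, so the median is an assumption. `benchmark_cell` and `run_once` are hypothetical names, and `run_once` stands in for whatever issues one streamed request and returns a single metric.

```python
import statistics
from typing import Callable, List


def benchmark_cell(
    run_once: Callable[[], float],
    warmups: int = 2,
    runs: int = 5,
) -> float:
    """Run one benchmark cell and return the median of the measured runs.

    ASSUMPTION: warmup results are thrown away and the remaining runs are
    summarized by their median; the source does not state the aggregate.
    """
    for _ in range(warmups):
        run_once()  # result intentionally ignored (cache and clock warmup)
    samples: List[float] = [run_once() for _ in range(runs)]
    return statistics.median(samples)
```

For example, `benchmark_cell(lambda: measure_gen_tok_s(prompt))` would report one cell's generation throughput, where `measure_gen_tok_s` is a placeholder for a function that times a single request.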