LFM2 1.2B

Q4_K_M · 1.2B params · GGUF
checkpoint: LiquidAI/LFM2-1.2B-GGUF:Q4_K_M
commit: 5399e76c648f
weights 0.68 GiB

All runs (25)

| Hardware | Backend | Mode | Shape | Conc. | Gen tok/s | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | codegen | 1 | 579.5 | 3614.2 | 19ms | 1.6 | 65 | 579 | 1.01s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | codegen | 1 | 565.0 | 4603.1 | 16ms | 1.7 | 65 | 579 | 1.10s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 1 | 548.7 | 35599.3 | 23ms | 1.7 | 602 | 441 | 759ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 1 | 539.9 | 40478.1 | 20ms | 1.6 | 602 | 441 | 817ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | chat | 1 | 536.2 | 2470.9 | 12ms | 1.6 | 31 | 100 | 184ms | 0.000 GiB |
| GeForce RTX 5070 · 12 GiB · 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | codegen | 1 | 529.6 | 5711.1 | 12ms | 1.9 | 65 | 536 | 1.03s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | rag | 1 | 516.1 | 25021.7 | 34ms | 1.6 | 752 | 164 | 301ms | 0.000 GiB |
| GeForce RTX 5070 · 12 GiB · 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | agent | 1 | 513.2 | 50474.1 | 12ms | 1.9 | 602 | 500 | 964ms | 0.000 GiB |
| GeForce RTX 5070 · 12 GiB · 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | chat | 1 | 508.7 | 2900.0 | 11ms | 1.9 | 31 | 100 | 196ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | chat | 1 | 507.6 | 2418.5 | 13ms | 1.6 | 31 | 100 | 182ms | 0.000 GiB |
| GeForce RTX 5070 · 12 GiB · 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | rag | 1 | 485.5 | 25831.5 | 27ms | 1.9 | 752 | 76 | 209ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | codegen | 1 | 471.0 | 3714.3 | 22ms | 2.1 | 65 | 733 | 1.57s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | chat | 1 | 458.1 | 2270.7 | 14ms | 2.1 | 31 | 100 | 211ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | rag | 1 | 454.9 | 15947.2 | 47ms | 1.7 | 752 | 164 | 354ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 1 | 446.9 | 10918.7 | 48ms | 2.1 | 602 | 500 | 1.12s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | rag | 1 | 426.4 | 24414.9 | 37ms | 2.1 | 752 | 76 | 225ms | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 4 | 250.6 | 600.1 | 1.06s | 1.6 | 602 | 441 | 1.91s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 4 | 235.2 | 531.3 | 1.16s | 1.7 | 602 | 441 | 1.99s | 0.000 GiB |
| GeForce RTX 5070 · 12 GiB · 250 W · drv 595 | llama.cpp b9174 (vulkan) | baseline | agent | 4 | 223.8 | 1865.5 | 352ms | 4.0 | 602 | 500 | 2.17s | 0.000 GiB |
| GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 4 | 214.0 | 1919.4 | 328ms | 4.0 | 602 | 497 | 2.07s | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (rocm) | baseline | codegen | 1 | 208.8 | 2854.3 | 24ms | 4.7 | 65 | 637 | 3.05s | 0.002 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (rocm) | baseline | chat | 1 | 204.5 | 1754.1 | 18ms | 4.7 | 31 | 100 | 488ms | 0.000 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (rocm) | baseline | rag | 1 | 194.7 | 68755.5 | 16ms | 4.8 | 892 | 131 | 640ms | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (rocm) | baseline | agent | 1 | 194.6 | 7831.4 | 77ms | 5.0 | 602 | 434 | 2.25s | 0.001 GiB |
| Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified | llama.cpp b8940 (rocm) | baseline | agent | 4 | 109.4 | 1725.9 | 404ms | 8.2 | 602 | 435 | 4.00s | -0.005 GiB |
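The page does not define how the columns relate to each other, so the following is only a rough sanity check under stated assumptions: that TTFT is dominated by prompt processing (prompt tokens divided by prefill throughput) and that total latency is roughly TTFT plus one TPOT per output token. The function names are hypothetical, and the input values are copied from the first row (RTX 3090, 450 W, codegen, concurrency 1).

```python
# Rough consistency check between table columns.
# Assumptions (not stated on the page):
#   TTFT  ~= prompt_tok / prefill_tok_s
#   Total ~= TTFT + out_tok * TPOT

def estimated_ttft_ms(prompt_tok: int, prefill_tok_s: float) -> float:
    """Time to process the prompt at the reported prefill rate, in ms."""
    return prompt_tok / prefill_tok_s * 1000.0

def estimated_total_s(ttft_ms: float, out_tok: int, tpot_ms: float) -> float:
    """TTFT plus one TPOT interval per output token, in seconds."""
    return (ttft_ms + out_tok * tpot_ms) / 1000.0

# First row: RTX 3090, 450 W, codegen, concurrency 1.
prompt_tok, out_tok = 65, 579
prefill_tok_s, ttft_ms, tpot_ms = 3614.2, 19.0, 1.6

print(f"estimated TTFT:  {estimated_ttft_ms(prompt_tok, prefill_tok_s):.1f} ms")
print(f"estimated total: {estimated_total_s(ttft_ms, out_tok, tpot_ms):.2f} s")
```

For this row the estimates come out near 18 ms against the reported 19ms, and near 0.95 s against the reported 1.01s. The gap is plausible given that TPOT is shown rounded to one decimal and the reported values are presumably aggregated over several runs.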

Environment

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 350 W / 450 W max (78% cap)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1800/2100 MHz · mem 9501 MHz
temp: 42°C idle · 56°C peak
peak draw: 322 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1800/2100 MHz · mem 9501 MHz
temp: 46°C idle · 66°C peak
peak draw: 410 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
server: lemonade unknown
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
backend: llama.cpp b8940 (rocm)
server: lemonade 10.4.0
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 5070 · 12 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 250 W / 300 W max (83% cap)
backend: llama.cpp b9174 (vulkan)
server: lemonade unknown
os: CachyOS
kernel: 7.0.0-1-cachyos
driver: 595.58.03
python: 3.14.4
containerized: false
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
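
Every environment lists a streaming endpoint, 5 runs per cell and 2 warmups, but the page does not say how per-run timings are turned into the reported TTFT and TPOT. The sketch below shows one plausible way, not the harness's actual method. It assumes TTFT is the delay from request to first streamed token, TPOT is the mean gap between later tokens, warmup runs come first and are discarded, and the reported value is the median of the remaining runs. `run_metrics` and `aggregate` are hypothetical names.

```python
from statistics import median

def run_metrics(t_request: float, token_times: list[float]) -> tuple[float, float]:
    """Return (TTFT ms, TPOT ms) for one streamed response.

    t_request:   time the request was sent, in seconds.
    token_times: arrival time of each streamed output token, in seconds.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft_ms = (token_times[0] - t_request) * 1000.0
    n = len(token_times)
    # Assumed definition: mean gap between consecutive tokens after the first.
    tpot_ms = (token_times[-1] - token_times[0]) / (n - 1) * 1000.0 if n > 1 else 0.0
    return ttft_ms, tpot_ms

def aggregate(runs: list[tuple[float, float]], warmups: int = 2) -> tuple[float, float]:
    """Discard the leading warmup runs, then take the median of each metric.

    Using the median (rather than mean or best-of) is an assumption.
    """
    measured = runs[warmups:]
    if not measured:
        raise ValueError("no measured runs left after discarding warmups")
    return median(r[0] for r in measured), median(r[1] for r in measured)

# Synthetic example: 2 slow warmup runs followed by 5 measured runs.
runs = [(100.0, 10.0), (90.0, 9.0),
        (20.0, 2.0), (22.0, 2.1), (19.0, 1.9), (21.0, 2.0), (20.0, 2.0)]
print(aggregate(runs))
```

With concurrency above 1, each in-flight request would need its own timestamps; how the harness combines them is not stated on the page.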