LFM2 1.2B

Q4_K_M · 1.2B params · GGUF

checkpoint: LiquidAI/LFM2-1.2B-GGUF:Q4_K_M

commit: 5399e76c648f

weights 0.68 GiB
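
As a rough sanity check on the figures above, the weights size and the nominal parameter count imply an average storage cost per weight. The sketch below assumes the parameter count is exactly 1.2e9 (the page only gives the rounded "1.2B") and that the whole 0.68 GiB is tensor data; the function name `bits_per_weight` is made up for illustration.

```python
GIB = 2**30  # bytes in one GiB


def bits_per_weight(size_gib: float, n_params: float) -> float:
    """Average stored bits per parameter for a weights file of the given size."""
    return size_gib * GIB * 8 / n_params


# Assumption: "1.2B params" is treated as exactly 1.2e9 parameters.
bpw = bits_per_weight(0.68, 1.2e9)
print(f"{bpw:.2f} bits/weight")  # prints "4.87 bits/weight"
```

A value a little under 5 bits per weight is in the range one would expect for a 4-bit K-quant that keeps some tensors at higher precision, so the reported size looks plausible for Q4_K_M.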

All runs (25)

Hardware	Backend	Mode	Shape	Conc.	Gen tok/s ↓	Prefill tok/s	TTFT	TPOT (ms)	Prompt tok	Out tok	Total	VRAM Δ
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	codegen	1	579.5	3614.2	19ms	1.6	65	579	1.01s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	codegen	1	565.0	4603.1	16ms	1.7	65	579	1.10s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	1	548.7	35599.3	23ms	1.7	602	441	759ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	1	539.9	40478.1	20ms	1.6	602	441	817ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	chat	1	536.2	2470.9	12ms	1.6	31	100	184ms	0.000 GiB
GeForce RTX 5070 · 12 GiB · 250 W · drv 595	llama.cpp b9174 (vulkan)	baseline	codegen	1	529.6	5711.1	12ms	1.9	65	536	1.03s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	rag	1	516.1	25021.7	34ms	1.6	752	164	301ms	0.000 GiB
GeForce RTX 5070 · 12 GiB · 250 W · drv 595	llama.cpp b9174 (vulkan)	baseline	agent	1	513.2	50474.1	12ms	1.9	602	500	964ms	0.000 GiB
GeForce RTX 5070 · 12 GiB · 250 W · drv 595	llama.cpp b9174 (vulkan)	baseline	chat	1	508.7	2900.0	11ms	1.9	31	100	196ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	chat	1	507.6	2418.5	13ms	1.6	31	100	182ms	0.000 GiB
GeForce RTX 5070 · 12 GiB · 250 W · drv 595	llama.cpp b9174 (vulkan)	baseline	rag	1	485.5	25831.5	27ms	1.9	752	76	209ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	codegen	1	471.0	3714.3	22ms	2.1	65	733	1.57s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	chat	1	458.1	2270.7	14ms	2.1	31	100	211ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	rag	1	454.9	15947.2	47ms	1.7	752	164	354ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	agent	1	446.9	10918.7	48ms	2.1	602	500	1.12s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	rag	1	426.4	24414.9	37ms	2.1	752	76	225ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	4	250.6	600.1	1.06s	1.6	602	441	1.91s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	4	235.2	531.3	1.16s	1.7	602	441	1.99s	0.000 GiB
GeForce RTX 5070 · 12 GiB · 250 W · drv 595	llama.cpp b9174 (vulkan)	baseline	agent	4	223.8	1865.5	352ms	4.0	602	500	2.17s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	agent	4	214.0	1919.4	328ms	4.0	602	497	2.07s	0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)	llama.cpp b8940 (rocm)	baseline	codegen	1	208.8	2854.3	24ms	4.7	65	637	3.05s	0.002 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)	llama.cpp b8940 (rocm)	baseline	chat	1	204.5	1754.1	18ms	4.7	31	100	488ms	0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)	llama.cpp b8940 (rocm)	baseline	rag	1	194.7	68755.5	16ms	4.8	892	131	640ms	0.001 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)	llama.cpp b8940 (rocm)	baseline	agent	1	194.6	7831.4	77ms	5.0	602	434	2.25s	0.001 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)	llama.cpp b8940 (rocm)	baseline	agent	4	109.4	1725.9	404ms	8.2	602	435	4.00s	-0.005 GiB
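
The page does not say how the TTFT, TPOT, and Gen tok/s columns are computed. The sketch below shows one common way to derive such metrics from the arrival times of streamed tokens; it is an assumption about typical definitions, not the benchmark's actual method, and the names `StreamMetrics` and `stream_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamMetrics:
    ttft_ms: float    # time to first token, in milliseconds
    tpot_ms: float    # mean time per output token after the first, in milliseconds
    gen_tok_s: float  # decode throughput, tokens per second
    total_s: float    # wall time from request start to last token, in seconds


def stream_metrics(request_start: float, token_times: list[float]) -> StreamMetrics:
    """Derive latency metrics from token arrival timestamps (seconds).

    Assumed definitions: TTFT is the delay until the first token arrives,
    and TPOT averages the gaps between the remaining tokens.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure decode speed")
    first, last = token_times[0], token_times[-1]
    decode_s = last - first
    n_decode = len(token_times) - 1  # tokens produced after the first one
    return StreamMetrics(
        ttft_ms=(first - request_start) * 1000,
        tpot_ms=decode_s / n_decode * 1000,
        gen_tok_s=n_decode / decode_s,
        total_s=last - request_start,
    )


# Synthetic example: first token after 20 ms, then one token every 2 ms.
times = [0.020 + 0.002 * i for i in range(101)]
m = stream_metrics(0.0, times)
print(m)
```

Under these definitions Gen tok/s is simply 1000 / TPOT for a single request. The reported columns do not match that identity exactly, which suggests the page aggregates over several runs or measures one of the quantities differently.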

Environment

GeForce RTX 3090 · 24 GiB

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090

arch: NVIDIA

vram: 24 GiB (system 64.0 GiB)

power: 350 W / 450 W max (78% cap)

pcie: Gen 4 x16 / Gen 4 x16 max

clocks: gfx 1800/2100 MHz · mem 9501 MHz

temp: 42°C idle · 56°C peak

peak draw: 322 W

backend: llama.cpp cuda-4f13cb7 (cuda)

server: lemonade unknown

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: NVIDIA 590.48.01 + CUDA 13.1

libc: 2.39

python: 3.12.3

containerized: true

llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

GeForce RTX 3090 · 24 GiB

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090

arch: NVIDIA

vram: 24 GiB (system 64.0 GiB)

power: 450 W / 450 W max

pcie: Gen 4 x16 / Gen 4 x16 max

clocks: gfx 1800/2100 MHz · mem 9501 MHz

temp: 46°C idle · 66°C peak

peak draw: 410 W

backend: llama.cpp cuda-4f13cb7 (cuda)

server: lemonade unknown

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: NVIDIA 590.48.01 + CUDA 13.1

libc: 2.39

python: 3.12.3

containerized: true

llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

GeForce RTX 3090 · 24 GiB

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090

arch: NVIDIA

vram: 24 GiB (system 64.0 GiB)

power: 200 W / 450 W max (44% cap)

backend: llama.cpp 59778f0 (cuda)

server: lemonade unknown

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: 590.48.01

python: 3.12.3

containerized: true

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)

cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S

gpu: AMD Radeon 8060S

arch: Strix Halo (gfx1151)

vram: 96 GiB (system 31.1 GiB, unified)

backend: llama.cpp b8940 (rocm)

server: lemonade 10.4.0

os: Ubuntu 24.04.4 LTS

kernel: 7.0.2-2-pve

python: 3.12.3

containerized: true

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

GeForce RTX 5070 · 12 GiB

cpu: AMD Ryzen 9 7900 12-Core Processor

gpu: NVIDIA GeForce RTX 5070

arch: NVIDIA

vram: 11.94 GiB (system 30.4 GiB)

power: 250 W / 300 W max (83% cap)

backend: llama.cpp b9174 (vulkan)

server: lemonade unknown

os: CachyOS

kernel: 7.0.0-1-cachyos

driver: 595.58.03

python: 3.14.4

containerized: false

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true
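
The cap percentages on the power lines above can be reproduced from the configured limit and the board maximum, and the two RTX 3090 blocks that report a peak draw allow a rough throughput-per-watt comparison. The sketch below is illustrative only: it pairs each block's peak draw with its concurrency-1 codegen Gen tok/s, which is an assumption, since the page does not say which run produced the peak. The helper name `cap_percent` is hypothetical.

```python
def cap_percent(limit_w: float, max_w: float) -> int:
    """Configured power limit as a rounded percentage of the board maximum."""
    return round(100 * limit_w / max_w)


# (configured limit in W, maximum in W), taken from the environment blocks.
limits = {
    "RTX 3090 @ 350 W": (350, 450),
    "RTX 3090 @ 200 W": (200, 450),
    "RTX 5070 @ 250 W": (250, 300),
}
for name, (limit_w, max_w) in limits.items():
    print(f"{name}: {cap_percent(limit_w, max_w)}% cap")

# Assumption: peak draw is used as the denominator, so these figures
# understate efficiency whenever the average draw was below the peak.
codegen = {
    "RTX 3090 @ 450 W": (579.5, 410),  # (Gen tok/s, peak draw in W)
    "RTX 3090 @ 350 W": (565.0, 322),
}
efficiency = {name: tok_s / watts for name, (tok_s, watts) in codegen.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.2f} tok/s per peak watt")
```

On these numbers the 350 W limit gives up about 2.5% of codegen throughput relative to 450 W while drawing noticeably less at peak, so it comes out ahead per watt under the stated assumption.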