LFM2 8B-A1B

Q4_K_M · 8B params · GGUF

checkpoint: LiquidAI/LFM2-8B-A1B-GGUF:Q4_K_M

commit: 11624c2ea122

weights: 4.70 GiB

All runs (25)

Hardware	Backend	Mode	Shape	Conc.	Gen tok/s ↓	Prefill tok/s	TTFT	TPOT (ms)	Prompt tok	Out tok	Total	VRAM Δ
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	codegen	1	423.4	2183.2	34ms	2.3	65	883	2.05s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	codegen	1	406.4	2167.9	33ms	2.3	65	883	2.26s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	1	397.4	35490.6	22ms	2.3	602	500	1.18s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	1	391.6	35824.1	23ms	2.3	602	500	1.19s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	chat	1	385.1	1474.1	21ms	2.3	31	100	248ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	chat	1	369.5	1465.5	21ms	2.3	31	100	261ms	0.000 GiB
GeForce RTX 5070 · 12 GiB · 250 W · drv 595	llama.cpp b9174 (vulkan)	baseline	codegen	1	364.9	2004.8	33ms	2.7	65	791	2.17s	0.000 GiB
GeForce RTX 5070 · 12 GiB · 250 W · drv 595	llama.cpp b9174 (vulkan)	baseline	agent	1	355.0	39235.6	15ms	2.8	602	426	1.20s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	rag	1	341.7	12238.0	53ms	2.4	752	111	313ms	0.000 GiB
GeForce RTX 5070 · 12 GiB · 250 W · drv 595	llama.cpp b9174 (vulkan)	baseline	chat	1	336.1	1098.5	28ms	2.7	31	100	295ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	codegen	1	332.9	1813.9	39ms	2.9	65	883	2.62s	0.000 GiB
GeForce RTX 5070 · 12 GiB · 250 W · drv 595	llama.cpp b9174 (vulkan)	baseline	rag	1	319.2	13067.4	52ms	2.7	752	117	370ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	chat	1	318.7	1277.6	24ms	2.8	31	100	302ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	agent	1	315.9	8274.9	73ms	3.0	602	434	1.42s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	rag	1	299.8	11883.9	52ms	2.3	752	111	316ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	rag	1	278.6	9916.1	62ms	2.9	752	111	374ms	0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	4	177.4	403.1	1.61s	2.3	602	500	2.63s	0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp b8940 (rocm)	baseline	codegen	1	151.4	897.6	75ms	6.5	65	843	5.59s	0.004 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	4	150.6	500.3	1.53s	2.3	602	500	2.44s	0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp b8940 (rocm)	baseline	agent	1	144.7	29283.6	21ms	6.7	602	347	2.41s	0.001 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp b8940 (rocm)	baseline	chat	1	144.6	627.2	49ms	6.5	31	100	691ms	0.001 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp b8940 (rocm)	baseline	rag	1	142.6	38318.1	25ms	6.6	892	81	555ms	0.001 GiB
GeForce RTX 5070 · 12 GiB · 250 W · drv 595	llama.cpp b9174 (vulkan)	baseline	agent	4	119.4	2116.6	289ms	7.8	602	362	2.93s	0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590	llama.cpp 59778f0 (cuda)	baseline	agent	4	116.0	1985.3	339ms	8.0	602	357	3.17s	0.000 GiB
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified	llama.cpp b8940 (rocm)	baseline	agent	4	61.2	1589.4	506ms	15.7	602	359	5.76s	-0.009 GiB
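
The latency columns can be cross-checked against each other with simple arithmetic. The sketch below assumes, since the page does not state its definitions, that Total is roughly TTFT plus one TPOT interval per output token, and that the single-stream decode rate is roughly 1000 / TPOT. The function names are hypothetical, and because TPOT is shown rounded to 0.1 ms the results should only agree with the table to within a few percent. Rows at concurrency 4 may not fit this single-stream model.

```python
def estimate_total_ms(ttft_ms: float, tpot_ms: float, out_tok: int) -> float:
    """Assumed model (not confirmed by the page): total = TTFT + TPOT * output tokens."""
    return ttft_ms + tpot_ms * out_tok


def decode_rate_tok_s(tpot_ms: float) -> float:
    """Decode rate implied by a per-token latency given in milliseconds."""
    return 1000.0 / tpot_ms


# RTX 3090, 450 W, codegen, concurrency 1: TTFT 34 ms, TPOT 2.3 ms, 883 output tokens.
est = estimate_total_ms(34, 2.3, 883)
print(f"estimated total: {est / 1000:.2f} s (table reports 2.05 s)")
print(f"rate implied by TPOT: {decode_rate_tok_s(2.3):.1f} tok/s (table reports 423.4)")
```

A row whose Total is far from this estimate is worth a second look before comparing it against other hardware.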

Environment

GeForce RTX 3090 · 24 GiB

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090

arch: NVIDIA

vram: 24 GiB (system 64.0 GiB)

power: 350 W / 450 W max (78% cap)

pcie: Gen 4 x16 / Gen 4 x16 max

clocks: gfx 1980/2100 MHz · mem 9501 MHz

temp: 44°C idle · 59°C peak

peak draw: 334 W

backend: llama.cpp cuda-4f13cb7 (cuda)

server: lemonade unknown

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: NVIDIA 590.48.01 + CUDA 13.1

libc: 2.39

python: 3.12.3

containerized: true

llama.cpp: version 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

GeForce RTX 3090 · 24 GiB

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090

arch: NVIDIA

vram: 24 GiB (system 64.0 GiB)

power: 450 W / 450 W max

pcie: Gen 4 x16 / Gen 4 x16 max

clocks: gfx 1965/2100 MHz · mem 9501 MHz

temp: 44°C idle · 66°C peak

peak draw: 380 W

backend: llama.cpp cuda-4f13cb7 (cuda)

server: lemonade unknown

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: NVIDIA 590.48.01 + CUDA 13.1

libc: 2.39

python: 3.12.3

containerized: true

llama.cpp: version 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

GeForce RTX 3090 · 24 GiB

cpu: AMD EPYC 7302P 16-Core Processor

gpu: NVIDIA GeForce RTX 3090

arch: NVIDIA

vram: 24 GiB (system 64.0 GiB)

power: 200 W / 450 W max (44% cap)

backend: llama.cpp 59778f0 (cuda)

server: lemonade unknown

os: Ubuntu 24.04 LTS

kernel: 6.17.13-7-pve

driver: 590.48.01

python: 3.12.3

containerized: true

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)

cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S

gpu: AMD Radeon 8060S

arch: Strix Halo (gfx1151)

vram: 96 GiB (system 31.1 GiB, unified)

backend: llama.cpp b8940 (rocm)

server: lemonade 10.4.0

os: Ubuntu 24.04.4 LTS

kernel: 7.0.2-2-pve

python: 3.12.3

containerized: true

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true

GeForce RTX 5070 · 12 GiB

cpu: AMD Ryzen 9 7900 12-Core Processor

gpu: NVIDIA GeForce RTX 5070

arch: NVIDIA

vram: 11.94 GiB (system 30.4 GiB)

power: 250 W / 300 W max (83% cap)

backend: llama.cpp b9174 (vulkan)

server: lemonade unknown

os: CachyOS

kernel: 7.0.0-1-cachyos

driver: 595.58.03

python: 3.14.4

containerized: false

runs/cell: 5

warmups: 2

endpoint: /v1/chat/completions

streaming: true
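
Every environment above lists 5 runs per cell, 2 warmups, and a streaming endpoint, but the page does not say how per-run timings are derived or aggregated. The sketch below is one plausible reading, not the harness's actual method: it assumes TTFT is the delay from request start to the first streamed token, TPOT is the mean gap between later tokens, warmup samples come first and are discarded, and the remaining runs are summarized by their median. The names `stream_metrics` and `summarize` are hypothetical.

```python
from statistics import median


def stream_metrics(start_s, token_times_s):
    """Return (ttft_ms, tpot_ms) for one streamed response.

    Assumed definitions: TTFT is the time from request start to the first
    token; TPOT is the mean gap between consecutive tokens after the first.
    """
    if not token_times_s:
        raise ValueError("no tokens received")
    ttft_ms = (token_times_s[0] - start_s) * 1000.0
    gaps = len(token_times_s) - 1
    if gaps == 0:
        return ttft_ms, 0.0
    tpot_ms = (token_times_s[-1] - token_times_s[0]) / gaps * 1000.0
    return ttft_ms, tpot_ms


def summarize(samples, warmups=2):
    """Drop the first `warmups` samples, then take the median of the rest."""
    measured = samples[warmups:]
    if not measured:
        raise ValueError("no measured samples left after warmups")
    return median(measured)


# Synthetic arrival times in seconds: first token at 21 ms, then one every 2.3 ms.
times = [0.021 + 0.0023 * i for i in range(100)]
ttft, tpot = stream_metrics(0.0, times)
print(f"TTFT {ttft:.1f} ms, TPOT {tpot:.2f} ms")

# Two warmup samples followed by five measured samples.
print(summarize([40.0, 30.0, 21.0, 22.0, 21.0, 20.0, 21.0]))
```

In a real client, `token_times_s` would be filled with `time.perf_counter()` readings taken as each streamed chunk arrives from `/v1/chat/completions`.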