Qwen3-Coder 30B-A3B-Instruct

Q4_K_XL·30B params·GGUF

reasoning

intelligence: see on Artificial Analysis →

checkpoint: unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q4_K_XL

commit: b17cb02dd882

weights 16.45 GiB

All runs (15)

Hardware	Backend	Mode	Shape	Conc.	Gen tok/s ↓	Prefill tok/s	TTFT	TPOT (ms)	Prompt tok	Out tok	Total
GeForce RTX 3090 · 24 GiB450 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	codegen	1	183.6	485.3	125ms	5.0	60	785	4.28s
GeForce RTX 3090 · 24 GiB350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	codegen	1	181.3	473.9	120ms	5.1	60	785	4.32s
GeForce RTX 3090 · 24 GiB450 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	1	170.4	4382.9	157ms	5.0	585	267	1.55s
GeForce RTX 3090 · 24 GiB450 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	chat	1	169.7	332.1	86ms	4.9	28	99	576ms
GeForce RTX 3090 · 24 GiB350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	1	156.4	4519.5	164ms	5.1	585	267	1.71s
GeForce RTX 3090 · 24 GiB350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	chat	1	153.3	316.1	89ms	5.0	28	99	548ms
GeForce RTX 3090 · 24 GiB200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	codegen	1	152.7	470.6	165ms	6.4	60	811	5.28s
GeForce RTX 3090 · 24 GiB450 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	rag	1	146.4	4125.5	205ms	5.0	834	91	838ms
GeForce RTX 3090 · 24 GiB200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	chat	1	146.2	416.1	72ms	6.0	28	99	635ms
GeForce RTX 3090 · 24 GiB200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	agent	1	141.1	3628.9	161ms	6.5	585	257	1.80s
GeForce RTX 3090 · 24 GiB350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	rag	1	127.2	4330.1	236ms	5.1	834	91	816ms
GeForce RTX 3090 · 24 GiB200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	rag	1	117.1	3587.1	220ms	6.2	834	91	975ms
GeForce RTX 3090 · 24 GiB450 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-450w	agent	4	78.6	324.9	2.46s	5.0	585	267	4.15s
GeForce RTX 3090 · 24 GiB350 Wdrv 590	llama.cpp cuda-4f13cb7 (cuda)	baseline-pl-350w	agent	4	71.1	294.2	2.34s	5.1	585	267	4.01s
GeForce RTX 3090 · 24 GiB200 Wdrv 590	llama.cpp 59778f0 (cuda)	baseline	agent	4	50.5	387.3	1.94s	13.8	585	257	5.60s

Environment

GeForce RTX 3090 · 24 GiB

cpuAMD EPYC 7302P 16-Core Processor

gpuNVIDIA GeForce RTX 3090

archNVIDIA

vram24 GiB (system 64.0 GiB)

power350 W / 450 W max(78% cap)

pcieGen 4 x16 / Gen 4 x16 max

clocksgfx 1965/2100 MHz · mem 9501 MHz

temp44°C idle · 60°C peak

peak draw330 W

backendllama.cpp cuda-4f13cb7 (cuda)

serverlemonade unknown

osUbuntu 24.04 LTS

kernel6.17.13-7-pve

driverNVIDIA 590.48.01 + CUDA 13.1

libc2.39

python3.12.3

containerizedtrue

llama.cppversion: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flagsGGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue

GeForce RTX 3090 · 24 GiB

cpuAMD EPYC 7302P 16-Core Processor

gpuNVIDIA GeForce RTX 3090

archNVIDIA

vram24 GiB (system 64.0 GiB)

power450 W / 450 W max

pcieGen 4 x16 / Gen 4 x16 max

clocksgfx 1965/2100 MHz · mem 9501 MHz

temp44°C idle · 67°C peak

peak draw396 W

backendllama.cpp cuda-4f13cb7 (cuda)

serverlemonade unknown

osUbuntu 24.04 LTS

kernel6.17.13-7-pve

driverNVIDIA 590.48.01 + CUDA 13.1

libc2.39

python3.12.3

containerizedtrue

llama.cppversion: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64

build flagsGGML_CUDA=ON CMAKE_BUILD_TYPE=Release

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue

GeForce RTX 3090 · 24 GiB

cpuAMD EPYC 7302P 16-Core Processor

gpuNVIDIA GeForce RTX 3090

archNVIDIA

vram24 GiB (system 64.0 GiB)

power200 W / 450 W max(44% cap)

backendllama.cpp 59778f0 (cuda)

serverlemonade unknown

osUbuntu 24.04 LTS

kernel6.17.13-7-pve

driver590.48.01

python3.12.3

containerizedtrue

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue