Qwen3.6 27B

Q2_K · 27B params · GGUF
reasoning
checkpoint: unsloth/Qwen3.6-27B-GGUF:Q2_K
commit: 82d411acf4a0
weights: 11.04 GiB
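As a rough sanity check on the header, the 11.04 GiB weight size and the nominal 27B parameter count together imply an average storage cost per weight. The sketch below assumes GiB means 2^30 bytes and takes "27B" as exactly 27 × 10^9 (the header rounds it). The result, about 3.5 bits per weight, is an average over the whole file. It sits above what the Q2_K block format alone would give, which is consistent with a GGUF file also carrying metadata and typically storing some tensors at higher-precision types. This page does not show that per-tensor breakdown.

```python
# Average bits per weight implied by a GGUF file size (rough estimate).
GIB = 2**30  # bytes per GiB


def bits_per_weight(file_gib: float, n_params: float) -> float:
    """Return total file size in bits divided by the parameter count.

    The file size includes GGUF metadata and any tensors stored above the
    nominal quantization type, so this overstates the per-block bit width.
    """
    return file_gib * GIB * 8 / n_params


# Assumption: "27B params" is taken as exactly 27e9 parameters.
bpw = bits_per_weight(11.04, 27e9)
print(f"{bpw:.2f} bits/weight")  # → 3.51 bits/weight
```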

All runs (15)

Hardware | Backend | Mode | Shape | Conc. | Gen tok/s | Prefill tok/s | TTFT | TPOT (ms) | Prompt tok | Out tok | Total | VRAM Δ
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | codegen | 1 | 43.6 | 184.0 | 321 ms | 21.8 | 62 | 1000 | 22.95 s | 0.010 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | chat | 1 | 42.3 | 125.6 | 240 ms | 21.2 | 30 | 100 | 2.37 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 1 | 42.2 | 1178.4 | 509 ms | 21.9 | 599 | 500 | 11.84 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | codegen | 1 | 42.1 | 204.0 | 334 ms | 22.9 | 62 | 1000 | 23.78 s | 0.010 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 1 | 40.2 | 1019.4 | 588 ms | 23.0 | 599 | 500 | 12.44 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | chat | 1 | 39.2 | 118.3 | 258 ms | 22.5 | 30 | 100 | 2.55 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | rag | 1 | 38.2 | 1363.4 | 881 ms | 21.5 | 842 | 200 | 5.23 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | rag | 1 | 36.1 | 1305.2 | 830 ms | 22.8 | 842 | 200 | 5.54 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | codegen | 1 | 24.3 | 196.9 | 367 ms | 40.9 | 62 | 1000 | 41.14 s | 0.010 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | chat | 1 | 24.0 | 120.2 | 250 ms | 38.0 | 30 | 100 | 4.16 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 1 | 23.7 | 976.1 | 614 ms | 41.1 | 599 | 500 | 21.14 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | rag | 1 | 22.1 | 958.9 | 947 ms | 40.3 | 842 | 200 | 9.05 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 450 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 4 | 17.6 | 39.3 | 18.07 s | 21.9 | 599 | 500 | 29.48 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 4 | 17.0 | 40.5 | 18.71 s | 23.0 | 599 | 500 | 30.56 s | 0.000 GiB
GeForce RTX 3090 · 24 GiB · 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 4 | 11.1 | n/a | 4.01 s | 77.9 | n/a | 341 | 30.88 s | 0.030 GiB
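The page does not define its columns, but for the concurrency-1 rows the Gen tok/s figure matches output tokens divided by total request time to within display rounding (for example, 1000 tokens / 22.95 s ≈ 43.6 tok/s in the first row). The sketch below checks that reading against a few rows copied from the table. The interpretation is an assumption, and the concurrency-4 rows at 450 W and 350 W do not follow it exactly, so they are left out; `ROWS` and `end_to_end_rate` are hypothetical names.

```python
# Hypothesis check: Gen tok/s == Out tok / Total (concurrency-1 rows only).
# Each tuple: (mode, shape, reported gen tok/s, out tokens, total seconds),
# copied from the results table.
ROWS = [
    ("baseline-pl-450w", "codegen", 43.6, 1000, 22.95),
    ("baseline-pl-450w", "chat", 42.3, 100, 2.37),
    ("baseline-pl-450w", "agent", 42.2, 500, 11.84),
    ("baseline-pl-350w", "rag", 36.1, 200, 5.54),
    ("baseline", "codegen", 24.3, 1000, 41.14),
    ("baseline", "rag", 22.1, 200, 9.05),
]

TOLERANCE = 0.15  # tok/s; covers rounding of the displayed Total and Gen tok/s


def end_to_end_rate(out_tokens: int, total_s: float) -> float:
    """Output tokens per second over the whole request, TTFT included."""
    return out_tokens / total_s


for mode, shape, reported, out_tok, total in ROWS:
    derived = end_to_end_rate(out_tok, total)
    status = "ok" if abs(derived - reported) <= TOLERANCE else "MISMATCH"
    print(f"{mode:17} {shape:8} reported {reported:5.1f} "
          f"derived {derived:5.2f} {status}")
```

If the check holds, Gen tok/s is an end-to-end rate that already includes prompt processing, so it is lower than the pure decode rate implied by TPOT.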

Environment

GeForce RTX 3090 · 24 GiB · 350 W · drv 590
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 350 W / 450 W max (78% cap)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1980/2100 MHz · mem 9501 MHz
temp: 44°C idle · 65°C peak
peak draw: 331 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade (version unknown)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version 18 (4f13cb7), built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 3090 · 24 GiB · 450 W · drv 590
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1965/2100 MHz · mem 9501 MHz
temp: 44°C idle · 83°C peak
peak draw: 433 W
backend: llama.cpp cuda-4f13cb7 (cuda)
server: lemonade (version unknown)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
containerized: true
llama.cpp: version 18 (4f13cb7), built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true

GeForce RTX 3090 · 24 GiB · 200 W · drv 590
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
server: lemonade (version unknown)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
containerized: true
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
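All three environment listings report streaming requests to /v1/chat/completions, which is what makes TTFT and TPOT measurable per request. The harness's exact definitions are not given on this page. The sketch below shows one common convention, assuming a sorted list of arrival times for each streamed token: TTFT is the delay from request start to the first token, and TPOT is the mean gap between subsequent tokens. `StreamTiming` and `summarize` are hypothetical names, not part of llama.cpp or lemonade.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StreamTiming:
    ttft_s: float   # request start -> first streamed token, seconds
    tpot_ms: float  # mean gap between consecutive later tokens, milliseconds
    total_s: float  # request start -> last streamed token, seconds


def summarize(request_start: float, token_times: list[float]) -> StreamTiming:
    """Derive TTFT/TPOT from per-token arrival timestamps (in seconds).

    Hypothetical helper: assumes token_times is sorted and non-empty.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    tpot_ms = mean(gaps) * 1000 if gaps else 0.0
    return StreamTiming(ttft, tpot_ms, token_times[-1] - request_start)


# Synthetic example: first token at 0.25 s, then 99 more tokens every 21 ms.
times = [0.25 + i * 0.021 for i in range(100)]
t = summarize(0.0, times)
print(f"TTFT {t.ttft_s * 1000:.0f} ms, TPOT {t.tpot_ms:.1f} ms, "
      f"total {t.total_s:.2f} s")
```

With a mean, Total equals TTFT plus (tokens - 1) × TPOT by construction. Some table rows do not satisfy that (codegen at 450 W: 0.321 s + 999 × 21.8 ms ≈ 22.1 s versus 22.95 s reported), which suggests the harness may use a different statistic, such as a median, or count time after the last token.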