Qwen3.6 27B

Q4_K_M · 27B params · GGUF
reasoning
checkpoint: unsloth/Qwen3.6-27B-GGUF:Q4_K_M
commit: 82d411acf4a0
weights 15.66 GiB

All runs (68)

legacy · stack comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 590
llama.cpp cuda-4f13cb7 (cuda) · baseline-pl-450w · chat · 1
42.6
37.4 · 127.5 · 235 ms · 23.5 · 30 · 100 · 2.67 s · 0.000 GiB
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 590
llama.cpp cuda-4f13cb7 (cuda) · baseline-pl-450w · rag · 1
42.2
35.8 · 1361.9 · 759 ms · 23.7 · 842 · 200 · 5.59 s · 0.000 GiB
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · tg_128 · 1
42.2
42.2 · 128
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 590
llama.cpp cuda-4f13cb7 (cuda) · baseline-pl-450w · codegen · 1
42.1
40.3 · 189.2 · 312 ms · 23.7 · 62 · 1000 · 24.82 s · 0.010 GiB
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 590
llama.cpp cuda-4f13cb7 (cuda) · baseline-pl-450w · agent · 1
42.0
39.2 · 1213.7 · 519 ms · 23.8 · 599 · 500 · 12.76 s · 0.000 GiB
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 590
llama.cpp cuda-4f13cb7 (cuda) · baseline-pl-450w · agent · 4
41.9
16.3 · 36.3 · 19.50 s · 23.8 · 599 · 500 · 31.84 s · 0.000 GiB
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · mixed_4096_256 · 1
41.9
41.9 · 4096 · 256
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · mixed_2048_256 · 1
41.9
41.9 · 2048 · 256
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · mixed_2048_768 · 1
41.9
41.9 · 2048 · 768
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · mixed_1024_1024 · 1
41.7
41.7 · 1024 · 1024
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590
llama.cpp cuda-4f13cb7 (cuda) · baseline-pl-350w · chat · 1
41.7
36.3 · 126.8 · 237 ms · 24.0 · 30 · 100 · 2.75 s · 0.000 GiB
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · mixed_64_1024 · 1
41.6
41.6 · 64 · 1024
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · mixed_16_1536 · 1
41.5
41.5 · 16 · 1536
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · mixed_384_1152 · 1
41.5
41.5 · 384 · 1152
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · tg_1024 · 1
41.4
41.4 · 1024
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · tg_512 · 1
41.4
41.4 · 512
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590
llama.cpp cuda-4f13cb7 (cuda) · baseline-pl-350w · rag · 1
41.4
33.8 · 1192.9 · 857 ms · 24.2 · 842 · 200 · 5.91 s · 0.000 GiB
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590
llama.cpp cuda-4f13cb7 (cuda) · baseline-pl-350w · codegen · 1
41.3
39.3 · 206.7 · 302 ms · 24.2 · 62 · 1000 · 25.45 s · 0.010 GiB
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · mixed_1280_3072 · 1
41.3
41.3 · 1280 · 3072
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590
llama.cpp cuda-4f13cb7 (cuda) · baseline-pl-350w · agent · 1
41.2
38.3 · 1220.5 · 500 ms · 24.3 · 599 · 500 · 13.06 s · 0.000 GiB
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590
llama.cpp cuda-4f13cb7 (cuda) · baseline-pl-350w · agent · 4
41.2
16.1 · 37.8 · 19.80 s · 24.3 · 599 · 500 · 32.36 s · 0.000 GiB
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · mixed_1024_16 · 1
37.6
37.6 · 1024 · 16
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590
llama.cpp 59778f0 (cuda) · baseline · chat · 1
23.0
21.1 · 117.8 · 255 ms · 43.5 · 30 · 100 · 4.73 s · 0.000 GiB
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590
llama.cpp 59778f0 (cuda) · baseline · rag · 1
22.2
20.0 · 1031.2 · 931 ms · 45.0 · 842 · 200 · 9.99 s · 0.000 GiB
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590
llama.cpp 59778f0 (cuda) · baseline · codegen · 1
21.8
21.6 · 195.0 · 375 ms · 45.9 · 62 · 1000 · 46.39 s · 0.000 GiB
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590
llama.cpp 59778f0 (cuda) · baseline · agent · 1
21.7
21.1 · 957.3 · 626 ms · 46.0 · 599 · 500 · 23.75 s · 0.000 GiB
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · mixed_1024_16 · 1
20.6
20.6 · 1024 · 16
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · tg_128 · 1
19.6
19.6 · 128
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · mixed_4096_256 · 1
19.2
19.2 · 4096 · 256
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · mixed_2048_256 · 1
19.1
19.1 · 2048 · 256
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · mixed_2048_768 · 1
19.0
19.0 · 2048 · 768
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · mixed_1024_1024 · 1
19.0
19.0 · 1024 · 1024
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · mixed_64_1024 · 1
19.0
19.0 · 64 · 1024
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · mixed_16_1536 · 1
19.0
19.0 · 16 · 1536
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · tg_512 · 1
18.9
18.9 · 512
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · tg_1024 · 1
18.9
18.9 · 1024
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · mixed_384_1152 · 1
18.9
18.9 · 384 · 1152
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · mixed_1280_3072 · 1
18.9
18.9 · 1280 · 3072
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · mixed_2048_256 · 1
12.2
12.2 · 2048 · 256
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · tg_128 · 1
12.2
12.2 · 128
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · tg_512 · 1
12.2
12.2 · 512
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · mixed_4096_256 · 1
12.2
12.2 · 4096 · 256
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · mixed_2048_768 · 1
12.2
12.2 · 2048 · 768
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · tg_1024 · 1
12.2
12.2 · 1024
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · mixed_1024_1024 · 1
12.2
12.2 · 1024 · 1024
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · mixed_64_1024 · 1
12.2
12.2 · 64 · 1024
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · mixed_384_1152 · 1
12.2
12.2 · 384 · 1152
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · mixed_16_1536 · 1
12.2
12.2 · 16 · 1536
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · mixed_1280_3072 · 1
12.1
12.1 · 1280 · 3072
legacy · stack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7
llama.cpp rocm-4f13cb7 (rocm) · baseline · chat · 1
12.1
11.7 · 85.5 · 352 ms · 82.4 · 30 · 100 · 8.54 s · 0.004 GiB
legacy · stack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7
llama.cpp rocm-4f13cb7 (rocm) · baseline · codegen · 1
12.1
12.0 · 146.3 · 435 ms · 82.7 · 62 · 1000 · 83.14 s · 0.017 GiB
legacy · stack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7
llama.cpp rocm-4f13cb7 (rocm) · baseline · agent · 1
12.1
12.0 · 2076.2 · 289 ms · 82.8 · 599 · 500 · 41.70 s · 0.010 GiB
legacy · stack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7
llama.cpp rocm-4f13cb7 (rocm) · baseline · agent · 4
12.1
5.0 · 12.6 · 62.69 s · 82.8 · 599 · 500 · 104.09 s · 0.037 GiB
legacy · stack comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7
llama.cpp rocm-4f13cb7 (rocm) · baseline · rag · 1
12.1
11.1 · 466.7 · 1.53 s · 82.9 · 842 · 200 · 18.02 s · 0.007 GiB
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · mixed_1024_16 · 1
12.0
12.0 · 1024 · 16
legacy · stack comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590
llama.cpp 59778f0 (cuda) · baseline · agent · 4
11.4
10.0 · 3.93 s · 87.5 · 341 · 33.98 s · 0.040 GiB
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · pp_512 · 1
1401.6 · 512
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · pp_1024 · 1
1430.4 · 1024
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · pp_2048 · 1
1417.3 · 2048
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · 450 W max · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2-pl450 · pp_4096 · 1
1405.1 · 4096
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · pp_512 · 1
778.0 · 512
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · pp_1024 · 1
797.7 · 1024
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · pp_2048 · 1
791.8 · 2048
raw · hardware comparable
GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595
llama.cpp llama.cpp-3e12fbd (cuda) · raw-v4-r2 · pp_4096 · 1
786.6 · 4096
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · pp_512 · 1
354.1 · 512
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · pp_1024 · 1
354.9 · 1024
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · pp_2048 · 1
351.5 · 2048
raw · hardware comparable
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7
llama.cpp llama.cpp-4f13cb7 (rocm) · raw-v4-r2 · pp_4096 · 1
344.7 · 4096
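
The API workload rows in the listing above look internally consistent: the headline figure matches the reciprocal of the per-token latency, and the next figure matches generated tokens divided by wall time. The columns are not labeled in this listing, so the column meanings used below are an assumption, and the names `ROWS`, `decode_tok_per_s`, `end_to_end_tok_per_s`, and `max_deviation` are made up for illustration. A minimal sketch that checks the four single-stream RTX 3090 rows at 450 W:

```python
# Assumed column meanings (the listing itself does not label them):
#   headline   = decode tokens/s
#   e2e        = end-to-end tokens/s
#   itl_ms     = mean inter-token latency in milliseconds
#   out_tokens = generated tokens, total_s = wall time in seconds
ROWS = [
    # (workload, headline, e2e, itl_ms, out_tokens, total_s)
    ("chat", 42.6, 37.4, 23.5, 100, 2.67),
    ("rag", 42.2, 35.8, 23.7, 200, 5.59),
    ("codegen", 42.1, 40.3, 23.7, 1000, 24.82),
    ("agent", 42.0, 39.2, 23.8, 500, 12.76),
]


def decode_tok_per_s(itl_ms: float) -> float:
    """Decode rate implied by a mean inter-token latency."""
    return 1000.0 / itl_ms


def end_to_end_tok_per_s(out_tokens: int, total_s: float) -> float:
    """Throughput over the whole request, prefill included."""
    return out_tokens / total_s


def max_deviation(rows) -> float:
    """Largest gap between a listed figure and the value recomputed from its row."""
    worst = 0.0
    for _, headline, e2e, itl_ms, out_tokens, total_s in rows:
        worst = max(
            worst,
            abs(decode_tok_per_s(itl_ms) - headline),
            abs(end_to_end_tok_per_s(out_tokens, total_s) - e2e),
        )
    return worst


print(f"largest deviation: {max_deviation(ROWS):.2f} tokens/s")
```

For these rows the recomputed values stay within 0.1 tokens/s of the listed ones, which is about what rounding of the inputs would explain. The concurrency-4 rows deviate more, presumably because the listed figures are averaged per request.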

Environment

GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 350 W / 450 W max (78% cap)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1965/2100 MHz · mem 9501 MHz
temp: 44°C idle · 65°C peak
peak draw: 329 W
hardware probes
copy 42% of theory · FP16 peak 65.4 TF · copy/math flat across caps
384-bit · 9751 MHz · 82 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
cap     theory     copy       fp16      bf16
200 W   936 GB/s   391 GB/s   65.4 TF   65.4 TF
300 W   936 GB/s   391 GB/s   65.4 TF   65.3 TF
450 W   936 GB/s   391 GB/s   65.4 TF   65.4 TF
compute: 8.6
backend: llama.cpp cuda-4f13cb7 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
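
The "theory" and "copy 42% of theory" figures in the hardware probes above can be reproduced from the bus width and memory clock. The sketch below assumes two transfers per reported memory clock, a factor chosen only because it reproduces the listed 936 GB/s; the page does not state how it derives the number, and the function name is hypothetical.

```python
def theoretical_bandwidth_gb_s(bus_width_bits: int, mem_clock_mhz: float,
                               transfers_per_clock: int) -> float:
    """Peak memory bandwidth in GB/s (decimal) from bus width and clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9


# RTX 3090 probe values from the block above: 384-bit bus, 9751 MHz.
# transfers_per_clock=2 is an assumption, not something the page states.
theory = theoretical_bandwidth_gb_s(384, 9751, 2)
copy_fraction = 391 / theory  # measured copy bandwidth over theoretical peak

print(f"theory {theory:.0f} GB/s, copy at {copy_fraction:.0%} of theory")
```

With those inputs the result rounds to 936 GB/s and 42%, matching the probe summary line.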
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1965/2100 MHz · mem 9501 MHz
temp: 43°C idle · 82°C peak
peak draw: 434 W
backend: llama.cpp cuda-4f13cb7 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
llama.cpp: version: 18 (4f13cb7) built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
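
The two environment blocks above list peak draw at the 350 W cap (329 W) and at 450 W (434 W), which allows a rough energy comparison for the chat workload. This is a sketch under two assumptions: that these blocks correspond to the `baseline-pl-350w` and `baseline-pl-450w` runs (same backend build and cap), and that peak draw is an acceptable stand-in for average draw during decode, which likely understates real efficiency. The function name is illustrative.

```python
def tokens_per_joule(tok_per_s: float, watts: float) -> float:
    """Tokens generated per joule: (tokens/s) / (joules/s)."""
    return tok_per_s / watts


# Chat decode rates from the run list, peak draw from the environment blocks.
at_450w = tokens_per_joule(42.6, 434)
at_350w = tokens_per_joule(41.7, 329)

print(f"450 W: {at_450w:.3f} tok/J")
print(f"350 W: {at_350w:.3f} tok/J")
print(f"350 W cap is {at_350w / at_450w:.2f}x as efficient on this estimate")
```

On these figures the 350 W cap gives up under one token per second of decode speed while drawing roughly a quarter less power at peak.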
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
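
Because these runs stream from `/v1/chat/completions`, time to first token and inter-token latency can be derived from the arrival times of the streamed chunks. The page does not describe its harness, so the following is only one plausible way to compute such figures; `streaming_metrics` and its field names are hypothetical, and it treats each timestamp as one generated token.

```python
from statistics import mean


def streaming_metrics(request_start: float, token_times: list[float]) -> dict:
    """Derive latency and throughput figures from per-token arrival times.

    request_start and token_times are seconds on the same monotonic clock.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure inter-token latency")
    ttft_s = token_times[0] - request_start
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl_s = mean(gaps)
    total_s = token_times[-1] - request_start
    return {
        "ttft_ms": ttft_s * 1000,
        "itl_ms": itl_s * 1000,
        "decode_tok_per_s": 1 / itl_s,                   # excludes prefill
        "e2e_tok_per_s": len(token_times) / total_s,     # includes prefill
    }


# Synthetic example: first token after 250 ms, then one token every 25 ms.
example = streaming_metrics(0.0, [0.250, 0.275, 0.300, 0.325, 0.350])
print(example)
```

The split between a decode rate and an end-to-end rate explains why a request with a long prompt can show a much lower end-to-end figure than its decode figure.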
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1106 MHz · mem 1000 MHz
temp: 48°C idle · 75°C peak
peak draw: 100 W
hardware probes
copy 41% of theory · FP16 peak 30.3 TF
256-bit · 8000 MHz · 20 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
cap     theory     copy       fp16      bf16
fixed   256 GB/s   106 GB/s   30.3 TF   -
compute: 11.5
backend: llama.cpp rocm-4f13cb7 (rocm)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-2-pve
driver: ROCm 7.2.3
libc: 2.39
python: 3.12.3
llama.cpp: version: 1 (4f13cb7) built with Clang 22.0.0 for Linux x86_64
build flags: GGML_HIP=ON AMDGPU_TARGETS=gfx1151 CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
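
Single-stream decode is commonly limited by memory bandwidth, so the theoretical 256 GB/s above suggests a rough ceiling for the Strix Halo decode rows. The sketch assumes every generated token reads all 15.66 GiB of weights exactly once and ignores KV-cache and activation traffic; whether that holds for this model and backend is not stated on the page, and the function name is hypothetical.

```python
GIB = 2**30  # bytes per GiB


def bandwidth_bound_tok_per_s(bandwidth_gb_s: float, weights_gib: float) -> float:
    """Upper bound on decode rate if each token streams all weights once."""
    return bandwidth_gb_s * 1e9 / (weights_gib * GIB)


ceiling = bandwidth_bound_tok_per_s(256, 15.66)  # theory bandwidth, weights size
measured = 12.2                                   # raw tg rows for Strix Halo

print(f"ceiling {ceiling:.1f} tok/s, measured {measured} tok/s "
      f"({measured / ceiling:.0%} of ceiling)")
```

Under these assumptions the ceiling is a little over 15 tokens/s, so the measured 12.2 tokens/s sits at roughly 80% of it.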
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
clocks: gfx 210 MHz · mem 405 MHz
temp: 39°C idle · 39°C peak
peak draw: 25 W
backend: llama.cpp llama.cpp-3e12fbd (cuda)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-4-pve
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.12.3
runs/cell: 3
warmups: 0
endpoint: llama-bench
streaming: false
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
clocks: gfx 210 MHz · mem 405 MHz
temp: 39°C idle · 39°C peak
peak draw: 25 W
backend: llama.cpp llama.cpp-3e12fbd (cuda)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-4-pve
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.12.3
runs/cell: 3
warmups: 0
endpoint: llama-bench
streaming: false
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
temp: 46°C idle · 46°C peak
peak draw: 16 W
backend: llama.cpp llama.cpp-4f13cb7 (rocm)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-2-pve
driver: amdgpu + ROCm 7.2.3
python: 3.12.3
runs/cell: 3
warmups: 0
endpoint: llama-bench
streaming: false