Qwen3.6 27B
Q8_0 · 27B params · GGUF
reasoning
checkpoint: unsloth/Qwen3.6-27B-GGUF:Qwen3.6-27B-Q8_0.gguf
All runs (5)
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | chat | 1 | 7.7 | 7.5 | 66.9 | — | 453ms | 129.5 | — | 30 | 100 | 13.42s | 0.003 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | codegen | 1 | 7.7 | 7.6 | 141.4 | — | 439ms | 129.7 | — | 62 | 1000 | 131.22s | 0.017 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | rag | 1 | 7.7 | 7.3 | 474.9 | — | 1.54s | 129.8 | — | 842 | 200 | 27.43s | 0.006 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | agent | 4 | 7.7 | 3.2 | 7.7 | — | 97.84s | 129.8 | — | 599 | 500 | 162.66s | 0.039 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | agent | 1 | 7.7 | 7.7 | 2129.7 | — | 281ms | 129.9 | — | 599 | 500 | 65.26s | 0.011 GiB |
Environment
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1447 MHz · mem 1000 MHz
temp: 47°C idle · 71°C peak
peak draw: 98 W
hardware probes
copy 41% of theory · FP16 peak 30.3 TF
256-bit · 8000 MHz · 20 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| fixed | 256 GB/s | 106 GB/s | 30.3 TF | - |
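The "theory" and "copy 41% of theory" figures can be reproduced from the probe values already listed. The sketch below assumes the 8000 MHz figure is the effective memory transfer rate (transfers per second across the 256-bit bus); the function name is illustrative and not part of the benchmark tooling.

```python
def theoretical_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Peak bandwidth in GB/s: bytes moved per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Values taken from the probe lines: 256-bit bus, 8000 MHz (assumed effective rate).
theory = theoretical_bandwidth_gbs(256, 8000)

# Measured copy bandwidth from the table, as a fraction of the theoretical peak.
efficiency = 106 / theory

print(f"theory {theory:.0f} GB/s, copy efficiency {efficiency:.0%}")
# prints "theory 256 GB/s, copy efficiency 41%"
```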
compute: 11.5
backend: llama.cpp rocm-4f13cb7 (rocm)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-2-pve
driver: ROCm 7.2.3
libc: 2.39
python: 3.12.3
llama.cpp: version: 1 (4f13cb7) built with Clang 22.0.0 for Linux x86_64
build flags: GGML_HIP=ON AMDGPU_TARGETS=gfx1151 CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
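Because the runs stream from `/v1/chat/completions`, time to first token and decode rate can be derived from the arrival times of the streamed chunks. The following is a minimal sketch of that idea, not the harness used for these results. It assumes the OpenAI-style server-sent-event framing (`data: {...}` lines ending with `data: [DONE]`), treats each text-bearing chunk as a rough proxy for one token, and also counts a `reasoning_content` delta field, which is an assumption about how a reasoning model's output is delivered. The name `time_stream` is hypothetical.

```python
import json
import time
from typing import Callable, Iterable


def time_stream(lines: Iterable[str], start: float,
                clock: Callable[[], float] = time.perf_counter) -> dict:
    """Summarise one streamed completion from its SSE lines.

    `start` is the clock reading taken just before the request was sent.
    """
    first = None   # arrival time of the first chunk that carries text
    last = start   # arrival time of the most recent text chunk
    chunks = 0
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        # Assumption: visible text or reasoning text both count as output.
        text = delta.get("content") or delta.get("reasoning_content") or ""
        if not text:
            continue  # e.g. the initial role-only chunk
        now = clock()
        if first is None:
            first = now
        last = now
        chunks += 1
    decode_s = 0.0 if first is None else last - first
    return {
        "ttft_s": None if first is None else first - start,
        "chunks": chunks,
        # Rate over the intervals between chunks, excluding the prefill wait.
        "chunks_per_s": (chunks - 1) / decode_s if decode_s > 0 else None,
        "total_s": last - start,
    }
```

In a real run, `lines` would be the decoded line iterator of the streaming HTTP response and `start` the value of `time.perf_counter()` immediately before sending the request. Exact token counts would come from the server's usage report rather than from counting chunks.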