Qwen3.6 27B

Q8_0·27B params·GGUF

reasoning

checkpoint: unsloth/Qwen3.6-27B-GGUF:Qwen3.6-27B-Q8_0.gguf

All runs (5)


legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7	llama.cpp rocm-4f13cb7 (rocm)	baseline	chat	1	7.7	7.5	66.9	—	453ms	129.5	—	30	100	13.42s	0.003 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7	llama.cpp rocm-4f13cb7 (rocm)	baseline	codegen	1	7.7	7.6	141.4	—	439ms	129.7	—	62	1000	131.22s	0.017 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7	llama.cpp rocm-4f13cb7 (rocm)	baseline	rag	1	7.7	7.3	474.9	—	1.54s	129.8	—	842	200	27.43s	0.006 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7	llama.cpp rocm-4f13cb7 (rocm)	baseline	agent	4	7.7	3.2	7.7	—	97.84s	129.8	—	599	500	162.66s	0.039 GiB
legacy	stack comparable	Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)unifieddrv 7	llama.cpp rocm-4f13cb7 (rocm)	baseline	agent	1	7.7	7.7	2129.7	—	281ms	129.9	—	599	500	65.26s	0.011 GiB

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)

cpuAMD RYZEN AI MAX+ 395 w/ Radeon 8060S

gpuAMD Radeon 8060S

archStrix Halo (gfx1151)

vram96 GiB (system 31.1 GiB, unified)

pcieGen 4 x16 / Gen 4 x16 max

clocksgfx 1447 MHz · mem 1000 MHz

temp47°C idle · 71°C peak

peak draw98 W

hardware probes

copy 41% of theoryFP16 peak 30.3 TF

256-bit8000 MHz20 SM/CU

Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.

cap	theory	copy	fp16	bf16
fixed	256 GB/s	106 GB/s	30.3 TF	-

compute: 11.5

backendllama.cpp rocm-4f13cb7 (rocm)

osUbuntu 24.04 LTS

kernel7.0.2-2-pve

driverROCm 7.2.3

libc2.39

python3.12.3

llama.cppversion: 1 (4f13cb7) built with Clang 22.0.0 for Linux x86_64

build flagsGGML_HIP=ON AMDGPU_TARGETS=gfx1151 CMAKE_BUILD_TYPE=Release

runs/cell5

warmups2

endpoint/v1/chat/completions

streamingtrue