Qwen3.6 27B
Q4_K_M · 27B params · GGUF
reasoning
checkpoint: unsloth/Qwen3.6-27B-GGUF:Q4_K_M
commit: 82d411acf4a0
weights: 15.66 GiB
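The file size and parameter count together give the average storage cost per weight of this quantization. A minimal sketch, assuming the rounded "27B" figure as the parameter count and ignoring GGUF metadata, so the result is approximate:

```python
GIB = 2**30  # bytes per GiB

def bits_per_weight(file_gib: float, n_params: float) -> float:
    """Average storage bits per parameter, ignoring GGUF metadata overhead."""
    return file_gib * GIB * 8 / n_params

# Assumption: 27e9 is the rounded "27B params" figure, so the result is approximate.
bpw = bits_per_weight(15.66, 27e9)
print(f"Q4_K_M: ~{bpw:.2f} bits/weight")
```

Q4_K_M mixes several k-quant block types, so this is a blended average rather than a per-tensor figure.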
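All runs (68)

| run type | comparability | hardware | backend | profile | workload | concurrency | decode tok/s | end-to-end tok/s | prefill tok/s | | TTFT | ms/token | | prompt tokens | output tokens | total time | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|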
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | chat | 1 | 42.6 | 37.4 | 127.5 | — | 235ms | 23.5 | — | 30 | 100 | 2.67s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | rag | 1 | 42.2 | 35.8 | 1361.9 | — | 759ms | 23.7 | — | 842 | 200 | 5.59s | 0.000 GiB |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | tg_128 | 1 | 42.2 | 42.2 | — | — | — | — | — | — | 128 | — | — |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | codegen | 1 | 42.1 | 40.3 | 189.2 | — | 312ms | 23.7 | — | 62 | 1000 | 24.82s | 0.010 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 1 | 42.0 | 39.2 | 1213.7 | — | 519ms | 23.8 | — | 599 | 500 | 12.76s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-450w | agent | 4 | 41.9 | 16.3 | 36.3 | — | 19.50s | 23.8 | — | 599 | 500 | 31.84s | 0.000 GiB |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | mixed_4096_256 | 1 | 41.9 | 41.9 | — | — | — | — | — | 4096 | 256 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | mixed_2048_256 | 1 | 41.9 | 41.9 | — | — | — | — | — | 2048 | 256 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | mixed_2048_768 | 1 | 41.9 | 41.9 | — | — | — | — | — | 2048 | 768 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | mixed_1024_1024 | 1 | 41.7 | 41.7 | — | — | — | — | — | 1024 | 1024 | — | — |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | chat | 1 | 41.7 | 36.3 | 126.8 | — | 237ms | 24.0 | — | 30 | 100 | 2.75s | 0.000 GiB |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | mixed_64_1024 | 1 | 41.6 | 41.6 | — | — | — | — | — | 64 | 1024 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | mixed_16_1536 | 1 | 41.5 | 41.5 | — | — | — | — | — | 16 | 1536 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | mixed_384_1152 | 1 | 41.5 | 41.5 | — | — | — | — | — | 384 | 1152 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | tg_1024 | 1 | 41.4 | 41.4 | — | — | — | — | — | — | 1024 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | tg_512 | 1 | 41.4 | 41.4 | — | — | — | — | — | — | 512 | — | — |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | rag | 1 | 41.4 | 33.8 | 1192.9 | — | 857ms | 24.2 | — | 842 | 200 | 5.91s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | codegen | 1 | 41.3 | 39.3 | 206.7 | — | 302ms | 24.2 | — | 62 | 1000 | 25.45s | 0.010 GiB |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | mixed_1280_3072 | 1 | 41.3 | 41.3 | — | — | — | — | — | 1280 | 3072 | — | — |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 1 | 41.2 | 38.3 | 1220.5 | — | 500ms | 24.3 | — | 599 | 500 | 13.06s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 350 W · drv 590 | llama.cpp cuda-4f13cb7 (cuda) | baseline-pl-350w | agent | 4 | 41.2 | 16.1 | 37.8 | — | 19.80s | 24.3 | — | 599 | 500 | 32.36s | 0.000 GiB |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | mixed_1024_16 | 1 | 37.6 | 37.6 | — | — | — | — | — | 1024 | 16 | — | — |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | chat | 1 | 23.0 | 21.1 | 117.8 | — | 255ms | 43.5 | — | 30 | 100 | 4.73s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | rag | 1 | 22.2 | 20.0 | 1031.2 | — | 931ms | 45.0 | — | 842 | 200 | 9.99s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | codegen | 1 | 21.8 | 21.6 | 195.0 | — | 375ms | 45.9 | — | 62 | 1000 | 46.39s | 0.000 GiB |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 1 | 21.7 | 21.1 | 957.3 | — | 626ms | 46.0 | — | 599 | 500 | 23.75s | 0.000 GiB |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | mixed_1024_16 | 1 | 20.6 | 20.6 | — | — | — | — | — | 1024 | 16 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | tg_128 | 1 | 19.6 | 19.6 | — | — | — | — | — | — | 128 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | mixed_4096_256 | 1 | 19.2 | 19.2 | — | — | — | — | — | 4096 | 256 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | mixed_2048_256 | 1 | 19.1 | 19.1 | — | — | — | — | — | 2048 | 256 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | mixed_2048_768 | 1 | 19.0 | 19.0 | — | — | — | — | — | 2048 | 768 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | mixed_1024_1024 | 1 | 19.0 | 19.0 | — | — | — | — | — | 1024 | 1024 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | mixed_64_1024 | 1 | 19.0 | 19.0 | — | — | — | — | — | 64 | 1024 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | mixed_16_1536 | 1 | 19.0 | 19.0 | — | — | — | — | — | 16 | 1536 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | tg_512 | 1 | 18.9 | 18.9 | — | — | — | — | — | — | 512 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | tg_1024 | 1 | 18.9 | 18.9 | — | — | — | — | — | — | 1024 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | mixed_384_1152 | 1 | 18.9 | 18.9 | — | — | — | — | — | 384 | 1152 | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | mixed_1280_3072 | 1 | 18.9 | 18.9 | — | — | — | — | — | 1280 | 3072 | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | mixed_2048_256 | 1 | 12.2 | 12.2 | — | — | — | — | — | 2048 | 256 | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | tg_128 | 1 | 12.2 | 12.2 | — | — | — | — | — | — | 128 | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | tg_512 | 1 | 12.2 | 12.2 | — | — | — | — | — | — | 512 | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | mixed_4096_256 | 1 | 12.2 | 12.2 | — | — | — | — | — | 4096 | 256 | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | mixed_2048_768 | 1 | 12.2 | 12.2 | — | — | — | — | — | 2048 | 768 | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | tg_1024 | 1 | 12.2 | 12.2 | — | — | — | — | — | — | 1024 | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | mixed_1024_1024 | 1 | 12.2 | 12.2 | — | — | — | — | — | 1024 | 1024 | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | mixed_64_1024 | 1 | 12.2 | 12.2 | — | — | — | — | — | 64 | 1024 | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | mixed_384_1152 | 1 | 12.2 | 12.2 | — | — | — | — | — | 384 | 1152 | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | mixed_16_1536 | 1 | 12.2 | 12.2 | — | — | — | — | — | 16 | 1536 | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | mixed_1280_3072 | 1 | 12.1 | 12.1 | — | — | — | — | — | 1280 | 3072 | — | — |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | chat | 1 | 12.1 | 11.7 | 85.5 | — | 352ms | 82.4 | — | 30 | 100 | 8.54s | 0.004 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | codegen | 1 | 12.1 | 12.0 | 146.3 | — | 435ms | 82.7 | — | 62 | 1000 | 83.14s | 0.017 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | agent | 1 | 12.1 | 12.0 | 2076.2 | — | 289ms | 82.8 | — | 599 | 500 | 41.70s | 0.010 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | agent | 4 | 12.1 | 5.0 | 12.6 | — | 62.69s | 82.8 | — | 599 | 500 | 104.09s | 0.037 GiB |
| legacy | stack comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · unified · drv 7 | llama.cpp rocm-4f13cb7 (rocm) | baseline | rag | 1 | 12.1 | 11.1 | 466.7 | — | 1.53s | 82.9 | — | 842 | 200 | 18.02s | 0.007 GiB |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | mixed_1024_16 | 1 | 12.0 | 12.0 | — | — | — | — | — | 1024 | 16 | — | — |
| legacy | stack comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 590 | llama.cpp 59778f0 (cuda) | baseline | agent | 4 | 11.4 | 10.0 | — | — | 3.93s | 87.5 | — | — | 341 | 33.98s | 0.040 GiB |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | pp_512 | 1 | — | — | 1401.6 | — | — | — | — | 512 | — | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | pp_1024 | 1 | — | — | 1430.4 | — | — | — | — | 1024 | — | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | pp_2048 | 1 | — | — | 1417.3 | — | — | — | — | 2048 | — | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2-pl450 | pp_4096 | 1 | — | — | 1405.1 | — | — | — | — | 4096 | — | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | pp_512 | 1 | — | — | 778.0 | — | — | — | — | 512 | — | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | pp_1024 | 1 | — | — | 797.7 | — | — | — | — | 1024 | — | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | pp_2048 | 1 | — | — | 791.8 | — | — | — | — | 2048 | — | — | — |
| raw | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | llama.cpp llama.cpp-3e12fbd (cuda) | raw-v4-r2 | pp_4096 | 1 | — | — | 786.6 | — | — | — | — | 4096 | — | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | pp_512 | 1 | — | — | 354.1 | — | — | — | — | 512 | — | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | pp_1024 | 1 | — | — | 354.9 | — | — | — | — | 1024 | — | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | pp_2048 | 1 | — | — | 351.5 | — | — | — | — | 2048 | — | — | — |
| raw | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | llama.cpp llama.cpp-4f13cb7 (rocm) | raw-v4-r2 | pp_4096 | 1 | — | — | 344.7 | — | — | — | — | 4096 | — | — | — |
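
Two of the numeric columns can be recomputed from the others. A minimal sketch, assuming (from the arithmetic, not from any label on the source page) that the 9th column is output tokens divided by total request time and the 13th is the per-token decode interval in ms, roughly 1000 divided by the 8th column. Both assumptions reproduce the single-stream baseline-pl-450w rows to within rounding; the concurrency-4 agent rows do not (500 / 31.84 s is 15.7, not 16.3), so they may be aggregated differently.

```python
from dataclasses import dataclass

@dataclass
class ApiRow:
    """One single-stream API row; the field names are this sketch's own labels."""
    workload: str
    decode_tps: float    # 8th column
    e2e_tps: float       # 9th column
    ms_per_token: float  # 13th column
    output_tokens: int   # 16th column
    total_s: float       # 17th column

# baseline-pl-450w rows at concurrency 1, copied from the table.
ROWS = [
    ApiRow("chat", 42.6, 37.4, 23.5, 100, 2.67),
    ApiRow("rag", 42.2, 35.8, 23.7, 200, 5.59),
    ApiRow("codegen", 42.1, 40.3, 23.7, 1000, 24.82),
    ApiRow("agent", 42.0, 39.2, 23.8, 500, 12.76),
]

def e2e_rate(row: ApiRow) -> float:
    """Output tokens over whole-request wall time (so prefill/TTFT is included)."""
    return row.output_tokens / row.total_s

def implied_ms_per_token(row: ApiRow) -> float:
    """Per-token decode interval implied by the decode rate."""
    return 1000.0 / row.decode_tps

for r in ROWS:
    print(f"{r.workload:8s} e2e {e2e_rate(r):5.2f} vs {r.e2e_tps}   "
          f"ms/token {implied_ms_per_token(r):5.2f} vs {r.ms_per_token}")
```

The gap between the decode and end-to-end rates is largest for short generations (chat), where TTFT is a bigger share of the request.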
Environment
GeForce RTX 3090 · 24 GiB · profile baseline-pl-350w
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 350 W / 450 W max (78% cap)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1965/2100 MHz · mem 9501 MHz
temp: 44°C idle · 65°C peak
peak draw: 329 W
hardware probes
copy: 42% of theory · FP16 peak: 65.4 TF · copy and math results flat across power caps
384-bit · 9751 MHz · 82 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
| power cap | theoretical bandwidth | measured copy bandwidth | FP16 peak | BF16 peak |
|---|---|---|---|---|
| 200 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
| 300 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.3 TF |
| 450 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
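
The "theory" column follows from bus width times transfer rate, and "copy" is then a fraction of it. A minimal sketch, assuming the 9751 MHz memory clock reported for the RTX 3090 is half the GDDR6X transfer rate (the doubling is what reproduces 936 GB/s) and that the Strix Halo figure of 8000 MHz is already the LPDDR5X transfer rate in MT/s:

```python
def theoretical_bandwidth_gbs(bus_bits: int, transfer_rate_mts: float) -> float:
    """Peak DRAM bandwidth: bytes moved per transfer times transfers per second."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000  # MB/s -> GB/s

# Assumption: 9751 MHz is half the GDDR6X transfer rate, i.e. 19502 MT/s.
rtx3090 = theoretical_bandwidth_gbs(384, 2 * 9751)
# Assumption: Strix Halo's 8000 MHz is already the LPDDR5X rate in MT/s.
strix_halo = theoretical_bandwidth_gbs(256, 8000)

for name, theory, copy in [("RTX 3090", rtx3090, 391), ("Strix Halo", strix_halo, 106)]:
    print(f"{name}: theory {theory:.0f} GB/s, copy {copy} GB/s ({copy / theory:.0%} of theory)")
```

This recovers the 42% and 41% "of theory" figures quoted in the two probe summaries.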
compute: 8.6
backend: llama.cpp cuda-4f13cb7 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
llama.cpp: version 18 (4f13cb7), built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
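
With streaming enabled, TTFT and decode rate can be read off chunk arrival times, and each cell then combines several runs after discarding warm-ups. A minimal sketch of that bookkeeping, not the harness's actual code: it assumes one generated token per streamed chunk, and assumes cells are averaged with a mean (the page does not say whether a mean or median is used). The functions `request_metrics` and `cell_summary` are hypothetical names.

```python
from statistics import mean

def request_metrics(t_start: float, chunk_times: list[float], n_output_tokens: int) -> dict:
    """Per-request timings from streamed-chunk arrival times (seconds).

    Assumption: each streamed chunk carries exactly one generated token.
    """
    ttft = chunk_times[0] - t_start
    total = chunk_times[-1] - t_start
    decode_window = chunk_times[-1] - chunk_times[0]
    # Tokens after the first one, over the time spent generating them.
    decode_tps = (n_output_tokens - 1) / decode_window if decode_window > 0 else float("nan")
    return {
        "ttft_s": ttft,
        "total_s": total,
        "decode_tps": decode_tps,
        "e2e_tps": n_output_tokens / total,
    }

def cell_summary(samples: list[dict], warmups: int = 2) -> dict:
    """Discard the first `warmups` samples and average each metric over the rest."""
    measured = samples[warmups:]
    return {key: mean(s[key] for s in measured) for key in measured[0]}

# Synthetic example: first token after 0.2 s, then one token every 25 ms.
times = [0.2 + i * 0.025 for i in range(10)]
print(request_metrics(0.0, times, 10))
```

With runs/cell 5 and warmups 2, a cell would collect seven samples and `cell_summary` would average the last five.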
GeForce RTX 3090 · 24 GiB · profile baseline-pl-450w
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1965/2100 MHz · mem 9501 MHz
temp: 43°C idle · 82°C peak
peak draw: 434 W
backend: llama.cpp cuda-4f13cb7 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: NVIDIA 590.48.01 + CUDA 13.1
libc: 2.39
python: 3.12.3
llama.cpp: version 18 (4f13cb7), built with GNU 13.3.0 for Linux x86_64
build flags: GGML_CUDA=ON CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
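
The 350 W and 450 W profiles share a build, so their peak draw and decode rate can be combined into a bound on energy per token. A minimal sketch, assuming each section's peak draw applies to its chat run; because average draw during a run cannot exceed the peak, peak watts divided by tokens per second over-estimates joules per token:

```python
def joules_per_token_upper_bound(peak_draw_w: float, decode_tps: float) -> float:
    """Energy per generated token, bounded above by peak board power / decode rate.

    Average draw during a run is at most the peak, so this over-estimates J/token.
    """
    return peak_draw_w / decode_tps

# Same build (cuda-4f13cb7), chat workload at concurrency 1, two power caps.
# Assumption: each section's peak draw applies to its chat run.
CONFIGS = {
    "baseline-pl-350w": (329.0, 41.7),
    "baseline-pl-450w": (434.0, 42.6),
}

for profile, (peak_w, tps) in CONFIGS.items():
    bound = joules_per_token_upper_bound(peak_w, tps)
    print(f"{profile}: {tps} tok/s, <= {bound:.2f} J/token")
```

On these numbers the 450 W cap buys about 2% more decode speed (42.6 vs 41.7 tok/s) for a peak draw roughly 32% higher (434 W vs 329 W).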
GeForce RTX 3090 · 24 GiB · profile baseline
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
backend: llama.cpp 59778f0 (cuda)
os: Ubuntu 24.04 LTS
kernel: 6.17.13-7-pve
driver: 590.48.01
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · profile baseline
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
pcie: Gen 4 x16 / Gen 4 x16 max
clocks: gfx 1106 MHz · mem 1000 MHz
temp: 48°C idle · 75°C peak
peak draw: 100 W
hardware probes
copy: 41% of theory · FP16 peak: 30.3 TF
256-bit · 8000 MHz · 20 SM/CU
| power cap | theoretical bandwidth | measured copy bandwidth | FP16 peak | BF16 peak |
|---|---|---|---|---|
| fixed | 256 GB/s | 106 GB/s | 30.3 TF | - |
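
Single-stream decode is usually limited by how fast the weights can be streamed from memory, which gives a rough ceiling to compare the measured tg_128 rates against. A minimal sketch, assuming a dense model whose full 15.66 GiB file is read once per generated token; it ignores KV-cache reads (which would lower the ceiling) and the fact that only one embedding row is gathered per token (which would raise it):

```python
GIB = 2**30

def decode_ceiling_tps(bandwidth_gbs: float, weights_gib: float) -> float:
    """Single-stream decode upper bound if each token reads every weight once."""
    bytes_per_token = weights_gib * GIB  # assumption: whole file streamed per token
    return bandwidth_gbs * 1e9 / bytes_per_token

WEIGHTS_GIB = 15.66  # Q4_K_M file size from the checkpoint header
# (device, theoretical GB/s from the probe tables, measured tg_128 tok/s)
DEVICES = [("RTX 3090 @ 450 W", 936.0, 42.2), ("Strix Halo", 256.0, 12.2)]

for name, bw, measured in DEVICES:
    ceiling = decode_ceiling_tps(bw, WEIGHTS_GIB)
    print(f"{name}: ceiling ~{ceiling:.1f} tok/s, measured {measured} tok/s "
          f"({measured / ceiling:.0%} of ceiling)")
```

On these figures the RTX 3090 to Strix Halo decode ratio (42.2 / 12.2, about 3.5) is close to the theoretical-bandwidth ratio (936 / 256, about 3.7), which is consistent with decode being bandwidth-bound on both devices.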
compute: 11.5
backend: llama.cpp rocm-4f13cb7 (rocm)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-2-pve
driver: ROCm 7.2.3
libc: 2.39
python: 3.12.3
llama.cpp: version 1 (4f13cb7), built with Clang 22.0.0 for Linux x86_64
build flags: GGML_HIP=ON AMDGPU_TARGETS=gfx1151 CMAKE_BUILD_TYPE=Release
runs/cell: 5
warmups: 2
endpoint: /v1/chat/completions
streaming: true
GeForce RTX 3090 · 24 GiB · profile raw-v4-r2-pl450
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
clocks: gfx 210 MHz · mem 405 MHz
temp: 39°C idle · 39°C peak
peak draw: 25 W
backend: llama.cpp llama.cpp-3e12fbd (cuda)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-4-pve
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.12.3
runs/cell: 3
warmups: 0
endpoint: llama-bench
streaming: false
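
The raw llama-bench rows isolate prefill (pp_*) and decode (tg_*) rates, so they can be combined into a naive latency estimate for the API workloads. A minimal sketch, assuming latency is just prompt tokens over the pp_512 rate plus output tokens over the tg_128 rate. The API rows were measured on a different llama.cpp build and driver (cuda-4f13cb7, driver 590) than these raw rows (3e12fbd, driver 595), so the gap between prediction and measurement mixes serving overhead with build differences.

```python
def predicted_latency_s(prompt_tokens: int, output_tokens: int,
                        pp_tps: float, tg_tps: float) -> float:
    """Naive single-request latency: prefill time plus decode time, no overhead."""
    return prompt_tokens / pp_tps + output_tokens / tg_tps

PP_TPS = 1401.6  # pp_512, raw-v4-r2-pl450
TG_TPS = 42.2    # tg_128, raw-v4-r2-pl450

# (workload, prompt tokens, output tokens, measured total from baseline-pl-450w rows)
WORKLOADS = [
    ("chat", 30, 100, 2.67),
    ("rag", 842, 200, 5.59),
    ("codegen", 62, 1000, 24.82),
    ("agent", 599, 500, 12.76),
]

for name, prompt, output, measured in WORKLOADS:
    pred = predicted_latency_s(prompt, output, PP_TPS, TG_TPS)
    print(f"{name:8s} predicted {pred:5.2f} s, measured {measured:5.2f} s")
```

Every prediction comes in below the measured total, so the engine-only rates form an optimistic lower bound for end-to-end API latency here.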
GeForce RTX 3090 · 24 GiB · profile raw-v4-r2
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
clocks: gfx 210 MHz · mem 405 MHz
temp: 39°C idle · 39°C peak
peak draw: 25 W
backend: llama.cpp llama.cpp-3e12fbd (cuda)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-4-pve
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.12.3
runs/cell: 3
warmups: 0
endpoint: llama-bench
streaming: false
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · profile raw-v4-r2
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
temp: 46°C idle · 46°C peak
peak draw: 16 W
backend: llama.cpp llama.cpp-4f13cb7 (rocm)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-2-pve
driver: amdgpu + ROCm 7.2.3
python: 3.12.3
runs/cell: 3
warmups: 0
endpoint: llama-bench
streaming: false