Hardware Probe
All runs (34)
| probe | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | hardware-probe v3 (rocm) | baseline | static_gpu | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | hardware-probe v3 (rocm) | baseline | torch_device_properties | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | hardware-probe v3 (rocm) | baseline | hip_device_copy | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) · cap 0 W · unified · drv 7 | hardware-probe v3 (rocm) | baseline | hip_rocblas_hgemm_fp16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | hardware-probe v2 (cuda) | baseline | static_gpu | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | hardware-probe v2 (cuda) | baseline | torch_device_properties | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | hardware-probe v2 (cuda) | baseline | torch_device_copy | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | hardware-probe v2 (cuda) | baseline | torch_matmul_fp16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | hardware-probe v2 (cuda) | baseline | torch_matmul_bf16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | static_gpu | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_device_properties | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_device_copy | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_matmul_fp16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · cap 200 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_matmul_bf16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-450w | static_gpu | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-450w | torch_device_properties | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-450w | torch_device_copy | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-450w | torch_matmul_fp16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 3090 · 24 GiB · 450 W max · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-450w | torch_matmul_bf16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | hardware-probe v2 (cuda) | baseline | static_gpu | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | hardware-probe v2 (cuda) | baseline | torch_device_properties | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | hardware-probe v2 (cuda) | baseline | torch_device_copy | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | hardware-probe v2 (cuda) | baseline | torch_matmul_fp16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | hardware-probe v2 (cuda) | baseline | torch_matmul_bf16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 200 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | static_gpu | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 200 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_device_properties | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 200 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_device_copy | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 200 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_matmul_fp16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 200 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_matmul_bf16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-250w | static_gpu | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-250w | torch_device_properties | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-250w | torch_device_copy | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-250w | torch_matmul_fp16 | 1 | — | — | — | — | — | — | — | — | — | — | |
| probe | hardware comparable | GeForce RTX 5070 · 12 GiB · cap 250 W · drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-250w | torch_matmul_bf16 | 1 | — | — | — | — | — | — | — | — | — | — | |
Environment
Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
temp: 46°C idle · 46°C peak
peak draw: 16 W
hardware probes
copy 41% of theory · FP16 peak 30.3 TF
256-bit · 8000 MHz · 20 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| fixed | 256 GB/s | 106 GB/s | 30.3 TF | - |
compute: 11.5
backend: hardware-probe v3 (rocm)
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
driver: amdgpu + ROCm 7.12.0
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
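The "copy N% of theory" summary on each card appears to be the measured copy bandwidth divided by the theoretical bandwidth from that card's table. The sketch below recomputes the ratio from the figures reported on this page; the function name `copy_efficiency` is hypothetical and not part of the probe tool.

```python
def copy_efficiency(measured_gbps: float, theory_gbps: float) -> float:
    """Return measured copy bandwidth as a fraction of the theoretical peak."""
    if theory_gbps <= 0:
        raise ValueError("theoretical bandwidth must be positive")
    return measured_gbps / theory_gbps


# (theory GB/s, measured copy GB/s) as reported in the tables on this page.
cards = {
    "Radeon 8060S": (256.0, 106.0),
    "RTX 3090": (936.0, 391.0),
    "RTX 5070": (672.0, 271.0),
}

for name, (theory, copy) in cards.items():
    # Rounded to whole percent, this matches the 41%, 42% and 40% summaries.
    print(f"{name}: copy {copy_efficiency(copy, theory):.0%} of theory")
```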
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
clocks: gfx 210 MHz · mem 405 MHz
temp: 34°C idle · 34°C peak
peak draw: 25 W
hardware probes
copy 42% of theory · FP16 peak 65.4 TF · copy/math flat across caps
384-bit · 9751 MHz · 82 SM/CU
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| 200 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
| 300 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.3 TF |
| 450 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF |
compute: 8.6
backend: hardware-probe v2 (cuda)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-4-pve
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
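The "theory" column can be reproduced from the bus width and memory clock shown on each card. The sketch below is an inference from the page's numbers, not the probe's documented formula: the `transfers_per_clock` parameter is an assumption, set to 2 for the two GeForce cards and 1 for the Radeon 8060S because those values match the reported 936, 672 and 256 GB/s.

```python
def theoretical_bandwidth_gbps(
    bus_width_bits: int,
    mem_clock_mhz: float,
    transfers_per_clock: int = 1,
) -> float:
    """Peak memory bandwidth in GB/s from bus width and memory clock.

    transfers_per_clock is an assumed multiplier chosen to match the
    figures on this page; it is not taken from the probe's source.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = mem_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Bus width and memory clock as reported on the cards on this page.
rtx3090 = theoretical_bandwidth_gbps(384, 9751, transfers_per_clock=2)
rtx5070 = theoretical_bandwidth_gbps(192, 14001, transfers_per_clock=2)
radeon8060s = theoretical_bandwidth_gbps(256, 8000, transfers_per_clock=1)

print(f"RTX 3090: {rtx3090:.0f} GB/s")
print(f"RTX 5070: {rtx5070:.0f} GB/s")
print(f"Radeon 8060S: {radeon8060s:.0f} GB/s")
```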
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
clocks: gfx 210 MHz · mem 405 MHz
temp: 34°C idle · 34°C peak
peak draw: 24 W
backend: hardware-probe torch-2.11.0-cu128 (cuda)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-4-pve
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
clocks: gfx 765 MHz · mem 810 MHz
temp: 36°C idle · 36°C peak
peak draw: 74 W
backend: hardware-probe torch-2.11.0-cu128 (cuda)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-4-pve
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
GeForce RTX 5070 · 12 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 250 W / 300 W max (83% cap)
clocks: gfx 180 MHz · mem 405 MHz
temp: 30°C idle · 30°C peak
peak draw: 3 W
hardware probes
copy 40% of theory · FP16 peak 69.6 TF · copy/math spread 2.5%
192-bit · 14001 MHz · 48 SM/CU
| cap | theory | copy | fp16 | bf16 |
|---|---|---|---|---|
| 200 W | 672 GB/s | 271 GB/s | 67.9 TF | 68.4 TF |
| 250 W | 672 GB/s | 271 GB/s | 69.5 TF | 68.2 TF |
| 300 W | 672 GB/s | 270 GB/s | 69.6 TF | 68.4 TF |
compute: 12
backend: hardware-probe v2 (cuda)
os: CachyOS
kernel: 7.0.8-1-cachyos
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.14.4
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
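The page does not define the "copy/math spread" figure on this card. One reading that reproduces the reported 2.5% is the largest relative gap, (max - min) / min, across power caps for any single metric. The sketch below assumes that definition; `relative_spread` is a hypothetical helper, and the values are the RTX 5070 rows from the table above.

```python
def relative_spread(values: list[float]) -> float:
    """Return (max - min) / min for one metric measured at several power caps."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("values must be positive")
    return (hi - lo) / lo


# RTX 5070 results at the 200 W, 250 W and 300 W caps, from the table above.
rtx5070 = {
    "copy (GB/s)": [271.0, 271.0, 270.0],
    "fp16 (TF)": [67.9, 69.5, 69.6],
    "bf16 (TF)": [68.4, 68.2, 68.4],
}

per_metric = {name: relative_spread(vals) for name, vals in rtx5070.items()}
worst = max(per_metric.values())

for name, spread in per_metric.items():
    print(f"{name}: {spread:.2%}")
print(f"largest spread: {worst:.1%}")
```

Under this definition the FP16 column dominates, so a cap-insensitive card shows a spread close to zero in all three metrics.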
GeForce RTX 5070 · 12 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 200 W / 300 W max (67% cap)
clocks: gfx 180 MHz · mem 405 MHz
temp: 30°C idle · 30°C peak
peak draw: 1 W
backend: hardware-probe torch-2.11.0-cu128 (cuda)
os: CachyOS
kernel: 7.0.8-1-cachyos
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.14.4
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
GeForce RTX 5070 · 12 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 250 W / 300 W max (83% cap)
clocks: gfx 2535 MHz · mem 13801 MHz
temp: 31°C idle · 31°C peak
peak draw: 20 W
backend: hardware-probe torch-2.11.0-cu128 (cuda)
os: CachyOS
kernel: 7.0.8-1-cachyos
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.14.4
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false