Hardware Probe

All runs (34)

probe | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | cap 0 W | unified | drv 7 | hardware-probe v3 (rocm) | baseline | static_gpu | 1
probe | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | cap 0 W | unified | drv 7 | hardware-probe v3 (rocm) | baseline | torch_device_properties | 1
probe | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | cap 0 W | unified | drv 7 | hardware-probe v3 (rocm) | baseline | hip_device_copy | 1
probe | hardware comparable | Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM) | cap 0 W | unified | drv 7 | hardware-probe v3 (rocm) | baseline | hip_rocblas_hgemm_fp16 | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | cap 200 W | drv 595 | hardware-probe v2 (cuda) | baseline | static_gpu | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | cap 200 W | drv 595 | hardware-probe v2 (cuda) | baseline | torch_device_properties | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | cap 200 W | drv 595 | hardware-probe v2 (cuda) | baseline | torch_device_copy | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | cap 200 W | drv 595 | hardware-probe v2 (cuda) | baseline | torch_matmul_fp16 | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | cap 200 W | drv 595 | hardware-probe v2 (cuda) | baseline | torch_matmul_bf16 | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | cap 200 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | static_gpu | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | cap 200 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_device_properties | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | cap 200 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_device_copy | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | cap 200 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_matmul_fp16 | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | cap 200 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_matmul_bf16 | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | 450 W max | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-450w | static_gpu | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | 450 W max | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-450w | torch_device_properties | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | 450 W max | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-450w | torch_device_copy | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | 450 W max | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-450w | torch_matmul_fp16 | 1
probe | hardware comparable | GeForce RTX 3090 · 24 GiB | 450 W max | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-450w | torch_matmul_bf16 | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 250 W | drv 595 | hardware-probe v2 (cuda) | baseline | static_gpu | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 250 W | drv 595 | hardware-probe v2 (cuda) | baseline | torch_device_properties | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 250 W | drv 595 | hardware-probe v2 (cuda) | baseline | torch_device_copy | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 250 W | drv 595 | hardware-probe v2 (cuda) | baseline | torch_matmul_fp16 | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 250 W | drv 595 | hardware-probe v2 (cuda) | baseline | torch_matmul_bf16 | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 200 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | static_gpu | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 200 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_device_properties | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 200 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_device_copy | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 200 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_matmul_fp16 | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 200 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-200w | torch_matmul_bf16 | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 250 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-250w | static_gpu | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 250 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-250w | torch_device_properties | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 250 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-250w | torch_device_copy | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 250 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-250w | torch_matmul_fp16 | 1
probe | hardware comparable | GeForce RTX 5070 · 12 GiB | cap 250 W | drv 595 | hardware-probe torch-2.11.0-cu128 (cuda) | pl-250w | torch_matmul_bf16 | 1

Environment

Strix Halo · Radeon 8060S · 128 GiB unified (96 GiB VRAM)
cpu: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
gpu: AMD Radeon 8060S
arch: Strix Halo (gfx1151)
vram: 96 GiB (system 31.1 GiB, unified)
temp: 46°C idle · 46°C peak
peak draw: 16 W
hardware probes
copy 41% of theory · FP16 peak 30.3 TF
256-bit · 8000 MHz · 20 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
cap | theory | copy | fp16 | bf16
fixed | 256 GB/s | 106 GB/s | 30.3 TF | -
compute: 11.5
backend: hardware-probe v3 (rocm)
os: Ubuntu 24.04.4 LTS
kernel: 7.0.2-2-pve
driver: amdgpu + ROCm 7.12.0
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
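
The page does not say how the "theory" column or the "copy N% of theory" badge is derived. The sketch below is one plausible reconstruction: peak bandwidth as bus width in bytes times the memory transfer rate, and efficiency as measured copy over that peak. The function names and the `transfers_per_clock` parameter are hypothetical; the multiplier of 2 for the two GeForce cards is an assumption chosen because it makes their listed memory clocks line up with the listed theory values.

```python
def theoretical_bandwidth_gbs(bus_width_bits: int,
                              mem_clock_mhz: float,
                              transfers_per_clock: int = 1) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = mem_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


def copy_efficiency_pct(measured_gbs: float, theory_gbs: float) -> float:
    """Measured copy bandwidth as a percentage of the theoretical peak."""
    return 100.0 * measured_gbs / theory_gbs


# (bus width in bits, listed memory MHz, assumed multiplier, measured copy GB/s)
# The first, second, and fourth values come from the cards on this page.
CARDS = {
    "Radeon 8060S": (256, 8000, 1, 106),
    "RTX 3090": (384, 9751, 2, 391),
    "RTX 5070": (192, 14001, 2, 271),
}

for name, (bits, mhz, mult, copy_gbs) in CARDS.items():
    theory = theoretical_bandwidth_gbs(bits, mhz, mult)
    pct = copy_efficiency_pct(copy_gbs, theory)
    print(f"{name}: theory {theory:.0f} GB/s, copy {pct:.0f}% of theory")
```

Under these assumptions the rounded results agree with the theory and copy figures shown on the three cards that carry a hardware probes table.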
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
clocks: gfx 210 MHz · mem 405 MHz
temp: 34°C idle · 34°C peak
peak draw: 25 W
hardware probes
copy 42% of theory · FP16 peak 65.4 TF · copy/math flat across caps
384-bit · 9751 MHz · 82 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
cap | theory | copy | fp16 | bf16
200 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF
300 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.3 TF
450 W | 936 GB/s | 391 GB/s | 65.4 TF | 65.4 TF
compute: 8.6
backend: hardware-probe v2 (cuda)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-4-pve
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
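
The `runs/cell: 5` and `warmups: 2` fields suggest each cell is measured with two untimed warmup iterations followed by five timed ones. The following is a minimal sketch of that pattern together with the usual conversion from matmul time to TFLOPS (2·M·N·K floating-point operations per multiply). `time_cell` and `matmul_tflops` are hypothetical names, and the use of the median is an assumption, since the page does not say how the probe aggregates its runs.

```python
import statistics
import time
from typing import Callable, List


def time_cell(fn: Callable[[], object], warmups: int = 2, runs: int = 5) -> List[float]:
    """Call fn `warmups` times untimed, then return `runs` wall-clock samples in seconds."""
    for _ in range(warmups):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        # Note: for GPU work, fn must block until the device has finished
        # (for example by calling torch.cuda.synchronize() inside fn),
        # otherwise only the kernel launch is timed.
        samples.append(time.perf_counter() - start)
    return samples


def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput of an (m x k) @ (k x n) matmul: 2*m*n*k FLOPs over elapsed time."""
    return 2.0 * m * n * k / seconds / 1e12


# Stand-in CPU workload so the sketch runs anywhere; a real probe would
# time a device matmul or a device-to-device copy here instead.
samples = time_cell(lambda: sum(range(10_000)))
print(f"median of {len(samples)} runs: {statistics.median(samples) * 1e6:.1f} us")
```

As a sanity check on the formula, an 8192 x 8192 x 8192 FP16 matmul would have to finish in roughly 16.8 ms to report the 65.4 TF shown for the RTX 3090.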
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 200 W / 450 W max (44% cap)
clocks: gfx 210 MHz · mem 405 MHz
temp: 34°C idle · 34°C peak
peak draw: 24 W
backend: hardware-probe torch-2.11.0-cu128 (cuda)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-4-pve
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
GeForce RTX 3090 · 24 GiB
cpu: AMD EPYC 7302P 16-Core Processor
gpu: NVIDIA GeForce RTX 3090
arch: NVIDIA
vram: 24 GiB (system 64.0 GiB)
power: 450 W / 450 W max
clocks: gfx 765 MHz · mem 810 MHz
temp: 36°C idle · 36°C peak
peak draw: 74 W
backend: hardware-probe torch-2.11.0-cu128 (cuda)
os: Ubuntu 24.04 LTS
kernel: 7.0.2-4-pve
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.12.3
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
GeForce RTX 5070 · 12 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 250 W / 300 W max (83% cap)
clocks: gfx 180 MHz · mem 405 MHz
temp: 30°C idle · 30°C peak
peak draw: 3 W
hardware probes
copy 40% of theory · FP16 peak 69.6 TF · copy/math spread 2.5%
192-bit · 14001 MHz · 48 SM/CU
Microbenchmarks for memory copy and tensor math; raw-engine decode and API workload rows measure model-serving speed.
cap | theory | copy | fp16 | bf16
200 W | 672 GB/s | 271 GB/s | 67.9 TF | 68.4 TF
250 W | 672 GB/s | 271 GB/s | 69.5 TF | 68.2 TF
300 W | 672 GB/s | 270 GB/s | 69.6 TF | 68.4 TF
compute: 12
backend: hardware-probe v2 (cuda)
os: CachyOS
kernel: 7.0.8-1-cachyos
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.14.4
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
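
The "copy/math spread 2.5%" and "copy/math flat across caps" badges are not defined on the page. One reading that is consistent with the per-cap tables is: for each metric column, take (max - min) / min across the power caps, then report the largest such value. The sketch below implements that reading; the function names and the choice of the minimum as denominator are assumptions.

```python
from typing import Dict, List


def spread_pct(values: List[float]) -> float:
    """Relative spread of one metric across power caps, as a percentage of the minimum."""
    lo, hi = min(values), max(values)
    return 100.0 * (hi - lo) / lo


def worst_spread_pct(columns: Dict[str, List[float]]) -> float:
    """Largest per-column spread across the copy, fp16, and bf16 columns."""
    return max(spread_pct(vals) for vals in columns.values())


# Values copied from the per-cap tables on this page (GB/s for copy, TF for math).
rtx_5070 = {
    "copy": [271, 271, 270],
    "fp16": [67.9, 69.5, 69.6],
    "bf16": [68.4, 68.2, 68.4],
}
rtx_3090 = {
    "copy": [391, 391, 391],
    "fp16": [65.4, 65.4, 65.4],
    "bf16": [65.4, 65.3, 65.4],
}

print(f"RTX 5070 worst spread: {worst_spread_pct(rtx_5070):.1f}%")
print(f"RTX 3090 worst spread: {worst_spread_pct(rtx_3090):.1f}%")
```

With these inputs the RTX 5070 spread is driven by the fp16 column and rounds to the 2.5% shown on its card, while the RTX 3090 stays below 0.2%, which fits the "flat" label. Whatever threshold the page uses to decide "flat" is not stated.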
GeForce RTX 5070 · 12 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 200 W / 300 W max (67% cap)
clocks: gfx 180 MHz · mem 405 MHz
temp: 30°C idle · 30°C peak
peak draw: 1 W
backend: hardware-probe torch-2.11.0-cu128 (cuda)
os: CachyOS
kernel: 7.0.8-1-cachyos
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.14.4
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false
GeForce RTX 5070 · 12 GiB
cpu: AMD Ryzen 9 7900 12-Core Processor
gpu: NVIDIA GeForce RTX 5070
arch: NVIDIA
vram: 11.94 GiB (system 30.4 GiB)
power: 250 W / 300 W max (83% cap)
clocks: gfx 2535 MHz · mem 13801 MHz
temp: 31°C idle · 31°C peak
peak draw: 20 W
backend: hardware-probe torch-2.11.0-cu128 (cuda)
os: CachyOS
kernel: 7.0.8-1-cachyos
driver: NVIDIA 595.71.05 + CUDA 13.2
python: 3.14.4
runs/cell: 5
warmups: 2
endpoint: driver+torch
streaming: false