GPU Kernel Stats Tool
You can use the GPU Kernel Stats tool to see performance statistics and the originating framework operation for every GPU-accelerated kernel that was launched during a profiling session. This tool helps identify potential bottlenecks at a low level and surfaces optimization opportunities.
Supported Platforms
GPU: Supported
TPU: Not supported
Interface Components
The GPU Kernel Stats tool interface is a table with one row for each unique kernel-framework operation pair. You can click any of the column headings to sort the table accordingly. The default order is based on the total duration of the kernel-op pair.
The table includes the following information for each kernel-op pair:
- Kernel name: The name of the kernel that was launched.
- Registers per thread: The number of GPU registers used by the kernel per thread.
- Shared memory used: The total size of shared memory used by the kernel in bytes.
- Block dimension: The dimensions of the thread block expressed as blockDim.x, blockDim.y, blockDim.z.
- Grid dimensions: The dimensions of the grid of thread blocks expressed as gridDim.x, gridDim.y, gridDim.z.
- Theoretical occupancy: The theoretical occupancy of the GPU expressed as a percentage. This is an indication of the ability to hide memory latency.
- Kernel uses Tensor Core: A heuristic indicating whether the kernel itself contains Tensor Core instructions, based on the presence of common Tensor Core instructions.
- Tensor Cores eligibility: A heuristic indicating whether the originating framework operation is eligible to use Tensor Cores, based on commonly occurring ops that employ the Tensor Core.
- Op name: The name of the framework operation that launched this kernel.
- Occurrences: The number of times this specific kernel-operation pair was executed during the profiling period.
- Total duration (us): The cumulative sum of the execution time of all occurrences of this kernel-operation pair.
- Average duration (us): The average execution time across all occurrences of this kernel-operation pair.
- Minimum duration (us): The shortest execution time observed for this kernel-operation pair.
- Maximum duration (us): The longest execution time observed for this kernel-operation pair.