XProf: Accelerator Performance Analysis

XProf is a profiling and performance analysis tool for machine learning.

Features

  • High-quality profile information based on hardware events, hardware counters, and compiler metadata.
  • Low collection overhead, typically <1% on TPUs and <5% on GPUs during the profiling period.
  • Broad suite of tools providing a deep understanding of your workload:
    • Overview Page: See an aggregated top-level view of how your model performed during a profile run, including how well it utilized hardware resources.
    • Trace Viewer: Visualize a detailed timeline of events that occurred, and which part of the system executed them (e.g., CPU, TPU, or GPU).
    • Graph Viewer: Visualize the graph structure of your XLA program. It displays the High Level Operations (HLO) graph.
    • Memory Viewer: Visualize memory usage over the program's lifetime, and inspect the contents of memory at the point of peak usage.
    • Memory Profile: Visualize the dynamic memory usage of your accelerators during the execution of your program.
    • HLO Op Profile: Understand hardware performance for different categories of HLO ops executed by your program.
    • HLO Op Stats: See the performance statistics of HLO ops executed by your program, and identify the most time-consuming operations within your HLO graph.
    • Framework Op Stats: See the performance statistics of framework-level operations (e.g., JAX, TensorFlow, or PyTorch/XLA) executed on the host and accelerator.
    • Roofline Analysis: See an intuitive visual performance model that shows inherent hardware limitations impacting your program's performance, indicating whether it is memory-bound or compute-bound.
    • Megascale Stats: Analyze inter-slice communication performance of workloads spanning multiple TPU slices that communicate across the Data Center Network (DCN).
    • GPU Kernel Stats: See performance statistics and the originating framework operation for every GPU-accelerated kernel in your program.
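
To make the memory-bound versus compute-bound distinction from the Roofline Analysis entry concrete, here is a minimal sketch of the textbook roofline model in plain Python. It is not XProf's implementation and does not use any XProf API. The Device class, the helper functions, and the peak numbers are all hypothetical and chosen only for illustration. XProf derives its roofline from measured profile data rather than from estimates like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical accelerator described by two peak rates (illustrative only)."""
    peak_flops: float      # peak compute throughput, FLOP/s
    peak_bandwidth: float  # peak memory bandwidth, bytes/s


def ridge_point(device: Device) -> float:
    """Operational intensity (FLOP/byte) at which the two roofs meet."""
    return device.peak_flops / device.peak_bandwidth


def attainable_flops(device: Device, intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(device.peak_flops, intensity * device.peak_bandwidth)


def classify(device: Device, flops: float, bytes_moved: float) -> str:
    """Label an op by comparing its intensity with the ridge point."""
    intensity = flops / bytes_moved
    return "compute-bound" if intensity >= ridge_point(device) else "memory-bound"


# Assumed numbers: 100 TFLOP/s compute, 1 TB/s memory bandwidth.
device = Device(peak_flops=100e12, peak_bandwidth=1e12)

n = 4096
bytes_per_elem = 4  # float32

# Square matmul: about 2*n^3 FLOPs, reads two n*n matrices and writes one.
matmul_flops = 2 * n**3
matmul_bytes = 3 * n * n * bytes_per_elem

# Elementwise add: n*n FLOPs with the same memory traffic as above.
add_flops = n * n
add_bytes = 3 * n * n * bytes_per_elem

print("matmul:", classify(device, matmul_flops, matmul_bytes))
print("add:   ", classify(device, add_flops, add_bytes))
```

In this sketch the matmul has a much higher operational intensity than the ridge point, so the compute roof limits it, while the elementwise add moves many bytes per FLOP and is limited by memory bandwidth.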

Getting Started

For installation instructions, see the XProf Quick Start.

If you use Google Cloud to run your workloads, we recommend the xprofiler tool. It provides a streamlined profile collection and viewing experience using VMs running XProf.

To get a quick demo of XProf capabilities, try the demo notebook.

TensorBoard Integration

Historically, the only way to install and use XProf was through TensorBoard. This integration was called the tensorboard plugin profile, and some older documentation might still use that term. The integration is now optional: TensorBoard acts as a container for the XProf suite of tools, which can also be installed and used standalone with identical behavior.