This is a simple demonstration of XProf's capabilities on an example training workload running on Cloud TPUs.
# Install the stable version of the profiler plugin
!pip install -U tensorboard_plugin_profile
# Clone the xprof repo to get the demo profile data it contains
!git clone https://github.com/openxla/xprof
# Load the TensorBoard notebook extension.
%load_ext tensorboard
# Launch TensorBoard and navigate to the Profile tab to view the performance profile
%tensorboard --logdir=xprof/demo
Once TensorBoard loads the profile data, use the Tools drop-down to explore the available tools. See the per-tool documentation pages for an explanation of what each one shows.
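If the Profile tab comes up empty, it can help to check which profile runs the logdir actually contains. The helper below is a minimal sketch, assuming the conventional layout `<logdir>/.../plugins/profile/<run>/<host>.xplane.pb` used by the profiler; the function name `find_profile_runs` is hypothetical and not part of XProf or TensorBoard, and the exact layout of `xprof/demo` is an assumption.

```python
import os
from collections import defaultdict


def find_profile_runs(logdir):
    """Hypothetical helper: map each run directory to the XPlane files in it.

    Assumes the layout <logdir>/.../plugins/profile/<run>/<host>.xplane.pb.
    """
    runs = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(logdir):
        parent = os.path.dirname(dirpath)
        # Only consider run directories sitting directly under plugins/profile.
        if os.path.basename(parent) != "profile":
            continue
        if os.path.basename(os.path.dirname(parent)) != "plugins":
            continue
        for name in filenames:
            if name.endswith(".xplane.pb"):
                runs[os.path.relpath(dirpath, logdir)].append(name)
    # Sort runs and file names so the listing is stable.
    return {run: sorted(files) for run, files in sorted(runs.items())}


if __name__ == "__main__":
    # Prints nothing if the directory is missing or holds no .xplane.pb files.
    for run, files in find_profile_runs("xprof/demo").items():
        print(run, files)
```

An empty result suggests that TensorBoard is pointed at the wrong directory or that the files use a different layout than assumed here.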