This is a simple demonstration of XProf's capabilities on an example training workload running on Cloud TPUs.
# Install the stable version of the profiler plugin
pip install -U tensorboard_plugin_profile
# Update protobuf version in the environment
pip install -U protobuf
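Before launching TensorBoard, you may want to confirm that both upgrades took effect. The sketch below queries the installed distribution versions with the standard-library `importlib.metadata`. It reads package metadata from disk, so it reports what pip installed even if the running kernel imported an older module earlier. The helper name `installed_versions` is hypothetical and is not part of XProf.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}.

    Hypothetical helper: it only reads installed package metadata.
    """
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            result[name] = None
    return result


# Distribution names as passed to pip in the cells above.
print(installed_versions(["tensorboard_plugin_profile", "protobuf"]))
```

If either entry prints `None`, the corresponding `pip install` did not land in the environment the kernel is using.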
# Clone the xprof repo to get access to the demo profile data it contains
git clone http://github.com/openxla/xprof
Cloning into 'xprof'...
warning: redirecting to https://github.com/openxla/xprof/
remote: Enumerating objects: 14319, done.
remote: Counting objects: 100% (3038/3038), done.
remote: Compressing objects: 100% (791/791), done.
remote: Total 14319 (delta 2528), reused 2254 (delta 2247), pack-reused 11281 (from 3)
Receiving objects: 100% (14319/14319), 52.69 MiB | 36.63 MiB/s, done.
Resolving deltas: 100% (10777/10777), done.
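TensorBoard's profile plugin conventionally reads captures from `plugins/profile/<run>/` directories under the log directory, and each run directory holds one or more `*.xplane.pb` files. The sketch below lists the runs that the Profile tab should offer for `xprof/demo`. It assumes the demo data follows that layout, which is not verified here. `list_profile_runs` is a hypothetical helper, not an XProf API.

```python
from pathlib import Path


def list_profile_runs(logdir):
    """Map each <...>/plugins/profile/<run> directory to its .xplane.pb files.

    Assumes the conventional layout <logdir>/.../plugins/profile/<run>/<host>.xplane.pb.
    Returns an empty dict if logdir does not exist.
    """
    root = Path(logdir)
    if not root.is_dir():
        return {}
    runs = {}
    for xplane in sorted(root.rglob("*.xplane.pb")):
        run_dir = xplane.parent
        # Only count files that sit directly inside plugins/profile/<run>/.
        if run_dir.parent.name == "profile" and run_dir.parent.parent.name == "plugins":
            key = run_dir.relative_to(root).as_posix()
            runs.setdefault(key, []).append(xplane.name)
    return runs


for run, files in list_profile_runs("xprof/demo").items():
    print(run, files)
```

An empty result suggests either that the clone did not finish or that the demo data uses a different layout than the one assumed here.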
# Load the TensorBoard notebook extension.
%load_ext tensorboard
# Launch TensorBoard, then open the Profile tab to view the performance profile
%tensorboard --logdir=xprof/demo
Once TensorBoard loads the profile data, use the Tools drop-down to explore the available tools. See the per-tool documentation pages for an explanation of what each tool shows.