Determinism (GPU)

Compilation

XLA compilation is deterministic if persisted autotuning is used to perform autotuning once and avoid it in subsequent compilations. Otherwise due to fluctuations in measurements different kernels can be picked as the fastest ones in different compilation runs.

--xla_gpu_require_complete_aot_autotune_results can be used to ensure that no autotuning happens on repeated compilations - they either reuse compatible results of previous runs or fail.

Execution

Programs compiled by XLA can be non-deterministic on operations like scatter, select-and-scatter, GEMMs, convolutions, multi-headed attention. The flag --xla_gpu_exclude_nondeterministic_ops switches these operations to deterministic and potentially slower implementations and makes compilation fail on select-and-scatter which does not have a deterministic implementaiton.