Determinism (GPU)
bookmark_border Stay organized with collections Save and categorize content based on your preferences.

On this page
Compilation
Execution

Compilation

XLA compilation is deterministic if persisted autotuning is used to perform autotuning once and avoid it in subsequent compilations. Otherwise due to fluctuations in measurements different kernels can be picked as the fastest ones in different compilation runs.

--xla_gpu_require_complete_aot_autotune_results can be used to ensure that no autotuning happens on repeated compilations - they either reuse compatible results of previous runs or fail.

Execution

Programs compiled by XLA can be non-deterministic on operations like scatter, select-and-scatter, GEMMs, convolutions, multi-headed attention. The flag --xla_gpu_exclude_nondeterministic_ops switches these operations to deterministic and potentially slower implementations and makes compilation fail on select-and-scatter which does not have a deterministic implementaiton.

Determinism (GPU) bookmark_borderbookmark Stay organized with collections Save and categorize content based on your preferences.

Compilation

Execution

Determinism (GPU)
bookmark_border Stay organized with collections Save and categorize content based on your preferences.