XLA compilation is deterministic if persisted autotuning is used to run
autotuning once and reuse its results in subsequent compilations. Otherwise,
due to fluctuations in measurements, different kernels can be picked as the
fastest ones in different compilation runs.
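A minimal sketch of the first, autotuning run from a JAX frontend is shown
below. The dump/load flag names (`--xla_gpu_dump_autotune_results_to`,
`--xla_gpu_load_autotune_results_from`) and the file path are assumptions taken
from the persisted-autotuning workflow rather than from this page; `XLA_FLAGS`
must be set before the backend is initialized.

```python
import os

# Sketch of the first run, which records autotuning results to a file so later
# compilations can reuse them. Flag names and path are assumptions; adjust
# them to your XLA version.
os.environ["XLA_FLAGS"] = "--xla_gpu_dump_autotune_results_to=/tmp/autotune_results.pb"

import jax
import jax.numpy as jnp

@jax.jit
def matmul(a, b):
    return a @ b  # GEMM whose kernel choice is autotuned

x = jnp.ones((1024, 1024), dtype=jnp.float32)
print(matmul(x, x).sum())

# Subsequent runs would instead set
# --xla_gpu_load_autotune_results_from=/tmp/autotune_results.pb
# so that no new measurements are taken.
```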
The flag --xla_gpu_require_complete_aot_autotune_results can be used to ensure
that no autotuning happens on repeated compilations: they either reuse
compatible results from previous runs or fail.
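A sketch of such a subsequent, fully reproducible compilation follows. The
load flag name and the file path are assumptions carried over from the sketch
above; the require flag is the one documented here, and the `=true` boolean
syntax is an assumption.

```python
import os

# Sketch: reuse persisted autotuning results and require that they are
# complete, so no new autotuning measurements can happen.
os.environ["XLA_FLAGS"] = (
    "--xla_gpu_load_autotune_results_from=/tmp/autotune_results.pb "
    "--xla_gpu_require_complete_aot_autotune_results=true"
)

import jax
import jax.numpy as jnp

# Compilation now either reuses a compatible persisted result for every
# autotuned kernel or fails, so repeated compilations stay deterministic.
print(jax.jit(lambda a: a @ a)(jnp.ones((512, 512))).sum())
```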
Execution
Programs compiled by XLA can be non-deterministic for operations such as
scatter, select-and-scatter, GEMMs, convolutions, and multi-headed attention.
The flag --xla_gpu_exclude_nondeterministic_ops switches these operations to
deterministic and potentially slower implementations, and makes compilation
fail on select-and-scatter, which does not have a deterministic implementation.
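As a sketch of how the flag might be applied from JAX (the frontend and the
example computation are assumptions, not part of this page), the option can be
passed through XLA_FLAGS; segment_sum lowers to a scatter, one of the
operations listed above.

```python
import os

# Sketch: request deterministic, potentially slower implementations for the
# non-deterministic operations listed above. Must be set before the GPU
# backend is initialized.
os.environ["XLA_FLAGS"] = "--xla_gpu_exclude_nondeterministic_ops=true"

import jax
import jax.numpy as jnp

# segment_sum lowers to a scatter, one of the affected operations; with the
# flag set, repeated runs give bitwise-identical results.
data = jnp.arange(8, dtype=jnp.float32)
segment_ids = jnp.array([0, 0, 1, 1, 2, 2, 3, 3])
print(jax.ops.segment_sum(data, segment_ids, num_segments=4))
```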
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2024-08-03 UTC."],[],[]]