Effort Levels

XLA provides options to control the amount of effort the compiler will expend to:

- optimize for runtime performance, and
- make the program "fit in memory" (which has a platform-dependent meaning).

Optimization Level

Similar to the -O flags in gcc or clang, the optimization level lets the user influence how much work the compiler does to optimize for execution time. It can be set via the optimization_level field of the ExecutableBuildOptionsProto message, or the optimization_level field of the ExecutionOptions message.

Lower optimization levels cause various HLO passes to behave differently, typically doing less work, and may disable certain HLO passes entirely. The optimization level may also influence the compiler backend, so the exact effect of this field depends on the target platform. As a general guideline, however, the following table describes the expected overall effect of each value:

Level	Use Case
EFFORT_O0	Fastest compilation, slowest runtime
EFFORT_O1	Faster compilation with reasonable runtime
EFFORT_O2	Strongly prioritize runtime (suitable default for production workloads)
EFFORT_O3	Expensive or experimental optimizations

Use in XLA:GPU

In XLA:GPU, several passes are disabled by default because they significantly increase compilation time by increasing the HLO size. For convenience, they are consolidated under the optimization level option, such that setting optimization_level to EFFORT_O1 or above leads to the following behavior:

- Collectives commonly used for data-parallel communication will be pipelined. This behavior can also be controlled more granularly by enabling individual flags:
  - xla_gpu_enable_pipelined_all_gather
  - xla_gpu_enable_pipelined_all_reduce
  - xla_gpu_enable_pipelined_reduce_scatter
- While loops will be unrolled by a factor of two. This breaks down the loop barrier, potentially leading to better compute-communication overlap and fewer copies.
  - xla_gpu_enable_while_loop_double_buffering
- The Latency Hiding Scheduler will do most of the work to hide communication latency.
  - xla_gpu_enable_latency_hiding_scheduler
- To maximize networking bandwidth, combiner passes will combine pipelined collectives up to the maximum available memory. This optimization does not kick in if the loop is already unrolled in the input HLO.
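
As a rough illustration of enabling individual flags, the sketch below builds an XLA_FLAGS environment string from the flag names listed above. It assumes your XLA build still accepts these flags (an unrecognized flag in XLA_FLAGS may be rejected at startup) and that the variable is set before the XLA client, for example JAX, is initialized. The helper `build_xla_flags` and the list `GPU_EFFORT_FLAGS` are hypothetical names introduced only for this example.

```python
import os

# Flag names copied from the list above. The helper and list names are
# hypothetical; only the flags themselves come from the documentation.
GPU_EFFORT_FLAGS = [
    "xla_gpu_enable_pipelined_all_gather",
    "xla_gpu_enable_pipelined_all_reduce",
    "xla_gpu_enable_pipelined_reduce_scatter",
    "xla_gpu_enable_while_loop_double_buffering",
    "xla_gpu_enable_latency_hiding_scheduler",
]


def build_xla_flags(enabled, existing=""):
    """Append `--flag=true` entries for `enabled` to an existing XLA_FLAGS string."""
    parts = existing.split()
    for name in enabled:
        if name not in GPU_EFFORT_FLAGS:
            raise ValueError(f"unexpected flag: {name}")
        parts.append(f"--{name}=true")
    return " ".join(parts)


# Set the variable before importing or initializing the XLA client.
os.environ["XLA_FLAGS"] = build_xla_flags(
    [
        "xla_gpu_enable_latency_hiding_scheduler",
        "xla_gpu_enable_while_loop_double_buffering",
    ],
    existing=os.environ.get("XLA_FLAGS", ""),
)
```

Enabling flags one at a time like this can help isolate which pass accounts for a change in compilation time or runtime, whereas setting optimization_level switches on the whole group together.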

Memory Fitting Level

Another effort level option controls the degree to which the compiler will attempt to make the resulting program "fit in memory", where "fit" and "memory" have backend-dependent meanings. For example, in XLA:TPU, this option controls the degree to which the compiler works to keep the TPU's high-bandwidth memory (HBM) usage below the HBM capacity. It can be set via the memory_fitting_level field of the ExecutableBuildOptionsProto message, or the memory_fitting_level field of the ExecutionOptions message.

As with optimization level, the exact meaning of each effort level value is backend-dependent, but the following table describes the expected effect as a general guideline:

Level	Use Case
EFFORT_O0	Minimal effort to fit (fail compilation as quickly as possible instead)
EFFORT_O1	Reduced effort to fit
EFFORT_O2	Significant effort to fit (suitable default for production workloads)
EFFORT_O3	Expensive or experimental algorithms to reduce memory usage
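
Because the two effort levels are set independently, it can help to keep them together in build tooling. The sketch below is one possible way to do that in plain Python. The field names and level names come from the text above, but the `EffortSettings` class is hypothetical and is not part of XLA, and the EFFORT_O2 defaults simply follow the "suitable default for production workloads" rows of the tables rather than XLA's actual defaults.

```python
from dataclasses import dataclass

# Level names taken from the tables above.
EFFORT_LEVELS = ("EFFORT_O0", "EFFORT_O1", "EFFORT_O2", "EFFORT_O3")


@dataclass(frozen=True)
class EffortSettings:
    """Hypothetical container for the two effort-level fields."""

    optimization_level: str = "EFFORT_O2"    # assumed default, see lead-in
    memory_fitting_level: str = "EFFORT_O2"  # assumed default, see lead-in

    def __post_init__(self):
        # Reject anything that is not one of the documented level names.
        for field_name in ("optimization_level", "memory_fitting_level"):
            value = getattr(self, field_name)
            if value not in EFFORT_LEVELS:
                raise ValueError(f"{field_name}: unknown effort level {value!r}")

    def as_fields(self):
        """Return a dict keyed by the option field names used in the text."""
        return {
            "optimization_level": self.optimization_level,
            "memory_fitting_level": self.memory_fitting_level,
        }


# Example: favor fast compilation during development, with reduced
# effort spent on making the program fit in memory.
fast_iteration = EffortSettings(
    optimization_level="EFFORT_O0",
    memory_fitting_level="EFFORT_O1",
)
```

The resulting dictionary would still need to be copied into whichever message your client exposes (ExecutableBuildOptionsProto or ExecutionOptions); that step depends on the bindings in use and is not shown here.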
