Category: Compile Time: Scoped Vmem OOM
This error indicates that the program requires more Scoped Vector Memory (Vmem) than its allocation limit allows.
Sample Error Messages:
RESOURCE_EXHAUSTED: Ran out of memory in memory space vmem while allocating on stack for %my-custom-kernel = bf16[2048,4096]{1,0:T(8,128)(2,1)} custom-call(...) ...
XLA Backends: TPU
Overview
TPUs have Vector Memory (Vmem), a local scratchpad memory used exclusively by the TensorCore (TC). The compiler manages Vmem for two types of allocations:
- Instruction-scoped allocations: Temporary storage in Vmem used while a single HLO instruction executes. This includes operand buffers (e.g., for double buffering) and register spills.
- Program-scoped allocations: Allocations that live beyond the scope of a single HLO instruction. These are usually HLO temporaries and intermediate results that are inputs and/or outputs of HLO instructions.
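To see why operand buffers dominate instruction-scoped usage, the sketch below estimates the Vmem a kernel needs for its pipelined blocks, assuming every input and output block is double buffered. The helper `estimate_scoped_vmem_bytes` and the dtype table are hypothetical illustrations, not an XLA or Pallas API. The estimate ignores tile padding, scratch buffers, and register spills, so actual usage will be higher.

```python
import math

# Bytes per element for a few common dtypes (illustrative subset).
DTYPE_BYTES = {"f32": 4, "bf16": 2, "int8": 1}


def estimate_scoped_vmem_bytes(blocks, num_buffers=2):
    """Rough lower bound on instruction-scoped Vmem for pipelined blocks.

    blocks: list of (block_shape, dtype_name) tuples, one per input/output.
    num_buffers: copies kept per operand (2 models double buffering).
    Hypothetical helper: ignores tiling/padding, scratch space, and spills.
    """
    total = 0
    for shape, dtype in blocks:
        total += math.prod(shape) * DTYPE_BYTES[dtype] * num_buffers
    return total


# Matmul-style kernel: two bf16 input blocks and one f32 output block.
blocks = [((512, 512), "bf16"), ((512, 512), "bf16"), ((512, 512), "f32")]
print(estimate_scoped_vmem_bytes(blocks) / 2**20, "MiB")  # 4.0 MiB
```

Halving both dimensions of every 2-D block (here, 512 to 256) cuts this estimate by 4x, which is why reducing block sizes is usually the first lever for this error.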
A Compile Time Scoped Vmem OOM occurs when an instruction's instruction-scoped allocations exceed its allocation limit. This limit is controlled:
- globally for the entire program via the flag --xla_tpu_scoped_vmem_limit_kib (in KiB), and
- per custom kernel via the vmem_limit_bytes parameter (in bytes).
These errors are typically caused by an internal compiler bug or by a custom kernel exceeding its allocation limit.
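Note that the two knobs use different units: the global flag takes KiB, while vmem_limit_bytes takes bytes. The sketch below sets the global flag from a MiB value, assuming your setup passes TPU compiler flags through the LIBTPU_INIT_ARGS environment variable, which must be set before JAX initializes the TPU backend. The helper name `set_global_scoped_vmem_limit` and the 64 MiB value are illustrative, not recommendations.

```python
import os


def set_global_scoped_vmem_limit(limit_mib):
    """Append --xla_tpu_scoped_vmem_limit_kib to LIBTPU_INIT_ARGS (hypothetical helper).

    The flag is expressed in KiB, so the MiB value is converted first.
    Flags already present in LIBTPU_INIT_ARGS are preserved.
    Assumption: this runs before JAX initializes the TPU backend.
    """
    limit_kib = int(limit_mib * 1024)
    flag = f"--xla_tpu_scoped_vmem_limit_kib={limit_kib}"
    existing = os.environ.get("LIBTPU_INIT_ARGS", "")
    os.environ["LIBTPU_INIT_ARGS"] = f"{existing} {flag}".strip()
    return flag


# 64 MiB -> 65536 KiB. The equivalent per-kernel vmem_limit_bytes is 64 * 2**20.
print(set_global_scoped_vmem_limit(64))
# import jax  # import JAX only after the environment variable is set
```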
Debugging
Inspect the error message to determine whether the error stems from a custom kernel or from a standard HLO instruction. An error caused by a custom kernel has the following signature:
Ran out of memory in memory space vmem while allocating on stack for %my-custom-call = <output-shape> custom-call(<params>), custom_call_target="tpu_custom_call" ...
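When triaging many failures, the signature check can be automated with a simple string match. The sketch below is a hypothetical helper (`classify_vmem_oom` is not part of any tool) that looks for the `custom_call_target="tpu_custom_call"` attribute and extracts the offending instruction name. It relies on the full message being available; a truncated message that omits the target attribute will be reported as non-kernel.

```python
import re

# Matches the start of a Scoped Vmem OOM and captures the HLO instruction name.
_OOM_PREFIX = re.compile(
    r"Ran out of memory in memory space vmem while allocating on stack for "
    r"%(?P<name>[\w.\-]+)"
)
# Attribute that marks a custom (e.g. Pallas) kernel in the message.
_KERNEL_TARGET = 'custom_call_target="tpu_custom_call"'


def classify_vmem_oom(message):
    """Return (kind, instruction_name) for an error message (hypothetical helper)."""
    match = _OOM_PREFIX.search(message)
    if match is None:
        return "not_scoped_vmem_oom", None
    if _KERNEL_TARGET in message:
        return "custom_kernel", match.group("name")
    return "non_kernel", match.group("name")
```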
- Custom Kernel Scoped Vmem OOM: If the error points to a custom kernel, jump to Retune the Kernel.
- Non-Kernel Vmem Issues: If the Vmem OOM is raised for an op that is not a custom kernel, it is likely an internal compiler bug. File a bug against XLA and attach an HLO dump.
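One way to produce an HLO dump from a JAX program is through XLA's debug flags in the XLA_FLAGS environment variable, which XLA reads when the backend is initialized, so it must be set before JAX is imported or used. The sketch below appends `--xla_dump_to` and `--xla_dump_hlo_as_text` while keeping any flags already set; the helper name `enable_hlo_dump` and the directory are illustrative.

```python
import os


def enable_hlo_dump(dump_dir):
    """Append XLA dump flags to XLA_FLAGS, preserving existing flags (hypothetical helper).

    After this, XLA writes the HLO modules it compiles to dump_dir as text.
    """
    flags = [f"--xla_dump_to={dump_dir}", "--xla_dump_hlo_as_text"]
    existing = os.environ.get("XLA_FLAGS", "")
    os.environ["XLA_FLAGS"] = " ".join([existing, *flags]).strip()
    return os.environ["XLA_FLAGS"]


enable_hlo_dump("/tmp/xla_dump")
# Next: import JAX, rerun the failing program, and attach /tmp/xla_dump to the bug.
```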
Retune the Kernel
If the error originates from a custom kernel, use the following techniques to reduce the kernel's memory requirement:
- Adjust Block Sizes: Reduce the block sizes (tile sizes) in your kernel configuration to lower Scoped Vmem usage. The kernel's per-operand buffers, including the extra copies used for double buffering, scale with the block size.
- Set Per-Kernel Scoped Vmem Limits: Explicitly request the amount of memory that specific kernel requires using the vmem_limit_bytes parameter.
- Modify Memory Coloring: Explicitly color (constrain) the kernel's inputs and outputs to Vmem using pallas.tpu.with_memory_space_constraint. Be careful not to color too many inputs and outputs to Vmem, as that can cause an overall Vmem OOM.
- Adjust the Global Limit: If kernel-specific retuning is difficult or the issue affects many kernels, adjust the global Scoped Vmem limit using the flag --xla_tpu_scoped_vmem_limit_kib.