-stablehlo-aggressive-folder
Folds StableHLO operations
Options
-assume-no-undeclared-side-effects : Allow dead code to be eliminated in some situations (e.g. dead while loops) under the assumption that ops are pure unless declared with explicit MLIR `MemoryEffects`. Notably, this means `func.call` ops will be assumed pure.
-fold-op-element-limit : Folding an op into a constant can sometimes come at the cost of memory overhead. (This occurs if the op's inputs are reused, meaning that they can't be deleted after the op is folded to a constant, or when folding operations like `iota` whose outputs take up more memory than their inputs.) In such cases, this config option sets an upper limit on how many elements an op's result may have before the op is no longer folded.
-optimize-float : Allow float optimizations that, though mathematically equivalent, may result in slightly different quantization of floating-point values (e.g. `log(sqrt(x))` -> `0.5 * log(x)`). Float optimizations that can't affect numerical results are always enabled.
-stablehlo-aggressive-simplification
Canonicalizes StableHLO operations
Performs graph simplifications, including:
- add(cst, X) -> add(X, cst)
- add(X, 0) -> X
- and(cst, X) -> and(X, cst)
- and(X, 0) -> 0
- and(X, 1) -> X
- broadcast_in_dim(broadcast_in_dim(X, [dimsA...]), [dimsB...]) -> broadcast_in_dim(X, merge(dimsA, dimsB))
- broadcast_in_dim(X, [dims...]) -> transpose(X, [dims...]) [if same numel & rank]
- broadcast_in_dim(X, [iota...]) -> X
- broadcast_in_dim(X, [sorted...]) -> reshape(X, [sorted...]) [if same numel]
- compare(cst, X, comparator) -> compare(X, cst, inv(comparator))
- compare(X, X, [EQ,GE,LE]) -> true
- compare(X, X, [NE,GT,LT]) -> false
- complex(real(X), imag(X))) -> X
- concatenate(concatenate(X, Y), Z) -> concatenate(X, Y, Z)
- concatenate(X) -> X
- concatenate(X, Y, []) -> concatenate(X, Y)
- convert(X, [X.type]) -> X
- dynamic_broadcast_in_dim(dynamic_broadcast_in_dim(X, _, [dimsA...]), shape, [dimsB...]) -> dynamic_broadcast_in_dim(X, shape, merge(dimsA, dimsB))
- dynamic_broadcast_in_dim(dynamic_reshape(X, shape), shape) -> dynamic_reshape(X, shape)
- dynamic_broadcast_in_dim(X, _, _, [all_nonexpanding...]) -> convert(X)
- dynamic_broadcast_in_dim(X, shape_of(X)) -> X
- dynamic_gather(x, constant(slice_sizes)) -> gather(x, slice_sizes)
- dynamic_iota(shape, dim) ->
- dynamic_pad(X, low, high, interior) -> pad(X, low, high, interior)
- dynamic_reshape(dynamic_reshape(X, _), shape)) -> dynamic_reshape(X, shape)
- dynamic_reshape(op(dynamic_reshape(X, shape)), shape)
- dynamic_slice(X, begin, slice_sizes) -> slice(X, begin, slice_sizes)
- dynamic_update_slice(X, update, start_indices : zero)) -> update
- dynamic_update_slice(X, update : zero_extent)) -> X
- gather(X, cst_start_indices) -> slice(X, slice_start, slice_end)
- get_dimension_size(X, i) -> X.shape[i]
- get_tuple_element(tuple(X_0, X_1, ...), i) -> X_i
- imag(complex(R,I)) -> I
- iota(dim) : multi_rank
- iota(dim) : type -> constant(0) : type [if type[dim] == 1]
- max(cst, X) -> max(X, cst)
- minimum(cst, X) -> minimum(X, cst)
- multiply(cst, X) -> multiply(X, cst)
- multiply(X, 0i) -> 0i
- multiply(X, 1i) -> X
- op(X : zero_extent_tensor) -> constant([])
- or(cst, X) -> or(X, cst)
- or(X, 0) -> X
- or(X, 1) -> 1
- pad(empty_tensor, _) -> broadcast_in_dim(empty_tensor, _)
- real(complex(R,I)) -> X
- real_dynamic_slice(X, start, limit, strides)
- real_dynamic_slice(X, start, limit, strides)
- reduce[A](_, _, fn:return A) -> A...
- reduce(empty_0, empty_1, ...) -> [broadcast_in_dim(empty_i)...]
- reduce(in_1, in_2, _, _) -> reduce(in_1, _, _) [if unused(in_2)]
- reduce(X..., dims=[], add) -> X...
- reshape(reshape(X, _), [shape]) -> reshape(X, [shape])
- reshape(X, [X.shape]) -> X
- select(broadcast(not(p)), t, f) => select(broadcast(p), f, t)
- select(not(p), t, f) => select(p, f, t)
- shape_of(dynamic_reshape(X, shape)) -> shape
- slice(concat(X,Y,Z,...),...) -> concat(slice(X),slice(Y),slice(Z))
- slice(X, [A:A], [B:B], ...) -> X
- sort(X) -> sort(X, dim = N) [when dim can be inferred]
- sort(X,Y) -> sort(X) [if Y unused and unused in comparator]
- subtract(X, 0) -> X
- subtract(X, X) -> 0
- transpose(X, [iota...]) -> X
- transpose(X, [no_mem_layout_change...]) -> reshape(X)
- tuple(get_tuple_element(X, 0), get_tuple_element(X, 1), ...) -> X
- while -> while (loop invariants as implicit captures)
- xor(cst, X) -> xor(X, cst)
- (+more)
This list is pulled from code comments so is not fully exhaustive, but has high coverage of the pass today.
Options
-fold-op-element-limit : Folding an op into a constant can sometimes come at the cost of memory overhead. (This occurs if the op's inputs are reused, meaning that they can't be deleted after the op is folded to a constant, or when folding operations like `iota` whose outputs take up more memory than their inputs.) In such cases, this config option sets an upper limit on how many elements an op's result may have before the op is no longer folded.
-stablehlo-target-independent-optimization
Runs canonicalizers, folders, and other target-independent optimizations.
Uses patterns from StablehloAggressiveSimplificationPass and StablehloAggressiveFolderPass together, allowing canonicalization and folding to be performed in the same pattern set, often leading to better results.
Users should prefer this pass to calling the others directly.
Options
-assume-no-undeclared-side-effects : Allow dead code to be eliminated in some situations (e.g. dead while loops) under the assumption that ops are pure unless declared with explicit MLIR `MemoryEffects`. Notably, this means `func.call` ops will be assumed pure.
-fold-op-element-limit : Folding an op into a constant can sometimes come at the cost of memory overhead. (This occurs if the op's inputs are reused, meaning that they can't be deleted after the op is folded to a constant, or when folding operations like `iota` whose outputs take up more memory than their inputs.) In such cases, this config option sets an upper limit on how many elements an op's result may have before the op is no longer folded.
-optimize-float : Allow float optimizations that, though mathematically equivalent, may result in slightly different quantization of floating-point values (e.g. `log(sqrt(x))` -> `0.5 * log(x)`). Float optimizations that can't affect numerical results are always enabled.