OpenXLA Project

`-sdy-close-shardings`

Closes tensor shardings and drops replicated axes.
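
As an illustration only, the sketch below models a sharding with hypothetical Python classes (`DimSharding`, `TensorSharding` and `close_sharding` are not Shardy APIs). It assumes that closing a sharding means marking every open dimension (printed with `?`) as closed, so that propagation can no longer add axes to it, and that the explicit `replicated` axes are dropped.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DimSharding:
    axes: tuple[str, ...]
    is_closed: bool = True  # an open dimension may still gain axes during propagation


@dataclass(frozen=True)
class TensorSharding:
    mesh: str
    dims: tuple[DimSharding, ...]
    replicated: tuple[str, ...] = ()  # explicitly replicated axes

    def __str__(self) -> str:
        parts = []
        for d in self.dims:
            items = [f'"{a}"' for a in d.axes]
            if not d.is_closed:
                items.append("?")  # open dimensions are printed with a trailing ?
            parts.append("{" + ", ".join(items) + "}")
        text = f"<@{self.mesh}, [{', '.join(parts)}]"
        if self.replicated:
            text += ", replicated={" + ", ".join(f'"{a}"' for a in self.replicated) + "}"
        return text + ">"


def close_sharding(s: TensorSharding) -> TensorSharding:
    """Hypothetical helper: close every dimension and drop the replicated axes."""
    closed_dims = tuple(replace(d, is_closed=True) for d in s.dims)
    return replace(s, dims=closed_dims, replicated=())


before = TensorSharding(
    "mesh",
    (DimSharding(("x",), is_closed=False), DimSharding((), is_closed=False)),
    replicated=("y",),
)
print(before)                  # <@mesh, [{"x", ?}, {?}], replicated={"y"}>
print(close_sharding(before))  # <@mesh, [{"x"}, {}]>
```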

`-sdy-drop-sharding-rules`

Drops OpShardingRuleAttr from all registered ops.

`-sdy-insert-explicit-reshards`

Inserts explicit reshards to make all operations have compatible shardings.

An operation has compatible shardings when it can consume its sharded operands and produce its sharded results without any resharding communication. Note that the operation may still need other communication, such as an all-reduce or halo swaps.

After propagation, some operations may still have incompatible shardings.

An operation has a sharding conflict when an axis (or sub-axis) is used to shard non-corresponding dimensions across multiple tensors (e.g. the non-contracting dimensions of both operands of a matmul), or when an axis shards a dimension of one tensor but not the corresponding dimension of another tensor. After this pass, every operation is conflict-free.

This pass explicitly inserts reshard operations so that, for each operation, corresponding dimensions are sharded in the same way across all operands and results, and every axis (or sub-axis) is used to shard only a single dimension type.
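
Both conflict conditions can be checked mechanically once each dimension is mapped to a factor, i.e. a set of corresponding dimensions. The following Python sketch assumes a hypothetical factor mapping for a 2-D `stablehlo.dot` (`lhs[i, k] x rhs[k, j] -> result[i, j]`), models each dimension's sharding as a tuple of whole axis names (sub-axes are not modeled), and reports both kinds of conflict. `DOT_FACTORS` and `find_conflicts` are illustrative names, not Shardy code.

```python
from collections import defaultdict

# Hypothetical factor mapping for a 2-D stablehlo.dot:
# lhs[i, k] x rhs[k, j] -> result[i, j], where "k" is the contracting factor.
DOT_FACTORS = {"lhs": ("i", "k"), "rhs": ("k", "j"), "result": ("i", "j")}


def find_conflicts(factors, shardings):
    """Return human-readable sharding conflicts for one operation.

    factors:   tensor name -> factor name of each dimension
    shardings: tensor name -> tuple of axis-name tuples, one per dimension
    """
    # Collect every distinct sharding seen on each factor.
    per_factor = defaultdict(set)
    for tensor, dim_factors in factors.items():
        for factor, axes in zip(dim_factors, shardings[tensor]):
            per_factor[factor].add(tuple(axes))

    conflicts = []
    axis_to_factors = defaultdict(set)
    for factor, variants in sorted(per_factor.items()):
        # Corresponding dimensions must be sharded identically.
        if len(variants) > 1:
            conflicts.append(f"factor {factor} sharded inconsistently: {sorted(variants)}")
        for axes in variants:
            for axis in axes:
                axis_to_factors[axis].add(factor)

    # Each axis may shard only one factor (one dimension type).
    for axis, used_by in sorted(axis_to_factors.items()):
        if len(used_by) > 1:
            conflicts.append(f"axis {axis} shards factors {sorted(used_by)}")
    return conflicts


# Both operands shard a non-contracting dimension on "x".
conflicting = {"lhs": (("x",), ("y",)), "rhs": (("y",), ("x",)), "result": (("x",), ())}
# rhs resharded so that it is sharded only along its contracting dimension.
resolved = {"lhs": (("x",), ("y",)), "rhs": (("y",), ()), "result": (("x",), ())}

for message in find_conflicts(DOT_FACTORS, conflicting):
    print(message)
print(find_conflicts(DOT_FACTORS, resolved))  # []
```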

A clarifying example:

Input:

```mlir
sdy.mesh = <"x"=4, "y"=2>
%lhs : tensor<8x32xf32> {sdy.sharding=<@mesh, [{"x"}, {"y"}]>}
%rhs : tensor<32x16xf32> {sdy.sharding=<@mesh, [{"y"}, {"x"}]>}
stablehlo.dot %lhs, %rhs {sdy.sharding_per_value=<[<@mesh, [{"x"}, {}]>]>}
  : (tensor<8x32xf32>, tensor<32x16xf32>) -> tensor<8x16xf32>
```

Output:

```mlir
sdy.mesh = <"x"=4, "y"=2>
%lhs : tensor<8x32xf32> {sdy.sharding=<@mesh, [{"x"}, {"y"}]>}
%rhs : tensor<32x16xf32> {sdy.sharding=<@mesh, [{"y"}, {"x"}]>}
%0 = sdy.reshard %rhs <@mesh, [{"y"}, {}]> : tensor<32x16xf32>
stablehlo.dot %lhs, %0 {sdy.sharding_per_value=<[<@mesh, [{"x"}, {}]>]>}
  : (tensor<8x32xf32>, tensor<32x16xf32>) -> tensor<8x16xf32>
```

In the example above, there is a conflict because the `lhs` and `rhs` tensors are both sharded on axis "x" along their non-contracting dimensions. The pass therefore explicitly reshards `rhs` before the dot operation so that it is sharded only along its first (contracting) dimension, on axis "y". This makes the dot operation compatible.
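
To see where the reshard's sharding comes from, the sketch below derives it under one simple assumption: `lhs` and the result keep their shardings, each factor takes the axes of the first of those tensors that maps it, and `rhs` is resharded to match. The real pass decides which tensors to reshard with its own heuristics, which this does not reproduce; `DOT_FACTORS`, `target_rhs_sharding` and `fmt` are hypothetical helpers.

```python
# Hypothetical factor mapping for a 2-D dot: lhs[i, k] x rhs[k, j] -> result[i, j].
DOT_FACTORS = {"lhs": ("i", "k"), "rhs": ("k", "j"), "result": ("i", "j")}


def target_rhs_sharding(lhs, result):
    """Derive rhs's sharding, treating lhs and the result as fixed (an assumption)."""
    factor_axes = {}
    for tensor, dims in (("lhs", lhs), ("result", result)):
        for factor, axes in zip(DOT_FACTORS[tensor], dims):
            # The first tensor that maps a factor decides its sharding.
            factor_axes.setdefault(factor, tuple(axes))
    return tuple(factor_axes[f] for f in DOT_FACTORS["rhs"])


def fmt(dims):
    """Print dimension shardings in the [{"a"}, {}] notation of the example."""
    return "[" + ", ".join("{" + ", ".join(f'"{a}"' for a in axes) + "}" for axes in dims) + "]"


lhs = (("x",), ("y",))
rhs = (("y",), ("x",))
result = (("x",), ())

new_rhs = target_rhs_sharding(lhs, result)
if new_rhs != rhs:
    print(f"%0 = sdy.reshard %rhs <@mesh, {fmt(new_rhs)}>")  # %0 = sdy.reshard %rhs <@mesh, [{"y"}, {}]>
```

The printed sharding matches the one on the `sdy.reshard` in the example output: factor `k` inherits "y" from `lhs`, and factor `j` inherits the unsharded `{}` from the result.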

`-sdy-remove-sharding-groups`

Removes ShardingGroupOps after propagation.

`-sdy-sharding-constraint-to-reshard`

Converts ShardingConstraintOp into ReshardOp.

`-sdy-sink-data-flow-edges`

Sinks every DataFlowEdgeOp into its input.

Moves the sharding of each DataFlowEdgeOp to its input (the root target of the edge), and replaces the op with its input.
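
A toy model of this rewrite, in Python: `Value`, `DataFlowEdge` and `sink_data_flow_edges` are hypothetical stand-ins for MLIR values and ops, and `uses` is an assumed map from each consumer op to its operand list. The edge's sharding is copied onto its input, and every use of the edge's result is redirected to that input, which is the effect of replacing the op with its input.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Value:
    """Hypothetical stand-in for an SSA value that may carry a sharding."""
    name: str
    sharding: Optional[str] = None


@dataclass
class DataFlowEdge:
    """Hypothetical stand-in for a DataFlowEdgeOp."""
    input: Value   # the root target of the edge
    result: Value  # the value the edge op produces
    sharding: Optional[str] = None


def sink_data_flow_edges(edges: List[DataFlowEdge], uses: Dict[str, List[Value]]) -> None:
    """Move each edge's sharding to its input and redirect uses of the edge result."""
    replacement = {}
    for edge in edges:
        if edge.sharding is not None:
            edge.input.sharding = edge.sharding
        replacement[edge.result.name] = edge.input
    for op_name, operands in uses.items():
        uses[op_name] = [replacement.get(v.name, v) for v in operands]


arg = Value("%arg0")
edge_result = Value("%edge")
edge = DataFlowEdge(input=arg, result=edge_result, sharding='<@mesh, [{"x"}]>')
uses = {"stablehlo.add": [edge_result, arg]}

sink_data_flow_edges([edge], uses)
print(arg.sharding)                             # <@mesh, [{"x"}]>
print([v.name for v in uses["stablehlo.add"]])  # ['%arg0', '%arg0']
```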

TODO(tomnatan): consider moving the sharding to all targets that can have a sharding attached.

`-sdy-update-non-divisible-input-output-shardings`

Makes FuncOp inputs/outputs evenly sharded, removing any need for padding due to non-divisible shardings.

Users of Shardy expect function inputs/outputs to be evenly divisible/shardable, so that their tensors don't need padding. Propagation may give inputs/outputs non-divisible shardings, so this pass replaces each such sharding, dimension by dimension, with the largest prefix of the original dimension sharding that still shards that dimension evenly.
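
A minimal sketch of the prefix rule, assuming a mesh given as a dict of axis sizes and each dimension's sharding as a list of whole axis names. Because any prefix of an evenly sharding prefix is itself even, the longest valid prefix ends just before the first axis that breaks divisibility. The function names are hypothetical, and the sketch keeps whole axes only, whereas the actual pass may be able to keep part of an axis as a sub-axis.

```python
from math import prod


def evenly_sharded_prefix(dim_size, axes, mesh):
    """Longest prefix of `axes` whose combined size divides `dim_size`.

    Simplification: only whole axes are kept; sub-axes are not modeled.
    """
    kept = []
    for axis in axes:
        if dim_size % prod(mesh[a] for a in kept + [axis]) != 0:
            break  # adding this axis would make the dimension non-divisible
        kept.append(axis)
    return kept


def update_sharding(shape, dims, mesh):
    """Apply the prefix rule independently to every dimension of a tensor."""
    return [evenly_sharded_prefix(size, axes, mesh) for size, axes in zip(shape, dims)]


mesh = {"x": 4, "y": 2}
print(update_sharding([8, 6], [["x", "y"], ["y", "x"]], mesh))  # [['x', 'y'], ['y']]
```

For a `tensor<8x6>` on this mesh, the first dimension keeps both axes (8 is divisible by 4 * 2), while the second keeps only "y" because 6 is not divisible by 2 * 4 = 8.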