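StableHLO Specification

StableHLO is an operation set for high-level operations (HLO) in machine learning (ML) models. StableHLO works as a portability layer between different ML frameworks and ML compilers: ML frameworks that produce StableHLO programs are compatible with ML compilers that consume StableHLO programs.

Our goal is to simplify and accelerate ML development by creating more interoperability between various ML frameworks (such as TensorFlow, JAX and PyTorch) and ML compilers (such as XLA and IREE). Towards that end, this document provides a specification for the StableHLO programming language.

This specification contains three major sections. First, the Programs section describes the structure of StableHLO programs, which consist of StableHLO functions, which themselves consist of StableHLO ops. Within that structure, the Ops section specifies the semantics of individual ops. The Execution section provides semantics for all these ops executing together within a program. Finally, the Notation section discusses the notation used throughout the specification.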

Programs

Program ::= {Func}

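StableHLO programs consist of an arbitrary number of StableHLO functions. Below is an example program with a function @main which has 3 inputs (%image, %weights and %bias) and 1 output. The body of the function has 6 ops.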

func.func @main(
  %image: tensor<28x28xf32>,
  %weights: tensor<784x10xf32>,
  %bias: tensor<1x10xf32>
) -> tensor<1x10xf32> {
  %0 = "stablehlo.reshape"(%image) : (tensor<28x28xf32>) -> tensor<1x784xf32>
  %1 = "stablehlo.dot"(%0, %weights) : (tensor<1x784xf32>, tensor<784x10xf32>) -> tensor<1x10xf32>
  %2 = "stablehlo.add"(%1, %bias) : (tensor<1x10xf32>, tensor<1x10xf32>) -> tensor<1x10xf32>
  %3 = "stablehlo.constant"() { value = dense<0.0> : tensor<1x10xf32> } : () -> tensor<1x10xf32>
  %4 = "stablehlo.maximum"(%2, %3) : (tensor<1x10xf32>, tensor<1x10xf32>) -> tensor<1x10xf32>
  "func.return"(%4): (tensor<1x10xf32>) -> ()
}

Functions

Func        ::= 'func' '.' 'func' FuncId FuncInputs FuncOutputs '{' FuncBody '}'
FuncInputs  ::= '(' [FuncInput {',' FuncInput}] ')'
FuncInput   ::= ValueId ':' ValueType
FuncOutputs ::= ['->' FuncOutput {',' FuncOutput}]
FuncOutput  ::= ValueType
FuncBody    ::= {Op}

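StableHLO functions (which are also called named functions) have an identifier, inputs/outputs and a body. In the future, we are planning to introduce additional metadata for functions to achieve better compatibility with HLO (#425, #626, #740, #744).

Identifiers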

FuncId  ::= '@' letter {letter | digit}
ValueId ::= '%' digit {digit}
          | '%' letter {letter | digit}
letter  ::= 'a' | ... | 'z' | 'A' | ... | 'Z' | '_'
digit   ::= '0' | ... | '9'

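StableHLO identifiers are similar to identifiers in many programming languages, with two peculiarities: 1) all identifiers have sigils which distinguish different kinds of identifiers, 2) value identifiers can be completely numeric to simplify generation of StableHLO programs.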
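
The identifier productions above can be checked mechanically. The following is a minimal sketch that transcribes FuncId and ValueId into regular expressions; the helper names is_func_id and is_value_id are hypothetical and are not part of StableHLO.

```python
import re

# Patterns transcribed from the FuncId and ValueId productions.
# Note that `letter` includes the underscore, per the grammar.
FUNC_ID = re.compile(r"@[A-Za-z_][A-Za-z_0-9]*")
VALUE_ID = re.compile(r"%(?:[0-9]+|[A-Za-z_][A-Za-z_0-9]*)")


def is_func_id(text: str) -> bool:
    """Return True if `text` is a well-formed function identifier."""
    return FUNC_ID.fullmatch(text) is not None


def is_value_id(text: str) -> bool:
    """Return True if `text` is a well-formed value identifier."""
    return VALUE_ID.fullmatch(text) is not None
```

Under this sketch, @main, %0 and %image are accepted, while %0a is rejected because a value identifier that starts with a digit must be completely numeric.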

Types

Type         ::= ValueType | NonValueType
ValueType    ::= TensorType | QuantizedTensorType | TokenType | TupleType
NonValueType ::= TensorElementType | QuantizedTensorElementType | FunctionType | StringType

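StableHLO types are categorized into value types (which are also called first-class types), which represent StableHLO values, and non-value types, which describe other program elements. StableHLO types are similar to types in many programming languages, with the main peculiarity being StableHLO's domain-specific nature, which results in some unusual outcomes (e.g. scalar types are not value types).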

TensorType ::= 'tensor' '<' Shape TensorElementType '>'
Shape ::= {DimensionSize 'x'}
DimensionSize ::= digit {digit}

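Tensor types represent tensors, i.e. multidimensional arrays. They have a shape and an element type, where a shape represents non-negative dimension sizes in the ascending order of the corresponding dimensions (which are also called axes) numbered from 0 to R-1. The number of dimensions R is called rank. For example, tensor<2x3xf32> is a tensor type with shape 2x3 and element type f32. It has two dimensions (or, in other words, two axes): the 0th dimension and the 1st dimension, whose sizes are 2 and 3. Its rank is 2.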
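
To make the relationship between shape, element type and rank concrete, here is a small sketch that splits a tensor type string according to the TensorType production. The function parse_tensor_type is a hypothetical helper; it does not validate the element type itself.

```python
import re

# Shape ::= {DimensionSize 'x'}, followed by the element type.
_TENSOR = re.compile(r"tensor<((?:[0-9]+x)*)(.+)>")


def parse_tensor_type(text: str):
    """Split a tensor type into (shape, element_type)."""
    match = _TENSOR.fullmatch(text)
    if match is None:
        raise ValueError(f"not a tensor type: {text!r}")
    dims, element_type = match.groups()
    # "2x3x" -> [2, 3]; an empty prefix yields the 0-dimensional shape [].
    shape = [int(d) for d in dims.split("x") if d]
    return shape, element_type


shape, element_type = parse_tensor_type("tensor<2x3xf32>")
rank = len(shape)  # the number of dimensions
```

For tensor<2x3xf32> this yields shape [2, 3], element type f32 and rank 2, matching the description above.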

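This defines support for static shapes where dimension sizes are statically known. In the future, we are planning to also introduce support for dynamic shapes where dimension sizes are either partially or fully unknown (#8). Furthermore, we are planning to explore extending tensor types beyond dimension sizes and element types, for example, to include layouts (#629) and sparsity (#1078).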

QuantizedTensorType ::= 'tensor' '<' Shape QuantizedTensorElementType '>'
QuantizedTensorElementType ::= '!quant.uniform' '<'
                  QuantizationStorageType
                  ['<' QuantizationStorageMin ':' QuantizationStorageMax '>']
                  ':' QuantizationExpressedType
                  [':' QuantizationDimension]
                  ',' QuantizationParameters '>'
QuantizationStorageType ::= IntegerType
QuantizationStorageMin ::= IntegerConstant
QuantizationStorageMax ::= IntegerConstant
QuantizationExpressedType ::= FloatType
QuantizationDimension ::= IntegerConstant
QuantizationParameters ::= QuantizationParameter
                         | '{' QuantizationParameter {',' QuantizationParameter} '}'
QuantizationParameter ::= QuantizationScale ':' QuantizationZeroPoint
QuantizationScale ::= FloatConstant
QuantizationZeroPoint ::= IntegerConstant

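Name	Type	Constraints
`storage_type`	integer type	(C1-C4), (C9)
`storage_min`	integer constant	(C2), (C4), (C8)
`storage_max`	integer constant	(C3), (C4), (C8)
`expressed_type`	floating-point type	(C1), (C5)
`quantization_dimension`	optional integer constant	(C11-C13)
`scales`	variadic number of floating-point constants	(C5-C7), (C10), (C11), (C13)
`zero_points`	variadic number of integer constants	(C8-C10)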

Quantized element types represent integer values of a storage type in the range from storage_min to storage_max (inclusive) that correspond to floating-point values of an expressed type. For a given integer value i, the corresponding floating-point value f can be computed as f = (i - zero_point) * scale, where scale and zero_point are called quantization parameters. The storage_min and storage_max are optional in the grammar, but have default values of min_value(storage_type) and max_value(storage_type) respectively. Quantized element types have the following constraints:

(C1) num_bits(storage_type) < num_bits(expressed_type).
(C2) type(storage_min) = storage_type.
(C3) type(storage_max) = storage_type.
(C4) min_value(storage_type) <= storage_min < storage_max <= max_value(storage_type).
(C5) type(scales...) = expressed_type.
(C6) 0 < scales.
(C7) is_finite(scales...).
(C8) storage_min <= zero_points <= storage_max.
(C9) type(zero_points...) = storage_type.
(C10) size(scales) = size(zero_points).
(C11) If is_empty(quantization_dimension), then size(scales) = 1.
(C12) 0 <= quantization_dimension.
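
As an illustration of the formula f = (i - zero_point) * scale, the sketch below dequantizes a single stored integer and checks a few of the constraints for one scale and zero-point pair. The function names are hypothetical, and the default bounds assume an si8 storage type.

```python
def dequantize(i: int, scale: float, zero_point: int) -> float:
    """Expressed value for stored integer `i`: f = (i - zero_point) * scale."""
    return (i - zero_point) * scale


def check_parameters(storage_min, storage_max, scale, zero_point,
                     type_min=-128, type_max=127):
    """Check (C4), (C6) and (C8) for one scale/zero-point pair.

    The default bounds are an assumption: they correspond to `si8`.
    """
    return (type_min <= storage_min < storage_max <= type_max  # (C4)
            and scale > 0                                       # (C6)
            and storage_min <= zero_point <= storage_max)       # (C8)
```

With scale = 0.5 and zero_point = 10, the stored value 14 corresponds to (14 - 10) * 0.5 = 2.0, and the stored value 10 corresponds to 0.0.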

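At the moment, QuantizationScale is a floating-point constant, but there is strong interest in integer-based scales, represented with multipliers and shifts. We are planning to explore this in the near future (#1404).

There is an ongoing discussion on the semantics of QuantizationZeroPoint, including the type, the values and whether there can be just one or potentially multiple zero points in a quantized tensor type. Based on the results of this discussion, the specification around zero points may change in the future (#1405).

Another ongoing discussion involves the semantics of QuantizationStorageMin and QuantizationStorageMax to determine whether any constraints should be imposed on these values and on the values of quantized tensors (#1406).

Finally, we are planning to explore representing unknown scales and zero points, similarly to how we are planning to explore representing unknown dimension sizes (#1407).

Quantized tensor types represent tensors with quantized elements. These tensors are exactly the same as regular tensors, except that their elements have quantized element types instead of regular element types.

In quantized tensors, quantization can be per-tensor, meaning that there is one scale and zero_point for the entire tensor, or per-axis, meaning that there are multiple scales and zero_points, one pair per slice of a particular dimension quantization_dimension. More formally, in a tensor t with per-axis quantization, there are dim(t, quantization_dimension) slices of the quantization_dimension: t[:, ..., 0, ..., :], t[:, ..., 1, ..., :], etc. All elements in the ith slice use scales[i] and zero_points[i] as their quantization parameters. Quantized tensor types have the following constraints:

For per-tensor quantization:
- No additional constraints.
For per-axis quantization:
- (C12) quantization_dimension < rank(self).
- (C13) dim(self, quantization_dimension) = size(scales).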
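
The per-axis rule can be sketched for a rank-2 tensor stored as a nested Python list. This is an illustrative sketch only: dequantize_per_axis is a hypothetical helper, and the restriction to rank 2 is an assumption made to keep the example short.

```python
def dequantize_per_axis(values, scales, zero_points, quantization_dimension):
    """Dequantize a rank-2 nested list quantized along one axis."""
    assert quantization_dimension in (0, 1)  # valid axes for rank 2
    size_along_axis = (len(values) if quantization_dimension == 0
                       else len(values[0]))
    # One (scale, zero_point) pair per slice of the quantization dimension.
    assert size_along_axis == len(scales) == len(zero_points)
    result = []
    for row_index, row in enumerate(values):
        out_row = []
        for col_index, stored in enumerate(row):
            # Index of the slice this element belongs to.
            i = row_index if quantization_dimension == 0 else col_index
            out_row.append((stored - zero_points[i]) * scales[i])
        result.append(out_row)
    return result


stored = [[10, 20], [30, 40]]
per_column = dequantize_per_axis(stored, scales=[0.5, 2.0],
                                 zero_points=[0, 10],
                                 quantization_dimension=1)
```

Here every element in column 0 uses scales[0] and zero_points[0], and every element in column 1 uses scales[1] and zero_points[1].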

TokenType ::= 'token'

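Token types represent tokens, i.e. opaque values produced and consumed by some operations. Tokens are used for imposing execution order on operations, as described in the Execution section.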

TupleType ::= 'tuple' '<' TupleElementTypes '>'
TupleElementTypes ::= [ValueType {',' ValueType}]

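Tuple types represent tuples, i.e. heterogeneous lists. Tuples are a legacy feature which only exists for compatibility with HLO. In HLO, tuples are used to represent variadic inputs and outputs. In StableHLO, variadic inputs and outputs are supported natively, and the only use of tuples in StableHLO is to comprehensively represent HLO ABI where e.g. T, tuple<T> and tuple<tuple<T>> may be materially different depending on a particular implementation. In the future, we are planning to make changes to HLO ABI which may allow us to remove tuple types from StableHLO (#598).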

TensorElementType ::= BooleanType | IntegerType | FloatType | ComplexType
BooleanType ::= 'i1'
IntegerType ::= SignedIntegerType | UnsignedIntegerType
SignedIntegerType ::= 'si4' | 'si8' | 'si16' | 'si32' | 'si64'
UnsignedIntegerType ::= 'ui4' | 'ui8' | 'ui16' | 'ui32' | 'ui64'
FloatType ::= 'f8E4M3FN' | 'f8E5M2' | 'f8E4M3FNUZ' | 'f8E5M2FNUZ'
            | 'f8E4M3B11FNUZ' | 'bf16' | 'f16' | 'f32' | 'f64'
ComplexType ::= 'complex' '<' ComplexElementType '>'
ComplexElementType ::= 'f32' | 'f64'

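Element types represent elements of tensor types. Unlike in many programming languages, these types are not first class in StableHLO. This means that StableHLO programs cannot directly represent values of these types (as a result, it is idiomatic to represent scalar values of type T with 0-dimensional tensor values of type tensor<T>).

Boolean type represents boolean values true and false.
Integer types can be either signed (si) or unsigned (ui) and have one of the supported bit widths (4, 8, 16, 32 or 64). Signed siN types represent integer values from -2^(N-1) to 2^(N-1)-1 inclusive, and unsigned uiN types represent integer values from 0 to 2^N-1 inclusive.
Floating-point types can be one of the following:
- f8E4M3FN and f8E5M2 types corresponding to respectively the E4M3 and E5M2 encodings of the FP8 format described in FP8 Formats for Deep Learning.
- f8E4M3FNUZ and f8E5M2FNUZ types corresponding to the E4M3 and E5M2 encodings of the FP8 formats described in 8-bit Numerical Formats for Deep Neural Networks.
- f8E4M3B11FNUZ type corresponding to the E4M3 encoding of the FP8 formats described in Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks.
- bf16 type corresponding to the bfloat16 format described in BFloat16: The secret to high performance on Cloud TPUs.
- f16, f32 and f64 types corresponding to respectively the binary16 ("half precision"), binary32 ("single precision") and binary64 ("double precision") formats described in the IEEE 754 standard.
Complex types represent complex values that have a real part and an imaginary part of the same element type. Supported complex types are complex<f32> (both parts are of type f32) and complex<f64> (both parts are of type f64).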

FunctionType ::= '(' InputTypes ')' '->' '(' OutputTypes ')'
InputTypes ::= [ValueType {',' ValueType}]
OutputTypes ::= [ValueType {',' ValueType}]

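Function types represent both named and anonymous functions. They have input types (the list of types on the left-hand side of ->) and output types (the list of types on the right-hand side of ->). In many programming languages, function types are first class, but not in StableHLO.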

StringType ::= 'string'

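String type represents sequences of bytes. Unlike in many programming languages, string type is not first class in StableHLO and is only used to specify static metadata for program elements.

Operations

StableHLO operations (which are also called ops) represent a closed set of high-level operations in machine learning models. As discussed above, StableHLO syntax is heavily inspired by MLIR, which is not necessarily the most ergonomic alternative, but is arguably the best fit for StableHLO's goal of creating more interoperability between ML frameworks and ML compilers.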

Op            ::= [OpOutputs] OpName OpInputs ':' OpSignature
OpName        ::= '"' 'stablehlo' '.' OpMnemonic '"'
OpMnemonic    ::= 'abs' | 'add' | ...

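StableHLO operations (which are also called ops) have a name, inputs/outputs and a signature. The name consists of the stablehlo. prefix and a mnemonic which uniquely identifies one of the supported ops. See below for a comprehensive list of all supported ops.

At the moment, StableHLO programs in the wild sometimes contain operations that are not described in this document. In the future, we are planning to either absorb these operations into the StableHLO opset or prohibit them from appearing in StableHLO programs. In the meanwhile, here is the list of these operations:

builtin.module, func.func, func.call and func.return (#425).
chlo operations (#602).
"Not in HLO" category of StableHLO operations - they were initially part of the StableHLO opset but have been later deemed to not fit it well: broadcast, create_token, cross-replica-sum, dot, einsum, torch_index_select, unary_einsum (#3).
"Dynamism" category of StableHLO operations - they were bootstrapped from MHLO, but we haven't specced them yet: compute_reshape_shape, cstr_reshapable, dynamic_broadcast_in_dim, dynamic_conv, dynamic_gather, dynamic_iota, dynamic_pad, dynamic_reshape, real_dynamic_slice, set_dimension_size (#8).
Shape computations, including arith, shape and tensor operations (#8).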

OpInputs        ::= OpInputValues OpInputFuncs OpInputAttrs
OpInputValues   ::= '(' [OpInputValue {',' OpInputValue}] ')'
OpInputValue    ::= ValueId
OpInputFuncs    ::= ['(' OpInputFunc {',' OpInputFunc} ')']
OpInputAttrs    ::= ['{' OpInputAttr {',' OpInputAttr} '}']
OpOutputs       ::= [OpOutput {',' OpOutput} '=']
OpOutput        ::= ValueId

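Ops consume inputs and produce outputs. Inputs are categorized into input values (computed during execution), input functions (provided statically, because in StableHLO functions are not first-class values) and input attributes (also provided statically). The kind of inputs and outputs consumed and produced by an op depends on its mnemonic. For example, the add op consumes 2 input values and produces 1 output value. In comparison, the select_and_scatter op consumes 3 input values, 2 input functions and 3 input attributes.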

OpInputFunc ::= '{' Unused FuncInputs ':' FuncBody '}'
Unused      ::= '^' digit {digit}
              | '^' letter {letter | digit}

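Input functions (which are also called anonymous functions) are very similar to named functions except that: 1) they don't have an identifier (hence the name "anonymous"), 2) they don't declare output types (output types are inferred from the return op within the function).

The syntax for input functions includes a currently unused part (see the Unused production above) which is there for compatibility with MLIR. In MLIR, there is a more general concept of "regions" which can have multiple "blocks" of ops connected together via jump ops. These blocks have ids which correspond to the Unused production, so that they can be distinguished from each other. StableHLO doesn't have jump ops, so the corresponding part of MLIR syntax is unused (but is still there).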

OpInputAttr      ::= OpInputAttrName '=' OpInputAttrValue
OpInputAttrName  ::= letter {letter | digit}
OpInputAttrValue ::= Constant

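Input attributes have a name and a value which is one of the supported constants. They are the primary way to specify static metadata for program elements. For example, the concatenate op uses the attribute dimension to specify the dimension along which its input values are concatenated. Similarly, the slice op uses multiple attributes like start_indices and limit_indices to specify the bounds that are used to slice the input value.

At the moment, StableHLO programs in the wild sometimes contain attributes which are not described in this document. In the future, we are planning to either absorb these attributes into the StableHLO opset or prohibit them from appearing in StableHLO programs. In the meanwhile, here is the list of these attributes:

layout (#629).
mhlo.frontend_attributes (#628).
mhlo.sharding (#619).
output_operand_aliases (#740).
Location metadata (#594).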

OpSignature ::= '(' [ValueType {',' ValueType}] ')' '->' '(' [ValueType {',' ValueType}] ')'

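Op signature consists of the types of all input values (the list of types on the left-hand side of ->) and the types of all output values (the list of types on the right-hand side of ->). Strictly speaking, input types are redundant, and output types are almost always redundant as well (because for most StableHLO ops, output types can be inferred from inputs). Nonetheless, op signature is deliberately part of StableHLO syntax for compatibility with MLIR.

Below is an example op whose mnemonic is select_and_scatter. It consumes 3 input values (%operand, %source and %init_value), 2 input functions and 3 input attributes (window_dimensions, window_strides and padding). Note how the signature of the op only includes the types of its input values (but not the types of input functions and attributes, which are provided inline).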

%result = "stablehlo.select_and_scatter"(%operand, %source, %init_value) ({
  ^bb0(%arg0: tensor<i32>, %arg1: tensor<i32>):
    %0 = "stablehlo.compare"(%arg0, %arg1) {
      comparison_direction = #stablehlo<comparison_direction GE>
    } : (tensor<i32>, tensor<i32>) -> tensor<i1>
    "stablehlo.return"(%0) : (tensor<i1>) -> ()
}, {
  ^bb0(%arg0: tensor<i32>, %arg1: tensor<i32>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i32>, tensor<i32>) -> tensor<i32>
    "stablehlo.return"(%0) : (tensor<i32>) -> ()
}) {
  window_dimensions = dense<[3, 1]> : tensor<2xi64>,
  window_strides = dense<[2, 1]> : tensor<2xi64>,
  padding = dense<[[0, 1], [0, 0]]> : tensor<2x2xi64>
} : (tensor<4x2xi32>, tensor<2x2xi32>, tensor<i32>) -> tensor<4x2xi32>

Constants

Constant ::= BooleanConstant
           | IntegerConstant
           | FloatConstant
           | ComplexConstant
           | TensorConstant
           | QuantizedTensorConstant
           | StringConstant
           | EnumConstant

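StableHLO constants have a literal and a type which together represent a StableHLO value. Generally, the type is part of the constant syntax, except when it is unambiguous (e.g. a boolean constant unambiguously has type i1, whereas an integer constant can have multiple possible types).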

BooleanConstant ::= BooleanLiteral
BooleanLiteral  ::= 'true' | 'false'

Boolean constants represent boolean values true and false. Boolean constants have type i1.

IntegerConstant   ::= IntegerLiteral ':' IntegerType
IntegerLiteral    ::= ['-' | '+'] DecimalDigits
                    | ['-' | '+'] '0x' HexadecimalDigits
DecimalDigits     ::= decimalDigit {decimalDigit}
HexadecimalDigits ::= hexadecimalDigit {hexadecimalDigit}
decimalDigit      ::= '0' | ... | '9'
hexadecimalDigit  ::= decimalDigit | 'a' | ... | 'f' | 'A' | ... | 'F'

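Integer constants represent integer values via strings that use decimal or hexadecimal notation. Other bases, e.g. binary or octal, are not supported. Integer constants have the following constraints:

(C1) is_wellformed(integer_literal, integer_type).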

FloatConstant  ::= FloatLiteral ':' FloatType
FloatLiteral   ::= SignPart IntegerPart FractionalPart ScientificPart
                 | '0x' [HexadecimalDigits]
SignPart       ::= ['-' | '+']
IntegerPart    ::= DecimalDigits
FractionalPart ::= ['.' [DecimalDigits]]
ScientificPart ::= [('e' | 'E') ['-' | '+'] DecimalDigits]

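Floating-point constants represent floating-point values via strings that use decimal or scientific notation. Additionally, hexadecimal notation can be used to directly specify the underlying bits in the floating-point format of the corresponding type. Floating-point constants have the following constraints:

(C1) If non-hexadecimal notation is used, is_wellformed(float_literal, float_type).
(C2) If hexadecimal notation is used, size(hexadecimal_digits) = num_bits(float_type) / 4.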
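
The hexadecimal notation can be illustrated with Python's struct module, which decodes the IEEE 754 binary16, binary32 and binary64 formats. This is a sketch under assumptions: decode_hex_float is a hypothetical helper, it covers only f16, f32 and f64, and it reads the digits as the bits of the value with the most significant digit first.

```python
import struct

# Bit width and `struct` format for the IEEE 754 types handled here.
_FORMATS = {"f16": (16, ">e"), "f32": (32, ">f"), "f64": (64, ">d")}


def decode_hex_float(literal: str, float_type: str) -> float:
    """Interpret a '0x...' literal as the raw bits of `float_type`."""
    num_bits, fmt = _FORMATS[float_type]
    digits = literal[2:]
    # (C2): size(hexadecimal_digits) = num_bits(float_type) / 4.
    if not literal.startswith("0x") or len(digits) != num_bits // 4:
        raise ValueError(f"ill-formed {float_type} literal: {literal!r}")
    return struct.unpack(fmt, bytes.fromhex(digits))[0]
```

For example, 0x3F800000 : f32 has 8 hexadecimal digits (32 / 4) and decodes to 1.0, whereas 0x3F80 : f32 violates (C2) and is rejected.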

ComplexConstant ::= ComplexLiteral ':' ComplexType
ComplexLiteral  ::= '(' RealPart ',' ImaginaryPart ')'
RealPart        ::= FloatLiteral
ImaginaryPart   ::= FloatLiteral

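Complex constants represent complex values using lists of a real part (comes first) and an imaginary part (comes second). For example, (1.0, 0.0) : complex<f32> represents 1.0 + 0.0i, and (0.0, 1.0) : complex<f32> represents 0.0 + 1.0i. The order in which these parts are then stored in memory is implementation-defined. Complex constants have the following constraints:

(C1) is_wellformed(real_part, complex_element_type(complex_type)).
(C2) is_wellformed(imaginary_part, complex_element_type(complex_type)).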

TensorConstant ::= TensorLiteral ':' TensorType
TensorLiteral  ::= 'dense' '<' (DenseLiteral | ElementLiteral) '>'
DenseLiteral   ::= DenseDimension | DenseElements
DenseDimension ::= '[' [DenseLiteral {',' DenseLiteral}] ']'
DenseElements  ::= [ElementLiteral {',' ElementLiteral}]
ElementLiteral ::= BooleanLiteral | IntegerLiteral | FloatLiteral | ComplexLiteral

Tensor constants represent tensor values using nested lists specified via NumPy notation. For example, dense<[[1, 2, 3], [4, 5, 6]]> : tensor<2x3xi32> represents a tensor value with the following mapping from indices to elements: {0, 0} => 1, {0, 1} => 2, {0, 2} => 3, {1, 0} => 4, {1, 1} => 5, {1, 2} => 6. The order in which these elements are then stored in memory is implementation-defined. Tensor constants have the following constraints:

(C1) has_syntax(tensor_literal, element_type(tensor_type)), where:
- has_syntax(element_literal: Syntax, element_type: Type) = is_wellformed(element_literal, type).
- has_syntax(tensor_literal: List, element_type: Type) = has_syntax(tensor_literal..., element_type).
(C2) has_shape(tensor_literal, shape(tensor_type)), where:
- has_shape(element_literal: Syntax, []) = true.
- has_shape(tensor_literal: List, shape: List) = size(tensor_literal) = shape[0] and has_shape(tensor_literal..., shape[1:]).
- Otherwise, false.
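
The recursive has_shape rule in (C2) translates almost directly into code. The sketch below assumes that a tensor literal is modelled as nested Python lists and that anything that is not a list counts as an element literal.

```python
def has_shape(literal, shape) -> bool:
    """Mirror of the has_shape rules in (C2) for nested Python lists."""
    if not isinstance(literal, list):
        # An element literal only matches the empty shape.
        return len(shape) == 0
    if len(shape) == 0 or len(literal) != shape[0]:
        return False
    # Every sub-literal must match the remaining dimensions.
    return all(has_shape(item, shape[1:]) for item in literal)
```

With this sketch, [[1, 2, 3], [4, 5, 6]] has shape [2, 3], while the ragged literal [[1, 2, 3], [4, 5]] does not.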

QuantizedTensorConstant ::= QuantizedTensorLiteral ':' QuantizedTensorType
QuantizedTensorLiteral  ::= 'dense' '<' (DenseLiteral | ElementLiteral) '>'

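Quantized tensor constants represent quantized tensor values using the same notation as tensor constants, with elements specified as constants of their storage type. Quantized tensor constants have the following constraints:

(C1) has_syntax(quantized_tensor_literal, storage_type(quantized_tensor_type)).
(C2) has_shape(quantized_tensor_literal, shape(quantized_tensor_type)).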

StringConstant  ::= StringLiteral
StringLiteral   ::= '"' {stringCharacter | escapeSequence} '"'
stringCharacter ::= all ASCII characters except '\00', '\01', ... '\1f' and '"'
escapeSequence  ::= '\' ('"' | '\' | 'n' | 't' | (hexadecimalDigit hexadecimalDigit))

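String literals consist of bytes specified using ASCII characters and escape sequences. They are encoding-agnostic, so the interpretation of these bytes is implementation-defined. String literals have type string.

Ops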

abs

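Semantics

Performs element-wise abs operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For signed integers: integer modulus.
For floats: abs from IEEE-754.
For complex numbers: complex modulus.
For quantized types: dequantize_op_quantize(abs, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of signed integer, floating-point, or complex type or per-tensor quantized tensor	(C1-C2)

Outputs

Name	Type	Constraints
`result`	tensor of signed integer or floating-point type or per-tensor quantized tensor	(C1-C2)

Constraints

(C1) shape(result) = shape(operand).
(C2) baseline_element_type(result) is defined as:
- complex_element_type(element_type(operand)) if is_complex(operand).
- baseline_element_type(operand) otherwise.

Examples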

// %operand: [-2, 0, 2]
%result = "stablehlo.abs"(%operand) : (tensor<3xi32>) -> tensor<3xi32>
// %result: [2, 0, 2]

More Examples

add

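Semantics

Performs element-wise addition of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical OR.
For integers: integer addition.
For floats: addition from IEEE-754.
For complex numbers: complex addition.
For quantized types: dequantize_op_quantize(add, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples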

// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.add"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[6, 8], [10, 12]]

More Examples

after_all

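Semantics

Ensures that the operations producing the inputs are executed before any operations that depend on result. Execution of this operation does nothing; it only exists to establish data dependencies from result to inputs.

Inputs

Label	Name	Type
(I1)	`inputs`	variadic number of `token`

Outputs

Name	Type
`result`	`token`

Examples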

// %input0: !stablehlo.token
// %input1: !stablehlo.token
%result = "stablehlo.after_all"(%input0, %input1) : (!stablehlo.token, !stablehlo.token) -> !stablehlo.token

More Examples

all_gather

Semantics

Within each process group in the StableHLO process grid, concatenates the values of the operand tensor from each process along all_gather_dim and produces a result tensor.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0 and use_global_device_ids = false.
cross_replica_and_partition(replica_groups) if channel_id > 0 and use_global_device_ids = false.
flattened_ids(replica_groups) if channel_id > 0 and use_global_device_ids = true.

Afterwards, within each process_group:

operands@receiver = [operand@sender for sender in process_group] for all receiver in process_group.
result@process = concatenate(operands@process, all_gather_dim) for all process in process_group.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C6)
(I2)	`all_gather_dim`	constant of type `si64`	(C1), (C6)
(I3)	`replica_groups`	2-dimensional tensor constant of type `si64`	(C2-C4)
(I4)	`channel_id`	constant of type `si64`	(C5)
(I5)	`use_global_device_ids`	constant of type `i1`	(C5)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C6)

Constraints

(C1) 0 <= all_gather_dim < rank(operand).
(C2) is_unique(replica_groups).
(C3) size(replica_groups) is defined as:
- num_replicas if cross_replica is used.
- num_replicas if cross_replica_and_partition is used.
- num_processes if flattened_ids is used.
(C4) 0 <= replica_groups < size(replica_groups).
(C5) If use_global_device_ids = true, then channel_id > 0.
(C6) type(result) = type(operand) except:
- dim(result, all_gather_dim) = dim(operand, all_gather_dim) * dim(process_groups, 1).

Examples

// num_replicas: 2
// num_partitions: 1
// %operand@(0, 0): [[1, 2], [3, 4]]
// %operand@(1, 0): [[5, 6], [7, 8]]
%result = "stablehlo.all_gather"(%operand) {
  all_gather_dim = 1 : i64,
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>,
  // channel_id = 0
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
  // use_global_device_ids = false
} : (tensor<2x2xi64>) -> tensor<2x4xi64>
// %result@(0, 0): [[1, 2, 5, 6], [3, 4, 7, 8]]
// %result@(1, 0): [[1, 2, 5, 6], [3, 4, 7, 8]]
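
The two formal steps of all_gather can be simulated for a single process group. This is only a sketch: processes are modelled as dictionary keys, tensors as rank-2 nested lists, and the helper names are hypothetical.

```python
def concatenate(tensors, dimension):
    """Concatenate rank-2 nested lists along dimension 0 or 1."""
    if dimension == 0:
        return [row for tensor in tensors for row in tensor]
    return [sum((tensor[r] for tensor in tensors), [])
            for r in range(len(tensors[0]))]


def all_gather(operands, process_group, all_gather_dim):
    """Simulate all_gather for one process group.

    `operands` maps a process id to its operand; the result maps each
    process id in the group to the gathered tensor.
    """
    # operands@receiver = [operand@sender for sender in process_group]
    gathered = [operands[sender] for sender in process_group]
    # result@process = concatenate(operands@process, all_gather_dim)
    return {process: concatenate(gathered, all_gather_dim)
            for process in process_group}


operands = {(0, 0): [[1, 2], [3, 4]], (1, 0): [[5, 6], [7, 8]]}
results = all_gather(operands, [(0, 0), (1, 0)], all_gather_dim=1)
```

Both processes end up with [[1, 2, 5, 6], [3, 4, 7, 8]], which agrees with the example above.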

More Examples

all_reduce

Semantics

Within each process group in the StableHLO process grid, applies a reduction function computation to the values of the operand tensor from each process and produces a result tensor.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0 and use_global_device_ids = false.
cross_replica_and_partition(replica_groups) if channel_id > 0 and use_global_device_ids = false.
flattened_ids(replica_groups) if channel_id > 0 and use_global_device_ids = true.

Afterwards, within each process_group:

result@process[result_index] = exec(schedule) for some binary tree schedule where:
- exec(node) = computation(exec(node.left), exec(node.right)).
- exec(leaf) = leaf.value.
schedule is an implementation-defined binary tree whose in-order traversal is to_destination_type(operands@process_group...[result_index], type(func_inputs(computation)[0])).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C5), (C6)
(I2)	`replica_groups`	variadic number of 1-dimensional tensor constants of type `si64`	(C1-C3)
(I3)	`channel_id`	constant of type `si64`	(C4)
(I4)	`use_global_device_ids`	constant of type `i1`	(C4)
(I5)	`computation`	function	(C5)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C6-C7)

Constraints

(C1) is_unique(replica_groups).
(C2) size(replica_groups) is defined as:
- num_replicas if cross_replica is used.
- num_replicas if cross_replica_and_partition is used.
- num_processes if flattened_ids is used.
(C3) 0 <= replica_groups < size(replica_groups).
(C4) If use_global_device_ids = true, then channel_id > 0.
(C5) computation has type (tensor<E>, tensor<E>) -> (tensor<E>) where is_promotable(element_type(operand), E).
(C6) shape(result) = shape(operand).
(C7) element_type(result) = E.

Examples

// num_replicas: 2
// num_partitions: 1
// %operand@(0, 0): [1, 2, 3, 4]
// %operand@(1, 0): [5, 6, 7, 8]
%result = "stablehlo.all_reduce"(%operand) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>,
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
} : (tensor<4xi64>) -> tensor<4xi64>
// %result@(0, 0): [6, 8, 10, 12]
// %result@(1, 0): [6, 8, 10, 12]
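
The exec(schedule) rule can be sketched as follows. The schedule is implementation-defined, so the balanced tree used here is just one possible choice; rank-1 operands, the dictionary of processes and the helper names are likewise assumptions made for illustration.

```python
def exec_schedule(values, computation):
    """Reduce `values` with one possible binary tree: a balanced one.

    Splitting the list in the middle keeps the in-order traversal of the
    tree equal to the original order of `values`.
    """
    if len(values) == 1:
        return values[0]  # exec(leaf) = leaf.value
    middle = len(values) // 2
    # exec(node) = computation(exec(node.left), exec(node.right))
    return computation(exec_schedule(values[:middle], computation),
                       exec_schedule(values[middle:], computation))


def all_reduce(operands, process_group, computation):
    """Simulate all_reduce on rank-1 operands for one process group."""
    size = len(operands[process_group[0]])
    reduced = [exec_schedule([operands[p][i] for p in process_group],
                             computation)
               for i in range(size)]
    return {process: list(reduced) for process in process_group}


operands = {(0, 0): [1, 2, 3, 4], (1, 0): [5, 6, 7, 8]}
results = all_reduce(operands, [(0, 0), (1, 0)], lambda x, y: x + y)
```

With addition as the computation, every process in the group receives [6, 8, 10, 12], as in the example above. For non-associative computations, a different schedule could give a different result, which is why the specification calls the schedule implementation-defined.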

More Examples

all_to_all

Semantics

Within each process group in the StableHLO process grid, splits the values of the operand tensor along split_dimension into parts, scatters the split parts between the processes, concatenates the scattered parts along concat_dimension and produces a result tensor.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0.
cross_partition(replica_groups) if channel_id > 0.

Afterwards, within each process_group:

split_parts@sender = split(operand@sender, split_count, split_dimension) for all sender in process_group.
scattered_parts@receiver = [split_parts@sender[receiver_index] for sender in process_group] where receiver_index = process_group.index(receiver).
result@process = concatenate(scattered_parts@process, concat_dimension).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1-C3), (C9)
(I2)	`split_dimension`	constant of type `si64`	(C1), (C2), (C9)
(I3)	`concat_dimension`	constant of type `si64`	(C3), (C9)
(I4)	`split_count`	constant of type `si64`	(C2), (C4), (C8), (C9)
(I5)	`replica_groups`	2-dimensional tensor constant of type `si64`	(C5-C8)
(I6)	`channel_id`	constant of type `si64`

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C9)

Constraints

(C1) 0 <= split_dimension < rank(operand).
(C2) dim(operand, split_dimension) % split_count = 0.
(C3) 0 <= concat_dimension < rank(operand).
(C4) 0 < split_count.
(C5) is_unique(replica_groups).
(C6) size(replica_groups) is defined as:
- num_replicas if cross_replica is used.
- num_partitions if cross_partition is used.
(C7) 0 <= replica_groups < size(replica_groups).
(C8) dim(replica_groups, 1) = split_count.
(C9) type(result) = type(operand) except:
- dim(result, split_dimension) = dim(operand, split_dimension) / split_count.
- dim(result, concat_dimension) = dim(operand, concat_dimension) * split_count.

Examples

// num_replicas: 2
// num_partitions: 1
// %operand@(0, 0): [[1, 2, 3, 4],
//                   [5, 6, 7, 8]]
// %operand@(1, 0): [[9, 10, 11, 12],
//                   [13, 14, 15, 16]]
%result = "stablehlo.all_to_all"(%operand) {
  split_dimension = 1 : i64,
  concat_dimension = 0 : i64,
  split_count = 2 : i64,
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>
} : (tensor<2x4xi64>) -> tensor<4x2xi64>
// %result@(0, 0): [[1, 2],
//                  [5, 6],
//                  [9, 10],
//                  [13, 14]]
// %result@(1, 0): [[3, 4],
//                  [7, 8],
//                  [11, 12],
//                  [15, 16]]
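
The split, scatter and concatenate steps of all_to_all can be simulated for one process group. As before, this is a sketch under assumptions: tensors are rank-2 nested lists, processes are dictionary keys, and split and concatenate are simplified stand-ins written for this example.

```python
def split(tensor, split_count, dimension):
    """Split a rank-2 nested list into equal parts along a dimension."""
    if dimension == 0:
        step = len(tensor) // split_count
        return [tensor[i * step:(i + 1) * step] for i in range(split_count)]
    step = len(tensor[0]) // split_count
    return [[row[i * step:(i + 1) * step] for row in tensor]
            for i in range(split_count)]


def concatenate(tensors, dimension):
    """Concatenate rank-2 nested lists along dimension 0 or 1."""
    if dimension == 0:
        return [row for tensor in tensors for row in tensor]
    return [sum((tensor[r] for tensor in tensors), [])
            for r in range(len(tensors[0]))]


def all_to_all(operands, process_group, split_dimension, concat_dimension,
               split_count):
    """Simulate all_to_all for one process group."""
    split_parts = {sender: split(operands[sender], split_count,
                                 split_dimension)
                   for sender in process_group}
    results = {}
    for receiver_index, receiver in enumerate(process_group):
        # Each receiver takes the part at its own index from every sender.
        scattered = [split_parts[sender][receiver_index]
                     for sender in process_group]
        results[receiver] = concatenate(scattered, concat_dimension)
    return results


operands = {(0, 0): [[1, 2, 3, 4], [5, 6, 7, 8]],
            (1, 0): [[9, 10, 11, 12], [13, 14, 15, 16]]}
results = all_to_all(operands, [(0, 0), (1, 0)], split_dimension=1,
                     concat_dimension=0, split_count=2)
```

Process (0, 0) receives the first half of the columns from both senders and process (1, 0) receives the second half, reproducing the example above.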

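More Examples

and

Semantics

Performs element-wise AND of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical AND.
For integers: bitwise AND.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of boolean or integer type	(C1)
(I2)	`rhs`	tensor of boolean or integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of boolean or integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples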

// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.and"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[1, 2], [3, 0]]

atan2

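Semantics

Performs element-wise atan2 operation on lhs and rhs tensors and produces a result tensor. Depending on the element type, does the following:

For floats: atan2 from IEEE-754.
For complex numbers: complex atan2.
For quantized types: dequantize_op_quantize(atan2, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples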

// %lhs: [0.0, 1.0, -1.0]
// %rhs: [0.0, 0.0, 0.0]
%result = "stablehlo.atan2"(%lhs, %rhs) : (tensor<3xf64>, tensor<3xf64>) -> tensor<3xf64>
// %result: [0.0, 1.57079637, -1.57079637] // [0.0, pi/2, -pi/2]

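More Examples

batch_norm_grad

Semantics

Computes gradients of several inputs of batch_norm_training backpropagating from grad_output, and produces grad_operand, grad_scale and grad_offset tensors. More formally, this operation can be expressed as a decomposition to existing StableHLO operations using Python syntax as follows: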

def compute_sum(operand, feature_index):
  (sum,) = reduce(
      inputs=[operand],
      init_values=[constant(0, element_type(operand))],
      dimensions=[i for i in range(rank(operand)) if i != feature_index],
      body=lambda x, y: add(x, y))
  return sum

def compute_mean(operand, feature_index):
  sum = compute_sum(operand, feature_index)
  divisor = constant(size(operand) / dim(operand, feature_index),
                     element_type(operand))
  divisor_bcast = broadcast_in_dim(divisor, [], type(sum))
  return divide(sum, divisor_bcast)

def batch_norm_grad(operand, scale, mean, variance, grad_output, epsilon, feature_index):
  # Broadcast inputs to type(operand)
  scale_bcast = broadcast_in_dim(scale, [feature_index], type(operand))
  mean_bcast = broadcast_in_dim(mean, [feature_index], type(operand))
  variance_bcast = broadcast_in_dim(variance, [feature_index], type(operand))
  epsilon_bcast = broadcast_in_dim(constant(epsilon, element_type(operand)), [],
                                   type(operand))

  # Perform normalization using the provided `mean` and `variance`
  # Intermediate values will be useful for computing gradients
  centered_operand = subtract(operand, mean_bcast)
  stddev = sqrt(add(variance_bcast, epsilon_bcast))
  normalized_operand = divide(centered_operand, stddev)

  # Use the implementation from batchnorm_expander.cc in XLA
  # Temporary variables have exactly the same names as in the C++ code
  elements_per_feature = broadcast_in_dim(
      constant(divide(size(operand), dim(operand, feature_index)),
               element_type(grad_output)),
      [], type(operand))
  i1 = multiply(grad_output, elements_per_feature)
  i2 = broadcast_in_dim(
      compute_sum(grad_output, feature_index), [feature_index], type(operand))
  i3 = broadcast_in_dim(
      compute_sum(multiply(grad_output, centered_operand), feature_index),
      [feature_index], type(operand))
  i4 = multiply(i3, centered_operand)
  i5 = divide(i4, add(variance_bcast, epsilon_bcast))
  i6 = subtract(subtract(i1, i2), i5)

  grad_operand = multiply(
      divide(divide(scale_bcast, stddev), elements_per_feature), i6)
  grad_scale = compute_sum(
      multiply(grad_output, normalized_operand), feature_index)
  grad_offset = compute_sum(grad_output, feature_index)

  return grad_operand, grad_scale, grad_offset

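For quantized types, performs dequantize_batch_norm_grad_or_training_quantize(lambda operand, scale, mean, variance, grad_output: batch_norm_grad(operand, scale, mean, variance, grad_output, epsilon, feature_index), operand, scale, mean, variance, grad_output, type(grad_operand), type(grad_scale), type(feature_index)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1-C3), (C5)
(I2)	`scale`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4), (C5)
(I3)	`mean`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
(I4)	`variance`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
(I5)	`grad_output`	tensor of floating-point type or per-tensor quantized tensor	(C2), (C3)
(I6)	`epsilon`	constant of type `f32`
(I7)	`feature_index`	constant of type `si64`	(C1), (C5)

Outputs

Name	Type	Constraints
`grad_operand`	tensor of floating-point type or per-tensor quantized tensor	(C2), (C3)
`grad_scale`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
`grad_offset`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)

Constraints

(C1) 0 <= feature_index < rank(operand).
(C2) operand, scale, mean, variance, grad_output, grad_operand, grad_scale and grad_offset have the same baseline_element_type.
(C3) operand, grad_output and grad_operand have the same shape.
(C4) scale, mean, variance, grad_scale and grad_offset have the same shape.
(C5) size(scale) = dim(operand, feature_index).

Examples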

// %operand: [
//            [[1.0, 2.0], [3.0, 4.0]],
//            [[3.0, 4.0], [1.0, 2.0]]
//           ]
// %scale: [1.0, 1.0]
// %mean: [2.0, 3.0]
// %variance: [1.0, 1.0]
// %grad_output: [
//                [[0.1, 0.1], [0.1, 0.1]],
//                [[0.1, 0.1], [0.1, 0.1]]
//               ]
%grad_operand, %grad_scale, %grad_offset =
"stablehlo.batch_norm_grad"(%operand, %scale, %mean, %variance, %grad_output) {
  epsilon = 0.0 : f32,
  feature_index = 2 : i64
} : (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>, tensor<2xf64>,
     tensor<2x2x2xf64>) -> (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>)
// %grad_operand: [
//                 [[0.0, 0.0], [0.0, 0.0]],
//                 [[0.0, 0.0], [0.0, 0.0]]
//                ]
// %grad_scale:  [0.0, 0.0]
// %grad_offset: [0.4, 0.4]

batch_norm_inference

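Semantics

Normalizes the operand tensor across all dimensions except for the feature_index dimension and produces a result tensor. More formally, this operation can be expressed as a decomposition to existing StableHLO operations using Python syntax as follows: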

def batch_norm_inference(operand, scale, offset, mean, variance, epsilon, feature_index):
  # Broadcast inputs to shape(operand)
  scale_bcast = broadcast_in_dim(scale, [feature_index], type(operand))
  offset_bcast = broadcast_in_dim(offset, [feature_index], type(operand))
  mean_bcast = broadcast_in_dim(mean, [feature_index], type(operand))
  variance_bcast = broadcast_in_dim(variance, [feature_index], type(operand))
  epsilon_bcast = broadcast_in_dim(constant(epsilon, element_type(operand)), [],
                                   type(operand))

  # Perform normalization using the provided `mean` and `variance` instead of
  # computing them like `batch_norm_training` does.
  centered_operand = subtract(operand, mean_bcast)
  stddev = sqrt(add(variance_bcast, epsilon_bcast))
  normalized_operand = divide(centered_operand, stddev)
  return add(multiply(scale_bcast, normalized_operand), offset_bcast)

For quantized types, performs dequantize_op_quantize(lambda operand, scale, offset, mean, variance: batch_norm_inference(operand, scale, offset, mean, variance, epsilon, feature_index), operand, scale, offset, mean, variance, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1-C7)
(I2)	`scale`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C3)
(I3)	`offset`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
(I4)	`mean`	1-dimensional tensor of floating-point or per-tensor quantized type	(C5)
(I5)	`variance`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C6)
(I6)	`epsilon`	constant of type `f32`
(I7)	`feature_index`	constant of type `si64`	(C1), (C3-C6)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C2), (C7)

Constraints

(C1) 0 <= feature_index < rank(operand).
(C2) operand, scale, offset, mean, variance and result have the same baseline_element_type.
(C3) size(scale) = dim(operand, feature_index).
(C4) size(offset) = dim(operand, feature_index).
(C5) size(mean) = dim(operand, feature_index).
(C6) size(variance) = dim(operand, feature_index).
(C7) baseline_type(operand) = baseline_type(result).

Examples

// %operand: [
//            [[1.0, 2.0], [3.0, 4.0]],
//            [[3.0, 4.0], [1.0, 2.0]]
//           ]
// %scale: [1.0, 1.0]
// %offset: [1.0, 1.0]
// %mean: [2.0, 3.0]
// %variance: [1.0, 1.0]
%result = "stablehlo.batch_norm_inference"(%operand, %scale, %offset, %mean, %variance) {
  epsilon = 0.0 : f32,
  feature_index = 2 : i64
} : (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>, tensor<2xf64>, tensor<2xf64>) -> tensor<2x2x2xf64>
// %result: [
//           [[0.0, 0.0], [2.0, 2.0]],
//           [[2.0, 2.0], [0.0, 0.0]]
//          ]
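
The numbers in this example can be reproduced with plain Python. The sketch below is not the specification's decomposition: it works on nested lists and assumes that feature_index refers to the last dimension, as it does in the example.

```python
import math


def batch_norm_inference(operand, scale, offset, mean, variance, epsilon):
    """Normalize a nested list whose last dimension is the feature one."""
    if isinstance(operand[0], list):
        return [batch_norm_inference(sub, scale, offset, mean, variance,
                                     epsilon)
                for sub in operand]
    # `operand` is now one innermost row: element f belongs to feature f.
    return [scale[f] * ((x - mean[f]) / math.sqrt(variance[f] + epsilon))
            + offset[f]
            for f, x in enumerate(operand)]


operand = [[[1.0, 2.0], [3.0, 4.0]],
           [[3.0, 4.0], [1.0, 2.0]]]
result = batch_norm_inference(operand, scale=[1.0, 1.0], offset=[1.0, 1.0],
                              mean=[2.0, 3.0], variance=[1.0, 1.0],
                              epsilon=0.0)
```

For instance, the element 1.0 in feature 0 becomes 1.0 * ((1.0 - 2.0) / sqrt(1.0 + 0.0)) + 1.0 = 0.0.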

batch_norm_training

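Semantics

Computes mean and variance across all dimensions except for the feature_index dimension and normalizes the operand tensor, producing output, batch_mean and batch_var tensors. More formally, this operation can be expressed as a decomposition to existing StableHLO operations using Python syntax as follows: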

def compute_mean(operand, feature_index):
  (sum,) = reduce(
      inputs=[operand],
      init_values=[constant(0, element_type(operand))],
      dimensions=[i for i in range(rank(operand)) if i != feature_index],
      body=lambda x, y: add(x, y))
  divisor = constant(size(operand) / dim(operand, feature_index),
                     element_type(operand))
  divisor_bcast = broadcast_in_dim(divisor, [], type(sum))
  return divide(sum, divisor_bcast)

def compute_variance(operand, feature_index):
  mean = compute_mean(operand, feature_index)
  mean_bcast = broadcast_in_dim(mean, [feature_index], type(operand))
  centered_operand = subtract(operand, mean_bcast)
  return compute_mean(multiply(centered_operand, centered_operand), feature_index)

def batch_norm_training(operand, scale, offset, epsilon, feature_index):
  mean = compute_mean(operand, feature_index)
  variance = compute_variance(operand, feature_index)
  output = batch_norm_inference(operand, scale, offset, mean, variance,
                                epsilon, feature_index)
  return output, mean, variance

For quantized types, performs dequantize_batch_norm_grad_or_training_quantize(lambda operand, scale, offset: batch_norm_training(operand, scale, offset, epsilon, feature_index), operand, scale, offset, type(output), type(batch_mean), type(batch_var)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)
(I2)	`scale`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C3)
(I3)	`offset`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
(I4)	`epsilon`	constant of type `f32`	(C1), (C3-C6)
(I5)	`feature_index`	constant of type `si64`	(C1), (C3-C6)

Outputs

Name	Type	Constraints
`output`	tensor of floating-point type or per-tensor quantized tensor	(C7)
`batch_mean`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C5)
`batch_var`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C6)

Constraints

(C1) 0 <= feature_index < rank(operand).
(C2) operand, scale, offset, batch_mean, batch_var and output have the same baseline_element_type.
(C3) size(scale) = dim(operand, feature_index).
(C4) size(offset) = dim(operand, feature_index).
(C5) size(batch_mean) = dim(operand, feature_index).
(C6) size(batch_var) = dim(operand, feature_index).
(C7) baseline_type(output) = baseline_type(operand).

Examples

// %operand: [
//            [[1.0, 2.0], [3.0, 4.0]],
//            [[3.0, 4.0], [1.0, 2.0]]
//           ]
// %scale: [1.0, 1.0]
// %offset: [1.0, 1.0]
%output, %batch_mean, %batch_var = "stablehlo.batch_norm_training"(%operand, %scale, %offset) {
  epsilon = 0.0 : f32,
  feature_index = 2 : i64
} : (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>) ->
    (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>)
// %output: [
//           [[0.0, 0.0], [2.0, 2.0]],
//           [[2.0, 2.0], [0.0, 0.0]]
//          ]
// %batch_mean: [2.0, 3.0]
// %batch_var: [1.0, 1.0]

bitcast_convert

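Semantics

Performs a bitcast operation on the operand tensor and produces a result tensor where the bits of the entire operand tensor are reinterpreted using the type of the result tensor.

More formally, given E = element_type(operand), E' = element_type(result), and R = rank(operand):

If num_bits(E') < num_bits(E), bits(result[i0, ..., iR-1, :]) = bits(operand[i0, ..., iR-1]).
If num_bits(E') > num_bits(E), bits(result[i0, ..., iR-2]) = bits(operand[i0, ..., iR-2, :]).
If num_bits(E') = num_bits(E), bits(result[i0, ..., iR-1]) = bits(operand[i0, ..., iR-1]).

bits returns the in-memory representation of a given value. Its behavior is implementation-defined because the exact representation of tensors is implementation-defined, and the exact representation of element types is implementation-defined as well.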
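As a rough illustration of the first case (num_bits(E') < num_bits(E)), the following sketch splits one 64-bit pattern into four 16-bit patterns with Python's struct module. It assumes a little-endian in-memory layout, which the specification leaves implementation-defined, and the function name bitcast_u64_to_u16x4 is hypothetical.

```python
import struct

def bitcast_u64_to_u16x4(value):
    """Reinterprets one 64-bit pattern as four 16-bit patterns (little-endian)."""
    raw = struct.pack("<Q", value)          # 8 bytes, least significant first
    return list(struct.unpack("<4H", raw))  # four 16-bit chunks from those bytes

chunks = bitcast_u64_to_u16x4(0x0123456789ABCDEF)
print([hex(chunk) for chunk in chunks])
```

On a big-endian layout the chunks would come out in the opposite order, which is why the example for this operation labels its result as a little-endian representation.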

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C2)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1-C2)

Constraints

(C1) Given E = is_quantized(operand) ? storage_type(operand) : element_type(operand), E' = is_quantized(result) ? storage_type(result) : element_type(result), and R = rank(operand):
- If num_bits(E') = num_bits(E), shape(result) = shape(operand).
- If num_bits(E') < num_bits(E):
  - rank(result) = R + 1.
  - dim(result, i) = dim(operand, i) for all 0 <= i < R.
  - dim(result, R) * num_bits(E') = num_bits(E).
- If num_bits(E') > num_bits(E):
  - rank(result) = R - 1.
  - dim(result, i) = dim(operand, i) for all 0 <= i < R - 1.
  - dim(operand, R - 1) * num_bits(E) = num_bits(E').
(C2) If is_complex(operand) or is_complex(result), then is_complex(operand) and is_complex(result).

Examples

// %operand: 0x0123456789ABCDEF
%result = "stablehlo.bitcast_convert"(%operand) : (tensor<f64>) -> tensor<4xf16>
// %result: [0xCDEF, 0x89AB, 0x4567, 0x0123] // little-endian representation

More Examples

broadcast_in_dim

Semantics

Expands the dimensions and/or rank of an input tensor by duplicating the data in the operand tensor and produces a result tensor. More formally, result[result_index] = operand[operand_index] where for all d in axes(operand):

operand_index[d] = 0 if dim(operand, d) = 1.
operand_index[d] = result_index[broadcast_dimensions[d]] otherwise.
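The index mapping above can be sketched as follows. This is a minimal illustration under stated assumptions: tensors are modeled as dictionaries from index tuples to values, and the shapes are passed explicitly instead of being derived from tensor types.

```python
from itertools import product

def broadcast_in_dim(operand, operand_shape, broadcast_dimensions, result_shape):
    """operand maps index tuples to values; the result uses the same encoding."""
    result = {}
    for result_index in product(*(range(n) for n in result_shape)):
        # Size-1 operand dimensions always read index 0; all other dimensions
        # read the result index selected by broadcast_dimensions.
        operand_index = tuple(
            0 if operand_shape[d] == 1 else result_index[broadcast_dimensions[d]]
            for d in range(len(operand_shape)))
        result[result_index] = operand[operand_index]
    return result

# operand [[1, 2, 3]] of shape 1x3, broadcast to shape 2x3x2.
operand = {(0, 0): 1, (0, 1): 2, (0, 2): 3}
result = broadcast_in_dim(operand, (1, 3), (2, 1), (2, 3, 2))
```

Here result[i, j, k] equals operand[0, j] for every i and k, because operand dimension 1 is mapped to result dimension 1 and operand dimension 0 has size 1.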

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C2), (C5-C6)
(I2)	`broadcast_dimensions`	1-dimensional tensor constant of type `si64`	(C2-C6)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1), (C3), (C5-C6)

Constraints

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand) except that quantization_dimension(operand), scales(operand), and zero_points(operand) may differ from quantization_dimension(result), scales(result), and zero_points(result) respectively, otherwise.
(C2) size(broadcast_dimensions) = rank(operand).
(C3) 0 <= broadcast_dimensions < rank(result).
(C4) is_unique(broadcast_dimensions).
(C5) For all d in axes(operand):
- dim(operand, d) = 1 or
- dim(operand, d) = dim(result, broadcast_dimensions[d]).
(C6) If is_per_axis_quantized(result):
- quantization_dimension(result) = broadcast_dimensions[quantization_dimension(operand)].
- If dim(operand, quantization_dimension(operand)) = 1, then scales(result)[i] = scales(operand)[0] and zero_points(result)[i] = zero_points(operand)[0] for i in range(dim(result, quantization_dimension(result))).

Examples

// %operand: [
//            [1, 2, 3]
//           ]
%result = "stablehlo.broadcast_in_dim"(%operand) {
  broadcast_dimensions = array<i64: 2, 1>
} : (tensor<1x3xi32>) -> tensor<2x3x2xi32>
// %result: [
//            [
//             [1, 1],
//             [2, 2],
//             [3, 3]
//            ],
//            [
//             [1, 1],
//             [2, 2],
//             [3, 3]
//            ]
//          ]

More Examples

case

Semantics

Produces the output from executing exactly one function from branches depending on the value of index. More formally, result = selected_branch() where:

selected_branch = branches[index] if 0 <= index < size(branches).
selected_branch = branches[-1] otherwise.
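The branch selection rule can be sketched as below, modeling branches as zero-argument Python callables (an assumption of this sketch). The explicit range check matters: any out-of-range index, not just -1, selects the last branch.

```python
def case(index, branches):
    """Runs exactly one branch; out-of-range indices fall back to the last one."""
    if 0 <= index < len(branches):
        selected_branch = branches[index]
    else:
        selected_branch = branches[-1]
    return selected_branch()

branches = [
    lambda: ([0, 0], [0, 0]),  # branch 0
    lambda: ([1, 1], [1, 1]),  # branch 1
]
results = case(-1, branches)
```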

Inputs

Label	Name	Type	Constraints
(I1)	`index`	0-dimensional tensor of type `si32`
(I2)	`branches`	variadic number of functions	(C1-C4)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors, quantized tensors or tokens	(C4)

Constraints

(C1) 0 < size(branches).
(C2) input_types(branches...) = [].
(C3) same(output_types(branches...)).
(C4) type(results...) = output_types(branches[0]).

Examples

// %index: -1
// %result_branch0: [0, 0]
// %result_branch1: [1, 1]
%result0, %result1 = "stablehlo.case"(%index) ({
  "stablehlo.return"(%result_branch0, %result_branch0) : (tensor<2xi64>, tensor<2xi64>) -> ()
}, {
  "stablehlo.return"(%result_branch1, %result_branch1) : (tensor<2xi64>, tensor<2xi64>) -> ()
}) : (tensor<i32>) -> (tensor<2xi64>, tensor<2xi64>)
// %result0: [1, 1]
// %result1: [1, 1]

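More Examples

cbrt

Semantics

Performs element-wise cubic root operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: rootn(x, 3) from IEEE-754.
For complex numbers: complex cubic root.
For quantized types: dequantize_op_quantize(cbrt, operand, type(result)).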

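Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples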

// %operand: [0.0, 1.0, 8.0, 27.0]
%result = "stablehlo.cbrt"(%operand) : (tensor<4xf64>) -> tensor<4xf64>
// %result: [0.0, 1.0, 2.0, 3.0]

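More Examples

ceil

Semantics

Performs element-wise ceil of the operand tensor and produces a result tensor. Implements the roundToIntegralTowardPositive operation from the IEEE-754 specification. For quantized types, performs dequantize_op_quantize(ceil, operand, type(result)).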

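Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples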

// %operand: [-0.8166, -0.2530, 0.2530, 0.8166, 2.0]
%result = "stablehlo.ceil"(%operand) : (tensor<5xf32>) -> tensor<5xf32>
// %result: [-0.0, -0.0, 1.0, 1.0, 2.0]

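More Examples

cholesky

Semantics

Computes the Cholesky decomposition of a batch of matrices.

More formally, for all i in index_space(result), result[i0, ..., iR-3, :, :] is a Cholesky decomposition of a[i0, ..., iR-3, :, :], in the form of either a lower-triangular (if lower is true) or an upper-triangular (if lower is false) matrix. The output values in the opposite triangle, i.e. the strict upper triangle or the strict lower triangle correspondingly, are implementation-defined.

If there exists i where the input matrix is not a Hermitian positive-definite matrix, then the behavior is undefined.

For quantized types, performs dequantize_op_quantize(lambda operand: cholesky(operand, lower), a, type(result)).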

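Inputs

Label	Name	Type	Constraints
(I1)	`a`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1-C3)
(I2)	`lower`	0-dimensional tensor constant of type `i1`

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(a) = baseline_type(result).
(C2) 2 <= rank(a).
(C3) dim(a, -2) = dim(a, -1).

Examples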

// %a: [
//      [1.0, 2.0, 3.0],
//      [2.0, 20.0, 26.0],
//      [3.0, 26.0, 70.0]
//     ]
%result = "stablehlo.cholesky"(%a) {
  lower = true
} : (tensor<3x3xf32>) -> tensor<3x3xf32>
// %result: [
//           [1.0, 0.0, 0.0],
//           [2.0, 4.0, 0.0],
//           [3.0, 5.0, 6.0]
//          ]

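clamp

Semantics

Clamps every element of the operand tensor between a minimum and a maximum value and produces a result tensor. More formally, result[result_index] = minimum(maximum(operand[result_index], min_element), max_element), where min_element = rank(min) = 0 ? min[] : min[result_index] and max_element = rank(max) = 0 ? max[] : max[result_index]. For quantized types, performs dequantize_op_quantize(clamp, min, operand, max, type(result)).

Imposing an ordering on complex numbers involves surprising semantics, so in the future we are planning to remove support for complex numbers for this operation (#560).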
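The clamping formula can be sketched for 1-dimensional tensors as follows. As an assumption of this sketch, a rank-0 min or max is modeled as a plain Python number, while rank-1 tensors are lists.

```python
def clamp(min_, operand, max_):
    """Element-wise minimum(maximum(operand, min), max) for 1-D lists."""
    def element(bound, i):
        # A plain number stands in for a rank-0 tensor and applies everywhere.
        return bound[i] if isinstance(bound, list) else bound
    return [min(max(x, element(min_, i)), element(max_, i))
            for i, x in enumerate(operand)]

result = clamp([5, 10, 15], [3, 13, 23], [10, 15, 20])
scalar_bounds = clamp(0, [-1, 5, 12], 10)
```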

Inputs

Label	Name	Type	Constraints
(I1)	`min`	tensor or per-tensor quantized tensor	(C1), (C3)
(I2)	`operand`	tensor or per-tensor quantized tensor	(C1-C4)
(I3)	`max`	tensor or per-tensor quantized tensor	(C2), (C3)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C4)

Constraints

(C1) rank(min) = 0 or shape(min) = shape(operand).
(C2) rank(max) = 0 or shape(max) = shape(operand).
(C3) baseline_element_type(min) = baseline_element_type(operand) = baseline_element_type(max).
(C4) baseline_type(operand) = baseline_type(result).

Examples

// %min: [5, 10, 15]
// %operand: [3, 13, 23]
// %max: [10, 15, 20]
%result = "stablehlo.clamp"(%min, %operand, %max) : (tensor<3xi32>, tensor<3xi32>, tensor<3xi32>) -> tensor<3xi32>
// %result: [5, 13, 20]

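More Examples

collective_broadcast

Semantics

Within each process group in the StableHLO process grid, sends the value of the operand tensor from the source process to the target processes and produces a result tensor.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0.
cross_partition(replica_groups) if channel_id > 0.

Afterwards, result@process is given by:

operand@process_groups[i, 0] if there exists an i such that the process is in process_groups[i].
broadcast_in_dim(constant(0, element_type(result)), [], type(result)) otherwise.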

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor	(C3)
(I2)	`replica_groups`	variadic number of 1-dimensional tensor constants of type `si64`	(C1), (C2)
(I3)	`channel_id`	constant of type `si64`

Outputs

Name	Type	Constraints
`result`	tensor	(C3)

Constraints

(C1) is_unique(replica_groups).
(C2) 0 <= replica_groups < N where N is defined as:
- num_replicas if cross_replica is used.
- num_partitions if cross_partition is used.
(C3) type(result) = type(operand).

Examples

// num_replicas: 4
// num_partitions: 1
// %operand@(0, 0): [[1, 2]]
// %operand@(1, 0): [[3, 4]]
// %operand@(2, 0): [[5, 6]]
// %operand@(3, 0): [[7, 8]]
%result = "stablehlo.collective_broadcast"(%operand) {
  replica_groups = dense<[[2, 1]]> : tensor<1x2xi64>,
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
} : (tensor<1x2xi64>) -> tensor<1x2xi64>
// %result@(0, 0): [[0, 0]]
// %result@(1, 0): [[5, 6]]
// %result@(2, 0): [[5, 6]]
// %result@(3, 0): [[0, 0]]

collective_permute

Semantics

Within each process group in the StableHLO process grid, sends the value of the operand tensor from the source process to the target process and produces a result tensor.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(source_target_pairs) if channel_id <= 0.
cross_partition(source_target_pairs) if channel_id > 0.

Afterwards, result@process is given by:

operand@process_groups[i, 0] if there exists an i such that process_groups[i, 1] = process.
broadcast_in_dim(constant(is_quantized(result) ? quantize(0, element_type(result)) : 0, element_type(result)), [], type(result)) otherwise.
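The routing rule above can be simulated in a few lines. This sketch assumes the cross_replica case with a single partition, so each process is identified by its replica id alone, and it models non-quantized tensors as nested lists. The zero argument stands in for the zero-filled tensor that processes without a source receive.

```python
import copy

def collective_permute(operands, source_target_pairs, zero):
    """operands[p] is the operand held by process p; returns results per process."""
    # Every process starts with a zero tensor, kept if nothing is sent to it.
    results = [copy.deepcopy(zero) for _ in operands]
    for source, target in source_target_pairs:
        results[target] = operands[source]
    return results

operands = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
results = collective_permute(operands, [[0, 1], [1, 2]], zero=[[0, 0], [0, 0]])
```

Process 0 is never a target, so it keeps the zero tensor, while processes 1 and 2 receive the operands of processes 0 and 1.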

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C5)
(I2)	`source_target_pairs`	2-dimensional tensor constant of type `si64`	(C1-C4)
(I3)	`channel_id`	constant of type `si64`

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) dim(source_target_pairs, 1) = 2.
(C2) is_unique(source_target_pairs[:, 0]).
(C3) is_unique(source_target_pairs[:, 1]).
(C4) 0 <= source_target_pairs < N, where N is defined as:
- num_replicas if cross_replica is used.
- num_partitions if cross_partition is used.
(C5) type(result) = type(operand).

Examples

// num_replicas: 3
// num_partitions: 1
// %operand@(0, 0): [[1, 2], [3, 4]]
// %operand@(1, 0): [[5, 6], [7, 8]]
// %operand@(2, 0): [[9, 10], [11, 12]]
%result = "stablehlo.collective_permute"(%operand) {
  source_target_pairs = dense<[[0, 1], [1, 2]]> : tensor<2x2xi64>,
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
} : (tensor<2x2xi64>) -> tensor<2x2xi64>
//
// %result@(0, 0): [[0, 0], [0, 0]]
// %result@(1, 0): [[1, 2], [3, 4]]
// %result@(2, 0): [[5, 6], [7, 8]]

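More Examples

compare

Semantics

Performs element-wise comparison of lhs and rhs tensors according to comparison_direction and compare_type, and produces a result tensor.

The values of comparison_direction and compare_type have the following semantics:

For boolean and integer element types:

EQ: lhs = rhs.
NE: lhs != rhs.
GE: lhs >= rhs.
GT: lhs > rhs.
LE: lhs <= rhs.
LT: lhs < rhs.

For floating-point element types with compare_type = FLOAT, the op implements the following IEEE-754 operations:

EQ: compareQuietEqual.
NE: compareQuietNotEqual.
GE: compareQuietGreaterEqual.
GT: compareQuietGreater.
LE: compareQuietLessEqual.
LT: compareQuietLess.

For floating-point element types with compare_type = TOTALORDER, the op uses the combination of totalOrder and compareQuietEqual operations from IEEE-754. This feature appears to be unused, so in the future, we are planning to remove it (#584).

For complex element types, lexicographic comparison of (real, imag) pairs is performed using the provided comparison_direction and compare_type. Imposing an ordering on complex numbers involves surprising semantics, so in the future we are planning to remove support for complex numbers when comparison_direction is GE, GT, LE or LT (#560).

For quantized types, performs dequantize_compare(lhs, rhs, comparison_direction).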

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1-C3)
(I2)	`rhs`	tensor or per-tensor quantized tensor	(C1-C2)
(I3)	`comparison_direction`	enum of `EQ`, `NE`, `GE`, `GT`, `LE`, and `LT`
(I4)	`compare_type`	enum of `FLOAT`, `TOTALORDER`, `SIGNED`, and `UNSIGNED`	(C3)

Outputs

Name	Type	Constraints
`result`	tensor of boolean type	(C2)

Constraints

(C1) baseline_element_type(lhs) = baseline_element_type(rhs).
(C2) shape(lhs) = shape(rhs) = shape(result).
(C3) compare_type is defined as:
- SIGNED if is_signed_integer(element_type(lhs)).
- UNSIGNED if is_unsigned_integer(element_type(lhs)) or is_boolean(element_type(lhs)).
- FLOAT or TOTALORDER if is_float(element_type(lhs)).
- FLOAT if is_complex(element_type(lhs)).

Examples

// %lhs: [1.0, 3.0]
// %rhs: [1.1, 2.9]
%result = "stablehlo.compare"(%lhs, %rhs) {
  comparison_direction = #stablehlo<comparison_direction LT>,
  compare_type = #stablehlo<comparison_type FLOAT>
} : (tensor<2xf32>, tensor<2xf32>) -> tensor<2xi1>
// %result: [true, false]

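More Examples

complex

Semantics

Performs element-wise conversion to a complex value from a pair of real and imaginary values, lhs and rhs, and produces a result tensor.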

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of type `f32` or `f64`	(C1-C3)
(I2)	`rhs`	tensor of type `f32` or `f64`	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of complex type	(C2), (C3)

Constraints

(C1) type(lhs) = type(rhs).
(C2) shape(result) = shape(lhs).
(C3) element_type(result) has type complex<E> where E = element_type(lhs).

Examples

// %lhs: [1.0, 3.0]
// %rhs: [2.0, 4.0]
%result = "stablehlo.complex"(%lhs, %rhs) : (tensor<2xf64>, tensor<2xf64>) -> tensor<2xcomplex<f64>>
// %result: [(1.0, 2.0), (3.0, 4.0)]

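More Examples

concatenate

Semantics

Concatenates inputs along the dimension dimension in the same order as the given arguments and produces a result tensor. More formally, result[i0, ..., id, ..., iR-1] = inputs[k][i0, ..., kd, ..., iR-1], where:

id = d0 + ... + dk-1 + kd.
d is equal to dimension, and d0, ... are the d-th dimension sizes of inputs.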
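A minimal recursive sketch of this rule, assuming tensors are modeled as nested Python lists whose shapes already satisfy the constraints of the operation:

```python
def concatenate(inputs, dimension):
    """Concatenates nested-list tensors along the given dimension."""
    if dimension == 0:
        # Along the leading dimension, the inputs are laid out one after another.
        return [item for tensor in inputs for item in tensor]
    # Otherwise recurse into each position of the leading dimension, which
    # every input shares.
    return [concatenate([tensor[i] for tensor in inputs], dimension - 1)
            for i in range(len(inputs[0]))]

rows = concatenate([[[1, 2], [3, 4], [5, 6]], [[7, 8]]], 0)
cols = concatenate([[[1], [2]], [[3, 4], [5, 6]]], 1)
```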

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1-C6)
(I2)	`dimension`	constant of type `si64`	(C2), (C4), (C6)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C5-C6)

Constraints

(C1) same(element_type(inputs...)).
(C2) same(shape(inputs...)) except for dim(inputs..., dimension).
(C3) 0 < size(inputs).
(C4) 0 <= dimension < rank(inputs[0]).
(C5) element_type(result) = element_type(inputs[0]).
(C6) shape(result) = shape(inputs[0]) except for:
- dim(result, dimension) = dim(inputs[0], dimension) + ....

Examples

// %input0: [[1, 2], [3, 4], [5, 6]]
// %input1: [[7, 8]]
%result = "stablehlo.concatenate"(%input0, %input1) {
  dimension = 0 : i64
} : (tensor<3x2xi64>, tensor<1x2xi64>) -> tensor<4x2xi64>
// %result: [[1, 2], [3, 4], [5, 6], [7, 8]]

More Examples

constant

Semantics

Produces an output tensor from a constant value.

Inputs

Label	Name	Type	Constraints
(I1)	`value`	constant	(C1)

Outputs

Name	Type	Constraints
`output`	tensor or quantized tensor	(C1)

Constraints

(C1) type(value) = type(output).

Examples

%output = "stablehlo.constant"() {
  value = dense<[[0.0, 1.0], [2.0, 3.0]]> : tensor<2x2xf32>
} : () -> tensor<2x2xf32>
// %output: [[0.0, 1.0], [2.0, 3.0]]

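More Examples

convert

Semantics

Performs an element-wise conversion from one element type to another on the operand tensor and produces a result tensor.

For boolean-to-any-supported-type conversions, the value false is converted to zero, and the value true is converted to one. For any-supported-type-to-boolean conversions, a zero value is converted to false, and non-zero values are converted to true. See below for how this works for complex types.

For conversions involving integer-to-integer, integer-to-floating-point or floating-point-to-floating-point, if the source value can be exactly represented in the destination type, the result value is that exact representation. Otherwise, the behavior is TBD (#180).

For conversions involving floating-point-to-integer, the fractional part is truncated. If the truncated value cannot be represented in the destination type, the behavior is TBD (#180).

Conversions involving complex-to-complex follow the same behavior as floating-point-to-floating-point conversions for converting the real and imaginary parts.

For complex-to-any-other-type and any-other-type-to-complex conversions, the source imaginary value is ignored or the destination imaginary value is zeroed, respectively. The conversion of the real part follows the floating-point conversions.

In principle, this operation could express dequantization (conversion from quantized tensors to regular tensors), quantization (conversion from regular tensors to quantized tensors) and requantization (conversion between quantized tensors), but at the moment we have dedicated operations for that: uniform_dequantize for the first use case and uniform_quantize for the second and the third use cases. In the future, these two ops may be merged into convert (#1576).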

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor	(C1)

Constraints

(C1) shape(operand) = shape(result).

Examples

// %operand: [-1, 0, 1]
%result = "stablehlo.convert"(%operand) : (tensor<3xi64>) -> tensor<3xcomplex<f64>>
// %result: [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]

More Examples

convolution

Semantics

Computes dot products between windows of lhs and slices of rhs and produces result. The following diagram shows how elements in result are computed from lhs and rhs using a concrete example.

More formally, consider the following reframing of the inputs in terms of lhs in order to be able to express windows of lhs:

lhs_window_dimensions = lhs_shape(dim(lhs, input_batch_dimension), dim(rhs, kernel_spatial_dimensions), dim(lhs, input_feature_dimension)).
lhs_window_strides = lhs_shape(1, window_strides, 1).
lhs_padding = lhs_shape([0, 0], padding, [0, 0]).
lhs_base_dilations = lhs_shape(1, lhs_dilation, 1).
lhs_window_dilations = lhs_shape(1, rhs_dilation, 1).

This reframing uses the following helper functions:

lhs_shape(n, hw, c) = permute([n] + hw + [c], [input_batch_dimension] + input_spatial_dimensions + [input_feature_dimension]).
result_shape(n1, hw, c1) = permute([n1] + hw + [c1], [output_batch_dimension] + output_spatial_dimensions + [output_feature_dimension]).
permute([j0, j1, ..., jR-1], permutation) = [i0, i1, ..., iR-1] where j[d] = i[permutation[d]].

If feature_group_count = 1 and batch_group_count = 1, then for all output_spatial_index in index_space(dim(result, output_spatial_dimensions...)), result[result_shape(:, output_spatial_index, :)] = dot_product where:

padding_value = constant(0, element_type(lhs)).
padded_lhs = pad(lhs, padding_value, lhs_padding[:, 0], lhs_padding[:, 1], lhs_base_dilations - 1).
lhs_window_start = lhs_shape(0, output_spatial_index, 0) * lhs_window_strides.
lhs_window = slice(padded_lhs, lhs_window_start, lhs_window_start + lhs_window_dimensions, lhs_window_dilations).
reversed_lhs_window = reverse(lhs_window, [input_spatial_dimensions[dim] for dim in range(size(window_reversal)) if window_reversal[dim] = true]). This feature appears to be unused, so in the future we are planning to remove it (#1181).
dot_product = dot_general(reversed_lhs_window, rhs, lhs_batching_dimensions=[], lhs_contracting_dimensions=input_spatial_dimensions + [input_feature_dimension], rhs_batching_dimensions=[], rhs_contracting_dimensions=kernel_spatial_dimensions + [kernel_input_feature_dimension]).

If feature_group_count > 1:

lhses = split(lhs, feature_group_count, input_feature_dimension).
rhses = split(rhs, feature_group_count, kernel_output_feature_dimension).
results... = convolution(lhses..., rhses..., ..., feature_group_count=1, ...).
result = concatenate(results, output_feature_dimension).

If batch_group_count > 1:

lhses = split(lhs, batch_group_count, input_batch_dimension).
rhses = split(rhs, batch_group_count, kernel_output_feature_dimension).
results... = convolution(lhses..., rhses..., ..., batch_group_count=1, ...).
result = concatenate(results, output_feature_dimension).

For quantized types, performs dequantize_op_quantize(lambda lhs, rhs: convolution(lhs, rhs, window_strides, padding, lhs_dilation, rhs_dilation, window_reversal, input_batch_dimension, input_feature_dimension, input_spatial_dimensions, kernel_input_feature_dimension, kernel_output_feature_dimension, kernel_spatial_dimensions, output_batch_dimension, output_feature_dimension, output_spatial_dimensions, feature_group_count, batch_group_count, precision_config), lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1), (C10-C11), (C14), (C25), (C27-C30)
(I2)	`rhs`	tensor or quantized tensor	(C1), (C14-C16), (C25), (C27-C32)
(I3)	`window_strides`	1-dimensional tensor constant of type `si64`	(C2-C3), (C25)
(I4)	`padding`	2-dimensional tensor constant of type `si64`	(C4), (C25)
(I5)	`lhs_dilation`	1-dimensional tensor constant of type `si64`	(C5-C6), (C25)
(I6)	`rhs_dilation`	1-dimensional tensor constant of type `si64`	(C7-C8), (C25)
(I7)	`window_reversal`	1-dimensional tensor constant of type `i1`	(C9)
(I8)	`input_batch_dimension`	constant of type `si64`	(C10), (C13), (C25)
(I9)	`input_feature_dimension`	constant of type `si64`	(C11), (C13-C14)
(I10)	`input_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C12), (C13), (C25)
(I11)	`kernel_input_feature_dimension`	constant of type `si64`	(C14), (C18)
(I12)	`kernel_output_feature_dimension`	constant of type `si64`	(C15-C16), (C18), (C25), (C32)
(I13)	`kernel_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C17-C18), (C25)
(I14)	`output_batch_dimension`	constant of type `si64`	(C20), (C25)
(I15)	`output_feature_dimension`	constant of type `si64`	(C20), (C25), (C33)
(I16)	`output_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C19-C20), (C25)
(I17)	`feature_group_count`	constant of type `si64`	(C11), (C14), (C16), (C21), (C23)
(I18)	`batch_group_count`	constant of type `si64`	(C10), (C15), (C22), (C23), (C25)
(I19)	`precision_config`	variadic number of enums of `DEFAULT`, `HIGH`, and `HIGHEST`	(C24)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C25-C28), (C30-C31), (C33)

Constraints

(C1) N = rank(lhs) = rank(rhs).
(C2) size(window_strides) = N - 2.
(C3) 0 < window_strides.
(C4) shape(padding) = [N - 2, 2].
(C5) size(lhs_dilation) = N - 2.
(C6) 0 < lhs_dilation.
(C7) size(rhs_dilation) = N - 2.
(C8) 0 < rhs_dilation.
(C9) size(window_reversal) = N - 2.
(C10) dim(lhs, input_batch_dimension) % batch_group_count = 0.
(C11) dim(lhs, input_feature_dimension) % feature_group_count = 0.
(C12) size(input_spatial_dimensions) = N - 2.
(C13) Given input_dimensions = [input_batch_dimension] + input_spatial_dimensions + [input_feature_dimension]:
- is_unique(input_dimensions).
- 0 <= input_dimensions < N.
(C14) dim(rhs, kernel_input_feature_dimension) = dim(lhs, input_feature_dimension) / feature_group_count.
(C15) dim(rhs, kernel_output_feature_dimension) % batch_group_count = 0.
(C16) dim(rhs, kernel_output_feature_dimension) % feature_group_count = 0.
(C17) size(kernel_spatial_dimensions) = N - 2.
(C18) Given kernel_dimensions = kernel_spatial_dimensions + [kernel_input_feature_dimension] + [kernel_output_feature_dimension]:
- is_unique(kernel_dimensions).
- 0 <= kernel_dimensions < N.
(C19) size(output_spatial_dimensions) = N - 2.
(C20) Given output_dimensions = [output_batch_dimension] + output_spatial_dimensions + [output_feature_dimension]:
- is_unique(output_dimensions).
- 0 <= output_dimensions < N.
(C21) 0 < feature_group_count.
(C22) 0 < batch_group_count.
(C23) feature_group_count = 1 or batch_group_count = 1.
(C24) size(precision_config) = 2.
(C25) dim(result, result_dim) is defined as:
- dim(lhs, input_batch_dimension) / batch_group_count if result_dim = output_batch_dimension.
- dim(rhs, kernel_output_feature_dimension) if result_dim = output_feature_dimension.
- num_windows otherwise, where:
  - output_spatial_dimensions[spatial_dim] = result_dim.
  - lhs_dim = input_spatial_dimensions[spatial_dim].
  - rhs_dim = kernel_spatial_dimensions[spatial_dim].
  - dilated_input_shape[lhs_dim] = dim(lhs, lhs_dim) = 0 ? 0 : (dim(lhs, lhs_dim) - 1) * lhs_dilation[spatial_dim] + 1.
  - padded_input_shape[lhs_dim] = padding[spatial_dim, 0] + dilated_input_shape[lhs_dim] + padding[spatial_dim, 1].
  - dilated_window_shape[lhs_dim] = dim(rhs, rhs_dim) = 0 ? 0 : (dim(rhs, rhs_dim) - 1) * rhs_dilation[spatial_dim] + 1.
  - is_empty_window[lhs_dim] = padded_input_shape[lhs_dim] = 0 || dilated_window_shape[lhs_dim] > padded_input_shape[lhs_dim].
  - num_windows = is_empty_window[lhs_dim] ? 0 : floor((padded_input_shape[lhs_dim] - dilated_window_shape[lhs_dim]) / window_strides[spatial_dim]) + 1.
(C26) rank(result) = N.
If the operation uses non-quantized tensors:
- (C27) element_type(lhs) = element_type(rhs) = element_type(result).
If the operation uses quantized tensors:
- (C28) is_quantized_tensor(lhs) and is_quantized_tensor(rhs) and is_quantized_tensor(result).
- (C29) storage_type(lhs) = storage_type(rhs).
- (C30) expressed_type(lhs) = expressed_type(rhs) = expressed_type(result).
- (C31) If is_per_tensor_quantized(rhs), then is_per_tensor_quantized(result).
- (C32) If is_per_axis_quantized(rhs), then quantization_dimension(rhs) = kernel_output_feature_dimension.
- (C33) If is_per_axis_quantized(result), then quantization_dimension(result) = output_feature_dimension.
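The output size of a single spatial dimension in (C25) can be sketched as a small function. The function name and the scalar parameters are assumptions of this sketch: each argument is the value for one spatial dimension, not a whole tensor.

```python
def num_windows(lhs_dim_size, rhs_dim_size, padding_low, padding_high,
                lhs_dilation, rhs_dilation, window_stride):
    """Number of window positions along one spatial dimension, following (C25)."""
    dilated_input = 0 if lhs_dim_size == 0 else (lhs_dim_size - 1) * lhs_dilation + 1
    padded_input = padding_low + dilated_input + padding_high
    dilated_window = 0 if rhs_dim_size == 0 else (rhs_dim_size - 1) * rhs_dilation + 1
    is_empty_window = padded_input == 0 or dilated_window > padded_input
    if is_empty_window:
        return 0
    # The numerator is non-negative here, so // matches floor in the formula.
    return (padded_input - dilated_window) // window_stride + 1

# Input size 4 dilated by 2, a 3-wide window, no padding, stride 4.
spatial_size = num_windows(4, 3, 0, 0, 2, 1, 4)
```

With an input size of 4, lhs_dilation of 2, a window of size 3 and a stride of 4, the dilated input has size 7 and two windows fit, which agrees with a 2x2 spatial result for a 4x4 input under those settings.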

Examples

// %lhs: [[
//        [
//          [1], [2], [5], [6]
//        ],
//        [
//          [3], [4], [7], [8]
//        ],
//        [
//          [10], [11], [14], [15]
//        ],
//        [
//          [12], [13], [16], [17]
//        ]
//      ]]
//
// %rhs : [
//         [[[1]], [[1]], [[1]]],
//         [[[1]], [[1]], [[1]]],
//         [[[1]], [[1]], [[1]]]
//        ]
%result = "stablehlo.convolution"(%lhs, %rhs) {
  window_strides = dense<4> : tensor<2xi64>,
  padding = dense<0> : tensor<2x2xi64>,
  lhs_dilation = dense<2> : tensor<2xi64>,
  rhs_dilation = dense<1> : tensor<2xi64>,
  window_reversal = dense<false> : tensor<2xi1>,
  // In the StableHLO dialect, dimension numbers are encoded via:
  // `[<input dimensions>]x[<kernel dimensions>]->[output dimensions]`.
  // "b" is batch dimension, "f" is feature dimension,
  // "i" is input feature dimension, "o" is output feature dimension,
  // "0/1/etc" are spatial dimensions.
  dimension_numbers = #stablehlo.conv<[b, 0, 1, f]x[0, 1, i, o]->[b, 0, 1, f]>,
  feature_group_count = 1 : i64,
  batch_group_count = 1 : i64,
  precision_config = [#stablehlo<precision DEFAULT>, #stablehlo<precision DEFAULT>]
} : (tensor<1x4x4x1xi32>, tensor<3x3x1x1xi32>) -> tensor<1x2x2x1xi32>
// %result: [[
//            [[10], [26]],
//            [[46], [62]]
//          ]]

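cosine

Semantics

Performs element-wise cosine operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: cos from IEEE-754.
For complex numbers: complex cosine.
For quantized types: dequantize_op_quantize(cosine, operand, type(result)).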

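Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples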

// %operand: [
//            [0.0, 1.57079632],       // [0, pi/2]
//            [3.14159265, 4.71238898] // [pi, 3pi/2]
//           ]
%result = "stablehlo.cosine"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[1.0, 0.0], [-1.0, 0.0]]

More Examples

count_leading_zeros

Semantics

Performs element-wise count of the number of leading zero bits in the operand tensor and produces a result tensor.
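For a single element, the count can be sketched as follows. The sketch assumes a two's complement representation for negative values, and the num_bits parameter (the element bit width, defaulting to 64) is an assumption introduced for illustration.

```python
def count_leading_zeros(value, num_bits=64):
    """Counts leading zero bits of value viewed as a num_bits-wide pattern."""
    # Masking maps negative numbers to their two's complement bit pattern.
    unsigned = value & ((1 << num_bits) - 1)
    return num_bits - unsigned.bit_length()

counts = [count_leading_zeros(v) for v in [0, 1, 128, -1]]
```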

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(operand) = type(result).

Examples

// %operand: [[0, 1], [128, -1]]
%result = "stablehlo.count_leading_zeros"(%operand) : (tensor<2x2xi64>) -> tensor<2x2xi64>
// %result: [[64, 63], [56, 0]]

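More Examples

custom_call

Semantics

Encapsulates an implementation-defined operation call_target_name that takes inputs and called_computations and produces results. has_side_effect, backend_config and api_version may be used to provide additional implementation-defined metadata.

At the moment, this operation contains a fairly disorganized collection of metadata which reflects the organic evolution of its counterpart operation in the XLA compiler. In the future, we are planning to unify this metadata (#741).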

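Inputs

Label	Name	Type
(I1)	`inputs`	variadic number of values
(I2)	`call_target_name`	constant of type `string`
(I3)	`has_side_effect`	constant of type `i1`
(I4)	`backend_config`	constant of type `string`
(I5)	`api_version`	constant of type `si32`
(I6)	`called_computations`	variadic number of constants of type `string`

Outputs

Name	Type
`results`	variadic number of values

Examples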

%results = "stablehlo.custom_call"(%input0) {
  call_target_name = "foo",
  has_side_effect = false,
  backend_config = "bar",
  api_version = 1 : i32,
  called_computations = [@foo]
} : (tensor<f64>) -> tensor<f64>

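divide

Semantics

Performs element-wise division of the dividend lhs and divisor rhs tensors and produces a result tensor. Depending on the element type, does the following:

For integers: integer division which produces the algebraic quotient with any fractional part discarded.
For floats: division from IEEE-754.
For complex numbers: complex division.
For quantized types:
- dequantize_op_quantize(divide, lhs, rhs, type(result)).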
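Discarding the fractional part of the algebraic quotient means rounding toward zero, which differs from Python's floor division for negative operands. The hypothetical helper below sketches the integer case only. It does not model division by zero or overflow, which the text above does not describe.

```python
def divide_int(lhs, rhs):
    """Integer division that truncates the quotient toward zero."""
    quotient = abs(lhs) // abs(rhs)
    # The sign is positive when both operands share a sign, negative otherwise.
    return quotient if (lhs < 0) == (rhs < 0) else -quotient

quotients = [divide_int(l, r) for l, r in [(17, 3), (-17, 3), (17, -3), (-17, -3)]]
```

For comparison, Python's own -17 // 3 evaluates to -6, because // rounds toward negative infinity.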

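Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples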

// %lhs: [17.1, -17.1, 17.1, -17.1]
// %rhs: [3.0, 3.0, -3.0, -3.0]
%result = "stablehlo.divide"(%lhs, %rhs) : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
// %result: [5.66666651, -5.66666651, -5.66666651, 5.66666651]

More Examples

dot_general

Semantics

Computes dot products between slices of lhs and slices of rhs and produces a result tensor.

More formally, result[result_index] = dot_product, where:

lhs_result_dimensions = [d for d in axes(lhs) and d not in lhs_batching_dimensions and d not in lhs_contracting_dimensions].
rhs_result_dimensions = [d for d in axes(rhs) and d not in rhs_batching_dimensions and d not in rhs_contracting_dimensions].
result_batching_index + result_lhs_index + result_rhs_index = result_index where size(result_batching_index) = size(lhs_batching_dimensions), size(result_lhs_index) = size(lhs_result_dimensions) and size(result_rhs_index) = size(rhs_result_dimensions).
transposed_lhs = transpose(lhs, lhs_batching_dimensions + lhs_result_dimensions + lhs_contracting_dimensions).
transposed_lhs_slice = slice(transposed_lhs, result_batching_index + result_lhs_index + [:, ..., :]).
reshaped_lhs_slice = reshape(transposed_lhs_slice, dims(lhs, lhs_contracting_dimensions)).
transposed_rhs = transpose(rhs, rhs_batching_dimensions + rhs_result_dimensions + rhs_contracting_dimensions).
transposed_rhs_slice = slice(transposed_rhs, result_batching_index + result_rhs_index + [:, ..., :]).
reshaped_rhs_slice = reshape(transposed_rhs_slice, dims(rhs, rhs_contracting_dimensions)).
dot_product = reduce(inputs=[multiply(reshaped_lhs_slice, reshaped_rhs_slice)], init_values=[constant(0, element_type(result))], dimensions=range(size(lhs_contracting_dimensions)), body=lambda x, y: add(x, y)).
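For one common configuration the general definition reduces to a batched matrix multiplication. The sketch below is deliberately specialized, not a general implementation: it assumes rank-3 nested lists with lhs_batching_dimensions = [0], rhs_batching_dimensions = [0], lhs_contracting_dimensions = [2] and rhs_contracting_dimensions = [1], and the name batched_matmul is hypothetical.

```python
def batched_matmul(lhs, rhs):
    """result[b][i][j] = sum over k of lhs[b][i][k] * rhs[b][k][j]."""
    return [
        [[sum(lhs_b[i][k] * rhs_b[k][j] for k in range(len(rhs_b)))
          for j in range(len(rhs_b[0]))]   # rhs result dimension
         for i in range(len(lhs_b))]       # lhs result dimension
        for lhs_b, rhs_b in zip(lhs, rhs)  # shared batching dimension
    ]

lhs = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
rhs = [[[1, 0], [0, 1]], [[1, 0], [0, 1]]]
result = batched_matmul(lhs, rhs)
```

Because each batch of rhs is an identity matrix here, the result equals lhs.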

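For quantized types, performs dequantize_op_quantize(lambda lhs, rhs: dot_general(lhs, rhs, lhs_batching_dimensions, rhs_batching_dimensions, lhs_contracting_dimensions, rhs_contracting_dimensions, precision_config), lhs, rhs, type(result)).

This only specifies semantics for per-tensor quantization. Per-axis quantization is work in progress (#1574). Also, in the future we may consider adding support for hybrid quantization (#1575).

precision_config controls the tradeoff between speed and accuracy for computations on accelerator backends. This can be one of the following (at the moment, the semantics of these enum values is underspecified, but we are planning to address this in #755):

DEFAULT: Fastest calculation, but least accurate approximation to the original number.
HIGH: Slower calculation, but more accurate approximation to the original number.
HIGHEST: Slowest calculation, but most accurate approximation to the original number.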

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C5-C6), (C9-C10), (C12-C16)
(I2)	`rhs`	tensor or per-tensor quantized tensor	(C7-C10), (C12)
(I3)	`lhs_batching_dimensions`	1-dimensional tensor constant of type `si64`	(C1), (C3), (C5), (C9), (C12)
(I4)	`rhs_batching_dimensions`	1-dimensional tensor constant of type `si64`	(C1), (C4), (C7), (C9)
(I5)	`lhs_contracting_dimensions`	1-dimensional tensor constant of type `si64`	(C2), (C3), (C6), (C10)
(I6)	`rhs_contracting_dimensions`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C8), (C10)
(I7)	`precision_config`	variadic number of enums of `DEFAULT`, `HIGH`, and `HIGHEST`	(C11)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C12), (C14), (C16)

Constraints

(C1) size(lhs_batching_dimensions) = size(rhs_batching_dimensions).
(C2) size(lhs_contracting_dimensions) = size(rhs_contracting_dimensions).
(C3) is_unique(lhs_batching_dimensions + lhs_contracting_dimensions).
(C4) is_unique(rhs_batching_dimensions + rhs_contracting_dimensions).
(C5) 0 <= lhs_batching_dimensions < rank(lhs).
(C6) 0 <= lhs_contracting_dimensions < rank(lhs).
(C7) 0 <= rhs_batching_dimensions < rank(rhs).
(C8) 0 <= rhs_contracting_dimensions < rank(rhs).
(C9) dim(lhs, lhs_batching_dimensions...) = dim(rhs, rhs_batching_dimensions...).
(C10) dim(lhs, lhs_contracting_dimensions...) = dim(rhs, rhs_contracting_dimensions...).
(C11) size(precision_config) = 2.
(C12) shape(result) = dim(lhs, lhs_batching_dimensions) + dim(lhs, lhs_result_dimensions) + dim(rhs, rhs_result_dimensions).
If the operation uses non-quantized tensors:
- (C13) element_type(lhs) = element_type(rhs).
If the operation uses quantized tensors:
- (C14) is_quantized(lhs) and is_quantized(rhs) and is_quantized(result).
- (C15) storage_type(lhs) = storage_type(rhs).
- (C16) expressed_type(lhs) = expressed_type(rhs) = expressed_type(result).
- (C17) zero_points(rhs) = 0.

Examples

// %lhs: [
//        [[1, 2],
//         [3, 4]],
//        [[5, 6],
//         [7, 8]]
//       ]
// %rhs: [
//        [[1, 0],
//         [0, 1]],
//        [[1, 0],
//         [0, 1]]
//       ]
%result = "stablehlo.dot_general"(%lhs, %rhs) {
  dot_dimension_numbers = #stablehlo.dot<
    lhs_batching_dimensions = [0],
    rhs_batching_dimensions = [0],
    lhs_contracting_dimensions = [2],
    rhs_contracting_dimensions = [1]
  >,
  precision_config = [#stablehlo<precision DEFAULT>, #stablehlo<precision DEFAULT>]
} : (tensor<2x2x2xi64>, tensor<2x2x2xi64>) -> tensor<2x2x2xi64>
// %result: [
//           [[1, 2],
//            [3, 4]],
//           [[5, 6],
//            [7, 8]]
//          ]

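More Examples

dynamic_slice

Semantics

Extracts a slice from the operand using dynamically-computed starting indices and produces a result tensor. start_indices contains the starting indices of the slice for each dimension, subject to potential adjustment, and slice_sizes contains the sizes of the slice for each dimension. More formally, result[result_index] = operand[operand_index] where:

adjusted_start_indices = clamp(0, start_indices, shape(operand) - slice_sizes).
operand_index = adjusted_start_indices + result_index.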
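The start-index adjustment can be sketched for rank-2 tensors as follows. The function name dynamic_slice_2d and the nested-list encoding are assumptions of this sketch.

```python
def dynamic_slice_2d(operand, start_indices, slice_sizes):
    """Slices a 2-D nested list after clamping the start indices into range."""
    shape = [len(operand), len(operand[0])]
    # clamp(0, start, dim - size) keeps the whole slice inside the operand.
    adjusted = [min(max(start, 0), dim - size)
                for start, dim, size in zip(start_indices, shape, slice_sizes)]
    return [row[adjusted[1]:adjusted[1] + slice_sizes[1]]
            for row in operand[adjusted[0]:adjusted[0] + slice_sizes[0]]]

operand = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
result = dynamic_slice_2d(operand, start_indices=[-1, 3], slice_sizes=[2, 2])
```

The requested start [-1, 3] is adjusted to [0, 2], so the slice covers rows 0 to 1 and columns 2 to 3.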

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C2), (C4)
(I2)	`start_indices`	variadic number of 0-dimensional tensors of integer type	(C2), (C3)
(I3)	`slice_sizes`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C5)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1), (C5)

Constraints

(C1) element_type(operand) = element_type(result).
(C2) size(start_indices) = size(slice_sizes) = rank(operand).
(C3) same(type(start_indices...)).
(C4) 0 <= slice_sizes <= shape(operand).
(C5) shape(result) = slice_sizes.

Examples

// %operand: [
//            [0, 0, 1, 1],
//            [0, 0, 1, 1],
//            [0, 0, 0, 0],
//            [0, 0, 0, 0]
//           ]
// %start_indices0: -1
// %start_indices1: 3
%result = "stablehlo.dynamic_slice"(%operand, %start_indices0, %start_indices1) {
  slice_sizes = dense<[2, 2]> : tensor<2xi64>
} : (tensor<4x4xi32>, tensor<i64>, tensor<i64>) -> tensor<2x2xi32>
// %result: [
//           [1, 1],
//           [1, 1]
//          ]

More Examples

dynamic_update_slice

Semantics

Produces a result tensor which is equal to the operand tensor except that the slice starting at start_indices is updated with the values in update. More formally, result[result_index] is defined as:

update[update_index] if 0 <= update_index < shape(update) where:
- adjusted_start_indices = clamp(0, start_indices, shape(operand) - shape(update)).
- update_index = result_index - adjusted_start_indices.
operand[result_index] otherwise.
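
As a rough illustration of the definition above, the following Python sketch handles rank-2 tensors stored as nested lists. The helper name `dynamic_update_slice_2d` is hypothetical.

```python
def dynamic_update_slice_2d(operand, update, start_indices):
    # Hypothetical rank-2 helper; tensors are nested Python lists.
    op_shape = (len(operand), len(operand[0]))
    up_shape = (len(update), len(update[0]))
    # adjusted_start_indices = clamp(0, start_indices, shape(operand) - shape(update))
    adjusted = [max(0, min(s, d - u))
                for s, d, u in zip(start_indices, op_shape, up_shape)]
    result = [row[:] for row in operand]
    for i in range(op_shape[0]):
        for j in range(op_shape[1]):
            ui, uj = i - adjusted[0], j - adjusted[1]
            # Take the value from update only where update_index is in bounds.
            if 0 <= ui < up_shape[0] and 0 <= uj < up_shape[1]:
                result[i][j] = update[ui][uj]
    return result

zeros = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
# Start indices (5, 5) are adjusted to (1, 1) so the update fits.
updated = dynamic_update_slice_2d(zeros, [[1, 2], [3, 4]], [5, 5])
print(updated)
```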

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1-C4), (C6)
(I2)	`update`	tensor or per-tensor quantized tensor	(C2), (C3), (C6)
(I3)	`start_indices`	variadic number of 0-dimensional tensors of integer type	(C4), (C5)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) type(operand) = type(result).
(C2) element_type(update) = element_type(operand).
(C3) rank(update) = rank(operand).
(C4) size(start_indices) = rank(operand).
(C5) same(type(start_indices...)).
(C6) 0 <= shape(update) <= shape(operand).

Examples

// %operand: [
//            [1, 1, 0, 0],
//            [1, 1, 0, 0],
//            [1, 1, 1, 1],
//            [1, 1, 1, 1]
//           ]
// %update: [
//           [1, 1],
//           [1, 1]
//          ]
// %start_indices0: -1
// %start_indices1: 3
%result = "stablehlo.dynamic_update_slice"(%operand, %update, %start_indices0, %start_indices1)
  : (tensor<4x4xi32>, tensor<2x2xi32>, tensor<i64>, tensor<i64>) -> tensor<4x4xi32>
// %result: [
//           [1, 1, 1, 1],
//           [1, 1, 1, 1],
//           [1, 1, 1, 1],
//           [1, 1, 1, 1]
//          ]

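More Examples

exponential

Semantics

Performs element-wise exponential operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: exp from IEEE-754.
For complex numbers: complex exponential.
For quantized types: dequantize_op_quantize(exponential, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples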

// %operand: [[0.0, 1.0], [2.0, 3.0]]
%result = "stablehlo.exponential"(%operand) : (tensor<2x2xf64>) -> tensor<2x2xf64>
// %result: [[1.0, 2.7182818284590451], [7.3890560989306504, 20.085536923187668]]

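More Examples

exponential_minus_one

Semantics

Performs element-wise exponential minus one operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: expm1 from IEEE-754.
For complex numbers: complex exponential minus one.
For quantized types: dequantize_op_quantize(exponential_minus_one, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples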

// %operand: [0.0, 1.0]
%result = "stablehlo.exponential_minus_one"(%operand) : (tensor<2xf64>) -> tensor<2xf64>
// %result: [0.0, 1.71828187]

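More Examples

fft

Semantics

Performs the forward and inverse Fourier transforms for real and complex inputs/outputs.

fft_type is one of the following:

FFT: Forward complex-to-complex FFT.
IFFT: Inverse complex-to-complex FFT.
RFFT: Forward real-to-complex FFT.
IRFFT: Inverse real-to-complex FFT (i.e. takes complex, returns real).

More formally, given the function fft which takes 1-dimensional tensors of complex types as input, produces 1-dimensional tensors of the same types as output and computes the discrete Fourier transform:

For fft_type = FFT, result is defined as the final result of a series of L computations where L = size(fft_length). For example, for L = 3:

result1[i0, ..., :] = fft(operand[i0, ..., :]).
result2[i0, ..., :, iR-1] = fft(result1[i0, ..., :, iR-1]).
result[i0, ..., :, iR-2, iR-1] = fft(result2[i0, ..., :, iR-2, iR-1]).

Furthermore, given the function ifft which has the same type signature and computes the inverse of fft:

For fft_type = IFFT, result is defined as the inverse of the computations for fft_type = FFT. For example, for L = 3:

result1[i0, ..., :, iR-2, iR-1] = ifft(operand[i0, ..., :, iR-2, iR-1]).
result2[i0, ..., :, iR-1] = ifft(result1[i0, ..., :, iR-1]).
result[i0, ..., :] = ifft(result2[i0, ..., :]).

Furthermore, given the function rfft which takes 1-dimensional tensors of floating-point types, produces 1-dimensional tensors of complex types of the same floating-point semantics and works as follows:

rfft(real_operand) = truncated_result where
complex_operand... = (real_operand..., 0.0).
complex_result = fft(complex_operand).
truncated_result = complex_result[:(size(complex_result) / 2 + 1)].

(When the discrete Fourier transform is computed for real operands, the first N/2 + 1 elements of the result unambiguously define the rest of the result, so the result of rfft is truncated to avoid computing redundant elements).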
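
The truncation performed by rfft can be sketched in Python as follows. This is a naive O(N^2) discrete Fourier transform for illustration only, not how an implementation would compute it; it keeps the first N/2 + 1 elements as described in the note above.

```python
import cmath

def fft(xs):
    # Naive discrete Fourier transform of a 1-dimensional complex sequence.
    n = len(xs)
    return [sum(xs[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def rfft(real_operand):
    # complex_operand... = (real_operand..., 0.0)
    complex_operand = [complex(x, 0.0) for x in real_operand]
    complex_result = fft(complex_operand)
    # Keep only the first N/2 + 1 elements.
    return complex_result[: len(complex_result) // 2 + 1]

full = fft([1.0, 2.0, 3.0, 4.0])
half = rfft([1.0, 2.0, 3.0, 4.0])
# For real input, full[N - j] is the complex conjugate of full[j],
# so the dropped elements carry no extra information.
print(len(full), len(half))
```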

For fft_type = RFFT, result is defined as the final result of a series of L computations where L = size(fft_length). For example, for L = 3:

result1[i0, ..., :] = rfft(operand[i0, ..., :]).
result2[i0, ..., :, iR-1] = fft(result1[i0, ..., :, iR-1]).
result[i0, ..., :, iR-2, iR-1] = fft(result2[i0, ..., :, iR-2, iR-1]).

Finally, given the function irfft which has the same type signature and computes the inverse of rfft:

For fft_type = IRFFT, result is defined as the inverse of the computations for fft_type = RFFT. For example, for L = 3:

result1[i0, ..., :, iR-2, iR-1] = ifft(operand[i0, ..., :, iR-2, iR-1]).
result2[i0, ..., :, iR-1] = ifft(result1[i0, ..., :, iR-1]).
result[i0, ..., :] = irfft(result2[i0, ..., :]).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type	(C1), (C2), (C4), (C5)
(I2)	`fft_type`	enum of `FFT`, `IFFT`, `RFFT`, and `IRFFT`	(C2), (C5)
(I3)	`fft_length`	1-dimensional tensor constant of type `si64`	(C1), (C3), (C4)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type	(C2), (C4), (C5)

Constraints

(C1) size(fft_length) <= rank(operand).
(C2) The relationship between operand and result element types varies:
- If fft_type = FFT, element_type(operand) and element_type(result) have the same complex type.
- If fft_type = IFFT, element_type(operand) and element_type(result) have the same complex type.
- If fft_type = RFFT, element_type(operand) is a floating-point type and element_type(result) is a complex type of the same floating-point semantics.
- If fft_type = IRFFT, element_type(operand) is a complex type and element_type(result) is a floating-point type of the same floating-point semantics.
(C3) 1 <= size(fft_length) <= 3.
(C4) If among operand and result, there is a tensor real of a floating-point type, then shape(real)[-size(fft_length):] = fft_length.
(C5) shape(result) = shape(operand) except for:
- If fft_type = RFFT, dim(result, -1) = dim(operand, -1) = 0 ? 0 : dim(operand, -1) / 2 + 1.
- If fft_type = IRFFT, dim(operand, -1) = dim(result, -1) = 0 ? 0 : dim(result, -1) / 2 + 1.

Examples

// %operand: [(1.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
%result = "stablehlo.fft"(%operand) {
  fft_type = #stablehlo<fft_type FFT>,
  fft_length = dense<4> : tensor<1xi64>
} : (tensor<4xcomplex<f32>>) -> tensor<4xcomplex<f32>>
// %result: [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]

floor

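Semantics

Performs element-wise floor of operand tensor and produces a result tensor. Implements the roundToIntegralTowardNegative operation from the IEEE-754 specification. For quantized types, performs dequantize_op_quantize(floor, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples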

// %operand: [-0.8166, -0.2530, 0.2530, 0.8166, 2.0]
%result = "stablehlo.floor"(%operand) : (tensor<5xf32>) -> tensor<5xf32>
// %result: [-1.0, -1.0, 0.0, 0.0, 2.0]

More Examples

gather

Semantics

Gathers slices from operand tensor from offsets specified in start_indices and produces a result tensor.

The following diagram shows how elements in result map on elements in operand using a concrete example. The diagram picks a few example result indices and explains in detail which operand indices they correspond to.

More formally, result[result_index] = operand[operand_index] where:

batch_dims = [d for d in axes(result) and d not in offset_dims].
batch_index = result_index[batch_dims...].
start_index is defined as:
- start_indices[bi0, ..., :, ..., biN] where bi are individual elements in batch_index and : is inserted at the index_vector_dim index, if index_vector_dim < rank(start_indices).
- [start_indices[batch_index]] otherwise.
For d_operand in axes(operand),
- full_start_index[d_operand] = clamp(start_index[d_start], 0, dim(operand, d_operand) - slice_sizes[d_operand]) if d_operand = start_index_map[d_start].
- full_start_index[d_operand] = 0 otherwise.
offset_index = result_index[offset_dims...].
full_offset_index = [oi0, ..., 0, ..., oiN] where oi are individual elements in offset_index, and 0 is inserted at indices from collapsed_slice_dims.
operand_index = full_start_index + full_offset_index.

If indices_are_sorted is true then the implementation can assume that start_indices are sorted with respect to start_index_map, otherwise the behavior is undefined. More formally, for all i1 < i2 from indices(result), full_start_index(i1) <= full_start_index(i2).
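
The index arithmetic above is easier to follow when traced in code. The following Python sketch is an unoptimized reading of the definition, assuming tensors are stored as non-empty nested lists and ignoring indices_are_sorted. All helper names (`shape_of`, `get`, `build`, `gather`) are hypothetical, and each start index is clamped into the range [0, dim - slice_size].

```python
def shape_of(t):
    # Shape of a non-empty nested-list tensor.
    s = []
    while isinstance(t, list):
        s.append(len(t))
        t = t[0]
    return s

def get(t, index):
    for i in index:
        t = t[i]
    return t

def build(shape, fn, prefix=()):
    # Builds a nested list of the given shape, calling fn(index) per element.
    if not shape:
        return fn(prefix)
    return [build(shape[1:], fn, prefix + (i,)) for i in range(shape[0])]

def gather(operand, start_indices, offset_dims, collapsed_slice_dims,
           start_index_map, index_vector_dim, slice_sizes):
    op_shape, si_shape = shape_of(operand), shape_of(start_indices)
    # Result shape: batch sizes and offset sizes placed on their axes.
    batch_sizes = [d for a, d in enumerate(si_shape) if a != index_vector_dim]
    offset_sizes = [d for a, d in enumerate(slice_sizes)
                    if a not in collapsed_slice_dims]
    rank = len(batch_sizes) + len(offset_sizes)
    batch_dims = [d for d in range(rank) if d not in offset_dims]
    result_shape = [0] * rank
    for d, size in zip(batch_dims, batch_sizes):
        result_shape[d] = size
    for d, size in zip(offset_dims, offset_sizes):
        result_shape[d] = size

    def element(result_index):
        batch_index = [result_index[d] for d in batch_dims]
        if index_vector_dim < len(si_shape):
            start_index = [
                get(start_indices,
                    batch_index[:index_vector_dim] + [k] + batch_index[index_vector_dim:])
                for k in range(si_shape[index_vector_dim])]
        else:
            start_index = [get(start_indices, batch_index)]
        full_start = [0] * len(op_shape)
        for d_start, d_operand in enumerate(start_index_map):
            hi = op_shape[d_operand] - slice_sizes[d_operand]
            full_start[d_operand] = max(0, min(start_index[d_start], hi))
        offsets = iter(result_index[d] for d in offset_dims)
        full_offset = [0 if d in collapsed_slice_dims else next(offsets)
                       for d in range(len(op_shape))]
        return get(operand, [s + o for s, o in zip(full_start, full_offset)])

    return build(result_shape, element)

operand = [[[1, 2], [3, 4], [5, 6], [7, 8]],
           [[9, 10], [11, 12], [13, 14], [15, 16]],
           [[17, 18], [19, 20], [21, 22], [23, 24]]]
start_indices = [[[0, 0], [1, 0], [2, 1]],
                 [[0, 1], [1, 1], [0, 2]]]
result = gather(operand, start_indices, offset_dims=[2, 3],
                collapsed_slice_dims=[0], start_index_map=[1, 0],
                index_vector_dim=2, slice_sizes=[1, 2, 2])
print(shape_of(result))
```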

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C7), (C10-C12), (C14)
(I2)	`start_indices`	tensor of integer type	(C2), (C3), (C13)
(I3)	`offset_dims`	1-dimensional tensor constant of type `si64`	(C1), (C4-C5), (C13)
(I4)	`collapsed_slice_dims`	1-dimensional tensor constant of type `si64`	(C1), (C6-C8), (C13)
(I5)	`start_index_map`	1-dimensional tensor constant of type `si64`	(C3), (C9), (C10)
(I6)	`index_vector_dim`	constant of type `si64`	(C2), (C3), (C13)
(I7)	`slice_sizes`	1-dimensional tensor constant of type `si64`	(C8), (C11-C13)
(I8)	`indices_are_sorted`	constant of type `i1`

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C5), (C13-C14)

Constraints

(C1) rank(operand) = size(offset_dims) + size(collapsed_slice_dims).
(C2) 0 <= index_vector_dim <= rank(start_indices).
(C3) size(start_index_map) = index_vector_dim < rank(start_indices) ? dim(start_indices, index_vector_dim) : 1.
(C4) is_unique(offset_dims) and is_sorted(offset_dims).
(C5) 0 <= offset_dims < rank(result).
(C6) is_unique(collapsed_slice_dims) and is_sorted(collapsed_slice_dims).
(C7) 0 <= collapsed_slice_dims < rank(operand).
(C8) slice_sizes[collapsed_slice_dims...] <= 1.
(C9) is_unique(start_index_map).
(C10) 0 <= start_index_map < rank(operand).
(C11) size(slice_sizes) = rank(operand).
(C12) 0 <= slice_sizes <= shape(operand).
(C13) shape(result) = combine(batch_dim_sizes, offset_dim_sizes) where:
- batch_dim_sizes = shape(start_indices) except that the dimension size of start_indices corresponding to index_vector_dim is not included.
- offset_dim_sizes = shape(slice_sizes) except that the dimension sizes in slice_sizes corresponding to collapsed_slice_dims are not included.
- combine puts batch_dim_sizes at axes corresponding to batch_dims and offset_dim_sizes at axes corresponding to offset_dims.
(C14) element_type(operand) = element_type(result).

Examples

// %operand: [
//            [[1, 2], [3, 4], [5, 6], [7, 8]],
//            [[9, 10],[11, 12], [13, 14], [15, 16]],
//            [[17, 18], [19, 20], [21, 22], [23, 24]]
//           ]
// %start_indices: [
//                  [[0, 0], [1, 0], [2, 1]],
//                  [[0, 1], [1, 1], [0, 2]]
//                 ]
%result = "stablehlo.gather"(%operand, %start_indices) {
  dimension_numbers = #stablehlo.gather<
    offset_dims = [2, 3],
    collapsed_slice_dims = [0],
    start_index_map = [1, 0],
    index_vector_dim = 2>,
  slice_sizes = dense<[1, 2, 2]> : tensor<3xi64>,
  indices_are_sorted = false
} : (tensor<3x4x2xi32>, tensor<2x3x2xi64>) -> tensor<2x3x2x2xi32>
// %result: [
//            [
//              [[1, 2], [3, 4]],
//              [[3, 4], [5, 6]],
//              [[13, 14], [15, 16]]
//            ],
//            [
//              [[9, 10], [11, 12]],
//              [[11, 12], [13, 14]],
//              [[17, 18], [19, 20]]
//            ]
//          ]

More Examples

get_dimension_size

Semantics

Produces the size of the given dimension of the operand. More formally, result = dim(operand, dimension).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor	(C1)
(I2)	`dimension`	constant of type `si64`	(C1)

Outputs

Name	Type
`result`	0-dimensional tensor of type `si32`

Constraints

(C1) 0 <= dimension < rank(operand).

Examples

// %operand: [[1, 2, 3], [4, 5, 6]]
%result = "stablehlo.get_dimension_size"(%operand) {
  dimension = 1 : i64
} : (tensor<2x3xi64>) -> tensor<i32>
// %result: 3

More Examples

get_tuple_element

Semantics

Extracts the element at the index position of the operand tuple and produces a result. More formally, result = operand[index].

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tuple	(C1), (C2)
(I2)	`index`	constant of type `si32`	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	any supported type	(C2)

Constraints

(C1) 0 <= index < size(operand).
(C2) type(result) = tuple_element_types(operand)[index].

Examples

// %operand: ([1.0, 2.0], (3))
%result = "stablehlo.get_tuple_element"(%operand) {
  index = 0 : i32
} : (tuple<tensor<2xf32>, tuple<tensor<i32>>>) -> tensor<2xf32>
// %result: [1.0, 2.0]

More Examples

if

Semantics

Produces the output from executing exactly one function from true_branch or false_branch depending on the value of pred. More formally, result = pred ? true_branch() : false_branch().

Inputs

Label	Name	Type	Constraints
(I1)	`pred`	0-dimensional tensor of type `i1`
(I2)	`true_branch`	function	(C1-C3)
(I3)	`false_branch`	function	(C1), (C2)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors, quantized tensors or tokens	(C3)

Constraints

(C1) input_types(true_branch) = input_types(false_branch) = [].
(C2) output_types(true_branch) = output_types(false_branch).
(C3) type(results...) = output_types(true_branch).

Examples

// %result_true_branch: 10
// %result_false_branch: 11
// %pred: true
%result = "stablehlo.if"(%pred) ({
  "stablehlo.return"(%result_true_branch) : (tensor<i32>) -> ()
}, {
  "stablehlo.return"(%result_false_branch) : (tensor<i32>) -> ()
}) : (tensor<i1>) -> tensor<i32>
// %result: 10

More Examples

imag

Semantics

Extracts the imaginary part, element-wise, from the operand and produces a result tensor. More formally, for each element x: imag(x) = is_complex(x) ? imaginary_part(x) : constant(0, element_type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type	(C1), (C2)

Constraints

(C1) shape(result) = shape(operand).
(C2) element_type(result) is defined as:
- complex_element_type(element_type(operand)) if is_complex(operand).
- element_type(operand) otherwise.

Examples

// %operand: [(1.0, 2.0), (3.0, 4.0)]
%result = "stablehlo.imag"(%operand) : (tensor<2xcomplex<f32>>) -> tensor<2xf32>
// %result: [2.0, 4.0]

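More Examples

infeed

Semantics

Reads data from the infeed and produces results.

Semantics of infeed_config is implementation-defined.

results consist of payload values which come first and a token which comes last. In the future, we are planning to split the payload and the token into two separate outputs to improve clarity (#670).

Inputs

Label	Name	Type
(I1)	`token`	`token`
(I2)	`infeed_config`	constant of type `string`

Outputs

Name	Type	Constraints
`results`	variadic number of tensors, quantized tensors or tokens	(C1-C3)

Constraints

(C1) 0 < size(results).
(C2) is_empty(results[:-1]) or is_tensor(type(results[:-1])).
(C3) is_token(type(results[-1])).

Examples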

// %token: !stablehlo.token
// infeed_queue[0]: [[1, 2], [3, 4]]
// infeed_queue[1]: [[5, 6], [7, 8]]
%results0:2 = "stablehlo.infeed"(%token) {
  infeed_config = ""
} : (!stablehlo.token) -> (tensor<2x2xi64>, !stablehlo.token)
// results0#0: [[1, 2], [3, 4]]
%results1:2 = "stablehlo.infeed"(%token) {
  infeed_config = ""
} : (!stablehlo.token) -> (tensor<2x2xi64>, !stablehlo.token)
// results1#0: [[5, 6], [7, 8]]

More Examples

iota

Semantics

Fills an output tensor with values in increasing order starting from zero along the iota_dimension dimension. More formally,

output[result_index] = constant(is_quantized(output) ? quantize(result_index[iota_dimension], element_type(output)) : result_index[iota_dimension], element_type(output)).
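
For non-quantized element types the formula reduces to "each element equals its own index along iota_dimension". A minimal Python sketch for rank-2 outputs, using the hypothetical helper `iota_2d`:

```python
def iota_2d(shape, iota_dimension):
    # Hypothetical rank-2 helper: element (i, j) is i or j,
    # depending on which axis iota_dimension selects.
    rows, cols = shape
    return [[(i, j)[iota_dimension] for j in range(cols)] for i in range(rows)]

along_rows = iota_2d((4, 5), 0)
along_cols = iota_2d((4, 5), 1)
print(along_rows)
print(along_cols)
```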

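Inputs

Label	Name	Type	Constraints
(I1)	`iota_dimension`	`si64`	(C1)

Outputs

Name	Type	Constraints
`output`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) 0 <= iota_dimension < rank(output).

Examples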

%output = "stablehlo.iota"() {
  iota_dimension = 0 : i64
} : () -> tensor<4x5xi32>
// %output: [
//           [0, 0, 0, 0, 0],
//           [1, 1, 1, 1, 1],
//           [2, 2, 2, 2, 2],
//           [3, 3, 3, 3, 3]
//          ]

%output = "stablehlo.iota"() {
  iota_dimension = 1 : i64
} : () -> tensor<4x5xi32>
// %output: [
//           [0, 1, 2, 3, 4],
//           [0, 1, 2, 3, 4],
//           [0, 1, 2, 3, 4],
//           [0, 1, 2, 3, 4]
//          ]

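More Examples

is_finite

Semantics

Performs element-wise check whether the value in x is finite (i.e. is neither +Inf, -Inf, nor NaN) and produces a y tensor. Implements the isFinite operation from the IEEE-754 specification. For quantized types, the result is always true.

Inputs

Label	Name	Type	Constraints
(I1)	`x`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`y`	tensor of boolean type	(C1)

Constraints

(C1) shape(x) = shape(y).

Examples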

// Logical values: -Inf, +Inf, NaN, ...
// %x: [0xFFF0000000000000, 0x7FF0000000000000, 0x7FF8000000000000, -10.0, -0.0, 0.0, 10.0]
%y = "stablehlo.is_finite"(%x) : (tensor<7xf64>) -> tensor<7xi1>
// %y: [false, false, false, true, true, true, true]

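More Examples

log

Semantics

Performs element-wise logarithm operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: log from IEEE-754.
For complex numbers: complex logarithm.
For quantized types: dequantize_op_quantize(log, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples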

// %operand: [[1.0, 2.0], [3.0, 4.0]]
%result = "stablehlo.log"(%operand) : (tensor<2x2xf64>) -> tensor<2x2xf64>
// %result: [[0.0, 0.69314718055994529], [1.0986122886681098, 1.3862943611198906]]

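More Examples

log_plus_one

Semantics

Performs element-wise logarithm plus one operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: logp1 from IEEE-754.
For complex numbers: complex logarithm plus one.
For quantized types: dequantize_op_quantize(log_plus_one, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples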

// %operand: [0.0, -0.999, 7.0, 6.38905621, 15.0]
%result = "stablehlo.log_plus_one"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// %result: [0.0, -6.90776825, 2.07944155, 2.0, 2.77258873]

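More Examples

logistic

Semantics

Performs element-wise logistic operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: division(1, addition(1, exp(-x))) from IEEE-754.
For complex numbers: complex logistic.
For quantized types: dequantize_op_quantize(logistic, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples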

// %operand: [[0.0, 1.0], [2.0, 3.0]]
%result = "stablehlo.logistic"(%operand) : (tensor<2x2xf64>) -> tensor<2x2xf64>
// %result: [[0.5, 0.73105858], [0.88079708, 0.95257413]]

More Examples

map

Semantics

Applies a map function computation to inputs along the dimensions and produces a result tensor.

More formally, result[result_index] = computation(inputs...[result_index]). Note that dimensions are currently unused and will likely be removed in the future (#487).

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1-C4)
(I2)	`dimensions`	1-dimensional tensor constant of type `si64`	(C3)
(I3)	`computation`	function	(C4)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1), (C4)

Constraints

(C1) shape(inputs...) = shape(result).
(C2) 0 < size(inputs) = N.
(C3) dimensions = range(rank(inputs[0])).
(C4) computation has type (tensor<E0>, ..., tensor<EN-1>) -> tensor<E'> where Ei = element_type(inputs[i]) and E' = element_type(result).

Examples

// %input0: [[0, 1], [2, 3]]
// %input1: [[4, 5], [6, 7]]
%result = "stablehlo.map"(%input0, %input1) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = stablehlo.multiply %arg0, %arg1 : tensor<i64>
    stablehlo.return %0 : tensor<i64>
}) {
  dimensions = dense<[0, 1]> : tensor<2xi64>
} : (tensor<2x2xi64>, tensor<2x2xi64>) -> tensor<2x2xi64>
// %result: [[0, 5], [12, 21]]

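More Examples

maximum

Semantics

Performs element-wise max operation on tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical OR.
For integers: integer maximum.
For floats: maximum from IEEE-754.
For complex numbers: lexicographic maximum for the (real, imaginary) pair. Imposing an ordering on complex numbers involves surprising semantics, so in the future we are planning to remove support for complex numbers for this operation (#560).
For quantized types:
- dequantize_op_quantize(maximum, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples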

// %lhs: [[1, 2], [7, 8]]
// %rhs: [[5, 6], [3, 4]]
%result = "stablehlo.maximum"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 6], [7, 8]]

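More Examples

minimum

Semantics

Performs element-wise min operation on tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical AND.
For integers: integer minimum.
For floats: minimum from IEEE-754.
For complex numbers: lexicographic minimum for the (real, imaginary) pair. Imposing an ordering on complex numbers involves surprising semantics, so in the future we are planning to remove support for complex numbers for this operation (#560).
For quantized types:
- dequantize_op_quantize(minimum, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples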

// %lhs: [[1, 2], [7, 8]]
// %rhs: [[5, 6], [3, 4]]
%result = "stablehlo.minimum"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[1, 2], [3, 4]]

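More Examples

multiply

Semantics

Performs element-wise product of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical AND.
For integers: integer multiplication.
For floats: multiplication from IEEE-754.
For complex numbers: complex multiplication.
For quantized types:
- dequantize_op_quantize(multiply, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples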

// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.multiply"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 12], [21, 32]]

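More Examples

negate

Semantics

Performs element-wise negation of operand tensor and produces a result tensor. Depending on the element type, does the following:

For signed integers: integer negation.
For unsigned integers: bitcast to signed integer, integer negation, bitcast back to unsigned integer.
For floats: negate from IEEE-754.
For complex numbers: complex negation.
For quantized types: dequantize_op_quantize(negate, operand, type(result)).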
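
The unsigned-integer case is the least obvious of these rules. The Python sketch below spells out the three steps for a given bit width. The helper `negate_unsigned` is hypothetical, and the sketch assumes two's-complement wraparound when the signed negation overflows, which the text above does not state.

```python
def negate_unsigned(x, bits):
    # Step 1: bitcast the unsigned value to a signed integer of the same width.
    signed = x - (1 << bits) if x >= 1 << (bits - 1) else x
    # Step 2: integer negation.
    negated = -signed
    # Step 3: bitcast back to unsigned, i.e. reduce modulo 2^bits
    # (assumption: overflow wraps around).
    return negated % (1 << bits)

print(negate_unsigned(1, 8))    # 255
print(negate_unsigned(255, 8))  # 1
```

Under this assumption the result is the same as computing (2^bits - x) modulo 2^bits.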

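Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples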

// Negation operation with integer Tensors
// %operand: [0, -2]
%result = "stablehlo.negate"(%operand) : (tensor<2xi32>) -> tensor<2xi32>
// %result: [0, 2]

// Negation operation with complex tensors
// %operand: [(2.5, 0.0)]
%result = "stablehlo.negate"(%operand) : (tensor<1xcomplex<f32>>) -> tensor<1xcomplex<f32>>
// %result: [(-2.5, -0.0)]

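More Examples

not

Semantics

Performs element-wise NOT of tensor operand and produces a result tensor. Depending on the element type, does the following:

For booleans: logical NOT.
For integers: bitwise NOT.

Arguments

Name	Type	Constraints
`operand`	tensor of boolean or integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of boolean or integer type	(C1)

Constraints

(C1) type(operand) = type(result).

Examples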

// Bitwise operation with integer tensors
// %operand: [[1, 2], [3, 4]]
%result = "stablehlo.not"(%operand) : (tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[-2, -3], [-4, -5]]

// Logical operation with boolean tensors
// %operand: [true, false]
%result = "stablehlo.not"(%operand) : (tensor<2xi1>) -> tensor<2xi1>
// %result: [false, true]

optimization_barrier

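Semantics

Ensures that the operations that produce the operand are executed before any operations that depend on the result and prevents compiler transformations from moving operations across the barrier. Other than that, the operation is an identity, i.e. result = operand.

Arguments

Name	Type	Constraints
`operand`	variadic number of tensors, per-tensor quantized tensors or tokens	(C1)

Outputs

Name	Type	Constraints
`result`	variadic number of tensors, per-tensor quantized tensors or tokens	(C1)

Constraints

(C1) type(operand...) = type(result...).

Examples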

// %operand0: 0.0
// %operand1: 1.0
%result0, %result1 = "stablehlo.optimization_barrier"(%operand0, %operand1) : (tensor<f32>, tensor<f32>) -> (tensor<f32>, tensor<f32>)
// %result0: 0.0
// %result1: 1.0

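More Examples

or

Semantics

Performs element-wise OR of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical OR.
For integers: bitwise OR.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer or boolean type	(C1)
(I2)	`rhs`	tensor of integer or boolean type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer or boolean type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples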

// Bitwise operation with integer tensors
// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.or"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 6], [7, 12]]

// Logical operation with boolean tensors
// %lhs: [[false, false], [true, true]]
// %rhs: [[false, true], [false, true]]
%result = "stablehlo.or"(%lhs, %rhs) : (tensor<2x2xi1>, tensor<2x2xi1>) -> tensor<2x2xi1>
// %result: [[false, true], [true, true]]

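outfeed

Semantics

Writes inputs to the outfeed and produces a result token.

Semantics of outfeed_config is implementation-defined.

Inputs

Label	Name	Type
(I1)	`inputs`	variadic number of tensors or quantized tensors
(I2)	`token`	`token`
(I3)	`outfeed_config`	constant of type `string`

Outputs

Name	Type
`result`	`token`

Examples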

%result = "stablehlo.outfeed"(%inputs0, %token) {
  outfeed_config = ""
} : (tensor<2x2x2xi64>, !stablehlo.token) -> !stablehlo.token

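More Examples

pad

Semantics

Expands operand by padding around the tensor as well as between the elements of the tensor with the given padding_value.

edge_padding_low and edge_padding_high specify the amount of padding added at the low-end (next to index 0) and the high-end (next to the highest index) of each dimension respectively. The amount of padding can be negative, where the absolute value of negative padding indicates the number of elements to remove from the specified dimension.

interior_padding specifies the amount of padding added between any two elements in each dimension, which may not be negative. Interior padding occurs before edge padding such that negative edge padding will remove elements from the interior-padded operand.

More formally, result[result_index] is defined as:

operand[operand_index] if result_index = edge_padding_low + operand_index * (interior_padding + 1).
padding_value otherwise.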
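
A minimal Python sketch of this definition for rank-2 tensors stored as nested lists is shown below. The helper name `pad_2d` is hypothetical. The result shape is computed as shape(operand) + edge_padding_low + max(shape(operand) - 1, 0) * interior_padding + edge_padding_high.

```python
def pad_2d(operand, padding_value, low, high, interior):
    # Hypothetical rank-2 helper; tensors are nested Python lists.
    in_shape = (len(operand), len(operand[0]))
    out_shape = [d + lo + max(d - 1, 0) * it + hi
                 for d, lo, hi, it in zip(in_shape, low, high, interior)]
    result = [[padding_value] * out_shape[1] for _ in range(out_shape[0])]
    for i in range(in_shape[0]):
        for j in range(in_shape[1]):
            # result_index = edge_padding_low + operand_index * (interior_padding + 1)
            r = low[0] + i * (interior[0] + 1)
            c = low[1] + j * (interior[1] + 1)
            # With negative edge padding some operand elements fall outside.
            if 0 <= r < out_shape[0] and 0 <= c < out_shape[1]:
                result[r][c] = operand[i][j]
    return result

padded = pad_2d([[1, 2, 3], [4, 5, 6]], 0, low=[0, 1], high=[2, 1], interior=[1, 2])
for row in padded:
    print(row)

# Negative low padding removes the first column.
cropped = pad_2d([[1, 2, 3]], 0, low=[0, -1], high=[0, 0], interior=[0, 0])
print(cropped)
```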

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C2), (C4)
(I2)	`padding_value`	0-dimensional tensor or per-tensor quantized tensor	(C1)
(I3)	`edge_padding_low`	1-dimensional tensor constant of type `si64`	(C1), (C4)
(I4)	`edge_padding_high`	1-dimensional tensor constant of type `si64`	(C1), (C4)
(I5)	`interior_padding`	1-dimensional tensor constant of type `si64`	(C2-C4)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C3-C6)

Constraints

(C1) element_type(operand) = element_type(padding_value) = element_type(result).
(C2) size(edge_padding_low) = size(edge_padding_high) = size(interior_padding) = rank(operand).
(C3) 0 <= interior_padding.
(C4) shape(result) = shape(operand) + edge_padding_low + max(shape(operand) - 1, 0) * interior_padding + edge_padding_high.

Examples

// %operand: [
//            [1, 2, 3],
//            [4, 5, 6]
//           ]
// %padding_value: 0
%result = "stablehlo.pad"(%operand, %padding_value) {
  edge_padding_low = dense<[0, 1]> : tensor<2xi64>,
  edge_padding_high = dense<[2, 1]> : tensor<2xi64>,
  interior_padding = dense<[1, 2]> : tensor<2xi64>
} : (tensor<2x3xi32>, tensor<i32>) -> tensor<5x9xi32>
// %result: [
//           [0, 1, 0, 0, 2, 0, 0, 3, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0],
//           [0, 4, 0, 0, 5, 0, 0, 6, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0]
//          ]

More Examples

partition_id

Semantics

Produces partition_id of the current process.

Outputs

Name	Type
`result`	0-dimensional tensor of type `ui32`

Examples

%result = "stablehlo.partition_id"() : () -> tensor<ui32>

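More Examples

popcnt

Semantics

Performs element-wise count of the number of bits set in the operand tensor and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(operand) = type(result).

Examples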

// %operand: [0, 1, 2, 127]
%result = "stablehlo.popcnt"(%operand) : (tensor<4xi64>) -> tensor<4xi64>
// %result: [0, 1, 1, 7]

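More Examples

power

Semantics

Performs element-wise exponentiation of lhs tensor by rhs tensor and produces a result tensor. Depending on the element type, does the following:

For integers: integer exponentiation.
For floats: pow from IEEE-754.
For complex numbers: complex exponentiation.
For quantized types: dequantize_op_quantize(power, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples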

// %lhs: [-2.0, -0.0, -36.0, 5.0, 3.0, 10000.0]
// %rhs: [2.0, 2.0, 1.1, 2.0, -1.0, 10.0]
%result = "stablehlo.power"(%lhs, %rhs) : (tensor<6xf64>, tensor<6xf64>) -> tensor<6xf64>
// %result: [4.0, 0.0, -nan, 25.0, 0.333333343, inf]

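More Examples

real

Semantics

Extracts the real part, element-wise, from the operand and produces a result tensor. More formally, for each element x: real(x) = is_complex(x) ? real_part(x) : x.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type	(C1), (C2)

Constraints

(C1) shape(result) = shape(operand).
(C2) element_type(result) is defined as:
- complex_element_type(element_type(operand)) if is_complex(operand).
- element_type(operand) otherwise.

Examples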

// %operand: [(1.0, 2.0), (3.0, 4.0)]
%result = "stablehlo.real"(%operand) : (tensor<2xcomplex<f32>>) -> tensor<2xf32>
// %result: [1.0, 3.0]

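More Examples

recv

Semantics

Receives data from a channel with channel_id and produces results.

If is_host_transfer is true, then the operation transfers data from the host. Otherwise, it transfers data from another device. What this means is implementation-defined. This flag duplicates the information provided in channel_type, so in the future we are planning to only keep one of them (#666).

results consist of payload values which come first and a token which comes last. In the future, we are planning to split the payload and the token into two separate outputs to improve clarity (#670).

Inputs

Label	Name	Type	Constraints
(I1)	`token`	`token`	(C4)
(I2)	`channel_id`	constant of type `si64`
(I3)	`channel_type`	enum of `DEVICE_TO_DEVICE` and `HOST_TO_DEVICE`	(C1)
(I4)	`is_host_transfer`	constant of type `i1`	(C1)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors, quantized tensors or tokens	(C2-C4)

Constraints

(C1) channel_type is defined as:
- HOST_TO_DEVICE if is_host_transfer = true,
- DEVICE_TO_DEVICE otherwise.
(C2) 0 < size(results).
(C3) is_empty(results[:-1]) or is_tensor(type(results[:-1])).
(C4) is_token(type(results[-1])).

Examples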

%results0, %results1 = "stablehlo.recv"(%token) {
  channel_handle = #stablehlo.channel_handle<handle = 1, type = 3>,
  is_host_transfer = true
} : (!stablehlo.token) -> (tensor<2x2xi64>, !stablehlo.token)

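More Examples

reduce

Semantics

Applies a reduction function body to inputs and init_values along the dimensions and produces results tensors.

The order of reductions is implementation-defined, which means that body and init_values must form a monoid to guarantee that the operation produces the same results for all inputs on all implementations. However, this condition doesn't hold for many popular reductions. For example, floating-point addition for body and zero for init_values don't actually form a monoid because floating-point addition is not associative.

More formally, results...[j0, ..., jR-1] = reduce(input_slices_converted) where:

input_slices = inputs...[j0, ..., :, ..., jR-1], where : are inserted at dimensions.
input_slices_converted = to_destination_type(input_slices..., type(func_inputs(body)[:len(func_inputs(body))//2])...).
init_values_converted = to_destination_type(init_values..., type(func_inputs(body)[len(func_inputs(body))//2:])...).
reduce(input_slices_converted) = exec(schedule) for some binary tree schedule where:
- exec(node) = body(exec(node.left), exec(node.right)).
- exec(leaf) = leaf.value.
schedule is an implementation-defined full binary tree whose in-order traversal consists of:
- input_slices_converted...[index] values, for all index in index_space(input_slices_converted) in the ascending lexicographic order of index.
- Interspersed with an implementation-defined amount of init_values_converted at implementation-defined positions.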
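
To see why the schedule matters, the Python sketch below evaluates two different binary trees over the same leaves. The tree encoding (a tuple for an inner node, a bare value for a leaf) and the name `exec_schedule` are assumptions made for illustration only.

```python
def exec_schedule(node, body):
    # Inner node: (left, right). Leaf: a plain value.
    if isinstance(node, tuple):
        left, right = node
        return body(exec_schedule(left, body), exec_schedule(right, body))
    return node

def add(a, b):
    return a + b

# Same leaves in the same in-order traversal, different tree shapes.
left_leaning = ((0.1, 0.2), 0.3)
right_leaning = (0.1, (0.2, 0.3))

float_left = exec_schedule(left_leaning, add)
float_right = exec_schedule(right_leaning, add)
print(float_left, float_right)

# Integer addition is associative, so both shapes agree.
int_left = exec_schedule(((1, 2), 3), add)
int_right = exec_schedule((1, (2, 3)), add)
print(int_left, int_right)
```

With floating-point addition the two schedules give slightly different sums, while integer addition gives the same result for both.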

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1-C4), (C6), (C7)
(I2)	`init_values`	variadic number of 0-dimensional tensors or per-tensor quantized tensors	(C2), (C3)
(I3)	`dimensions`	1-dimensional tensor constant of type `si64`	(C4), (C5), (C7)
(I4)	`body`	function	(C6)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C3), (C7), (C8)

Constraints

(C1) same(shape(inputs...)).
(C2) element_type(inputs...) = element_type(init_values...).
(C3) 0 < size(inputs) = size(init_values) = size(results) = N.
(C4) 0 <= dimensions < rank(inputs[0]).
(C5) is_unique(dimensions).
(C6) body has type (tensor<E0>, ..., tensor<EN-1>, tensor<E0>, ..., tensor<EN-1>) -> (tensor<E0>, ..., tensor<EN-1>) where is_promotable(element_type(inputs[i]), Ei).
(C7) shape(results...) = shape(inputs...) except that the dimension sizes of inputs... corresponding to dimensions are not included.
(C8) element_type(results[i]) = Ei for all i in [0,N).

Examples

// %input = [[0, 1, 2, 3, 4, 5]]
// %init_value = 0
%result = "stablehlo.reduce"(%input, %init_value) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  dimensions = dense<1> : tensor<1xi64>
} : (tensor<1x6xi64>, tensor<i64>) -> tensor<1xi64>
// %result = [15]

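More Examples

reduce_precision

Semantics

Performs element-wise conversion of operand to another floating-point type that uses exponent_bits and mantissa_bits and back to the original floating-point type and produces an output tensor.

More formally:

The mantissa bits of the original value are updated to round the original value to the nearest value representable with mantissa_bits using roundToIntegralTiesToEven semantics.
Then, if mantissa_bits are smaller than the number of mantissa bits of the original value, the mantissa bits are truncated to mantissa_bits.
Then, if the exponent bits of the intermediate result don't fit into the range provided by exponent_bits, the intermediate result overflows to infinity using the original sign or underflows to zero using the original sign.
For quantized types, performs dequantize_op_quantize(lambda operand: reduce_precision(operand, exponent_bits, mantissa_bits), operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)
(I2)	`exponent_bits`	constant of type `si32`	(C2)
(I3)	`mantissa_bits`	constant of type `si32`	(C3)

Outputs

Name	Type	Constraints
`output`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(output).
(C2) 1 <= exponent_bits.
(C3) 0 <= mantissa_bits.

Examples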

// Logical values: +Inf, NaN, +Denormal, 0.0, 65519.0, 65520.0
// %operand: [0x7FF0000000000000, 0x7FFFFFFFFFFFFFFF, 0x0000000000000001, 0.0, 65519.0, 65520.0]
%output = "stablehlo.reduce_precision"(%operand) {
  exponent_bits = 5 : i32,
  mantissa_bits = 10 : i32
} : (tensor<6xf64>) -> tensor<6xf64>
// Logical values: +Inf, NaN, 0.0, 0.0, 65504.0, +Inf
// %output: [0x7FF0000000000000, 0x7FFFFFFFFFFFFFFF, 0.0, 0.0, 65504.0, 0x7FF0000000000000]

More Examples

reduce_scatter

Semantics

Within each process group in the StableHLO process grid, performs reduction, using computation, over the values of the operand tensor from each process, splits the reduction result along scatter_dimension into parts, and scatters the split parts between the processes to produce the result.

The operation splits the StableHLO process grid into process_groups which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0 and use_global_device_ids = false.
cross_replica_and_partition(replica_groups) if channel_id > 0 and use_global_device_ids = false.
flattened_ids(replica_groups) if channel_id > 0 and use_global_device_ids = true.

Afterwards, within each process_group:

reduced_value = all_reduce(operand, replica_groups, channel_id, use_global_device_ids, computation).
parts@sender = split(reduced_value@sender, dim(process_groups, 1), scatter_dimension).
result@receiver = parts@sender[receiver_index] for all sender in process_group, where receiver_index = process_group.index(receiver).
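
The three steps within a process group can be simulated on a single machine. The Python sketch below assumes one process group, rank-2 operands stored as nested lists, and a hypothetical helper `reduce_scatter_2d` that takes the operands of all processes in the group at once.

```python
def reduce_scatter_2d(operands, scatter_dimension, computation):
    # operands[p] is the rank-2 operand held by process p of a single group.
    n = len(operands)
    # Step 1: all_reduce, element-wise over all processes in the group.
    reduced = [row[:] for row in operands[0]]
    for tensor in operands[1:]:
        reduced = [[computation(a, b) for a, b in zip(r1, r2)]
                   for r1, r2 in zip(reduced, tensor)]
    # Steps 2 and 3: split along scatter_dimension; process p receives part p.
    results = []
    for receiver_index in range(n):
        if scatter_dimension == 0:
            size = len(reduced) // n
            part = reduced[receiver_index * size:(receiver_index + 1) * size]
        else:
            size = len(reduced[0]) // n
            part = [row[receiver_index * size:(receiver_index + 1) * size]
                    for row in reduced]
        results.append(part)
    return results

operand_0 = [[1, 2, 3, 4], [5, 6, 7, 8]]
operand_1 = [[9, 10, 11, 12], [13, 14, 15, 16]]
results = reduce_scatter_2d([operand_0, operand_1], 1, lambda a, b: a + b)
print(results[0])
print(results[1])
```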

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C2), (C7), (C8)
(I2)	`scatter_dimension`	constant of type `si64`	(C1), (C2), (C8)
(I3)	`replica_groups`	2-dimensional tensor constant of type `si64`	(C3-C5)
(I4)	`channel_id`	constant of type `si64`	(C6)
(I5)	`use_global_device_ids`	constant of type `i1`	(C6)
(I6)	`computation`	function	(C7)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C8-C9)

Constraints

(C1) dim(operand, scatter_dimension) % dim(process_groups, 1) = 0.
(C2) 0 <= scatter_dimension < rank(operand).
(C3) is_unique(replica_groups).
(C4) size(replica_groups) is defined as:
- num_replicas if cross_replica is used.
- num_replicas if cross_replica_and_partition is used.
- num_processes if flattened_ids is used.
(C5) 0 <= replica_groups < size(replica_groups).
(C6) If use_global_device_ids = true, then channel_id > 0.
(C7) computation has type (tensor<E>, tensor<E>) -> (tensor<E>) where is_promotable(element_type(operand), E).
(C8) shape(result) = shape(operand) except:
- dim(result, scatter_dimension) = dim(operand, scatter_dimension) / dim(process_groups, 1).
(C9) element_type(result) = E.

Examples

// num_replicas: 2
// num_partitions: 1
// %operand@(0, 0): [[1, 2, 3, 4],
//                   [5, 6, 7, 8]]
// %operand@(1, 0): [[9, 10, 11, 12],
//                   [13, 14, 15, 16]]
%result = "stablehlo.reduce_scatter"(%operand) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
  %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
  "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  scatter_dimension = 1 : i64,
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>,
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
} : (tensor<2x4xi64>) -> tensor<2x2xi64>
//
// %result@(0, 0): [[10, 12],
//                  [18, 20]]
// %result@(1, 0): [[14, 16],
//                  [22, 24]]

More Examples

reduce_window

Semantics

Applies a reduction function body to windows of inputs and init_values and produces results.

The following diagram shows how elements in results... are computed from inputs... using a concrete example.

More formally, results...[result_index] = reduce(windows, init_values, axes(inputs...), body) (see reduce) where:

padded_inputs = pad(inputs..., init_values..., padding[:, 0], padding[:, 1], base_dilations - 1).
window_start = result_index * window_strides.
window_end = window_start + (window_dimensions - 1) * window_dilations + 1.
windows = slice(padded_inputs..., window_start, window_end, window_dilations).

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1-C4), (C6), (C8), (C10), (C12), (C13), (C15)
(I2)	`init_values`	variadic number of 0-dimensional tensors or per-tensor quantized tensors	(C1), (C13)
(I3)	`window_dimensions`	1-dimensional tensor constant of type `si64`	(C4), (C5), (C15)
(I4)	`window_strides`	1-dimensional tensor constant of type `si64`	(C6), (C7), (C15)
(I5)	`base_dilations`	1-dimensional tensor constant of type `si64`	(C8), (C9), (C15)
(I6)	`window_dilations`	1-dimensional tensor constant of type `si64`	(C10), (C11), (C15)
(I7)	`padding`	2-dimensional tensor constant of type `si64`	(C12), (C15)
(I8)	`body`	function	(C13)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C1), (C14-C16)

Constraints

(C1) 0 < size(inputs) = size(init_values) = size(results) = N.
(C2) same(shape(inputs...)).
(C3) element_type(inputs...) = element_type(init_values...).
(C4) size(window_dimensions) = rank(inputs[0]).
(C5) 0 < window_dimensions.
(C6) size(window_strides) = rank(inputs[0]).
(C7) 0 < window_strides.
(C8) size(base_dilations) = rank(inputs[0]).
(C9) 0 < base_dilations.
(C10) size(window_dilations) = rank(inputs[0]).
(C11) 0 < window_dilations.
(C12) shape(padding) = [rank(inputs[0]), 2].
(C13) body has type (tensor<E0>, ..., tensor<EN-1>, tensor<E0>, ..., tensor<EN-1>) -> (tensor<E0>, ..., tensor<EN-1>) where is_promotable(element_type(inputs[i]), Ei).
(C14) same(shape(results...)).
(C15) shape(results[0]) = num_windows where:
- dilated_input_shape = shape(inputs[0]) = 0 ? 0 : (shape(inputs[0]) - 1) * base_dilations + 1.
- padded_input_shape = padding[:, 0] + dilated_input_shape + padding[:, 1].
- dilated_window_shape = (window_dimensions - 1) * window_dilations + 1.
- is_empty_window = padded_input_shape = 0 || dilated_window_shape > padded_input_shape.
- num_windows = is_empty_window ? 0 : floor((padded_input_shape - dilated_window_shape) / window_strides) + 1.
(C16) element_type(results[i]) = Ei for all i in [0,N).

示例

// %input = [[1, 2], [3, 4], [5, 6]]
// %init_value = 0
%result = "stablehlo.reduce_window"(%input, %init_value) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  window_dimensions = dense<[2, 1]> : tensor<2xi64>,
  window_strides = dense<[4, 1]> : tensor<2xi64>,
  base_dilations = dense<[2, 1]> : tensor<2xi64>,
  window_dilations = dense<[3, 1]> : tensor<2xi64>,
  padding = dense<[[2, 1], [0, 0]]> : tensor<2x2xi64>
} : (tensor<3x2xi64>, tensor<i64>) -> tensor<2x2xi64>
// %result = [[0, 0], [3, 4]]
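
The window arithmetic above can be traced with a small pure-Python sketch. It is an illustration only, not the reference implementation: it assumes a single non-empty input stored as nested lists and non-negative padding, and the helper names (shape_of, padded_value, build) are hypothetical.

```python
import itertools

def shape_of(t):
    # Shape of a non-empty nested list.
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]
    return shape

def reduce_window(inp, init_value, window_dimensions, window_strides,
                  base_dilations, window_dilations, padding, body):
    in_shape = shape_of(inp)
    # Shape arithmetic from constraint (C15).
    dilated_input = [0 if n == 0 else (n - 1) * d + 1
                     for n, d in zip(in_shape, base_dilations)]
    padded_input = [lo + n + hi for (lo, hi), n in zip(padding, dilated_input)]
    dilated_window = [(w - 1) * d + 1
                      for w, d in zip(window_dimensions, window_dilations)]
    num_windows = [0 if p == 0 or w > p else (p - w) // s + 1
                   for p, w, s in zip(padded_input, dilated_window, window_strides)]

    def padded_value(index):
        # Element of padded_inputs: padding and dilation holes hold init_value.
        src = inp
        for i, (lo, _), d, n in zip(index, padding, base_dilations, in_shape):
            j = i - lo
            if j < 0 or j % d != 0 or j // d >= n:
                return init_value
            src = src[j // d]
        return src

    def build(prefix):
        if len(prefix) == len(in_shape):
            acc = init_value
            for offset in itertools.product(*(range(w) for w in window_dimensions)):
                # window_start + offset * window_dilations
                index = [r * s + o * d for r, s, o, d in
                         zip(prefix, window_strides, offset, window_dilations)]
                acc = body(acc, padded_value(index))
            return acc
        return [build(prefix + [i]) for i in range(num_windows[len(prefix)])]

    return build([])

result = reduce_window([[1, 2], [3, 4], [5, 6]], 0, [2, 1], [4, 1],
                       [2, 1], [3, 1], [[2, 1], [0, 0]], lambda a, b: a + b)
print(result)
```

With the attributes from the example, the padded and dilated input has 8 rows, each dilated window spans 4 rows, and a stride of 4 leaves two windows per column.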

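More Examples

remainder

Semantics

Performs element-wise remainder of dividend lhs and divisor rhs tensors and produces a result tensor.

More formally, the sign of the result is taken from the dividend, and the absolute value of the result is always less than the divisor's absolute value. The remainder is calculated as lhs - d * rhs, where d is given by:

For integers: stablehlo.divide(lhs, rhs).
For floats: division(lhs, rhs) from IEEE-754 with rounding attribute roundTowardZero.
For complex numbers: TBD (#997).
For quantized types:
- dequantize_op_quantize(remainder, lhs, rhs, type(result)).

For floating-point element types, this operation is in contrast with the remainder operation from the IEEE-754 specification, where d is an integral value nearest to the exact value of lhs/rhs with ties to even.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples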

// %lhs: [17, -17, 17, -17]
// %rhs: [3, 3, -3, -3]
%result = "stablehlo.remainder"(%lhs, %rhs) : (tensor<4xi64>, tensor<4xi64>) -> tensor<4xi64>
// %result: [2, -2, 2, -2]
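
To make the sign convention concrete, here is a minimal Python sketch. It assumes that truncating the quotient toward zero is the intended reading for both integers and floats; remainder_int and remainder_float are hypothetical names, and math.fmod is used only because it follows the same "sign of the dividend" rule.

```python
import math

def remainder_int(lhs, rhs):
    # d is the quotient truncated toward zero (assumed to match stablehlo.divide).
    d = abs(lhs) // abs(rhs)
    if (lhs < 0) != (rhs < 0):
        d = -d
    return lhs - d * rhs

def remainder_float(lhs, rhs):
    # fmod keeps the sign of the dividend, like the operation described above.
    return math.fmod(lhs, rhs)

lhs = [17, -17, 17, -17]
rhs = [3, 3, -3, -3]
print([remainder_int(l, r) for l, r in zip(lhs, rhs)])
# Contrast: the IEEE-754 remainder rounds the quotient to nearest, ties to even.
print(math.fmod(17.0, 3.0), math.remainder(17.0, 3.0))
```

Note that Python's own % operator floors the quotient instead, so -17 % 3 yields 1 rather than -2.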

More Examples

replica_id

Semantics

Produces replica_id of the current process.

Outputs

Name	Type
`result`	0-dimensional tensor of type `ui32`

Examples

%result = "stablehlo.replica_id"() : () -> tensor<ui32>

More Examples

reshape

Semantics

Performs reshape of operand tensor to a result tensor. Conceptually, it amounts to keeping the same canonical representation but potentially changing the shape, e.g. from tensor<2x3xf32> to tensor<3x2xf32> or tensor<6xf32>.

More formally, result[result_index] = operand[operand_index] where result_index and operand_index have the same position in the lexicographic ordering of index_space(result) and index_space(operand).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C3)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1-C3)

Constraints

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand) except that quantization_dimension(operand) and quantization_dimension(result) may differ, otherwise.
(C2) size(operand) = size(result).
(C3) If is_per_axis_quantized(operand):
- reduce(dims(operand, [0, 1, ..., quantization_dimension(operand) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y) = reduce(dims(result, [0, 1, ..., quantization_dimension(result) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y).
- dim(operand, quantization_dimension(operand)) = dim(result, quantization_dimension(result)).
- reduce(dims(operand, [quantization_dimension(operand) + 1, ..., rank(operand) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y) = reduce(dims(result, [quantization_dimension(result) + 1, ..., rank(result) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y).

Examples

// %operand: [[1, 2, 3], [4, 5, 6]]
%result = "stablehlo.reshape"(%operand) : (tensor<2x3xi32>) -> tensor<3x2xi32>
// %result: [[1, 2], [3, 4], [5, 6]]
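
The "same position in the lexicographic ordering" rule amounts to flattening the operand in row-major order and regrouping it. A minimal sketch on nested lists, assuming non-quantized elements and non-zero dimension sizes (flatten, build and reshape are hypothetical helper names):

```python
def flatten(t):
    # Elements of a nested list in lexicographic (row-major) index order.
    if not isinstance(t, list):
        return [t]
    return [x for sub in t for x in flatten(sub)]

def build(flat, shape):
    # Regroup a flat list into nested lists of the given shape.
    if not shape:
        return flat[0]
    step = len(flat) // shape[0]
    return [build(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]

def reshape(operand, result_shape):
    flat = flatten(operand)
    size = 1
    for n in result_shape:
        size *= n
    assert size == len(flat), "(C2) size(operand) = size(result)"
    return build(flat, result_shape)

print(reshape([[1, 2, 3], [4, 5, 6]], [3, 2]))
print(reshape([[1, 2, 3], [4, 5, 6]], [6]))
```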

More Examples

reverse

Semantics

Reverses order of elements in the operand along the specified dimensions and produces a result tensor. More formally, result[result_index] = operand[operand_index] where:

operand_index[d] = dim(result, d) - result_index[d] - 1 if d in dimensions.
operand_index[d] = result_index[d] otherwise.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C3)
(I2)	`dimensions`	1-dimensional tensor constant of type `si64`	(C2), (C3)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1), (C3)

Constraints

(C1) type(operand) = type(result).
(C2) is_unique(dimensions).
(C3) 0 <= dimensions < rank(result).

Examples

// %operand = [[1, 2], [3, 4], [5, 6]]
%result = "stablehlo.reverse"(%operand) {
  dimensions = dense<1> : tensor<1xi64>
} : (tensor<3x2xi32>) -> tensor<3x2xi32>
// %result: [[2, 1], [4, 3], [6, 5]]

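More Examples

rng

Semantics

Generates random numbers using the rng_distribution algorithm and produces a result tensor of a given shape shape.

If rng_distribution = UNIFORM, then the random numbers are generated following the uniform distribution over the interval [a, b). If a >= b, the behavior is undefined.

If rng_distribution = NORMAL, then the random numbers are generated following the normal distribution with mean = a and standard deviation = b. If b < 0, the behavior is undefined.

The exact way how random numbers are generated is implementation-defined. For example, they may or may not be deterministic, and they may or may not use hidden state.

In conversations with many stakeholders, this op has come up as effectively deprecated, so in the future we are planning to explore removing it (#597).

Inputs

Label	Name	Type	Constraints
(I1)	`a`	0-dimensional tensor of integer, boolean, or floating-point type	(C1), (C2)
(I2)	`b`	0-dimensional tensor of integer, boolean, or floating-point type	(C1), (C2)
(I3)	`shape`	1-dimensional tensor constant of type `si64`	(C3)
(I4)	`rng_distribution`	enum of `UNIFORM` and `NORMAL`	(C2)

Outputs

Name	Type	Constraints
`result`	tensor of integer, boolean, or floating-point type	(C1-C3)

Constraints

(C1) element_type(a) = element_type(b) = element_type(result).
(C2) If rng_distribution = NORMAL, then is_float(a).
(C3) shape(result) = shape.

Examples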

// %a = 0
// %b = 2
// %shape = [3, 3]
%result = "stablehlo.rng"(%a, %b, %shape) {
  rng_distribution = #stablehlo<rng_distribution UNIFORM>
} : (tensor<i32>, tensor<i32>, tensor<2xi64>) -> tensor<3x3xi32>
// %result: [
//           [1, 0, 1],
//           [1, 1, 1],
//           [0, 0, 0]
//          ]

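rng_bit_generator

Semantics

Returns an output filled with uniform random bits and an updated output state output_state using the pseudorandom number generator algorithm rng_algorithm given an initial state initial_state. The output is guaranteed to be a deterministic function of initial_state, but it is not guaranteed to be deterministic between implementations.

rng_algorithm is one of the following:

DEFAULT: Implementation-defined algorithm.
THREE_FRY: Implementation-defined variant of the Threefry algorithm.*
PHILOX: Implementation-defined variant of the Philox algorithm.*

* See: Salmon et al. SC 2011. Parallel random numbers: as easy as 1, 2, 3.

Inputs

Label	Name	Type	Constraints
(I1)	`rng_algorithm`	enum of `DEFAULT`, `THREE_FRY`, and `PHILOX`	(C2)
(I2)	`initial_state`	1-dimensional tensor of type `ui64`	(C1), (C2)

Outputs

Name	Type	Constraints
`output_state`	1-dimensional tensor of type `ui64`	(C1)
`output`	tensor of integer or floating-point type

Constraints

(C1) type(initial_state) = type(output_state).
(C2) size(initial_state) is defined as:
- implementation-defined if rng_algorithm = DEFAULT.
- 2 if rng_algorithm = THREE_FRY.
- 2 or 3 if rng_algorithm = PHILOX.

Examples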

// %initial_state: [1, 2]
%output_state, %output = "stablehlo.rng_bit_generator"(%initial_state) {
  rng_algorithm = #stablehlo<rng_algorithm THREE_FRY>
} : (tensor<2xui64>) -> (tensor<2xui64>, tensor<2x2xui64>)
// %output_state: [1, 6]
// %output: [
//           [9236835810183407956, 16087790271692313299],
//           [18212823393184779219, 2658481902456610144]
//          ]

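round_nearest_afz

Semantics

Performs element-wise rounding towards the nearest integer, breaking ties away from zero, on the operand tensor and produces a result tensor. Implements the roundToIntegralTiesToAway operation from the IEEE-754 specification. For quantized types, performs dequantize_op_quantize(round_nearest_afz, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples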

// %operand = [-2.5, 0.4, 0.5, 0.6, 2.5]
%result = "stablehlo.round_nearest_afz"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// %result: [-3.0, 0.0, 1.0, 1.0, 3.0]

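More Examples

round_nearest_even

Semantics

Performs element-wise rounding towards the nearest integer, breaking ties towards the even integer, on the operand tensor and produces a result tensor. Implements the roundToIntegralTiesToEven operation from the IEEE-754 specification. For quantized types, performs dequantize_op_quantize(round_nearest_even, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples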

// %operand = [-2.5, 0.4, 0.5, 0.6, 2.5]
%result = "stablehlo.round_nearest_even"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// %result: [-2.0, 0.0, 0.0, 1.0, 2.0]
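
The two rounding operations differ only in how ties are broken. The following Python sketch contrasts them on f64 values; it is an illustration under the assumption that Python floats stand in for f64 tensors elements, and it relies on Python's built-in round, which breaks ties to even.

```python
import math

def round_nearest_afz(x):
    # Ties away from zero (roundToIntegralTiesToAway).
    if math.isnan(x) or math.isinf(x):
        return x
    t = math.trunc(x)
    if abs(x - t) >= 0.5:
        t += 1 if x > 0 else -1
    return math.copysign(float(t), x)  # keep the sign of zero

def round_nearest_even(x):
    # Ties to even (roundToIntegralTiesToEven).
    if math.isnan(x) or math.isinf(x):
        return x
    return math.copysign(float(round(x)), x)

operand = [-2.5, 0.4, 0.5, 0.6, 2.5]
print([round_nearest_afz(v) for v in operand])
print([round_nearest_even(v) for v in operand])
```

Only the tie inputs (-2.5, 0.5 and 2.5) produce different results between the two functions.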

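More Examples

rsqrt

Semantics

Performs element-wise reciprocal square root operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: rSqrt from IEEE-754.
For complex numbers: complex reciprocal square root.
For quantized types: dequantize_op_quantize(rsqrt, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples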

// %operand: [[1.0, 4.0], [9.0, 25.0]]
%result = "stablehlo.rsqrt"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[1.0, 0.5], [0.33333343, 0.2]]

More Examples

scatter

Semantics

Produces results tensors which are equal to inputs tensors except that several slices specified by scatter_indices are updated with the values updates using update_computation.

The following diagram shows how elements in updates... map on elements in results... using a concrete example. The diagram picks a few example updates... indices and explains in detail which results... indices they correspond to.

More formally, for all update_index in index_space(updates[0]):

update_scatter_dims = [d for d in axes(updates[0]) and d not in update_window_dims].
update_scatter_index = update_index[update_scatter_dims...].
start_index is defined as:
- scatter_indices[si0, ..., :, ..., siN] where si are individual elements in update_scatter_index and : is inserted at the index_vector_dim index, if index_vector_dim < rank(scatter_indices).
- [scatter_indices[update_scatter_index]] otherwise.
For d_input in axes(inputs[0]),
- full_start_index[d_input] = start_index[d_start] if d_input = scatter_dims_to_operand_dims[d_start].
- full_start_index[d_input] = 0 otherwise.
update_window_index = update_index[update_window_dims...].
full_window_index = [wi0, ..., 0, ..., wiN] where wi are individual elements in update_window_index, and 0 is inserted at indices from inserted_window_dims.
result_index = full_start_index + full_window_index.

Given that, results = exec(schedule, inputs), where:

schedule is an implementation-defined permutation of index_space(updates[0]).
exec([update_index, ...], results) = exec([...], updated_results) where:
- If result_index is in bounds for shape(results...):
  - updates_converted = to_destination_type( updates...[update_index], type(func_inputs(update_computation) [len(func_inputs(update_computation))//2:])... )
  - updated_values = update_computation(results...[result_index], updates_converted)
  - updated_results is a copy of results with results...[result_index] set to updated_values....
- Otherwise:
  - updated_results = results.
exec([], results) = results.

If indices_are_sorted is true then the implementation can assume that scatter_indices are sorted with respect to scatter_dims_to_operand_dims, otherwise the behavior is undefined. More formally, for all i1 < i2 from indices(result), full_start_index(i1) <= full_start_index(i2).

If unique_indices is true then the implementation can assume that all result_index indices being scattered to are unique. If unique_indices is true but the indices being scattered to are not unique then the behavior is undefined.

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1), (C2), (C4-C6), (C10), (C13), (C15-C16)
(I2)	`scatter_indices`	tensor of integer type	(C4), (C11), (C14)
(I3)	`updates`	variadic number of tensors or per-tensor quantized tensors	(C3-C6), (C8)
(I4)	`update_window_dims`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C7), (C8)
(I5)	`inserted_window_dims`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C9), (C10)
(I6)	`scatter_dims_to_operand_dims`	1-dimensional tensor constant of type `si64`	(C11-C13)
(I7)	`index_vector_dim`	constant of type `si64`	(C4), (C11), (C14)
(I8)	`indices_are_sorted`	constant of type `i1`
(I9)	`unique_indices`	constant of type `i1`
(I10)	`update_computation`	function	(C15)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C15-C17)

Constraints

(C1) same(shape(inputs...)).
(C2) rank(inputs[0]) = size(update_window_dims) + size(inserted_window_dims).
(C3) same(shape(updates...)).
(C4) shape(updates[0]) = combine(update_scatter_dim_sizes, update_window_dim_sizes) where:
- update_scatter_dim_sizes = shape(scatter_indices) except that the dimension size of scatter_indices corresponding to index_vector_dim is not included.
- update_window_dim_sizes <= shape(inputs[0]) except that the dimension sizes in inputs[0] corresponding to inserted_window_dims are not included.
- combine puts update_scatter_dim_sizes at axes corresponding to update_scatter_dims and update_window_dim_sizes at axes corresponding to update_window_dims.
(C5) 0 < size(inputs) = size(updates) = N.
(C6) element_type(updates...) = element_type(inputs...).
(C7) is_unique(update_window_dims) and is_sorted(update_window_dims).
(C8) 0 <= update_window_dims < rank(updates[0]).
(C9) is_unique(inserted_window_dims) and is_sorted(inserted_window_dims).
(C10) 0 <= inserted_window_dims < rank(inputs[0]).
(C11) size(scatter_dims_to_operand_dims) = index_vector_dim < rank(scatter_indices) ? dim(scatter_indices, index_vector_dim) : 1.
(C12) is_unique(scatter_dims_to_operand_dims).
(C13) 0 <= scatter_dims_to_operand_dims < rank(inputs[0]).
(C14) 0 <= index_vector_dim <= rank(scatter_indices).
(C15) update_computation has type (tensor<E0>, ..., tensor<EN-1>, tensor<E0>, ..., tensor<EN-1>) -> (tensor<E0>, ..., tensor<EN-1>), where is_promotable(element_type(inputs[i]), Ei).
(C16) shape(inputs...) = shape(results...).
(C17) element_type(results[i]) = Ei for all i in [0,N).

Examples

// %input: [
//          [[1, 2], [3, 4], [5, 6], [7, 8]],
//          [[9, 10], [11, 12], [13, 14], [15, 16]],
//          [[17, 18], [19, 20], [21, 22], [23, 24]]
//         ]
// %scatter_indices: [[[0, 2], [1, 0], [2, 1]], [[0, 1], [1, 0], [0, 9]]]
// %update: [
//           [[[1, 1], [1, 1]], [[1, 1], [1, 1]], [[1, 1], [1, 1]]],
//           [[[1, 1], [1, 1]], [[1, 1], [1, 1]], [[1, 1], [1, 1]]]
//          ]
%result = "stablehlo.scatter"(%input, %scatter_indices, %update) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  scatter_dimension_numbers = #stablehlo.scatter<
    update_window_dims = [2, 3],
    inserted_window_dims = [0],
    scatter_dims_to_operand_dims = [1, 0],
    index_vector_dim = 2>,
  indices_are_sorted = false,
  unique_indices = false
} : (tensor<3x4x2xi64>, tensor<2x3x2xi64>, tensor<2x3x2x2xi64>) -> tensor<3x4x2xi64>
// %result: [
//           [[1, 2], [5, 6], [7, 8], [7, 8]],
//           [[10, 11], [12, 13], [14, 15], [16, 17]],
//           [[18, 19], [20, 21], [21, 22], [23, 24]]
//          ]
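
The index computation for scatter is easier to follow when executed step by step. Below is a hedged pure-Python sketch for a single input stored as non-empty nested lists; it iterates over update indices in row-major order, which is just one of the schedules an implementation may pick, and the helper names (shape_of, get_at, set_at) are hypothetical.

```python
import copy
import itertools

def shape_of(t):
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]
    return shape

def get_at(t, index):
    for i in index:
        t = t[i]
    return t

def set_at(t, index, value):
    for i in index[:-1]:
        t = t[i]
    t[index[-1]] = value

def scatter(inp, scatter_indices, updates, update_window_dims,
            inserted_window_dims, scatter_dims_to_operand_dims,
            index_vector_dim, update_computation):
    result = copy.deepcopy(inp)
    in_shape, up_shape = shape_of(inp), shape_of(updates)
    si_shape = shape_of(scatter_indices)
    update_scatter_dims = [d for d in range(len(up_shape))
                           if d not in update_window_dims]
    for update_index in itertools.product(*(range(n) for n in up_shape)):
        update_scatter_index = [update_index[d] for d in update_scatter_dims]
        if index_vector_dim < len(si_shape):
            start_index = []
            for k in range(si_shape[index_vector_dim]):
                idx = list(update_scatter_index)
                idx.insert(index_vector_dim, k)  # the ":" position
                start_index.append(get_at(scatter_indices, idx))
        else:
            start_index = [get_at(scatter_indices, update_scatter_index)]
        full_start_index = [0] * len(in_shape)
        for d_start, d_input in enumerate(scatter_dims_to_operand_dims):
            full_start_index[d_input] = start_index[d_start]
        full_window_index = [update_index[d] for d in update_window_dims]
        for d in sorted(inserted_window_dims):
            full_window_index.insert(d, 0)
        result_index = [s + w for s, w in zip(full_start_index, full_window_index)]
        # Out-of-bounds result indices are skipped.
        if all(0 <= i < n for i, n in zip(result_index, in_shape)):
            new_value = update_computation(get_at(result, result_index),
                                           get_at(updates, update_index))
            set_at(result, result_index, new_value)
    return result
```

Running it on the example data (all-ones updates, addition as update_computation) reproduces the %result shown above; the [0, 9] index vector is dropped because it falls outside the input.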

More Examples

select

Semantics

Produces a result tensor where each element is selected from on_true or on_false tensor based on the value of the corresponding element of pred. More formally, result[result_index] = pred_element ? on_true[result_index] : on_false[result_index], where pred_element = rank(pred) = 0 ? pred[] : pred[result_index]. For quantized types, performs dequantize_select_quantize(pred, on_true, on_false, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`pred`	tensor of type `i1`	(C1)
(I2)	`on_true`	tensor or per-tensor quantized tensor	(C1-C2)
(I3)	`on_false`	tensor or per-tensor quantized tensor	(C2)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C2)

Constraints

(C1) rank(pred) = 0 or shape(pred) = shape(on_true).
(C2) baseline_type(on_true) = baseline_type(on_false) = baseline_type(result).

Examples

// %pred: [[false, true], [true, false]]
// %on_true: [[1, 2], [3, 4]]
// %on_false: [[5, 6], [7, 8]]
%result = "stablehlo.select"(%pred, %on_true, %on_false) : (tensor<2x2xi1>, tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 2], [3, 8]]

More Examples

select_and_scatter

Semantics

Scatters the values from the source tensor using scatter based on the outcome of reduce_window of the operand tensor using select and produces a result tensor.

The following diagram shows how elements in result are computed from operand and source using a concrete example.

More formally:

selected_values = reduce_window_without_init(...) with the following inputs:
- inputs = [operand].
- window_dimensions, window_strides, and padding which are used as is.
- base_dilations = window_dilations = 1.
- body is defined as:
```
def body(arg0: tensor<E>, arg1: tensor<E>) -> tensor<E>:
  return select(arg0, arg1) ? arg0 : arg1;
```
where E = element_type(operand), and reduce_window_without_init works exactly like reduce_window, except that the schedule of the underlying reduce (see reduce) doesn't include init values. It is currently unspecified what happens if the corresponding window doesn't have values (#731).
result[result_index] = reduce([source_values], [init_value], [0], scatter) where:
- source_values = [source[source_index] for source_index in source_indices].
- selected_index(source_index) = operand_index if selected_values[source_index] has the operand element from operand_index.
- source_indices = [source_index for source_index in indices(source) if selected_index(source_index) = result_index].

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1-C4), (C6), (C8-C11)
(I2)	`source`	tensor or per-tensor quantized tensor	(C1), (C2)
(I3)	`init_value`	0-dimensional tensor or per-tensor quantized tensor	(C3)
(I4)	`window_dimensions`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C5)
(I5)	`window_strides`	1-dimensional tensor constant of type `si64`	(C2), (C6), (C7)
(I6)	`padding`	2-dimensional tensor constant of type `si64`	(C2), (C8)
(I7)	`select`	function	(C9)
(I8)	`scatter`	function	(C10)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C11-C12)

Constraints

(C1) element_type(operand) = element_type(source).
(C2) shape(source) = num_windows where:
- padded_operand_shape = padding[:, 0] + shape(operand) + padding[:, 1].
- is_empty_window = padded_operand_shape = 0 || window_dimensions > padded_operand_shape.
- num_windows = is_empty_window ? 0 : floor((padded_operand_shape - window_dimensions) / window_strides) + 1.
(C3) element_type(init_value) = element_type(operand).
(C4) size(window_dimensions) = rank(operand).
(C5) 0 < window_dimensions.
(C6) size(window_strides) = rank(operand).
(C7) 0 < window_strides.
(C8) shape(padding) = [rank(operand), 2].
(C9) select has type (tensor<E>, tensor<E>) -> tensor<i1> where E = element_type(operand).
(C10) scatter has type (tensor<E>, tensor<E>) -> tensor<E> where is_promotable(element_type(operand), E).
(C11) shape(operand) = shape(result).
(C12) element_type(result) = E.

Examples

// %operand: [[1, 5], [2, 5], [3, 6], [4, 4]]
// %source: [[5, 6], [7, 8]]
// %init_value: 0
%result = "stablehlo.select_and_scatter"(%operand, %source, %init_value) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.compare"(%arg0, %arg1) {
      comparison_direction = #stablehlo<comparison_direction GE>
    } : (tensor<i64>, tensor<i64>) -> tensor<i1>
    "stablehlo.return"(%0) : (tensor<i1>) -> ()
}, {
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  window_dimensions = dense<[3, 1]> : tensor<2xi64>,
  window_strides = dense<[2, 1]> : tensor<2xi64>,
  padding = dense<[[0, 1], [0, 0]]> : tensor<2x2xi64>
} : (tensor<4x2xi64>, tensor<2x2xi64>, tensor<i64>) -> tensor<4x2xi64>
// %result: [[0, 0], [0, 0], [5, 14], [7, 0]]
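
A compact way to read the example is: each source element is routed to the operand position that wins its window, and colliding contributions are combined with scatter. The sketch below follows that reading for nested-list tensors. It makes two assumptions that the text leaves open: window elements are folded left to right, and positions that fall into the padding never take part in the selection. All helper names are hypothetical.

```python
import itertools

def shape_of(t):
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]
    return shape

def get_at(t, index):
    for i in index:
        t = t[i]
    return t

def set_at(t, index, value):
    for i in index[:-1]:
        t = t[i]
    t[index[-1]] = value

def full(shape, value):
    if not shape:
        return value
    return [full(shape[1:], value) for _ in range(shape[0])]

def select_and_scatter(operand, source, init_value, window_dimensions,
                       window_strides, padding, select, scatter):
    op_shape = shape_of(operand)
    result = full(op_shape, init_value)
    for source_index in itertools.product(*(range(n) for n in shape_of(source))):
        selected_index = None
        for offset in itertools.product(*(range(n) for n in window_dimensions)):
            operand_index = [s * stride - pad[0] + o for s, stride, pad, o in
                             zip(source_index, window_strides, padding, offset)]
            if not all(0 <= i < n for i, n in zip(operand_index, op_shape)):
                continue  # assumption: padded positions are ignored
            if selected_index is None or not select(
                    get_at(operand, selected_index), get_at(operand, operand_index)):
                selected_index = operand_index
        if selected_index is not None:
            current = get_at(result, selected_index)
            set_at(result, selected_index,
                   scatter(current, get_at(source, source_index)))
    return result

result = select_and_scatter(
    [[1, 5], [2, 5], [3, 6], [4, 4]], [[5, 6], [7, 8]], 0,
    [3, 1], [2, 1], [[0, 1], [0, 0]],
    lambda a, b: a >= b, lambda a, b: a + b)
print(result)
```

In the second column both windows select the element 6 in row 2, so the source values 6 and 8 are added together there.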

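More Examples

send

Semantics

Sends inputs to a channel channel_id and produces a result token.

If is_host_transfer is true, then the operation transfers data to the host. Otherwise, it transfers data to another device. What this means is implementation-defined. This flag duplicates the information provided in channel_type, so in the future we are planning to only keep one of them (#666).

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or quantized tensors
(I2)	`token`	`token`
(I3)	`channel_id`	constant of type `si64`
(I4)	`channel_type`	enum of `DEVICE_TO_DEVICE` and `DEVICE_TO_HOST`	(C1)
(I5)	`is_host_transfer`	constant of type `i1`	(C1)

Outputs

Name	Type
`result`	`token`

Constraints

(C1) channel_type is defined as:
- DEVICE_TO_HOST if is_host_transfer = true,
- DEVICE_TO_DEVICE otherwise.

Examples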

%result = "stablehlo.send"(%operand, %token) {
  channel_handle = #stablehlo.channel_handle<handle = 1, type = 2>,
  is_host_transfer = true
} : (tensor<2x2xi64>, !stablehlo.token) -> !stablehlo.token

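More Examples

shift_left

Semantics

Performs element-wise left-shift operation on the lhs tensor by rhs number of bits and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer type	(C1)
(I2)	`rhs`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples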

// %lhs: [-1, 0, 1]
// %rhs: [1, 2, 3]
%result = "stablehlo.shift_left"(%lhs, %rhs): (tensor<3xi64>, tensor<3xi64>) -> tensor<3xi64>
// %result: [-2, 0, 8]

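More Examples

shift_right_arithmetic

Semantics

Performs element-wise arithmetic right-shift operation on the lhs tensor by rhs number of bits and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer type	(C1)
(I2)	`rhs`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples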

// %lhs: [-1, 0, 8]
// %rhs: [1, 2, 3]
%result = "stablehlo.shift_right_arithmetic"(%lhs, %rhs): (tensor<3xi64>, tensor<3xi64>) -> tensor<3xi64>
// %result: [-1, 0, 1]

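More Examples

shift_right_logical

Semantics

Performs element-wise logical right-shift operation on the lhs tensor by rhs number of bits and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer type	(C1)
(I2)	`rhs`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples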

// %lhs: [-1, 0, 8]
// %rhs: [1, 2, 3]
%result = "stablehlo.shift_right_logical"(%lhs, %rhs): (tensor<3xi64>, tensor<3xi64>) -> tensor<3xi64>
// %result: [9223372036854775807, 0, 1]
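
The three shift operations differ only in how the 64-bit pattern is treated. The sketch below models i64 values with Python's unbounded integers; it assumes shift amounts in the range [0, 64), since this section does not say what happens outside it, and the helper names are hypothetical.

```python
BITS = 64
MASK = (1 << BITS) - 1

def to_signed(x):
    # Reinterpret the low 64 bits as a two's-complement i64 value.
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x

def shift_left(lhs, rhs):
    return to_signed(lhs << rhs)

def shift_right_arithmetic(lhs, rhs):
    # Python's >> on ints sign-extends, which is the arithmetic shift.
    return lhs >> rhs

def shift_right_logical(lhs, rhs):
    # Shift the unsigned bit pattern so that zeros enter from the left.
    return to_signed((lhs & MASK) >> rhs)

rhs = [1, 2, 3]
print([shift_left(l, r) for l, r in zip([-1, 0, 1], rhs)])
print([shift_right_arithmetic(l, r) for l, r in zip([-1, 0, 8], rhs)])
print([shift_right_logical(l, r) for l, r in zip([-1, 0, 8], rhs)])
```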

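More Examples

sign

Semantics

Returns the sign of the operand element-wise and produces a result tensor. More formally, for each element x, the semantics can be expressed using Python syntax as follows: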

def sign(x):
  if is_integer(x):
    if compare(x, 0, LT, SIGNED): return -1
    if compare(x, 0, EQ, SIGNED): return 0
    return 1
  elif is_float(x):
    if is_nan(x): return NaN
    if compare(x, -0.0, EQ, FLOAT): return -0.0
    if compare(x, +0.0, EQ, FLOAT): return +0.0
    if compare(x, 0.0, LT, FLOAT): return -1.0
    return 1.0
  elif is_complex(x):
    if is_nan(real(x)) or is_nan(imag(x)): return (NaN, NaN)
    if compare(x, (0.0, 0.0), EQ, FLOAT): return (0.0, 0.0)
    return divide(x, convert(abs(x), type(x)))

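For quantized types, performs dequantize_op_quantize(sign, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of signed integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of signed integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples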

// Logical values: +NaN, -1.0, -0.0, +0.0, 1.0
// %operand: [0x7FFFFFFFFFFFFFFF, -1.0, -0.0, 0.0, 1.0]
%result = "stablehlo.sign"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// Logical values: +NaN, -1.0, -0.0, +0.0, 1.0
// %result: [0x7FFFFFFFFFFFFFFF, -1.0, -0.0, 0.0, 1.0]

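More Examples

sine

Semantics

Performs element-wise sine operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: sin from IEEE-754.
For complex numbers: complex sine.
For quantized types: dequantize_op_quantize(sine, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples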

// %operand: [
//            [0.0, 1.57079632],       // [0, pi/2]
//            [3.14159265, 4.71238898] // [pi, 3pi/2]
//           ]
%result = "stablehlo.sine"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[0.0, 1.0], [0.0, -1.0]]

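More Examples

slice

Semantics

Extracts a slice from the operand using statically-computed starting indices and produces a result tensor. start_indices contain the starting indices of the slice for each dimension, limit_indices contain the ending indices (exclusive) for the slice for each dimension, and strides contain the strides for each dimension.

More formally, result[result_index] = operand[operand_index] where operand_index = start_indices + result_index * strides.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1-C3), (C5)
(I2)	`start_indices`	1-dimensional tensor constant of type `si64`	(C2), (C3), (C5)
(I3)	`limit_indices`	1-dimensional tensor constant of type `si64`	(C2), (C3), (C5)
(I4)	`strides`	1-dimensional tensor constant of type `si64`	(C2), (C4)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1), (C5)

Constraints

(C1) element_type(operand) = element_type(result).
(C2) size(start_indices) = size(limit_indices) = size(strides) = rank(operand).
(C3) 0 <= start_indices <= limit_indices <= shape(operand).
(C4) 0 < strides.
(C5) shape(result) = ceil((limit_indices - start_indices) / strides).

Examples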

// %operand: [
//            [0, 0, 0, 0],
//            [0, 0, 1, 1],
//            [0, 0, 1, 1]
//           ]
%result = "stablehlo.slice"(%operand) {
  start_indices = dense<[1, 2]> : tensor<2xi64>,
  limit_indices = dense<[3, 4]> : tensor<2xi64>,
  strides = dense<1> : tensor<2xi64>
} : (tensor<3x4xi64>) -> tensor<2x2xi64>
// %result: [
//            [1, 1],
//            [1, 1]
//           ]
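
Because the indices are static, slice maps directly onto Python's start:limit:stride slicing applied one dimension at a time. A minimal sketch on nested lists (slice_tensor and slice_shape are hypothetical names, and the constraints (C2) to (C4) are assumed to hold):

```python
def slice_tensor(operand, start_indices, limit_indices, strides):
    # Apply start:limit:stride to the outermost dimension, then recurse.
    if not start_indices:
        return operand
    rows = operand[start_indices[0]:limit_indices[0]:strides[0]]
    return [slice_tensor(row, start_indices[1:], limit_indices[1:], strides[1:])
            for row in rows]

def slice_shape(start_indices, limit_indices, strides):
    # (C5): ceil((limit - start) / stride), using integer arithmetic.
    return [-(-(limit - start) // stride)
            for start, limit, stride in zip(start_indices, limit_indices, strides)]

operand = [[0, 0, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
print(slice_tensor(operand, [1, 2], [3, 4], [1, 1]))
print(slice_shape([1, 2], [3, 4], [1, 1]))
```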

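More Examples

sort

Semantics

Sorts 1-dimensional slices of inputs along the dimension dimension together, according to a comparator, and produces results.

Unlike similar inputs in other operations, dimension allows negative values, with the semantics described below. In the future, this may be disallowed for consistency reasons (#1377).

If is_stable is true, then the sorting is stable, that is, the relative order of elements considered to be equal by the comparator is preserved. For the case where there is a single input, two elements e1 and e2 are considered to be equal by the comparator if and only if comparator(e1, e2) = comparator(e2, e1) = false. See the formalization below for how this generalizes to multiple inputs.

More formally, for all result_index in index_space(results[0]):

adjusted_dimension = dimension >= 0 ? dimension : rank(inputs[0]) + dimension.
result_slice = [ri0, ..., :, ..., riR-1] where riN are individual elements in result_index, and : is inserted at adjusted_dimension.
inputs_together = (inputs[0]..., ..., inputs[N-1]...).
results_together[result_slice] = sort(inputs_together[result_slice], comparator_together).
where sort sorts a 1-dimensional slice in non-descending order expecting that comparator_together returns true if the left-hand side argument is less than the right-hand side argument.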

def comparator_together(lhs_together, rhs_together):
  args = []
  for (lhs_el, rhs_el) in zip(lhs_together, rhs_together):
    args.append(lhs_el)
    args.append(rhs_el)
  return comparator(*args)

(results[0]..., ..., results[N-1]...) = results_together.

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1-C5)
(I2)	`dimension`	constant of type `si64`	(C4)
(I3)	`is_stable`	constant of type `i1`
(I4)	`comparator`	function	(C5)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C2), (C3)

Constraints

(C1) 0 < size(inputs).
(C2) type(inputs...) = type(results...).
(C3) same(shape(inputs...) + shape(results...)).
(C4) -R <= dimension < R, where R = rank(inputs[0]).
(C5) comparator has type (tensor<E1>, tensor<E1>, ..., tensor<EN-1>, tensor<EN-1>) -> tensor<i1>, where Ei = element_type(inputs[i]).

Examples

// %input0 = [[1, 2, 3], [3, 2, 1]]
// %input1 = [[3, 2, 1], [1, 2, 3]]
%result0, %result1 = "stablehlo.sort"(%input0, %input1) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>, %arg2: tensor<i64>, %arg3: tensor<i64>):
    %predicate = "stablehlo.compare"(%arg0, %arg1) {
      comparison_direction = #stablehlo<comparison_direction GT>
    } : (tensor<i64>, tensor<i64>) -> tensor<i1>
    "stablehlo.return"(%predicate) : (tensor<i1>) -> ()
}) {
  dimension = 0 : i64,
  is_stable = true
} : (tensor<2x3xi64>, tensor<2x3xi64>) -> (tensor<2x3xi64>, tensor<2x3xi64>)
// %result0 = [[3, 2, 3], [1, 2, 1]]
// %result1 = [[1, 2, 1], [3, 2, 3]]
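
Sorting several inputs "together" means that one element from every input travels as a unit. The sketch below illustrates this for 2-dimensional inputs sorted along dimension 0, using Python's stable list sort (so it corresponds to is_stable = true). The function names are hypothetical and the comparator receives its arguments in the interleaved order described above.

```python
import functools

def sort_slices_together(slices, comparator):
    # rows[k] holds the k-th element of every input slice.
    rows = list(zip(*slices))

    def comparator_together(lhs_together, rhs_together):
        args = []
        for lhs_el, rhs_el in zip(lhs_together, rhs_together):
            args.append(lhs_el)
            args.append(rhs_el)
        return comparator(*args)

    def three_way(lhs, rhs):
        if comparator_together(lhs, rhs):
            return -1
        if comparator_together(rhs, lhs):
            return 1
        return 0  # considered equal; the stable sort keeps their order

    rows.sort(key=functools.cmp_to_key(three_way))
    return [list(column) for column in zip(*rows)]

def sort_along_dim0(inputs, comparator):
    results = [[row[:] for row in inp] for inp in inputs]
    for c in range(len(inputs[0][0])):
        slices = [[row[c] for row in inp] for inp in inputs]
        for res, sorted_slice in zip(results, sort_slices_together(slices, comparator)):
            for r, value in enumerate(sorted_slice):
                res[r][c] = value
    return results

input0 = [[1, 2, 3], [3, 2, 1]]
input1 = [[3, 2, 1], [1, 2, 3]]
# The comparator only looks at the first input, like the GT compare in the example.
result0, result1 = sort_along_dim0([input0, input1], lambda a0, b0, a1, b1: a0 > b0)
print(result0, result1)
```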

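More Examples

sqrt

Semantics

Performs element-wise square root operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: squareRoot from IEEE-754.
For complex numbers: complex square root.
For quantized types: dequantize_op_quantize(sqrt, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples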

// %operand: [[0.0, 1.0], [4.0, 9.0]]
%result = "stablehlo.sqrt"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[0.0, 1.0], [2.0, 3.0]]

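More Examples

subtract

Semantics

Performs element-wise subtraction of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For integers: integer subtraction.
For floats: subtraction from IEEE-754.
For complex numbers: complex subtraction.
For quantized types:
- dequantize_op_quantize(subtract, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples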

// %lhs: [[6, 8], [10, 12]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.subtract"(%lhs, %rhs) : (tensor<2x2xf32>, tensor<2x2xf32>) -> (tensor<2x2xf32>)
// %result: [[1, 2], [3, 4]]

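More Examples

tanh

Semantics

Performs element-wise hyperbolic tangent operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: tanh from IEEE-754.
For complex numbers: complex hyperbolic tangent.
For quantized types:
- dequantize_op_quantize(tanh, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples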

// %operand: [-1.0, 0.0, 1.0]
%result = "stablehlo.tanh"(%operand) : (tensor<3xf32>) -> tensor<3xf32>
// %result: [-0.76159416, 0.0, 0.76159416]

More Examples

transpose

Semantics

Permutes the dimensions of operand tensor using permutation and produces a result tensor. More formally, result[result_index] = operand[operand_index] where result_index[d] = operand_index[permutation[d]].

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C4)
(I2)	`permutation`	1-dimensional tensor constant of type `si64`	(C2-C4)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1), (C3-C4)

Constraints

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand) except that quantization_dimension(operand) and quantization_dimension(result) may differ, otherwise.
(C2) permutation is a permutation of range(rank(operand)).
(C3) shape(result) = dim(operand, permutation...).
(C4) If is_per_axis_quantized(result), then quantization_dimension(operand) = permutation(quantization_dimension(result)).

Examples

// %operand: [
//            [[1,2], [3,4], [5,6]],
//            [[7,8], [9,10], [11,12]]
//           ]
%result = "stablehlo.transpose"(%operand) {
  permutation = dense<[2, 1, 0]> : tensor<3xi64>
} : (tensor<2x3x2xi32>) -> tensor<2x3x2xi32>
// %result: [
//           [[1,7], [3,9], [5,11]],
//           [[2,8], [4,10], [6,12]]
//          ]
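
The index relation result_index[d] = operand_index[permutation[d]] can be checked with a short sketch on nested lists. It assumes a non-empty, non-quantized operand; the helper names are hypothetical.

```python
import itertools

def shape_of(t):
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]
    return shape

def get_at(t, index):
    for i in index:
        t = t[i]
    return t

def set_at(t, index, value):
    for i in index[:-1]:
        t = t[i]
    t[index[-1]] = value

def full(shape, value):
    if not shape:
        return value
    return [full(shape[1:], value) for _ in range(shape[0])]

def transpose(operand, permutation):
    op_shape = shape_of(operand)
    res_shape = [op_shape[p] for p in permutation]  # (C3)
    result = full(res_shape, None)
    for result_index in itertools.product(*(range(n) for n in res_shape)):
        operand_index = [0] * len(op_shape)
        for d, p in enumerate(permutation):
            operand_index[p] = result_index[d]
        set_at(result, result_index, get_at(operand, operand_index))
    return result

operand = [[[1, 2], [3, 4], [5, 6]],
           [[7, 8], [9, 10], [11, 12]]]
print(transpose(operand, [2, 1, 0]))
```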

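More Examples

triangular_solve

Semantics

Solves batches of systems of linear equations with lower or upper triangular coefficient matrices.

More formally, given a and b, result[i0, ..., iR-3, :, :] is the solution to op(a[i0, ..., iR-3, :, :]) * x = b[i0, ..., iR-3, :, :] when left_side is true or x * op(a[i0, ..., iR-3, :, :]) = b[i0, ..., iR-3, :, :] when left_side is false, solving for the variable x where op(a) is determined by transpose_a, which can be one of the following:

NO_TRANSPOSE: Perform operation using a as is.
TRANSPOSE: Perform operation on transpose of a.
ADJOINT: Perform operation on conjugate transpose of a.

Input data is read only from the lower triangle of a if lower is true, or from the upper triangle of a otherwise. Output data is returned in the same triangle; the values in the other triangle are implementation-defined.

If unit_diagonal is true, then the implementation can assume that the diagonal elements of a are equal to 1, otherwise the behavior is undefined.

For quantized types, performs dequantize_op_quantize(lambda x, y: triangular_solve(x, y, left_side, lower, unit_diagonal, transpose_a), a, b, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`a`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1-C3)
(I2)	`b`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1-C4)
(I3)	`left_side`	constant of type `i1`	(C3)
(I4)	`lower`	constant of type `i1`
(I5)	`unit_diagonal`	constant of type `i1`
(I6)	`transpose_a`	enum of `NO_TRANSPOSE`, `TRANSPOSE`, and `ADJOINT`

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_element_type(a) = baseline_element_type(b).
(C2) 2 <= rank(a) = rank(b) = R.
(C3) The relationship between shape(a) and shape(b) is defined as follows:
- shape(a)[:-3] = shape(b)[:-3].
- dim(a, -2) = dim(a, -1) = dim(b, left_side ? -2 : -1).
(C4) baseline_type(b) = baseline_type(result).

Examples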

// %a = [
//       [1.0, 0.0, 0.0],
//       [2.0, 4.0, 0.0],
//       [3.0, 5.0, 6.0]
//      ]
// %b = [
//       [2.0, 0.0, 0.0],
//       [4.0, 8.0, 0.0],
//       [6.0, 10.0, 12.0]
//      ]
%result = "stablehlo.triangular_solve"(%a, %b) {
  left_side = true,
  lower = true,
  unit_diagonal = false,
  transpose_a = #stablehlo<transpose NO_TRANSPOSE>
} : (tensor<3x3xf32>, tensor<3x3xf32>) -> tensor<3x3xf32>
// %result: [
//           [2.0, 0.0, 0.0],
//           [0.0, 2.0, 0.0],
//           [0.0, 0.0, 2.0]
//          ]
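
For the configuration used in the example (left_side = true, lower = true, transpose_a = NO_TRANSPOSE, no batch dimensions), the solve reduces to forward substitution. The following is a sketch of that single case only, not a general implementation; the function name is hypothetical.

```python
def triangular_solve_lower_left(a, b, unit_diagonal=False):
    # Solves a * x = b where a is lower triangular; only a[i][k] with k <= i is read.
    n = len(a)
    m = len(b[0])
    x = [[0.0] * m for _ in range(n)]
    for j in range(m):          # each column of b is an independent system
        for i in range(n):      # forward substitution, top row first
            acc = b[i][j] - sum(a[i][k] * x[k][j] for k in range(i))
            x[i][j] = acc if unit_diagonal else acc / a[i][i]
    return x

a = [[1.0, 0.0, 0.0],
     [2.0, 4.0, 0.0],
     [3.0, 5.0, 6.0]]
b = [[2.0, 0.0, 0.0],
     [4.0, 8.0, 0.0],
     [6.0, 10.0, 12.0]]
print(triangular_solve_lower_left(a, b))
```

Since b is exactly twice a in the example, the solution is twice the identity matrix.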

tuple

Semantics

Produces a result tuple from values val.

Inputs

Label	Name	Type	Constraints
(I1)	`val`	variadic number of values	(C1)

Outputs

Name	Type	Constraints
`result`	tuple	(C1)

Constraints

(C1) result has type tuple<E0, ..., EN-1> where Ei = type(val[i]).

Examples

// %val0: [1.0, 2.0]
// %val1: (3)
%result = "stablehlo.tuple"(%val0, %val1) : (tensor<2xf32>, tuple<tensor<i32>>) -> tuple<tensor<2xf32>, tuple<tensor<i32>>>
// %result: ([1.0, 2.0], (3))

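More Examples

uniform_dequantize

Semantics

Performs element-wise conversion of quantized tensor operand to a floating-point tensor result according to the quantization parameters defined by the operand type.

More formally, result = dequantize(operand).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	quantized tensor	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type	(C1), (C2)

Constraints

(C1) shape(operand) = shape(result).
(C2) element_type(result) = expressed_type(operand).

Examples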

// %operand: [10, 10]
%result = "stablehlo.uniform_dequantize"(%operand) : (tensor<2x!quant.uniform<i8:f32:0, {0.1:-30,0.5:-20}>>) -> tensor<2xf32>
// %result: [4.0, 15.0]

uniform_quantize

Semantics

Performs element-wise conversion of floating-point tensor or quantized tensor operand to a quantized tensor result according to the quantization parameters defined by the result type.

More formally,

If is_float(operand):
- result = quantize(operand, type(result)).
If is_quantized(operand):
- float_result = dequantize(operand).
- result = quantize(float_result, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or quantized type	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	quantized tensor	(C1), (C2)

Constraints

(C1) shape(operand) = shape(result).
(C2) expressed_type(result) = is_float(operand) ? element_type(operand) : expressed_type(operand).

Examples

// %operand: [4.0, 15.0]
%result = "stablehlo.uniform_quantize"(%operand) : (tensor<2xf32>) -> tensor<2x!quant.uniform<i8:f32:0, {0.1:-30,0.5:-20}>>
// %result: [10, 10]

// %operand: [10, 10]
%result = "stablehlo.uniform_quantize"(%operand) : (tensor<2x!quant.uniform<i8:f32:0, {0.1:-30,0.5:-20}>>) -> tensor<2x!quant.uniform<i8:f32:0, {0.1:-20,0.2:-30}>>
// %result: [20, 45]
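
The numbers in the quantization examples can be reproduced with the usual affine mapping. This sketch assumes, since the definitions are not part of this section, that dequantize computes (value - zero_point) * scale and that quantize rounds value / scale to the nearest even integer, adds zero_point and clamps to the i8 storage range. Each element of the 1-dimensional tensor has its own scale and zero point because the examples quantize along dimension 0.

```python
def dequantize(values, scales, zero_points):
    return [(v - z) * s for v, s, z in zip(values, scales, zero_points)]

def quantize(values, scales, zero_points, storage_min=-128, storage_max=127):
    out = []
    for x, s, z in zip(values, scales, zero_points):
        q = round(x / s) + z  # Python's round breaks ties to even
        out.append(max(storage_min, min(storage_max, q)))
    return out

# uniform_dequantize example: {0.1:-30, 0.5:-20}
floats = dequantize([10, 10], [0.1, 0.5], [-30, -20])
print(floats)
# uniform_quantize, float operand
print(quantize([4.0, 15.0], [0.1, 0.5], [-30, -20]))
# uniform_quantize, quantized operand re-quantized to {0.1:-20, 0.2:-30}
print(quantize(floats, [0.1, 0.2], [-20, -30]))
```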

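while

Semantics

Produces the output from executing the body function 0 or more times while the cond function outputs true. More formally, the semantics can be expressed using Python syntax as follows: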

internal_state = operand
while cond(*internal_state):
  internal_state = body(*internal_state)
results = internal_state

The behavior of an infinite loop is TBD (#383).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	variadic number of tensors, quantized tensors or tokens	(C1-C3)
(I2)	`cond`	function	(C1)
(I3)	`body`	function	(C2)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors, quantized tensors or tokens	(C3)

Constraints

(C1) cond has type (T0, ..., TN-1) -> tensor<i1>, where Ti = type(operand[i]).
(C2) body has type (T0, ..., TN-1) -> (T0, ..., TN-1), where Ti = type(operand[i]).
(C3) type(results...) = type(operand...).

Examples

// %init_i: 1
// %init_sum: 0
// %one: 1
// %ten: 10
%results0, %results1 = "stablehlo.while"(%init_i, %init_sum) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %cond = "stablehlo.compare"(%arg0, %ten) {
      comparison_direction = #stablehlo<comparison_direction LT>
    } : (tensor<i64>, tensor<i64>) -> tensor<i1>
    stablehlo.return %cond : tensor<i1>
  }, {
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %new_sum = stablehlo.add %arg1, %one : tensor<i64>
    %new_i = stablehlo.add %arg0, %one : tensor<i64>
    stablehlo.return %new_i, %new_sum : tensor<i64>, tensor<i64>
}) : (tensor<i64>, tensor<i64>) -> (tensor<i64>, tensor<i64>)
// %results0: 10
// %results1: 9
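
To trace the loop by hand, the Python semantics given earlier can be wrapped in a runnable sketch. The function name stablehlo_while and the use of Python integers in place of tensor<i64> scalars are assumptions made for illustration.

```python
def stablehlo_while(operand, cond, body):
    """Run body while cond holds, threading the state through both functions."""
    internal_state = tuple(operand)
    while cond(*internal_state):
        internal_state = tuple(body(*internal_state))
    return internal_state


ONE, TEN = 1, 10  # stand-ins for %one and %ten


def cond(i, total):
    return i < TEN  # comparison_direction LT


def body(i, total):
    return i + ONE, total + ONE  # (%new_i, %new_sum)


results = stablehlo_while((1, 0), cond, body)
print(results)
```

Starting from a state where cond is already false, for instance stablehlo_while((10, 0), cond, body), the body runs zero times and the operand is returned unchanged.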

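More Examples

xor

Semantics

Performs element-wise XOR of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical XOR.
For integers: bitwise XOR.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of boolean or integer type	(C1)
(I2)	`rhs`	tensor of boolean or integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of boolean or integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples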

// Bitwise operation with integer tensors
// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.xor"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[4, 4], [4, 12]]

// Logical operation with boolean tensors
// %lhs: [[false, false], [true, true]]
// %rhs: [[false, true], [false, true]]
%result = "stablehlo.xor"(%lhs, %rhs) : (tensor<2x2xi1>, tensor<2x2xi1>) -> tensor<2x2xi1>
// %result: [[false, true], [true, false]]
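
Both cases map onto Python's ^ operator, which is bitwise for int and logical for bool. The following sketch checks the two examples; the helper xor_elementwise and the nested-list tensors are assumptions for illustration only.

```python
def xor_elementwise(lhs, rhs):
    """Element-wise XOR of two rank-2 tensors represented as nested lists."""
    return [[a ^ b for a, b in zip(lhs_row, rhs_row)]
            for lhs_row, rhs_row in zip(lhs, rhs)]


int_result = xor_elementwise([[1, 2], [3, 4]], [[5, 6], [7, 8]])
bool_result = xor_elementwise([[False, False], [True, True]],
                              [[False, True], [False, True]])
print(int_result, bool_result)
```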

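Execution

Sequential execution

A StableHLO program is executed by providing input values to the main function and computing output values. The output values of a function are computed by executing the graph of ops rooted in the corresponding return op.

The execution order is implementation-defined as long as it is aligned with dataflow, i.e. as long as ops are executed before their uses. In StableHLO, all side-effecting ops consume one token and produce one token (multiple tokens can be multiplexed into one token via after_all), so the execution order of side effects is also aligned with dataflow. Possible execution orders of the example program above are %0 → %1 → %2 → %3 → %4 → return or %3 → %0 → %1 → %2 → %4 → return.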
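
The "aligned with dataflow" condition can be checked mechanically. The sketch below encodes the operand dependencies of the example @main function as a dictionary and tests whether a proposed order executes every op after the ops it uses. The names DEPS and is_valid_order are assumptions made for illustration.

```python
# Dependencies between ops of the example program (function inputs omitted).
DEPS = {
    "%0": [],            # reshape(%image)
    "%1": ["%0"],        # dot(%0, %weights)
    "%2": ["%1"],        # add(%1, %bias)
    "%3": [],            # constant
    "%4": ["%2", "%3"],  # maximum(%2, %3)
    "return": ["%4"],
}


def is_valid_order(order, deps):
    """Return True if every op appears after all ops whose results it uses."""
    executed = set()
    for op in order:
        if any(dep not in executed for dep in deps[op]):
            return False
        executed.add(op)
    return executed == set(deps)


order_a = ["%0", "%1", "%2", "%3", "%4", "return"]
order_b = ["%3", "%0", "%1", "%2", "%4", "return"]
bad_order = ["%0", "%1", "%2", "%4", "%3", "return"]  # %4 runs before %3
print(is_valid_order(order_a, DEPS), is_valid_order(order_b, DEPS),
      is_valid_order(bad_order, DEPS))
```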

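More formally, a StableHLO process is a combination of: 1) a StableHLO program, 2) operation statuses (not executed yet, already executed), and 3) intermediate values that the process is working on. The process starts with input values to the main function, progresses through the graph of ops updating operation statuses and intermediate values, and finishes with output values. Further formalization is TBD (#484).

Parallel execution

StableHLO programs can be executed in parallel, organized into a 2D process grid of num_replicas by num_partitions, which both have type ui32.

In the StableHLO process grid, num_replicas * num_partitions StableHLO processes are executing at the same time. Each process has a unique process_id = (replica_id, partition_id), where replica_id in replica_ids = range(num_replicas) and partition_id in partition_ids = range(num_partitions), which both have type ui32.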
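
As a small illustration, the set of process ids in a grid is the Cartesian product of replica_ids and partition_ids. The helper name process_ids below is an assumption, not a StableHLO function.

```python
from itertools import product


def process_ids(num_replicas, num_partitions):
    """Enumerate (replica_id, partition_id) pairs of a process grid."""
    replica_ids = range(num_replicas)
    partition_ids = range(num_partitions)
    return list(product(replica_ids, partition_ids))


grid = process_ids(num_replicas=2, num_partitions=3)
print(grid)
```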

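The size of the process grid is known statically for every program (in the future, we are planning to make it an explicit part of StableHLO programs #650), and the position within the process grid is known statically for every process. Each process has access to its position within the process grid via the replica_id and partition_id ops.

Within the process grid, the programs can all be the same (in the "Single Program, Multiple Data" style), can all be different (in the "Multiple Program, Multiple Data" style), or something in between. In the future, we are planning to introduce support for other idioms of defining parallel StableHLO programs, including GSPMD (#619).

Within the process grid, the processes are mostly independent of each other: they have separate operation statuses and separate input/intermediate/output values, and most ops are executed separately between processes, with the exception of a small number of collective ops described below.

Given that the execution of most ops only uses values from the same process, it is usually unambiguous to refer to these values by their names. However, when describing the semantics of collective ops, that is insufficient, which gives rise to the notation name@process_id to refer to the value name within a particular process. (From that perspective, an unqualified name can be viewed as shorthand for name@(replica_id(), partition_id()).)

The execution order across processes is implementation-defined, except for the synchronization introduced by point-to-point communication and collective ops as described below.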

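Point-to-point communication

StableHLO processes can communicate with each other through StableHLO channels. A channel is represented by a positive id of type si64. Through various ops, it is possible to send values to channels and receive them from channels.

Further formalization, e.g. where these channel ids come from, how processes become aware of them, and what kind of synchronization they introduce, is TBD (#484).

Streaming communication

Every StableHLO process has access to two streaming interfaces:

Infeed, which can be read from.
Outfeed, which can be written to.

Unlike channels, which are used to communicate between processes and therefore have processes at both ends, infeeds and outfeeds have their other end implementation-defined.

Further formalization, e.g. how streaming communication influences execution order and what kind of synchronization it introduces, is TBD (#484).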

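Collective ops

There are six collective ops in StableHLO: all_gather, all_reduce, all_to_all, collective_broadcast, collective_permute, and reduce_scatter. All these ops split the processes in the StableHLO process grid into StableHLO process groups and execute a joint computation within each process group, independently of other process groups.

Within each process group, collective ops may introduce a synchronization barrier. Further formalization, e.g. elaborating on when exactly this synchronization happens, how exactly the processes arrive at this barrier, and what happens if they don't, is TBD (#484).

If the process group involves cross-partition communication, i.e. there are processes in the process group whose partition ids differ, then execution of the collective op needs a channel, and the collective op must provide a positive channel_id of type si64. Cross-replica communication doesn't need channels.

The computations performed by the collective ops are specific to individual ops and are described in the individual op sections above. However, the strategies by which the process grid is split into process groups are shared between these ops and are described in this section. More formally, StableHLO supports the following four strategies.

cross_replica

Only cross-replica communication happens within each process group. This strategy takes replica_groups (a list of lists of replica ids) and computes a Cartesian product of replica_groups by partition_ids. replica_groups must have unique elements and cover all replica_ids. More formally, using Python syntax: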

def cross_replica(replica_groups: List[List[ReplicaId]]) -> List[List[ProcessId]]:
  for replica_group in replica_groups:
    for partition_id in partition_ids:
      process_group = []
      for replica_id in replica_group:
        process_group.append((replica_id, partition_id))
      yield process_group

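For example, for replica_groups = [[0, 1], [2, 3]] and num_partitions = 2, cross_replica will produce [[(0, 0), (1, 0)], [(0, 1), (1, 1)], [(2, 0), (3, 0)], [(2, 1), (3, 1)]].

cross_partition

Only cross-partition communication happens within each process group. This strategy takes partition_groups (a list of lists of partition ids) and computes a Cartesian product of partition_groups by replica_ids. partition_groups must have unique elements and cover all partition_ids. More formally, using Python syntax: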

def cross_partition(partition_groups: List[List[PartitionId]]) -> List[List[ProcessId]]:
  for partition_group in partition_groups:
    for replica_id in replica_ids:
      process_group = []
      for partition_id in partition_group:
        process_group.append((replica_id, partition_id))
      yield process_group

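For example, for partition_groups = [[0, 1]] and num_replicas = 4, cross_partition will produce [[(0, 0), (0, 1)], [(1, 0), (1, 1)], [(2, 0), (2, 1)], [(3, 0), (3, 1)]].

cross_replica_and_partition

Both cross-replica and cross-partition communication may happen within each process group. This strategy takes replica_groups (a list of lists of replica ids) and computes the Cartesian product of each replica_group by partition_ids. replica_groups must have unique elements and cover all replica_ids. More formally, using Python syntax: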

def cross_replica_and_partition(replica_groups: List[List[ReplicaId]]) -> List[List[ProcessId]]:
  for replica_group in replica_groups:
    process_group = []
    for partition_id in partition_ids:
      for replica_id in replica_group:
        process_group.append((replica_id, partition_id))
    yield process_group

For example, for replica_groups = [[0, 1], [2, 3]] and num_partitions = 2, cross_replica_and_partition will produce [[(0, 0), (1, 0), (0, 1), (1, 1)], [(2, 0), (3, 0), (2, 1), (3, 1)]].

flattened_ids

This strategy takes flattened_id_groups (a list of lists of "flattened" process ids in the form replica_id * num_partitions + partition_id) and turns them into process ids. flattened_id_groups must have unique elements and cover all process_ids. More formally, using Python syntax:

def flattened_ids(flattened_id_groups: List[List[ui32]]) -> List[List[ProcessId]]:
  for flattened_id_group in flattened_id_groups:
    process_group = []
    for flattened_id in flattened_id_group:
      replica_id = flattened_id // num_partitions
      partition_id = flattened_id % num_partitions
      process_group.append((replica_id, partition_id))
    yield process_group

For example, for flattened_id_groups = [[0, 1, 2, 3], [4, 5, 6, 7]], num_replicas = 4 and num_partitions = 2, flattened_ids will produce [[(0, 0), (0, 1), (1, 0), (1, 1)], [(2, 0), (2, 1), (3, 0), (3, 1)]].
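
The four strategies rely on num_replicas, num_partitions, replica_ids and partition_ids being available from the surrounding process grid. To experiment with them outside that context, one option is to pass the grid dimensions explicitly, as in the following sketch. The extra parameters are an assumption made here so the functions run stand-alone; the logic otherwise mirrors the definitions in the four strategy sections.

```python
from typing import Iterator, List, Tuple

ProcessId = Tuple[int, int]  # (replica_id, partition_id)


def cross_replica(replica_groups, num_partitions) -> Iterator[List[ProcessId]]:
    for replica_group in replica_groups:
        for partition_id in range(num_partitions):
            yield [(replica_id, partition_id) for replica_id in replica_group]


def cross_partition(partition_groups, num_replicas) -> Iterator[List[ProcessId]]:
    for partition_group in partition_groups:
        for replica_id in range(num_replicas):
            yield [(replica_id, partition_id) for partition_id in partition_group]


def cross_replica_and_partition(replica_groups, num_partitions) -> Iterator[List[ProcessId]]:
    for replica_group in replica_groups:
        yield [(replica_id, partition_id)
               for partition_id in range(num_partitions)
               for replica_id in replica_group]


def flattened_ids(flattened_id_groups, num_partitions) -> Iterator[List[ProcessId]]:
    for group in flattened_id_groups:
        # divmod gives (flattened_id // num_partitions, flattened_id % num_partitions).
        yield [divmod(flattened_id, num_partitions) for flattened_id in group]


print(list(cross_replica([[0, 1], [2, 3]], num_partitions=2)))
print(list(cross_partition([[0, 1]], num_replicas=4)))
print(list(cross_replica_and_partition([[0, 1], [2, 3]], num_partitions=2)))
print(list(flattened_ids([[0, 1, 2, 3], [4, 5, 6, 7]], num_partitions=2)))
```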

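Accuracy

At the moment, StableHLO does not provide guarantees about numerical accuracy, but this may change in the future (#1156).

Errors

StableHLO programs are validated through an extensive set of constraints for individual ops, which rules out many classes of errors prior to run time. However, error conditions are still possible, e.g. through integer overflows, out-of-bounds accesses, etc. Unless explicitly called out, all these errors result in implementation-defined behavior, but this may change in the future (#1157).

As an exception to this rule, floating-point exceptions in StableHLO programs have well-defined behavior. Operations which result in exceptions defined by the IEEE-754 standard (invalid operation, division-by-zero, overflow, underflow, or inexact exceptions) produce default results (as defined in the standard) and continue execution without raising the corresponding status flag, similar to raiseNoFlag exception handling from the standard. Exceptions for nonstandard operations (e.g. complex arithmetic and certain transcendental functions) are implementation-defined.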

Notation

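For describing syntax, this document uses the modified ISO flavor of EBNF syntax (ISO/IEC 14977:1996, Wikipedia), with two modifications: 1) rules are defined using ::= rather than =, 2) concatenation is expressed using juxtaposition rather than ,.

For describing semantics (i.e. within the "Types", "Constants" and "Ops" sections), we use formulas which are based on Python syntax extended with support for concisely expressing array operations, as described below. This works well for small snippets of code, but in rare cases when larger snippets of code are needed, we use vanilla Python syntax, which is always introduced explicitly.

Formulas

Let's explore how formulas work based on an example from the dot_general specification. One of the constraints for this operation looks as follows: dim(lhs, lhs_batching_dimensions...) = dim(rhs, rhs_batching_dimensions...).

The names used in this formula come from two sources: 1) global functions, i.e. dim, 2) member definitions of the corresponding program element, i.e. the lhs, lhs_batching_dimensions, rhs and rhs_batching_dimensions inputs defined in the "Inputs" section of dot_general.

As mentioned above, the syntax of this formula is Python-based with some conciseness-oriented extensions. To make sense of the formula, let's transform it into vanilla Python syntax.

A) In these formulas, we use = to represent equality, so the first step towards obtaining Python syntax is replacing = with ==, as follows: dim(lhs, lhs_batching_dimensions...) == dim(rhs, rhs_batching_dimensions...).

B) Also, these formulas support ellipses (...), which turn scalar expressions into tensor expressions. In a nutshell, f(xs...) roughly means "for each scalar x in the tensor xs, compute a scalar f(x) and then return all these scalar results together as a tensor result". In vanilla Python syntax, our example formula turns into: [dim(lhs, dim1) for dim1 in lhs_batching_dimensions] == [dim(rhs, dim2) for dim2 in rhs_batching_dimensions].

Thanks to ellipses, it is often possible to avoid working at the level of individual scalars. However, in some tricky cases, lower-level semi-informal syntax may be used, as in the start_indices[bi0, ..., :, ..., biN] formula from the gather specification. In the service of conciseness, we don't provide an exact formalism for translating such syntax to vanilla Python, in the hope that it is still intuitively understandable on a case-by-case basis. Please let us know if some specific formulas look opaque, and we'll try to improve them.

Also, you will notice that formulas use ellipses to expand all sorts of lists, including tensors, lists of tensors (which e.g. can arise from a variadic number of tensors), etc. This is another area where we don't provide an exact formalism (e.g. lists are not even part of the StableHLO type system) and instead rely on intuitive understandability.

C) The final noteworthy notational vehicle that we employ is implicit broadcasting. While the StableHLO opset doesn't support implicit broadcasting, the formulas do, also in the service of conciseness. In a nutshell, if a scalar is used in a context where a tensor is expected, the scalar is broadcast to the expected shape.

To continue the dot_general example, here's another constraint: 0 <= lhs_batching_dimensions < rank(lhs). As defined in the dot_general specification, lhs_batching_dimensions is a tensor, whereas both 0 and rank(lhs) are scalars. After we apply implicit broadcasting, the formula becomes [0, ..., 0] <= lhs_batching_dimensions < [rank(lhs), ..., rank(lhs)].

When applied to a particular dot_general operation, this formula evaluates to a tensor of booleans. When formulas are used as constraints, the constraint holds if the formula evaluates to either true or to a tensor which only has true elements.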
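
The two dot_general constraints discussed in this section can be evaluated in vanilla Python as follows. In this sketch, tensors are represented only by their shapes, and the concrete shapes and batching dimensions are made-up values chosen for illustration.

```python
def dim(shape, axis):
    """Size of the given axis; tensors are modeled by their shapes here."""
    return shape[axis]


def rank(shape):
    return len(shape)


# Hypothetical operands of a dot_general operation.
lhs = (2, 3, 4)
rhs = (2, 4, 5)
lhs_batching_dimensions = [0]
rhs_batching_dimensions = [0]

# dim(lhs, lhs_batching_dimensions...) = dim(rhs, rhs_batching_dimensions...)
batch_dims_match = (
    [dim(lhs, dim1) for dim1 in lhs_batching_dimensions]
    == [dim(rhs, dim2) for dim2 in rhs_batching_dimensions]
)

# 0 <= lhs_batching_dimensions < rank(lhs): the scalars 0 and rank(lhs)
# are implicitly broadcast, giving one boolean per element.
in_range = [0 <= d < rank(lhs) for d in lhs_batching_dimensions]

# A constraint holds when the resulting tensor only has true elements.
constraint_holds = all(in_range)
print(batch_dims_match, in_range, constraint_holds)
```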

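Names

In formulas, lexical scope includes: 1) global functions, 2) member definitions, 3) local definitions. The list of global functions is provided below. The list of member definitions depends on the program element that the notation is applied to:

For operations, member definitions include names introduced in the "Inputs" and "Outputs" sections.
For everything else, member definitions include structural parts of the program element, named after the corresponding EBNF non-terminals. Most of the time, the names of these structural parts are obtained by converting the names of the non-terminals to snake case (e.g. IntegerLiteral => integer_literal), but sometimes names get abbreviated in the process (e.g. QuantizationStorageType => storage_type), in which case the names are introduced explicitly, similarly to the "Inputs" / "Outputs" sections in operation specifications.
Additionally, member definitions always include self to refer to the corresponding program element.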

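Values

When formulas are evaluated, they work with the following types of values: 1) Value (actual values, e.g. dense<[[1, 2], [3, 4]]> : tensor<2x2xi32>; they always know their types), 2) Placeholder (future values, e.g. lhs, rhs or result; their actual values are not known yet, only their types are known), 3) Type (types as defined in the "Types" section), 4) Function (global functions as defined in the "Functions" section).

Depending on the context, names may refer to different values. More specifically, the "Semantics" section for ops (and equivalents for other program elements) defines runtime logic, so all inputs are available as Value. In contrast, the "Constraints" section for ops (and equivalents) defines "compile-time" logic, i.e. something that is typically executed before runtime, so only constant inputs are available as Value and other inputs are available only as Placeholder.

Names	In "Semantics"	In "Constraints"
Global functions	`Function`	`Function`
Constant inputs	`Value`	`Value`
Non-constant inputs	`Value`	`Placeholder`
Outputs	`Value`	`Placeholder`
Local definitions	Depends on the definition	Depends on the definition

Let's consider an example transpose operation: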

%result = "stablehlo.transpose"(%operand) {
  permutation = dense<[2, 1, 0]> : tensor<3xi64>
} : (tensor<2x3x2xi32>) -> tensor<2x3x2xi32>

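For this operation, permutation is a constant, so it is available as a Value in both semantics and constraints. In contrast, operand and result are available as a Value in semantics but only as a Placeholder in constraints.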
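
The following sketch illustrates what "compile-time" evaluation means for this example. It assumes the transpose constraint shape(result) = dim(operand, permutation...), which is not quoted in this section, and uses a hypothetical helper check_transpose_constraint. Note that the check needs only the shapes of operand and result (known from their types) and the value of the constant permutation; no element data is involved.

```python
def check_transpose_constraint(operand_shape, permutation, result_shape):
    """Evaluate shape(result) == [dim(operand, p) for p in permutation].

    operand_shape and result_shape come from placeholder types;
    permutation is a constant input, so its actual value is available.
    """
    expected = tuple(operand_shape[axis] for axis in permutation)
    return expected == tuple(result_shape)


# tensor<2x3x2xi32> transposed with permutation [2, 1, 0].
ok = check_transpose_constraint((2, 3, 2), [2, 1, 0], (2, 3, 2))
print(ok)
```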

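Functions

Construction of types

There are no functions that can be used to construct types. Instead, we directly use type syntax because it is typically more concise. For example, (tensor<E>, tensor<E>) -> (tensor<E>) rather than function_type([tensor_type([], E), tensor_type([], E)], [tensor_type([], E)]).

Functions on types

element_type is defined on tensor types and quantized tensor types and returns, respectively, the TensorElementType or QuantizedTensorElementType part of the corresponding TensorType or QuantizedTensorType.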

def element_type(x: Value | Placeholder | Type):
  if type(x) == TensorType:
    return tensor_element_type(x)
  if type(x) == QuantizedTensorType:
    return quantized_tensor_element_type(x)
  if type(x) is not Type:
    return element_type(type(x))

is_per_axis_quantized(x: Value | Placeholder | Type) -> Value is a shortcut for is_quantized(x) and quantization_dimension(x) is not None.
is_per_tensor_quantized(x: Value | Placeholder | Type) -> Value is a shortcut for is_quantized(x) and quantization_dimension(x) is None.
is_promotable(x: Type, y: Type) -> bool checks whether type x can be promoted to type y. When x and y are QuantizedTensorElementTypes, the promotion is applied only to the storage_type. This specific version of promotion is currently used in the context of reduction computation (refer to the RFC for more details).

def is_promotable(x: Type, y: Type) -> Value:
  is_same_type = ((is_bool(x) and is_bool(y)) or
    (is_integer(x) and is_integer(y)) or (is_float(x) and is_float(y)) or
    (is_complex(x) and is_complex(y)) or
    (is_quantized(x) and is_quantized(y) and expressed_type(x) == expressed_type(y)))

  if is_same_type == False:
    return False

  if is_integer(x) or is_float(x):
    return bitwidth(x) <= bitwidth(y)

  if is_complex(x):
    return bitwidth(element_type(x)) <= bitwidth(element_type(y))

  if is_quantized(x):
    return bitwidth(storage_type(x)) <= bitwidth(storage_type(y))

  return False

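is_quantized(x: Value | Placeholder | Type) -> Value is a shortcut for is_quantized_tensor_element_type(x).
is_type_name(x: Value | Placeholder | Type) -> Value. Available for all types. For example, is_float(x) returns true if x is a FloatType. If x is a value or placeholder, this function is a shortcut for is_type_name(type(x)).
max_value(x: Type) -> Value returns the maximum value of a TensorElementType. If x is not a TensorElementType, returns None.
min_value(x: Type) -> Value returns the minimum possible value of a TensorElementType. If x is not a TensorElementType, returns None.
member_name(x: Value | Placeholder | Type) -> Any. Available for all member definitions member_name of all types. For example, tensor_element_type(x) returns the TensorElementType part of the corresponding TensorType. If x is a value or placeholder, this function is a shortcut for member_name(type(x)). If x is not a type that has an appropriate member, or a value or placeholder of such a type, returns None.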

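Construction of values

operation_name(*xs: Value | Type) -> Value. Available for all operations. For example, add(lhs, rhs) takes two tensor values lhs and rhs and returns the output of evaluating the add operation with these inputs. For some operations, e.g. broadcast_in_dim, the types of their outputs are "load-bearing", i.e. needed to evaluate the operation. In this case, the function takes these types as arguments.

Functions on values

All of Python's operators and functions are available. For example, both subscription and slicing notations from Python are available to index into tensors, quantized tensors and tuples.
to_destination_type(x: Value, destination_type: Type) -> Value is defined on tensors and returns the converted value of x based on type(x) and destination_type as follows: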

def to_destination_type(x: Value, destination_type: Type) -> Value:
  if type(x) == destination_type:
    return x

  if is_quantized(destination_type):
    if is_quantized(type(x)):
      return quantize(dequantize(x), destination_type)
    assert is_float(type(x))
    return quantize(x, destination_type)

  if is_quantized(type(x)):
    assert destination_type == expressed_type(type(x))
    return dequantize(x)

  return convert(x, destination_type)

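There is early discussion on merging the convert, uniform_quantize and uniform_dequantize operations (#1576). After the merge, the above function will no longer be needed, and the operation name for convert can be used instead.

is_nan(x: Value) -> Value is defined on tensors and returns true if all elements of x are NaN, or false otherwise. If x is not a tensor, returns None.
is_sorted(x: Value) -> Value is defined on tensors and returns true if the elements of x are sorted in ascending order with respect to the ascending lexicographical order of their indices, or false otherwise. If x is not a tensor, returns None.
is_unique(x: Value) -> Value is defined on tensors and returns true if x doesn't have duplicate elements, or false otherwise. If x is not a tensor, returns None.
member_name(x: Value) -> Any is defined for all member definitions member_name of all values. For example, real_part(x) returns the RealPart part of the corresponding ComplexConstant. If x is not a value that has an appropriate member, returns None.
same(x: Value) -> Value is defined on tensors and returns true if the elements of x are all equal to each other, or false otherwise. If the tensor doesn't have elements, that counts as "all equal to each other", i.e. the function returns true. If x is not a tensor, returns None.
split(x: Value, num_results: Value, axis: Value) -> Value is defined on tensors and returns num_results slices of x along the axis axis. If x is not a tensor or dim(x, axis) % num_results != 0, returns None.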
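
A few of these helpers are easy to sketch for rank-1 tensors represented as Python lists. This is a simplified illustration under that assumption: the axis argument of split is dropped because only one axis exists, and the "not a tensor" cases are not modeled.

```python
def is_unique(x):
    """True if the rank-1 tensor x has no duplicate elements."""
    return len(set(x)) == len(x)


def same(x):
    """True if all elements are equal; an empty tensor counts as all equal."""
    return all(element == x[0] for element in x)


def split(x, num_results):
    """Split x into num_results equal slices, or None if it does not divide evenly."""
    if len(x) % num_results != 0:
        return None
    step = len(x) // num_results
    return [x[i * step:(i + 1) * step] for i in range(num_results)]


print(is_unique([1, 2, 3]), same([]), split([1, 2, 3, 4, 5, 6], 3))
```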

Shape computations

axes(x: Value | Placeholder | Type) -> Value is a shortcut for range(rank(x)).
dim(x: Value | Placeholder | Type, axis: Value) -> Value is a shortcut for shape(x)[axis].
dims(x: Value | Placeholder | Type, axes: List) -> List is a shortcut for list(map(lambda axis: dim(x, axis), axes)).
index_space(x: Value | Placeholder | Type) -> Value is defined on tensors and returns size(x) indices for the corresponding TensorType, sorted in ascending lexicographical order, i.e. [0, ..., 0], [0, ..., 1], ..., shape(x) - 1. If x is not a tensor type, a quantized tensor type, or a value or placeholder of one of these types, returns None.
rank(x: Value | Placeholder | Type) -> Value is a shortcut for size(shape(x)).
shape(x: Value | Placeholder | Type) -> Value is defined in the "Functions on types" section via member_name.
size(x: Value | Placeholder | Type) -> Value is a shortcut for reduce(lambda x, y: x * y, shape(x)).
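
These shortcuts translate almost directly into Python when a tensor is represented only by its shape tuple, which is the simplification assumed in this sketch. One deviation is labeled in the code: size passes an initial value of 1 to reduce so that rank-0 shapes work, which the shortcut as written does not spell out.

```python
from functools import reduce
from itertools import product


def rank(shape):
    return len(shape)


def size(shape):
    # Assumption: initial value 1, so that the empty shape () has size 1.
    return reduce(lambda x, y: x * y, shape, 1)


def axes(shape):
    return list(range(rank(shape)))


def dims(shape, axes_):
    return [shape[axis] for axis in axes_]


def index_space(shape):
    """All indices in ascending lexicographical order."""
    return [list(index) for index in product(*(range(d) for d in shape))]


shape = (2, 3)
print(rank(shape), size(shape), axes(shape), dims(shape, [1, 0]))
print(index_space(shape))
```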

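Quantization computations

baseline_element_type(x: Value | Placeholder | Type) -> Type is a shortcut for element_type(baseline_type(x)).
baseline_type is defined on tensor types and quantized tensor types and transforms them to a "baseline", i.e. a type with the same shape but with the quantization parameters of the element type reset to default values. This is a handy trick for comparing tensor and quantized tensor types uniformly, which is needed quite often. For quantized types, this enables comparing types while ignoring the quantization parameters; that is, shape, storage_type, expressed_type, storage_min, storage_max, and quantization_dimension (for per-axis quantized types) must all match, but scales and zero points may differ.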

def baseline_type(x: Value | Placeholder | Type) -> Type:
  if type(x) == TensorType:
    return x
  if type(x) == QuantizedTensorType:
    element_type = quantized_tensor_element_type(x)
    baseline_element_type = QuantizedTensorElementType(
      storage_type = storage_type(element_type),
      storage_min = storage_min(element_type),
      storage_max = storage_max(element_type),
      expressed_type = expressed_type(element_type),
      quantization_dimension = quantization_dimension(element_type),
      scales = [constant(1.0, expressed_type(element_type))] * dim(x, quantization_dimension(element_type)),
      zero_points = [constant(0, storage_type(element_type))] * dim(x, quantization_dimension(element_type)))
    return QuantizedTensorType(shape(x), baseline_element_type)
  if type(x) is not Type:
    return baseline_type(type(x))

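dequantize is defined on quantized tensor types and turns them into floating-point tensor types. This happens by converting quantized elements, which represent integer values of the storage type, into the corresponding floating-point values of the expressed type using the zero point and scale associated with the quantized element type.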

def compute_zero_points(quantized_type, result_type):
  if is_per_tensor_quantized(quantized_type):
    return broadcast_in_dim(constant(zero_point(quantized_type), storage_type(quantized_type)), [], result_type)
  if is_per_axis_quantized(quantized_type):
    for i in index_space(result_type):
      d = quantization_dimension(quantized_type)
      zero_points[i] = zero_points(quantized_type)[i[d]]
    return zero_points

def compute_scales(quantized_type, result_type):
  if is_per_tensor_quantized(quantized_type):
    return broadcast_in_dim(constant(scale(quantized_type), expressed_type(quantized_type)), [],
            result_type)
  if is_per_axis_quantized(quantized_type):
    for i in index_space(result_type):
      d = quantization_dimension(quantized_type)
      scales[i] = scales(quantized_type)[i[d]]
    return scales

def dequantize(x: Value) -> Value:
  assert is_quantized(x)
  x_storage = bitcast_convert(x, storage_type(x))
  x_storage_sub = x_storage - compute_zero_points(type(x), type(x_storage))
  x_expressed_sub = convert(x_storage_sub, expressed_type(x))
  return x_expressed_sub * compute_scales(type(x), type(x_expressed_sub))

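quantize is defined on floating-point tensor types and turns them into quantized tensor types. This happens by converting floating-point values of the expressed type into the corresponding integer values of the storage type using the zero point and scale associated with the quantized element type.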

def quantize(x: Value, result_type: Type) -> Value:
  assert is_float(x) and is_quantized(result_type)
  x_expressed_rounded = round_nearest_even(x / compute_scales(result_type, type(x)))
  x_storage_rounded = convert(x_expressed_rounded, storage_type(result_type))
  x_storage_add = x_storage_rounded + compute_zero_points(result_type, type(x_storage_rounded))
  x_storage = clamp(storage_min(result_type), x_storage_add, storage_max(result_type))
  return bitcast_convert(x_storage, result_type)

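dequantize_op_quantize is used to specify element-wise computations on quantized tensors. It dequantizes, i.e. turns quantized elements into their expressed types, then performs an operation, and then quantizes, i.e. turns the results back into their storage types. At the moment, this function only works for per-tensor quantization. Per-axis quantization is work in progress (#1574).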

def dequantize_op_quantize(op, *inputs_and_output_type):
  inputs = inputs_and_output_type[:-1]
  output_type = inputs_and_output_type[-1]

  float_inputs = map(dequantize, inputs)
  float_result = op(*float_inputs)
  return quantize(float_result, output_type)

def dequantize_batch_norm_grad_or_training_quantize(op, *inputs_and_output_types):
  inputs = inputs_and_output_types[:-3]
  float_inputs = map(dequantize, inputs)
  float_results = op(*float_inputs)
  return map(quantize, float_results, inputs_and_output_types[-3:])

def dequantize_compare(lhs, rhs, comparison_direction):
  float_lhs = dequantize(lhs)
  float_rhs = dequantize(rhs)
  return compare(float_lhs, float_rhs, comparison_direction, FLOAT)

def dequantize_select_quantize(pred, on_true, on_false, output_type):
  float_on_true = dequantize(on_true)
  float_on_false = dequantize(on_false)
  float_result = select(pred, float_on_true, float_on_false)
  return quantize(float_result, output_type)
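
The dequantize, operate, quantize pattern can be tried out with a per-tensor sketch in plain Python. Several simplifications are assumed here for illustration: tensors are rank-1 lists of storage values, all inputs and the output share one scale and zero point (passed as keyword arguments instead of being read from types), and the clamping bounds default to the full i8 range.

```python
import operator


def dequantize(storage, scale, zero_point):
    return [(q - zero_point) * scale for q in storage]


def quantize(values, scale, zero_point, storage_min=-128, storage_max=127):
    # round() rounds ties to even; the result is clamped to the storage range.
    return [max(storage_min, min(storage_max, round(v / scale) + zero_point))
            for v in values]


def dequantize_op_quantize(op, *inputs, scale, zero_point):
    """Apply a scalar op element-wise in the expressed (float) domain."""
    float_inputs = [dequantize(x, scale, zero_point) for x in inputs]
    float_result = [op(*elements) for elements in zip(*float_inputs)]
    return quantize(float_result, scale, zero_point)


result = dequantize_op_quantize(operator.add, [2, 4], [6, 8], scale=0.5, zero_point=0)
clamped = dequantize_op_quantize(operator.add, [100], [100], scale=0.5, zero_point=0)
print(result, clamped)
```

The second call shows the effect of clamping: the float sum maps to a storage value above 127, so the result saturates at the upper bound.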

Grid computations

cross_partition(partition_groups: Value) -> Value. See the "cross_partition" section above.
cross_replica(replica_groups: Value) -> Value. See the "cross_replica" section above.
cross_replica_and_partition(replica_groups: Value) -> Value. See the "cross_replica_and_partition" section above.
flattened_ids(flattened_id_groups: Value) -> Value. See the "flattened_ids" section above.