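StableHLO Specification

StableHLO is an operation set for high-level operations (HLO) in machine learning (ML) models. StableHLO works as a portability layer between different ML frameworks and ML compilers: ML frameworks that produce StableHLO programs are compatible with ML compilers that consume StableHLO programs.

Our goal is to create more interoperability between various ML frameworks (such as TensorFlow, JAX and PyTorch) and ML compilers (such as XLA and IREE). Toward that end, this document provides a specification for the StableHLO programming language.

This specification contains three major sections. First, the Programs section describes the structure of StableHLO programs, which consist of StableHLO functions, which themselves consist of StableHLO ops. Within that structure, the Ops section specifies the semantics of individual ops. The Execution section provides semantics for all these ops executing together within a program. Finally, the Notation section discusses the notation used throughout the specification.

To view the specification from a previous release of StableHLO, see the specification at the corresponding tagged release, for example, the StableHLO v0.19.0 specification. To view the changes introduced by each minor version bump of StableHLO, refer to the version log in VhloDialect.td.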

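Programs

Program ::= {Func}

StableHLO programs consist of an arbitrary number of StableHLO functions. Below is an example program with a function @main that has 3 inputs (%image, %weights and %bias) and 1 output. The body of the function has 6 ops.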

func.func @main(
  %image: tensor<28x28xf32>,
  %weights: tensor<784x10xf32>,
  %bias: tensor<1x10xf32>
) -> tensor<1x10xf32> {
  %0 = "stablehlo.reshape"(%image) : (tensor<28x28xf32>) -> tensor<1x784xf32>
  %1 = "stablehlo.dot"(%0, %weights) : (tensor<1x784xf32>, tensor<784x10xf32>) -> tensor<1x10xf32>
  %2 = "stablehlo.add"(%1, %bias) : (tensor<1x10xf32>, tensor<1x10xf32>) -> tensor<1x10xf32>
  %3 = "stablehlo.constant"() {value = dense<0.0> : tensor<1x10xf32>} : () -> tensor<1x10xf32>
  %4 = "stablehlo.maximum"(%2, %3) : (tensor<1x10xf32>, tensor<1x10xf32>) -> tensor<1x10xf32>
  "func.return"(%4): (tensor<1x10xf32>) -> ()
}

Functions

Func        ::= 'func' '.' 'func' FuncId FuncInputs FuncOutputs '{' FuncBody '}'
FuncInputs  ::= '(' [FuncInput {',' FuncInput}] ')'
FuncInput   ::= ValueId ':' ValueType
FuncOutputs ::= ['->' FuncOutput {',' FuncOutput}]
FuncOutput  ::= ValueType
FuncBody    ::= {Op}

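StableHLO functions (which are also called named functions) have an identifier, inputs/outputs and a body. In the future, we are planning to introduce additional metadata for functions to achieve better compatibility with HLO (#425, #626, #740, #744).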

Identifiers

FuncId  ::= '@' letter {letter | digit}
ValueId ::= '%' digit {digit}
          | '%' letter {letter | digit}
letter  ::= 'a' | ... | 'z' | 'A' | ... | 'Z' | '_'
digit   ::= '0' | ... | '9'

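StableHLO identifiers are similar to identifiers in many programming languages, with two peculiarities: 1) all identifiers have sigils that distinguish different kinds of identifiers, 2) value identifiers can be completely numeric to simplify generation of StableHLO programs.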
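The identifier grammar above can be checked mechanically. The following is a minimal sketch in Python, assuming regular expressions transcribed by hand from the FuncId and ValueId productions; the helper names is_func_id and is_value_id are hypothetical and not part of StableHLO.

```python
import re

# Transcribed from the grammar: letter includes '_', digit is '0'..'9'.
FUNC_ID = re.compile(r"@[A-Za-z_][A-Za-z_0-9]*")
VALUE_ID = re.compile(r"%(?:[0-9]+|[A-Za-z_][A-Za-z_0-9]*)")

def is_func_id(text: str) -> bool:
    """Return True if text matches the FuncId production."""
    return FUNC_ID.fullmatch(text) is not None

def is_value_id(text: str) -> bool:
    """Return True if text matches the ValueId production."""
    return VALUE_ID.fullmatch(text) is not None

print(is_func_id("@main"))    # True
print(is_value_id("%0"))      # True: completely numeric value identifier
print(is_value_id("%image"))  # True
print(is_value_id("%0abc"))   # False: digits cannot be followed by letters
```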

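Types

Type         ::= ValueType | NonValueType
ValueType    ::= TensorType | QuantizedTensorType | TokenType | TupleType
NonValueType ::= TensorElementType | QuantizedTensorElementType | FunctionType | StringType

StableHLO types are categorized into value types (which are also called first-class types), which represent StableHLO values, and non-value types, which describe other program elements. StableHLO types are similar to types in many programming languages; the main peculiarity is StableHLO's domain-specific nature, which results in some unusual outcomes (e.g. scalar types are not value types).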

TensorType ::= 'tensor' '<' Shape TensorElementType '>'
Shape ::= {DimensionSize 'x'}
DimensionSize ::= digit {digit} | '?'

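Tensor types represent tensors, i.e. multidimensional arrays. They have a shape and an element type, where a shape represents non-negative or unknown dimension sizes in the ascending order of the corresponding dimensions (which are also called axes) numbered from 0 to R-1. The number of dimensions R is called the rank. For example, tensor<2x3xf32> is a tensor type with shape 2x3 and element type f32. It has two dimensions (or, in other words, two axes), the 0th dimension and the 1st dimension, whose sizes are 2 and 3. Its rank is 2.

Shapes can be partially or completely unknown (dynamic), e.g. tensor<?x2xf64> is partially unknown and tensor<?x?xf64> is completely unknown. Dynamic dimension sizes are represented using a ?. Shapes cannot be unranked.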
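To make the relationship between shape, element type and rank concrete, here is a small sketch in Python that splits the textual form of a tensor type into its parts. The function parse_tensor_type and its regular expression are assumptions made for illustration only; they are not part of any StableHLO tooling. Dynamic dimension sizes are returned as None.

```python
import re

# Zero or more "<size>x" prefixes followed by the element type.
TENSOR_TYPE = re.compile(r"tensor<((?:(?:[0-9]+|\?)x)*)(.+)>")

def parse_tensor_type(text):
    """Return (shape, element_type) for a string such as 'tensor<2x3xf32>'."""
    match = TENSOR_TYPE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a tensor type: {text!r}")
    dims = match.group(1).split("x")[:-1]  # drop the empty piece after the last 'x'
    shape = [None if d == "?" else int(d) for d in dims]
    return shape, match.group(2)

shape, element_type = parse_tensor_type("tensor<2x3xf32>")
print(shape, element_type, len(shape))  # [2, 3] f32 2
print(parse_tensor_type("tensor<?x2xf64>"))  # ([None, 2], 'f64')
print(parse_tensor_type("tensor<f32>"))      # ([], 'f32'): rank 0
```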

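In the future, we are planning to explore extending tensor types beyond dimension sizes and element types, for example, to include layouts (#629) and sparsity (#1078).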

QuantizedTensorType ::= 'tensor' '<' Shape QuantizedTensorElementType '>'
QuantizedTensorElementType ::= '!quant.uniform' '<'
                  QuantizationStorageType
                  ['<' QuantizationStorageMin ':' QuantizationStorageMax '>']
                  ':' QuantizationExpressedType
                  [':' QuantizationDimension]
                  ',' QuantizationParameters '>'
QuantizationStorageType ::= IntegerType
QuantizationStorageMin ::= IntegerConstant
QuantizationStorageMax ::= IntegerConstant
QuantizationExpressedType ::= FloatType
QuantizationDimension ::= IntegerConstant
QuantizationParameters ::= QuantizationParameter
                         | '{' QuantizationParameter {',' QuantizationParameter} '}'
QuantizationParameter ::= QuantizationScale ':' QuantizationZeroPoint
QuantizationScale ::= FloatConstant
QuantizationZeroPoint ::= IntegerConstant

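Name	Type	Constraints
`storage_type`	integer type	(C1-C3), (C8)
`storage_min`	integer constant	(C1), (C3), (C7)
`storage_max`	integer constant	(C2), (C3), (C7)
`expressed_type`	floating-point type	(C4)
`quantization_dimension`	optional integer constant	(C10-C12)
`scales`	variadic number of floating-point constants	(C4-C6), (C9), (C10), (C13)
`zero_points`	variadic number of integer constants	(C7-C9)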

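Quantized element types represent integer values of a storage type in the range from storage_min to storage_max (inclusive) that correspond to floating-point values of an expressed type. For a given integer value i, the corresponding floating-point value f can be computed as f = (i - zero_point) * scale, where scale and zero_point are called quantization parameters. The storage_min and storage_max are optional in the grammar, but have default values of min_value(storage_type) and max_value(storage_type) respectively. Quantized element types have the following constraints: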

(C1) type(storage_min) = storage_type.
(C2) type(storage_max) = storage_type.
(C3) min_value(storage_type) <= storage_min < storage_max <= max_value(storage_type).
(C4) type(scales...) = expressed_type.
(C5) 0 < scales.
(C6) is_finite(scales...).
(C7) storage_min <= zero_points <= storage_max.
(C8) type(zero_points...) = storage_type.
(C9) size(scales) = size(zero_points).
(C10) If is_empty(quantization_dimension), then size(scales) = 1.
(C11) 0 <= quantization_dimension.
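The mapping f = (i - zero_point) * scale can be written down directly. The sketch below is plain Python with a hypothetical helper name dequantize; the scale and zero point values are made up for illustration and satisfy (C5) and (C7) for an 8-bit signed storage type.

```python
def dequantize(i: int, scale: float, zero_point: int) -> float:
    """Map a stored integer i to the floating-point value it expresses."""
    return (i - zero_point) * scale

# Illustrative parameters: scale = 0.5, zero_point = 10.
print(dequantize(10, 0.5, 10))    # 0.0: the zero point maps to 0.0
print(dequantize(12, 0.5, 10))    # 1.0
print(dequantize(-128, 0.5, 10))  # -69.0
```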

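At the moment, QuantizationScale is a floating-point constant, but there is strong interest in integer-based scales, represented with multipliers and shifts. We are planning to explore this in the near future (#1404).

There is an ongoing discussion on the semantics of QuantizationZeroPoint, including the type, the values and whether there can be just one or potentially multiple zero points in a quantized tensor type. Based on the results of this discussion, the specification around zero points may change in the future (#1405).

Another ongoing discussion involves the semantics of QuantizationStorageMin and QuantizationStorageMax to determine whether any constraints should be imposed on these values and on the values of quantized tensors (#1406).

Finally, we are planning to explore representing unknown scales and zero points, similarly to how we are planning to explore representing unknown dimension sizes (#1407).

Quantized tensor types represent tensors with quantized elements. These tensors are exactly the same as regular tensors, except that their elements have quantized element types instead of regular element types.

In quantized tensors, quantization can be per-tensor, meaning one scale and zero_point for the entire tensor, or per-axis, meaning multiple scales and zero_points, one pair per slice of a particular dimension quantization_dimension. More formally, in a tensor t with per-axis quantization, there are dim(t, quantization_dimension) slices of the quantization_dimension: t[:, ..., 0, ..., :], t[:, ..., 1, ..., :], etc. All elements in the ith slice use scales[i] and zero_points[i] as their quantization parameters. Quantized tensor types have the following constraints: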

For per-tensor quantization:
- No additional constraints.
For per-axis quantization:
- (C12) quantization_dimension < rank(self).
- (C13) dim(self, quantization_dimension) = size(scales).
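Per-axis quantization can be illustrated with a rank-2 tensor stored as nested Python lists. This is a sketch under simplifying assumptions: the helper dequantize_per_axis is hypothetical, it only handles rank 2, and the parameter values are invented for the example.

```python
def dequantize_per_axis(t, scales, zero_points, quantization_dimension):
    """Dequantize a rank-2 nested list quantized along quantization_dimension."""
    assert quantization_dimension in (0, 1)  # this sketch handles rank 2 only
    rows, cols = len(t), len(t[0])
    dim_size = rows if quantization_dimension == 0 else cols
    assert dim_size == len(scales) == len(zero_points)  # (C9) and (C13)
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # The slice index selects which (scale, zero_point) pair applies.
            k = r if quantization_dimension == 0 else c
            out_row.append((t[r][c] - zero_points[k]) * scales[k])
        result.append(out_row)
    return result

t = [[10, 20], [30, 40]]
print(dequantize_per_axis(t, [0.5, 2.0], [0, 10], quantization_dimension=1))
# [[5.0, 20.0], [15.0, 60.0]]
```

With quantization_dimension = 1, every element in column 0 uses scales[0] and zero_points[0], and every element in column 1 uses scales[1] and zero_points[1].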

TokenType ::= 'token'

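Token types represent tokens, i.e. opaque values produced and consumed by some operations. Tokens are used for imposing execution order on operations, as described in the Execution section.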

TupleType ::= 'tuple' '<' TupleElementTypes '>'
TupleElementTypes ::= [ValueType {',' ValueType}]

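Tuple types represent tuples, i.e. heterogeneous lists. Tuples are a legacy feature that only exists for compatibility with HLO. In HLO, tuples are used to represent variadic inputs and outputs. In StableHLO, variadic inputs and outputs are supported natively, and the only use of tuples in StableHLO is to comprehensively represent the HLO ABI where, for example, T, tuple<T> and tuple<tuple<T>> may be materially different depending on a particular implementation. In the future, we are planning to make changes to the HLO ABI that may allow us to remove tuple types from StableHLO (#598).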

TensorElementType ::= BooleanType | IntegerType | FloatType | ComplexType
BooleanType ::= 'i1'
IntegerType ::= SignedIntegerType | UnsignedIntegerType
SignedIntegerType ::= 'si2' | 'si4' | 'si8' | 'si16' | 'si32' | 'si64'
UnsignedIntegerType ::= 'ui2' | 'ui4' | 'ui8' | 'ui16' | 'ui32' | 'ui64'
FloatType ::= 'f8E4M3FN' | 'f8E5M2' | 'f8E4M3FNUZ' | 'f8E5M2FNUZ'
            | 'f8E4M3B11FNUZ' | 'bf16' | 'f16' | 'f32' | 'f64'
TensorFloat32 ::= 'tf32'
ComplexType ::= 'complex' '<' ComplexElementType '>'
ComplexElementType ::= 'f32' | 'f64'

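Element types represent elements of tensor types. Unlike in many programming languages, these types are not first class in StableHLO. This means that StableHLO programs cannot directly represent values of these types (as a result, it is idiomatic to represent scalar values of type T with 0-dimensional tensor values of type tensor<T>).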

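Boolean type represents boolean values true and false.
Integer types can be either signed (si) or unsigned (ui) and have one of the supported bit widths (2, 4, 8, 16, 32 or 64). Signed siN types represent integer values from -2^(N-1) to 2^(N-1)-1 inclusive, and unsigned uiN types represent integer values from 0 to 2^N-1 inclusive.
Floating-point types can be one of the following:
- f8E4M3FN and f8E5M2 types corresponding to, respectively, the E4M3 and E5M2 encodings of the FP8 format described in FP8 Formats for Deep Learning.
- f8E4M3FNUZ and f8E5M2FNUZ types corresponding to the E4M3 and E5M2 encodings of the FP8 formats described in 8-bit Numerical Formats for Deep Neural Networks.
- f8E4M3B11FNUZ type corresponding to the E4M3 encoding of the FP8 formats described in Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks.
- bf16 type corresponding to the bfloat16 format described in BFloat16: The secret to high performance on Cloud TPUs.
- f16, f32 and f64 types corresponding to, respectively, the binary16 ("half precision"), binary32 ("single precision") and binary64 ("double precision") formats described in the IEEE 754 standard.
- tf32 type corresponding to the TensorFloat32 format, which has limited support in StableHLO.
Complex types represent complex values that have a real part and an imaginary part of the same element type. Supported complex types are complex<f32> (both parts are of type f32) and complex<f64> (both parts are of type f64).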
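The integer ranges stated above for siN and uiN follow directly from the bit width. A short Python sketch (the helper name integer_range is hypothetical) computes them:

```python
def integer_range(bits: int, signed: bool):
    """Return the inclusive (min, max) range of siN (signed) or uiN (unsigned)."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

print(integer_range(8, signed=True))   # (-128, 127) for si8
print(integer_range(4, signed=False))  # (0, 15) for ui4
print(integer_range(2, signed=True))   # (-2, 1) for si2
```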

FunctionType ::= '(' InputTypes ')' '->' '(' OutputTypes ')'
InputTypes ::= [ValueType {',' ValueType}]
OutputTypes ::= [ValueType {',' ValueType}]

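Function types represent both named and anonymous functions. They have input types (the list of types on the left-hand side of ->) and output types (the list of types on the right-hand side of ->). In many programming languages, function types are first class, but not in StableHLO.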

StringType ::= 'string'

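String type represents sequences of bytes. Unlike in many programming languages, string type is not first class in StableHLO and is only used to specify static metadata for program elements.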

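Operations

StableHLO operations (which are also called ops) represent a closed set of high-level operations in machine learning models. As discussed above, StableHLO syntax is heavily inspired by MLIR, which is not necessarily the most ergonomic alternative, but is arguably the best fit for StableHLO's goal of creating more interoperability between ML frameworks and ML compilers.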

Op            ::= [OpOutputs] OpName OpInputs ':' OpSignature
OpName        ::= '"' 'stablehlo' '.' OpMnemonic '"'
OpMnemonic    ::= 'abs' | 'add' | ...

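StableHLO operations have a name, inputs/outputs and a signature. The name consists of the stablehlo. prefix and a mnemonic that uniquely identifies one of the supported ops. See below for a comprehensive list of all supported ops.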

OpInputs        ::= OpInputValues OpInputFuncs OpInputAttrs
OpInputValues   ::= '(' [OpInputValue {',' OpInputValue}] ')'
OpInputValue    ::= ValueId
OpInputFuncs    ::= ['(' OpInputFunc {',' OpInputFunc} ')']
OpInputAttrs    ::= ['{' OpInputAttr {',' OpInputAttr} '}']
OpOutputs       ::= [OpOutput {',' OpOutput} '=']
OpOutput        ::= ValueId

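Ops consume inputs and produce outputs. Inputs are categorized into input values (computed during execution), input functions (provided statically, because in StableHLO functions are not first-class values) and input attributes (also provided statically). The kind of inputs and outputs consumed and produced by an op depends on its mnemonic. For example, the add op consumes 2 input values and produces 1 output value. In comparison, the select_and_scatter op consumes 3 input values, 2 input functions and 3 input attributes.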

OpInputFunc ::= '{' Unused FuncInputs ':' FuncBody '}'
Unused      ::= '^' digit {digit}
              | '^' letter {letter | digit}

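Input functions (which are also called anonymous functions) are very similar to named functions except that: 1) they don't have an identifier (hence the name "anonymous"), 2) they don't declare output types (output types are inferred from the return op within the function).

The syntax for input functions includes a currently unused part (see the Unused production above), which is there for compatibility with MLIR. In MLIR, there is a more general concept of "regions", which can have multiple "blocks" of ops connected together via jump ops. These blocks have ids that correspond to the Unused production, so that they can be distinguished from each other. StableHLO doesn't have jump ops, so the corresponding part of MLIR syntax is unused (but is still there).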

OpInputAttr      ::= OpInputAttrName '=' OpInputAttrValue
OpInputAttrName  ::= letter {letter | digit}
OpInputAttrValue ::= Constant

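Input attributes have a name and a value, which is one of the supported constants. They are the primary way to specify static metadata for program elements. For example, the concatenate op uses the attribute dimension to specify the dimension along which its input values are concatenated. Similarly, the slice op uses multiple attributes like start_indices and limit_indices to specify the bounds that are used to slice the input value.

At the moment, StableHLO programs in the wild sometimes contain attributes that are not described in this document. In the future, we are planning to either absorb these attributes into the StableHLO opset or prohibit them from appearing in StableHLO programs. In the meantime, here is the list of these attributes:

layout (#629).
mhlo.frontend_attributes (#628).
mhlo.sharding (#619).
output_operand_aliases (#740).
Location metadata (#594).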

OpSignature ::= '(' [ValueType {',' ValueType}] ')' '->' '(' [ValueType {',' ValueType}] ')'

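An op signature consists of the types of all input values (the list of types on the left-hand side of ->) and the types of all output values (the list of types on the right-hand side of ->). Strictly speaking, input types are redundant, and output types are almost always redundant as well (because for most StableHLO ops, output types can be inferred from inputs). Nonetheless, the op signature is deliberately part of StableHLO syntax for compatibility with MLIR.

Below is an example op whose mnemonic is select_and_scatter. It consumes 3 input values (%operand, %source and %init_value), 2 input functions and 3 input attributes (window_dimensions, window_strides and padding). Note how the signature of the op only includes the types of its input values (but not the types of input functions and attributes, which are provided inline).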

%result = "stablehlo.select_and_scatter"(%operand, %source, %init_value) ({
  ^bb0(%arg0: tensor<i32>, %arg1: tensor<i32>):
    %0 = "stablehlo.compare"(%arg0, %arg1) {
      comparison_direction = #stablehlo<comparison_direction GE>
    } : (tensor<i32>, tensor<i32>) -> tensor<i1>
    "stablehlo.return"(%0) : (tensor<i1>) -> ()
}, {
  ^bb0(%arg0: tensor<i32>, %arg1: tensor<i32>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i32>, tensor<i32>) -> tensor<i32>
    "stablehlo.return"(%0) : (tensor<i32>) -> ()
}) {
  window_dimensions = dense<[3, 1]> : tensor<2xi64>,
  window_strides = dense<[2, 1]> : tensor<2xi64>,
  padding = dense<[[0, 1], [0, 0]]> : tensor<2x2xi64>
} : (tensor<4x2xi32>, tensor<2x2xi32>, tensor<i32>) -> tensor<4x2xi32>

Constants

Constant ::= BooleanConstant
           | IntegerConstant
           | FloatConstant
           | ComplexConstant
           | TensorConstant
           | QuantizedTensorConstant
           | StringConstant
           | EnumConstant

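StableHLO constants have a literal and a type, which together represent a StableHLO value. Generally, the type is part of the constant syntax, except when it is unambiguous (e.g. a boolean constant unambiguously has type i1, whereas an integer constant can have multiple possible types).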

BooleanConstant ::= BooleanLiteral
BooleanLiteral  ::= 'true' | 'false'

Boolean constants represent boolean values true and false. Boolean constants have type i1.

IntegerConstant   ::= IntegerLiteral ':' IntegerType
IntegerLiteral    ::= ['-' | '+'] DecimalDigits
                    | ['-' | '+'] '0x' HexadecimalDigits
DecimalDigits     ::= decimalDigit {decimalDigit}
HexadecimalDigits ::= hexadecimalDigit {hexadecimalDigit}
decimalDigit      ::= '0' | ... | '9'
hexadecimalDigit  ::= decimalDigit | 'a' | ... | 'f' | 'A' | ... | 'F'

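Integer constants represent integer values via strings that use decimal or hexadecimal notation. Other bases, e.g. binary or octal, are not supported. Integer constants have the following constraints:

(C1) is_wellformed(integer_literal, integer_type).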

FloatConstant  ::= FloatLiteral ':' FloatType
FloatLiteral   ::= SignPart IntegerPart FractionalPart ScientificPart
                 | '0x' [HexadecimalDigits]
SignPart       ::= ['-' | '+']
IntegerPart    ::= DecimalDigits
FractionalPart ::= ['.' [DecimalDigits]]
ScientificPart ::= [('e' | 'E') ['-' | '+'] DecimalDigits]

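Floating-point constants represent floating-point values via strings that use decimal or scientific notation. Additionally, hexadecimal notation can be used to directly specify the underlying bits in the floating-point format of the corresponding type. Floating-point constants have the following constraints:

(C1) If non-hexadecimal notation is used, is_wellformed(float_literal, float_type).
(C2) If hexadecimal notation is used, size(hexadecimal_digits) = num_bits(float_type) / 4.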
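As an illustration of hexadecimal notation for f32, the sketch below reinterprets 8 hexadecimal digits (32 bits / 4, per (C2)) as an IEEE 754 binary32 value using Python's standard struct module. The helper f32_from_hex is hypothetical, and this shows only the bit-level interpretation, not how any StableHLO parser is implemented.

```python
import math
import struct

def f32_from_hex(literal: str) -> float:
    """Interpret a literal such as '0x3F800000' as the bits of an f32 value."""
    digits = literal[2:]
    # (C2): an f32 literal needs exactly 32 / 4 = 8 hexadecimal digits.
    assert literal.startswith("0x") and len(digits) == 32 // 4
    return struct.unpack(">f", bytes.fromhex(digits))[0]

print(f32_from_hex("0x3F800000"))  # 1.0
print(f32_from_hex("0xC0000000"))  # -2.0
print(math.isinf(f32_from_hex("0x7F800000")))  # True: positive infinity
```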

ComplexConstant ::= ComplexLiteral ':' ComplexType
ComplexLiteral  ::= '(' RealPart ',' ImaginaryPart ')'
RealPart        ::= FloatLiteral
ImaginaryPart   ::= FloatLiteral

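Complex constants represent complex values using lists of a real part (comes first) and an imaginary part (comes second). For example, (1.0, 0.0) : complex<f32> represents 1.0 + 0.0i, and (0.0, 1.0) : complex<f32> represents 0.0 + 1.0i. The order in which these parts are then stored in memory is implementation-defined. Complex constants have the following constraints:

(C1) is_wellformed(real_part, complex_element_type(complex_type)).
(C2) is_wellformed(imaginary_part, complex_element_type(complex_type)).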

TensorConstant ::= TensorLiteral ':' TensorType
TensorLiteral  ::= 'dense' '<' (DenseLiteral | ElementLiteral) '>'
DenseLiteral   ::= DenseDimension | DenseElements
DenseDimension ::= '[' [DenseLiteral {',' DenseLiteral}] ']'
DenseElements  ::= [ElementLiteral {',' ElementLiteral}]
ElementLiteral ::= BooleanLiteral | IntegerLiteral | FloatLiteral | ComplexLiteral

Tensor constants represent tensor values using nested lists specified via NumPy notation. For example, dense<[[1, 2, 3], [4, 5, 6]]> : tensor<2x3xi32> represents a tensor value with the following mapping from indices to elements: {0, 0} => 1, {0, 1} => 2, {0, 2} => 3, {1, 0} => 4, {1, 1} => 5, {1, 2} => 6. The order in which these elements are then stored in memory is implementation-defined. Tensor constants have the following constraints:

(C1) has_syntax(tensor_literal, element_type(tensor_type)), where:
- has_syntax(element_literal: Syntax, element_type: Type) = is_wellformed(element_literal, type).
- has_syntax(tensor_literal: List, element_type: Type) = has_syntax(tensor_literal..., element_type).
(C2) has_shape(tensor_literal, shape(tensor_type)), where:
- has_shape(element_literal: Syntax, []) = true.
- has_shape(tensor_literal: List, shape: List) = size(tensor_literal) = shape[0] and has_shape(tensor_literal..., shape[1:]).
- otherwise, false.
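The recursive definition of has_shape in (C2) translates almost line by line into Python. In this sketch, nested Python lists stand in for the dense literal and any non-list value stands in for an element literal; that representation is an assumption made for illustration.

```python
def has_shape(literal, shape):
    """Check a nested-list literal against a shape, following (C2) as written."""
    if not isinstance(literal, list):
        # has_shape(element_literal, []) = true; any other shape is false.
        return len(shape) == 0
    if len(shape) == 0:
        return False  # a list cannot match the empty shape
    return len(literal) == shape[0] and all(
        has_shape(item, shape[1:]) for item in literal)

print(has_shape([[1, 2, 3], [4, 5, 6]], [2, 3]))  # True
print(has_shape([[1, 2], [3]], [2, 2]))           # False: ragged inner list
print(has_shape(7, []))                           # True: 0-dimensional tensor
```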

QuantizedTensorConstant ::= QuantizedTensorLiteral ':' QuantizedTensorType
QuantizedTensorLiteral  ::= 'dense' '<' (DenseLiteral | ElementLiteral) '>'

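Quantized tensor constants represent quantized tensor values using the same notation as tensor constants, with elements specified as constants of their storage type. Quantized tensor constants have the following constraints:

(C1) has_syntax(quantized_tensor_literal, storage_type(quantized_tensor_type)).
(C2) has_shape(quantized_tensor_literal, shape(quantized_tensor_type)).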

StringConstant  ::= StringLiteral
StringLiteral   ::= '"' {stringCharacter | escapeSequence} '"'
stringCharacter ::= all ASCII characters except '\00', '\01', ... '\1f' and '"'
escapeSequence  ::= '\' ('"' | '\' | 'n' | 't' | (hexadecimalDigit hexadecimalDigit))

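String literals consist of bytes specified using ASCII characters and escape sequences. They are encoding-agnostic, so the interpretation of these bytes is implementation-defined. String literals have type string.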

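Ops

abs

Semantics

Performs element-wise abs operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For signed integers: integer modulus.
For floats: abs from IEEE-754.
For complex numbers: complex modulus.
For quantized types: dequantize_op_quantize(abs, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of signed integer, floating-point, or complex type or per-tensor quantized tensor	(C1-C2)

Outputs

Name	Type	Constraints
`result`	tensor of signed integer or floating-point type or per-tensor quantized tensor	(C1-C2)

Constraints

(C1) shape(result) = shape(operand).
(C2) baseline_element_type(result) is defined as:
- complex_element_type(element_type(operand)) if is_complex(operand).
- baseline_element_type(operand) otherwise.

Examples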

// %operand: [-2, 0, 2]
%result = "stablehlo.abs"(%operand) : (tensor<3xi32>) -> tensor<3xi32>
// %result: [2, 0, 2]

More Examples

add

Semantics

Performs element-wise addition of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical OR.
For integers: integer addition.
For floats: addition from IEEE-754.
For complex numbers: complex addition.
For quantized types: dequantize_op_quantize(add, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or quantized tensor	(C1-C6)
(I2)	`rhs`	tensor or quantized tensor	(C1-C5), (C7)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1-C7)

Constraints

If the operation uses non-quantized tensors:
- (C1) type(lhs) = type(rhs) = type(result).
If the operation uses quantized tensors:
- (C2) is_quantized(lhs) and is_quantized(rhs) and is_quantized(result).
- (C3) storage_type(lhs) = storage_type(rhs) = storage_type(result).
- (C4) expressed_type(lhs) = expressed_type(rhs) = expressed_type(result).
- (C5) (is_per_axis_quantized(lhs) or is_per_axis_quantized(rhs)) = is_per_axis_quantized(result).
- (C6) If is_per_axis_quantized(lhs), then quantization_dimension(lhs) = quantization_dimension(result).
- (C7) If is_per_axis_quantized(rhs), then quantization_dimension(rhs) = quantization_dimension(result).

Examples

// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.add"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[6, 8], [10, 12]]

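More Examples

after_all

Semantics

Ensures that the operations producing the inputs are executed before any operations that depend on result. Execution of this operation does nothing; it only exists to establish data dependencies from result to inputs.

Inputs

Label	Name	Type
(I1)	`inputs`	variadic number of `token`

Outputs

Name	Type
`result`	`token`

Examples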

// %input0: !stablehlo.token
// %input1: !stablehlo.token
%result = "stablehlo.after_all"(%input0, %input1) : (!stablehlo.token, !stablehlo.token) -> !stablehlo.token

More Examples

all_gather

Semantics

Within each process group in the StableHLO process grid, concatenates the values of the operands tensors from each process along all_gather_dim and produces results tensors.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0 and use_global_device_ids = false.
cross_replica_and_partition(replica_groups) if channel_id > 0 and use_global_device_ids = false.
flattened_ids(replica_groups) if channel_id > 0 and use_global_device_ids = true.

Afterwards, within each process_group:

operands...@receiver = [operand@sender for sender in process_group] for all receiver in process_group.
results...@process = concatenate(operands...@process, all_gather_dim) for all process in process_group.

Inputs

Label	Name	Type	Constraints
(I1)	`operands`	variadic number of tensors or per-tensor quantized tensors	(C1), (C6)
(I2)	`all_gather_dim`	constant of type `si64`	(C1), (C6)
(I3)	`replica_groups`	2-dimensional tensor constant of type `si64`	(C2-C4)
(I4)	`channel_id`	constant of type `si64`	(C5)
(I5)	`use_global_device_ids`	constant of type `i1`	(C5)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C6)

Constraints

(C1) 0 <= all_gather_dim < rank(operands...).
(C2) is_unique(replica_groups).
(C3) size(replica_groups) is defined as:
- num_replicas if cross_replica is used.
- num_replicas if cross_replica_and_partition is used.
- num_processes if flattened_ids is used.
(C4) 0 <= replica_groups < size(replica_groups).
(C5) If use_global_device_ids = true, then channel_id > 0.
(C6) type(results...) = type(operands...) except:
- dim(results..., all_gather_dim) = dim(operands..., all_gather_dim) * dim(process_groups, 1).

Examples

// num_replicas: 2
// num_partitions: 1
// %operand0@(0, 0): [[1, 2], [3, 4]]
// %operand0@(1, 0): [[5, 6], [7, 8]]
// %operand1@(0, 0): [[11, 12], [13, 14]]
// %operand1@(1, 0): [[15, 16], [17, 18]]
%result:2 = "stablehlo.all_gather"(%operand0, %operand1) {
  all_gather_dim = 1 : i64,
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>,
  // channel_id = 0
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
  // use_global_device_ids = false
} : (tensor<2x2xi64>, tensor<2x2xi64>) -> (tensor<2x4xi64>, tensor<2x4xi64>)
// %result0@(0, 0): [[1, 2, 5, 6], [3, 4, 7, 8]]
// %result0@(1, 0): [[1, 2, 5, 6], [3, 4, 7, 8]]
// %result1@(0, 0): [[11, 12, 15, 16], [13, 14, 17, 18]]
// %result1@(1, 0): [[11, 12, 15, 16], [13, 14, 17, 18]]
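The gather-then-concatenate semantics can be simulated for one process group in plain Python. This is only a sketch: processes are modeled as dictionary keys, tensors as rank-2 nested lists, and the helper names concatenate and all_gather are hypothetical stand-ins for the operations named in the semantics.

```python
def concatenate(tensors, dim):
    """Concatenate rank-2 nested lists along dimension 0 or 1."""
    if dim == 0:
        return [list(row) for t in tensors for row in t]
    return [sum((t[r] for t in tensors), []) for r in range(len(tensors[0]))]

def all_gather(operand_by_process, process_group, all_gather_dim):
    """Every process in the group receives the same concatenated tensor."""
    gathered = [operand_by_process[p] for p in process_group]
    result = concatenate(gathered, all_gather_dim)
    return {p: [row[:] for row in result] for p in process_group}

# %operand0 from the example above, keyed by replica id.
operand0 = {0: [[1, 2], [3, 4]], 1: [[5, 6], [7, 8]]}
result0 = all_gather(operand0, process_group=[0, 1], all_gather_dim=1)
print(result0[0])  # [[1, 2, 5, 6], [3, 4, 7, 8]]
print(result0[1])  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```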

More Examples

all_reduce

Semantics

Within each process group in the StableHLO process grid, applies a reduction function computation to the values of the operands tensors from each process and produces results tensors.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0 and use_global_device_ids = false.
cross_replica_and_partition(replica_groups) if channel_id > 0 and use_global_device_ids = false.
flattened_ids(replica_groups) if channel_id > 0 and use_global_device_ids = true.

Afterwards, within each process_group:

results...@process[result_index] = exec(schedule) for some binary tree schedule where:
- exec(node) = computation(exec(node.left), exec(node.right)).
- exec(leaf) = leaf.value.
- schedule is an implementation-defined binary tree whose in-order traversal is to_destination_type(operands...@process_group...[result_index], type(func_inputs(computation)[0])).

Inputs

Label	Name	Type	Constraints
(I1)	`operands`	variadic number of tensors or per-tensor quantized tensors	(C5), (C6)
(I2)	`replica_groups`	variadic number of 1-dimensional tensor constants of type `si64`	(C1-C3)
(I3)	`channel_id`	constant of type `si64`	(C4)
(I4)	`use_global_device_ids`	constant of type `i1`	(C4)
(I5)	`computation`	function	(C5)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C6-C7)

Constraints

(C1) is_unique(replica_groups).
(C2) size(replica_groups) is defined as:
- num_replicas if cross_replica is used.
- num_replicas if cross_replica_and_partition is used.
- num_processes if flattened_ids is used.
(C3) 0 <= replica_groups < size(replica_groups).
(C4) If use_global_device_ids = true, then channel_id > 0.
(C5) computation has type (tensor<E>, tensor<E>) -> (tensor<E>) where is_promotable(element_type(operand), E).
(C6) shape(results...) = shape(operands...).
(C7) element_type(results...) = E.

Examples

// num_replicas: 2
// num_partitions: 1
// %operand0@(0, 0): [1, 2, 3, 4]
// %operand0@(1, 0): [5, 6, 7, 8]
// %operand1@(0, 0): [9, 10, 11, 12]
// %operand1@(1, 0): [13, 14, 15, 16]
%result:2 = "stablehlo.all_reduce"(%operand0, %operand1) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>,
  // channel_id = 0
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
  // use_global_device_ids = false
} : (tensor<4xi64>, tensor<4xi64>) -> (tensor<4xi64>, tensor<4xi64>)
// %result0@(0, 0): [6, 8, 10, 12]
// %result0@(1, 0): [6, 8, 10, 12]
// %result1@(0, 0): [22, 24, 26, 28]
// %result1@(1, 0): [22, 24, 26, 28]
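The element-wise reduction across a process group can be sketched in Python as follows. The sketch assumes rank-1 tensors stored as lists and uses a left-to-right fold, which is just one of the schedules the semantics allow (the actual binary tree is implementation-defined); the helper name all_reduce is hypothetical.

```python
from functools import reduce
import operator

def all_reduce(operand_by_process, process_group, computation):
    """Reduce rank-1 tensors element-wise across the processes of one group."""
    values = [operand_by_process[p] for p in process_group]
    # zip(*values) groups the elements that share the same result_index.
    reduced = [reduce(computation, elems) for elems in zip(*values)]
    return {p: list(reduced) for p in process_group}

# %operand0 and %operand1 from the example above, keyed by replica id.
operand0 = {0: [1, 2, 3, 4], 1: [5, 6, 7, 8]}
operand1 = {0: [9, 10, 11, 12], 1: [13, 14, 15, 16]}
print(all_reduce(operand0, [0, 1], operator.add)[0])  # [6, 8, 10, 12]
print(all_reduce(operand1, [0, 1], operator.add)[1])  # [22, 24, 26, 28]
```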

More Examples

all_to_all

Semantics

Within each process group in the StableHLO process grid, splits the values of the operands tensors along split_dimension into parts, scatters the split parts between the processes, concatenates the scattered parts along concat_dimension and produces results tensors. The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0.
cross_partition(replica_groups) if channel_id > 0.

Afterwards, within each process_group:

split_parts...@sender = split(operands...@sender, split_count, split_dimension) for all sender in process_group.
scattered_parts...@receiver = [split_parts...@sender[receiver_index] for sender in process_group] where receiver_index = process_group.index(receiver).
results...@process = concatenate(scattered_parts...@process, concat_dimension).

Inputs

Label	Name	Type	Constraints
(I1)	`operands`	variadic number of tensors or per-tensor quantized tensors	(C1-C3), (C9)
(I2)	`split_dimension`	constant of type `si64`	(C1), (C2), (C9)
(I3)	`concat_dimension`	constant of type `si64`	(C3), (C9)
(I4)	`split_count`	constant of type `si64`	(C2), (C4), (C8), (C9)
(I5)	`replica_groups`	2-dimensional tensor constant of type `si64`	(C5-C8)
(I6)	`channel_id`	constant of type `si64`

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C9)

Constraints

(C1) 0 <= split_dimension < rank(operands...).
(C2) dim(operands..., split_dimension) % split_count = 0.
(C3) 0 <= concat_dimension < rank(operands...).
(C4) 0 < split_count.
(C5) is_unique(replica_groups).
(C6) size(replica_groups) is defined as:
- num_replicas if cross_replica is used.
- num_partitions if cross_partition is used.
(C7) 0 <= replica_groups < size(replica_groups).
(C8) dim(replica_groups, 1) = split_count.
(C9) type(results...) = type(operands...) except, if split_dimension != concat_dimension:
- dim(results..., split_dimension) = dim(operands..., split_dimension) / split_count.
- dim(results..., concat_dimension) = dim(operands..., concat_dimension) * split_count.

Examples

// num_replicas: 2
// num_partitions: 1
// %operand1@(0, 0): [[1, 2, 3, 4],
//                    [5, 6, 7, 8]]
// %operand1@(1, 0): [[9, 10, 11, 12],
//                    [13, 14, 15, 16]]
// %operand2@(0, 0): [[17, 18, 19, 20],
//                    [21, 22, 23, 24]]
// %operand2@(1, 0): [[25, 26, 27, 28],
//                    [29, 30, 31, 32]]
%result:2 = "stablehlo.all_to_all"(%operand1, %operand2) {
  split_dimension = 1 : i64,
  concat_dimension = 0 : i64,
  split_count = 2 : i64,
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>
  // channel_id = 0
} : (tensor<2x4xi64>, tensor<2x4xi64>) -> (tensor<4x2xi64>, tensor<4x2xi64>)
// %result#0@(0, 0): [[1, 2], [5, 6], [9, 10], [13, 14]]
// %result#0@(1, 0): [[3, 4], [7, 8], [11, 12], [15, 16]]
// %result#1@(0, 0): [[17, 18], [21, 22], [25, 26], [29, 30]]
// %result#1@(1, 0): [[19, 20], [23, 24], [27, 28], [31, 32]]
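The split, scatter and concatenate steps can be simulated for one process group in plain Python. As with the other sketches, this assumes rank-2 tensors stored as nested lists and processes modeled as dictionary keys; split, concatenate and all_to_all are hypothetical helper names mirroring the semantics above.

```python
def split(t, count, dim):
    """Split a rank-2 nested list into count equal parts along dim."""
    if dim == 0:
        step = len(t) // count
        return [t[i * step:(i + 1) * step] for i in range(count)]
    step = len(t[0]) // count
    return [[row[i * step:(i + 1) * step] for row in t] for i in range(count)]

def concatenate(tensors, dim):
    """Concatenate rank-2 nested lists along dimension 0 or 1."""
    if dim == 0:
        return [list(row) for t in tensors for row in t]
    return [sum((t[r] for t in tensors), []) for r in range(len(tensors[0]))]

def all_to_all(operand_by_process, process_group, split_dimension,
               concat_dimension, split_count):
    parts = {s: split(operand_by_process[s], split_count, split_dimension)
             for s in process_group}
    results = {}
    for receiver_index, receiver in enumerate(process_group):
        # Each receiver takes the part at its own index from every sender.
        scattered = [parts[s][receiver_index] for s in process_group]
        results[receiver] = concatenate(scattered, concat_dimension)
    return results

# %operand1 from the example above, keyed by replica id.
operand1 = {0: [[1, 2, 3, 4], [5, 6, 7, 8]],
            1: [[9, 10, 11, 12], [13, 14, 15, 16]]}
result = all_to_all(operand1, [0, 1], split_dimension=1,
                    concat_dimension=0, split_count=2)
print(result[0])  # [[1, 2], [5, 6], [9, 10], [13, 14]]
print(result[1])  # [[3, 4], [7, 8], [11, 12], [15, 16]]
```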

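More Examples

and

Semantics

Performs element-wise AND of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical AND.
For integers: bitwise AND.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of boolean or integer type	(C1)
(I2)	`rhs`	tensor of boolean or integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of boolean or integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples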

// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.and"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[1, 2], [3, 0]]
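The result above can be reproduced with Python's bitwise & operator, which gives a quick way to see why the last element is 0 (4 is 0b0100 and 8 is 0b1000, so they share no bits). Nested lists stand in for the tensors here.

```python
lhs = [[1, 2], [3, 4]]
rhs = [[5, 6], [7, 8]]

# Element-wise bitwise AND over two rank-2 nested lists.
result = [[a & b for a, b in zip(lhs_row, rhs_row)]
          for lhs_row, rhs_row in zip(lhs, rhs)]
print(result)  # [[1, 2], [3, 0]]
```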

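More Examples

atan2

Semantics

Performs element-wise atan2 operation on lhs and rhs tensors and produces a result tensor. Depending on the element type, does the following:

For floats: atan2 from IEEE-754.
For complex numbers: complex atan2.
For quantized types: dequantize_op_quantize(atan2, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples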

// %lhs: [0.0, 1.0, -1.0]
// %rhs: [0.0, 0.0, 0.0]
%result = "stablehlo.atan2"(%lhs, %rhs) : (tensor<3xf64>, tensor<3xf64>) -> tensor<3xf64>
// %result: [0.0, 1.57079637, -1.57079637] // [0.0, pi/2, -pi/2]
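For the floating-point case, the same values can be checked with math.atan2 from the Python standard library, which takes its arguments in the same (y, x) order as lhs and rhs in the example. The printed digits in the example above are rounded, so the comparison below is against pi/2 rather than the literal text.

```python
import math

lhs = [0.0, 1.0, -1.0]
rhs = [0.0, 0.0, 0.0]

# Element-wise atan2(lhs, rhs) over two rank-1 lists.
result = [math.atan2(y, x) for y, x in zip(lhs, rhs)]
print(result[0] == 0.0)                    # True
print(math.isclose(result[1], math.pi / 2))   # True
print(math.isclose(result[2], -math.pi / 2))  # True
```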

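More Examples

batch_norm_grad

Semantics

Computes gradients of several inputs of batch_norm_training backpropagating from grad_output, and produces grad_operand, grad_scale and grad_offset tensors. More formally, this operation can be expressed as a decomposition to existing StableHLO operations using Python syntax as follows: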

def compute_sum(operand, feature_index):
  (sum,) = reduce(
      inputs=[operand],
      init_values=[constant(0, element_type(operand))],
      dimensions=[i for i in range(rank(operand)) if i != feature_index],
      body=lambda x, y: add(x, y))
  return sum

def compute_mean(operand, feature_index):
  sum = compute_sum(operand, feature_index)
  divisor = constant(size(operand) / dim(operand, feature_index),
                     element_type(operand))
  divisor_bcast = broadcast_in_dim(divisor, [], type(sum))
  return divide(sum, divisor_bcast)

def batch_norm_grad(operand, scale, mean, variance, grad_output, epsilon, feature_index):
  # Broadcast inputs to type(operand)
  scale_bcast = broadcast_in_dim(scale, [feature_index], type(operand))
  mean_bcast = broadcast_in_dim(mean, [feature_index], type(operand))
  variance_bcast = broadcast_in_dim(variance, [feature_index], type(operand))
  epsilon_bcast = broadcast_in_dim(constant(epsilon, element_type(operand)), [],
                                   type(operand))

  # Perform normalization using the provided `mean` and `variance`
  # Intermediate values will be useful for computing gradients
  centered_operand = subtract(operand, mean_bcast)
  stddev = sqrt(add(variance_bcast, epsilon_bcast))
  normalized_operand = divide(centered_operand, stddev)

  # Use the implementation from batchnorm_expander.cc in XLA
  # Temporary variables have exactly the same names as in the C++ code
  elements_per_feature = broadcast_in_dim(
      constant(divide(size(operand), dim(operand, feature_index)),
               element_type(grad_output)),
      [], type(operand))
  i1 = multiply(grad_output, elements_per_feature)
  i2 = broadcast_in_dim(
      compute_sum(grad_output, feature_index), [feature_index], type(operand))
  i3 = broadcast_in_dim(
      compute_sum(multiply(grad_output, centered_operand), feature_index),
      [feature_index], type(operand))
  i4 = multiply(i3, centered_operand)
  i5 = divide(i4, add(variance_bcast, epsilon_bcast))
  i6 = subtract(subtract(i1, i2), i5)

  grad_operand = multiply(
      divide(divide(scale_bcast, stddev), elements_per_feature), i6)
  grad_scale = compute_sum(
      multiply(grad_output, normalized_operand), feature_index)
  grad_offset = compute_sum(grad_output, feature_index)

  return grad_operand, grad_scale, grad_offset

For quantized types, performs dequantize_batch_norm_grad_or_training_quantize(lambda operand, scale, mean, variance, grad_output: batch_norm_grad(operand, scale, mean, variance, grad_output, epsilon, feature_index), operand, scale, mean, variance, grad_output, type(grad_operand), type(grad_scale), type(grad_offset)).

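Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1-C3), (C5)
(I2)	`scale`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4), (C5)
(I3)	`mean`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
(I4)	`variance`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
(I5)	`grad_output`	tensor of floating-point type or per-tensor quantized tensor	(C2), (C3)
(I6)	`epsilon`	constant of type `f32`
(I7)	`feature_index`	constant of type `si64`	(C1), (C5)

Outputs

Name	Type	Constraints
`grad_operand`	tensor of floating-point type or per-tensor quantized tensor	(C2), (C3)
`grad_scale`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
`grad_offset`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)

Constraints

(C1) 0 <= feature_index < rank(operand).
(C2) operand, scale, mean, variance, grad_output, grad_operand, grad_scale and grad_offset have the same baseline_element_type.
(C3) operand, grad_output and grad_operand have the same shape.
(C4) scale, mean, variance, grad_scale and grad_offset have the same shape.
(C5) size(scale) = dim(operand, feature_index).

Examples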

// %operand: [
//            [[1.0, 2.0], [3.0, 4.0]],
//            [[3.0, 4.0], [1.0, 2.0]]
//           ]
// %scale: [1.0, 1.0]
// %mean: [2.0, 3.0]
// %variance: [1.0, 1.0]
// %grad_output: [
//                [[0.1, 0.1], [0.1, 0.1]],
//                [[0.1, 0.1], [0.1, 0.1]]
//               ]
%grad_operand, %grad_scale, %grad_offset =
"stablehlo.batch_norm_grad"(%operand, %scale, %mean, %variance, %grad_output) {
  epsilon = 0.0 : f32,
  feature_index = 2 : i64
} : (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>, tensor<2xf64>,
     tensor<2x2x2xf64>) -> (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>)
// %grad_operand: [
//                 [[0.0, 0.0], [0.0, 0.0]],
//                 [[0.0, 0.0], [0.0, 0.0]]
//                ]
// %grad_scale:  [0.0, 0.0]
// %grad_offset: [0.4, 0.4]
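The example values can be cross-checked with a small scalar re-implementation of the decomposition. This is a sketch under a simplifying assumption: feature_index is the last dimension, so the operand is flattened into rows that each hold one value per feature. The function batch_norm_grad_rows is hypothetical; the variable names follow the decomposition above where possible.

```python
import math

def batch_norm_grad_rows(rows, scale, mean, variance, grad_rows, epsilon):
    """Return (grad_operand_rows, grad_scale, grad_offset) for flattened rows."""
    n = len(rows)  # elements_per_feature
    features = range(len(scale))
    stddev = [math.sqrt(variance[f] + epsilon) for f in features]
    # i2 (and grad_offset): per-feature sum of grad_output.
    sum_grad = [sum(g[f] for g in grad_rows) for f in features]
    # i3: per-feature sum of grad_output * centered_operand.
    sum_grad_centered = [
        sum(g[f] * (x[f] - mean[f]) for x, g in zip(rows, grad_rows))
        for f in features
    ]
    grad_operand = []
    for x, g in zip(rows, grad_rows):
        out = []
        for f in features:
            centered = x[f] - mean[f]
            i6 = (g[f] * n - sum_grad[f]
                  - sum_grad_centered[f] * centered / (variance[f] + epsilon))
            out.append(scale[f] / stddev[f] / n * i6)
        grad_operand.append(out)
    # Sum of grad_output * normalized_operand, with stddev factored out.
    grad_scale = [sum_grad_centered[f] / stddev[f] for f in features]
    return grad_operand, grad_scale, sum_grad

# The 2x2x2 operand from the example, flattened over the first two dimensions.
rows = [[1.0, 2.0], [3.0, 4.0], [3.0, 4.0], [1.0, 2.0]]
grad_rows = [[0.1, 0.1]] * 4
grad_operand, grad_scale, grad_offset = batch_norm_grad_rows(
    rows, [1.0, 1.0], [2.0, 3.0], [1.0, 1.0], grad_rows, epsilon=0.0)
print(grad_scale)   # close to [0.0, 0.0]
print(grad_offset)  # close to [0.4, 0.4]
```

Because grad_output is constant in this example, the per-element term cancels against the per-feature sum and grad_operand is (up to rounding) zero everywhere.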

batch_norm_inference

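Semantics

Normalizes the operand tensor across all dimensions except for the feature_index dimension and produces a result tensor. More formally, this operation can be expressed as a decomposition to existing StableHLO operations using Python syntax as follows: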

def batch_norm_inference(operand, scale, offset, mean, variance, epsilon, feature_index):
  # Broadcast inputs to shape(operand)
  scale_bcast = broadcast_in_dim(scale, [feature_index], type(operand))
  offset_bcast = broadcast_in_dim(offset, [feature_index], type(operand))
  mean_bcast = broadcast_in_dim(mean, [feature_index], type(operand))
  variance_bcast = broadcast_in_dim(variance, [feature_index], type(operand))
  epsilon_bcast = broadcast_in_dim(constant(epsilon, element_type(operand)), [],
                                   type(operand))

  # Perform normalization using the provided `mean` and `variance` instead of
  # computing them like `batch_norm_training` does.
  centered_operand = subtract(operand, mean_bcast)
  stddev = sqrt(add(variance_bcast, epsilon_bcast))
  normalized_operand = divide(centered_operand, stddev)
  return add(multiply(scale_bcast, normalized_operand), offset_bcast)

For quantized types, performs dequantize_op_quantize(lambda operand, scale, offset, mean, variance: batch_norm_inference(operand, scale, offset, mean, variance, epsilon, feature_index), operand, scale, offset, mean, variance, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1-C7)
(I2)	`scale`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C3)
(I3)	`offset`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
(I4)	`mean`	1-dimensional tensor of floating-point or per-tensor quantized type	(C5)
(I5)	`variance`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C6)
(I6)	`epsilon`	constant of type `f32`
(I7)	`feature_index`	constant of type `si64`	(C1), (C3-C6)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C2), (C7)

Constraints

(C1) 0 <= feature_index < rank(operand).
(C2) operand, scale, offset, mean, variance and result have the same baseline_element_type.
(C3) size(scale) = dim(operand, feature_index).
(C4) size(offset) = dim(operand, feature_index).
(C5) size(mean) = dim(operand, feature_index).
(C6) size(variance) = dim(operand, feature_index).
(C7) baseline_type(operand) = baseline_type(result).

Examples

// %operand: [
//            [[1.0, 2.0], [3.0, 4.0]],
//            [[3.0, 4.0], [1.0, 2.0]]
//           ]
// %scale: [1.0, 1.0]
// %offset: [1.0, 1.0]
// %mean: [2.0, 3.0]
// %variance: [1.0, 1.0]
%result = "stablehlo.batch_norm_inference"(%operand, %scale, %offset, %mean, %variance) {
  epsilon = 0.0 : f32,
  feature_index = 2 : i64
} : (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>, tensor<2xf64>, tensor<2xf64>) -> tensor<2x2x2xf64>
// %result: [
//           [[0.0, 0.0], [2.0, 2.0]],
//           [[2.0, 2.0], [0.0, 0.0]]
//          ]
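The example result can be reproduced with a few lines of scalar Python. The sketch assumes that feature_index is the last dimension, so the 2x2x2 operand is flattened into four rows of two feature values each; batch_norm_inference_rows is a hypothetical helper, not the decomposition itself.

```python
import math

def batch_norm_inference_rows(rows, scale, offset, mean, variance, epsilon):
    """Normalize rows of per-feature values using the provided statistics."""
    return [
        [scale[f] * (x - mean[f]) / math.sqrt(variance[f] + epsilon) + offset[f]
         for f, x in enumerate(row)]
        for row in rows
    ]

rows = [[1.0, 2.0], [3.0, 4.0], [3.0, 4.0], [1.0, 2.0]]
result = batch_norm_inference_rows(
    rows, scale=[1.0, 1.0], offset=[1.0, 1.0],
    mean=[2.0, 3.0], variance=[1.0, 1.0], epsilon=0.0)
print(result)  # [[0.0, 0.0], [2.0, 2.0], [2.0, 2.0], [0.0, 0.0]]
```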

batch_norm_training

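Semantics

Computes mean and variance across all dimensions except for the feature_index dimension and normalizes the operand tensor, producing output, batch_mean and batch_var tensors. More formally, this operation can be expressed as a decomposition to existing StableHLO operations using Python syntax as follows: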

def compute_mean(operand, feature_index):
  (sum,) = reduce(
      inputs=[operand],
      init_values=[constant(0, element_type(operand))],
      dimensions=[i for i in range(rank(operand)) if i != feature_index],
      body=lambda x, y: add(x, y))
  divisor = constant(size(operand) / dim(operand, feature_index),
                     element_type(operand))
  divisor_bcast = broadcast_in_dim(divisor, [], type(sum))
  return divide(sum, divisor_bcast)

def compute_variance(operand, feature_index):
  mean = compute_mean(operand, feature_index)
  mean_bcast = broadcast_in_dim(mean, [feature_index], type(operand))
  centered_operand = subtract(operand, mean_bcast)
  return compute_mean(multiply(centered_operand, centered_operand), feature_index)

def batch_norm_training(operand, scale, offset, epsilon, feature_index):
  mean = compute_mean(operand, feature_index)
  variance = compute_variance(operand, feature_index)
  # Parenthesized so the three results form one valid Python tuple.
  return (batch_norm_inference(operand, scale, offset, mean, variance, epsilon,
                               feature_index),
          mean, variance)

For quantized types, performs dequantize_batch_norm_grad_or_training_quantize(lambda operand, scale, offset: batch_norm_training(operand, scale, offset, epsilon, feature_index), operand, scale, offset, type(output), type(batch_mean), type(batch_var)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)
(I2)	`scale`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C3)
(I3)	`offset`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
(I4)	`epsilon`	constant of type `f32`	(C1), (C3-C6)
(I5)	`feature_index`	constant of type `si64`	(C1), (C3-C6)

Outputs

Name	Type	Constraints
`output`	tensor of floating-point type or per-tensor quantized tensor	(C7)
`batch_mean`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C5)
`batch_var`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C6)

Constraints

(C1) 0 <= feature_index < rank(operand).
(C2) operand, scale, offset, batch_mean, batch_var and output have the same baseline_element_type.
(C3) size(scale) = dim(operand, feature_index).
(C4) size(offset) = dim(operand, feature_index).
(C5) size(batch_mean) = dim(operand, feature_index).
(C6) size(batch_var) = dim(operand, feature_index).
(C7) baseline_type(output) = baseline_type(operand).

Examples

// %operand: [
//            [[1.0, 2.0], [3.0, 4.0]],
//            [[3.0, 4.0], [1.0, 2.0]]
//           ]
// %scale: [1.0, 1.0]
// %offset: [1.0, 1.0]
%output, %batch_mean, %batch_var = "stablehlo.batch_norm_training"(%operand, %scale, %offset) {
  epsilon = 0.0 : f32,
  feature_index = 2 : i64
} : (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>) ->
    (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>)
// %output: [
//           [[0.0, 0.0], [2.0, 2.0]],
//           [[2.0, 2.0], [0.0, 0.0]]
//          ]
// %batch_mean: [2.0, 3.0]
// %batch_var: [1.0, 1.0]
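
The numbers in this example can be checked with a small plain-Python sketch. It is not the reference implementation: it assumes a rank-3 operand stored as nested lists with the feature axis last (feature_index = 2), and the function name is only illustrative.

```python
import math

def batch_norm_training(operand, scale, offset, epsilon):
    """Sketch for a [N][M][F] nested list, normalizing over the last axis."""
    num_features = len(scale)
    # Gather every element that belongs to each feature slice.
    per_feature = [[row[f] for plane in operand for row in plane]
                   for f in range(num_features)]
    mean = [sum(xs) / len(xs) for xs in per_feature]
    var = [sum((x - m) ** 2 for x in xs) / len(xs)
           for xs, m in zip(per_feature, mean)]
    # Same arithmetic as batch_norm_inference applied with the batch statistics.
    output = [[[(row[f] - mean[f]) / math.sqrt(var[f] + epsilon) * scale[f]
                + offset[f]
                for f in range(num_features)]
               for row in plane]
              for plane in operand]
    return output, mean, var

operand = [[[1.0, 2.0], [3.0, 4.0]], [[3.0, 4.0], [1.0, 2.0]]]
output, batch_mean, batch_var = batch_norm_training(
    operand, [1.0, 1.0], [1.0, 1.0], 0.0)
print(output, batch_mean, batch_var)
```

With the inputs from the example, the sketch yields the same output, batch_mean and batch_var values as listed above.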

bitcast_convert

Semantics

Performs a bitcast operation on the operand tensor and produces a result tensor where the bits of the entire operand tensor are reinterpreted using the type of the result tensor.

More formally, given E = element_type(operand), E' = element_type(result), and R = rank(operand):

If num_bits(E') < num_bits(E), bits(result[i0, ..., iR-1, :]) = bits(operand[i0, ..., iR-1]).
If num_bits(E') > num_bits(E), bits(result[i0, ..., iR-2]) = bits(operand[i0, ..., iR-2, :]).
If num_bits(E') = num_bits(E), bits(result[i0, ..., iR-1]) = bits(operand[i0, ..., iR-1]).

bits returns the in-memory representation of a given value. Its behavior is implementation-defined because the exact representation of tensors is implementation-defined, and the exact representation of element types is implementation-defined as well.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C2)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1-C2)

Constraints

(C1) Given E = is_quantized(operand) ? storage_type(operand) : element_type(operand), E' = is_quantized(result) ? storage_type(result) : element_type(result), and R = rank(operand):
- If num_bits(E') = num_bits(E), shape(result) = shape(operand).
- If num_bits(E') < num_bits(E):
  - rank(result) = R + 1.
  - dim(result, i) = dim(operand, i) for all 0 <= i < R.
  - dim(result, R) * num_bits(E') = num_bits(E).
- If num_bits(E') > num_bits(E):
  - rank(result) = R - 1.
  - dim(result, i) = dim(operand, i) for all 0 <= i < R - 1.
  - dim(operand, R - 1) * num_bits(E) = num_bits(E').
(C2) If is_complex(operand) or is_complex(result), then is_complex(operand) and is_complex(result).

Examples

// %operand: 0x0123456789ABCDEF
%result = "stablehlo.bitcast_convert"(%operand) : (tensor<f64>) -> tensor<4xf16>
// %result: [0xCDEF, 0x89AB, 0x4567, 0x0123] // little-endian representation
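
The little-endian result above can be reproduced with Python's struct module. This is only a sketch of the bit reinterpretation on one assumed in-memory layout (little-endian); the specification leaves the actual representation implementation-defined, and the helper name is hypothetical.

```python
import struct

def bitcast_u64_to_u16x4(bits64):
    """Reinterpret one 64-bit pattern as four 16-bit patterns (little-endian)."""
    raw = struct.pack("<Q", bits64)        # 8 bytes, least significant first
    return list(struct.unpack("<4H", raw)) # read them back as 4 x 16-bit words

result_bits = bitcast_u64_to_u16x4(0x0123456789ABCDEF)
print([hex(b) for b in result_bits])
```

The four 16-bit patterns come out lowest word first, which is why 0xCDEF is the first element of the result.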

More Examples

broadcast_in_dim

Semantics

Expands the dimensions and/or rank of an input tensor by duplicating the data in the operand tensor and produces a result tensor. More formally, result[result_index] = operand[operand_index] where for all d in axes(operand):

operand_index[d] = 0 if dim(operand, d) = 1.
operand_index[d] = result_index[broadcast_dimensions[d]] otherwise.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C2), (C5-C6)
(I2)	`broadcast_dimensions`	1-dimensional tensor constant of type `si64`	(C2-C6)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1), (C3), (C5-C6)

Constraints

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand) except that quantization_dimension(operand), scales(operand), and zero_points(operand) may differ from quantization_dimension(result), scales(result), and zero_points(result) respectively, otherwise.
(C2) size(broadcast_dimensions) = rank(operand).
(C3) 0 <= broadcast_dimensions < rank(result).
(C4) is_unique(broadcast_dimensions).
(C5) For all d in axes(operand):
- dim(operand, d) = 1 or
- dim(operand, d) = dim(result, broadcast_dimensions[d]).
(C6) If is_per_axis_quantized(result):
- quantization_dimension(result) = broadcast_dimensions[quantization_dimension(operand)].
- If dim(operand, quantization_dimension(operand)) = 1, then scales(result)[i] = scales(operand)[0] and zero_points(result)[i] = zero_points(operand)[0] for i in range(dim(result, quantization_dimension(result))).

Examples

// %operand: [
//            [1, 2, 3]
//           ]
%result = "stablehlo.broadcast_in_dim"(%operand) {
  broadcast_dimensions = array<i64: 2, 1>
} : (tensor<1x3xi32>) -> tensor<2x3x2xi32>
// %result: [
//            [
//             [1, 1],
//             [2, 2],
//             [3, 3]
//            ],
//            [
//             [1, 1],
//             [2, 2],
//             [3, 3]
//            ]
//          ]
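
The index mapping from the semantics can be sketched in plain Python. This is an illustration rather than an implementation of the op: tensors are represented as dictionaries keyed by index tuples, which is an assumption made only to keep the example short.

```python
from itertools import product

def broadcast_in_dim(operand, operand_shape, broadcast_dimensions, result_shape):
    """Return {result_index: value}; operand is a dict keyed by index tuples."""
    result = {}
    for result_index in product(*(range(n) for n in result_shape)):
        # A size-1 operand axis always reads index 0; any other axis reads the
        # result index of the axis it is mapped to.
        operand_index = tuple(
            0 if operand_shape[d] == 1 else result_index[broadcast_dimensions[d]]
            for d in range(len(operand_shape)))
        result[result_index] = operand[operand_index]
    return result

operand = {(0, 0): 1, (0, 1): 2, (0, 2): 3}
result = broadcast_in_dim(operand, (1, 3), (2, 1), (2, 3, 2))
print(result[(0, 0, 0)], result[(1, 2, 1)])
```

For the example above, operand axis 1 is mapped to result axis 1, so every result element depends only on its middle index.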

More Examples

case

Semantics

Produces the output from executing exactly one function from branches depending on the value of index. More formally, result = selected_branch() where:

selected_branch = branches[index] if 0 <= index < size(branches).
selected_branch = branches[-1] otherwise.

Inputs

Label	Name	Type	Constraints
(I1)	`index`	0-dimensional tensor of type `si32`
(I2)	`branches`	variadic number of functions	(C1-C4)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors, quantized tensors or tokens	(C4)

Constraints

(C1) 0 < size(branches).
(C2) input_types(branches...) = [].
(C3) same(output_types(branches...)).
(C4) type(results...) = output_types(branches[0]).

Examples

// %index: -1
// %result_branch0: [0, 0]
// %result_branch1: [1, 1]
%result0, %result1 = "stablehlo.case"(%index) ({
  "stablehlo.return"(%result_branch0, %result_branch0) : (tensor<2xi64>, tensor<2xi64>) -> ()
}, {
  "stablehlo.return"(%result_branch1, %result_branch1) : (tensor<2xi64>, tensor<2xi64>) -> ()
}) : (tensor<i32>) -> (tensor<2xi64>, tensor<2xi64>)
// %result0: [1, 1]
// %result1: [1, 1]
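
The branch selection rule, including the fallback to the last branch for an out-of-range index, can be sketched as follows. Branches are modeled as zero-argument Python callables, which is an assumption for illustration only.

```python
def case(index, branches):
    """Run exactly one branch; out-of-range indices select the last branch."""
    if 0 <= index < len(branches):
        selected_branch = branches[index]
    else:
        selected_branch = branches[-1]
    return selected_branch()

branches = [
    lambda: ([0, 0], [0, 0]),  # stands in for branch 0
    lambda: ([1, 1], [1, 1]),  # stands in for branch 1
]
result0, result1 = case(-1, branches)
print(result0, result1)
```

With index = -1 the last branch runs, which matches the results shown in the example.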

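More Examples

cbrt

Semantics

Performs element-wise cubic root operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: rootn(x, 3) from IEEE-754.
For complex numbers: complex cubic root.
For quantized types: dequantize_op_quantize(cbrt, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples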

// %operand: [0.0, 1.0, 8.0, 27.0]
%result = "stablehlo.cbrt"(%operand) : (tensor<4xf64>) -> tensor<4xf64>
// %result: [0.0, 1.0, 2.0, 3.0]

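More Examples

ceil

Semantics

Performs element-wise ceil of the operand tensor and produces a result tensor. Implements the roundToIntegralTowardPositive operation from the IEEE-754 specification. For quantized types, performs dequantize_op_quantize(ceil, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples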

// %operand: [-0.8166, -0.2530, 0.2530, 0.8166, 2.0]
%result = "stablehlo.ceil"(%operand) : (tensor<5xf32>) -> tensor<5xf32>
// %result: [-0.0, -0.0, 1.0, 1.0, 2.0]

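More Examples

cholesky

Semantics

Computes the Cholesky decomposition of a batch of matrices.

More formally, for all i in index_space(result), result[i0, ..., iR-3, :, :] is a Cholesky decomposition of a[i0, ..., iR-3, :, :], in the form of either a lower-triangular (if lower is true) or upper-triangular (if lower is false) matrix. The output values in the opposite triangle, i.e. the strict upper triangle or strict lower triangle correspondingly, are implementation-defined.

If there exists i where the input matrix is not a Hermitian positive-definite matrix, then the behavior is undefined.

For quantized types, performs dequantize_op_quantize(lambda operand: cholesky(operand, lower), a, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`a`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1-C3)
(I2)	`lower`	0-dimensional tensor constant of type `i1`

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(a) = baseline_type(result).
(C2) 2 <= rank(a).
(C3) dim(a, -2) = dim(a, -1).

Examples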

// %a: [
//      [1.0, 2.0, 3.0],
//      [2.0, 20.0, 26.0],
//      [3.0, 26.0, 70.0]
//     ]
%result = "stablehlo.cholesky"(%a) {
  lower = true
} : (tensor<3x3xf32>) -> tensor<3x3xf32>
// %result: [
//           [1.0, 0.0, 0.0],
//           [2.0, 4.0, 0.0],
//           [3.0, 5.0, 6.0]
//          ]
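
For a single real matrix with lower = true, the decomposition in this example can be reproduced with the textbook Cholesky-Banachiewicz recurrence. The sketch below is one possible illustration, not the algorithm any backend is required to use; it fills the strict upper triangle with zeros, whereas the specification leaves those values implementation-defined.

```python
import math

def cholesky_lower(a):
    """Return lower-triangular l with l * l^T = a, for a real SPD matrix a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already computed parts of rows i and j.
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

a = [[1.0, 2.0, 3.0], [2.0, 20.0, 26.0], [3.0, 26.0, 70.0]]
result = cholesky_lower(a)
print(result)
```

The input must be symmetric positive-definite; otherwise math.sqrt may raise, mirroring the fact that the op's behavior is undefined for such inputs.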

clamp

Semantics

Clamps every element of the operand tensor between a minimum and maximum value and produces a result tensor. More formally, result[result_index] = minimum(maximum(operand[result_index], min_element), max_element), where min_element = rank(min) = 0 ? min[] : min[result_index], max_element = rank(max) = 0 ? max[] : max[result_index]. For quantized types, performs dequantize_op_quantize(clamp, min, operand, max, type(result)).

Imposing an ordering on complex numbers involves surprising semantics, so in the future we are planning to remove support for complex numbers for this operation (#560).

Inputs

Label	Name	Type	Constraints
(I1)	`min`	tensor or per-tensor quantized tensor	(C1), (C3)
(I2)	`operand`	tensor or per-tensor quantized tensor	(C1-C4)
(I3)	`max`	tensor or per-tensor quantized tensor	(C2), (C3)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C4)

Constraints

(C1) rank(min) = 0 or shape(min) = shape(operand).
(C2) rank(max) = 0 or shape(max) = shape(operand).
(C3) baseline_element_type(min) = baseline_element_type(operand) = baseline_element_type(max).
(C4) baseline_type(operand) = baseline_type(result).

Examples

// %min: [5, 10, 15]
// %operand: [3, 13, 23]
// %max: [10, 15, 20]
%result = "stablehlo.clamp"(%min, %operand, %max) : (tensor<3xi32>, tensor<3xi32>, tensor<3xi32>) -> tensor<3xi32>
// %result: [5, 13, 20]
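
The element-wise formula, including the rank-0 case for min and max, can be sketched for 1-dimensional inputs. Representing a rank-0 tensor as a plain Python number is an assumption made for this illustration.

```python
def clamp(min_vals, operand, max_vals):
    """Clamp a 1-D list; min_vals/max_vals may be scalars (rank 0) or lists."""
    def element(bound, i):
        # A scalar bound applies to every element; a list bound is per element.
        return bound[i] if isinstance(bound, list) else bound
    return [min(max(x, element(min_vals, i)), element(max_vals, i))
            for i, x in enumerate(operand)]

result = clamp([5, 10, 15], [3, 13, 23], [10, 15, 20])
scalar_result = clamp(0, [-1, 5, 11], 10)
print(result, scalar_result)
```

The first call mirrors the example above; the second shows scalar bounds applied to every element.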

More Examples

collective_broadcast

Semantics

Within each process group in the StableHLO process grid, sends the value of the operand tensor from the source process to the target processes and produces a result tensor.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0.
cross_partition(replica_groups) if channel_id > 0.

Afterwards, result@process is given by:

operand@process_groups[i, 0] if there exists an i such that the process is in process_groups[i].
broadcast_in_dim(constant(is_quantized(result) ? quantize(0, element_type(result)) : 0, element_type(result)), [], type(result)) otherwise.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C3)
(I2)	`replica_groups`	variadic number of 1-dimensional tensor constants of type `si64`	(C1), (C2)
(I3)	`channel_id`	constant of type `si64`

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C3)

Constraints

(C1) is_unique(replica_groups).
(C2) 0 <= replica_groups < N where N is defined as:
- num_replicas if cross_replica is used.
- num_partitions if cross_partition is used.
(C3) type(result) = type(operand).

Examples

// num_replicas: 4
// num_partitions: 1
// %operand@(0, 0): [[1, 2]]
// %operand@(1, 0): [[3, 4]]
// %operand@(2, 0): [[5, 6]]
// %operand@(3, 0): [[7, 8]]
%result = "stablehlo.collective_broadcast"(%operand) {
  replica_groups = dense<[[2, 1]]> : tensor<1x2xi64>,
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
} : (tensor<1x2xi64>) -> tensor<1x2xi64>
// %result@(0, 0): [[0, 0]]
// %result@(1, 0): [[5, 6]]
// %result@(2, 0): [[5, 6]]
// %result@(3, 0): [[0, 0]]
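
A simplified model of this example: with a single partition and cross_replica grouping, each process can be identified by its replica id. The sketch below assumes exactly that setup and represents per-process operands as a dictionary; it is an illustration of the result rule, not of any real communication runtime.

```python
def collective_broadcast(operands, replica_groups):
    """operands: {replica_id: 2-D list}. Returns {replica_id: 2-D list}."""
    results = {}
    for process, value in operands.items():
        group = next((g for g in replica_groups if process in g), None)
        if group is not None:
            # Every member of a group receives the first member's operand.
            results[process] = operands[group[0]]
        else:
            # Processes outside all groups receive zeros of the same shape.
            results[process] = [[0] * len(row) for row in value]
    return results

operands = {0: [[1, 2]], 1: [[3, 4]], 2: [[5, 6]], 3: [[7, 8]]}
results = collective_broadcast(operands, [[2, 1]])
print(results)
```

Replica 2 is the source because it is listed first in its group, so replicas 1 and 2 both end up with [[5, 6]].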

collective_permute

Semantics

Within each process group in the StableHLO process grid, sends the value of the operand tensor from the source process to the target process and produces a result tensor.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(source_target_pairs) if channel_id <= 0.
cross_partition(source_target_pairs) if channel_id > 0.

Afterwards, result@process is given by:

operand@process_groups[i, 0], if there exists an i such that process_groups[i, 1] = process.
broadcast_in_dim(constant(is_quantized(result) ? quantize(0, element_type(result)) : 0, element_type(result)), [], type(result)) otherwise.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C5)
(I2)	`source_target_pairs`	2-dimensional tensor constant of type `si64`	(C1-C4)
(I3)	`channel_id`	constant of type `si64`

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) dim(source_target_pairs, 1) = 2.
(C2) is_unique(source_target_pairs[:, 0]).
(C3) is_unique(source_target_pairs[:, 1]).
(C4) 0 <= source_target_pairs < N, where N is defined as:
- num_replicas if cross_replica is used.
- num_partitions if cross_partition is used.
(C5) type(result) = type(operand).

Examples

// num_replicas: 3
// num_partitions: 1
// %operand@(0, 0): [[1, 2], [3, 4]]
// %operand@(1, 0): [[5, 6], [7, 8]]
// %operand@(2, 0): [[9, 10], [11, 12]]
%result = "stablehlo.collective_permute"(%operand) {
  source_target_pairs = dense<[[0, 1], [1, 2]]> : tensor<2x2xi64>,
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
} : (tensor<2x2xi64>) -> tensor<2x2xi64>
//
// %result@(0, 0): [[0, 0], [0, 0]]
// %result@(1, 0): [[1, 2], [3, 4]]
// %result@(2, 0): [[5, 6], [7, 8]]
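
The same simplified single-partition model (one process per replica id, an assumption for this sketch) illustrates how source_target_pairs route operands and why a process that is nobody's target receives zeros.

```python
def collective_permute(operands, source_target_pairs):
    """operands: {replica_id: 2-D list}. Returns {replica_id: 2-D list}."""
    results = {}
    for process, value in operands.items():
        sources = [src for src, dst in source_target_pairs if dst == process]
        if sources:
            # Targets are unique (C3), so at most one source exists.
            results[process] = operands[sources[0]]
        else:
            results[process] = [[0] * len(row) for row in value]
    return results

operands = {
    0: [[1, 2], [3, 4]],
    1: [[5, 6], [7, 8]],
    2: [[9, 10], [11, 12]],
}
results = collective_permute(operands, [[0, 1], [1, 2]])
print(results)
```

Replica 0 is not a target of any pair, so its result is all zeros, as in the example.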

More Examples

compare

Semantics

Performs element-wise comparison of lhs and rhs tensors according to comparison_direction and compare_type, and produces a result tensor.

The values of comparison_direction and compare_type have the following semantics:

For boolean and integer element types:

EQ: lhs = rhs.
NE: lhs != rhs.
GE: lhs >= rhs.
GT: lhs > rhs.
LE: lhs <= rhs.
LT: lhs < rhs.

For floating-point element types with compare_type = FLOAT, the op implements the following IEEE-754 operations:

EQ: compareQuietEqual.
NE: compareQuietNotEqual.
GE: compareQuietGreaterEqual.
GT: compareQuietGreater.
LE: compareQuietLessEqual.
LT: compareQuietLess.

For floating-point element types with compare_type = TOTALORDER, the op uses the combination of totalOrder and compareQuietEqual operations from IEEE-754.

For complex element types, lexicographic comparison of (real, imag) pairs is performed using the provided comparison_direction and compare_type. Imposing an ordering on complex numbers involves surprising semantics, so in the future we are planning to remove support for complex numbers when comparison_direction is GE, GT, LE or LT (#560).

For quantized types, performs dequantize_compare(lhs, rhs, comparison_direction).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1-C3)
(I2)	`rhs`	tensor or per-tensor quantized tensor	(C1-C2)
(I3)	`comparison_direction`	enum of `EQ`, `NE`, `GE`, `GT`, `LE`, and `LT`
(I4)	`compare_type`	enum of `FLOAT`, `TOTALORDER`, `SIGNED`, and `UNSIGNED`	(C3)

Outputs

Name	Type	Constraints
`result`	tensor of boolean type	(C2)

Constraints

(C1) baseline_element_type(lhs) = baseline_element_type(rhs).
(C2) shape(lhs) = shape(rhs) = shape(result).
(C3) compare_type is defined as:
- SIGNED if is_signed_integer(element_type(lhs)).
- UNSIGNED if is_unsigned_integer(element_type(lhs)) or is_boolean(element_type(lhs)).
- FLOAT or TOTALORDER if is_float(element_type(lhs)).
- FLOAT if is_complex(element_type(lhs)).

Examples

// %lhs: [1.0, 3.0]
// %rhs: [1.1, 2.9]
%result = "stablehlo.compare"(%lhs, %rhs) {
  comparison_direction = #stablehlo<comparison_direction LT>,
  compare_type = #stablehlo<comparison_type FLOAT>
} : (tensor<2xf32>, tensor<2xf32>) -> tensor<2xi1>
// %result: [true, false]

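More Examples

complex

Semantics

Performs element-wise conversion to a complex value from a pair of real and imaginary values, lhs and rhs, and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of type `f32` or `f64`	(C1-C3)
(I2)	`rhs`	tensor of type `f32` or `f64`	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of complex type	(C2), (C3)

Constraints

(C1) type(lhs) = type(rhs).
(C2) shape(result) = shape(lhs).
(C3) element_type(result) has type complex<E> where E = element_type(lhs).

Examples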

// %lhs: [1.0, 3.0]
// %rhs: [2.0, 4.0]
%result = "stablehlo.complex"(%lhs, %rhs) : (tensor<2xf64>, tensor<2xf64>) -> tensor<2xcomplex<f64>>
// %result: [(1.0, 2.0), (3.0, 4.0)]

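More Examples

composite

Semantics

Encapsulates an operation made up (composed) of other StableHLO operations, taking inputs and composite_attributes and producing results. The semantics of the op are implemented by the decomposition attribute. The composite op can be replaced with its decomposition without changing program semantics. In cases where inlining the decomposition does not provide the same op semantics, prefer using custom_call.

The version field (defaults to 0) is used to denote when a composite's semantics change.

Inputs

Label	Name	Type
(I1)	`inputs`	variadic number of values
(I2)	`name`	constant of type `string`
(I3)	`composite_attributes`	attribute dictionary
(I4)	`decomposition`	constant of type `string`
(I5)	`version`	constant of type `si32`

Outputs

Name	Type
`results`	variadic number of values

Constraints

(C1) is_namespaced_op_name(name)
(C2) is_defined_in_parent_scope(decomposition)
(C3) types(inputs...) == input_types(decomposition)
(C4) types(results...) == output_types(decomposition)

Examples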

%results = "stablehlo.composite"(%input0, %input1) {
  name = "my_namespace.my_op",
  composite_attributes = {
    my_attribute = "my_value"
  },
  decomposition = @my_op,
  version = 1 : i32
} : (tensor<f32>, tensor<f32>) -> tensor<f32>

More Examples

concatenate

Semantics

Concatenates inputs along the dimension dimension in the same order as the given arguments and produces a result tensor. More formally, result[i0, ..., id, ..., iR-1] = inputs[k][i0, ..., kd, ..., iR-1], where:

id = d0 + ... + dk-1 + kd.
d is equal to dimension, and d0, ... are the d-th dimension sizes of inputs.

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1-C6)
(I2)	`dimension`	constant of type `si64`	(C2), (C4), (C6)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C5-C6)

Constraints

(C1) same(element_type(inputs...)).
(C2) same(shape(inputs...)) except for dim(inputs..., dimension).
(C3) 0 < size(inputs).
(C4) 0 <= dimension < rank(inputs[0]).
(C5) element_type(result) = element_type(inputs[0]).
(C6) shape(result) = shape(inputs[0]) except for:
- dim(result, dimension) = dim(inputs[0], dimension) + ....

Examples

// %input0: [[1, 2], [3, 4], [5, 6]]
// %input1: [[7, 8]]
%result = "stablehlo.concatenate"(%input0, %input1) {
  dimension = 0 : i64
} : (tensor<3x2xi64>, tensor<1x2xi64>) -> tensor<4x2xi64>
// %result: [[1, 2], [3, 4], [5, 6], [7, 8]]
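
A small recursive sketch over nested Python lists shows how the concatenation axis is selected. It assumes the inputs already satisfy the shape constraints above and is meant as an illustration only.

```python
def concatenate(inputs, dimension):
    """Concatenate nested-list tensors along the given axis."""
    if dimension == 0:
        # Append the leading-axis slices of each input in argument order.
        return [item for tensor in inputs for item in tensor]
    # Otherwise descend one axis: all inputs share the leading axis size.
    return [concatenate([tensor[i] for tensor in inputs], dimension - 1)
            for i in range(len(inputs[0]))]

result = concatenate([[[1, 2], [3, 4], [5, 6]], [[7, 8]]], 0)
columns = concatenate([[[1, 2], [3, 4]], [[5], [6]]], 1)
print(result, columns)
```

The first call reproduces the example above; the second concatenates along dimension 1 instead.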

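More Examples

constant

Semantics

Produces an output tensor from a constant value.

Inputs

Label	Name	Type	Constraints
(I1)	`value`	constant	(C1)

Outputs

Name	Type	Constraints
`output`	tensor or quantized tensor	(C1)

Constraints

(C1) type(value) = type(output).

Examples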

%output = "stablehlo.constant"() {
  value = dense<[[0.0, 1.0], [2.0, 3.0]]> : tensor<2x2xf32>
} : () -> tensor<2x2xf32>
// %output: [[0.0, 1.0], [2.0, 3.0]]

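More Examples

convert

Semantics

Performs an element-wise conversion from one element type to another on the operand tensor and produces a result tensor.

For boolean-to-any-supported-type conversions, the value false is converted to zero, and the value true is converted to one. For any-supported-type-to-boolean conversions, a zero value is converted to false, and non-zero values are converted to true. See below for how this works for complex types.

For conversions involving integer-to-integer, integer-to-floating-point or floating-point-to-floating-point, if the source value can be exactly represented in the destination type, the result value is that exact representation. Otherwise, the behavior is TBD (#180).

For conversions involving floating-point-to-integer, the fractional part is truncated. If the truncated value cannot be represented in the destination type, the behavior is TBD (#180).

Conversions involving complex-to-complex follow the same behavior as floating-point-to-floating-point conversions for converting real and imaginary parts.

For complex-to-any-other-type and any-other-type-to-complex conversions, the source imaginary value is ignored or the destination imaginary value is zeroed, respectively. The conversion of the real part follows the floating-point conversions.

In principle, this operation could express dequantization (conversion from quantized tensors to regular tensors), quantization (conversion from regular tensors to quantized tensors) and requantization (conversion between quantized tensors), but at the moment we have dedicated operations for that: uniform_dequantize for the first use case and uniform_quantize for the second and the third use cases. In the future, these two ops may be merged into convert (#1576).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor	(C1)

Constraints

(C1) shape(operand) = shape(result).

Examples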

// %operand: [-1, 0, 1]
%result = "stablehlo.convert"(%operand) : (tensor<3xi64>) -> tensor<3xcomplex<f64>>
// %result: [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]

More Examples

convolution

Semantics

Computes dot products between windows of lhs and slices of rhs and produces result. The following diagram shows how elements in result are computed from lhs and rhs using a concrete example.

More formally, consider the following reframing of the inputs in terms of lhs in order to be able to express windows of lhs:

lhs_window_dimensions = lhs_shape(dim(lhs, input_batch_dimension), dim(rhs, kernel_spatial_dimensions), dim(lhs, input_feature_dimension)).
lhs_window_strides = lhs_shape(1, window_strides, 1).
lhs_padding = lhs_shape([0, 0], padding, [0, 0]).
lhs_base_dilations = lhs_shape(1, lhs_dilation, 1).
lhs_window_dilations = lhs_shape(1, rhs_dilation, 1).

This reframing uses the following helper functions:

lhs_shape(n, hw, c) = permute([n] + hw + [c], [input_batch_dimension] + input_spatial_dimensions + [input_feature_dimension]).
result_shape(n1, hw, c1) = permute([n1] + hw + [c1], [output_batch_dimension] + output_spatial_dimensions + [output_feature_dimension]).
permute([j0, j1, ..., jR-1], permutation) = [i0, i1, ..., iR-1] where j[d] = i[permutation[d]].

If feature_group_count = 1 and batch_group_count = 1, then for all output_spatial_index in index_space(dim(result, output_spatial_dimensions...)), result[result_shape(:, output_spatial_index, :)] = dot_product where:

padding_value = constant(0, element_type(lhs)).
padded_lhs = pad(lhs, padding_value, lhs_padding[:, 0], lhs_padding[:, 1], lhs_base_dilations - 1).
lhs_window_start = lhs_shape(0, output_spatial_index, 0) * lhs_window_strides.
lhs_window = slice(padded_lhs, lhs_window_start, lhs_window_start + lhs_window_dimensions, lhs_window_dilations).
reversed_lhs_window = reverse(lhs_window, [input_spatial_dimensions[dim] for dim in range(size(window_reversal)) if window_reversal[dim] = true]). This feature appears to be unused, so in the future we are planning to remove it (#1181).
dot_product = dot_general(reversed_lhs_window, rhs, lhs_batching_dimensions=[], lhs_contracting_dimensions=input_spatial_dimensions + [input_feature_dimension], rhs_batching_dimensions=[], rhs_contracting_dimensions=kernel_spatial_dimensions + [kernel_input_feature_dimension]).

If feature_group_count > 1:

lhses = split(lhs, feature_group_count, input_feature_dimension).
rhses = split(rhs, feature_group_count, kernel_output_feature_dimension).
results... = convolution(lhses..., rhses..., ..., feature_group_count=1, ...).
result = concatenate(results, output_feature_dimension).

If batch_group_count > 1:

lhses = split(lhs, batch_group_count, input_batch_dimension).
rhses = split(rhs, batch_group_count, kernel_output_feature_dimension).
results... = convolution(lhses..., rhses..., ..., batch_group_count=1, ...).
result = concatenate(results, output_feature_dimension).

For quantized types, performs dequantize_op_quantize(lambda lhs, rhs: convolution(lhs, rhs, window_strides, padding, lhs_dilation, rhs_dilation, window_reversal, input_batch_dimension, input_feature_dimension, input_spatial_dimensions, kernel_input_feature_dimension, kernel_output_feature_dimension, kernel_spatial_dimensions, output_batch_dimension, output_feature_dimension, output_spatial_dimensions, feature_group_count, batch_group_count, precision_config), lhs, rhs, type(result)).

For hybrid quantized types, performs hybrid_dequantize_then_op(lambda lhs, rhs: convolution(lhs, rhs, window_strides, padding, lhs_dilation, rhs_dilation, window_reversal, input_batch_dimension, input_feature_dimension, input_spatial_dimensions, kernel_input_feature_dimension, kernel_output_feature_dimension, kernel_spatial_dimensions, output_batch_dimension, output_feature_dimension, output_spatial_dimensions, feature_group_count, batch_group_count, precision_config), lhs, rhs).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1), (C10-C11), (C14), (C25), (C27-C28), (C31-C32), (C34)
(I2)	`rhs`	tensor or quantized tensor	(C1), (C14-C16), (C25), (C27-C29), (C31-C34)
(I3)	`window_strides`	1-dimensional tensor constant of type `si64`	(C2-C3), (C25)
(I4)	`padding`	2-dimensional tensor constant of type `si64`	(C4), (C25)
(I5)	`lhs_dilation`	1-dimensional tensor constant of type `si64`	(C5-C6), (C25)
(I6)	`rhs_dilation`	1-dimensional tensor constant of type `si64`	(C7-C8), (C25)
(I7)	`window_reversal`	1-dimensional tensor constant of type `i1`	(C9)
(I8)	`input_batch_dimension`	constant of type `si64`	(C10), (C13), (C25)
(I9)	`input_feature_dimension`	constant of type `si64`	(C11), (C13-C14)
(I10)	`input_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C12), (C13), (C25)
(I11)	`kernel_input_feature_dimension`	constant of type `si64`	(C14), (C18)
(I12)	`kernel_output_feature_dimension`	constant of type `si64`	(C15-C16), (C18), (C25), (C29)
(I13)	`kernel_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C17-C18), (C25)
(I14)	`output_batch_dimension`	constant of type `si64`	(C20), (C25)
(I15)	`output_feature_dimension`	constant of type `si64`	(C20), (C25), (C30)
(I16)	`output_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C19-C20), (C25)
(I17)	`feature_group_count`	constant of type `si64`	(C11), (C14), (C16), (C21), (C23)
(I18)	`batch_group_count`	constant of type `si64`	(C10), (C15), (C22), (C23), (C25)
(I19)	`precision_config`	variadic number of enums of `DEFAULT`, `HIGH`, and `HIGHEST`	(C24)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C25-C28), (C30), (C32-C34)

Constraints

(C1) N = rank(lhs) = rank(rhs).
(C2) size(window_strides) = N - 2.
(C3) 0 < window_strides.
(C4) shape(padding) = [N - 2, 2].
(C5) size(lhs_dilation) = N - 2.
(C6) 0 < lhs_dilation.
(C7) size(rhs_dilation) = N - 2.
(C8) 0 < rhs_dilation.
(C9) size(window_reversal) = N - 2.
(C10) dim(lhs, input_batch_dimension) % batch_group_count = 0.
(C11) dim(lhs, input_feature_dimension) % feature_group_count = 0.
(C12) size(input_spatial_dimensions) = N - 2.
(C13) Given input_dimensions = [input_batch_dimension] + input_spatial_dimensions + [input_feature_dimension]:
- is_unique(input_dimensions).
- 0 <= input_dimensions < N.
(C14) dim(rhs, kernel_input_feature_dimension) = dim(lhs, input_feature_dimension) / feature_group_count.
(C15) dim(rhs, kernel_output_feature_dimension) % batch_group_count = 0.
(C16) dim(rhs, kernel_output_feature_dimension) % feature_group_count = 0.
(C17) size(kernel_spatial_dimensions) = N - 2.
(C18) Given kernel_dimensions = kernel_spatial_dimensions + [kernel_input_feature_dimension] + [kernel_output_feature_dimension]:
- is_unique(kernel_dimensions).
- 0 <= kernel_dimensions < N.
(C19) size(output_spatial_dimensions) = N - 2.
(C20) Given output_dimensions = [output_batch_dimension] + output_spatial_dimensions + [output_feature_dimension]:
- is_unique(output_dimensions).
- 0 <= output_dimensions < N.
(C21) 0 < feature_group_count.
(C22) 0 < batch_group_count.
(C23) feature_group_count = 1 or batch_group_count = 1.
(C24) size(precision_config) = 2.
(C25) dim(result, result_dim) is defined as:
- dim(lhs, input_batch_dimension) / batch_group_count if result_dim = output_batch_dimension.
- dim(rhs, kernel_output_feature_dimension) if result_dim = output_feature_dimension.
- num_windows otherwise, where:
  - output_spatial_dimensions[spatial_dim] = result_dim.
  - lhs_dim = input_spatial_dimensions[spatial_dim].
  - rhs_dim = kernel_spatial_dimensions[spatial_dim].
  - dilated_input_shape[lhs_dim] = dim(lhs, lhs_dim) = 0 ? 0 : (dim(lhs, lhs_dim) - 1) * lhs_dilation[spatial_dim] + 1.
  - padded_input_shape[lhs_dim] = padding[spatial_dim, 0] + dilated_input_shape[lhs_dim] + padding[spatial_dim, 1].
  - dilated_window_shape[lhs_dim] = dim(rhs, rhs_dim) = 0 ? 0 : (dim(rhs, rhs_dim) - 1) * rhs_dilation[spatial_dim] + 1.
  - is_empty_window[lhs_dim] = padded_input_shape[lhs_dim] = 0 || dilated_window_shape[lhs_dim] > padded_input_shape[lhs_dim].
  - num_windows = is_empty_window[lhs_dim] ? 0 : floor((padded_input_shape[lhs_dim] - dilated_window_shape[lhs_dim]) / window_strides[spatial_dim]) + 1.
(C26) rank(result) = N.
If the operation uses non-quantized tensors:
- (C27) element_type(lhs) = element_type(rhs) = element_type(result).
If the operation uses quantized tensors:
- (C28) is_quantized(lhs) = is_quantized(result) and is_quantized(rhs).
- (C29) If is_per_axis_quantized(rhs), then quantization_dimension(rhs) = kernel_output_feature_dimension.
- (C30) If is_per_axis_quantized(result), then quantization_dimension(result) = output_feature_dimension.
- If is_quantized(lhs):
  - (C31) storage_type(lhs) = storage_type(rhs).
  - (C32) expressed_type(lhs) = expressed_type(rhs) = expressed_type(result).
  - (C33) If is_per_tensor_quantized(rhs), then is_per_tensor_quantized(result).
- If !is_quantized(lhs):
  - (C34) element_type(lhs) = expressed_type(rhs) = element_type(result).

Examples

// %lhs: [[
//        [
//          [1], [2], [5], [6]
//        ],
//        [
//          [3], [4], [7], [8]
//        ],
//        [
//          [10], [11], [14], [15]
//        ],
//        [
//          [12], [13], [16], [17]
//        ]
//      ]]
//
// %rhs: [
//        [[[1]], [[1]], [[1]]],
//        [[[1]], [[1]], [[1]]],
//        [[[1]], [[1]], [[1]]]
//       ]
%result = "stablehlo.convolution"(%lhs, %rhs) {
  window_strides = array<i64: 4, 4>,
  padding = dense<0> : tensor<2x2xi64>,
  lhs_dilation = array<i64: 2, 2>,
  rhs_dilation = array<i64: 1, 1>,
  window_reversal = array<i1: false, false>,
  // In the StableHLO dialect, dimension numbers are encoded via:
  // `[<input dimensions>]x[<kernel dimensions>]->[output dimensions]`.
  // "b" is batch dimension, "f" is feature dimension,
  // "i" is input feature dimension, "o" is output feature dimension,
  // "0/1/etc" are spatial dimensions.
  dimension_numbers = #stablehlo.conv<[b, 0, 1, f]x[0, 1, i, o]->[b, 0, 1, f]>,
  batch_group_count = 1 : i64,
  feature_group_count = 1 : i64,
  precision_config = [#stablehlo<precision DEFAULT>, #stablehlo<precision DEFAULT>]
} : (tensor<1x4x4x1xi64>, tensor<3x3x1x1xi64>) -> tensor<1x2x2x1xi64>
// %result: [[
//            [[10], [26]],
//            [[46], [62]]
//          ]]
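
The spatial output size in this example follows from the num_windows formula in (C25). The sketch below evaluates that formula for a single spatial dimension; the function and parameter names are chosen for this illustration and are not part of the specification.

```python
def num_windows(input_size, kernel_size, stride, pad_low, pad_high,
                lhs_dilation, rhs_dilation):
    """Output size along one spatial dimension, following (C25)."""
    dilated_input = 0 if input_size == 0 else (input_size - 1) * lhs_dilation + 1
    padded_input = pad_low + dilated_input + pad_high
    dilated_window = 0 if kernel_size == 0 else (kernel_size - 1) * rhs_dilation + 1
    is_empty_window = padded_input == 0 or dilated_window > padded_input
    if is_empty_window:
        return 0
    return (padded_input - dilated_window) // stride + 1

# Values from the example: input 4, kernel 3, stride 4, no padding,
# lhs_dilation 2, rhs_dilation 1.
spatial_size = num_windows(4, 3, 4, 0, 0, 2, 1)
print(spatial_size)
```

Here the dilated input has size 7, the window has size 3, and floor((7 - 3) / 4) + 1 = 2, which is why the result has shape 1x2x2x1.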

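More Examples

cosine

Semantics

Performs element-wise cosine operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: cos from IEEE-754.
For complex numbers: complex cosine.
For quantized types: dequantize_op_quantize(cosine, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples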

// %operand: [
//            [0.0, 1.57079632],       // [0, pi/2]
//            [3.14159265, 4.71238898] // [pi, 3pi/2]
//           ]
%result = "stablehlo.cosine"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[1.0, 0.0], [-1.0, 0.0]]

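More Examples

count_leading_zeros

Semantics

Performs element-wise count of the number of leading zero bits in the operand tensor and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(operand) = type(result).

Examples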

// %operand: [[0, 1], [128, -1]]
%result = "stablehlo.count_leading_zeros"(%operand) : (tensor<2x2xi64>) -> tensor<2x2xi64>
// %result: [[64, 63], [56, 0]]
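
For a single element, the count can be sketched with Python's arbitrary-precision integers by first masking the value to the element bit width. Interpreting negative values through their two's-complement bit pattern is an assumption of this sketch, consistent with the -1 entry in the example.

```python
def count_leading_zeros(x, num_bits=64):
    """Number of leading zero bits of x viewed as a num_bits-wide pattern."""
    unsigned = x & ((1 << num_bits) - 1)  # two's-complement bit pattern
    return num_bits - unsigned.bit_length()

result = [[count_leading_zeros(x) for x in row] for row in [[0, 1], [128, -1]]]
print(result)
```

The value -1 has all 64 bits set, so it has no leading zeros, while 0 has 64.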

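More Examples

custom_call

Semantics

Encapsulates an implementation-defined operation call_target_name that takes inputs and called_computations and produces results. has_side_effect, backend_config and api_version may be used to provide additional implementation-defined metadata.

At the moment, this operation contains a fairly disorganized collection of metadata which reflects the organic evolution of its counterpart operation in the XLA compiler. In the future, we are planning to unify this metadata (#741).

Inputs

Label	Name	Type
(I1)	`inputs`	variadic number of values
(I2)	`call_target_name`	constant of type `string`
(I3)	`has_side_effect`	constant of type `i1`
(I4)	`backend_config`	constant of type `string` or attribute dictionary
(I5)	`api_version`	constant of type `si32`
(I6)	`called_computations`	variadic number of constants of type `string`

Outputs

Name	Type
`results`	variadic number of values

Examples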

%results = "stablehlo.custom_call"(%input0) {
  call_target_name = "foo",
  has_side_effect = false,
  backend_config = {bar = 42 : i32},
  api_version = 4 : i32,
  called_computations = [@foo]
} : (tensor<f64>) -> tensor<f64>

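divide

Semantics

Performs element-wise division of dividend lhs and divisor rhs tensors and produces a result tensor. Depending on the element type, does the following:

For integers: integer division which produces the algebraic quotient with any fractional part discarded.
For floats: division from IEEE-754.
For complex numbers: complex division.
For quantized types:
- dequantize_op_quantize(divide, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples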

// %lhs: [17.1, -17.1, 17.1, -17.1]
// %rhs: [3.0, 3.0, -3.0, -3.0]
%result = "stablehlo.divide"(%lhs, %rhs) : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
// %result: [5.66666651, -5.66666651, -5.66666651, 5.66666651]
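
For integer element types, discarding the fractional part means rounding toward zero, which differs from Python's floor division for negative quotients. The sketch below illustrates the difference for single elements; it assumes a nonzero divisor and ignores overflow, neither of which is addressed in this section.

```python
def integer_divide(lhs, rhs):
    """Algebraic quotient with the fractional part discarded (toward zero)."""
    quotient = abs(lhs) // abs(rhs)
    return quotient if (lhs < 0) == (rhs < 0) else -quotient

lhs = [17, -17, 17, -17]
rhs = [3, 3, -3, -3]
result = [integer_divide(x, y) for x, y in zip(lhs, rhs)]
print(result)        # truncating quotients
print(-17 // 3)      # Python floor division gives -6 instead of -5
```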

More Examples

dot_general

Semantics

Computes dot products between slices of lhs and slices of rhs and produces a result tensor.

More formally, result[result_index] = dot_product, where:

lhs_result_dimensions = [d for d in axes(lhs) and d not in lhs_batching_dimensions and d not in lhs_contracting_dimensions].
rhs_result_dimensions = [d for d in axes(rhs) and d not in rhs_batching_dimensions and d not in rhs_contracting_dimensions].
result_batching_index + result_lhs_index + result_rhs_index = result_index where size(result_batching_index) = size(lhs_batching_dimensions), size(result_lhs_index) = size(lhs_result_dimensions) and size(result_rhs_index) = size(rhs_result_dimensions).
transposed_lhs = transpose(lhs, lhs_batching_dimensions + lhs_result_dimensions + lhs_contracting_dimensions).
transposed_lhs_slice = slice(transposed_lhs, result_batching_index + result_lhs_index + [:, ..., :]).
reshaped_lhs_slice = reshape(transposed_lhs_slice, dims(lhs, lhs_contracting_dimensions)).
transposed_rhs = transpose(rhs, rhs_batching_dimensions + rhs_result_dimensions + rhs_contracting_dimensions).
transposed_rhs_slice = slice(transposed_rhs, result_batching_index + result_rhs_index + [:, ..., :]).
reshaped_rhs_slice = reshape(transposed_rhs_slice, dims(rhs, rhs_contracting_dimensions)).
dot_product = reduce(inputs=[multiply(reshaped_lhs_slice, reshaped_rhs_slice)], init_values=[constant(0, element_type(result))], dimensions=range(size(lhs_contracting_dimensions)), body=lambda x, y: add(x, y)).

For quantized types, performs dequantize_op_quantize(lambda lhs, rhs: dot_general(lhs, rhs, lhs_batching_dimensions, rhs_batching_dimensions, lhs_contracting_dimensions, rhs_contracting_dimensions, precision_config), lhs, rhs, type(result)).

For hybrid quantized types, performs hybrid_dequantize_then_op(lambda lhs, rhs: dot_general(lhs, rhs, lhs_batching_dimensions, rhs_batching_dimensions, lhs_contracting_dimensions, rhs_contracting_dimensions, precision_config), lhs, rhs).

precision_config controls the tradeoff between speed and accuracy for computations on accelerator backends. This can be one of the following (at the moment, the semantics of these enum values are underspecified, see #755):

DEFAULT: Fastest calculation, but least accurate approximation to the original number.
HIGH: Slower calculation, but more accurate approximation to the original number.
HIGHEST: Slowest calculation, but most accurate approximation to the original number.

A DotAlgorithm defines the main properties of the algorithm used to implement the dot operation, which also defines the precision. If the algorithm attribute fields are set, then precision_config must be DEFAULT. DotAlgorithms do not have a default value, as the default parameters are implementation-defined. As such, all dot algorithm fields may be set to None to specify an empty dot algorithm, which will instead use the precision_config value.

DotAlgorithm fields include:

lhs_precision_type and rhs_precision_type, the precisions that the LHS and RHS of the operation are rounded to. Precision types are independent from the storage types of the inputs and the output.
accumulation_type, the precision used for accumulation.
lhs_component_count, rhs_component_count, and num_primitive_operations apply when the algorithm decomposes the LHS and/or RHS into multiple components and performs multiple "primitive" dot operations on those values, usually to emulate a higher precision (e.g. Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations: bf16_6x, tf32_3x, etc.). For algorithms with no decomposition, these values should be set to 1.
allow_imprecise_accumulation, to specify if accumulation in lower precision is permitted for some steps (e.g. CUBLASLT_MATMUL_DESC_FAST_ACCUM).

Example DotAlgorithm attributes:

// Inputs are casted to tf32, and then accumulated in f32:
{lhs_precision_type = tf32,
 rhs_precision_type = tf32,
 accumulation_type = f32,
 lhs_component_count = 1,
 rhs_component_count = 1,
 num_primitive_operations = 1,
 allow_imprecise_accumulation = false}


// bf16_6x: each input is decomposed to 3 bf16 components, then 6 dot operations are done on those components, and the result is accumulated in f32.
{lhs_precision_type = bf16,
 rhs_precision_type = bf16,
 accumulation_type = f32,
 lhs_component_count = 3,
 rhs_component_count = 3,
 num_primitive_operations = 6,
 allow_imprecise_accumulation = false}


// Inputs are (casted to) f8e5m2, and we accumulate in f32, but for some steps we may accumulate in lower precision.
{lhs_precision_type = f8e5m2,
 rhs_precision_type = f8e5m2,
 accumulation_type = f32,
 lhs_component_count = 1,
 rhs_component_count = 1,
 num_primitive_operations = 1,
 allow_imprecise_accumulation = true}

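It is up to the implementations to decide which combinations are supported. In general, it is not guaranteed that each algorithm is supported on each accelerator type by the consumer of the StableHLO program. If a given algorithm is not supported, an error should be raised as opposed to falling back to an alternative. StableHLO verification will provide best-effort verification, preventing algorithms that are not known to be supported on any hardware.

See xla_data.proto > Algorithm for some supported algorithm values. Ticket #2483 captures the plan to create a centralized document on supported algorithms by backend.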

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C5-C6), (C9-C10), (C12-C14), (C17-C18), (C20)
(I2)	`rhs`	tensor or quantized tensor	(C7-C10), (C12-C20)
(I3)	`lhs_batching_dimensions`	1-dimensional tensor constant of type `si64`	(C1), (C3), (C5), (C9), (C12)
(I4)	`rhs_batching_dimensions`	1-dimensional tensor constant of type `si64`	(C1), (C4), (C7), (C9)
(I5)	`lhs_contracting_dimensions`	1-dimensional tensor constant of type `si64`	(C2), (C3), (C6), (C10)
(I6)	`rhs_contracting_dimensions`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C8), (C10), (C16)
(I7)	`precision_config`	variadic number of enums of `DEFAULT`, `HIGH`, and `HIGHEST`	(C11), (C21)
(I8)	`lhs_precision_type`	FloatType or TensorFloat32	(C21)
(I9)	`rhs_precision_type`	FloatType or TensorFloat32	(C21)
(I10)	`accumulation_type`	FloatType or TensorFloat32	(C21)
(I11)	`lhs_component_count`	constant of type `si32`	(C21), (C22)
(I12)	`rhs_component_count`	constant of type `si32`	(C21), (C23)
(I13)	`num_primitive_operations`	constant of type `si32`	(C21), (C24)
(I14)	`allow_imprecise_accumulation`	constant of type `bool`	(C21)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C12), (C14), (C18-C20)

Constraints

(C1) size(lhs_batching_dimensions) = size(rhs_batching_dimensions).
(C2) size(lhs_contracting_dimensions) = size(rhs_contracting_dimensions).
(C3) is_unique(lhs_batching_dimensions + lhs_contracting_dimensions).
(C4) is_unique(rhs_batching_dimensions + rhs_contracting_dimensions).
(C5) 0 <= lhs_batching_dimensions < rank(lhs).
(C6) 0 <= lhs_contracting_dimensions < rank(lhs).
(C7) 0 <= rhs_batching_dimensions < rank(rhs).
(C8) 0 <= rhs_contracting_dimensions < rank(rhs).
(C9) dim(lhs, lhs_batching_dimensions...) = dim(rhs, rhs_batching_dimensions...).
(C10) dim(lhs, lhs_contracting_dimensions...) = dim(rhs, rhs_contracting_dimensions...).
(C11) size(precision_config) = 2.
(C12) shape(result) = dim(lhs, lhs_batching_dimensions) + dim(lhs, lhs_result_dimensions) + dim(rhs, rhs_result_dimensions).
If the operation uses non-quantized tensors:
- (C13) element_type(lhs) = element_type(rhs).
If the operation uses quantized tensors:
- (C14) is_quantized(lhs) = is_quantized(result) and is_quantized(rhs).
- (C15) zero_points(rhs) = 0.
- (C16) If is_per_axis_quantized(rhs), then quantization_dimension(rhs) not in rhs_contracting_dimensions.
- If is_quantized(lhs):
  - (C17) storage_type(lhs) = storage_type(rhs).
  - (C18) expressed_type(lhs) = expressed_type(rhs) = expressed_type(result).
  - (C19) If is_per_tensor_quantized(rhs), then is_per_tensor_quantized(result).
- If !is_quantized(lhs):
  - (C20) element_type(lhs) = expressed_type(rhs) = element_type(result).
If !is_empty_algorithm(lhs_precision_type, rhs_precision_type, accumulation_type, lhs_component_count, rhs_component_count, num_primitive_operations, allow_imprecise_accumulation):
- (C21) precision_config... = DEFAULT.
- (C22) 0 < lhs_component_count.
- (C23) 0 < rhs_component_count.
- (C24) 0 < num_primitive_operations.
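
The shape rule in (C12) can be illustrated with a small Python sketch. It is not part of the specification: the function name dot_general_result_shape is hypothetical, and it assumes that lhs_result_dimensions and rhs_result_dimensions are the axes of each operand that are neither batching nor contracting dimensions, kept in increasing order.

```python
def dot_general_result_shape(lhs_shape, rhs_shape,
                             lhs_batching_dimensions, rhs_batching_dimensions,
                             lhs_contracting_dimensions, rhs_contracting_dimensions):
    """Hypothetical helper: shape(result) per (C12), after checking (C1)-(C10)."""
    lhs_used = list(lhs_batching_dimensions) + list(lhs_contracting_dimensions)
    rhs_used = list(rhs_batching_dimensions) + list(rhs_contracting_dimensions)
    assert len(lhs_batching_dimensions) == len(rhs_batching_dimensions)        # (C1)
    assert len(lhs_contracting_dimensions) == len(rhs_contracting_dimensions)  # (C2)
    assert len(set(lhs_used)) == len(lhs_used)                                 # (C3)
    assert len(set(rhs_used)) == len(rhs_used)                                 # (C4)
    assert all(0 <= d < len(lhs_shape) for d in lhs_used)                      # (C5), (C6)
    assert all(0 <= d < len(rhs_shape) for d in rhs_used)                      # (C7), (C8)
    for l, r in zip(lhs_used, rhs_used):                                       # (C9), (C10)
        assert lhs_shape[l] == rhs_shape[r]

    # Assumption: result dimensions are the remaining axes, in increasing order.
    lhs_result = [d for d in range(len(lhs_shape)) if d not in lhs_used]
    rhs_result = [d for d in range(len(rhs_shape)) if d not in rhs_used]
    return ([lhs_shape[d] for d in lhs_batching_dimensions]
            + [lhs_shape[d] for d in lhs_result]
            + [rhs_shape[d] for d in rhs_result])


# Batched matrix multiplication: batch axis 0, contract lhs axis 2 with rhs axis 1.
shape = dot_general_result_shape([4, 3, 5], [4, 5, 7], [0], [0], [2], [1])
print(shape)
```

With one batching axis and one contracting axis on each side, the result keeps the batch size, then the free lhs axis, then the free rhs axis.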

Examples

// %lhs: [
//        [[1, 2],
//         [3, 4]],
//        [[5, 6],
//         [7, 8]]
//       ]
// %rhs: [
//        [[1, 0],
//         [0, 1]],
//        [[1, 0],
//         [0, 1]]
//       ]
%result = "stablehlo.dot_general"(%lhs, %rhs) {
  dot_dimension_numbers = #stablehlo.dot<
    lhs_batching_dimensions = [0],
    rhs_batching_dimensions = [0],
    lhs_contracting_dimensions = [2],
    rhs_contracting_dimensions = [1]
  >,
  precision_config = [#stablehlo<precision DEFAULT>, #stablehlo<precision DEFAULT>],
  algorithm = #stablehlo.dot_algorithm<
    lhs_precision_type = tf32,
    rhs_precision_type = tf32,
    accumulation_type = f32,
    lhs_component_count = 1,
    rhs_component_count = 1,
    num_primitive_operations = 1,
    allow_imprecise_accumulation = false
  >
} : (tensor<2x2x2xi64>, tensor<2x2x2xi64>) -> tensor<2x2x2xi64>
// %result: [
//           [[1, 2],
//            [3, 4]],
//           [[5, 6],
//            [7, 8]]
//          ]

More Examples

dynamic_broadcast_in_dim

Semantics

This operation is functionally identical to the broadcast_in_dim op, but the result shape is specified dynamically via output_dimensions.

The operation also accepts the optional attributes known_expanding_dimensions and known_non_expanding_dimensions to express static knowledge about the expanding behavior of dimensions. If not specified, all dimensions are assumed to be possibly expanding.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C2), (C5-C6), (C9)
(I2)	`output_dimensions`	1-dimensional tensor of integer type	(C7)
(I3)	`broadcast_dimensions`	1-dimensional constant tensor of integer type	(C2-C6)
(I4)	`known_expanding_dimensions`	1-dimensional constant tensor of integer type	(C8-C9)
(I5)	`known_non_expanding_dimensions`	1-dimensional constant tensor of integer type	(C8-C9)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1), (C3), (C5-C7)

Constraints

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand) except that quantization_dimension(operand), scales(operand), and zero_points(operand) may differ from quantization_dimension(result), scales(result), and zero_points(result) respectively, otherwise.
(C2) size(broadcast_dimensions) = rank(operand).
(C3) 0 <= broadcast_dimensions < rank(result).
(C4) is_unique(broadcast_dimensions).
(C5) For all d in axes(operand):
- dim(operand, d) = 1 or
- dim(operand, d) = dim(result, broadcast_dimensions[d]).
(C6) If is_per_axis_quantized(result):
- quantization_dimension(result) = broadcast_dimensions[quantization_dimension(operand)].
- If dim(operand, quantization_dimension(operand)) = 1, then scales(result)[i] = scales(operand)[0] and zero_points(result)[i] = zero_points(operand)[0] for i in range(dim(result, quantization_dimension(result))).
(C7) size(output_dimensions) = rank(result).
(C8) is_unique(known_expanding_dimensions + known_non_expanding_dimensions).
(C9) 0 <= known_expanding_dimensions < rank(operand).
(C10) 0 <= known_non_expanding_dimensions < rank(operand).
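
As an illustration only, the following Python sketch applies constraints (C2) through (C5) and (C7) to nested lists. It assumes the element mapping of the broadcast_in_dim op, namely operand_index[d] = 0 when dim(operand, d) = 1 and result_index[broadcast_dimensions[d]] otherwise. The helpers get and build and the explicit operand_shape parameter are hypothetical conveniences, and quantization is ignored.

```python
def get(nested, index):
    """Read the element of a nested list at a multi-dimensional index."""
    for i in index:
        nested = nested[i]
    return nested


def build(shape, fn, prefix=()):
    """Build a nested list of the given shape, calling fn(index) per element."""
    if len(prefix) == len(shape):
        return fn(prefix)
    return [build(shape, fn, prefix + (i,)) for i in range(shape[len(prefix)])]


def dynamic_broadcast_in_dim(operand, operand_shape, output_dimensions,
                             broadcast_dimensions):
    assert len(broadcast_dimensions) == len(operand_shape)               # (C2)
    assert len(set(broadcast_dimensions)) == len(broadcast_dimensions)   # (C4)
    for d, size in enumerate(operand_shape):
        assert 0 <= broadcast_dimensions[d] < len(output_dimensions)     # (C3), (C7)
        assert size == 1 or size == output_dimensions[broadcast_dimensions[d]]  # (C5)

    def element(result_index):
        # Size-1 operand axes are expanded; other axes are copied through.
        operand_index = [
            0 if operand_shape[d] == 1 else result_index[broadcast_dimensions[d]]
            for d in range(len(operand_shape))
        ]
        return get(operand, operand_index)

    return build(list(output_dimensions), element)


# Same data as the dynamic_broadcast_in_dim example in this section.
result = dynamic_broadcast_in_dim([[1, 2, 3]], [1, 3], [2, 3, 2], [2, 1])
print(result)
```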

Examples

// %operand: [
//            [1, 2, 3]
//           ]
%operand = stablehlo.constant dense<[[1, 2, 3]]> : tensor<1x3xi64>
%output_dimensions = stablehlo.constant dense<[2, 3, 2]> : tensor<3xi64>
%result = "stablehlo.dynamic_broadcast_in_dim"(%operand, %output_dimensions) {
  broadcast_dimensions = array<i64: 2, 1>,
  known_expanding_dimensions = array<i64: 0>,
  known_non_expanding_dimensions = array<i64: 1>
} : (tensor<1x3xi64>, tensor<3xi64>) -> tensor<2x3x2xi64>
// %result: [
//            [
//             [1, 1],
//             [2, 2],
//             [3, 3]
//            ],
//            [
//             [1, 1],
//             [2, 2],
//             [3, 3]
//            ]
//          ]

More Examples

dynamic_conv

Semantics

This operation is functionally identical to the convolution op, but the padding is specified dynamically via padding.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1), (C10-C11), (C14), (C25), (C26-C27), (C30-C31), (C33)
(I2)	`rhs`	tensor or quantized tensor	(C1), (C14-C16), (C26-C28), (C30-C33)
(I3)	`padding`	2-dimensional tensor of integer type	(C4)
(I4)	`window_strides`	1-dimensional tensor constant of type `si64`	(C2-C3)
(I5)	`lhs_dilation`	1-dimensional tensor constant of type `si64`	(C5-C6)
(I6)	`rhs_dilation`	1-dimensional tensor constant of type `si64`	(C7-C8)
(I7)	`window_reversal`	1-dimensional tensor constant of type `i1`	(C9)
(I8)	`input_batch_dimension`	constant of type `si64`	(C10), (C13)
(I9)	`input_feature_dimension`	constant of type `si64`	(C11), (C13-C14)
(I10)	`input_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C12), (C13)
(I11)	`kernel_input_feature_dimension`	constant of type `si64`	(C14), (C18)
(I12)	`kernel_output_feature_dimension`	constant of type `si64`	(C15-C16), (C18), (C28)
(I13)	`kernel_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C17-C18)
(I14)	`output_batch_dimension`	constant of type `si64`	(C20)
(I15)	`output_feature_dimension`	constant of type `si64`	(C20), (C29)
(I16)	`output_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C19-C20)
(I17)	`feature_group_count`	constant of type `si64`	(C11), (C14), (C16), (C21), (C23)
(I18)	`batch_group_count`	constant of type `si64`	(C10), (C15), (C22), (C23)
(I19)	`precision_config`	variadic number of enums of `DEFAULT`, `HIGH`, and `HIGHEST`	(C24)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C25-C27), (C29), (C31-C33)

Constraints

(C1) N = rank(lhs) = rank(rhs).
(C2) size(window_strides) = N - 2.
(C3) 0 < window_strides.
(C4) shape(padding) = [N - 2, 2].
(C5) size(lhs_dilation) = N - 2.
(C6) 0 < lhs_dilation.
(C7) size(rhs_dilation) = N - 2.
(C8) 0 < rhs_dilation.
(C9) size(window_reversal) = N - 2.
(C10) dim(lhs, input_batch_dimension) % batch_group_count = 0.
(C11) dim(lhs, input_feature_dimension) % feature_group_count = 0.
(C12) size(input_spatial_dimensions) = N - 2.
(C13) Given input_dimensions = [input_batch_dimension] + input_spatial_dimensions + [input_feature_dimension]:
- is_unique(input_dimensions).
- 0 <= input_dimensions < N.
(C14) dim(rhs, kernel_input_feature_dimension) = dim(lhs, input_feature_dimension) / feature_group_count.
(C15) dim(rhs, kernel_output_feature_dimension) % batch_group_count = 0.
(C16) dim(rhs, kernel_output_feature_dimension) % feature_group_count = 0.
(C17) size(kernel_spatial_dimensions) = N - 2.
(C18) Given kernel_dimensions = kernel_spatial_dimensions + [kernel_input_feature_dimension] + [kernel_output_feature_dimension]:
- is_unique(kernel_dimensions).
- 0 <= kernel_dimensions < N.
(C19) size(output_spatial_dimensions) = N - 2.
(C20) Given output_dimensions = [output_batch_dimension] + output_spatial_dimensions + [output_feature_dimension]:
- is_unique(output_dimensions).
- 0 <= output_dimensions < N.
(C21) 0 < feature_group_count.
(C22) 0 < batch_group_count.
(C23) feature_group_count = 1 or batch_group_count = 1.
(C24) size(precision_config) = 2.
(C25) dim(result, result_dim) is defined as:
- dim(lhs, input_batch_dimension) / batch_group_count if result_dim = output_batch_dimension.
- dim(rhs, kernel_output_feature_dimension) if result_dim = output_feature_dimension.
- num_windows otherwise, where:
  - output_spatial_dimensions[spatial_dim] = result_dim.
  - lhs_dim = input_spatial_dimensions[spatial_dim].
  - rhs_dim = kernel_spatial_dimensions[spatial_dim].
  - dilated_input_shape[lhs_dim] = dim(lhs, lhs_dim) = 0 ? 0 : (dim(lhs, lhs_dim) - 1) * lhs_dilation[spatial_dim] + 1.
  - padded_input_shape[lhs_dim] = padding[spatial_dim, 0] + dilated_input_shape[lhs_dim] + padding[spatial_dim, 1].
  - dilated_window_shape[lhs_dim] = dim(rhs, rhs_dim) = 0 ? 0 : (dim(rhs, rhs_dim) - 1) * rhs_dilation[spatial_dim] + 1.
  - is_empty_window[lhs_dim] = padded_input_shape[lhs_dim] = 0 || dilated_window_shape[lhs_dim] > padded_input_shape[lhs_dim].
  - num_windows = is_empty_window[lhs_dim] ? 0 : floor((padded_input_shape[lhs_dim] - dilated_window_shape[lhs_dim]) / window_strides[spatial_dim]) + 1.
(C26) rank(result) = N.
If the operation uses non-quantized tensors:
- (C27) element_type(lhs) = element_type(rhs) = element_type(result).
If the operation uses quantized tensors:
- (C28) is_quantized(lhs) = is_quantized(result) and is_quantized(rhs).
- (C29) If is_per_axis_quantized(rhs), then quantization_dimension(rhs) = kernel_output_feature_dimension.
- (C30) If is_per_axis_quantized(result), then quantization_dimension(result) = output_feature_dimension.
- If is_quantized(lhs):
  - (C31) storage_type(lhs) = storage_type(rhs).
  - (C32) expressed_type(lhs) = expressed_type(rhs) = expressed_type(result).
  - (C33) If is_per_tensor_quantized(rhs), then is_per_tensor_quantized(result).
- If !is_quantized(lhs):
  - (C34) element_type(lhs) = expressed_type(rhs) = element_type(result).
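
The spatial-size rule in (C25) is easy to misread, so here is a minimal Python sketch of the num_windows computation for a single spatial dimension. The function name and its scalar parameters are hypothetical; they stand for the per-dimension values that (C25) looks up through spatial_dim.

```python
def num_windows(lhs_dim_size, rhs_dim_size, padding_low, padding_high,
                lhs_dilation, rhs_dilation, window_stride):
    """Output size along one spatial dimension, following (C25)."""
    # Dilation inserts (dilation - 1) holes between neighbouring elements.
    dilated_input = 0 if lhs_dim_size == 0 else (lhs_dim_size - 1) * lhs_dilation + 1
    padded_input = padding_low + dilated_input + padding_high
    dilated_window = 0 if rhs_dim_size == 0 else (rhs_dim_size - 1) * rhs_dilation + 1
    is_empty_window = padded_input == 0 or dilated_window > padded_input
    if is_empty_window:
        return 0
    # The difference is non-negative here, so // matches floor().
    return (padded_input - dilated_window) // window_stride + 1


# Values of the dynamic_conv example in this section: a 4-element spatial
# axis, a 3-element kernel axis, padding [1, 1], lhs_dilation 2,
# rhs_dilation 1 and window stride 4.
print(num_windows(4, 3, 1, 1, 2, 1, 4))
```

For those values the dilated input has 7 elements, the padded input 9, and floor((9 - 3) / 4) + 1 = 2, which agrees with the 2x2 spatial extent of the example result.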

Examples

// %lhs: [[
//        [[1], [2], [5], [6]],
//        [[3], [4], [7], [8]],
//        [[10], [11], [14], [15]],
//        [[12], [13], [16], [17]]
//      ]]
//
// %rhs: [
//         [[[1]], [[1]], [[1]]],
//         [[[1]], [[1]], [[1]]],
//         [[[1]], [[1]], [[1]]]
//        ]
// %padding: [[1, 1],
//            [1, 1]]
%result = "stablehlo.dynamic_conv"(%lhs, %rhs, %padding) {
  window_strides = array<i64: 4, 4>,
  lhs_dilation = array<i64: 2, 2>,
  rhs_dilation = array<i64: 1, 1>,
  window_reversal = array<i1: false, false>,
  dimension_numbers = #stablehlo.conv<raw
    input_batch_dimension = 0,
    input_feature_dimension = 3,
    input_spatial_dimensions = [1, 2],
    kernel_input_feature_dimension = 2,
    kernel_output_feature_dimension = 3,
    kernel_spatial_dimensions = [0, 1],
    output_batch_dimension = 0,
    output_feature_dimension = 3,
    output_spatial_dimensions = [1, 2]
  >,
  feature_group_count = 1 : i64,
  batch_group_count = 1 : i64,
  precision_config = [#stablehlo<precision DEFAULT>, #stablehlo<precision DEFAULT>]
} : (tensor<1x4x4x1xi64>, tensor<3x3x1x1xi64>, tensor<2x2xi64>) -> tensor<1x2x2x1xi64>
// %result: [[
//            [[1], [5]],
//            [[10], [14]]
//          ]]

More Examples

dynamic_gather

Semantics

This operation is functionally identical to the gather op, with slice_sizes specified dynamically as a value.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C7), (C10-C12), (C14)
(I2)	`start_indices`	tensor of integer type	(C2), (C3), (C13)
(I3)	`slice_sizes`	1-dimensional tensor of integer type	(C8), (C11-C13)
(I4)	`offset_dims`	1-dimensional tensor constant of type `si64`	(C1), (C4-C5), (C13)
(I5)	`collapsed_slice_dims`	1-dimensional tensor constant of type `si64`	(C1), (C6-C8), (C13)
(I6)	`start_index_map`	1-dimensional tensor constant of type `si64`	(C3), (C9), (C10)
(I7)	`index_vector_dim`	constant of type `si64`	(C2), (C3), (C13)
(I8)	`indices_are_sorted`	constant of type `i1`

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C5), (C13-C14)

Constraints

(C1) rank(operand) = size(offset_dims) + size(collapsed_slice_dims).
(C2) 0 <= index_vector_dim <= rank(start_indices).
(C3) size(start_index_map) = index_vector_dim < rank(start_indices) ? dim(start_indices, index_vector_dim) : 1.
(C4) is_unique(offset_dims) and is_sorted(offset_dims).
(C5) 0 <= offset_dims < rank(result).
(C6) is_unique(collapsed_slice_dims) and is_sorted(collapsed_slice_dims).
(C7) 0 <= collapsed_slice_dims < rank(operand).
(C8) slice_sizes[collapsed_slice_dims...] <= 1.
(C9) is_unique(start_index_map).
(C10) 0 <= start_index_map < rank(operand).
(C11) size(slice_sizes) = rank(operand).
(C12) 0 <= slice_sizes <= shape(operand).
(C13) shape(result) = combine(batch_dim_sizes, offset_dim_sizes) where:
- batch_dim_sizes = shape(start_indices) except that the dimension size of start_indices corresponding to index_vector_dim is not included.
- offset_dim_sizes = shape(slice_sizes) except that the dimension sizes in slice_sizes corresponding to collapsed_slice_dims are not included.
- combine puts batch_dim_sizes at axes corresponding to batch_dims and offset_dim_sizes at axes corresponding to offset_dims.
(C14) element_type(operand) = element_type(result).

Examples

// %operand: [
//            [[1, 2], [3, 4], [5, 6], [7, 8]],
//            [[9, 10],[11, 12], [13, 14], [15, 16]],
//            [[17, 18], [19, 20], [21, 22], [23, 24]]
//           ]
// %start_indices: [
//                  [[0, 0], [1, 0], [2, 1]],
//                  [[0, 1], [1, 1], [0, 2]]
//                 ]
// %slice_sizes: [1, 2, 2]
%result = "stablehlo.dynamic_gather"(%operand, %start_indices, %slice_sizes) {
  dimension_numbers = #stablehlo.gather<
    offset_dims = [2, 3],
    collapsed_slice_dims = [0],
    start_index_map = [1, 0],
    index_vector_dim = 2>,
  indices_are_sorted = false
} : (tensor<3x4x2xi64>, tensor<2x3x2xi64>, tensor<3xi64>) -> tensor<2x3x2x2xi64>
// %result: [
//            [
//              [[1, 2], [3, 4]],
//              [[3, 4], [5, 6]],
//              [[13, 14], [15, 16]]
//            ],
//            [
//              [[9, 10], [11, 12]],
//              [[11, 12], [13, 14]],
//              [[17, 18], [19, 20]]
//            ]
//          ]

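More Examples

dynamic_iota

Semantics

This operation is functionally identical to the iota op, but the result shape is specified dynamically via output_shape.

Inputs

Label	Name	Type	Constraints
(I1)	`output_shape`	1-dimensional tensor of integer type	(C1), (C2)
(I2)	`iota_dimension`	`si64`	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C2)

Constraints

(C1) 0 <= iota_dimension < size(output_shape).
(C2) rank(result) = size(output_shape).

Examples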

%output_shape = stablehlo.constant dense<[4, 5]> : tensor<2xi64>
%result = "stablehlo.dynamic_iota"(%output_shape) {
  iota_dimension = 0 : i64
} : (tensor<2xi64>) -> tensor<4x5xi64>
// %result: [
//           [0, 0, 0, 0, 0],
//           [1, 1, 1, 1, 1],
//           [2, 2, 2, 2, 2],
//           [3, 3, 3, 3, 3]
//          ]

More Examples

dynamic_pad

Semantics

This operation is functionally identical to the pad op, but with edge_padding_low, edge_padding_high, and interior_padding specified dynamically as values.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C2), (C4)
(I2)	`padding_value`	0-dimensional tensor or per-tensor quantized tensor	(C1)
(I3)	`edge_padding_low`	1-dimensional tensor of integer type	(C1), (C4)
(I4)	`edge_padding_high`	1-dimensional tensor of integer type	(C1), (C4)
(I5)	`interior_padding`	1-dimensional tensor of integer type	(C2-C4)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C3-C6)

Constraints

(C1) element_type(operand) = element_type(padding_value) = element_type(result).
(C2) size(edge_padding_low) = size(edge_padding_high) = size(interior_padding) = rank(operand).
(C3) 0 <= interior_padding.
(C4) shape(result) = shape(operand) + edge_padding_low + max(shape(operand) - 1, 0) * interior_padding + edge_padding_high.
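
To make (C4) concrete, the sketch below computes the result shape and fills it in for nested Python lists. It is an illustration, not a reference implementation: it assumes the element placement rule of the pad op (an operand element lands at result_index = edge_padding_low + operand_index * (interior_padding + 1), and every other position holds padding_value), and the helpers get and build and the explicit operand_shape parameter are hypothetical.

```python
def get(nested, index):
    """Read the element of a nested list at a multi-dimensional index."""
    for i in index:
        nested = nested[i]
    return nested


def build(shape, fn, prefix=()):
    """Build a nested list of the given shape, calling fn(index) per element."""
    if len(prefix) == len(shape):
        return fn(prefix)
    return [build(shape, fn, prefix + (i,)) for i in range(shape[len(prefix)])]


def dynamic_pad(operand, operand_shape, padding_value,
                edge_padding_low, edge_padding_high, interior_padding):
    assert all(p >= 0 for p in interior_padding)  # (C3)
    # (C4): per-dimension result size.
    result_shape = [
        n + low + max(n - 1, 0) * interior + high
        for n, low, high, interior in zip(
            operand_shape, edge_padding_low, edge_padding_high, interior_padding)
    ]

    def element(result_index):
        operand_index = []
        for r, low, interior, n in zip(
                result_index, edge_padding_low, interior_padding, operand_shape):
            offset = r - low
            step = interior + 1
            # Positions before the data, between elements, or past the data.
            if offset < 0 or offset % step != 0 or offset // step >= n:
                return padding_value
            operand_index.append(offset // step)
        return get(operand, operand_index)

    return build(result_shape, element)


# Same data as the dynamic_pad example in this section.
result = dynamic_pad([[1, 2, 3], [4, 5, 6]], [2, 3], 0, [0, 1], [2, 1], [1, 2])
for row in result:
    print(row)
```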

Examples

// %operand: [
//            [1, 2, 3],
//            [4, 5, 6]
//           ]
// %padding_value: 0
// %edge_padding_low: [0, 1]
// %edge_padding_high: [2, 1]
// %interior_padding: [1, 2]
%result = "stablehlo.dynamic_pad"(%operand, %padding_value,
  %edge_padding_low, %edge_padding_high, %interior_padding
) : (tensor<2x3xi64>, tensor<i64>, tensor<2xi64>, tensor<2xi64>, tensor<2xi64>) -> tensor<5x9xi64>
// %result: [
//           [0, 1, 0, 0, 2, 0, 0, 3, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0],
//           [0, 4, 0, 0, 5, 0, 0, 6, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0]
//          ]

More Examples

dynamic_reshape

Semantics

This operation is functionally identical to the reshape op, but the result shape is specified dynamically via output_shape.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C3)
(I2)	`output_shape`	1-dimensional tensor of integer type	(C4)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1-C4)

Constraints

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand) except that quantization_dimension(operand) and quantization_dimension(result) may differ, otherwise.
(C2) size(operand) = size(result).
(C3) If is_per_axis_quantized(operand):
- reduce(dims(operand, [0, 1, ..., quantization_dimension(operand) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y) = reduce(dims(result, [0, 1, ..., quantization_dimension(result) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y).
- dim(operand, quantization_dimension(operand)) = dim(result, quantization_dimension(result)).
- reduce(dims(operand, [quantization_dimension(operand) + 1, ..., rank(operand) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y) = reduce(dims(result, [quantization_dimension(result) + 1, ..., rank(result) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y).
(C4) size(output_shape) = rank(result).

Examples

// %operand: [[1, 2, 3], [4, 5, 6]]
// %output_shape: [3, 2]
%result = "stablehlo.dynamic_reshape"(%operand, %output_shape) : (tensor<2x3xi64>, tensor<2xi64>) -> tensor<3x2xi64>
// %result: [[1, 2], [3, 4], [5, 6]]

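More Examples

dynamic_slice

Semantics

Extracts a slice from the operand using dynamically-computed starting indices and produces a result tensor. start_indices contains the starting indices of the slice for each dimension, subject to potential adjustment, and slice_sizes contains the sizes of the slice for each dimension. More formally, result[result_index] = operand[operand_index] where:

adjusted_start_indices = clamp(0, start_indices, shape(operand) - slice_sizes).
operand_index = adjusted_start_indices + result_index.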
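
The clamping step means that out-of-range start indices never fail; they are moved so that the slice fits inside the operand. A minimal Python sketch over nested lists follows. The helpers get, build, and clamp and the explicit operand_shape parameter are hypothetical conveniences, not part of the specification.

```python
def get(nested, index):
    """Read the element of a nested list at a multi-dimensional index."""
    for i in index:
        nested = nested[i]
    return nested


def build(shape, fn, prefix=()):
    """Build a nested list of the given shape, calling fn(index) per element."""
    if len(prefix) == len(shape):
        return fn(prefix)
    return [build(shape, fn, prefix + (i,)) for i in range(shape[len(prefix)])]


def clamp(low, value, high):
    return min(max(value, low), high)


def dynamic_slice(operand, operand_shape, start_indices, slice_sizes):
    assert len(start_indices) == len(slice_sizes) == len(operand_shape)
    assert all(0 <= s <= n for s, n in zip(slice_sizes, operand_shape))
    # adjusted_start_indices = clamp(0, start_indices, shape(operand) - slice_sizes)
    adjusted = [clamp(0, start, n - size)
                for start, n, size in zip(start_indices, operand_shape, slice_sizes)]

    def element(result_index):
        # operand_index = adjusted_start_indices + result_index
        return get(operand, [a + r for a, r in zip(adjusted, result_index)])

    return build(list(slice_sizes), element)


operand = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Start indices (-1, 3) are clamped to (0, 2) so that a 2x2 slice fits.
result = dynamic_slice(operand, [4, 4], [-1, 3], [2, 2])
print(result)
```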

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C2), (C4)
(I2)	`start_indices`	variadic number of 0-dimensional tensors of integer type	(C2), (C3)
(I3)	`slice_sizes`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C5)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1), (C5)

Constraints

(C1) element_type(operand) = element_type(result).
(C2) size(start_indices) = size(slice_sizes) = rank(operand).
(C3) same(type(start_indices...)).
(C4) 0 <= slice_sizes <= shape(operand).
(C5) shape(result) = slice_sizes.

Examples

// %operand: [
//            [0, 0, 1, 1],
//            [0, 0, 1, 1],
//            [0, 0, 0, 0],
//            [0, 0, 0, 0]
//           ]
// %start_indices0: -1
// %start_indices1: 3
%result = "stablehlo.dynamic_slice"(%operand, %start_indices0, %start_indices1) {
  slice_sizes = array<i64: 2, 2>
} : (tensor<4x4xi32>, tensor<i64>, tensor<i64>) -> tensor<2x2xi32>
// %result: [
//           [1, 1],
//           [1, 1]
//          ]

More Examples

dynamic_update_slice

Semantics

Produces a result tensor which is equal to the operand tensor, except that the slice starting at start_indices is updated with the values in update. More formally, result[result_index] is defined as:

update[update_index] if 0 <= update_index < shape(update), where:
- adjusted_start_indices = clamp(0, start_indices, shape(operand) - shape(update)).
- update_index = result_index - adjusted_start_indices.
operand[result_index] otherwise.
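
The definition above can be sketched in Python over nested lists as follows. As before, the helpers get, build, and clamp and the explicit shape parameters are hypothetical; this is an illustration of the indexing rule, not a reference implementation.

```python
def get(nested, index):
    """Read the element of a nested list at a multi-dimensional index."""
    for i in index:
        nested = nested[i]
    return nested


def build(shape, fn, prefix=()):
    """Build a nested list of the given shape, calling fn(index) per element."""
    if len(prefix) == len(shape):
        return fn(prefix)
    return [build(shape, fn, prefix + (i,)) for i in range(shape[len(prefix)])]


def clamp(low, value, high):
    return min(max(value, low), high)


def dynamic_update_slice(operand, operand_shape, update, update_shape, start_indices):
    assert len(update_shape) == len(operand_shape) == len(start_indices)
    assert all(0 <= u <= n for u, n in zip(update_shape, operand_shape))
    # adjusted_start_indices = clamp(0, start_indices, shape(operand) - shape(update))
    adjusted = [clamp(0, start, n - u)
                for start, n, u in zip(start_indices, operand_shape, update_shape)]

    def element(result_index):
        update_index = [r - a for r, a in zip(result_index, adjusted)]
        if all(0 <= u < n for u, n in zip(update_index, update_shape)):
            return get(update, update_index)
        return get(operand, result_index)

    return build(list(operand_shape), element)


operand = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
update = [[1, 1], [1, 1]]
# Start indices (-1, 3) are clamped to (0, 2), so the zero block is overwritten.
result = dynamic_update_slice(operand, [4, 4], update, [2, 2], [-1, 3])
print(result)
```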

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1-C4), (C6)
(I2)	`update`	tensor or per-tensor quantized tensor	(C2), (C3), (C6)
(I3)	`start_indices`	variadic number of 0-dimensional tensors of integer type	(C4), (C5)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) type(operand) = type(result).
(C2) element_type(update) = element_type(operand).
(C3) rank(update) = rank(operand).
(C4) size(start_indices) = rank(operand).
(C5) same(type(start_indices...)).
(C6) 0 <= shape(update) <= shape(operand).

Examples

// %operand: [
//            [1, 1, 0, 0],
//            [1, 1, 0, 0],
//            [1, 1, 1, 1],
//            [1, 1, 1, 1]
//           ]
// %update: [
//           [1, 1],
//           [1, 1]
//          ]
// %start_indices0: -1
// %start_indices1: 3
%result = "stablehlo.dynamic_update_slice"(%operand, %update, %start_indices0, %start_indices1)
  : (tensor<4x4xi32>, tensor<2x2xi32>, tensor<i64>, tensor<i64>) -> tensor<4x4xi32>
// %result: [
//           [1, 1, 1, 1],
//           [1, 1, 1, 1],
//           [1, 1, 1, 1],
//           [1, 1, 1, 1]
//          ]

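More Examples

exponential

Semantics

Performs element-wise exponential operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: exp from IEEE-754.
For complex numbers: complex exponential.
For quantized types: dequantize_op_quantize(exponential, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples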

// %operand: [[0.0, 1.0], [2.0, 3.0]]
%result = "stablehlo.exponential"(%operand) : (tensor<2x2xf64>) -> tensor<2x2xf64>
// %result: [[1.0, 2.7182818284590451], [7.3890560989306504, 20.085536923187668]]

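More Examples

exponential_minus_one

Semantics

Performs element-wise exponential minus one operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: expm1 from IEEE-754.
For complex numbers: complex exponential minus one.
For quantized types: dequantize_op_quantize(exponential_minus_one, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples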

// %operand: [0.0, 1.0]
%result = "stablehlo.exponential_minus_one"(%operand) : (tensor<2xf64>) -> tensor<2xf64>
// %result: [0.0, 1.71828187]

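More Examples

fft

Semantics

Performs the forward and inverse Fourier transforms for real and complex inputs/outputs.

fft_type is one of the following:

FFT: Forward complex-to-complex FFT.
IFFT: Inverse complex-to-complex FFT.
RFFT: Forward real-to-complex FFT.
IRFFT: Inverse real-to-complex FFT (i.e. takes complex, returns real).

More formally, given the function fft which takes 1-dimensional tensors of complex types as input, produces 1-dimensional tensors of the same types as output, and computes the discrete Fourier transform:

For fft_type = FFT, result is defined as the final result of a series of L computations where L = size(fft_length). For example, for L = 3:

result1[i0, ..., :] = fft(operand[i0, ..., :]).
result2[i0, ..., :, iR-1] = fft(result1[i0, ..., :, iR-1]).
result[i0, ..., :, iR-2, iR-1] = fft(result2[i0, ..., :, iR-2, iR-1]).

Furthermore, given the function ifft which has the same type signature and computes the inverse of fft:

For fft_type = IFFT, result is defined as the inverse of the computations for fft_type = FFT. For example, for L = 3:

result1[i0, ..., :, iR-2, iR-1] = ifft(operand[i0, ..., :, iR-2, iR-1]).
result2[i0, ..., :, iR-1] = ifft(result1[i0, ..., :, iR-1]).
result[i0, ..., :] = ifft(result2[i0, ..., :]).

Furthermore, given the function rfft which takes 1-dimensional tensors of floating-point types, produces 1-dimensional tensors of complex types of the same floating-point semantics, and works as follows:

rfft(real_operand) = truncated_result where
complex_operand... = (real_operand..., 0.0).
complex_result = fft(complex_operand).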
truncated_result = complex_result[:(size(complex_result) / 2 + 1)].

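(When the discrete Fourier transform is computed for real operands, the first N/2 + 1 elements of the result unambiguously define the rest of the result, so the result of rfft is truncated to avoid computing redundant elements.)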
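
The truncation performed by rfft can be seen in a short Python sketch. It uses a naive O(N^2) discrete Fourier transform in place of a real FFT, assumes the common forward sign convention exp(-2*pi*i*j*k/N), and reads the truncation length as N/2 + 1 with integer division; the function names simply mirror the helper functions named above.

```python
import cmath


def fft(x):
    """Naive discrete Fourier transform of a 1-dimensional complex sequence."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def rfft(real_operand):
    """Real-to-complex transform that keeps only the first N/2 + 1 elements."""
    complex_operand = [complex(v, 0.0) for v in real_operand]
    complex_result = fft(complex_operand)
    return complex_result[: len(complex_result) // 2 + 1]


full = fft([complex(v, 0.0) for v in [1.0, 2.0, 3.0, 4.0]])
truncated = rfft([1.0, 2.0, 3.0, 4.0])
print(len(full), len(truncated))
```

For real input the dropped elements are complex conjugates of kept ones (full[N - j] equals the conjugate of full[j]), which is why they carry no extra information.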

For fft_type = RFFT, result is defined as the final result of a series of L computations where L = size(fft_length). For example, for L = 3:

result1[i0, ..., :] = rfft(operand[i0, ..., :]).
result2[i0, ..., :, iR-1] = fft(result1[i0, ..., :, iR-1]).
result[i0, ..., :, iR-2, iR-1] = fft(result2[i0, ..., :, iR-2, iR-1]).

Finally, given the function irfft which has the same type signature and computes the inverse of rfft:

For fft_type = IRFFT, result is defined as the inverse of the computations for fft_type = RFFT. For example, for L = 3:

result1[i0, ..., :, iR-2, iR-1] = ifft(operand[i0, ..., :, iR-2, iR-1]).
result2[i0, ..., :, iR-1] = ifft(result1[i0, ..., :, iR-1]).
result[i0, ..., :] = irfft(result2[i0, ..., :]).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type	(C1), (C2), (C4), (C5)
(I2)	`fft_type`	enum of `FFT`, `IFFT`, `RFFT`, and `IRFFT`	(C2), (C5)
(I3)	`fft_length`	1-dimensional tensor constant of type `si64`	(C1), (C3), (C4)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type	(C2), (C4), (C5)

Constraints

(C1) size(fft_length) <= rank(operand).
(C2) The relationship between operand and result element types varies:
- If fft_type = FFT, element_type(operand) and element_type(result) have the same complex type.
- If fft_type = IFFT, element_type(operand) and element_type(result) have the same complex type.
- If fft_type = RFFT, element_type(operand) is a floating-point type and element_type(result) is a complex type of the same floating-point semantics.
- If fft_type = IRFFT, element_type(operand) is a complex type and element_type(result) is a floating-point type of the same floating-point semantics.
(C3) 1 <= size(fft_length) <= 3.
(C4) If among operand and result, there is a tensor real of a floating-point type, then shape(real)[-size(fft_length):] = fft_length.
(C5) shape(result) = shape(operand) except for:
- If fft_type = RFFT, dim(result, -1) = dim(operand, -1) = 0 ? 0 : dim(operand, -1) / 2 + 1.
- If fft_type = IRFFT, dim(operand, -1) = dim(result, -1) = 0 ? 0 : dim(result, -1) / 2 + 1.

Examples

// %operand: [(1.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
%result = "stablehlo.fft"(%operand) {
  fft_type = #stablehlo<fft_type FFT>,
  fft_length = array<i64: 4>
} : (tensor<4xcomplex<f32>>) -> tensor<4xcomplex<f32>>
// %result: [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]

floor

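Semantics

Performs element-wise floor of operand tensor and produces a result tensor. Implements the roundToIntegralTowardNegative operation from the IEEE-754 specification. For quantized types, performs dequantize_op_quantize(floor, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples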

// %operand: [-0.8166, -0.2530, 0.2530, 0.8166, 2.0]
%result = "stablehlo.floor"(%operand) : (tensor<5xf32>) -> tensor<5xf32>
// %result: [-1.0, -1.0, 0.0, 0.0, 2.0]

More Examples

gather

Semantics

Gathers slices from operand tensor from offsets specified in start_indices and produces a result tensor.

The following diagram shows how elements in result map on elements in operand using a concrete example. The diagram picks a few example result indices and explains in detail which operand indices they correspond to.

More formally, result[result_index] = operand[operand_index] where:

batch_dims = [d for d in axes(result) and d not in offset_dims].
batch_index = result_index[batch_dims...].
start_index is defined as:
- start_indices[bi0, ..., :, ..., biN] where bi are individual elements in batch_index and : is inserted at the index_vector_dim index, if index_vector_dim < rank(start_indices).
- [start_indices[batch_index]] otherwise.
For d_operand in axes(operand),
- full_start_index[d_operand] = clamp(start_index[d_start], 0, dim(operand, d_operand) - slice_sizes[d_operand]) if d_operand = start_index_map[d_start].
- full_start_index[d_operand] = 0 otherwise.
For d_operand in axes(operand),
- full_batching_index[d_operand] = batch_index[d_start - (d_start < index_vector_dim ? 0 : 1)] if d_operand = operand_batching_dims[i_batching] and d_start = start_indices_batching_dims[i_batching].
- full_batching_index[d_operand] = 0 otherwise.
offset_index = result_index[offset_dims...].
full_offset_index = [oi0, ..., 0, ..., oiN] where oi are individual elements in offset_index, and 0 is inserted at indices from collapsed_slice_dims and operand_batching_dims.
operand_index = full_start_index + full_batching_index + full_offset_index.

If indices_are_sorted is true, then the implementation can assume that start_indices are sorted with respect to start_index_map; otherwise the behavior is undefined. More formally, for all i1 < i2 from indices(result), full_start_index(i1) <= full_start_index(i2).
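
The index arithmetic above is dense, so the following Python sketch spells it out for nested lists and runs it on the data of the gather example in this section. It is an illustration under assumptions: the helpers get, build, and clamp are hypothetical, shapes are passed explicitly (including result_shape, which an implementation would derive from the constraints), and clamp takes its bounds as (low, value, high).

```python
def get(nested, index):
    """Read the element of a nested list at a multi-dimensional index."""
    for i in index:
        nested = nested[i]
    return nested


def build(shape, fn, prefix=()):
    """Build a nested list of the given shape, calling fn(index) per element."""
    if len(prefix) == len(shape):
        return fn(prefix)
    return [build(shape, fn, prefix + (i,)) for i in range(shape[len(prefix)])]


def clamp(low, value, high):
    return min(max(value, low), high)


def gather(operand, operand_shape, start_indices, start_indices_shape, result_shape,
           offset_dims, collapsed_slice_dims, operand_batching_dims,
           start_indices_batching_dims, start_index_map, index_vector_dim,
           slice_sizes):
    rank = len(operand_shape)
    batch_dims = [d for d in range(len(result_shape)) if d not in offset_dims]
    skipped = set(collapsed_slice_dims) | set(operand_batching_dims)

    def element(result_index):
        batch_index = [result_index[d] for d in batch_dims]
        if index_vector_dim < len(start_indices_shape):
            # Insert ":" at index_vector_dim and read the whole index vector.
            head, tail = batch_index[:index_vector_dim], batch_index[index_vector_dim:]
            start_index = [get(start_indices, head + [k] + tail)
                           for k in range(start_indices_shape[index_vector_dim])]
        else:
            start_index = [get(start_indices, batch_index)]
        full_start_index = [0] * rank
        for d_start, d_operand in enumerate(start_index_map):
            full_start_index[d_operand] = clamp(
                0, start_index[d_start], operand_shape[d_operand] - slice_sizes[d_operand])
        full_batching_index = [0] * rank
        for d_operand, d_start in zip(operand_batching_dims, start_indices_batching_dims):
            full_batching_index[d_operand] = batch_index[
                d_start - (0 if d_start < index_vector_dim else 1)]
        offsets = iter(result_index[d] for d in offset_dims)
        full_offset_index = [0 if d in skipped else next(offsets) for d in range(rank)]
        operand_index = [s + b + o for s, b, o in zip(
            full_start_index, full_batching_index, full_offset_index)]
        return get(operand, operand_index)

    return build(list(result_shape), element)


# Operand of shape 2x3x4x2 holding 1..48 in row-major order, as in the example.
operand = build([2, 3, 4, 2], lambda i: 1 + ((i[0] * 3 + i[1]) * 4 + i[2]) * 2 + i[3])
start_indices = [
    [[[0, 0], [1, 0], [2, 1]], [[0, 1], [1, 1], [0, 9]]],
    [[[0, 0], [2, 1], [2, 2]], [[1, 2], [0, 1], [1, 0]]],
]
result = gather(operand, [2, 3, 4, 2], start_indices, [2, 2, 3, 2], [2, 2, 3, 2, 2],
                offset_dims=[3, 4], collapsed_slice_dims=[1], operand_batching_dims=[0],
                start_indices_batching_dims=[1], start_index_map=[2, 1],
                index_vector_dim=3, slice_sizes=[1, 1, 2, 2])
print(result[0][1][2])
```

Note how the out-of-range start index 9 in start_indices[0][1][2] is clamped to 2, the largest value for which a slice of size 1 still fits along operand dimension 1.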

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C8), (C11), (C17), (C19-C21), (C23)
(I2)	`start_indices`	tensor of integer type	(C2-C3), (C14), (C17), (C22)
(I3)	`offset_dims`	1-dimensional tensor constant of type `si64`	(C1), (C4-C5), (C22)
(I4)	`collapsed_slice_dims`	1-dimensional tensor constant of type `si64`	(C1), (C6-C9), (C22)
(I5)	`operand_batching_dims`	1-dimensional tensor constant of type `si64`	(C1), (C6), (C10-C12), (C16-C18), (C22)
(I6)	`start_indices_batching_dims`	1-dimensional tensor constant of type `si64`	(C13-C17)
(I7)	`start_index_map`	1-dimensional tensor constant of type `si64`	(C3), (C18-C19)
(I8)	`index_vector_dim`	constant of type `si64`	(C2-C3), (C15), (C22)
(I9)	`slice_sizes`	1-dimensional tensor constant of type `si64`	(C9), (C12), (C20-C22)
(I10)	`indices_are_sorted`	constant of type `i1`

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C5), (C22-C23)

Constraints

(C1) rank(operand) = size(offset_dims) + size(collapsed_slice_dims) + size(operand_batching_dims).
(C2) 0 <= index_vector_dim <= rank(start_indices).
(C3) size(start_index_map) = index_vector_dim < rank(start_indices) ? dim(start_indices, index_vector_dim) : 1.
(C4) is_unique(offset_dims) and is_sorted(offset_dims).
(C5) 0 <= offset_dims < rank(result).
(C6) is_unique(concatenate(collapsed_slice_dims, operand_batching_dims)).
(C7) is_sorted(collapsed_slice_dims).
(C8) 0 <= collapsed_slice_dims < rank(operand).
(C9) slice_sizes[collapsed_slice_dims...] <= 1.
(C10) is_sorted(operand_batching_dims).
(C11) 0 <= operand_batching_dims < rank(operand).
(C12) slice_sizes[operand_batching_dims...] <= 1.
(C13) is_unique(start_indices_batching_dims).
(C14) 0 <= start_indices_batching_dims < rank(start_indices).
(C15) index_vector_dim not in start_indices_batching_dims.
(C16) size(operand_batching_dims) == size(start_indices_batching_dims).
(C17) dim(operand, operand_batching_dims...) = dim(start_indices, start_indices_batching_dims...).
(C18) is_unique(concatenate(start_index_map, operand_batching_dims)).
(C19) 0 <= start_index_map < rank(operand).
(C20) size(slice_sizes) = rank(operand).
(C21) 0 <= slice_sizes <= shape(operand).
(C22) shape(result) = combine(batch_dim_sizes, offset_dim_sizes) where:
- batch_dim_sizes = shape(start_indices) except that the dimension size of start_indices corresponding to index_vector_dim is not included.
- offset_dim_sizes = slice_sizes except that the dimension sizes in slice_sizes corresponding to collapsed_slice_dims and operand_batching_dims are not included.
- combine puts batch_dim_sizes at axes corresponding to batch_dims and offset_dim_sizes at axes corresponding to offset_dims.
(C23) element_type(operand) = element_type(result).

Examples

// %operand: [
//            [
//             [[1, 2], [3, 4], [5, 6], [7, 8]],
//             [[9, 10],[11, 12], [13, 14], [15, 16]],
//             [[17, 18], [19, 20], [21, 22], [23, 24]]
//            ],
//            [
//             [[25, 26], [27, 28], [29, 30], [31, 32]],
//             [[33, 34], [35, 36], [37, 38], [39, 40]],
//             [[41, 42], [43, 44], [45, 46], [47, 48]]
//            ]
//           ]
// %start_indices: [
//                  [
//                   [[0, 0], [1, 0], [2, 1]],
//                   [[0, 1], [1, 1], [0, 9]]
//                  ],
//                  [
//                   [[0, 0], [2, 1], [2, 2]],
//                   [[1, 2], [0, 1], [1, 0]]
//                  ]
//                 ]
%result = "stablehlo.gather"(%operand, %start_indices) {
  dimension_numbers = #stablehlo.gather<
    offset_dims = [3, 4],
    collapsed_slice_dims = [1],
    operand_batching_dims = [0],
    start_indices_batching_dims = [1],
    start_index_map = [2, 1],
    index_vector_dim = 3>,
  slice_sizes = array<i64: 1, 1, 2, 2>,
  indices_are_sorted = false
} : (tensor<2x3x4x2xi32>, tensor<2x2x3x2xi64>) -> tensor<2x2x3x2x2xi32>
// %result: [
//           [
//            [
//             [[1, 2], [3, 4]],
//             [[3, 4], [5, 6]],
//             [[13, 14], [15, 16]]
//            ],
//            [
//             [[33, 34], [35, 36]],
//             [[35, 36], [37, 38]],
//             [[41, 42], [43, 44]]
//            ]
//           ],
//           [
//            [
//             [[1, 2], [3, 4]],
//             [[13, 14], [15, 16]],
//             [[21, 22], [23, 24]]
//            ],
//            [
//             [[43, 44], [45, 46]],
//             [[33, 34], [35, 36]],
//             [[27, 28], [29, 30]]
//            ]
//           ]
//          ]

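More Examples

get_dimension_size

Semantics

Produces the size of the given dimension of the operand. More formally, result = dim(operand, dimension). The semantics concern only the shape component of the type; the element type could be anything.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1)
(I2)	`dimension`	constant of type `si64`	(C1)

Outputs

Name	Type
`result`	0-dimensional tensor of type `si32`

Constraints

(C1) 0 <= dimension < rank(operand).

Examples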

// %operand: [[1, 2, 3], [4, 5, 6]]
%result = "stablehlo.get_dimension_size"(%operand) {
  dimension = 1 : i64
} : (tensor<2x3xi64>) -> tensor<i32>
// %result: 3

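More Examples

get_tuple_element

Note: Per StableHLO v1.0 Cleanup #2283, this op is being explored for deprecation, as it appears to be unused by both frameworks and compilers. As such, it has limited compatibility guarantees (6 months).

Semantics

Extracts the element at the index position of the operand tuple and produces a result. More formally, result = operand[index].

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tuple	(C1), (C2)
(I2)	`index`	constant of type `si32`	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	any supported type	(C2)

Constraints

(C1) 0 <= index < size(operand).
(C2) type(result) = tuple_element_types(operand)[index].

Examples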

// %operand: ([1.0, 2.0], (3))
%result = "stablehlo.get_tuple_element"(%operand) {
  index = 0 : i32
} : (tuple<tensor<2xf32>, tuple<tensor<i32>>>) -> tensor<2xf32>
// %result: [1.0, 2.0]

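More Examples

if

Semantics

Produces the output from executing exactly one function from true_branch or false_branch depending on the value of pred. More formally, result = pred ? true_branch() : false_branch().

Inputs

Label	Name	Type	Constraints
(I1)	`pred`	0-dimensional tensor of type `i1`
(I2)	`true_branch`	function	(C1-C3)
(I3)	`false_branch`	function	(C1), (C2)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors, quantized tensors or tokens	(C3)

Constraints

(C1) input_types(true_branch) = input_types(false_branch) = [].
(C2) output_types(true_branch) = output_types(false_branch).
(C3) type(results...) = output_types(true_branch).

Examples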

// %result_true_branch: 10
// %result_false_branch: 11
// %pred: true
%result = "stablehlo.if"(%pred) ({
  "stablehlo.return"(%result_true_branch) : (tensor<i32>) -> ()
}, {
  "stablehlo.return"(%result_false_branch) : (tensor<i32>) -> ()
}) : (tensor<i1>) -> tensor<i32>
// %result: 10

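More Examples

imag

Semantics

Extracts the imaginary part, element-wise, from the operand and produces a result tensor. More formally, for each element x: imag(x) = is_complex(x) ? imaginary_part(x) : constant(0, element_type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type	(C1), (C2)

Constraints

(C1) shape(result) = shape(operand).
(C2) element_type(result) is defined as:
- complex_element_type(element_type(operand)) if is_complex(operand).
- element_type(operand) otherwise.

Examples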

// %operand: [(1.0, 2.0), (3.0, 4.0)]
%result = "stablehlo.imag"(%operand) : (tensor<2xcomplex<f32>>) -> tensor<2xf32>
// %result: [2.0, 4.0]

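More Examples

infeed

Semantics

Reads data from the infeed and produces results.

Semantics of infeed_config is implementation-defined.

results consist of payload values which come first and a token which comes last. In the future, we are planning to split the payload and the token into two separate outputs to improve clarity (#670).

Inputs

Label	Name	Type
(I1)	`token`	`token`
(I2)	`infeed_config`	constant of type `string`

Outputs

Name	Type	Constraints
`results`	variadic number of tensors, quantized tensors or tokens	(C1-C3)

Constraints

(C1) 0 < size(results).
(C2) is_empty(results[:-1]) or is_tensor(type(results[:-1])).
(C3) is_token(type(results[-1])).

Examples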

// %token: !stablehlo.token
// infeed_queue[0]: [[1, 2], [3, 4]]
// infeed_queue[1]: [[5, 6], [7, 8]]
%results0:2 = "stablehlo.infeed"(%token) {
  infeed_config = ""
} : (!stablehlo.token) -> (tensor<2x2xi64>, !stablehlo.token)
// results0#0: [[1, 2], [3, 4]]
%results1:2 = "stablehlo.infeed"(%token) {
  infeed_config = ""
} : (!stablehlo.token) -> (tensor<2x2xi64>, !stablehlo.token)
// results1#0: [[5, 6], [7, 8]]

More Examples

iota

Semantics

Fills an output tensor with values in increasing order starting from zero along the iota_dimension dimension. More formally,

output[output_index] = constant(is_quantized(output) ? quantize(output_index[iota_dimension], element_type(output)) : output_index[iota_dimension], element_type(output)).

Inputs

Label	Name	Type	Constraints
(I1)	`iota_dimension`	`si64`	(C1)

Outputs

Name	Type	Constraints
`output`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) 0 <= iota_dimension < rank(output).

Examples

%output = "stablehlo.iota"() {
  iota_dimension = 0 : i64
} : () -> tensor<4x5xi32>
// %output: [
//           [0, 0, 0, 0, 0],
//           [1, 1, 1, 1, 1],
//           [2, 2, 2, 2, 2],
//           [3, 3, 3, 3, 3]
//          ]

%output = "stablehlo.iota"() {
  iota_dimension = 1 : i64
} : () -> tensor<4x5xi32>
// %output: [
//           [0, 1, 2, 3, 4],
//           [0, 1, 2, 3, 4],
//           [0, 1, 2, 3, 4],
//           [0, 1, 2, 3, 4]
//          ]
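
The formula above, restricted to the non-quantized case, can be sketched in plain Python. The helper name iota and the nested-list tensor representation are assumptions made for this illustration only; they are not part of StableHLO.

```python
def iota(shape, iota_dimension):
    """Sketch of stablehlo.iota for non-quantized element types.

    Returns a nested list of the given shape in which every element equals
    its own index along `iota_dimension`.
    """
    assert 0 <= iota_dimension < len(shape)  # constraint (C1)

    def build(index_prefix, remaining_dims):
        # Once a full index is known, the element is one of its components.
        if not remaining_dims:
            return index_prefix[iota_dimension]
        return [build(index_prefix + (i,), remaining_dims[1:])
                for i in range(remaining_dims[0])]

    return build((), tuple(shape))


rows = iota((4, 5), 0)  # each row holds its own row number
cols = iota((4, 5), 1)  # each row is [0, 1, 2, 3, 4]
```

Under these assumptions, the two calls reproduce the two example outputs shown above.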

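More Examples

is_finite

Semantics

Performs an element-wise check whether the value in x is finite (i.e. is neither +Inf, -Inf, nor NaN) and produces a y tensor. Implements the isFinite operation from the IEEE-754 specification. For quantized types, the result is always true.

Inputs

Label	Name	Type	Constraints
(I1)	`x`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`y`	tensor of boolean type	(C1)

Constraints

(C1) shape(x) = shape(y).

Examples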

// Logical values: -Inf, +Inf, NaN, ...
// %x: [0xFFF0000000000000, 0x7FF0000000000000, 0x7FF8000000000000, -10.0, -0.0, 0.0, 10.0]
%y = "stablehlo.is_finite"(%x) : (tensor<7xf64>) -> tensor<7xi1>
// %y: [false, false, false, true, true, true, true]

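More Examples

log

Semantics

Performs element-wise logarithm operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: log from IEEE-754.
For complex numbers: complex logarithm.
For quantized types: dequantize_op_quantize(log, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples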

// %operand: [[1.0, 2.0], [3.0, 4.0]]
%result = "stablehlo.log"(%operand) : (tensor<2x2xf64>) -> tensor<2x2xf64>
// %result: [[0.0, 0.69314718055994529], [1.0986122886681098, 1.3862943611198906]]

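More Examples

log_plus_one

Semantics

Performs element-wise logarithm plus one operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: logp1 from IEEE-754.
For complex numbers: complex logarithm plus one.
For quantized types: dequantize_op_quantize(log_plus_one, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples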

// %operand: [0.0, -0.999, 7.0, 6.38905621, 15.0]
%result = "stablehlo.log_plus_one"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// %result: [0.0, -6.90776825, 2.07944155, 2.0, 2.77258873]
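
A dedicated logp1 operation matters for inputs close to zero, where first adding 1 and then taking the logarithm loses the input to rounding. The sketch below uses Python's math.log1p as a stand-in for the floating-point case; the function name log_plus_one is only an illustrative wrapper, not a StableHLO API.

```python
import math


def log_plus_one(x):
    """Scalar stand-in for the floating-point case of log_plus_one."""
    return math.log1p(x)


tiny = 1e-18
# 1.0 + 1e-18 rounds to exactly 1.0 in double precision, so the naive
# formulation loses the input completely.
naive = math.log(1.0 + tiny)
# log1p keeps the result close to the input for tiny arguments.
accurate = log_plus_one(tiny)
```

Here naive evaluates to 0.0, while accurate stays within rounding error of 1e-18.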

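More Examples

logistic

Semantics

Performs element-wise logistic operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: division(1, addition(1, exp(-x))) from IEEE-754.
For complex numbers: complex logistic.
For quantized types: dequantize_op_quantize(logistic, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples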

// %operand: [[0.0, 1.0], [2.0, 3.0]]
%result = "stablehlo.logistic"(%operand) : (tensor<2x2xf64>) -> tensor<2x2xf64>
// %result: [[0.5, 0.73105858], [0.88079708, 0.95257413]]

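More Examples

map

Note: Per StableHLO v1.0 Cleanup #2283, this op is being explored for deprecation as it appears to be unused by both frameworks and compilers. As such, it has limited compatibility guarantees (6 months).

Semantics

Applies a map function computation to inputs along the dimensions and produces a result tensor.

More formally, result[result_index] = computation(inputs...[result_index]).

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1-C4)
(I2)	`dimensions`	1-dimensional tensor constant of type `si64`	(C3)
(I3)	`computation`	function	(C4)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1), (C4)

Constraints

(C1) shape(inputs...) = shape(result).
(C2) 0 < size(inputs) = N.
(C3) dimensions = range(rank(inputs[0])).
(C4) computation has type (tensor<E0>, ..., tensor<EN-1>) -> tensor<E'> where Ei = element_type(inputs[i]) and E' = element_type(result).

Examples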

// %input0: [[0, 1], [2, 3]]
// %input1: [[4, 5], [6, 7]]
%result = "stablehlo.map"(%input0, %input1) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = stablehlo.multiply %arg0, %arg1 : tensor<i64>
    stablehlo.return %0 : tensor<i64>
}) {
  dimensions = array<i64: 0, 1>
} : (tensor<2x2xi64>, tensor<2x2xi64>) -> tensor<2x2xi64>
// %result: [[0, 5], [12, 21]]

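More Examples

maximum

Semantics

Performs element-wise max operation on tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical OR.
For integers: integer maximum.
For floats: maximum from IEEE-754.
For complex numbers: lexicographic maximum for the (real, imaginary) pair. Imposing an ordering on complex numbers involves surprising semantics, so in the future we are planning to remove support for complex numbers for this operation (#560).
For quantized types:
- dequantize_op_quantize(maximum, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples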

// %lhs: [[1, 2], [7, 8]]
// %rhs: [[5, 6], [3, 4]]
%result = "stablehlo.maximum"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 6], [7, 8]]

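More Examples

minimum

Semantics

Performs element-wise min operation on tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical AND.
For integers: integer minimum.
For floats: minimum from IEEE-754.
For complex numbers: lexicographic minimum for the (real, imaginary) pair. Imposing an ordering on complex numbers involves surprising semantics, so in the future we are planning to remove support for complex numbers for this operation (#560).
For quantized types:
- dequantize_op_quantize(minimum, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples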

// %lhs: [[1, 2], [7, 8]]
// %rhs: [[5, 6], [3, 4]]
%result = "stablehlo.minimum"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[1, 2], [3, 4]]

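More Examples

multiply

Semantics

Performs element-wise product of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical AND.
For integers: integer multiplication.
For floats: multiplication from IEEE-754.
For complex numbers: complex multiplication.
For quantized types:
- dequantize_op_quantize(multiply, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples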

// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.multiply"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 12], [21, 32]]

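More Examples

negate

Semantics

Performs element-wise negation of the operand tensor and produces a result tensor. Depending on the element type, does the following:

For signed integers: integer negation.
For unsigned integers: bitcast to signed integer, integer negation, bitcast back to unsigned integer.
For floats: negate from IEEE-754.
For complex numbers: complex negation.
For quantized types: dequantize_op_quantize(negate, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples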

// Negation operation with integer Tensors
// %operand: [0, -2]
%result = "stablehlo.negate"(%operand) : (tensor<2xi32>) -> tensor<2xi32>
// %result: [0, 2]

// Negation operation with complex tensors
// %operand: [(2.5, 0.0)]
%result = "stablehlo.negate"(%operand) : (tensor<1xcomplex<f32>>) -> tensor<1xcomplex<f32>>
// %result: [(-2.5, -0.0)]

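More Examples

not

Semantics

Performs element-wise NOT of the operand tensor and produces a result tensor. Depending on the element type, does the following:

For booleans: logical NOT.
For integers: bitwise NOT.

Arguments

Name	Type	Constraints
`operand`	tensor of boolean or integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of boolean or integer type	(C1)

Constraints

(C1) type(operand) = type(result).

Examples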

// Bitwise operation with integer tensors
// %operand: [[1, 2], [3, 4]]
%result = "stablehlo.not"(%operand) : (tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[-2, -3], [-4, -5]]

// Logical operation with boolean tensors
// %operand: [true, false]
%result = "stablehlo.not"(%operand) : (tensor<2xi1>) -> tensor<2xi1>
// %result: [false, true]

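More Examples

optimization_barrier

Semantics

Ensures that the operations that produce the operand are executed before any operations that depend on the result and prevents compiler transformations from moving operations across the barrier. Other than that, the operation is an identity, i.e. result = operand.

Arguments

Name	Type	Constraints
`operand`	variadic number of tensors, per-tensor quantized tensors or tokens	(C1)

Outputs

Name	Type	Constraints
`result`	variadic number of tensors, per-tensor quantized tensors or tokens	(C1)

Constraints

(C1) type(operand...) = type(result...).

Examples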

// %operand0: 0.0
// %operand1: 1.0
%result0, %result1 = "stablehlo.optimization_barrier"(%operand0, %operand1) : (tensor<f32>, tensor<f32>) -> (tensor<f32>, tensor<f32>)
// %result0: 0.0
// %result1: 1.0

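More Examples

or

Semantics

Performs element-wise OR of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical OR.
For integers: bitwise OR.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer or boolean type	(C1)
(I2)	`rhs`	tensor of integer or boolean type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer or boolean type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples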

// Bitwise operation with integer tensors
// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.or"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 6], [7, 12]]

// Logical operation with boolean tensors
// %lhs: [[false, false], [true, true]]
// %rhs: [[false, true], [false, true]]
%result = "stablehlo.or"(%lhs, %rhs) : (tensor<2x2xi1>, tensor<2x2xi1>) -> tensor<2x2xi1>
// %result: [[false, true], [true, true]]

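More Examples

outfeed

Semantics

Writes inputs to the outfeed and produces a result token.

Semantics of outfeed_config is implementation-defined.

Inputs

Label	Name	Type
(I1)	`inputs`	variadic number of tensors or quantized tensors
(I2)	`token`	`token`
(I3)	`outfeed_config`	constant of type `string`

Outputs

Name	Type
`result`	`token`

Examples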

%result = "stablehlo.outfeed"(%input0, %token) {
  outfeed_config = ""
} : (tensor<2x2x2xi64>, !stablehlo.token) -> !stablehlo.token

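More Examples

pad

Semantics

Expands operand by padding around the tensor as well as between the elements of the tensor with the given padding_value.

edge_padding_low and edge_padding_high specify the amount of padding added at the low end (next to index 0) and the high end (next to the highest index) of each dimension respectively. The amount of padding can be negative, where the absolute value of negative padding indicates the number of elements to remove from the specified dimension.

interior_padding specifies the amount of padding added between any two elements in each dimension, which may not be negative. Interior padding occurs before edge padding, such that negative edge padding will remove elements from the interior-padded operand.

More formally, result[result_index] is defined as:

operand[operand_index] if result_index = edge_padding_low + operand_index * (interior_padding + 1).
padding_value otherwise.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C2), (C4)
(I2)	`padding_value`	0-dimensional tensor or per-tensor quantized tensor	(C1)
(I3)	`edge_padding_low`	1-dimensional tensor constant of type `si64`	(C1), (C4)
(I4)	`edge_padding_high`	1-dimensional tensor constant of type `si64`	(C1), (C4)
(I5)	`interior_padding`	1-dimensional tensor constant of type `si64`	(C2-C4)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C3-C6)

Constraints

(C1) element_type(operand) = element_type(padding_value) = element_type(result).
(C2) size(edge_padding_low) = size(edge_padding_high) = size(interior_padding) = rank(operand).
(C3) 0 <= interior_padding.
(C4) shape(result) = shape(operand) + edge_padding_low + max(shape(operand) - 1, 0) * interior_padding + edge_padding_high.

Examples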

// %operand: [
//            [1, 2, 3],
//            [4, 5, 6]
//           ]
// %padding_value: 0
%result = "stablehlo.pad"(%operand, %padding_value) {
  edge_padding_low = array<i64: 0, 1>,
  edge_padding_high = array<i64: 2, 1>,
  interior_padding = array<i64: 1, 2>
} : (tensor<2x3xi32>, tensor<i32>) -> tensor<5x9xi32>
// %result: [
//           [0, 1, 0, 0, 2, 0, 0, 3, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0],
//           [0, 4, 0, 0, 5, 0, 0, 6, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0]
//          ]
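
The index formula and constraint (C4) can be sketched in plain Python as follows. The function name pad, the nested-list tensor representation, the explicit operand_shape argument, and the restriction to rank >= 1 are all assumptions of this illustration rather than anything StableHLO defines.

```python
from itertools import product


def pad(operand, operand_shape, padding_value, low, high, interior):
    """Sketch of stablehlo.pad on nested lists (rank >= 1 assumed)."""
    rank = len(operand_shape)
    assert rank >= 1
    assert len(low) == len(high) == len(interior) == rank  # (C2)
    assert all(p >= 0 for p in interior)                   # (C3)

    # (C4): size of every result dimension.
    result_shape = [
        operand_shape[d] + low[d]
        + max(operand_shape[d] - 1, 0) * interior[d] + high[d]
        for d in range(rank)
    ]

    def filled(dims):
        # Start from a result that contains only padding_value.
        if not dims:
            return padding_value
        return [filled(dims[1:]) for _ in range(dims[0])]

    result = filled(result_shape)

    for operand_index in product(*(range(n) for n in operand_shape)):
        result_index = [low[d] + operand_index[d] * (interior[d] + 1)
                        for d in range(rank)]
        # Negative edge padding can push an element out of bounds: drop it.
        if not all(0 <= result_index[d] < result_shape[d] for d in range(rank)):
            continue
        source = operand
        for i in operand_index:
            source = source[i]
        target = result
        for i in result_index[:-1]:
            target = target[i]
        target[result_index[-1]] = source
    return result


padded = pad([[1, 2, 3], [4, 5, 6]], [2, 3], 0,
             low=[0, 1], high=[2, 1], interior=[1, 2])
```

With the arguments of the example above, padded is the same 5x9 tensor as %result.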

More Examples

partition_id

Semantics

Produces the partition_id of the current process.

Outputs

Name	Type
`result`	0-dimensional tensor of type `ui32`

Examples

%result = "stablehlo.partition_id"() : () -> tensor<ui32>

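More Examples

popcnt

Semantics

Performs element-wise count of the number of bits set in the operand tensor and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(operand) = type(result).

Examples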

// %operand: [0, 1, 2, 127]
%result = "stablehlo.popcnt"(%operand) : (tensor<4xi64>) -> tensor<4xi64>
// %result: [0, 1, 1, 7]
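
For a single element, the bit count can be sketched in Python as below. The helper name popcnt and its bit_width parameter are assumptions of this sketch; masking to bit_width treats negative values as two's complement numbers of that width, which is how a fixed-width integer element would be stored.

```python
def popcnt(value, bit_width=64):
    """Counts the set bits of `value` viewed as a bit_width-bit integer."""
    mask = (1 << bit_width) - 1
    return bin(value & mask).count("1")


counts = [popcnt(v) for v in [0, 1, 2, 127]]
```

counts matches the %result of the example above.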

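More Examples

power

Semantics

Performs element-wise exponentiation of the lhs tensor by the rhs tensor and produces a result tensor. Depending on the element type, does the following:

For integers: integer exponentiation.
For floats: pow from IEEE-754.
For complex numbers: complex exponentiation.
For quantized types: dequantize_op_quantize(power, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples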

// %lhs: [-2.0, -0.0, -36.0, 5.0, 3.0, 10000.0]
// %rhs: [2.0, 2.0, 1.1, 2.0, -1.0, 10.0]
%result = "stablehlo.power"(%lhs, %rhs) : (tensor<6xf32>, tensor<6xf32>) -> tensor<6xf32>
// %result: [4.0, 0.0, -nan, 25.0, 0.333333343, inf]

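More Examples

real

Semantics

Extracts the real part, element-wise, from the operand and produces a result tensor. More formally, for each element x: real(x) = is_complex(x) ? real_part(x) : x.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type	(C1), (C2)

Constraints

(C1) shape(result) = shape(operand).
(C2) element_type(result) is defined as:
- complex_element_type(element_type(operand)) if is_complex(operand).
- element_type(operand) otherwise.

Examples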

// %operand: [(1.0, 2.0), (3.0, 4.0)]
%result = "stablehlo.real"(%operand) : (tensor<2xcomplex<f32>>) -> tensor<2xf32>
// %result: [1.0, 3.0]

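More Examples

recv

Semantics

Receives data from a channel with channel_id and produces results.

If is_host_transfer is true, then the operation transfers data from the host. Otherwise, it transfers data from another device. What this means is implementation-defined. This flag duplicates the information provided in channel_type, so in the future we are planning to only keep one of them (#666).

results consist of payload values which come first and a token which comes last. In the future, we are planning to split the payload and the token into two separate outputs to improve clarity (#670).

Inputs

Label	Name	Type	Constraints
(I1)	`token`	`token`	(C4)
(I2)	`channel_id`	constant of type `si64`
(I3)	`channel_type`	enum of `DEVICE_TO_DEVICE` and `HOST_TO_DEVICE`	(C1)
(I4)	`is_host_transfer`	constant of type `i1`	(C1)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors, quantized tensors or tokens	(C2-C4)

Constraints

(C1) channel_type is defined as:
- HOST_TO_DEVICE if is_host_transfer = true,
- DEVICE_TO_DEVICE otherwise.
(C2) 0 < size(results).
(C3) is_empty(results[:-1]) or is_tensor(type(results[:-1])).
(C4) is_token(type(results[-1])).

Examples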

%results0, %results1 = "stablehlo.recv"(%token) {
  channel_handle = #stablehlo.channel_handle<handle = 1, type = 3>,
  is_host_transfer = true
} : (!stablehlo.token) -> (tensor<2x2xi64>, !stablehlo.token)

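More Examples

reduce

Semantics

Applies a reduction function body to inputs and init_values along the dimensions and produces results tensors.

The order of reductions is implementation-defined, which means that body and init_values must form a monoid to guarantee that the operation produces the same results for all inputs on all implementations. However, this condition doesn't hold for many popular reductions. E.g. floating-point addition for body and zero for init_values don't actually form a monoid because floating-point addition is not associative.

More formally, results...[j0, ..., jR-1] = reduce(input_slices_converted) where:

input_slices = inputs...[j0, ..., :, ..., jR-1], where : are inserted at dimensions.
input_slices_converted = to_destination_type(input_slices..., type(func_inputs(body)[:len(func_inputs(body))//2])...).
init_values_converted = to_destination_type(init_values..., type(func_inputs(body)[len(func_inputs(body))//2:])...).
reduce(input_slices_converted) = exec(schedule) for some binary tree schedule where:
- exec(node) = body(exec(node.left), exec(node.right)).
- exec(leaf) = leaf.value.
schedule is an implementation-defined full binary tree whose in-order traversal consists of:
- input_slices_converted...[index] values, for all index in index_space(input_slices_converted) in the ascending lexicographic order of index.
- Interspersed with an implementation-defined amount of init_values_converted at implementation-defined positions.

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1-C4), (C6), (C7)
(I2)	`init_values`	variadic number of 0-dimensional tensors or per-tensor quantized tensors	(C2), (C3)
(I3)	`dimensions`	1-dimensional tensor constant of type `si64`	(C4), (C5), (C7)
(I4)	`body`	function	(C6)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C3), (C7), (C8)

Constraints

(C1) same(shape(inputs...)).
(C2) element_type(inputs...) = element_type(init_values...).
(C3) 0 < size(inputs) = size(init_values) = size(results) = N.
(C4) 0 <= dimensions < rank(inputs[0]).
(C5) is_unique(dimensions).
(C6) body has type (tensor<E0>, ..., tensor<EN-1>, tensor<E0>, ..., tensor<EN-1>) -> (tensor<E0>, ..., tensor<EN-1>) where is_promotable(element_type(inputs[i]), Ei).
(C7) shape(results...) = shape(inputs...) except that the dimension sizes of inputs... corresponding to dimensions are not included.
(C8) element_type(results[i]) = Ei for all i in [0,N).

Examples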

// %input = [[0, 1, 2, 3, 4, 5]]
// %init_value = 0
%result = "stablehlo.reduce"(%input, %init_value) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  dimensions = array<i64: 1>
} : (tensor<1x6xi64>, tensor<i64>) -> tensor<1xi64>
// %result = [15]
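
To make the remark about implementation-defined reduction order concrete, the sketch below evaluates the same floating-point sum under two different schedules. Both schedules are hypothetical choices picked for this illustration; a real implementation may use any full binary tree over the same in-order sequence.

```python
from functools import reduce as fold


def body(x, y):
    # The reduction function: plain floating-point addition.
    return x + y


values = [0.1, 0.2, 0.3]
init_value = 0.0

# Schedule A: ((init + v0) + v1) + v2
left_to_right = fold(body, values, init_value)

# Schedule B: v0 + (v1 + (v2 + init))
right_to_left = fold(lambda acc, v: body(v, acc), reversed(values), init_value)

# With integers, addition is associative and the schedule does not matter.
int_values = [0, 1, 2, 3, 4, 5]
int_a = fold(body, int_values, 0)
int_b = fold(lambda acc, v: body(v, acc), reversed(int_values), 0)
```

The two floating-point results differ in the last bit (0.6000000000000001 versus 0.6), while both integer results are 15, matching the example above.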

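More Examples

reduce_precision

Semantics

Performs element-wise conversion of operand to another floating-point type that uses exponent_bits and mantissa_bits and back to the original floating-point type and produces an output tensor.

More formally:

The mantissa bits of the original value are updated to round the original value to the nearest value representable with mantissa_bits using roundToIntegralTiesToEven semantics.
Then, if mantissa_bits is smaller than the number of mantissa bits of the original value, the mantissa bits are truncated to mantissa_bits.
Then, if the exponent bits of the intermediate result don't fit into the range provided by exponent_bits, the intermediate result overflows to infinity using the original sign or underflows to zero using the original sign.
For quantized types, performs dequantize_op_quantize(lambda operand: reduce_precision(operand, exponent_bits, mantissa_bits), operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)
(I2)	`exponent_bits`	constant of type `si32`	(C2)
(I3)	`mantissa_bits`	constant of type `si32`	(C3)

Outputs

Name	Type	Constraints
`output`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(output).
(C2) 1 <= exponent_bits.
(C3) 0 <= mantissa_bits.

Examples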

// Logical values: +Inf, NaN, +Denormal, 0.0, 65519.0, 65520.0
// %operand: [0x7FF0000000000000, 0x7FFFFFFFFFFFFFFF, 0x0000000000000001, 0.0, 65519.0, 65520.0]
%output = "stablehlo.reduce_precision"(%operand) {
  exponent_bits = 5 : i32,
  mantissa_bits = 10 : i32
} : (tensor<6xf64>) -> tensor<6xf64>
// Logical values: +Inf, NaN, 0.0, 0.0, 65504.0, +Inf
// %output: [0x7FF0000000000000, 0x7FFFFFFFFFFFFFFF, 0.0, 0.0, 65504.0, 0x7FF0000000000000]

More Examples

reduce_scatter

Semantics

[Figure: reduce_scatter]

Within each process group in the StableHLO process grid, performs reduction, using computation, over the values of the operand tensor from each process, splits the reduction result along scatter_dimension into parts, and scatters the split parts between the processes to produce the result.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0 and use_global_device_ids = false.
cross_replica_and_partition(replica_groups) if channel_id > 0 and use_global_device_ids = false.
flattened_ids(replica_groups) if channel_id > 0 and use_global_device_ids = true.

Afterwards, within each process_group:

reduced_value = all_reduce(operand, replica_groups, channel_id, use_global_device_ids, computation).
parts@sender = split(reduced_value@sender, dim(process_groups, 1), scatter_dimension).
result@receiver = parts@sender[receiver_index] for all sender in process_group, where receiver_index = process_group.index(receiver).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C2), (C7), (C8)
(I2)	`scatter_dimension`	constant of type `si64`	(C1), (C2), (C8)
(I3)	`replica_groups`	2-dimensional tensor constant of type `si64`	(C3-C5)
(I4)	`channel_id`	constant of type `si64`	(C6)
(I5)	`use_global_device_ids`	constant of type `i1`	(C6)
(I6)	`computation`	function	(C7)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C8-C9)

Constraints

(C1) dim(operand, scatter_dimension) % dim(process_groups, 1) = 0.
(C2) 0 <= scatter_dimension < rank(operand).
(C3) is_unique(replica_groups).
(C4) size(replica_groups) is defined as:
- num_replicas if cross_replica is used.
- num_replicas if cross_replica_and_partition is used.
- num_processes if flattened_ids is used.
(C5) 0 <= replica_groups < size(replica_groups).
(C6) If use_global_device_ids = true, then channel_id > 0.
(C7) computation has type (tensor<E>, tensor<E>) -> (tensor<E>) where is_promotable(element_type(operand), E).
(C8) shape(result) = shape(operand) except:
- dim(result, scatter_dimension) = dim(operand, scatter_dimension) / dim(process_groups, 1).
(C9) element_type(result) = E.

Examples

// num_replicas: 2
// num_partitions: 1
// %operand@(0, 0): [[1, 2, 3, 4],
//                   [5, 6, 7, 8]]
// %operand@(1, 0): [[9, 10, 11, 12],
//                   [13, 14, 15, 16]]
%result = "stablehlo.reduce_scatter"(%operand) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
  %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
  "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  scatter_dimension = 1 : i64,
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>,
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
} : (tensor<2x4xi64>) -> tensor<2x2xi64>
//
// %result@(0, 0): [[10, 12],
//                  [18, 20]]
// %result@(1, 0): [[14, 16],
//                  [22, 24]]
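
The reduce-then-split behavior within a single process group can be simulated in plain Python. The function name reduce_scatter_2d, the restriction to 2-dimensional nested lists, and the modeling of processes as list entries are simplifying assumptions of this sketch; it does not model process grids, channels, or replica_groups.

```python
def reduce_scatter_2d(operands, scatter_dimension, computation):
    """Simulates reduce_scatter for one process group.

    operands: one 2-D nested list per process in the group.
    Returns one result per process, in process-group order.
    """
    num_processes = len(operands)
    rows, cols = len(operands[0]), len(operands[0][0])

    # all_reduce step: combine the operands of all processes element-wise.
    reduced = [row[:] for row in operands[0]]
    for operand in operands[1:]:
        for r in range(rows):
            for c in range(cols):
                reduced[r][c] = computation(reduced[r][c], operand[r][c])

    # split step: cut the reduced value into equal parts (C1).
    size = (rows, cols)[scatter_dimension]
    assert size % num_processes == 0
    part = size // num_processes

    results = []
    for receiver_index in range(num_processes):
        lo, hi = receiver_index * part, (receiver_index + 1) * part
        if scatter_dimension == 0:
            results.append([row[:] for row in reduced[lo:hi]])
        else:
            results.append([row[lo:hi] for row in reduced])
    return results


results = reduce_scatter_2d(
    [[[1, 2, 3, 4], [5, 6, 7, 8]],
     [[9, 10, 11, 12], [13, 14, 15, 16]]],
    scatter_dimension=1,
    computation=lambda x, y: x + y,
)
```

results[0] and results[1] correspond to %result@(0, 0) and %result@(1, 0) in the example above.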

More Examples

reduce_window

Semantics

Applies a reduction function body to windows of inputs and init_values and produces results.

The following diagram shows how elements in results... are computed from inputs... using a concrete example.

[Figure: reduce_window]

More formally, results...[result_index] = reduce(windows, init_values, axes(inputs...), body) (see reduce) where:

padded_inputs = pad(inputs..., init_values..., padding[:, 0], padding[:, 1], base_dilations - 1).
window_start = result_index * window_strides.
window_end = window_start + (window_dimensions - 1) * window_dilations + 1.
windows = slice(padded_inputs..., window_start, window_end, window_dilations).

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1-C4), (C6), (C8), (C10), (C12), (C13), (C15)
(I2)	`init_values`	variadic number of 0-dimensional tensors or per-tensor quantized tensors	(C1), (C13)
(I3)	`window_dimensions`	1-dimensional tensor constant of type `si64`	(C4), (C5), (C15)
(I4)	`window_strides`	1-dimensional tensor constant of type `si64`	(C6), (C7), (C15)
(I5)	`base_dilations`	1-dimensional tensor constant of type `si64`	(C8), (C9), (C15)
(I6)	`window_dilations`	1-dimensional tensor constant of type `si64`	(C10), (C11), (C15)
(I7)	`padding`	2-dimensional tensor constant of type `si64`	(C12), (C15)
(I8)	`body`	function	(C13)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C1), (C14-C16)

Constraints

(C1) 0 < size(inputs) = size(init_values) = size(results) = N.
(C2) same(shape(inputs...)).
(C3) element_type(inputs...) = element_type(init_values...).
(C4) size(window_dimensions) = rank(inputs[0]).
(C5) 0 < window_dimensions.
(C6) size(window_strides) = rank(inputs[0]).
(C7) 0 < window_strides.
(C8) size(base_dilations) = rank(inputs[0]).
(C9) 0 < base_dilations.
(C10) size(window_dilations) = rank(inputs[0]).
(C11) 0 < window_dilations.
(C12) shape(padding) = [rank(inputs[0]), 2].
(C13) body has type (tensor<E0>, ..., tensor<EN-1>, tensor<E0>, ..., tensor<EN-1>) -> (tensor<E0>, ..., tensor<EN-1>) where is_promotable(element_type(inputs[i]), Ei).
(C14) same(shape(results...)).
(C15) shape(results[0]) = num_windows where:
- dilated_input_shape = shape(inputs[0]) = 0 ? 0 : (shape(inputs[0]) - 1) * base_dilations + 1.
- padded_input_shape = padding[:, 0] + dilated_input_shape + padding[:, 1].
- dilated_window_shape = (window_dimensions - 1) * window_dilations + 1.
- is_empty_window = padded_input_shape = 0 || dilated_window_shape > padded_input_shape.
- num_windows = is_empty_window ? 0 : floor((padded_input_shape - dilated_window_shape) / window_strides) + 1.
(C16) element_type(results[i]) = Ei for all i in [0,N).

Examples

// %input = [[1, 2], [3, 4], [5, 6]]
// %init_value = 0
%result = "stablehlo.reduce_window"(%input, %init_value) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  window_dimensions = array<i64: 2, 1>,
  window_strides = array<i64: 4, 1>,
  base_dilations = array<i64: 2, 1>,
  window_dilations = array<i64: 3, 1>,
  padding = dense<[[2, 1], [0, 0]]> : tensor<2x2xi64>
} : (tensor<3x2xi64>, tensor<i64>) -> tensor<2x2xi64>
// %result = [[0, 0], [3, 4]]
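
The result shape prescribed by constraint (C15) can be computed one dimension at a time, as in the sketch below. The function name reduce_window_result_shape and the use of plain Python lists for the attributes are assumptions of this illustration, and only the shape is computed, not the reduced values.

```python
def reduce_window_result_shape(input_shape, window_dimensions, window_strides,
                               base_dilations, window_dilations, padding):
    """Computes num_windows from (C15), one dimension at a time."""
    result_shape = []
    for d, size in enumerate(input_shape):
        # Base dilation inserts (base_dilations - 1) holes between elements.
        dilated_input = 0 if size == 0 else (size - 1) * base_dilations[d] + 1
        padded_input = padding[d][0] + dilated_input + padding[d][1]
        # Window dilation spreads the window elements apart.
        dilated_window = (window_dimensions[d] - 1) * window_dilations[d] + 1
        is_empty_window = padded_input == 0 or dilated_window > padded_input
        if is_empty_window:
            result_shape.append(0)
        else:
            result_shape.append(
                (padded_input - dilated_window) // window_strides[d] + 1)
    return result_shape


shape = reduce_window_result_shape(
    input_shape=[3, 2],
    window_dimensions=[2, 1],
    window_strides=[4, 1],
    base_dilations=[2, 1],
    window_dilations=[3, 1],
    padding=[[2, 1], [0, 0]],
)
```

For the attributes of the example above this gives [2, 2], which agrees with the tensor<2x2xi64> result type.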

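More Examples

remainder

Semantics

Performs element-wise remainder of dividend lhs and divisor rhs tensors and produces a result tensor.

More formally, the sign of the result is taken from the dividend, and the absolute value of the result is always less than the divisor's absolute value. The remainder is calculated as lhs - d * rhs, where d is given by:

For integers: stablehlo.divide(lhs, rhs).
For floats: division(lhs, rhs) from IEEE-754 with rounding attribute roundTowardZero.
For complex numbers: TBD (#997).
For quantized types:
- dequantize_op_quantize(remainder, lhs, rhs, type(result)).

For floating-point element types, this operation is in contrast with the remainder operation from the IEEE-754 specification, where d is an integral value nearest to the exact value of lhs/rhs with ties to even.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples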

// %lhs: [17, -17, 17, -17]
// %rhs: [3, 3, -3, -3]
%result = "stablehlo.remainder"(%lhs, %rhs) : (tensor<4xi64>, tensor<4xi64>) -> tensor<4xi64>
// %result: [2, -2, 2, -2]
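
The difference between a truncating remainder and the IEEE-754 remainder can be seen with scalar Python stand-ins. The helper names are illustrative only. math.fmod follows the truncating definition (result takes the sign of the dividend), whereas math.remainder implements the IEEE-754 operation; the integer helper assumes that integer division discards the fractional part, which is consistent with the example above.

```python
import math


def remainder_int(lhs, rhs):
    """lhs - d * rhs with d = lhs / rhs truncated toward zero."""
    d = abs(lhs) // abs(rhs)
    if (lhs < 0) != (rhs < 0):
        d = -d
    return lhs - d * rhs


def remainder_float(lhs, rhs):
    """Truncating floating-point remainder; sign follows the dividend."""
    return math.fmod(lhs, rhs)


ints = [remainder_int(l, r)
        for l, r in zip([17, -17, 17, -17], [3, 3, -3, -3])]

truncating = remainder_float(5.0, 3.0)  # d = 1, so 5 - 1 * 3
ieee = math.remainder(5.0, 3.0)         # d = 2, so 5 - 2 * 3
```

ints reproduces the example result, while truncating and ieee show how the two floating-point definitions diverge for the same inputs.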

More Examples

replica_id

Semantics

Produces the replica_id of the current process.

Outputs

Name	Type
`result`	0-dimensional tensor of type `ui32`

Examples

%result = "stablehlo.replica_id"() : () -> tensor<ui32>

More Examples

reshape

Semantics

Performs reshape of the operand tensor to a result tensor. Conceptually, it amounts to keeping the same canonical representation but potentially changing the shape, e.g. from tensor<2x3xf32> to tensor<3x2xf32> or tensor<6xf32>.

More formally, result[result_index] = operand[operand_index] where result_index and operand_index have the same position in the lexicographic ordering of index_space(result) and index_space(operand).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C3)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1-C3)

Constraints

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand) except that quantization_dimension(operand) and quantization_dimension(result) may differ, otherwise.
(C2) size(operand) = size(result).
(C3) If is_per_axis_quantized(operand):
- reduce(dims(operand, [0, 1, ..., quantization_dimension(operand) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y) = reduce(dims(result, [0, 1, ..., quantization_dimension(result) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y).
- dim(operand, quantization_dimension(operand)) = dim(result, quantization_dimension(result)).
- reduce(dims(operand, [quantization_dimension(operand) + 1, ..., rank(operand) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y) = reduce(dims(result, [quantization_dimension(result) + 1, ..., rank(result) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y).

Examples

// %operand: [[1, 2, 3], [4, 5, 6]]
%result = "stablehlo.reshape"(%operand) : (tensor<2x3xi32>) -> tensor<3x2xi32>
// %result: [[1, 2], [3, 4], [5, 6]]
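
Because both index spaces are traversed in lexicographic order, a reshape can be pictured as flattening the operand and regrouping the elements. The helpers flatten and reshape below are illustrative names operating on nested lists, and quantized types are not modeled.

```python
def flatten(tensor):
    """Lists the elements in lexicographic order of their indices."""
    if not isinstance(tensor, list):
        return [tensor]
    return [value for sub in tensor for value in flatten(sub)]


def reshape(operand, result_shape):
    """Regroups the flattened operand into the requested shape."""
    flat = flatten(operand)
    size = 1
    for n in result_shape:
        size *= n
    assert len(flat) == size  # constraint (C2)

    def build(values, dims):
        if not dims:
            return values[0]
        if dims[0] == 0:
            return []
        step = len(values) // dims[0]
        return [build(values[i * step:(i + 1) * step], dims[1:])
                for i in range(dims[0])]

    return build(flat, list(result_shape))


as_3x2 = reshape([[1, 2, 3], [4, 5, 6]], (3, 2))
as_6 = reshape([[1, 2, 3], [4, 5, 6]], (6,))
```

as_3x2 matches the example above, and as_6 shows the same elements as a rank-1 tensor.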

More Examples

reverse

Semantics

Reverses the order of elements in the operand along the specified dimensions and produces a result tensor. More formally, result[result_index] = operand[operand_index] where:

operand_index[d] = dim(result, d) - result_index[d] - 1 if d in dimensions.
operand_index[d] = result_index[d] otherwise.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C3)
(I2)	`dimensions`	1-dimensional tensor constant of type `si64`	(C2), (C3)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1), (C3)

Constraints

(C1) type(operand) = type(result).
(C2) is_unique(dimensions).
(C3) 0 <= dimensions < rank(result).

Examples

// %operand = [[1, 2], [3, 4], [5, 6]]
%result = "stablehlo.reverse"(%operand) {
  dimensions = array<i64: 1>
} : (tensor<3x2xi32>) -> tensor<3x2xi32>
// %result: [[2, 1], [4, 3], [6, 5]]
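
On nested lists, the index mapping amounts to reversing the list at every nesting level whose axis number appears in dimensions. The recursive helper reverse below is an illustrative sketch under that representation, not a StableHLO API.

```python
def reverse(operand, dimensions, axis=0):
    """Reverses a nested-list tensor along the given dimensions."""
    if not isinstance(operand, list):
        return operand  # reached a scalar element
    items = [reverse(sub, dimensions, axis + 1) for sub in operand]
    return items[::-1] if axis in dimensions else items


along_1 = reverse([[1, 2], [3, 4], [5, 6]], [1])
along_0 = reverse([[1, 2], [3, 4], [5, 6]], [0])
```

along_1 matches the example above; along_0 reverses the order of the rows instead.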

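More Examples

rng

Note: Per StableHLO v1.0 Cleanup #2283, this op is being explored for deprecation as it appears to be unused by both frameworks and compilers. As such, it has limited compatibility guarantees (6 months).

Semantics

Generates random numbers using the rng_distribution algorithm and produces a result tensor of a given shape shape.

If rng_distribution = UNIFORM, then the random numbers are generated following the uniform distribution over the interval [a, b). If a >= b, the behavior is undefined.

If rng_distribution = NORMAL, then the random numbers are generated following the normal distribution with mean = a and standard deviation = b. If b < 0, the behavior is undefined.

The exact way random numbers are generated is implementation-defined. For example, they may or may not be deterministic, and they may or may not use hidden state.

In conversations with many stakeholders, this op has come up as effectively deprecated, so in the future we are planning to explore removing it (#597).

Inputs

Label	Name	Type	Constraints
(I1)	`a`	0-dimensional tensor of integer, boolean, or floating-point type	(C1), (C2)
(I2)	`b`	0-dimensional tensor of integer, boolean, or floating-point type	(C1), (C2)
(I3)	`shape`	1-dimensional tensor constant of type `si64`	(C3)
(I4)	`rng_distribution`	enum of `UNIFORM` and `NORMAL`	(C2)

Outputs

Name	Type	Constraints
`result`	tensor of integer, boolean, or floating-point type	(C1-C3)

Constraints

(C1) element_type(a) = element_type(b) = element_type(result).
(C2) If rng_distribution = NORMAL, then is_float(a).
(C3) shape(result) = shape.

Examples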

// %a = 0
// %b = 2
// %shape = [3, 3]
%result = "stablehlo.rng"(%a, %b, %shape) {
  rng_distribution = #stablehlo<rng_distribution UNIFORM>
} : (tensor<i32>, tensor<i32>, tensor<2xi64>) -> tensor<3x3xi32>
// %result: [
//           [1, 0, 1],
//           [1, 1, 1],
//           [0, 0, 0]
//          ]

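rng_bit_generator

Semantics

Returns an output filled with uniform random bits and an updated output state output_state using the pseudorandom number generator algorithm rng_algorithm given an initial state initial_state. The output is guaranteed to be a deterministic function of initial_state, but it is not guaranteed to be deterministic between implementations.

rng_algorithm is one of the following:

DEFAULT: Implementation-defined algorithm.
THREE_FRY: Implementation-defined variant of the Threefry algorithm.*
PHILOX: Implementation-defined variant of the Philox algorithm.*

* See: Salmon et al. SC 2011. Parallel random numbers: as easy as 1, 2, 3.

Inputs

Label	Name	Type	Constraints
(I1)	`rng_algorithm`	enum of `DEFAULT`, `THREE_FRY`, and `PHILOX`	(C2)
(I2)	`initial_state`	1-dimensional tensor of type `ui64`	(C1), (C2)

Outputs

Name	Type	Constraints
`output_state`	1-dimensional tensor of type `ui64`	(C1)
`output`	tensor of integer or floating-point type

Constraints

(C1) type(initial_state) = type(output_state).
(C2) size(initial_state) is defined as:
- implementation-defined if rng_algorithm = DEFAULT.
- 2 if rng_algorithm = THREE_FRY.
- 2 or 3 if rng_algorithm = PHILOX.

Examples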

// %initial_state: [1, 2]
%output_state, %output = "stablehlo.rng_bit_generator"(%initial_state) {
  rng_algorithm = #stablehlo<rng_algorithm THREE_FRY>
} : (tensor<2xui64>) -> (tensor<2xui64>, tensor<2x2xui64>)
// %output_state: [1, 6]
// %output: [
//           [9236835810183407956, 16087790271692313299],
//           [18212823393184779219, 2658481902456610144]
//          ]

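round_nearest_afz

Semantics

Performs element-wise rounding towards the nearest integer, breaking ties away from zero, on the operand tensor and produces a result tensor. Implements the roundToIntegralTiesToAway operation from the IEEE-754 specification. For quantized types, performs dequantize_op_quantize(round_nearest_afz, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples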

// %operand = [-2.5, 0.4, 0.5, 0.6, 2.5]
%result = "stablehlo.round_nearest_afz"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// %result: [-3.0, 0.0, 1.0, 1.0, 3.0]

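More Examples

round_nearest_even

Semantics

Performs element-wise rounding towards the nearest integer, breaking ties towards the even integer, on the operand tensor and produces a result tensor. Implements the roundToIntegralTiesToEven operation from the IEEE-754 specification. For quantized types, performs dequantize_op_quantize(round_nearest_even, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples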

// %operand = [-2.5, 0.4, 0.5, 0.6, 2.5]
%result = "stablehlo.round_nearest_even"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// %result: [-2.0, 0.0, 0.0, 1.0, 2.0]
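
The two tie-breaking rules of round_nearest_afz and round_nearest_even can be compared with scalar Python stand-ins. The function bodies below are one possible sketch for double-precision inputs, not the way any particular implementation computes them; Python's built-in round already breaks ties towards the even integer.

```python
import math


def round_nearest_afz(x):
    """Rounds to the nearest integer, ties away from zero."""
    if not math.isfinite(x):
        return x  # NaN and infinities pass through unchanged
    truncated = math.trunc(x)
    if abs(x - truncated) >= 0.5:
        truncated += 1 if x > 0 else -1
    # copysign keeps the sign of zero, e.g. -0.4 rounds to -0.0.
    return math.copysign(float(truncated), x)


def round_nearest_even(x):
    """Rounds to the nearest integer, ties towards the even integer."""
    if not math.isfinite(x):
        return x
    return math.copysign(float(round(x)), x)


operand = [-2.5, 0.4, 0.5, 0.6, 2.5]
afz = [round_nearest_afz(v) for v in operand]
even = [round_nearest_even(v) for v in operand]
```

For this operand the two lists differ exactly at the tie values -2.5, 0.5 and 2.5, as in the two examples above.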

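More Examples

rsqrt

Semantics

Performs element-wise reciprocal square root operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: rSqrt from IEEE-754.
For complex numbers: complex reciprocal square root.
For quantized types: dequantize_op_quantize(rsqrt, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples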

// %operand: [[1.0, 4.0], [9.0, 25.0]]
%result = "stablehlo.rsqrt"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[1.0, 0.5], [0.33333343, 0.2]]

More Examples

scatter

Semantics

Produces results tensors which are equal to inputs tensors except that several slices specified by scatter_indices are updated with the values updates using update_computation.

The following diagram shows how elements in updates... map on elements in results... using a concrete example. The diagram picks a few example updates... indices and explains in detail which results... indices they correspond to.

[Figure: scatter]

More formally, for all update_index in index_space(updates[0]):

update_scatter_dims = [d for d in axes(updates[0]) and d not in update_window_dims].
update_scatter_index = update_index[update_scatter_dims...].
start_index is defined as:
- scatter_indices[si0, ..., :, ..., siN] where si are individual elements in update_scatter_index and : is inserted at the index_vector_dim index, if index_vector_dim < rank(scatter_indices).
- [scatter_indices[update_scatter_index]] otherwise.
For d_input in axes(inputs[0]),
- full_start_index[d_input] = start_index[d_start] if d_input = scatter_dims_to_operand_dims[d_start].
- full_start_index[d_input] = 0 otherwise.
For d_input in axes(inputs[0]),
- full_batching_index[d_input] = update_scatter_index[d_start - (d_start < index_vector_dim ? 0 : 1)] if d_input = input_batching_dims[i_batching] and d_start = scatter_indices_batching_dims[i_batching].
- full_batching_index[d_input] = 0 otherwise.
update_window_index = update_index[update_window_dims...].
full_window_index = [wi0, ..., 0, ..., wiN] where wi are individual elements in update_window_index, and 0 is inserted at indices from inserted_window_dims and input_batching_dims.
result_index = full_start_index + full_batching_index + full_window_index.

Given that, results = exec(schedule, inputs), where:

schedule is an implementation-defined permutation of index_space(updates[0]).
exec([update_index, ...], results) = exec([...], updated_results) where:
- If result_index is in bounds for shape(results...):
  - updates_converted = to_destination_type(updates...[update_index], type(func_inputs(update_computation)[len(func_inputs(update_computation))//2:])...)
  - updated_values = update_computation(results...[result_index], updates_converted)
  - updated_results is a copy of results with results...[result_index] set to updated_values....
- Otherwise:
  - updated_results = results.
exec([], results) = results.

If indices_are_sorted is true, then the implementation can assume that scatter_indices are sorted with respect to scatter_dims_to_operand_dims, otherwise the behavior is undefined. More formally, for all i1 < i2 from indices(result), full_start_index(i1) <= full_start_index(i2).

If unique_indices is true, then the implementation can assume that all result_index indices being scattered to are unique. If unique_indices is true but the indices being scattered to are not unique, then the behavior is undefined.

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1), (C2), (C4-C6), (C11), (C13), (C18), (C21), (C23-C24)
(I2)	`scatter_indices`	tensor of integer type	(C4), (C15), (C19), (C22)
(I3)	`updates`	variadic number of tensors or per-tensor quantized tensors	(C3-C6), (C8)
(I4)	`update_window_dims`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C7-C8)
(I5)	`inserted_window_dims`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C9-C11)
(I6)	`input_batching_dims`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C9), (C12-13), (C17-18), (C20)
(I7)	`scatter_indices_batching_dims`	1-dimensional tensor constant of type `si64`	(C14-C18)
(I8)	`scatter_dims_to_operand_dims`	1-dimensional tensor constant of type `si64`	(C19-C21)
(I9)	`index_vector_dim`	constant of type `si64`	(C4), (C16), (C19), (C22)
(I10)	`indices_are_sorted`	constant of type `i1`
(I11)	`unique_indices`	constant of type `i1`
(I12)	`update_computation`	function	(C23)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C24-C25)

Constraints

(C1) same(shape(inputs...)).
(C2) rank(inputs[0]) = size(update_window_dims) + size(inserted_window_dims) + size(input_batching_dims).
(C3) same(shape(updates...)).
(C4) shape(updates[0]) = combine(update_scatter_dim_sizes, update_window_dim_sizes), where:
- update_scatter_dim_sizes = shape(scatter_indices), except that the dimension size of scatter_indices corresponding to index_vector_dim is not included.
- update_window_dim_sizes <= shape(inputs[0]), except that the dimension sizes in inputs[0] corresponding to inserted_window_dims and input_batching_dims are not included.
- combine puts update_scatter_dim_sizes at axes corresponding to update_scatter_dims and update_window_dim_sizes at axes corresponding to update_window_dims.
(C5) 0 < size(inputs) = size(updates) = N.
(C6) element_type(updates...) = element_type(inputs...).
(C7) is_unique(update_window_dims) and is_sorted(update_window_dims).
(C8) 0 <= update_window_dims < rank(updates[0]).
(C9) is_unique(concatenate(inserted_window_dims, input_batching_dims)).
(C10) is_sorted(inserted_window_dims).
(C11) 0 <= inserted_window_dims < rank(inputs[0]).
(C12) is_sorted(input_batching_dims).
(C13) 0 <= input_batching_dims < rank(inputs[0]).
(C14) is_unique(scatter_indices_batching_dims).
(C15) 0 <= scatter_indices_batching_dims < rank(scatter_indices).
(C16) index_vector_dim not in scatter_indices_batching_dims.
(C17) size(input_batching_dims) == size(scatter_indices_batching_dims).
(C18) dim(inputs[0], input_batching_dims...) = dim(scatter_indices, scatter_indices_batching_dims...).
(C19) size(scatter_dims_to_operand_dims) = index_vector_dim < rank(scatter_indices) ? dim(scatter_indices, index_vector_dim) : 1.
(C20) is_unique(concatenate(scatter_dims_to_operand_dims, input_batching_dims)).
(C21) 0 <= scatter_dims_to_operand_dims < rank(inputs[0]).
(C22) 0 <= index_vector_dim <= rank(scatter_indices).
(C23) update_computation has type (tensor<E0>, ..., tensor<EN-1>, tensor<E0>, ..., tensor<EN-1>) -> (tensor<E0>, ..., tensor<EN-1>), where is_promotable(element_type(inputs[i]), Ei).
(C24) shape(inputs...) = shape(results...).
(C25) element_type(results[i]) = Ei for all i in [0,N).

Examples

// %input: [
//          [
//           [[1, 2], [3, 4], [5, 6], [7, 8]],
//           [[9, 10],[11, 12], [13, 14], [15, 16]],
//           [[17, 18], [19, 20], [21, 22], [23, 24]]
//          ],
//          [
//           [[25, 26], [27, 28], [29, 30], [31, 32]],
//           [[33, 34], [35, 36], [37, 38], [39, 40]],
//           [[41, 42], [43, 44], [45, 46], [47, 48]]
//          ]
//         ]
// %scatter_indices: [
//                    [
//                     [[0, 0], [1, 0], [2, 1]],
//                     [[0, 1], [1, 1], [0, 9]]
//                    ],
//                    [
//                     [[0, 0], [2, 1], [2, 2]],
//                     [[1, 2], [0, 1], [1, 0]]
//                    ]
//                   ]
// %update: [
//           [
//            [[1, 1], [1, 1], [1, 1]],
//            [[1, 1], [1, 1], [1, 1]]
//           ],
//           [
//            [[1, 1], [1, 1], [1, 1]],
//            [[1, 1], [1, 1], [1, 1]]
//           ]
//          ]
%result = "stablehlo.scatter"(%input, %scatter_indices, %update) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  scatter_dimension_numbers = #stablehlo.scatter<
    update_window_dims = [3, 4],
    inserted_window_dims = [1],
    input_batching_dims = [0],
    scatter_indices_batching_dims = [1],
    scatter_dims_to_operand_dims = [2, 1],
    index_vector_dim = 3>,
  indices_are_sorted = false,
  unique_indices = false
} : (tensor<2x3x4x2xi64>, tensor<2x2x3x2xi64>, tensor<2x2x3x2x2xi64>) -> tensor<2x3x4x2xi64>
// %result: [
//           [
//            [[3, 4], [6, 7], [6, 7], [7, 8]],
//            [[9, 10],[11, 12], [15, 16], [17, 18]],
//            [[17, 18], [19, 20], [22, 23], [24, 25]]
//           ],
//           [
//            [[25, 26], [28, 29], [30, 31], [31, 32]],
//            [[35, 36], [38, 39], [38, 39], [39, 40]],
//            [[41, 42], [44, 45], [46, 47], [47, 48]]
//           ]
//          ]

More Examples

select

Semantics

Produces a result tensor where each element is selected from the on_true or on_false tensor based on the value of the corresponding element of pred. More formally, result[result_index] = pred_element ? on_true[result_index] : on_false[result_index], where pred_element = rank(pred) = 0 ? pred[] : pred[result_index]. For quantized types, performs dequantize_select_quantize(pred, on_true, on_false, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`pred`	tensor of type `i1`	(C1)
(I2)	`on_true`	tensor or per-tensor quantized tensor	(C1-C2)
(I3)	`on_false`	tensor or per-tensor quantized tensor	(C2)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C2)

Constraints

(C1) rank(pred) = 0 or shape(pred) = shape(on_true).
(C2) baseline_type(on_true) = baseline_type(on_false) = baseline_type(result).

Examples

// %pred: [[false, true], [true, false]]
// %on_true: [[1, 2], [3, 4]]
// %on_false: [[5, 6], [7, 8]]
%result = "stablehlo.select"(%pred, %on_true, %on_false) : (tensor<2x2xi1>, tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 2], [3, 8]]
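
The selection rule above, including the rank-0 pred case, can be sketched in plain Python. This is an illustrative sketch only: tensors are modeled as nested lists, and the helper name select is hypothetical rather than part of any StableHLO API.

```python
def select(pred, on_true, on_false):
    """Element-wise select over nested lists; pred may be a scalar (rank 0)."""
    def go(p, t, f):
        if isinstance(t, list):
            # A scalar pred is reused for every element (the rank(pred) = 0 case).
            ps = p if isinstance(p, list) else [p] * len(t)
            return [go(pi, ti, fi) for pi, ti, fi in zip(ps, t, f)]
        return t if p else f
    return go(pred, on_true, on_false)


pred = [[False, True], [True, False]]
on_true = [[1, 2], [3, 4]]
on_false = [[5, 6], [7, 8]]
print(select(pred, on_true, on_false))  # [[5, 2], [3, 8]]
print(select(True, on_true, on_false))  # [[1, 2], [3, 4]]
```

The first call reproduces the example above; the second shows a scalar pred choosing one whole tensor.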

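More Examples

select_and_scatter

Semantics

Scatters the values from the source tensor using scatter based on the outcome of reduce_window of the operand tensor using select and produces a result tensor.

The following diagram shows how elements in result are computed from operand and source using a concrete example.

(Figure: select_and_scatter)

More formally:

selected_values = reduce_window_without_init(...) with the following inputs:
- inputs = [operand].
- window_dimensions, window_strides, and padding, which are used as is.
- base_dilations = window_dilations = 1.
- body is defined as: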
```
def body(arg0: tensor<E>, arg1: tensor<E>) -> tensor<E>:
  return select(arg0, arg1) ? arg0 : arg1;
```
where E = element_type(operand), and reduce_window_without_init works exactly like reduce_window, except that the schedule of the underlying reduce (see reduce) doesn't include init values. It is currently unspecified what happens if the corresponding window doesn't have values (#731).
result[result_index] = reduce([source_values], [init_value], [0], scatter), where:
- source_values = [source[source_index] for source_index in source_indices].
- selected_index(source_index) = operand_index if selected_values[source_index] has the operand element from operand_index.
- source_indices = [source_index for source_index in indices(source) if selected_index(source_index) = result_index].

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1-C4), (C6), (C8-C11)
(I2)	`source`	tensor or per-tensor quantized tensor	(C1), (C2)
(I3)	`init_value`	0-dimensional tensor or per-tensor quantized tensor	(C3)
(I4)	`window_dimensions`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C5)
(I5)	`window_strides`	1-dimensional tensor constant of type `si64`	(C2), (C6), (C7)
(I6)	`padding`	2-dimensional tensor constant of type `si64`	(C2), (C8)
(I7)	`select`	function	(C9)
(I8)	`scatter`	function	(C10)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C11-C12)

Constraints

(C1) element_type(operand) = element_type(source).
(C2) shape(source) = num_windows, where:
- padded_operand_shape = padding[:, 0] + shape(operand) + padding[:, 1].
- is_empty_window = padded_operand_shape = 0 || window_dimensions > padded_operand_shape.
- num_windows = is_empty_window ? 0 : floor((padded_operand_shape - window_dimensions) / window_strides) + 1.
(C3) element_type(init_value) = element_type(operand).
(C4) size(window_dimensions) = rank(operand).
(C5) 0 < window_dimensions.
(C6) size(window_strides) = rank(operand).
(C7) 0 < window_strides.
(C8) shape(padding) = [rank(operand), 2].
(C9) select has type (tensor<E>, tensor<E>) -> tensor<i1>, where E = element_type(operand).
(C10) scatter has type (tensor<E>, tensor<E>) -> tensor<E>, where is_promotable(element_type(operand), E).
(C11) shape(operand) = shape(result).
(C12) element_type(result) = E.

Examples

// %operand: [[1, 5], [2, 5], [3, 6], [4, 4]]
// %source: [[5, 6], [7, 8]]
// %init_value: 0
%result = "stablehlo.select_and_scatter"(%operand, %source, %init_value) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.compare"(%arg0, %arg1) {
      comparison_direction = #stablehlo<comparison_direction GE>
    } : (tensor<i64>, tensor<i64>) -> tensor<i1>
    "stablehlo.return"(%0) : (tensor<i1>) -> ()
}, {
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  window_dimensions = array<i64: 3, 1>,
  window_strides = array<i64: 2, 1>,
  padding = dense<[[0, 1], [0, 0]]> : tensor<2x2xi64>
} : (tensor<4x2xi64>, tensor<2x2xi64>, tensor<i64>) -> tensor<4x2xi64>
// %result: [[0, 0], [0, 0], [5, 14], [7, 0]]
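
To make the example above easier to trace, here is a minimal Python sketch restricted to rank-2 tensors stored as nested lists. It is not the reference implementation: the function name is hypothetical, and it assumes that padded positions are simply skipped when a window is reduced (the spec leaves windows without values unspecified, see #731).

```python
from itertools import product


def select_and_scatter_2d(operand, source, init_value, select, scatter,
                          window_dimensions, window_strides, padding):
    rows, cols = len(operand), len(operand[0])
    result = [[init_value] * cols for _ in range(rows)]
    for si in range(len(source)):
        for sj in range(len(source[0])):
            # Top-left corner of this window in unpadded operand coordinates.
            r0 = si * window_strides[0] - padding[0][0]
            c0 = sj * window_strides[1] - padding[1][0]
            selected = None
            for dr, dc in product(range(window_dimensions[0]),
                                  range(window_dimensions[1])):
                r, c = r0 + dr, c0 + dc
                if not (0 <= r < rows and 0 <= c < cols):
                    continue  # Assumption: padded positions are skipped.
                # body(arg0, arg1): keep arg0 if select(arg0, arg1), else arg1.
                if selected is None or not select(
                        operand[selected[0]][selected[1]], operand[r][c]):
                    selected = (r, c)
            if selected is not None:
                r, c = selected
                result[r][c] = scatter(result[r][c], source[si][sj])
    return result


operand = [[1, 5], [2, 5], [3, 6], [4, 4]]
source = [[5, 6], [7, 8]]
print(select_and_scatter_2d(
    operand, source, 0,
    select=lambda a, b: a >= b,   # comparison_direction GE
    scatter=lambda a, b: a + b,   # stablehlo.add
    window_dimensions=(3, 1), window_strides=(2, 1),
    padding=[[0, 1], [0, 0]]))
# [[0, 0], [0, 0], [5, 14], [7, 0]]
```

Both windows of the second column select the element 6 at row 2, which is why its source values 6 and 8 accumulate to 14.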

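More Examples

send

Semantics

Sends inputs to a channel channel_id and produces a result token.

If is_host_transfer is true, then the operation transfers data to the host. Otherwise, it transfers data to another device. What this means is implementation-defined. This flag duplicates the information provided in channel_type, so in the future we are planning to keep only one of them (#666).

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or quantized tensors
(I2)	`token`	`token`
(I3)	`channel_id`	constant of type `si64`
(I4)	`channel_type`	enum of `DEVICE_TO_DEVICE` and `DEVICE_TO_HOST`	(C1)
(I5)	`is_host_transfer`	constant of type `i1`	(C1)

Outputs

Name	Type
`result`	`token`

Constraints

(C1) channel_type is defined as:
- DEVICE_TO_HOST if is_host_transfer = true,
- DEVICE_TO_DEVICE otherwise.

Examples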

%result = "stablehlo.send"(%operand, %token) {
  channel_handle = #stablehlo.channel_handle<handle = 1, type = 2>,
  is_host_transfer = true
} : (tensor<2x2xi64>, !stablehlo.token) -> !stablehlo.token

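More Examples

shift_left

Semantics

Performs element-wise left-shift operation on the lhs tensor by rhs number of bits and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer type	(C1)
(I2)	`rhs`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples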

// %lhs: [-1, 0, 1]
// %rhs: [1, 2, 3]
%result = "stablehlo.shift_left"(%lhs, %rhs): (tensor<3xi64>, tensor<3xi64>) -> tensor<3xi64>
// %result: [-2, 0, 8]

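More Examples

shift_right_arithmetic

Semantics

Performs element-wise arithmetic right-shift operation on the lhs tensor by rhs number of bits and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer type	(C1)
(I2)	`rhs`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples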

// %lhs: [-1, 0, 8]
// %rhs: [1, 2, 3]
%result = "stablehlo.shift_right_arithmetic"(%lhs, %rhs): (tensor<3xi64>, tensor<3xi64>) -> tensor<3xi64>
// %result: [-1, 0, 1]

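More Examples

shift_right_logical

Semantics

Performs element-wise logical right-shift operation on the lhs tensor by rhs number of bits and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer type	(C1)
(I2)	`rhs`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples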

// %lhs: [-1, 0, 8]
// %rhs: [1, 2, 3]
%result = "stablehlo.shift_right_logical"(%lhs, %rhs): (tensor<3xi64>, tensor<3xi64>) -> tensor<3xi64>
// %result: [9223372036854775807, 0, 1]
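
The difference between the three shift operations is easiest to see on a negative value. The sketch below models 64-bit two's complement integers in Python; the helper names are hypothetical, and it assumes shift amounts in the range 0 <= n < 64 (behavior outside that range is not covered here).

```python
BITS = 64
MASK = (1 << BITS) - 1


def to_signed(u):
    """Reinterpret a 64-bit pattern as a two's complement signed integer."""
    return u - (1 << BITS) if u >> (BITS - 1) else u


def shift_left(x, n):
    # Bits shifted past the top of the 64-bit word are discarded.
    return to_signed((x << n) & MASK)


def shift_right_arithmetic(x, n):
    # Python's >> on ints already replicates the sign bit.
    return x >> n


def shift_right_logical(x, n):
    # Shift the unsigned bit pattern so that zeros enter from the left.
    return to_signed((x & MASK) >> n)


print([shift_left(x, n) for x, n in zip([-1, 0, 1], [1, 2, 3])])
# [-2, 0, 8]
print([shift_right_arithmetic(x, n) for x, n in zip([-1, 0, 8], [1, 2, 3])])
# [-1, 0, 1]
print([shift_right_logical(x, n) for x, n in zip([-1, 0, 8], [1, 2, 3])])
# [9223372036854775807, 0, 1]
```

The three printed lists match the results of the shift_left, shift_right_arithmetic and shift_right_logical examples.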

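More Examples

sign

Semantics

Returns the sign of the operand element-wise and produces a result tensor. More formally, for each element x, the semantics can be expressed using Python syntax as follows: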

def sign(x):
  if is_integer(x):
    if compare(x, 0, LT, SIGNED): return -1
    if compare(x, 0, EQ, SIGNED): return 0
    return 1
  elif is_float(x):
    if is_nan(x): return NaN
    if compare(x, -0.0, EQ, FLOAT): return -0.0
    if compare(x, +0.0, EQ, FLOAT): return +0.0
    if compare(x, 0.0, LT, FLOAT): return -1.0
    return 1.0
  elif is_complex(x):
    if is_nan(real(x)) or is_nan(imag(x)): return (NaN, NaN)
    if compare(x, (0.0, 0.0), EQ, FLOAT): return (0.0, 0.0)
    return divide(x, convert(abs(x), type(x)))

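For quantized types, performs dequantize_op_quantize(sign, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of signed integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of signed integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples

// Logical values: +NaN, -1.0, -0.0, +0.0, 1.0
// %operand: [0x7FFFFFFFFFFFFFFF, -1.0, -0.0, 0.0, 1.0]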
%result = "stablehlo.sign"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// Logical values: +NaN, -1.0, -0.0, +0.0, 1.0
// %result: [0x7FFFFFFFFFFFFFFF, -1.0, -0.0, 0.0, 1.0]

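More Examples

sine

Semantics

Performs element-wise sine operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: sin from IEEE-754.
For complex numbers: complex sine.
For quantized types: dequantize_op_quantize(sine, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples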

// %operand: [
//            [0.0, 1.57079632],       // [0, pi/2]
//            [3.14159265, 4.71238898] // [pi, 3pi/2]
//           ]
%result = "stablehlo.sine"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[0.0, 1.0], [0.0, -1.0]]

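More Examples

slice

Semantics

Extracts a slice from the operand using statically-computed starting indices and produces a result tensor. start_indices contain the starting indices of the slice for each dimension, limit_indices contain the ending indices (exclusive) of the slice for each dimension, and strides contain the strides for each dimension.

More formally, result[result_index] = operand[operand_index], where operand_index = start_indices + result_index * strides.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1-C3), (C5)
(I2)	`start_indices`	1-dimensional tensor constant of type `si64`	(C2), (C3), (C5)
(I3)	`limit_indices`	1-dimensional tensor constant of type `si64`	(C2), (C3), (C5)
(I4)	`strides`	1-dimensional tensor constant of type `si64`	(C2), (C4)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1), (C5)

Constraints

(C1) element_type(operand) = element_type(result).
(C2) size(start_indices) = size(limit_indices) = size(strides) = rank(operand).
(C3) 0 <= start_indices <= limit_indices <= shape(operand).
(C4) 0 < strides.
(C5) shape(result) = ceil((limit_indices - start_indices) / strides).

Examples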

// %operand: [
//            [0, 0, 0, 0],
//            [0, 0, 1, 1],
//            [0, 0, 1, 1]
//           ]
%result = "stablehlo.slice"(%operand) {
  start_indices = array<i64: 1, 2>,
  limit_indices = array<i64: 3, 4>,
  strides = array<i64: 1, 1>
} : (tensor<3x4xi64>) -> tensor<2x2xi64>
// %result: [
//            [1, 1],
//            [1, 1]
//           ]
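
The index formula and the shape constraint (C5) can be checked with a short Python sketch. It is limited to rank-2 tensors stored as nested lists, and the function name slice_2d is hypothetical.

```python
import math


def slice_2d(operand, start_indices, limit_indices, strides):
    # (C5): shape(result) = ceil((limit_indices - start_indices) / strides)
    shape = [math.ceil((limit - start) / stride)
             for start, limit, stride in zip(start_indices, limit_indices, strides)]
    # operand_index = start_indices + result_index * strides
    return [[operand[start_indices[0] + i * strides[0]]
                    [start_indices[1] + j * strides[1]]
             for j in range(shape[1])]
            for i in range(shape[0])]


operand = [[0, 0, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
print(slice_2d(operand, [1, 2], [3, 4], [1, 1]))  # [[1, 1], [1, 1]]
print(slice_2d(operand, [0, 0], [3, 4], [2, 3]))  # [[0, 0], [0, 1]]
```

The second call uses non-unit strides: it reads rows 0 and 2 and columns 0 and 3, so the result shape is ceil(3/2) x ceil(4/3) = 2 x 2.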

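More Examples

sort

Semantics

Sorts 1-dimensional slices of inputs along the dimension dimension together, according to a comparator, and produces results.

Unlike similar inputs in other operations, dimension allows negative values, with the semantics described below. In the future, this may be disallowed for consistency reasons (#1377).

If is_stable is true, then the sorting is stable, that is, the relative order of elements considered to be equal by the comparator is preserved. For the case where there is a single input, two elements e1 and e2 are considered to be equal by the comparator if and only if comparator(e1, e2) = comparator(e2, e1) = false. See the formalization below for how this generalizes to multiple inputs.

More formally, for all result_index in index_space(results[0]):

adjusted_dimension = dimension >= 0 ? dimension : rank(inputs[0]) + dimension.
result_slice = [ri0, ..., :, ..., riR-1], where riN are individual elements in result_index, and : is inserted at adjusted_dimension.
inputs_together = (inputs[0]..., ..., inputs[N-1]...).
results_together[result_slice] = sort(inputs_together[result_slice], comparator_together).
where sort sorts a 1-dimensional slice in non-descending order, expecting that comparator_together returns true if the left-hand side argument is less than the right-hand side argument.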

def comparator_together(lhs_together, rhs_together):
  args = []
  for (lhs_el, rhs_el) in zip(lhs_together, rhs_together):
    args.append(lhs_el)
    args.append(rhs_el)
  return comparator(*args)

(results[0]..., ..., results[N-1]...) = results_together.

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1-C5)
(I2)	`dimension`	constant of type `si64`	(C4)
(I3)	`is_stable`	constant of type `i1`
(I4)	`comparator`	function	(C5)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C2), (C3)

Constraints

(C1) 0 < size(inputs).
(C2) type(inputs...) = type(results...).
(C3) same(shape(inputs...) + shape(results...)).
(C4) -R <= dimension < R, where R = rank(inputs[0]).
(C5) comparator has type (tensor<E1>, tensor<E1>, ..., tensor<EN-1>, tensor<EN-1>) -> tensor<i1>, where Ei = element_type(inputs[i]).

Examples

// %input0 = [[1, 2, 3], [3, 2, 1]]
// %input1 = [[3, 2, 1], [1, 2, 3]]
%result0, %result1 = "stablehlo.sort"(%input0, %input1) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>, %arg2: tensor<i64>, %arg3: tensor<i64>):
    %predicate = "stablehlo.compare"(%arg0, %arg1) {
      comparison_direction = #stablehlo<comparison_direction GT>
    } : (tensor<i64>, tensor<i64>) -> tensor<i1>
    "stablehlo.return"(%predicate) : (tensor<i1>) -> ()
}) {
  dimension = 0 : i64,
  is_stable = true
} : (tensor<2x3xi64>, tensor<2x3xi64>) -> (tensor<2x3xi64>, tensor<2x3xi64>)
// %result0 = [[3, 2, 3], [1, 2, 1]]
// %result1 = [[1, 2, 1], [3, 2, 3]]
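
The way several inputs are sorted together can be sketched in Python. The sketch below handles only rank-2 inputs sorted along dimension 0, models tensors as nested lists, and uses a hypothetical function name; it relies on Python's list.sort being stable, which corresponds to is_stable = true.

```python
from functools import cmp_to_key


def sort_dim0(inputs, comparator):
    """Sort the columns of all rank-2 inputs together along dimension 0."""
    rows, cols = len(inputs[0]), len(inputs[0][0])
    results = [[[None] * cols for _ in range(rows)] for _ in inputs]

    def comparator_together(lhs_together, rhs_together):
        # Interleave elements as (lhs0, rhs0, lhs1, rhs1, ...).
        args = []
        for lhs_el, rhs_el in zip(lhs_together, rhs_together):
            args.extend([lhs_el, rhs_el])
        return comparator(*args)

    def three_way(lhs, rhs):
        if comparator_together(lhs, rhs):
            return -1
        if comparator_together(rhs, lhs):
            return 1
        return 0  # Considered equal by the comparator.

    for j in range(cols):
        # One tuple per position of the slice, holding one element per input.
        column = [tuple(inp[i][j] for inp in inputs) for i in range(rows)]
        column.sort(key=cmp_to_key(three_way))
        for i, together in enumerate(column):
            for n, value in enumerate(together):
                results[n][i][j] = value
    return results


input0 = [[1, 2, 3], [3, 2, 1]]
input1 = [[3, 2, 1], [1, 2, 3]]
# The comparator only looks at the first input (comparison_direction GT).
result0, result1 = sort_dim0([input0, input1], lambda a0, b0, a1, b1: a0 > b0)
print(result0)  # [[3, 2, 3], [1, 2, 1]]
print(result1)  # [[1, 2, 1], [3, 2, 3]]
```

Because the comparator ignores the second input, the elements of input1 are simply carried along with their partners from input0.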

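More Examples

sqrt

Semantics

Performs element-wise square root operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: squareRoot from IEEE-754.
For complex numbers: complex square root.
For quantized types: dequantize_op_quantize(sqrt, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples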

// %operand: [[0.0, 1.0], [4.0, 9.0]]
%result = "stablehlo.sqrt"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[0.0, 1.0], [2.0, 3.0]]

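More Examples

subtract

Semantics

Performs element-wise subtraction of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For integers: integer subtraction.
For floats: subtraction from IEEE-754.
For complex numbers: complex subtraction.
For quantized types:
- dequantize_op_quantize(subtract, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples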

// %lhs: [[6, 8], [10, 12]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.subtract"(%lhs, %rhs) : (tensor<2x2xf32>, tensor<2x2xf32>) -> (tensor<2x2xf32>)
// %result: [[1, 2], [3, 4]]

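More Examples

tan

Semantics

Performs element-wise tangent operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: tan from IEEE-754.
For complex numbers: complex tangent.
For quantized types: dequantize_op_quantize(tan, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples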

// %operand: [
//            [0.0, 1.57079632],       // [0, pi/2]
//            [3.14159265, 4.71238898] // [pi, 3pi/2]
//           ]
%result = "stablehlo.tan"(%operand) : (tensor<2x2xf64>) -> tensor<2x2xf64>
// %result: [
//           [0.0, 1.63312e+16],
//           [0.0, 5.44375e+15]
//          ]

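More Examples

tanh

Semantics

Performs element-wise hyperbolic tangent operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: tanh from IEEE-754.
For complex numbers: complex hyperbolic tangent.
For quantized types:
- dequantize_op_quantize(tanh, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples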

// %operand: [-1.0, 0.0, 1.0]
%result = "stablehlo.tanh"(%operand) : (tensor<3xf32>) -> tensor<3xf32>
// %result: [-0.76159416, 0.0, 0.76159416]

More Examples

transpose

Semantics

Permutes the dimensions of the operand tensor using permutation and produces a result tensor. More formally, result[result_index] = operand[operand_index], where result_index[d] = operand_index[permutation[d]].

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C4)
(I2)	`permutation`	1-dimensional tensor constant of type `si64`	(C2-C4)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1), (C3-C4)

Constraints

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand), except that quantization_dimension(operand) and quantization_dimension(result) may differ, otherwise.
(C2) permutation is a permutation of range(rank(operand)).
(C3) shape(result) = dim(operand, permutation...).
(C4) If is_per_axis_quantized(result), then quantization_dimension(operand) = permutation(quantization_dimension(result)).

Examples

// %operand: [
//            [[1,2], [3,4], [5,6]],
//            [[7,8], [9,10], [11,12]]
//           ]
%result = "stablehlo.transpose"(%operand) {
  permutation = array<i64: 2, 1, 0>
} : (tensor<2x3x2xi32>) -> tensor<2x3x2xi32>
// %result: [
//           [[1,7], [3,9], [5,11]],
//           [[2,8], [4,10], [6,12]]
//          ]
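
The relation result_index[d] = operand_index[permutation[d]] can be made concrete with a small Python sketch over nested lists. The helper names below are hypothetical and the code is meant only as an illustration of the indexing rule.

```python
def shape_of(tensor):
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0]
    return shape


def get(tensor, index):
    for i in index:
        tensor = tensor[i]
    return tensor


def build(shape, element_at, prefix=()):
    # Recursively build a nested list, calling element_at for each full index.
    if not shape:
        return element_at(prefix)
    return [build(shape[1:], element_at, prefix + (i,)) for i in range(shape[0])]


def transpose(operand, permutation):
    operand_shape = shape_of(operand)
    # (C3): shape(result) = dim(operand, permutation...)
    result_shape = [operand_shape[p] for p in permutation]

    def element_at(result_index):
        operand_index = [0] * len(permutation)
        for d, p in enumerate(permutation):
            operand_index[p] = result_index[d]
        return get(operand, operand_index)

    return build(result_shape, element_at)


operand = [[[1, 2], [3, 4], [5, 6]],
           [[7, 8], [9, 10], [11, 12]]]
print(transpose(operand, [2, 1, 0]))
# [[[1, 7], [3, 9], [5, 11]], [[2, 8], [4, 10], [6, 12]]]
```

With permutation = [2, 1, 0], result[i][j][k] reads operand[k][j][i], which reproduces the example above.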

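More Examples

triangular_solve

Semantics

Solves batches of systems of linear equations with lower or upper triangular coefficient matrices.

More formally, given a and b, result[i0, ..., iR-3, :, :] is the solution to op(a[i0, ..., iR-3, :, :]) * x = b[i0, ..., iR-3, :, :] when left_side is true, or to x * op(a[i0, ..., iR-3, :, :]) = b[i0, ..., iR-3, :, :] when left_side is false, solving for the variable x, where op(a) is determined by transpose_a, which can be one of the following:

NO_TRANSPOSE: Perform the operation using a as is.
TRANSPOSE: Perform the operation on the transpose of a.
ADJOINT: Perform the operation on the conjugate transpose of a.

Input data is read only from the lower triangle of a if lower is true, or from the upper triangle of a otherwise. Output data is returned in the same triangle; the values in the other triangle are implementation-defined.

If unit_diagonal is true, then the implementation can assume that the diagonal elements of a are equal to 1, otherwise the behavior is undefined.

For quantized types, performs dequantize_op_quantize(lambda x, y: triangular_solve(x, y, left_side, lower, unit_diagonal, transpose_a), a, b, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`a`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1-C3)
(I2)	`b`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1-C4)
(I3)	`left_side`	constant of type `i1`	(C3)
(I4)	`lower`	constant of type `i1`
(I5)	`unit_diagonal`	constant of type `i1`
(I6)	`transpose_a`	enum of `NO_TRANSPOSE`, `TRANSPOSE`, and `ADJOINT`

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_element_type(a) = baseline_element_type(b).
(C2) 2 <= rank(a) = rank(b) = R.
(C3) The relationship between shape(a) and shape(b) is defined as follows:
- shape(a)[:-3] = shape(b)[:-3].
- dim(a, -2) = dim(a, -1) = dim(b, left_side ? -2 : -1).
(C4) baseline_type(b) = baseline_type(result).

Examples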

// %a = [
//       [1.0, 0.0, 0.0],
//       [2.0, 4.0, 0.0],
//       [3.0, 5.0, 6.0]
//      ]
// %b = [
//       [2.0, 0.0, 0.0],
//       [4.0, 8.0, 0.0],
//       [6.0, 10.0, 12.0]
//      ]
%result = "stablehlo.triangular_solve"(%a, %b) {
  left_side = true,
  lower = true,
  unit_diagonal = false,
  transpose_a = #stablehlo<transpose NO_TRANSPOSE>
} : (tensor<3x3xf32>, tensor<3x3xf32>) -> tensor<3x3xf32>
// %result: [
//           [2.0, 0.0, 0.0],
//           [0.0, 2.0, 0.0],
//           [0.0, 0.0, 2.0]
//          ]
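
For the configuration used in this example (left_side = true, lower = true, transpose_a = NO_TRANSPOSE, no batch dimensions), the solve reduces to forward substitution. The following Python sketch covers only that one case; the function name is hypothetical and the other parameter combinations are deliberately left out.

```python
def solve_lower_left(a, b, unit_diagonal=False):
    """Solve a * x = b by forward substitution, reading only the lower triangle of a."""
    n, m = len(a), len(b[0])
    x = [[0.0] * m for _ in range(n)]
    for col in range(m):
        for i in range(n):
            # Subtract the contribution of the already-solved rows.
            acc = b[i][col] - sum(a[i][k] * x[k][col] for k in range(i))
            # With unit_diagonal the diagonal is assumed to be 1 and never read.
            x[i][col] = acc if unit_diagonal else acc / a[i][i]
    return x


a = [[1.0, 0.0, 0.0],
     [2.0, 4.0, 0.0],
     [3.0, 5.0, 6.0]]
b = [[2.0, 0.0, 0.0],
     [4.0, 8.0, 0.0],
     [6.0, 10.0, 12.0]]
print(solve_lower_left(a, b))
# [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
```

Since b equals 2 * a here, the solution is twice the identity matrix, matching the example result.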

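tuple

Note: Per StableHLO v1.0 Cleanup #2283, this op is being explored for deprecation as it appears to be unused by both frameworks and compilers. As such, it has limited compatibility guarantees (6 months).

Semantics

Produces a result tuple from the values val.

Inputs

Label	Name	Type	Constraints
(I1)	`val`	variadic number of values	(C1)

Outputs

Name	Type	Constraints
`result`	tuple	(C1)

Constraints

(C1) result has type tuple<E0, ..., EN-1>, where Ei = type(val[i]).

Examples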

// %val0: [1.0, 2.0]
// %val1: (3)
%result = "stablehlo.tuple"(%val0, %val1) : (tensor<2xf32>, tuple<tensor<i32>>) -> tuple<tensor<2xf32>, tuple<tensor<i32>>>
// %result: ([1.0, 2.0], (3))

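More Examples

uniform_dequantize

Semantics

Performs element-wise conversion of the quantized tensor operand to a floating-point tensor result according to the quantization parameters defined by the operand type.

More formally, result = dequantize(operand).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	quantized tensor	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type	(C1), (C2)

Constraints

(C1) shape(operand) = shape(result).
(C2) element_type(result) = expressed_type(operand).

Examples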

// %operand: [10, 10]
%result = "stablehlo.uniform_dequantize"(%operand) : (tensor<2x!quant.uniform<i8:f32:0, {0.1:-30,0.5:-20}>>) -> tensor<2xf32>
// %result: [4.0, 15.0]

uniform_quantize

Semantics

Performs element-wise conversion of the floating-point tensor or quantized tensor operand to a quantized tensor result according to the quantization parameters defined by the result type.

More formally,

If is_float(operand):
- result = quantize(operand, type(result)).
If is_quantized(operand):
- float_result = dequantize(operand).
- result = quantize(float_result, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or quantized type	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	quantized tensor	(C1), (C2)

Constraints

(C1) shape(operand) = shape(result).
(C2) expressed_type(result) = is_float(operand) ? element_type(operand) : expressed_type(operand).

Examples

// %operand: [4.0, 15.0]
%result = "stablehlo.uniform_quantize"(%operand) : (tensor<2xf32>) -> tensor<2x!quant.uniform<i8:f32:0, {0.1:-30,0.5:-20}>>
// %result: [10, 10]

// %operand: [10, 10]
%result = "stablehlo.uniform_quantize"(%operand) : (tensor<2x!quant.uniform<i8:f32:0, {0.1:-30,0.5:-20}>>) -> tensor<2x!quant.uniform<i8:f32:0, {0.1:-20,0.2:-30}>>
// %result: [20, 45]
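
The numbers in the uniform_dequantize and uniform_quantize examples can be reproduced with a short Python sketch. It assumes per-axis quantization over a rank-1 tensor, so every element has its own (scale, zero point) pair as in {0.1:-30, 0.5:-20}, and an i8 storage range of [-128, 127]; the function names and signatures are hypothetical simplifications of the quantize and dequantize functions used in the formulas.

```python
def quantize(values, scales, zero_points, storage_min=-128, storage_max=127):
    out = []
    for x, scale, zero_point in zip(values, scales, zero_points):
        # Python's round() rounds halves to the nearest even integer.
        q = round(x / scale) + zero_point
        out.append(max(storage_min, min(storage_max, q)))
    return out


def dequantize(values, scales, zero_points):
    return [(q - zero_point) * scale
            for q, scale, zero_point in zip(values, scales, zero_points)]


# Float to quantized: {0.1:-30, 0.5:-20}
print(quantize([4.0, 15.0], [0.1, 0.5], [-30, -20]))   # [10, 10]

# Quantized to quantized: dequantize first, then quantize with the result parameters.
floats = dequantize([10, 10], [0.1, 0.5], [-30, -20])
print(quantize(floats, [0.1, 0.2], [-20, -30]))        # [20, 45]
```

The second part mirrors the is_quantized(operand) branch: the operand is first dequantized with its own parameters and then quantized with the parameters of the result type.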

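while

Semantics

Produces the output from executing the body function 0 or more times while the cond function outputs true. More formally, the semantics can be expressed using Python syntax as follows: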

internal_state = operand
while cond(*internal_state):
  internal_state = body(*internal_state)
results = internal_state

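The behavior of an infinite loop is TBD (#383).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	variadic number of tensors, quantized tensors or tokens	(C1-C3)
(I2)	`cond`	function	(C1)
(I3)	`body`	function	(C2)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors, quantized tensors or tokens	(C3)

Constraints

(C1) cond has type (T0, ..., TN-1) -> tensor<i1>, where Ti = type(operand[i]).
(C2) body has type (T0, ..., TN-1) -> (T0, ..., TN-1), where Ti = type(operand[i]).
(C3) type(results...) = type(operand...).

Examples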

// %init_i: 0
// %init_sum: 0
// %one: 1
// %ten: 10
%results0, %results1 = "stablehlo.while"(%init_i, %init_sum) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %cond = "stablehlo.compare"(%arg0, %ten) {
      comparison_direction = #stablehlo<comparison_direction LT>
    } : (tensor<i64>, tensor<i64>) -> tensor<i1>
    stablehlo.return %cond : tensor<i1>
  }, {
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %new_sum = stablehlo.add %arg1, %one : tensor<i64>
    %new_i = stablehlo.add %arg0, %one : tensor<i64>
    stablehlo.return %new_i, %new_sum : tensor<i64>, tensor<i64>
}) : (tensor<i64>, tensor<i64>) -> (tensor<i64>, tensor<i64>)
// %results0: 10
// %results1: 10

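More Examples

xor

Semantics

Performs element-wise XOR of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical XOR.
For integers: bitwise XOR.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of boolean or integer type	(C1)
(I2)	`rhs`	tensor of boolean or integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of boolean or integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples

// Bitwise operation with integer tensors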
// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.xor"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[4, 4], [4, 12]]

// Logical operation with boolean tensors
// %lhs: [[false, false], [true, true]]
// %rhs: [[false, true], [false, true]]
%result = "stablehlo.xor"(%lhs, %rhs) : (tensor<2x2xi1>, tensor<2x2xi1>) -> tensor<2x2xi1>
// %result: [[false, true], [true, false]]

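More Examples

Dialect Interop

At the moment, StableHLO programs in the wild sometimes contain operations that are not defined by StableHLO.

Module, Function, Call and Return

StableHLO uses upstream MLIR operations for ModuleOp, FuncOp, CallOp, and ReturnOp. This was done for better interop with existing MLIR machinery, as many useful passes are written targeting FuncOp and ModuleOp, and many compilation pipelines expect these ops to be present. Full compatibility guarantees are applied to these ops. If anything ever changes about these ops in an incompatible way (i.e. removal), StableHLO equivalents will be added to preserve compatibility.

CHLO

The CHLO opset contains higher-level operations that decompose to StableHLO. Currently there are no compatibility guarantees for CHLO. For compatibility guarantees, the chlo-legalize-to-stablehlo pass must be used prior to serialization.

Shape Operations

It is a common use case in the community to use certain operations from core MLIR dialects in dynamic StableHLO programs to perform shape computations. Most commonly, these include shape dialect ops like shape_of or num_elements, tensor dialect ops like dim or from_elements, and the builtin index type.

The Dynamism RFC > O2 denotes these as out of scope; however, some support for index types is included for interop purposes. There are no compatibility guarantees for these ops or types. The shape-legalize-to-stablehlo pass can be used to convert these operations to fully supported StableHLO ops.

Deprecated Operations

There are several StableHLO operations that were inherited from MHLO which are deprecated and on the way out of StableHLO. The full details on these removals can be found in the StableHLO v1.0 Cleanup #2283. The tracker issue for these deprecations is #2340.

These operations fall into a few categories:

"Not in HLO" category of StableHLO operations - they were initially part of the StableHLO opset but have later been deemed not to fit it well: broadcast, create_token, cross-replica-sum, dot, einsum, torch_index_select, unary_einsum (#3).
Unused ops - these operations may have been useful at some point, but the ops were either underdeveloped, or the pipelines using these ops have been refactored to not require them anymore. This includes map, tuple (#598), get_tuple_element, rng, complex comparisons #560, and convolution window_reversal (#1181).

Some of these ops can be removed easily given that they can be expressed using existing ops (broadcast, create_token, cross-replica-sum, dot, unary_einsum) and will be removed after the existing compatibility window passes (6 months). Others are still being explored for removal (einsum, get_tuple_element, map, rng, torch_index_select, tuple, complex comparisons, window_reversal). Pending community feedback, these ops will either be removed or added to the spec with full support. Until the future of these ops is known, they are only guaranteed 6 months of compatibility.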

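Execution

Sequential execution

A StableHLO program is executed by providing input values to the main function and computing output values. Output values of a function are computed by executing the graph of ops rooted in the corresponding return op.

The execution order is implementation-defined as long as it is aligned with dataflow, i.e. if ops are executed before their uses. In StableHLO, all side-effecting ops consume one token and produce one token (multiple tokens can be multiplexed into one token via after_all), so the execution order of side effects is also aligned with dataflow. For example, in the program below there are two possible execution orders: %0 → %1 → %2 → return and %1 → %0 → %2 → return.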

func.func @main() -> tensor<f64> {
  %0 = stablehlo.constant dense<1.0> : tensor<f64>
  %1 = stablehlo.constant dense<2.0> : tensor<f64>
  %2 = stablehlo.add %0, %1 : tensor<f64>
  return %2 : tensor<f64>
}
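
The claim that exactly two execution orders are aligned with dataflow can be checked by brute force. The Python sketch below encodes the use-def edges of the program above by hand (the uses dictionary is an illustrative model, not something produced by StableHLO tooling) and keeps every ordering in which each op runs after the ops it uses.

```python
from itertools import permutations

# Each op is mapped to the ops whose results it uses.
uses = {"%0": [], "%1": [], "%2": ["%0", "%1"], "return": ["%2"]}


def dataflow_orders(uses):
    orders = []
    for order in permutations(uses):
        position = {op: i for i, op in enumerate(order)}
        # Keep the order only if every op comes after all of its operands.
        if all(position[dep] < position[op] for op in uses for dep in uses[op]):
            orders.append(order)
    return orders


for order in dataflow_orders(uses):
    print(" -> ".join(order))
# %0 -> %1 -> %2 -> return
# %1 -> %0 -> %2 -> return
```

Enumerating all permutations is only practical for tiny programs; it is used here because it makes the dataflow condition explicit.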

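More formally, a StableHLO process is a combination of: 1) a StableHLO program, 2) operation statuses (not executed yet, already executed), and 3) intermediate values that the process is working on. The process starts with input values to the main function, progresses through the graph of ops updating operation statuses and intermediate values, and finishes with output values. Further formalization is TBD (#484).

Parallel execution

StableHLO programs can be executed in parallel, organized into a 2D process grid of num_replicas by num_partitions, which both have type ui32.

In the StableHLO process grid, num_replicas * num_partitions StableHLO processes are executing at the same time. Each process has a unique process_id = (replica_id, partition_id), where replica_id in replica_ids = range(num_replicas) and partition_id in partition_ids = range(num_partitions), which both have type ui32.

The size of the process grid is known statically for every program (in the future, we are planning to make it an explicit part of StableHLO programs #650), and the position within the process grid is known statically for every process. Each process has access to its position within the process grid via the replica_id and partition_id ops.

Within the process grid, the programs can all be the same (in the "Single Program, Multiple Data" style), can all be different (in the "Multiple Program, Multiple Data" style) or something in between. In the future, we are planning to introduce support for other idioms of defining parallel StableHLO programs, including GSPMD (#619).

Within the process grid, the processes are mostly independent from each other - they have separate operation statuses and separate input/intermediate/output values, and most of the ops are executed separately between processes, with the exception of a small number of collective ops described below.

Given that the execution of most ops only uses values from the same process, it is usually unambiguous to refer to these values by their names. However, when describing the semantics of collective ops, that is insufficient, which gives rise to the notation name@process_id to refer to the value name within a particular process. (From that perspective, an unqualified name can be viewed as a shorthand for name@(replica_id(), partition_id()).)

The execution order across processes is implementation-defined, except for the synchronization introduced by point-to-point communication and collective ops as described below.

Point-to-point communication

StableHLO processes can communicate with each other through StableHLO channels. A channel is represented by a positive id of type si64. Through various ops, it is possible to send values to channels and receive them from channels.

Further formalization, e.g. where these channel ids come from, how processes and programs become aware of them, and what kind of synchronization is introduced by them, is TBD (#484).

Streaming communication

Every StableHLO process has access to two streaming interfaces:

Infeed that can be read from.
Outfeed that can be written to.

Unlike channels, which are used to communicate between processes and therefore have processes at both of their ends, infeeds and outfeeds have their other end implementation-defined.

Further formalization, e.g. how streaming communication influences execution order and what kind of synchronization is introduced by it, is TBD (#484).

Collective ops

There are six collective ops in StableHLO: all_gather, all_reduce, all_to_all, collective_broadcast, collective_permute, and reduce_scatter. All these ops split the processes in the StableHLO process grid into StableHLO process groups and execute a joint computation within each process group, independently from other process groups.

Within each process group, collective ops may introduce a synchronization barrier. Further formalization, e.g. elaborating on when exactly this synchronization happens, how exactly the processes arrive at this barrier, and what happens if they don't, is TBD (#484).

If the process group involves cross-partition communication, i.e. there are processes in the process group whose partition ids are different, then execution of the collective op needs a channel, and the collective op must provide a positive channel_id of type si64. Cross-replica communication doesn't need channels.

The computations performed by the collective ops are specific to individual ops and are described in the individual op sections above. However, the strategies by which the process grid is split into process groups are shared between these ops and are described in this section. More formally, StableHLO supports the following four strategies.

cross_replica

Only cross-replica communications happen within each process group. This strategy takes replica_groups - a list of lists of replica ids - and computes a Cartesian product of replica_groups by partition_ids. replica_groups must have unique elements and cover all replica_ids. More formally, using Python syntax: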

def cross_replica(replica_groups: List[List[ReplicaId]]) -> List[List[ProcessId]]:
  for replica_group in replica_groups:
    for partition_id in partition_ids:
      process_group = []
      for replica_id in replica_group:
        process_group.append((replica_id, partition_id))
      yield process_group

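For example, for replica_groups = [[0, 1], [2, 3]] and num_partitions = 2, cross_replica will produce [[(0, 0), (1, 0)], [(0, 1), (1, 1)], [(2, 0), (3, 0)], [(2, 1), (3, 1)]].

cross_partition

Only cross-partition communications happen within each process group. This strategy takes partition_groups - a list of lists of partition ids - and computes a Cartesian product of partition_groups by replica_ids. partition_groups must have unique elements and cover all partition_ids. More formally, using Python syntax: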

def cross_partition(partition_groups: List[List[PartitionId]]) -> List[List[ProcessId]]:
  for partition_group in partition_groups:
    for replica_id in replica_ids:
      process_group = []
      for partition_id in partition_group:
        process_group.append((replica_id, partition_id))
      yield process_group

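For example, for partition_groups = [[0, 1]] and num_replicas = 4, cross_partition will produce [[(0, 0), (0, 1)], [(1, 0), (1, 1)], [(2, 0), (2, 1)], [(3, 0), (3, 1)]].

cross_replica_and_partition

Both cross-replica and cross-partition communications may happen within each process group. This strategy takes replica_groups - a list of lists of replica ids - and computes Cartesian products of each replica_group by partition_ids. replica_groups must have unique elements and cover all replica_ids. More formally, using Python syntax: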

def cross_replica_and_partition(replica_groups: List[List[ReplicaId]]) -> List[List[ProcessId]]:
  for replica_group in replica_groups:
    process_group = []
    for partition_id in partition_ids:
      for replica_id in replica_group:
        process_group.append((replica_id, partition_id))
    yield process_group

For example, for replica_groups = [[0, 1], [2, 3]] and num_partitions = 2, cross_replica_and_partition will produce [[(0, 0), (1, 0), (0, 1), (1, 1)], [(2, 0), (3, 0), (2, 1), (3, 1)]].

flattened_ids

This strategy takes flattened_id_groups - a list of lists of "flattened" process ids in the form of replica_id * num_partitions + partition_id - and turns them into process ids. flattened_id_groups must have unique elements and cover all process_ids. More formally, using Python syntax:

def flattened_ids(flattened_id_groups: List[List[ui32]]) -> List[List[ProcessId]]:
  for flattened_id_group in flattened_id_groups:
    process_group = []
    for flattened_id in flattened_id_group:
      replica_id = flattened_id // num_partitions
      partition_id = flattened_id % num_partitions
      process_group.append((replica_id, partition_id))
    yield process_group

For example, for flattened_id_groups = [[0, 1, 2, 3], [4, 5, 6, 7]], num_replicas = 4 and num_partitions = 2, flattened_ids will produce [[(0, 0), (0, 1), (1, 0), (1, 1)], [(2, 0), (2, 1), (3, 0), (3, 1)]].
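
The four strategies in this section can also be written as self-contained Python functions that return lists instead of generators. This is a sketch for experimentation: unlike the definitions above, which read partition_ids and replica_ids from the surrounding process grid, these variants take num_partitions or num_replicas as explicit parameters.

```python
def cross_replica(replica_groups, num_partitions):
    # One process group per (replica_group, partition_id) pair.
    return [[(r, p) for r in group]
            for group in replica_groups for p in range(num_partitions)]


def cross_partition(partition_groups, num_replicas):
    # One process group per (partition_group, replica_id) pair.
    return [[(r, p) for p in group]
            for group in partition_groups for r in range(num_replicas)]


def cross_replica_and_partition(replica_groups, num_partitions):
    # One process group per replica_group, spanning all partitions.
    return [[(r, p) for p in range(num_partitions) for r in group]
            for group in replica_groups]


def flattened_ids(flattened_id_groups, num_partitions):
    # divmod(f, n) returns (f // n, f % n), i.e. (replica_id, partition_id).
    return [[divmod(f, num_partitions) for f in group]
            for group in flattened_id_groups]


print(cross_replica([[0, 1], [2, 3]], num_partitions=2))
print(cross_partition([[0, 1]], num_replicas=4))
print(cross_replica_and_partition([[0, 1], [2, 3]], num_partitions=2))
print(flattened_ids([[0, 1, 2, 3], [4, 5, 6, 7]], num_partitions=2))
```

Each call uses the inputs of the corresponding example in this section, so the printed process groups can be compared against the ones listed there.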

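Accuracy

At the moment, StableHLO does not provide guarantees about numerical accuracy, but this may change in the future (#1156).

Execution semantics of quantized operation

The interpretation of quantized StableHLO operations may vary depending on the hardware requirements and capabilities. For instance, some hardware may opt to interpret quantized operations using a "dequantize, perform floating-point operation, and finally quantize" strategy. Others may perform the entire computation with integer arithmetic. Consequently, the interpretation of quantized StableHLO operations is exclusively determined by the specific implementation. The interpretation of hybrid quantization (#1575) should be based on its semantics as prescribed in the specification (via 1792).

Errors

StableHLO programs are validated through an extensive set of constraints for individual ops, which rules out many classes of errors prior to run time. However, error conditions are still possible, e.g. through integer overflows, out-of-bounds accesses, etc. Unless explicitly called out, all these errors result in implementation-defined behavior, but this may change in the future (#1157).

Floating-point exceptions

As an exception to this rule, floating-point exceptions in StableHLO programs have well-defined behavior. Operations which result in exceptions defined by the IEEE-754 standard (invalid operation, division-by-zero, overflow, underflow, or inexact exceptions) produce default results (as defined in the standard) and continue execution without raising the corresponding status flag, similar to raiseNoFlag exception handling from the standard. Exceptions for nonstandard operations (e.g. complex arithmetic and certain transcendental functions) are implementation-defined.

Shape mismatches

StableHLO supports dynamically-shaped tensors. However, shapes have to agree at runtime, otherwise the behavior is undefined. StableHLO doesn't explicitly provide an op that can assert that a tensor has a given shape at runtime. Generating correct code is the responsibility of the producer.

As a specific example, the program below is valid. However, at runtime, the exact shapes of %arg0 and %arg1 will have to be the same, otherwise the behavior of the program is undefined: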

func.func @foo(%arg0: tensor<?xi32>, %arg1: tensor<?xi32>) -> tensor<?xi32> {
    %0 = stablehlo.add %arg0, %arg1 : tensor<?xi32>
    return %0 : tensor<?xi32>
}

Notation

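For describing syntax, this document uses the modified ISO flavor of EBNF syntax (ISO/IEC 14977:1996, Wikipedia), with two modifications: 1) rules are defined using ::= rather than =, 2) concatenation is expressed using juxtaposition rather than ,.

For describing semantics (i.e. within the "Types", "Constants" and "Ops" sections), we use formulas which are based on Python syntax extended with support for concisely expressing array operations, as described below. This works well for small snippets of code, but in the rare cases when larger snippets are needed, we use vanilla Python syntax, which is always introduced explicitly.

Formulas

Let's explore how formulas work based on an example from the dot_general specification. One of the constraints for this operation looks as follows: dim(lhs, lhs_batching_dimensions...) = dim(rhs, rhs_batching_dimensions...).

The names used in this formula come from two sources: 1) global functions, i.e. dim, 2) member definitions of the corresponding program element, i.e. the lhs, lhs_batching_dimensions, rhs and rhs_batching_dimensions inputs defined in the "Inputs" section of dot_general.

As mentioned above, the syntax of this formula is Python-based with some conciseness-oriented extensions. To make sense of the formula, let's transform it into vanilla Python syntax.

A) In these formulas, we use = to represent equality, so the first step towards obtaining Python syntax is replacing = with ==, as follows: dim(lhs, lhs_batching_dimensions...) == dim(rhs, rhs_batching_dimensions...).

B) Also, these formulas support ellipses (...) which turn scalar expressions into tensor expressions. In a nutshell, f(xs...) roughly means "for each scalar x in the tensor xs, compute a scalar f(x) and then return all these scalar results together as a tensor result". In vanilla Python syntax, our example formula turns into: [dim(lhs, dim1) for dim1 in lhs_batching_dimensions] == [dim(rhs, dim2) for dim2 in rhs_batching_dimensions].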
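The expanded formula from step B can be evaluated directly. In the sketch below a tensor is represented only by its shape tuple, and the concrete shapes and batching dimensions are made-up values for illustration.

```python
def dim(shape, axis):
    # Stand-in for the spec's dim(x, axis); a tensor is modeled by its shape.
    return shape[axis]

lhs = (4, 8, 16)    # hypothetical lhs shape
rhs = (4, 16, 32)   # hypothetical rhs shape
lhs_batching_dimensions = [0]
rhs_batching_dimensions = [0]

# dim(lhs, lhs_batching_dimensions...) = dim(rhs, rhs_batching_dimensions...)
constraint_holds = (
    [dim(lhs, dim1) for dim1 in lhs_batching_dimensions]
    == [dim(rhs, dim2) for dim2 in rhs_batching_dimensions]
)
print(constraint_holds)
```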

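Thanks to ellipses, it is often possible to avoid working at the level of individual scalars. However, in some tricky cases, lower-level semi-informal syntax may be used, as in the start_indices[bi0, ..., :, ..., biN] formula from the gather specification. In the service of conciseness, we don't provide an exact formalism for translating such syntax to vanilla Python, in the hope that it is still intuitively understandable on a case-by-case basis. Please let us know if some specific formulas look opaque, and we'll try to improve them.

Also, you will notice that formulas use ellipses to expand all sorts of lists, including tensors, lists of tensors (which can arise, e.g., from a variadic number of tensors), etc. This is another area where we don't provide an exact formalism (e.g. lists are not even part of the StableHLO type system) and instead rely on intuitive understandability.

C) The final noteworthy notational vehicle that we employ is implicit broadcasting. While the StableHLO opset doesn't support implicit broadcasting, the formulas do, also in the service of conciseness. In a nutshell, if a scalar is used in a context where a tensor is expected, the scalar is broadcast to the expected shape.

To continue the dot_general example, here's another constraint: 0 <= lhs_batching_dimensions < rank(lhs). As defined in the dot_general specification, lhs_batching_dimensions is a tensor, however both 0 and rank(lhs) are scalars. After we apply implicit broadcasting, the formula becomes [0, ..., 0] <= lhs_batching_dimensions < [rank(lhs), ..., rank(lhs)].

When applied to a particular dot_general operation, this formula evaluates to a tensor of booleans. When formulas are used as constraints, the constraint holds if the formula evaluates to either true or to a tensor which only has true elements.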
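The broadcast-then-compare reading of this constraint can be sketched in vanilla Python. The helper names broadcast and constraint_holds are hypothetical, and one-dimensional tensors are modeled as Python lists.

```python
def broadcast(scalar, like):
    # Implicit broadcasting: repeat the scalar to match the tensor's shape.
    return [scalar] * len(like)

def constraint_holds(lhs_shape, lhs_batching_dimensions):
    rank = len(lhs_shape)
    lower = broadcast(0, lhs_batching_dimensions)
    upper = broadcast(rank, lhs_batching_dimensions)
    # Element-wise comparison produces a "tensor" of booleans.
    elementwise = [
        lo <= d < hi
        for lo, d, hi in zip(lower, lhs_batching_dimensions, upper)
    ]
    # The constraint holds if every element is true.
    return all(elementwise)

print(constraint_holds((4, 8, 16), [0, 1]))
print(constraint_holds((4, 8, 16), [0, 3]))
```

Note that all() over an empty list is true, which matches the reading that a tensor with no false elements satisfies the constraint.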

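Names

In formulas, lexical scope includes: 1) global functions, 2) member definitions, 3) local definitions. The list of global functions is provided below. The list of member definitions depends on the program element that the notation is applied to:

For operations, member definitions include names introduced in the "Inputs" and "Outputs" sections.
For everything else, member definitions include structural parts of the program element, named after the corresponding EBNF non-terminals. Most of the time, the names of these structural parts are obtained by converting the names of the non-terminals to snake case (e.g. IntegerLiteral => integer_literal), but sometimes names get abbreviated in the process (e.g. QuantizationStorageType => storage_type), in which case the names are introduced explicitly, similarly to the "Inputs" / "Outputs" sections in operation specifications.
Additionally, member definitions always include self to refer to the corresponding program element.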
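The default naming rule (non-terminal name to snake case) can be sketched as below. The function name to_snake_case is hypothetical, and abbreviated names such as storage_type are exceptions that this mechanical rule does not reproduce.

```python
import re

def to_snake_case(non_terminal: str) -> str:
    # Insert "_" before every uppercase letter except the first, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", non_terminal).lower()

print(to_snake_case("IntegerLiteral"))
print(to_snake_case("QuantizationStorageType"))
```

The second call yields the unabbreviated name, which is why the specification introduces storage_type explicitly.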

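Values

When formulas are evaluated, they work with the following types of values: 1) Value (actual values, e.g. dense<[[1, 2], [3, 4]]> : tensor<2x2xi32>; they always know their types), 2) Placeholder (future values, e.g. lhs, rhs or result; their actual values are not known yet, only their types are known), 3) Type (types as defined in the "Types" section), 4) Function (global functions as defined in the "Functions" section).

Depending on the context, names may refer to different values. More specifically, the "Semantics" section for ops (and equivalents for other program elements) defines runtime logic, so all inputs are available as Value. In contrast, the "Constraints" section for ops (and equivalents) defines "compile-time" logic, i.e. something that is typically executed before runtime, so only constant inputs are available as Value and other inputs are available only as Placeholder.

Names	In "Semantics"	In "Constraints"
Global functions	`Function`	`Function`
Constant inputs	`Value`	`Value`
Non-constant inputs	`Value`	`Placeholder`
Outputs	`Value`	`Placeholder`
Local definitions	Depends on the definition	Depends on the definition

Let's consider an example transpose operation: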

%result = "stablehlo.transpose"(%operand) {
  permutation = dense<[2, 1, 0]> : tensor<3xi64>
} : (tensor<2x3x2xi32>) -> tensor<2x3x2xi32>

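For this operation, permutation is a constant, so it is available as a Value in both semantics and constraints. In contrast, operand and result are available as a Value in semantics but only as a Placeholder in constraints.

Functions

Construction of types

There are no functions that can be used to construct types. Instead, we directly use type syntax because it is typically more concise, e.g. (tensor<E>, tensor<E>) -> (tensor<E>) rather than function_type([tensor_type([], E), tensor_type([], E)], [tensor_type([], E)]).

Functions on types

element_type is defined on tensor types and quantized tensor types and returns, respectively, the TensorElementType or QuantizedTensorElementType part of the corresponding TensorType or QuantizedTensorType.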

def element_type(x: Value | Placeholder | Type):
  if type(x) == TensorType:
    return tensor_element_type(x)
  if type(x) == QuantizedTensorType:
    return quantized_tensor_element_type(x)
  if type(x) is not Type:
    return element_type(type(x))

is_per_axis_quantized(x: Value | Placeholder | Type) -> Value is a shortcut for is_quantized(x) and quantization_dimension(x) is not None.
is_per_tensor_quantized(x: Value | Placeholder | Type) -> Value is a shortcut for is_quantized(x) and quantization_dimension(x) is None.
is_promotable(x: Type, y: Type) -> bool checks if type x can be promoted to type y. When x and y are QuantizedTensorElementTypes, the promotion is applied only to the storage_type. This specific version of promotion is currently used in the context of reduction computation (refer to the RFC for more details).

def is_promotable(x: Type, y: Type) -> Value:
  # Both types must belong to the same family (and, if quantized, share the
  # same expressed type).
  is_same_type = (
    (is_bool(x) and is_bool(y)) or
    (is_integer(x) and is_integer(y)) or (is_float(x) and is_float(y)) or
    (is_complex(x) and is_complex(y)) or
    (is_quantized(x) and is_quantized(y) and expressed_type(x) == expressed_type(y)))

  if not is_same_type:
    return False

  if is_integer(x) or is_float(x):
    return bitwidth(x) <= bitwidth(y)

  if is_complex(x):
    return bitwidth(element_type(x)) <= bitwidth(element_type(y))

  if is_quantized(x):
    return bitwidth(storage_type(x)) <= bitwidth(storage_type(y))

  return False
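To see the rule in action, here is a self-contained sketch on a toy type model. The ElemType class and its fields are assumptions made for illustration (quantized types are omitted), not part of the specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElemType:
    kind: str      # "bool", "integer", "float" or "complex"
    bitwidth: int  # for "complex", the bitwidth of its element type

def is_promotable(x: ElemType, y: ElemType) -> bool:
    # Promotion never crosses type families.
    if x.kind != y.kind:
        return False
    # Within a family, promotion may only keep or widen the bitwidth.
    if x.kind in ("integer", "float", "complex"):
        return x.bitwidth <= y.bitwidth
    # Mirrors the final fall-through of the definition above (e.g. for bool).
    return False

f16, f32, i32 = ElemType("float", 16), ElemType("float", 32), ElemType("integer", 32)
print(is_promotable(f16, f32), is_promotable(f32, f16), is_promotable(i32, f32))
```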

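is_quantized(x: Value | Placeholder | Type) -> Value is a shortcut for is_quantized_tensor_element_type(x).
is_type_name(x: Value | Placeholder | Type) -> Value. Available for all types. For example, is_float(x) returns true if x is a FloatType. If x is a value or placeholder, this function is a shortcut for is_type_name(type(x)).
max_value(x: Type) -> Value returns the maximum value of a TensorElementType. If x is not a TensorElementType, returns None.
min_value(x: Type) -> Value returns the minimum possible value of a TensorElementType. If x is not a TensorElementType, returns None.
member_name(x: Value | Placeholder | Type) -> Any. Available for all member definitions member_name of all types. For example, tensor_element_type(x) returns the TensorElementType part of a corresponding TensorType. If x is a value or placeholder, this function is a shortcut for member_name(type(x)). If x is not a type that has an appropriate member, or a value or a placeholder of such a type, returns None.
is_empty_algorithm(*args: Type) checks if all dot algorithm fields are set to None. This is needed because dot algorithms have implementation-defined default behaviors, so specifying a default value would be incorrect.

Construction of values

operation_name(*xs: Value | Type) -> Value. Available for all operations. For example, add(lhs, rhs) takes two tensor values lhs and rhs and returns the output of evaluating the add operation with these inputs. For some operations, e.g. broadcast_in_dim, the types of their outputs are "load-bearing", i.e. needed to evaluate the operation. In this case, the function takes these types as arguments.

Functions on values

All of Python's operators and functions are available. E.g. both subscription and slicing notations from Python are available to index into tensors, quantized tensors and tuples.
to_destination_type(x: Value, destination_type: Type) -> Value is defined on tensors and returns the converted value of x based on type(x) and destination_type as follows: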

def to_destination_type(x: Value, destination_type: Type) -> Value:
  if type(x) == destination_type:
    return x

  if is_quantized(destination_type):
    if is_quantized(type(x)):
      return quantize(x, destination_type)
    assert is_float(type(x))
    return quantize(x, destination_type)

  if is_quantized(type(x)):
    assert destination_type == expressed_type(type(x))
    return dequantize(x)

  return convert(x, destination_type)

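There is early discussion on merging the convert, uniform_quantize and uniform_dequantize operations (#1576). After the merge we will not need the above function and can use the operation name for convert instead.

is_nan(x: Value) -> Value is defined on tensors and returns true if all elements of x are NaN or false otherwise. If x is not a tensor, returns None.
is_sorted(x: Value) -> Value is defined on tensors and returns true if elements of x are sorted in ascending order with respect to the ascending lexicographical order of their indices or false otherwise. If x is not a tensor, returns None.
is_unique(x: Value) -> Value is defined on tensors and returns true if x doesn't have duplicate elements or false otherwise. If x is not a tensor, returns None.
member_name(x: Value) -> Any is defined for all member definitions member_name of all values. For example, real_part(x) returns the RealPart part of a corresponding ComplexConstant. If x is not a value that has an appropriate member, returns None.
same(x: Value) -> Value is defined on tensors and returns true if elements of x are all equal to each other or false otherwise. If the tensor doesn't have elements, that counts as "all equal to each other", i.e. the function returns true. If x is not a tensor, returns None.
split(x: Value, num_results: Value, axis: Value) -> Value is defined on tensors and returns num_results slices of x along the axis axis. If x is not a tensor or dim(x, axis) % num_results != 0, returns None.
is_defined_in_parent_scope(x: Value) -> Value is defined on strings and returns true if x is the name of a function defined in the same scope as the parent function of the relevant op.
is_namespaced_op_name(x: Value) -> Value is defined on strings and returns true if x is a valid op name, that is, it respects the following regular expression: [a-zA-Z][a-zA-Z0-9_]*([.][a-zA-Z0-9_$]+)+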
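The regular expression for namespaced op names can be tried out with Python's re module. The sketch assumes that the whole string must match (hence fullmatch); the sample names are illustrative.

```python
import re

# Regular expression copied from the definition of is_namespaced_op_name.
OP_NAME = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*([.][a-zA-Z0-9_$]+)+")

def is_namespaced_op_name(x: str) -> bool:
    # Assumption: the entire string has to match, not just a prefix.
    return OP_NAME.fullmatch(x) is not None

for name in ["stablehlo.add", "func.return", "add", "1abc.add"]:
    print(name, is_namespaced_op_name(name))
```

A name needs at least one dot-separated suffix, so a bare "add" is rejected while "stablehlo.add" is accepted.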

Shape computations

axes(x: Value | Placeholder | Type) -> Value is a shortcut for range(rank(x)).
dim(x: Value | Placeholder | Type, axis: Value) -> Value is a shortcut for shape(x)[axis].
dims(x: Value | Placeholder | Type, axes: List) -> List is a shortcut for list(map(lambda axis: dim(x, axis), axes)).
index_space(x: Value | Placeholder | Type) -> Value is defined on tensors and returns size(x) indices for the corresponding TensorType sorted in ascending lexicographical order, i.e. [0, ..., 0], [0, ..., 1], ..., shape(x) - 1. If x is not a tensor type, a quantized tensor type, or a value or a placeholder of one of these types, returns None.
rank(x: Value | Placeholder | Type) -> Value is a shortcut for size(shape(x)).
shape(x: Value | Placeholder | Type) -> Value is defined in the "Functions on types" section via member_name.
size(x: Value | Placeholder | Type) -> Value is a shortcut for reduce(lambda x, y: x * y, shape(x)).
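These shortcuts can be sketched in vanilla Python by modeling a tensor type by its static shape tuple. That representation, and the initial value 1 passed to reduce so that rank-0 shapes work, are assumptions of this sketch.

```python
from functools import reduce
from itertools import product

def rank(shape):
    return len(shape)

def axes(shape):
    return range(rank(shape))

def dim(shape, axis):
    return shape[axis]

def dims(shape, axes_):
    return list(map(lambda axis: dim(shape, axis), axes_))

def size(shape):
    # The initial value 1 makes size(()) == 1 for scalars (an assumption here).
    return reduce(lambda x, y: x * y, shape, 1)

def index_space(shape):
    # itertools.product enumerates indices in ascending lexicographical order.
    return [list(index) for index in product(*(range(d) for d in shape))]

shape = (2, 3)
print(rank(shape), size(shape), dims(shape, axes(shape)))
print(index_space(shape))
```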

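Quantization computations

def baseline_element_type(x: Value | Placeholder | Type) -> Type is a shortcut for element_type(baseline_type(x)).
baseline_type is defined on tensor types and quantized tensor types and transforms them to a "baseline", i.e. a type with the same shape but with the quantization parameters of the element type reset to default values. This is a handy trick for comparing both tensor and quantized tensor types uniformly, which is needed quite often. For quantized types, this enables comparing types while ignoring the quantization parameters, that is, shape, storage_type, expressed_type, storage_min, storage_max, and quantization_dimension (for per-axis quantized types) must all match, but scales and zero points may differ.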

def baseline_type(x: Value | Placeholder | Type) -> Type:
  if type(x) == TensorType:
    return x
  if type(x) == QuantizedTensorType:
    element_type = quantized_tensor_element_type(x)
    baseline_element_type = QuantizedTensorElementType(
      storage_type = storage_type(element_type),
      storage_min = storage_min(element_type),
      storage_max = storage_max(element_type),
      expressed_type = expressed_type(element_type),
      quantization_dimension = quantization_dimension(element_type),
      scales = [constant(1.0, expressed_type(element_type))] * dim(x, quantization_dimension(element_type)),
      zero_points = [constant(0, storage_type(element_type))] * dim(x, quantization_dimension(element_type)))
    return QuantizedTensorType(shape(x), baseline_element_type)
  if type(x) is not Type:
    return baseline_type(type(x))

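dequantize is defined on quantized tensor types and turns them into floating-point tensor types. This happens by converting quantized elements, which represent integer values of the storage type, into corresponding floating-point values of the expressed type using the zero point and scale associated with the quantized element type.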

def compute_zero_points(quantized_type, result_type):
  if is_per_tensor_quantized(quantized_type):
    return broadcast_in_dim(constant(zero_point(quantized_type), storage_type(quantized_type)), [], result_type)
  if is_per_axis_quantized(quantized_type):
    for i in index_space(result_type):
      d = quantization_dimension(quantized_type)
      zero_points[i] = zero_points(quantized_type)[i[d]]
    return zero_points

def compute_scales(quantized_type, result_type):
  if is_per_tensor_quantized(quantized_type):
    return broadcast_in_dim(constant(scale(quantized_type), expressed_type(quantized_type)), [],
            result_type)
  if is_per_axis_quantized(quantized_type):
    for i in index_space(result_type):
      d = quantization_dimension(quantized_type)
      scales[i] = scales(quantized_type)[i[d]]
    return scales

def dequantize(x: Value) -> Value:
  assert is_quantized(x)
  x_storage = bitcast_convert(x, storage_type(x))
  x_storage_sub = x_storage - compute_zero_points(type(x), type(x_storage))
  x_expressed_sub = convert(x_storage_sub, expressed_type(x))
  return x_expressed_sub * compute_scales(type(x), type(x_expressed_sub))

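quantize is defined on floating-point tensor types and turns them into quantized tensor types. This happens by converting floating-point values of the expressed type into corresponding integer values of the storage type using the zero point and scale associated with the quantized element type.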

def quantize(x: Value, result_type: Type) -> Value:
  assert is_float(x) and is_quantized(result_type)
  zero_points = compute_zero_points(result_type, TensorType(shape(x), storage_type(result_type)))
  converted_zero_points = convert(zero_points, expressed_type(result_type))
  converted_min = convert(storage_min(result_type), expressed_type(result_type))
  converted_max = convert(storage_max(result_type), expressed_type(result_type))

  x_scaled = x / compute_scales(result_type, type(x))
  x_scaled_add_zp = x_scaled + converted_zero_points
  x_clamped = clamp(converted_min, x_scaled_add_zp, converted_max)
  x_rounded = round_nearest_even(x_clamped)
  return convert(x_rounded, result_type)
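The arithmetic behind quantize and dequantize can be checked on single scalars with per-tensor parameters. The sketch below is a simplification under stated assumptions: the scale, zero point and storage range are made-up values, and Python's built-in round (which rounds halves to even) stands in for round_nearest_even.

```python
def quantize(x, scale, zero_point, storage_min, storage_max):
    scaled = x / scale + zero_point
    clamped = min(max(scaled, storage_min), storage_max)
    # Python's round() rounds halfway cases to the nearest even integer.
    return round(clamped)

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

# Hypothetical parameters: 8-bit signed storage, scale 0.5, zero point 10.
params = dict(scale=0.5, zero_point=10)
limits = dict(storage_min=-128, storage_max=127)

q = quantize(3.2, **params, **limits)
print(q, dequantize(q, **params))
print(quantize(1000.0, **params, **limits))
```

Dequantizing the quantized value does not recover 3.2 exactly, which shows the rounding error introduced by quantization; the last call shows clamping to storage_max.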

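dequantize_op_quantize is used to specify element-wise computations on quantized tensors. It dequantizes, i.e. turns quantized elements into their expressed types, then performs an operation, and then quantizes, i.e. turns the results back into their storage types. At the moment, this function only works for per-tensor quantization. Per-axis quantization is work in progress (#1574).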

def dequantize_op_quantize(op, *inputs_and_output_type):
  inputs = inputs_and_output_type[:-1]
  output_type = inputs_and_output_type[-1]

  float_inputs = map(dequantize, inputs)
  float_result = op(*float_inputs)
  return quantize(float_result, output_type)

def dequantize_batch_norm_grad_or_training_quantize(op, *inputs_and_output_types):
  inputs = inputs_and_output_types[:-3]
  float_inputs = map(dequantize, inputs)
  float_results = op(*float_inputs)
  return map(quantize, float_results, inputs_and_output_types[-3:])

def dequantize_compare(lhs, rhs, comparison_direction):
  float_lhs = dequantize(lhs)
  float_rhs = dequantize(rhs)
  return compare(float_lhs, float_rhs, comparison_direction, FLOAT)

def dequantize_select_quantize(pred, on_true, on_false, output_type):
  float_on_true = dequantize(on_true)
  float_on_false = dequantize(on_false)
  float_result = select(pred, float_on_true, float_on_false)
  return quantize(float_result, output_type)

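hybrid_dequantize_then_op is used to specify weight-only quantization for hybrid ops which accept lhs in floating-point and rhs in quantized types. It dequantizes quantized inputs into their expressed types and performs the computation in float. The element type of the float lhs tensor and the expressed type of the quantized rhs tensor should be identical.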

def hybrid_dequantize_then_op(op, lhs, rhs):
  assert(is_float(lhs) and is_quantized(rhs) and element_type(lhs) == expressed_type(rhs))
  return op(lhs, dequantize(rhs))

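Grid computations

cross_partition(replica_groups: Value) -> Value. See the "cross_partition" section above.
cross_replica(replica_groups: Value) -> Value. See the "cross_replica" section above.
cross_replica_and_partition(replica_groups: Value) -> Value. See the "cross_replica_and_partition" section above.
flattened_ids(replica_groups: Value) -> Value. See the "flattened_ids" section above.

Dynamism

StableHLO values can have dynamic dimension sizes, e.g. tensor<?xi64>. However, StableHLO values cannot have a dynamic number of dimensions (unranked dynamism, e.g. tensor<*xi64>). Operands and results are allowed to use dynamic dimension sizes. Constraints are verified statically where possible; otherwise they are deferred to runtime, and mismatches result in undefined behavior. See below for examples.

Shape mismatches for unary elementwise operations

Consider the following toy program: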

func.func @foo(%arg0: tensor<?xf64>) {
  %0 = stablehlo.abs %arg0 : (tensor<?xf64>) -> tensor<2xf64>
  return
}

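Such a program is unusual, because it is not common to know the shape of the result but not the shape of the input. Nonetheless, this is a valid StableHLO program. It is not possible to statically validate the abs operation in this program, because the exact shape of the operand is unknown. However, the shapes are certainly compatible, and this can be checked statically: ? could turn out to be 2 at runtime, and there would be no issue. However, ? could also turn out to be some other integer, in which case the behavior is undefined.

Note that if a dimension size is dynamic in the result, there cannot be undefined behavior. Indeed, there is no "expected" size, so there cannot be a mismatch.

Shape mismatches for binary elementwise operations

Consider the following toy program: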

func.func @foo(%arg0: tensor<?xf64>, %arg1: tensor<?xf64>) {
  %0 = stablehlo.add %arg0, %arg1 : (tensor<?xf64>, tensor<?xf64>) -> tensor<?xf64>
  return
}

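When it comes to binary elementwise operations, the shapes of the inputs and the result must agree at runtime. At compile time, static dimensions must be equal; otherwise they merely need to be compatible. If any dimension is dynamic in the inputs, then there could be undefined behavior at runtime, because the dynamic size may not match the corresponding size in the other operand (be it static or dynamic). If all the inputs are static, then whether the result is dynamic or not does not matter: statically known dimensions are checked statically, and dynamic dimensions do not impose any constraints.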
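One plausible reading of "compatible" can be sketched as a static check in which None stands for a dynamic dimension (?). The helper names compatible and may_mismatch_at_runtime are hypothetical and not part of the specification.

```python
from typing import Optional, Sequence

Dim = Optional[int]  # None models a dynamic dimension size ("?")

def compatible(a: Sequence[Dim], b: Sequence[Dim]) -> bool:
    # Ranks must match; each pair of dimensions must be equal unless
    # at least one of them is dynamic.
    if len(a) != len(b):
        return False
    return all(x is None or y is None or x == y for x, y in zip(a, b))

def may_mismatch_at_runtime(lhs: Sequence[Dim], rhs: Sequence[Dim]) -> bool:
    # Any dynamic input dimension leaves room for a runtime mismatch.
    return any(d is None for d in (*lhs, *rhs))

print(compatible([None], [2]), compatible([2], [3]))
print(may_mismatch_at_runtime([None], [None]), may_mismatch_at_runtime([2], [2]))
```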

Shape mismatches for ops that take their output shape as an operand

Consider the following toy program:

func.func @foo(%arg0: tensor<2xi32>) {
  %0 = stablehlo.dynamic_iota %arg0, dim = 0 : (tensor<2xi32>) -> tensor<3x4xi64>
  return
}

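The values in the shape operand at runtime must match the shape of the result, otherwise the behavior is undefined. That is, at runtime %arg0 must have a value of dense<[3, 4]> : tensor<2xi32>. If the shape operand is constant, this can be verified statically. If the result shape is fully dynamic, then there cannot be a mismatch.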
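The runtime requirement described above can be sketched as a check of the shape operand's values against the declared result shape, with None modeling a dynamic result dimension. The function name shape_operand_matches is hypothetical.

```python
from typing import Optional, Sequence

def shape_operand_matches(
    shape_operand_values: Sequence[int],
    result_shape: Sequence[Optional[int]],
) -> bool:
    if len(shape_operand_values) != len(result_shape):
        return False
    # A dynamic result dimension (None) accepts any runtime value.
    return all(
        expected is None or value == expected
        for value, expected in zip(shape_operand_values, result_shape)
    )

print(shape_operand_matches([3, 4], [3, 4]))        # matches tensor<3x4xi64>
print(shape_operand_matches([2, 4], [3, 4]))        # mismatch: undefined behavior
print(shape_operand_matches([7, 9], [None, None]))  # fully dynamic result
```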