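StableHLO Specification

StableHLO is an operation set for high-level operations (HLO) in machine learning (ML) models. StableHLO works as a portability layer between different ML frameworks and ML compilers: ML frameworks that produce StableHLO programs are compatible with ML compilers that consume StableHLO programs.

Our goal is to simplify and accelerate ML development by creating more interoperability between various ML frameworks (such as TensorFlow, JAX and PyTorch) and ML compilers (such as XLA and IREE). Towards that end, this document provides a specification for the StableHLO programming language.

This specification contains three major sections. First, the Programs section describes the structure of StableHLO programs, which consist of StableHLO functions, which themselves consist of StableHLO ops. Within that structure, the Ops section specifies the semantics of individual ops. The Execution section provides semantics for all these ops executing together within a program. Finally, the Notation section discusses the notation used throughout the specification.

To view the spec from a previous release of StableHLO, open the repo at the tagged release of interest, for example the StableHLO v0.19.0 Spec. To view changes that occurred at each minor version bump of StableHLO, refer to the version log in VhloDialect.td.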

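Programs

Program ::= {Func}

StableHLO programs consist of an arbitrary number of StableHLO functions. Below is an example program with a function @main which has 3 inputs (%image, %weights and %bias) and 1 output. The body of the function has 6 ops.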

func.func @main(
  %image: tensor<28x28xf32>,
  %weights: tensor<784x10xf32>,
  %bias: tensor<1x10xf32>
) -> tensor<1x10xf32> {
  %0 = "stablehlo.reshape"(%image) : (tensor<28x28xf32>) -> tensor<1x784xf32>
  %1 = "stablehlo.dot"(%0, %weights) : (tensor<1x784xf32>, tensor<784x10xf32>) -> tensor<1x10xf32>
  %2 = "stablehlo.add"(%1, %bias) : (tensor<1x10xf32>, tensor<1x10xf32>) -> tensor<1x10xf32>
  %3 = "stablehlo.constant"() {value = dense<0.0> : tensor<1x10xf32>} : () -> tensor<1x10xf32>
  %4 = "stablehlo.maximum"(%2, %3) : (tensor<1x10xf32>, tensor<1x10xf32>) -> tensor<1x10xf32>
  "func.return"(%4): (tensor<1x10xf32>) -> ()
}

Functions

Func        ::= 'func' '.' 'func' FuncId FuncInputs FuncOutputs '{' FuncBody '}'
FuncInputs  ::= '(' [FuncInput {',' FuncInput}] ')'
FuncInput   ::= ValueId ':' ValueType
FuncOutputs ::= ['->' FuncOutput, {',' FuncOutput}]
FuncOutput  ::= ValueType
FuncBody    ::= {Op}

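StableHLO functions (which are also called named functions) have an identifier, inputs/outputs and a body. In the future, we are planning to introduce additional metadata for functions to achieve better compatibility with HLO (#425, #626, #740, #744).

Identifiers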

FuncId  ::= '@' letter {letter | digit}
ValueId ::= '%' digit {digit}
          | '%' letter {letter | digit}
letter  ::= 'a' | ... | 'z' | 'A' | ... | 'Z' | '_'
digit   ::= '0' | ... | '9'

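StableHLO identifiers are similar to identifiers in many programming languages, with two peculiarities: 1) all identifiers have sigils which distinguish different kinds of identifiers, 2) value identifiers can be completely numeric to simplify generation of StableHLO programs.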
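
As an illustration, the FuncId and ValueId productions can be transcribed into regular expressions. This is a minimal sketch and not part of the specification; the helper names is_func_id and is_value_id are hypothetical.

```python
import re

# Patterns transcribed from the FuncId and ValueId productions above.
# Note that `letter` in the grammar includes the underscore.
FUNC_ID = re.compile(r"@[A-Za-z_][A-Za-z0-9_]*")
VALUE_ID = re.compile(r"%(?:[0-9]+|[A-Za-z_][A-Za-z0-9_]*)")


def is_func_id(text: str) -> bool:
    """Return True if `text` is a well-formed function identifier."""
    return FUNC_ID.fullmatch(text) is not None


def is_value_id(text: str) -> bool:
    """Return True if `text` is a well-formed value identifier."""
    return VALUE_ID.fullmatch(text) is not None
```

Under these patterns, @main, %0 and %image are accepted, while %0abc is rejected because a value identifier that starts with a digit must be entirely numeric.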

Types

Type         ::= ValueType | NonValueType
ValueType    ::= TensorType | QuantizedTensorType | TokenType | TupleType | BufferType
NonValueType ::= TensorElementType | QuantizedTensorElementType | FunctionType | StringType

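StableHLO types are categorized into value types (which are also called first-class types), which represent StableHLO values, and non-value types, which describe other program elements. StableHLO types are similar to types in many programming languages, with the main peculiarity being StableHLO's domain-specific nature, which results in some unusual outcomes (e.g. scalar types are not value types).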

TensorType ::= 'tensor' '<' Shape TensorElementType '>'
Shape ::= {DimensionSize 'x'}
DimensionSize ::= digit {digit} | '?'

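Tensor types represent tensors, i.e. multidimensional arrays. They have a shape and an element type, where a shape represents non-negative or unknown dimension sizes in the ascending order of the corresponding dimensions (which are also called axes) numbered from 0 to R-1. The number of dimensions R is called rank. For example, tensor<2x3xf32> is a tensor type with shape 2x3 and element type f32. It has two dimensions (or, in other words, two axes), the 0th dimension and the 1st dimension, whose sizes are 2 and 3. Its rank is 2.

Shapes can be partially or completely unknown (dynamic), e.g. tensor<?x2xf64> is partially unknown and tensor<?x?xf64> is completely unknown. Dynamic dimension sizes are represented using a ?. Shapes cannot be unranked.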
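
To make the shape, element type and rank terminology concrete, here is a minimal sketch that splits a tensor type string into its shape and element type. The function name parse_tensor_type is hypothetical, and representing a dynamic dimension as None is an assumption made for this illustration only.

```python
import re


def parse_tensor_type(text):
    """Split e.g. 'tensor<?x2xf64>' into ([None, 2], 'f64')."""
    match = re.fullmatch(r"tensor<(.*)>", text)
    if match is None:
        raise ValueError(f"not a tensor type: {text!r}")
    body = match.group(1)
    shape = []
    # Consume leading `DimensionSize 'x'` pairs; the remainder is the element type.
    while True:
        dim = re.match(r"(\d+|\?)x", body)
        if dim is None:
            break
        shape.append(None if dim.group(1) == "?" else int(dim.group(1)))
        body = body[dim.end():]
    return shape, body


shape, element_type = parse_tensor_type("tensor<2x3xf32>")
rank = len(shape)  # the rank is the number of dimensions
```

A 0-dimensional type such as tensor<f32> yields an empty shape, i.e. rank 0.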

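In the future, we are planning to explore extending tensor types beyond dimension sizes and element types, for example, to include layouts (#629) and sparsity (#1078).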

QuantizedTensorType ::= 'tensor' '<' Shape QuantizedTensorElementType '>'
QuantizedTensorElementType ::= '!quant.uniform' '<'
                  QuantizationStorageType
                  ['<' QuantizationStorageMin ':' QuantizationStorageMax '>']
                  ':' QuantizationExpressedType
                  [':' QuantizationDimension]
                  ',' QuantizationParameters '>'
QuantizationStorageType ::= IntegerType
QuantizationStorageMin ::= IntegerLiteral
QuantizationStorageMax ::= IntegerLiteral
QuantizationExpressedType ::= FloatType
QuantizationDimension ::= IntegerLiteral
QuantizationParameters ::= QuantizationParameter
                         | '{' QuantizationParameter {',' QuantizationParameter} '}'
QuantizationParameter ::= QuantizationScale [':' QuantizationZeroPoint]
QuantizationScale ::= FloatLiteral
QuantizationZeroPoint ::= IntegerLiteral

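Name	Type	Constraints
`storage_type`	integer type	(C1-C3), (C8)
`storage_min`	integer constant	(C1), (C3), (C7)
`storage_max`	integer constant	(C2), (C3), (C7)
`expressed_type`	floating-point type	(C4)
`quantization_dimension`	optional integer constant	(C10-C12)
`scales`	variadic number of floating-point constants	(C4-C6), (C9), (C10), (C13)
`zero_points`	variadic number of integer constants	(C7-C9)

Quantized element types represent integer values of a storage type in the range from storage_min to storage_max (inclusive) that correspond to floating-point values of an expressed type. For a given integer value i, the corresponding floating-point value f can be computed as f = (i - zero_point) * scale, where scale and zero_point are called quantization parameters. The storage_min and storage_max are optional in the grammar, but have default values of min_value(storage_type) and max_value(storage_type) respectively. Quantized element types have the following constraints: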

(C1) type(storage_min) = storage_type.
(C2) type(storage_max) = storage_type.
(C3) min_value(storage_type) <= storage_min < storage_max <= max_value(storage_type).
(C4) type(scales...) = expressed_type.
(C5) 0 < scales.
(C6) is_finite(scales...).
(C7) storage_min <= zero_points <= storage_max.
(C8) type(zero_points...) = storage_type.
(C9) size(scales) = size(zero_points).
(C10) If is_empty(quantization_dimension), then size(scales) = 1.
(C11) 0 <= quantization_dimension.
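
The formula f = (i - zero_point) * scale can be sketched as follows. The function name dequantize is hypothetical, and the default storage range of -128 to 127 assumes an si8 storage type purely for illustration.

```python
import math


def dequantize(i, scale, zero_point, storage_min=-128, storage_max=127):
    """Map a stored integer `i` to its expressed floating-point value.

    The default storage range assumes an `si8` storage type (an assumption
    made for this example, not something the grammar fixes).
    """
    # (C5) and (C6): scales are positive and finite.
    if not (scale > 0 and math.isfinite(scale)):
        raise ValueError("scale must be positive and finite")
    # (C7): the zero point lies within the storage range.
    if not (storage_min <= zero_point <= storage_max):
        raise ValueError("zero_point out of storage range")
    if not (storage_min <= i <= storage_max):
        raise ValueError("stored value out of storage range")
    return (i - zero_point) * scale
```

For instance, with scale 0.5 and zero_point 2, the stored value 10 corresponds to (10 - 2) * 0.5 = 4.0.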

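At the moment, QuantizationScale is a floating-point constant, but there is strong interest in integer-based scales, represented with multipliers and shifts. We are planning to explore this in the near future (#1404).

There is an ongoing discussion on the semantics of QuantizationZeroPoint, including the type, the values and whether there can be just one or potentially multiple zero points in a quantized tensor type. Based on the results of this discussion, the specification around zero points may change in the future (#1405).

Another ongoing discussion involves the semantics of QuantizationStorageMin and QuantizationStorageMax to determine whether any constraints should be imposed on these values and on the values of quantized tensors (#1406).

Finally, we are planning to explore representing unknown scales and zero points, similarly to how we are planning to explore representing unknown dimension sizes (#1407).

Quantized tensor types represent tensors with quantized elements. These tensors are exactly the same as regular tensors, except that their elements have quantized element types instead of regular element types.

In quantized tensors, quantization can be per-tensor, meaning one scale and zero_point for the entire tensor, or per-axis, meaning multiple scales and zero_points, one pair per slice of a particular dimension quantization_dimension. More formally, in a tensor t with per-axis quantization, there are dim(t, quantization_dimension) slices of the quantization_dimension: t[:, ..., 0, ..., :], t[:, ..., 1, ..., :], etc. All elements in the ith slice use scales[i] and zero_points[i] as their quantization parameters. Quantized tensor types have the following constraints:

For per-tensor quantization:
- No additional constraints.
For per-axis quantization:
- (C12) quantization_dimension < rank(self).
- (C13) dim(self, quantization_dimension) = size(scales).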
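
Per-axis quantization can be illustrated with a small sketch that dequantizes a rank-2 tensor stored as a nested list. The function name dequantize_per_axis and the restriction to rank 2 are assumptions made to keep the example short.

```python
def dequantize_per_axis(values, scales, zero_points, quantization_dimension):
    """Dequantize a rank-2 nested list quantized along `quantization_dimension`."""
    rows, cols = len(values), len(values[0])
    # (C11) and (C12), specialized to rank 2.
    assert 0 <= quantization_dimension < 2
    # (C9) and (C13): one (scale, zero_point) pair per slice.
    assert len(scales) == len(zero_points) == (rows, cols)[quantization_dimension]
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            s = (r, c)[quantization_dimension]  # index of the slice holding this element
            row.append((values[r][c] - zero_points[s]) * scales[s])
        result.append(row)
    return result


# Quantization along dimension 1: each column has its own parameters.
columns = dequantize_per_axis([[1, 2], [3, 4]], [0.5, 2.0], [0, 1], 1)
```

Here column 0 uses scale 0.5 and zero point 0, while column 1 uses scale 2.0 and zero point 1.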

TokenType ::= 'token'

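Token types represent tokens, i.e. opaque values produced and consumed by some operations. Tokens are used for imposing execution order on operations as described in the Execution section.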

TupleType ::= 'tuple' '<' TupleElementTypes '>'
TupleElementTypes ::= [ValueType {',' ValueType}]

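Buffer types represent buffers. For example, in XLA, buffers are multidimensional arrays with consistent storage. Similar to tensor types, buffer types have a shape and an element type, where a shape represents non-negative or unknown dimension sizes in the ascending order of the corresponding dimensions (which are also called axes) numbered from 0 to R-1. The number of dimensions R is called rank. For example, memref<2x3xf32> is a buffer type with shape 2x3 and element type f32. It has two dimensions (or, in other words, two axes), the 0th dimension and the 1st dimension, whose sizes are 2 and 3. Its rank is 2.

Buffers can be allocated using a custom_call to CreateBuffer or Pin, and deallocated via a custom_call to Unpin. Only custom_call ops can read and write the content inside buffers. See custom_call for more detail.

Tuple types represent tuples, i.e. heterogeneous lists. Tuples are a legacy feature which only exists for compatibility with HLO. In HLO, tuples are used to represent variadic inputs and outputs. In StableHLO, variadic inputs and outputs are supported natively, and the only use of tuples in StableHLO is to comprehensively represent HLO ABI where e.g. T, tuple<T> and tuple<tuple<T>> may be materially different depending on a particular implementation. In the future, we are planning to make changes to HLO ABI which may allow us to remove tuple types from StableHLO (#598).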

TensorElementType ::= BooleanType | IntegerType | FloatType | ComplexType
BooleanType ::= 'i1'
IntegerType ::= SignedIntegerType | UnsignedIntegerType
SignedIntegerType ::= 'si2' | 'si4' | 'si8' | 'si16' | 'si32' | 'si64'
UnsignedIntegerType ::= 'ui2' | 'ui4' | 'ui8' | 'ui16' | 'ui32' | 'ui64'
FloatType ::= 'f4E2M1FN' | 'f6E2M3FN' | 'f6E3M2FN' | 'f8E3M4' | 'f8E4M3'
            | 'f8E4M3FN' | 'f8E4M3FNUZ' | 'f8E4M3B11FNUZ' | 'f8E5M2'
            | 'f8E5M2FNUZ' | 'f8E8M0FNU' | 'bf16' | 'f16' | 'f32' | 'f64'
TensorFloat32 ::= 'tf32'
ComplexType ::= 'complex' '<' ComplexElementType '>'
ComplexElementType ::= 'f32' | 'f64'

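Element types represent elements of tensor types. Unlike in many programming languages, these types are not first class in StableHLO. This means that StableHLO programs cannot directly represent values of these types (as a result, it is idiomatic to represent scalar values of type T with 0-dimensional tensor values of type tensor<T>).

Boolean type represents boolean values true and false.
Integer types can be either signed (si) or unsigned (ui) and have one of the supported bit widths (2, 4, 8, 16, 32 or 64). Signed siN types represent integer values from -2^(N-1) to 2^(N-1)-1 inclusive, and unsigned uiN types represent integer values from 0 to 2^N-1 inclusive.
Floating-point types can be one of the following:
- f8E3M4, f8E4M3 and f8E5M2 8-bit floating-point numbers following IEEE-754 conventions.
- f8E4M3FN and f8E5M2 types corresponding to respectively the E4M3 and E5M2 encodings of the FP8 format described in "FP8 Formats for Deep Learning".
- f8E4M3FNUZ and f8E5M2FNUZ types corresponding to the E4M3 and E5M2 encodings of the FP8 formats described in "8-bit Numerical Formats for Deep Neural Networks".
- f8E4M3B11FNUZ type corresponding to the E4M3 encoding of the FP8 formats described in "Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks".
- bf16 type corresponding to the bfloat16 format described in "BFloat16: The secret to high performance on Cloud TPUs".
- f16, f32 and f64 types corresponding to respectively the binary16 ("half precision"), binary32 ("single precision") and binary64 ("double precision") formats described in the IEEE 754 standard.
- tf32 type corresponding to the TensorFloat32 format, which has limited support in StableHLO.
- f4E2M1FN, f6E2M3FN, f6E3M2FN and f8E8M0FNU MX (microscaling) types described in the OCP Microscaling Formats Specification.
Complex types represent complex values that have a real part and an imaginary part of the same element type. Supported complex types are complex<f32> (both parts are of type f32) and complex<f64> (both parts are of type f64).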
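
The value ranges of the integer types follow directly from the bit width. The sketch below computes them; the function name integer_range is hypothetical.

```python
SUPPORTED_BIT_WIDTHS = (2, 4, 8, 16, 32, 64)


def integer_range(type_name):
    """Return (min, max) inclusive for a type name such as 'si8' or 'ui4'."""
    prefix, width = type_name[:2], type_name[2:]
    if prefix not in ("si", "ui") or not width.isdigit():
        raise ValueError(f"not an integer type: {type_name!r}")
    bits = int(width)
    if bits not in SUPPORTED_BIT_WIDTHS:
        raise ValueError(f"unsupported bit width: {bits}")
    if prefix == "si":
        # Signed siN: from -2^(N-1) to 2^(N-1)-1 inclusive.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Unsigned uiN: from 0 to 2^N-1 inclusive.
    return 0, 2 ** bits - 1
```

For example, si8 covers -128 to 127 and ui4 covers 0 to 15.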

FunctionType ::= '(' InputTypes ')' '->' '(' OutputTypes ')'
InputTypes ::= [ValueType {',' ValueType}]
OutputTypes ::= [ValueType {',' ValueType}]

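Function types represent both named and anonymous functions. They have input types (the list of types on the left-hand side of ->) and output types (the list of types on the right-hand side of ->). In many programming languages, function types are first class, but not in StableHLO.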

StringType ::= 'string'

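String type represents sequences of bytes. Unlike in many programming languages, string type is not first class in StableHLO and is only used to specify static metadata for program elements.

Operations

StableHLO operations (which are also called ops) represent a closed set of high-level operations in machine learning models. As discussed above, StableHLO syntax is heavily inspired by MLIR, which is not necessarily the most ergonomic alternative, but is arguably the best fit for StableHLO's goal of creating more interoperability between ML frameworks and ML compilers.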

Op            ::= [OpOutputs] OpName OpInputs ':' OpSignature
OpName        ::= '"' 'stablehlo' '.' OpMnemonic '"'
OpMnemonic    ::= 'abs' | 'add' | ...

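StableHLO ops have a name, inputs/outputs and a signature. The name consists of the stablehlo. prefix and a mnemonic which uniquely identifies one of the supported ops. See below for a comprehensive list of all supported ops.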

OpInputs        ::= OpInputValues OpInputFuncs OpInputAttrs
OpInputValues   ::= '(' [OpInputValue {',' OpInputValue}] ')'
OpInputValue    ::= ValueId
OpInputFuncs    ::= ['(' OpInputFunc {',' OpInputFunc} ')']
OpInputAttrs    ::= ['{' OpInputAttr {',' OpInputAttr} '}']
OpOutputs       ::= [OpOutput {',' OpOutput} '=']
OpOutput        ::= ValueId

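Ops consume inputs and produce outputs. Inputs are categorized into input values (computed during execution), input functions (provided statically, because in StableHLO functions are not first-class values) and input attributes (also provided statically). The kind of inputs and outputs consumed and produced by an op depends on its mnemonic. For example, the add op consumes 2 input values and produces 1 output value. In comparison, the select_and_scatter op consumes 3 input values, 2 input functions and 3 input attributes.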

OpInputFunc ::= '{' Unused FuncInputs ':' FuncBody '}'
Unused      ::= '^' digit {digit}
              | '^' letter {letter | digit}

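Input functions (which are also called anonymous functions) are very similar to named functions except that: 1) they don't have an identifier (hence the name "anonymous"), 2) they don't declare output types (output types are inferred from the return op within the function).

The syntax for input functions includes a currently unused part (see the Unused production above) which is there for compatibility with MLIR. In MLIR, there is a more general concept of "regions" which can have multiple "blocks" of ops connected together via jump ops. These blocks have ids which correspond to the Unused production, so that they can be distinguished from each other. StableHLO doesn't have jump ops, so the corresponding part of MLIR syntax is unused (but is still there).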

OpInputAttr      ::= OpInputAttrName '=' OpInputAttrValue
OpInputAttrName  ::= letter {letter | digit}
OpInputAttrValue ::= Constant

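Input attributes have a name and a value which is one of the supported constants. They are the primary way to specify static metadata for program elements. For example, the concatenate op uses the attribute dimension to specify the dimension along which its input values are concatenated. Similarly, the slice op uses multiple attributes like start_indices and limit_indices to specify the bounds that are used to slice the input value.

At the moment, StableHLO programs in the wild sometimes contain attributes which are not described in this document. In the future, we are planning to either absorb these attributes into the StableHLO opset or prohibit them from appearing in StableHLO programs. In the meantime, here is the list of these attributes:

layout (#629).
mhlo.frontend_attributes (#628).
mhlo.sharding (#619).
output_operand_aliases (#740).
Location metadata (#594).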

OpSignature ::= '(' [ValueType {',' ValueType}] ')' '->' '(' [ValueType {',' ValueType}] ')'

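Op signature consists of the types of all input values (the list of types on the left-hand side of ->) and the types of all output values (the list of types on the right-hand side of ->). Strictly speaking, input types are redundant, and output types are almost always redundant as well (because for most StableHLO ops, output types can be inferred from inputs). Nonetheless, op signature is deliberately part of StableHLO syntax for compatibility with MLIR.

Below is an example op whose mnemonic is select_and_scatter. It consumes 3 input values (%operand, %source and %init_value), 2 input functions and 3 input attributes (window_dimensions, window_strides and padding). Note how the signature of the op only includes the types of its input values (but not the types of input functions and attributes, which are provided inline).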

%result = "stablehlo.select_and_scatter"(%operand, %source, %init_value) ({
  ^bb0(%arg0: tensor<i32>, %arg1: tensor<i32>):
    %0 = "stablehlo.compare"(%arg0, %arg1) {
      comparison_direction = #stablehlo<comparison_direction GE>
    } : (tensor<i32>, tensor<i32>) -> tensor<i1>
    "stablehlo.return"(%0) : (tensor<i1>) -> ()
}, {
  ^bb0(%arg0: tensor<i32>, %arg1: tensor<i32>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i32>, tensor<i32>) -> tensor<i32>
    "stablehlo.return"(%0) : (tensor<i32>) -> ()
}) {
  window_dimensions = dense<[3, 1]> : tensor<2xi64>,
  window_strides = dense<[2, 1]> : tensor<2xi64>,
  padding = dense<[[0, 1], [0, 0]]> : tensor<2x2xi64>
} : (tensor<4x2xi32>, tensor<2x2xi32>, tensor<i32>) -> tensor<4x2xi32>

Constants

Constant ::= BooleanConstant
           | IntegerConstant
           | FloatConstant
           | ComplexConstant
           | TensorConstant
           | QuantizedTensorConstant
           | StringConstant
           | EnumConstant

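StableHLO constants have a literal and a type which together represent a StableHLO value. Generally, the type is part of the constant syntax, except when it is unambiguous (e.g. a boolean constant unambiguously has type i1, whereas an integer constant can have multiple possible types).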

BooleanConstant ::= BooleanLiteral
BooleanLiteral  ::= 'true' | 'false'

Boolean constants represent boolean values true and false. Boolean constants have type i1.

IntegerConstant   ::= IntegerLiteral ':' IntegerType
IntegerLiteral    ::= ['-' | '+'] DecimalDigits
                    | ['-' | '+'] '0x' HexadecimalDigits
DecimalDigits     ::= decimalDigit {decimalDigit}
HexadecimalDigits ::= hexadecimalDigit {hexadecimalDigit}
decimalDigit      ::= '0' | ... | '9'
hexadecimalDigit  ::= decimalDigit | 'a' | ... | 'f' | 'A' | ... | 'F'

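Integer constants represent integer values via strings that use decimal or hexadecimal notation. Other bases, e.g. binary or octal, are not supported. Integer constants have the following constraints:

(C1) is_wellformed(integer_literal, integer_type).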

FloatConstant  ::= FloatLiteral ':' FloatType
FloatLiteral   ::= SignPart IntegerPart FractionalPart ScientificPart
                 | '0x' [HexadecimalDigits]
SignPart       ::= ['-' | '+']
IntegerPart    ::= DecimalDigits
FractionalPart ::= ['.' [DecimalDigits]]
ScientificPart ::= [('e' | 'E') ['-' | '+'] DecimalDigits]

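Floating-point constants represent floating-point values via strings that use decimal or scientific notation. Additionally, hexadecimal notation can be used to directly specify the underlying bits in the floating-point format of the corresponding type. Floating-point constants have the following constraints:

(C1) If non-hexadecimal notation is used, is_wellformed(float_literal, float_type).
(C2) If hexadecimal notation is used, size(hexadecimal_digits) = num_bits(float_type) / 4.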
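
To show what "directly specify the underlying bits" means, the following sketch decodes a hexadecimal literal for the f32 type using Python's struct module. The function name f32_from_hex_literal is hypothetical, and only f32 is covered.

```python
import struct


def f32_from_hex_literal(literal):
    """Interpret a literal such as '0x3F800000' as the bits of an f32 value."""
    if not literal.startswith("0x"):
        raise ValueError("expected hexadecimal notation")
    digits = literal[2:]
    # (C2): num_bits(f32) / 4 = 32 / 4 = 8 hexadecimal digits.
    if len(digits) != 8:
        raise ValueError("an f32 hexadecimal literal needs exactly 8 digits")
    # Reinterpret the 4 bytes (most significant first) as an IEEE-754 binary32.
    return struct.unpack(">f", bytes.fromhex(digits))[0]
```

With this helper, 0x3F800000 decodes to 1.0 and 0x7F800000 decodes to positive infinity, a value that decimal notation cannot express.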

ComplexConstant ::= ComplexLiteral ':' ComplexType
ComplexLiteral  ::= '(' RealPart ',' ImaginaryPart ')'
RealPart        ::= FloatLiteral
ImaginaryPart   ::= FloatLiteral

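Complex constants represent complex values using lists of a real part (comes first) and an imaginary part (comes second). For example, (1.0, 0.0) : complex<f32> represents 1.0 + 0.0i, and (0.0, 1.0) : complex<f32> represents 0.0 + 1.0i. The order in which these parts are then stored in memory is implementation-defined. Complex constants have the following constraints:

(C1) is_wellformed(real_part, complex_element_type(complex_type)).
(C2) is_wellformed(imaginary_part, complex_element_type(complex_type)).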

TensorConstant ::= TensorLiteral ':' TensorType
TensorLiteral  ::= 'dense' '<' (DenseLiteral | ElementLiteral) '>'
DenseLiteral   ::= DenseDimension | DenseElements
DenseDimension ::= '[' [DenseLiteral {',' DenseLiteral}] ']'
DenseElements  ::= [ElementLiteral {',' ElementLiteral}]
ElementLiteral ::= BooleanLiteral | IntegerLiteral | FloatLiteral | ComplexLiteral

Tensor constants represent tensor values using nested lists specified via NumPy notation. For example, dense<[[1, 2, 3], [4, 5, 6]]> : tensor<2x3xi32> represents a tensor value with the following mapping from indices to elements: {0, 0} => 1, {0, 1} => 2, {0, 2} => 3, {1, 0} => 4, {1, 1} => 5, {1, 2} => 6. The order in which these elements are then stored in memory is implementation-defined. Tensor constants have the following constraints:

(C1) has_syntax(tensor_literal, element_type(tensor_type)), where:
- has_syntax(element_literal: Syntax, element_type: Type) = is_wellformed(element_literal, element_type).
- has_syntax(tensor_literal: List, element_type: Type) = has_syntax(tensor_literal..., element_type).
(C2) has_shape(tensor_literal, shape(tensor_type)), where:
- has_shape(element_literal: Syntax, []) = true.
- has_shape(tensor_literal: List, shape: List) = size(tensor_literal) = shape[0] and has_shape(tensor_literal..., shape[1:]).
- otherwise, false.
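
The recursive has_shape definition in (C2) can be transcribed almost literally. This is a sketch that models a tensor literal as a nested Python list and an element literal as any non-list value; that modeling is an assumption of the example.

```python
def has_shape(literal, shape):
    """Check a nested-list literal against a list of dimension sizes, per (C2)."""
    if not isinstance(literal, list):
        # An element literal only matches the empty shape.
        return len(shape) == 0
    if len(shape) == 0:
        # A list literal never matches the empty shape ("otherwise, false").
        return False
    # The outer size must equal shape[0], and every sub-literal must match shape[1:].
    return len(literal) == shape[0] and all(
        has_shape(sub, shape[1:]) for sub in literal)
```

For the example above, has_shape([[1, 2, 3], [4, 5, 6]], [2, 3]) holds, while a ragged literal such as [[1, 2], [3]] fails for any shape.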

QuantizedTensorConstant ::= QuantizedTensorLiteral ':' QuantizedTensorType
QuantizedTensorLiteral  ::= 'dense' '<' (DenseLiteral | ElementLiteral) '>'

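Quantized tensor constants represent quantized tensor values using the same notation as tensor constants, with elements specified as constants of their storage type. Quantized tensor constants have the following constraints:

(C1) has_syntax(quantized_tensor_literal, storage_type(quantized_tensor_type)).
(C2) has_shape(quantized_tensor_literal, shape(quantized_tensor_type)).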

StringConstant  ::= StringLiteral
StringLiteral   ::= '"' {stringCharacter | escapeSequence} '"'
stringCharacter ::= all ASCII characters except '\00', '\01', ... '\1f' and '"'
escapeSequence  ::= '\' ('"' | '\' | 'n' | 't' | (hexadecimalDigit hexadecimalDigit))

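String literals consist of bytes specified using ASCII characters and escape sequences. They are encoding-agnostic, so the interpretation of these bytes is implementation-defined. String literals have type string.

Ops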

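abs

Semantics

Performs element-wise abs operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For signed integers: integer modulus.
For floats: abs from IEEE-754.
For complex numbers: complex modulus.
For quantized types: dequantize_op_quantize(abs, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of signed integer, floating-point, or complex type or per-tensor quantized tensor	(C1-C2)

Outputs

Name	Type	Constraints
`result`	tensor of signed integer or floating-point type or per-tensor quantized tensor	(C1-C2)

Constraints

(C1) shape(result) = shape(operand).
(C2) baseline_element_type(result) is defined as:
- complex_element_type(element_type(operand)) if is_complex(operand).
- baseline_element_type(operand) otherwise.

Examples

// %operand: [-2, 0, 2]
%result = "stablehlo.abs"(%operand) : (tensor<3xi32>) -> tensor<3xi32>
// %result: [2, 0, 2]

More Examples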

add

Semantics

Performs element-wise addition of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical OR.
For integers: integer addition.
For floats: addition from IEEE-754.
For complex numbers: complex addition.
For quantized types: dequantize_op_quantize(add, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or quantized tensor	(C1-C6)
(I2)	`rhs`	tensor or quantized tensor	(C1-C5), (C7)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1-C7)

Constraints

If the operation uses non-quantized tensors:
- (C1) type(lhs) = type(rhs) = type(result).
If the operation uses quantized tensors:
- (C2) is_quantized(lhs) and is_quantized(rhs) and is_quantized(result).
- (C3) storage_type(lhs) = storage_type(rhs) = storage_type(result).
- (C4) expressed_type(lhs) = expressed_type(rhs) = expressed_type(result).
- (C5) (is_per_axis_quantized(lhs) or is_per_axis_quantized(rhs)) = is_per_axis_quantized(result).
- (C6) If is_per_axis_quantized(lhs), then quantization_dimension(lhs) = quantization_dimension(result).
- (C7) If is_per_axis_quantized(rhs), then quantization_dimension(rhs) = quantization_dimension(result).

Examples

// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.add"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[6, 8], [10, 12]]

More Examples

after_all

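Semantics

Ensures that the operations producing the inputs are executed before any operations that depend on result. Execution of this operation does nothing; it only exists to establish data dependencies from result to inputs.

Inputs

Label	Name	Type
(I1)	`inputs`	variadic number of `token`

Outputs

Name	Type
`result`	`token`

Examples

// %input0: !stablehlo.token
// %input1: !stablehlo.token
%result = "stablehlo.after_all"(%input0, %input1) : (!stablehlo.token, !stablehlo.token) -> !stablehlo.token

More Examples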

all_gather

Semantics

Within each process group in the StableHLO process grid, concatenates the values of the operands tensors from each process along all_gather_dim and produces results tensors.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0 and use_global_device_ids = false.
cross_replica_and_partition(replica_groups) if channel_id > 0 and use_global_device_ids = false.
flattened_ids(replica_groups) if channel_id > 0 and use_global_device_ids = true.

Afterwards, within each process_group:

operands...@receiver = [operand@sender for sender in process_group] for all receiver in process_group.
results...@process = concatenate(operands...@process, all_gather_dim) for all process in process_group.

Inputs

Label	Name	Type	Constraints
(I1)	`operands`	variadic number of tensors or per-tensor quantized tensors	(C1), (C6)
(I2)	`all_gather_dim`	constant of type `si64`	(C1), (C6)
(I3)	`replica_groups`	2-dimensional tensor constant of type `si64`	(C2-C4)
(I4)	`channel_id`	constant of type `si64`	(C5)
(I5)	`use_global_device_ids`	constant of type `i1`	(C5)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C6)

Constraints

(C1) 0 <= all_gather_dim < rank(operands...).
(C2) is_unique(replica_groups).
(C3) size(replica_groups) is defined as:
- num_replicas if cross_replica is used.
- num_replicas if cross_replica_and_partition is used.
- num_processes if flattened_ids is used.
(C4) 0 <= replica_groups < size(replica_groups).
(C5) If use_global_device_ids = true, then channel_id > 0.
(C6) type(results...) = type(operands...) except:
- dim(results..., all_gather_dim) = dim(operands..., all_gather_dim) * dim(process_groups, 1).

Examples

// num_replicas: 2
// num_partitions: 1
// %operand0@(0, 0): [[1, 2], [3, 4]]
// %operand0@(1, 0): [[5, 6], [7, 8]]
// %operand1@(0, 0): [[11, 12], [13, 14]]
// %operand1@(1, 0): [[15, 16], [17, 18]]
%result:2 = "stablehlo.all_gather"(%operand0, %operand1) {
  all_gather_dim = 1 : i64,
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>,
  // channel_id = 0
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
  // use_global_device_ids = false
} : (tensor<2x2xi64>, tensor<2x2xi64>) -> (tensor<2x4xi64>, tensor<2x4xi64>)
// %result0@(0, 0): [[1, 2, 5, 6], [3, 4, 7, 8]]
// %result0@(1, 0): [[1, 2, 5, 6], [3, 4, 7, 8]]
// %result1@(0, 0): [[11, 12, 15, 16], [13, 14, 17, 18]]
// %result1@(1, 0): [[11, 12, 15, 16], [13, 14, 17, 18]]
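
The gather-then-concatenate behavior for a single process group and a single operand can be simulated in plain Python. This is a sketch under simplifying assumptions: tensors are rank-2 nested lists, processes are identified by their replica id only (as in the example with num_partitions = 1), and the helper names are hypothetical.

```python
def concatenate(tensors, dim):
    """Concatenate rank-2 nested lists along dimension 0 or 1."""
    if dim == 0:
        return [row for t in tensors for row in t]
    return [sum((t[r] for t in tensors), []) for r in range(len(tensors[0]))]


def all_gather(operand_per_process, process_group, all_gather_dim):
    """Simulate all_gather for one operand within one process group."""
    # Every receiver collects the operand of every sender, in group order.
    gathered = [operand_per_process[sender] for sender in process_group]
    result = concatenate(gathered, all_gather_dim)
    # All processes in the group end up with the same concatenated value.
    return {process: result for process in process_group}


operand0 = {0: [[1, 2], [3, 4]], 1: [[5, 6], [7, 8]]}
result0 = all_gather(operand0, process_group=[0, 1], all_gather_dim=1)
```

With the %operand0 values from the example, both processes obtain [[1, 2, 5, 6], [3, 4, 7, 8]].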

More Examples

all_reduce

Semantics

Within each process group in the StableHLO process grid, applies a reduction function computation to the values of the operands tensors from each process and produces results tensors.

The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0 and use_global_device_ids = false.
cross_replica_and_partition(replica_groups) if channel_id > 0 and use_global_device_ids = false.
flattened_ids(replica_groups) if channel_id > 0 and use_global_device_ids = true.

Afterwards, within each process_group:

results...@process[result_index] = exec(schedule) for some binary tree schedule where:
- exec(node) = computation(exec(node.left), exec(node.right)).
- exec(leaf) = leaf.value.
schedule is an implementation-defined binary tree whose in-order traversal is to_destination_type(operands...@process_group...[result_index], type(func_inputs(computation)[0])).

Inputs

Label	Name	Type	Constraints
(I1)	`operands`	variadic number of tensors or per-tensor quantized tensors	(C5), (C6)
(I2)	`replica_groups`	variadic number of 1-dimensional tensor constants of type `si64`	(C1-C3)
(I3)	`channel_id`	constant of type `si64`	(C4)
(I4)	`use_global_device_ids`	constant of type `i1`	(C4)
(I5)	`computation`	function	(C5)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C6-C7)

Constraints

(C1) is_unique(replica_groups).
(C2) size(replica_groups) is defined as:
- num_replicas if cross_replica is used.
- num_replicas if cross_replica_and_partition is used.
- num_processes if flattened_ids is used.
(C3) 0 <= replica_groups < size(replica_groups).
(C4) If use_global_device_ids = true, then channel_id > 0.
(C5) computation has type (tensor<E>, tensor<E>) -> (tensor<E>) where is_promotable(element_type(operand), E).
(C6) shape(results...) = shape(operands...).
(C7) element_type(results...) = E.

Examples

// num_replicas: 2
// num_partitions: 1
// %operand0@(0, 0): [1, 2, 3, 4]
// %operand0@(1, 0): [5, 6, 7, 8]
// %operand1@(0, 0): [9, 10, 11, 12]
// %operand1@(1, 0): [13, 14, 15, 16]
%result:2 = "stablehlo.all_reduce"(%operand0, %operand1) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>,
  // channel_id = 0
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
  // use_global_device_ids = false
} : (tensor<4xi64>, tensor<4xi64>) -> (tensor<4xi64>, tensor<4xi64>)
// %result0@(0, 0): [6, 8, 10, 12]
// %result0@(1, 0): [6, 8, 10, 12]
// %result1@(0, 0): [22, 24, 26, 28]
// %result1@(1, 0): [22, 24, 26, 28]
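
The reduction can be simulated for one operand and one process group as follows. This sketch assumes rank-1 tensors stored as lists and processes identified by replica id only; it uses a left-to-right fold, which is just one of the implementation-defined schedules the semantics allow. The name all_reduce_sim is hypothetical.

```python
from functools import reduce


def all_reduce_sim(operand_per_process, process_group, computation):
    """Simulate all_reduce for one rank-1 operand within one process group."""
    values = [operand_per_process[process] for process in process_group]
    # For each result_index, fold the elements contributed by all processes.
    # A left fold corresponds to a left-leaning binary tree schedule.
    result = [reduce(computation, elements) for elements in zip(*values)]
    return {process: list(result) for process in process_group}


operand0 = {0: [1, 2, 3, 4], 1: [5, 6, 7, 8]}
result0 = all_reduce_sim(operand0, [0, 1], lambda x, y: x + y)
```

With addition as the computation, both processes obtain [6, 8, 10, 12] for the %operand0 values in the example. For floating-point types, a different schedule may produce slightly different results because addition is not associative.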

More Examples

all_to_all

Semantics

Within each process group in the StableHLO process grid, splits the values of the operands tensors along split_dimension into parts, scatters the split parts between the processes, concatenates the scattered parts along concat_dimension and produces results tensors. The operation splits the StableHLO process grid into process_groups, which is defined as follows:

cross_replica(replica_groups) if channel_id <= 0.
cross_partition(replica_groups) if channel_id > 0.

Afterwards, within each process_group:

split_parts...@sender = split(operands...@sender, split_count, split_dimension) for all sender in process_group.
scattered_parts...@receiver = [split_parts...@sender[receiver_index] for sender in process_group] where receiver_index = process_group.index(receiver).
results...@process = concatenate(scattered_parts...@process, concat_dimension).

Inputs

Label	Name	Type	Constraints
(I1)	`operands`	variadic number of tensors or per-tensor quantized tensors	(C1-C3), (C9)
(I2)	`split_dimension`	constant of type `si64`	(C1), (C2), (C9)
(I3)	`concat_dimension`	constant of type `si64`	(C3), (C9)
(I4)	`split_count`	constant of type `si64`	(C2), (C4), (C8), (C9)
(I5)	`replica_groups`	2-dimensional tensor constant of type `si64`	(C5-C8)
(I6)	`channel_id`	constant of type `si64`

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C9)

Constraints

(C1) 0 <= split_dimension < rank(operands...).
(C2) dim(operands..., split_dimension) % split_count = 0.
(C3) 0 <= concat_dimension < rank(operands...).
(C4) 0 < split_count.
(C5) is_unique(replica_groups).
(C6) size(replica_groups) is defined as:
- num_replicas if cross_replica is used.
- num_partitions if cross_partition is used.
(C7) 0 <= replica_groups < size(replica_groups).
(C8) dim(replica_groups, 1) = split_count.
(C9) type(results...) = type(operands...) except, if split_dimension != concat_dimension:
- dim(results..., split_dimension) = dim(operands..., split_dimension) / split_count.
- dim(results..., concat_dimension) = dim(operands..., concat_dimension) * split_count.

Examples

// num_replicas: 2
// num_partitions: 1
// %operand1@(0, 0): [[1, 2, 3, 4],
//                    [5, 6, 7, 8]]
// %operand1@(1, 0): [[9, 10, 11, 12],
//                    [13, 14, 15, 16]]
// %operand2@(0, 0): [[17, 18, 19, 20],
//                    [21, 22, 23, 24]]
// %operand2@(1, 0): [[25, 26, 27, 28],
//                    [29, 30, 31, 32]]
%result:2 = "stablehlo.all_to_all"(%operand1, %operand2) {
  split_dimension = 1 : i64,
  concat_dimension = 0 : i64,
  split_count = 2 : i64,
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>
  // channel_id = 0
} : (tensor<2x4xi64>, tensor<2x4xi64>) -> (tensor<4x2xi64>, tensor<4x2xi64>)
// %result#0@(0, 0): [[1, 2], [5, 6], [9, 10], [13, 14]]
// %result#0@(1, 0): [[3, 4], [7, 8], [11, 12], [15, 16]]
// %result#1@(0, 0): [[17, 18], [21, 22], [25, 26], [29, 30]]
// %result#1@(1, 0): [[19, 20], [23, 24], [27, 28], [31, 32]]
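
The split, scatter and concatenate steps can be simulated for one operand and one process group. The sketch assumes rank-2 nested lists and processes identified by replica id only; the helper names are hypothetical.

```python
def split(tensor, count, dim):
    """Split a rank-2 nested list into `count` equal parts along `dim`."""
    if dim == 0:
        size = len(tensor) // count
        return [tensor[i * size:(i + 1) * size] for i in range(count)]
    size = len(tensor[0]) // count
    return [[row[i * size:(i + 1) * size] for row in tensor] for i in range(count)]


def concatenate(tensors, dim):
    """Concatenate rank-2 nested lists along dimension 0 or 1."""
    if dim == 0:
        return [row for t in tensors for row in t]
    return [sum((t[r] for t in tensors), []) for r in range(len(tensors[0]))]


def all_to_all(operand_per_process, process_group, split_dimension,
               concat_dimension, split_count):
    """Simulate all_to_all for one operand within one process group."""
    split_parts = {sender: split(operand_per_process[sender], split_count,
                                 split_dimension)
                   for sender in process_group}
    results = {}
    for receiver_index, receiver in enumerate(process_group):
        # The receiver takes the part at its own index from every sender.
        scattered = [split_parts[sender][receiver_index]
                     for sender in process_group]
        results[receiver] = concatenate(scattered, concat_dimension)
    return results


operand1 = {0: [[1, 2, 3, 4], [5, 6, 7, 8]],
            1: [[9, 10, 11, 12], [13, 14, 15, 16]]}
result1 = all_to_all(operand1, [0, 1], split_dimension=1,
                     concat_dimension=0, split_count=2)
```

For the %operand1 values in the example, process 0 obtains [[1, 2], [5, 6], [9, 10], [13, 14]] and process 1 obtains [[3, 4], [7, 8], [11, 12], [15, 16]].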

More Examples

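and

Semantics

Performs element-wise AND of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical AND.
For integers: bitwise AND.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of boolean or integer type	(C1)
(I2)	`rhs`	tensor of boolean or integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of boolean or integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples

// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.and"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[1, 2], [3, 0]]

More Examples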

atan2

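Semantics

Performs element-wise atan2 operation on lhs and rhs tensors and produces a result tensor. Depending on the element type, does the following:

For floats: atan2 from IEEE-754.
For complex numbers: complex atan2.
For quantized types: dequantize_op_quantize(atan2, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples

// %lhs: [0.0, 1.0, -1.0]
// %rhs: [0.0, 0.0, 0.0]
%result = "stablehlo.atan2"(%lhs, %rhs) : (tensor<3xf64>, tensor<3xf64>) -> tensor<3xf64>
// %result: [0.0, 1.57079637, -1.57079637] // [0.0, pi/2, -pi/2]

More Examples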

batch_norm_grad

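Semantics

Computes gradients of several inputs of batch_norm_training backpropagating from grad_output, and produces grad_operand, grad_scale and grad_offset tensors. More formally, this operation can be expressed as a decomposition to existing StableHLO operations using Python syntax as follows: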

def compute_sum(operand, feature_index):
  (sum,) = reduce(
      inputs=[operand],
      init_values=[constant(0, element_type(operand))],
      dimensions=[i for i in range(rank(operand)) if i != feature_index],
      body=lambda x, y: add(x, y))
  return sum

def compute_mean(operand, feature_index):
  sum = compute_sum(operand, feature_index)
  divisor = constant(size(operand) / dim(operand, feature_index),
                     element_type(operand))
  divisor_bcast = broadcast_in_dim(divisor, [], type(sum))
  return divide(sum, divisor_bcast)

def batch_norm_grad(operand, scale, mean, variance, grad_output, epsilon, feature_index):
  # Broadcast inputs to type(operand)
  scale_bcast = broadcast_in_dim(scale, [feature_index], type(operand))
  mean_bcast = broadcast_in_dim(mean, [feature_index], type(operand))
  variance_bcast = broadcast_in_dim(variance, [feature_index], type(operand))
  epsilon_bcast = broadcast_in_dim(constant(epsilon, element_type(operand)), [],
                                   type(operand))

  # Perform normalization using the provided `mean` and `variance`
  # Intermediate values will be useful for computing gradients
  centered_operand = subtract(operand, mean_bcast)
  stddev = sqrt(add(variance_bcast, epsilon_bcast))
  normalized_operand = divide(centered_operand, stddev)

  # Use the implementation from batchnorm_expander.cc in XLA
  # Temporary variables have exactly the same names as in the C++ code
  elements_per_feature = broadcast_in_dim(
      constant(divide(size(operand), dim(operand, feature_index)),
               element_type(grad_output)),
      [], type(operand))
  i1 = multiply(grad_output, elements_per_feature)
  i2 = broadcast_in_dim(
      compute_sum(grad_output, feature_index), [feature_index], type(operand))
  i3 = broadcast_in_dim(
      compute_sum(multiply(grad_output, centered_operand), feature_index),
      [feature_index], type(operand))
  i4 = multiply(i3, centered_operand)
  i5 = divide(i4, add(variance_bcast, epsilon_bcast))
  i6 = subtract(subtract(i1, i2), i5)

  grad_operand = multiply(
      divide(divide(scale_bcast, stddev), elements_per_feature), i6)
  grad_scale = compute_sum(
      multiply(grad_output, normalized_operand), feature_index)
  grad_offset = compute_sum(grad_output, feature_index)

  return grad_operand, grad_scale, grad_offset

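For quantized types, performs dequantize_batch_norm_grad_or_training_quantize(lambda operand, scale, mean, variance, grad_output: batch_norm_grad(operand, scale, mean, variance, grad_output, epsilon, feature_index), operand, scale, mean, variance, grad_output, type(grad_operand), type(grad_scale), type(grad_offset)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1-C3), (C5)
(I2)	`scale`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4), (C5)
(I3)	`mean`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
(I4)	`variance`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
(I5)	`grad_output`	tensor of floating-point type or per-tensor quantized tensor	(C2), (C3)
(I6)	`epsilon`	constant of type `f32`
(I7)	`feature_index`	constant of type `si64`	(C1), (C5)

Outputs

Name	Type	Constraints
`grad_operand`	tensor of floating-point type or per-tensor quantized tensor	(C2), (C3)
`grad_scale`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
`grad_offset`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)

Constraints

(C1) 0 <= feature_index < rank(operand).
(C2) operand, scale, mean, variance, grad_output, grad_operand, grad_scale and grad_offset have the same baseline_element_type.
(C3) operand, grad_output and grad_operand have the same shape.
(C4) scale, mean, variance, grad_scale and grad_offset have the same shape.
(C5) size(scale) = dim(operand, feature_index).

Examples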

// %operand: [
//            [[1.0, 2.0], [3.0, 4.0]],
//            [[3.0, 4.0], [1.0, 2.0]]
//           ]
// %scale: [1.0, 1.0]
// %mean: [2.0, 3.0]
// %variance: [1.0, 1.0]
// %grad_output: [
//                [[0.1, 0.1], [0.1, 0.1]],
//                [[0.1, 0.1], [0.1, 0.1]]
//               ]
%grad_operand, %grad_scale, %grad_offset =
"stablehlo.batch_norm_grad"(%operand, %scale, %mean, %variance, %grad_output) {
  epsilon = 0.0 : f32,
  feature_index = 2 : i64
} : (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>, tensor<2xf64>,
     tensor<2x2x2xf64>) -> (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>)
// %grad_operand: [
//                 [[0.0, 0.0], [0.0, 0.0]],
//                 [[0.0, 0.0], [0.0, 0.0]]
//                ]
// %grad_scale:  [0.0, 0.0]
// %grad_offset: [0.4, 0.4]

batch_norm_inference

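Semantics

Normalizes the operand tensor across all dimensions except for the feature_index dimension and produces a result tensor. More formally, this operation can be expressed as a decomposition to existing StableHLO operations using Python syntax as follows: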

def batch_norm_inference(operand, scale, offset, mean, variance, epsilon, feature_index):
  # Broadcast inputs to shape(operand)
  scale_bcast = broadcast_in_dim(scale, [feature_index], type(operand))
  offset_bcast = broadcast_in_dim(offset, [feature_index], type(operand))
  mean_bcast = broadcast_in_dim(mean, [feature_index], type(operand))
  variance_bcast = broadcast_in_dim(variance, [feature_index], type(operand))
  epsilon_bcast = broadcast_in_dim(constant(epsilon, element_type(operand)), [],
                                   type(operand))

  # Perform normalization using the provided `mean` and `variance` instead of
  # computing them like `batch_norm_training` does.
  centered_operand = subtract(operand, mean_bcast)
  stddev = sqrt(add(variance_bcast, epsilon_bcast))
  normalized_operand = divide(centered_operand, stddev)
  return add(multiply(scale_bcast, normalized_operand), offset_bcast)

For quantized types, performs dequantize_op_quantize(lambda operand, scale, offset, mean, variance: batch_norm_inference(operand, scale, offset, mean, variance, epsilon, feature_index), operand, scale, offset, mean, variance, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1-C7)
(I2)	`scale`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C3)
(I3)	`offset`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C4)
(I4)	`mean`	1-dimensional tensor of floating-point or per-tensor quantized type	(C5)
(I5)	`variance`	1-dimensional tensor of floating-point or per-tensor quantized type	(C2), (C6)
(I6)	`epsilon`	constant of type `f32`
(I7)	`feature_index`	constant of type `si64`	(C1), (C3-C6)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C2), (C7)

Constraints

(C1) 0 <= feature_index < rank(operand).
(C2) operand, scale, offset, mean, variance and result have the same baseline_element_type.
(C3) size(scale) = dim(operand, feature_index).
(C4) size(offset) = dim(operand, feature_index).
(C5) size(mean) = dim(operand, feature_index).
(C6) size(variance) = dim(operand, feature_index).
(C7) baseline_type(operand) = baseline_type(result).

Examples

// %operand: [
//            [[1.0, 2.0], [3.0, 4.0]],
//            [[3.0, 4.0], [1.0, 2.0]]
//           ]
// %scale: [1.0, 1.0]
// %offset: [1.0, 1.0]
// %mean: [2.0, 3.0]
// %variance: [1.0, 1.0]
%result = "stablehlo.batch_norm_inference"(%operand, %scale, %offset, %mean, %variance) {
  epsilon = 0.0 : f32,
  feature_index = 2 : i64
} : (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>, tensor<2xf64>, tensor<2xf64>) -> tensor<2x2x2xf64>
// %result: [
//           [[0.0, 0.0], [2.0, 2.0]],
//           [[2.0, 2.0], [0.0, 0.0]]
//          ]
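
The example above can be checked by hand with a small pure-Python sketch. It is only an illustration, not the reference implementation: the helper name `batch_norm_inference_last_axis` is hypothetical, and it assumes a rank-3 operand stored as nested lists with `feature_index = 2`.

```python
import math

def batch_norm_inference_last_axis(operand, scale, offset, mean, variance, epsilon):
    """Normalizes a rank-3 nested list along its last axis (feature_index = 2)."""
    return [
        [
            [
                # scale * (x - mean) / sqrt(variance + epsilon) + offset, per feature f
                scale[f] * (x - mean[f]) / math.sqrt(variance[f] + epsilon) + offset[f]
                for f, x in enumerate(row)
            ]
            for row in matrix
        ]
        for matrix in operand
    ]

operand = [[[1.0, 2.0], [3.0, 4.0]], [[3.0, 4.0], [1.0, 2.0]]]
result = batch_norm_inference_last_axis(
    operand, scale=[1.0, 1.0], offset=[1.0, 1.0],
    mean=[2.0, 3.0], variance=[1.0, 1.0], epsilon=0.0)
print(result)  # [[[0.0, 0.0], [2.0, 2.0]], [[2.0, 2.0], [0.0, 0.0]]]
```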

batch_norm_training

語意

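Computes the mean and variance across all dimensions except the feature_index dimension and normalizes the operand tensor, producing output, batch_mean, and batch_var tensors. More formally, this operation can be expressed as a decomposition to existing StableHLO operations using Python syntax as follows: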

def compute_mean(operand, feature_index):
  (sum,) = reduce(
      inputs=[operand],
      init_values=[constant(0, element_type(operand))],
      dimensions=[i for i in range(rank(operand)) if i != feature_index],
      body=lambda x, y: add(x, y))
  divisor = constant(size(operand) / dim(operand, feature_index),
                     element_type(operand))
  divisor_bcast = broadcast_in_dim(divisor, [], type(sum))
  return divide(sum, divisor_bcast)

def compute_variance(operand, feature_index):
  mean = compute_mean(operand, feature_index)
  mean_bcast = broadcast_in_dim(mean, [feature_index], type(operand))
  centered_operand = subtract(operand, mean_bcast)
  return compute_mean(multiply(centered_operand, centered_operand), feature_index)

def batch_norm_training(operand, scale, offset, epsilon, feature_index):
  mean = compute_mean(operand, feature_index)
  variance = compute_variance(operand, feature_index)
  return (batch_norm_inference(operand, scale, offset, mean, variance, epsilon,
                               feature_index),
          mean, variance)

如果是量化型別，則會執行 dequantize_batch_norm_grad_or_training_quantize(lambda operand, scale, offset: batch_norm_training(operand, scale, offset, epsilon, feature_index), operand, scale, offset, type(output), type(batch_mean), type(batch_var))。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	浮點類型張量或每個張量的量化張量	(C1)
(I2)	`scale`	浮點或每個張量量化的 1 維張量	(C2)、(C3)
(I3)	`offset`	浮點或每個張量量化的 1 維張量	(C2)、(C4)
(I4)	`epsilon`	`f32` 類型的常數	(C1)、(C3-C6)
(I5)	`feature_index`	`si64` 類型的常數	(C1)、(C3-C6)

輸出內容

名稱	類型	限制
`output`	浮點類型張量或每個張量的量化張量	(C7)
`batch_mean`	浮點或每個張量量化的 1 維張量	(C2)、(C5)
`batch_var`	浮點或每個張量量化的 1 維張量	(C2)、(C6)

限制

(C1) 0 <= feature_index < rank(operand)。
(C2) operand、scale、offset、batch_mean、batch_var 和 output 具有相同的 baseline_element_type。
(C3) size(scale) = dim(operand, feature_index)。
(C4) size(offset) = dim(operand, feature_index)。
(C5) size(batch_mean) = dim(operand, feature_index)。
(C6) size(batch_var) = dim(operand, feature_index)。
(C7) baseline_type(output) = baseline_type(operand)。

範例

// %operand: [
//            [[1.0, 2.0], [3.0, 4.0]],
//            [[3.0, 4.0], [1.0, 2.0]]
//           ]
// %scale: [1.0, 1.0]
// %offset: [1.0, 1.0]
%output, %batch_mean, %batch_var = "stablehlo.batch_norm_training"(%operand, %scale, %offset) {
  epsilon = 0.0 : f32,
  feature_index = 2 : i64
} : (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>) ->
    (tensor<2x2x2xf64>, tensor<2xf64>, tensor<2xf64>)
// %output: [
//           [[0.0, 0.0], [2.0, 2.0]],
//           [[2.0, 2.0], [0.0, 0.0]]
//          ]
// %batch_mean: [2.0, 3.0]
// %batch_var: [1.0, 1.0]
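
The batch statistics in this example can be reproduced with a short sketch. It is a hedged illustration under stated assumptions: the helpers `feature_mean` and `feature_variance` are hypothetical names, and the operand is assumed to be a rank-3 nested list whose last axis is the feature axis.

```python
def feature_mean(operand, num_features):
    """Averages a rank-3 nested list over every axis except the last one."""
    sums = [0.0] * num_features
    count = 0
    for matrix in operand:
        for row in matrix:
            count += 1
            for f, x in enumerate(row):
                sums[f] += x
    return [s / count for s in sums]

def feature_variance(operand, num_features):
    """Mean of squared deviations from the per-feature mean."""
    mean = feature_mean(operand, num_features)
    squared = [
        [[(x - mean[f]) ** 2 for f, x in enumerate(row)] for row in matrix]
        for matrix in operand
    ]
    return feature_mean(squared, num_features)

operand = [[[1.0, 2.0], [3.0, 4.0]], [[3.0, 4.0], [1.0, 2.0]]]
batch_mean = feature_mean(operand, 2)
batch_var = feature_variance(operand, 2)
print(batch_mean, batch_var)  # [2.0, 3.0] [1.0, 1.0]
```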

bitcast_convert

語意

對 operand 張量執行位元轉換運算，並產生 result 張量，其中整個 operand 張量的位元會使用 result 張量的型別重新解讀。

更正式地說，假設有 E = element_type(operand)、E' = element_type(result) 和 R = rank(operand)：

如果 num_bits(E') < num_bits(E)， bits(result[i0, ..., iR-1, :]) = bits(operand[i0, ..., iR-1])。
如果 num_bits(E') > num_bits(E)， bits(result[i0, ..., iR-2]) = bits(operand[i0, ..., iR-2, :])。
如果 num_bits(E') = num_bits(E)， bits(result[i0, ..., iR-1]) = bits(operand[i0, ..., iR-1])。

bits 會傳回指定值的記憶體內表示法，且其行為是由實作定義，因為張量的確切表示法是由實作定義，元素類型的確切表示法也是由實作定義。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	張量或量化張量	(C1-C2)

輸出內容

名稱	類型	限制
`result`	張量或量化張量	(C1-C2)

限制

(C1) Given E = is_quantized(operand) ? storage_type(operand) : element_type(operand), E' = is_quantized(result) ? storage_type(result) : element_type(result), and R = rank(operand):
- If num_bits(E') = num_bits(E), shape(result) = shape(operand).
- If num_bits(E') < num_bits(E):
  - rank(result) = R + 1.
  - dim(result, i) = dim(operand, i) for all 0 <= i < R.
  - dim(result, R) * num_bits(E') = num_bits(E).
- If num_bits(E') > num_bits(E):
  - rank(result) = R - 1.
  - dim(result, i) = dim(operand, i) for all 0 <= i < R - 1.
  - dim(operand, R - 1) * num_bits(E) = num_bits(E').
(C2) If is_complex(operand) or is_complex(result), then is_complex(operand) and is_complex(result).

範例

// %operand: 0x0123456789ABCDEF
%result = "stablehlo.bitcast_convert"(%operand) : (tensor<f64>) -> tensor<4xf16>
// %result: [0xCDEF, 0x89AB, 0x4567, 0x0123] // little-endian representation
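
The bit reinterpretation in this example can be sketched with Python's `struct` module. This is an illustration only: it assumes a little-endian layout (the spec leaves `bits` implementation-defined) and uses unsigned integers instead of `f64`/`f16` so that the raw bit patterns stay visible. The helper name `bitcast_u64_to_u16x4` is hypothetical.

```python
import struct

def bitcast_u64_to_u16x4(value):
    """Reinterprets one 64-bit pattern as four 16-bit patterns (little-endian)."""
    raw = struct.pack("<Q", value)          # 8 bytes, least significant byte first
    return list(struct.unpack("<4H", raw))  # four 16-bit words from the same bytes

parts = bitcast_u64_to_u16x4(0x0123456789ABCDEF)
print([hex(p) for p in parts])  # ['0xcdef', '0x89ab', '0x4567', '0x123']
```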

更多範例

broadcast_in_dim

語意

藉由複製 operand 張量中的資料，擴展輸入張量的維度和/或等級，並產生 result 張量。更正式地說，result[result_index] = operand[operand_index]，其中對於 axes(operand) 中的所有 d：

operand_index[d] = 0 (如果 dim(operand, d) = 1)。
其他情況則為 operand_index[d] = result_index[broadcast_dimensions[d]]。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	張量或量化張量	(C1-C2)、(C5-C6)
(I2)	`broadcast_dimensions`	`si64` 類型的 1 維張量常數	(C2-C6)

輸出內容

名稱	類型	限制
`result`	張量或量化張量	(C1)、(C3)、(C5-C6)

限制

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand), except that quantization_dimension(operand), scales(operand), and zero_points(operand) may differ from quantization_dimension(result), scales(result), and zero_points(result) respectively, otherwise.
(C2) size(broadcast_dimensions) = rank(operand)。
(C3) 0 <= broadcast_dimensions < rank(result)。
(C4) is_unique(broadcast_dimensions)。
(C5) For all d in axes(operand):
- dim(operand, d) = 1, or
- dim(operand, d) = dim(result, broadcast_dimensions[d]).
(C6) If is_per_axis_quantized(result):
- quantization_dimension(result) = broadcast_dimensions[quantization_dimension(operand)].
- If dim(operand, quantization_dimension(operand)) = 1, then scales(result)[i] = scales(operand)[0] and zero_points(result)[i] = zero_points(operand)[0] for i in range(dim(result, quantization_dimension(result))).

範例

// %operand: [
//            [1, 2, 3]
//           ]
%result = "stablehlo.broadcast_in_dim"(%operand) {
  broadcast_dimensions = array<i64: 2, 1>
} : (tensor<1x3xi32>) -> tensor<2x3x2xi32>
// %result: [
//            [
//             [1, 1],
//             [2, 2],
//             [3, 3]
//            ],
//            [
//             [1, 1],
//             [2, 2],
//             [3, 3]
//            ]
//          ]
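
The index mapping in the semantics can be sketched as follows. This is a minimal illustration, not the reference implementation: the Python function `broadcast_in_dim` here is a hypothetical stand-in that takes shapes explicitly and returns a dictionary keyed by result index.

```python
from itertools import product

def broadcast_in_dim(operand, operand_shape, broadcast_dimensions, result_shape):
    """Returns a dict mapping each result index tuple to the broadcast value."""
    def get(nested, index):
        for i in index:
            nested = nested[i]
        return nested

    result = {}
    for result_index in product(*(range(n) for n in result_shape)):
        # Size-1 operand axes always read index 0; other axes follow the mapping.
        operand_index = tuple(
            0 if operand_shape[d] == 1 else result_index[broadcast_dimensions[d]]
            for d in range(len(operand_shape))
        )
        result[result_index] = get(operand, operand_index)
    return result

result = broadcast_in_dim([[1, 2, 3]], [1, 3], [2, 1], [2, 3, 2])
print(result[(0, 0, 0)], result[(1, 2, 1)])  # 1 3
```

With `broadcast_dimensions = [2, 1]`, operand axis 1 maps to result axis 1, so every `result[i, j, k]` reads `operand[0, j]`.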

更多範例

case

語意

根據 index 的值，從 branches 執行一個函式，並產生輸出內容。更正式地說，result = selected_branch()，其中：

selected_branch = branches[index] (如果 0 <= index < size(branches))。
其他情況則為 selected_branch = branches[-1]。

輸入

標籤	名稱	類型	限制
(I1)	`index`	`si32` 類型的 0 維張量
(I2)	`branches`	函式數量可變	(C1-C4)

輸出內容

名稱	類型	限制
`results`	可變數量的張量、量化張量或權杖	(C4)

限制

(C1) 0 < size(branches)。
(C2) input_types(branches...) = []。
(C3) same(output_types(branches...))。
(C4) type(results...) = output_types(branches[0])。

範例

// %index: -1
// %result_branch0: [0, 0]
// %result_branch1: [1, 1]
%result0, %result1 = "stablehlo.case"(%index) ({
  "stablehlo.return"(%result_branch0, %result_branch0) : (tensor<2xi64>, tensor<2xi64>) -> ()
}, {
  "stablehlo.return"(%result_branch1, %result_branch1) : (tensor<2xi64>, tensor<2xi64>) -> ()
}) : (tensor<i32>) -> (tensor<2xi64>, tensor<2xi64>)
// %result0: [1, 1]
// %result1: [1, 1]
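
The branch selection rule, including the fallback to the last branch for an out-of-range index, can be sketched in a few lines. The Python function `case` and the lambda branches are hypothetical stand-ins for the op and its regions.

```python
def case(index, branches):
    """Runs branches[index], or the last branch when index is out of range."""
    if 0 <= index < len(branches):
        selected_branch = branches[index]
    else:
        selected_branch = branches[-1]
    return selected_branch()

branches = [
    lambda: ([0, 0], [0, 0]),  # branch 0
    lambda: ([1, 1], [1, 1]),  # branch 1
]
result0, result1 = case(-1, branches)
print(result0, result1)  # [1, 1] [1, 1]
```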

更多範例

cbrt

語意

對 operand 張量執行元素級別的立方根運算，並產生 result 張量。視元素類型而定，執行下列操作：

浮點數：IEEE-754 的 rootn(x, 3)。
複數：複數立方根。
量化型別：dequantize_op_quantize(cbrt, operand, type(result))

輸入

標籤	名稱	類型	限制
(I1)	`operand`	浮點或複數型別的張量，或是每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	浮點或複數型別的張量，或是每個張量的量化張量	(C1)

限制

(C1) baseline_type(operand) = baseline_type(result)。

範例

// %operand: [0.0, 1.0, 8.0, 27.0]
%result = "stablehlo.cbrt"(%operand) : (tensor<4xf64>) -> tensor<4xf64>
// %result: [0.0, 1.0, 2.0, 3.0]

更多範例

ceil

語意

對 operand 張量執行元素層級的 ceil 運算，並產生 result 張量。實作 IEEE-754 規格的 roundToIntegralTowardPositive 運算。如果是量化型別，則會執行 dequantize_op_quantize(ceil, operand, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	浮點類型張量或每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	浮點類型張量或每個張量的量化張量	(C1)

限制

(C1) baseline_type(operand) = baseline_type(result)。

範例

// %operand: [-0.8166, -0.2530, 0.2530, 0.8166, 2.0]
%result = "stablehlo.ceil"(%operand) : (tensor<5xf32>) -> tensor<5xf32>
// %result: [-0.0, -0.0, 1.0, 1.0, 2.0]

更多範例

cholesky

語意

計算一批矩陣的 Cholesky 分解。

更正式地說，對於 index_space(result) 中的所有 i，result[i0, ..., iR-3, :, :] 是 a[i0, ..., iR-3, :, :] 的 Cholesky 分解，形式為下三角 (如果 lower 是 true) 或上三角 (如果 lower 是 false) 矩陣。對向三角形中的輸出值 (即嚴格上三角形或嚴格下三角形) 則由實作定義。

如果存在輸入矩陣不是 Hermitian 正定矩陣的 i，則行為未定義。

如果是量化型別，則會執行 dequantize_op_quantize(lambda operand: cholesky(operand, lower), a, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`a`	浮點或複數型別的張量，或是每個張量的量化張量	(C1-C3)
(I2)	`lower`	`i1` 類型的常數

輸出內容

名稱	類型	限制
`result`	浮點或複數型別的張量，或是每個張量的量化張量	(C1)

限制

(C1) baseline_type(a) = baseline_type(result)。
(C2) 2 <= rank(a)。
(C3) dim(a, -2) = dim(a, -1)。

範例

// %a: [
//      [1.0, 2.0, 3.0],
//      [2.0, 20.0, 26.0],
//      [3.0, 26.0, 70.0]
//     ]
%result = "stablehlo.cholesky"(%a) {
  lower = true
} : (tensor<3x3xf32>) -> tensor<3x3xf32>
// %result: [
//           [1.0, 0.0, 0.0],
//           [2.0, 4.0, 0.0],
//           [3.0, 5.0, 6.0]
//          ]
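
For a single real matrix with `lower = true`, the decomposition in this example can be reproduced with a textbook Cholesky-Banachiewicz loop. This is a sketch under stated assumptions: `cholesky_lower` is a hypothetical helper, it handles only one real symmetric positive-definite matrix, and it fills the strictly upper triangle with zeros even though the spec leaves those values implementation-defined.

```python
import math

def cholesky_lower(a):
    """Returns L such that L * transpose(L) = a, with L lower-triangular."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - partial)
            else:
                lower[i][j] = (a[i][j] - partial) / lower[j][j]
    return lower

a = [[1.0, 2.0, 3.0], [2.0, 20.0, 26.0], [3.0, 26.0, 70.0]]
result = cholesky_lower(a)
print(result)  # [[1.0, 0.0, 0.0], [2.0, 4.0, 0.0], [3.0, 5.0, 6.0]]
```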

clamp

語意

將 operand 張量中的每個元素限制在最小值和最大值之間，並產生 result 張量。更正式地說，result[result_index] = minimum(maximum(operand[result_index], min_element), max_element)，其中 min_element = rank(min) = 0 ? min[] : min[result_index]，max_element = rank(max) = 0 ? max[] : max[result_index]。如果是量化型別，則會執行 dequantize_op_quantize(clamp, min, operand, max, type(result))。

對複數強制排序會涉及令人意外的語意，因此我們計畫在未來移除對這項作業的複數支援 (#560)。

輸入

標籤	名稱	類型	限制
(I1)	`min`	張量或每個張量的量化張量	(C1)、(C3)
(I2)	`operand`	張量或每個張量的量化張量	(C1-C4)
(I3)	`max`	張量或每個張量的量化張量	(C2)、(C3)

輸出內容

名稱	類型	限制
`result`	張量或每個張量的量化張量	(C4)

限制

(C1) rank(min) = 0 or shape(min) = shape(operand)。
(C2) rank(max) = 0 or shape(max) = shape(operand)。
(C3) baseline_element_type(min) = baseline_element_type(operand) = baseline_element_type(max)。
(C4) baseline_type(operand) = baseline_type(result)。

範例

// %min: [5, 10, 15]
// %operand: [3, 13, 23]
// %max: [10, 15, 20]
%result = "stablehlo.clamp"(%min, %operand, %max) : (tensor<3xi32>, tensor<3xi32>, tensor<3xi32>) -> tensor<3xi32>
// %result: [5, 13, 20]
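
The formula `minimum(maximum(operand, min_element), max_element)` can be sketched for rank-1 inputs as below. The Python function `clamp` is a hypothetical stand-in; it treats a plain number as a rank-0 bound and a list as a bound with the same shape as `operand`.

```python
def clamp(min_values, operand, max_values):
    """Clamps each element of a flat list between its lower and upper bound."""
    def element(bound, i):
        # A list is an elementwise bound; anything else is a rank-0 bound.
        return bound[i] if isinstance(bound, list) else bound

    return [
        min(max(x, element(min_values, i)), element(max_values, i))
        for i, x in enumerate(operand)
    ]

result = clamp([5, 10, 15], [3, 13, 23], [10, 15, 20])
print(result)  # [5, 13, 20]
```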

更多範例

collective_broadcast

語意

在 StableHLO 程序格的每個程序群組中，將來源程序的 operand 張量值傳送至目標程序，並產生 result 張量。

這項作業會將 StableHLO 程序格線分割成 process_groups，定義如下：

cross_replica(replica_groups) (如果 channel_id <= 0)。
cross_partition(replica_groups) (如果 channel_id > 0)。

Afterwards, result@process is given by:

operand@process_groups[i, 0], if there exists an i such that the process is in process_groups[i].
broadcast_in_dim(constant(is_quantized(result) ? quantize(0, element_type(result)) : 0, element_type(result)), [], type(result)), otherwise.

輸入

標籤	名稱	類型	限制
(I1)	`operand`	張量或每個張量的量化張量	(C3)
(I2)	`replica_groups`	`si64` 型別的一維張量常數可變引數數量	(C1)、(C2)
(I3)	`channel_id`	`si64` 類型的常數

輸出內容

名稱	類型	限制
`result`	張量或每個張量的量化張量	(C3)

限制

(C1) is_unique(replica_groups)。
(C2) 0 <= replica_groups < N，其中 N 定義如下：
- 如果使用 cross_replica，則為 num_replicas。
- 如果使用 cross_partition，則為 num_partitions。
(C3) type(result) = type(operand)。

範例

// num_replicas: 4
// num_partitions: 1
// %operand@(0, 0): [[1, 2]]
// %operand@(1, 0): [[3, 4]]
// %operand@(2, 0): [[5, 6]]
// %operand@(3, 0): [[7, 8]]
%result = "stablehlo.collective_broadcast"(%operand) {
  replica_groups = dense<[[2, 1]]> : tensor<1x2xi64>,
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
} : (tensor<1x2xi64>) -> tensor<1x2xi64>
// %result@(0, 0): [[0, 0]]
// %result@(1, 0): [[5, 6]]
// %result@(2, 0): [[5, 6]]
// %result@(3, 0): [[0, 0]]
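
The per-process rule can be sketched for the cross_replica case with a single partition. This is an illustration only: the Python function `collective_broadcast` is a hypothetical stand-in, and operands are modeled as a dictionary from replica id to a flat list.

```python
def collective_broadcast(operands, replica_groups):
    """Each process in a group receives the operand of the group's first member."""
    results = {}
    for process, operand in operands.items():
        group = next((g for g in replica_groups if process in g), None)
        if group is None:
            results[process] = [0] * len(operand)     # not in any group: zeros
        else:
            results[process] = list(operands[group[0]])
    return results

operands = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [7, 8]}
results = collective_broadcast(operands, replica_groups=[[2, 1]])
print(results)  # {0: [0, 0], 1: [5, 6], 2: [5, 6], 3: [0, 0]}
```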

collective_permute

語意

在 StableHLO 程序格的每個程序群組中，將來源程序的 operand 張量值傳送至目標程序，並產生 result 張量。

這項作業會將 StableHLO 程序格線分割成 process_groups，定義如下：

cross_replica(source_target_pairs) (如果 channel_id <= 0)。
cross_partition(source_target_pairs) (如果 channel_id > 0)。

Afterwards, result@process is given by:

operand@process_groups[i, 0], if there exists an i such that process_groups[i, 1] = process.
broadcast_in_dim(constant(is_quantized(result) ? quantize(0, element_type(result)) : 0, element_type(result)), [], type(result)), otherwise.

輸入

標籤	名稱	類型	限制
(I1)	`operand`	張量或每個張量的量化張量	(C5)
(I2)	`source_target_pairs`	`si64` 類型的 2 維張量常數	(C1-C4)
(I3)	`channel_id`	`si64` 類型的常數

輸出內容

名稱	類型	限制
`result`	tensor or per-tensor quantized tensor	(C5)

限制

(C1) dim(source_target_pairs, 1) = 2。
(C2) is_unique(source_target_pairs[:, 0])。
(C3) is_unique(source_target_pairs[:, 1])。
(C4) 0 <= source_target_pairs < N，其中 N 定義為：
- 如果使用 cross_replica，則為 num_replicas。
- 如果使用 cross_partition，則為 num_partitions。
(C5) type(result) = type(operand)。

範例

// num_replicas: 3
// num_partitions: 1
// %operand@(0, 0): [[1, 2], [3, 4]]
// %operand@(1, 0): [[5, 6], [7, 8]]
// %operand@(2, 0): [[9, 10], [11, 12]]
%result = "stablehlo.collective_permute"(%operand) {
  source_target_pairs = dense<[[0, 1], [1, 2]]> : tensor<2x2xi64>,
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
} : (tensor<2x2xi64>) -> tensor<2x2xi64>
//
// %result@(0, 0): [[0, 0], [0, 0]]
// %result@(1, 0): [[1, 2], [3, 4]]
// %result@(2, 0): [[5, 6], [7, 8]]
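
The source-to-target routing can be sketched for the cross_replica case with a single partition. The Python function `collective_permute` is a hypothetical stand-in, and operands are modeled as a dictionary from replica id to a rank-2 nested list.

```python
def collective_permute(operands, source_target_pairs):
    """Each target receives its source's operand; other processes receive zeros."""
    source_of = {target: source for source, target in source_target_pairs}
    results = {}
    for process, operand in operands.items():
        if process in source_of:
            results[process] = [list(row) for row in operands[source_of[process]]]
        else:
            results[process] = [[0] * len(row) for row in operand]
    return results

operands = {
    0: [[1, 2], [3, 4]],
    1: [[5, 6], [7, 8]],
    2: [[9, 10], [11, 12]],
}
results = collective_permute(operands, [[0, 1], [1, 2]])
print(results[0], results[1], results[2])
```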

更多範例

compare

語意

根據 comparison_direction 和 compare_type，對 lhs 和 rhs 張量執行元素層級的比較，並產生 result 張量。

comparison_direction 和 compare_type 的值具有下列語意：

如果是布林值和整數元素類型：

EQ：lhs = rhs。
NE：lhs != rhs。
GE：lhs >= rhs。
GT：lhs > rhs。
LE：lhs <= rhs。
LT：lhs < rhs。

如為具有 compare_type = FLOAT 的浮點元素類型，運算子會實作下列 IEEE-754 作業：

EQ：compareQuietEqual。
NE：compareQuietNotEqual。
GE：compareQuietGreaterEqual。
GT：compareQuietGreater。
LE：compareQuietLessEqual。
LT：compareQuietLess。

如果是具有 compare_type = TOTALORDER 的浮點元素類型，運算子會使用 IEEE-754 的 totalOrder 和 compareQuietEqual 運算組合。

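For complex element types, lexicographic comparison of (real, imag) pairs is performed using the provided comparison_direction and compare_type. Imposing an ordering on complex numbers involves surprising semantics, so in the future we are planning to remove support for complex numbers when comparison_direction is GE, GT, LE or LT (#560).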

適用於量化型別，會執行 dequantize_compare(lhs, rhs, comparison_direction)。

輸入

標籤	名稱	類型	限制
(I1)	`lhs`	張量或每個張量的量化張量	(C1-C3)
(I2)	`rhs`	張量或每個張量的量化張量	(C1-C2)
(I3)	`comparison_direction`	`EQ`、`NE`、`GE`、`GT`、`LE` 和 `LT` 的列舉
(I4)	`compare_type`	`FLOAT`、`TOTALORDER`、`SIGNED` 和 `UNSIGNED` 的列舉	(C3)

輸出內容

名稱	類型	限制
`result`	布林值類型的張量	(C2)

限制

(C1) baseline_element_type(lhs) = baseline_element_type(rhs)。
(C2) shape(lhs) = shape(rhs) = shape(result)。
(C3) compare_type 的定義為：
- SIGNED (如果 is_signed_integer(element_type(lhs)))。
- UNSIGNED (如果 is_unsigned_integer(element_type(lhs)) or is_boolean(element_type(lhs)))。
- FLOAT 或 TOTALORDER (如果 is_float(element_type(lhs)))。
- FLOAT (如果 is_complex(element_type(lhs)))。

範例

// %lhs: [1.0, 3.0]
// %rhs: [1.1, 2.9]
%result = "stablehlo.compare"(%lhs, %rhs) {
  comparison_direction = #stablehlo<comparison_direction LT>,
  compare_type = #stablehlo<comparison_type FLOAT>
} : (tensor<2xf32>, tensor<2xf32>) -> tensor<2xi1>
// %result: [true, false]

更多範例

complex

語意

從實數和虛數值 (lhs 和 rhs) 逐一轉換為複數值，並產生 result 張量。

輸入

標籤	名稱	類型	限制
(I1)	`lhs`	`f32` 或 `f64` 類型的張量	(C1-C3)
(I2)	`rhs`	`f32` 或 `f64` 類型的張量	(C1)

輸出內容

名稱	類型	限制
`result`	複數型別的張量	(C2)、(C3)

限制

(C1) type(lhs) = type(rhs)。
(C2) shape(result) = shape(lhs)。
(C3) element_type(result) 的型別為 complex<E>，其中 E = element_type(lhs)。

範例

// %lhs: [1.0, 3.0]
// %rhs: [2.0, 4.0]
%result = "stablehlo.complex"(%lhs, %rhs) : (tensor<2xf64>, tensor<2xf64>) -> tensor<2xcomplex<f64>>
// %result: [(1.0, 2.0), (3.0, 4.0)]

更多範例

composite

語意

封裝由其他 StableHLO 作業組成的作業，並採用 inputs 和 composite_attributes，產生 results。op 的語意是由 decomposition 屬性實作。composite op 可以替換為其分解，而不會變更程式語意。如果將分解作業內嵌無法提供相同的作業語意，建議使用 custom_call。

version 欄位 (預設為 0) 用於表示複合項目的語意何時會變更。

輸入

標籤	名稱	類型
(I1)	`inputs`	可變數量的數值
(I2)	`name`	`string` 類型的常數
(I3)	`composite_attributes`	屬性字典
(I4)	`decomposition`	`string` 類型的常數
(I5)	`version`	`si32` 類型的常數

輸出內容

名稱	類型
`results`	可變數量的數值

限制

(C1) is_namespaced_op_name(name)
(C2) is_defined_in_parent_scope(decomposition)
(C3) types(inputs...) == input_types(decomposition)
(C4) types(results...) == output_types(decomposition)

範例

%results = "stablehlo.composite"(%input0, %input1) {
  name = "my_namespace.my_op",
  composite_attributes = {
    my_attribute = "my_value"
  },
  decomposition = @my_op,
  version = 1 : i32
} : (tensor<f32>, tensor<f32>) -> tensor<f32>

更多範例

concatenate

語意

沿著 dimension 維度串連 inputs，順序與指定引數相同，並產生 result 張量。更正式地說， result[i0, ..., id, ..., iR-1] = inputs[k][i0, ..., kd, ..., iR-1]，其中：

id = d0 + ... + dk-1 + kd。
d 等於 dimension，而 d0 等則是 inputs 的第 d 個維度大小。

輸入

標籤	名稱	類型	限制
(I1)	`inputs`	可變數量的張量或每個張量的量化張量	(C1-C6)
(I2)	`dimension`	`si64` 類型的常數	(C2)、(C4)、(C6)

輸出內容

名稱	類型	限制
`result`	張量或每個張量的量化張量	(C5-C6)

限制

(C1) same(element_type(inputs...))。
(C2) same(shape(inputs...)) (dim(inputs..., dimension) 除外)。
(C3) 0 < size(inputs)。
(C4) 0 <= dimension < rank(inputs[0])。
(C5) element_type(result) = element_type(inputs[0])。
(C6) shape(result) = shape(inputs[0]) (下列項目除外)：
- dim(result, dimension) = dim(inputs[0], dimension) + ...。

範例

// %input0: [[1, 2], [3, 4], [5, 6]]
// %input1: [[7, 8]]
%result = "stablehlo.concatenate"(%input0, %input1) {
  dimension = 0 : i64
} : (tensor<3x2xi64>, tensor<1x2xi64>) -> tensor<4x2xi64>
// %result: [[1, 2], [3, 4], [5, 6], [7, 8]]
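
Concatenation along an arbitrary dimension can be sketched recursively on nested lists. The Python function `concatenate` is a hypothetical stand-in that assumes the inputs already satisfy the shape constraints.

```python
def concatenate(inputs, dimension):
    """Joins nested lists along `dimension`, in argument order."""
    if dimension == 0:
        # Outermost axis: append the slices of each input one after another.
        return [item for tensor in inputs for item in tensor]
    # Otherwise, descend one axis and concatenate matching slices.
    return [
        concatenate([tensor[i] for tensor in inputs], dimension - 1)
        for i in range(len(inputs[0]))
    ]

input0 = [[1, 2], [3, 4], [5, 6]]
input1 = [[7, 8]]
result = concatenate([input0, input1], 0)
print(result)  # [[1, 2], [3, 4], [5, 6], [7, 8]]
```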

更多範例

constant

語意

從常數 value 產生 output 張量。

輸入

標籤	名稱	類型	限制
(I1)	`value`	常數	(C1)

輸出內容

名稱	類型	限制
`output`	張量或量化張量	(C1)

限制

(C1) type(value) = type(output)。

範例

%output = "stablehlo.constant"() {
  value = dense<[[0.0, 1.0], [2.0, 3.0]]> : tensor<2x2xf32>
} : () -> tensor<2x2xf32>
// %output: [[0.0, 1.0], [2.0, 3.0]]

更多範例

convert

語意

在 operand 張量上執行元素式轉換，從一種元素型別轉換為另一種，並產生 result 張量。

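For boolean-to-any-supported-type conversions, the value false is converted to zero, and the value true is converted to one. For any-supported-type-to-boolean conversions, a zero value is converted to false, and non-zero values are converted to true. See below for how this works for complex types.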

如果是整數對整數、整數對浮點數或浮點數對浮點數的轉換，如果來源值可以精確地以目標型別表示，結果值就是該精確表示法。否則，行為待定 (#180)。

如果是floating-point-to-integer的轉換，系統會截斷小數部分。如果截斷的值無法以目的地型別表示，則行為待定 (#180)。

複數到複數的轉換會遵循浮點到浮點轉換的相同行為，以轉換實部和虛部。

如果是複數到任何其他型別和任何其他型別到複數的轉換，系統會分別忽略來源虛數值或將目的地虛數值歸零。實部轉換遵循浮點轉換。

原則上，這項作業可以表示去量化 (從量化張量轉換為一般張量)、量化 (從一般張量轉換為量化張量) 和重新量化 (在量化張量之間轉換)，但目前我們有專用的作業：第一個用途是 uniform_dequantize，第二個和第三個用途是 uniform_quantize。日後，這兩項作業可能會合併為 convert (#1576)。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	張量	(C1)

輸出內容

名稱	類型	限制
`result`	張量	(C1)

限制

(C1) shape(operand) = shape(result)。

範例

// %operand: [-1, 0, 1]
%result = "stablehlo.convert"(%operand) : (tensor<3xi64>) -> tensor<3xcomplex<f64>>
// %result: [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]

更多範例

convolution

語意

計算 lhs 視窗與 rhs 切片之間的點積，並產生 result。下圖以具體範例說明如何從 lhs 和 rhs 計算 result 中的元素。

More formally, consider the following reframing of the inputs in terms of lhs in order to be able to express windows of lhs:

lhs_window_dimensions = lhs_shape(dim(lhs, input_batch_dimension), dim(rhs, kernel_spatial_dimensions), dim(lhs, input_feature_dimension))。
lhs_window_strides = lhs_shape(1, window_strides, 1)。
lhs_padding = lhs_shape([0, 0], padding, [0, 0])。
lhs_base_dilations = lhs_shape(1, lhs_dilation, 1)。
lhs_window_dilations = lhs_shape(1, rhs_dilation, 1)。

這項重構作業會使用下列輔助函式：

lhs_shape(n, hw, c) = permute([n] + hw + [c], [input_batch_dimension] + input_spatial_dimensions + [input_feature_dimension])。
result_shape(n1, hw, c1) = permute([n1] + hw + [c1], [output_batch_dimension] + output_spatial_dimensions + [output_feature_dimension])。
permute([j0, j1, ..., jR-1], permutation) = [i0, i1, ..., iR-1] (其中 j[d] = i[permutation[d]])。

如果 feature_group_count = 1 和 batch_group_count = 1，則對於 index_space(dim(result, output_spatial_dimensions...)) 中的所有 output_spatial_index，result[result_shape(:, output_spatial_index, :)] = dot_product 其中：

padding_value = constant(0, element_type(lhs))。
padded_lhs = pad(lhs, padding_value, lhs_padding[:, 0], lhs_padding[:, 1], lhs_base_dilations - 1)。
lhs_window_start = lhs_shape(0, output_spatial_index, 0) * lhs_window_strides。
lhs_window = slice(padded_lhs, lhs_window_start, lhs_window_start + lhs_window_dimensions, lhs_window_dilations)。
reversed_lhs_window = reverse(lhs_window, [input_spatial_dimensions[dim] for dim in range(size(window_reversal)) if window_reversal[dim] = true])。這項功能似乎未被使用，因此我們計畫在日後移除這項功能 (#1181)。
dot_product = dot_general(reversed_lhs_window, rhs, lhs_batching_dimensions=[], lhs_contracting_dimensions=input_spatial_dimensions + [input_feature_dimension], rhs_batching_dimensions=[], rhs_contracting_dimensions=kernel_spatial_dimensions + [kernel_input_feature_dimension])。

如果 feature_group_count > 1：

lhses = split(lhs, feature_group_count, input_feature_dimension)。
rhses = split(rhs, feature_group_count, kernel_output_feature_dimension)。
results... = convolution(lhses..., rhses..., ..., feature_group_count=1, ...)。
result = concatenate(results, output_feature_dimension)。

如果 batch_group_count > 1：

lhses = split(lhs, batch_group_count, input_batch_dimension)。
rhses = split(rhs, batch_group_count, kernel_output_feature_dimension)。
results... = convolution(lhses..., rhses..., ..., batch_group_count=1, ...)。
result = concatenate(results, output_feature_dimension)。

如果是量化型別，則會執行 dequantize_op_quantize( lambda lhs, rhs: convolution(lhs, rhs, window_strides, padding, lhs_dilation, rhs_dilation, window_reversal, input_batch_dimension, input_feature_dimension, input_spatial_dimensions, kernel_input_feature_dimension, kernel_output_feature_dimension, kernel_spatial_dimensions, output_batch_dimension, output_feature_dimension, output_spatial_dimensions, feature_group_count, batch_group_count, precision_config), lhs, rhs, type(result))。

如果是混合量化型別，則會執行 hybrid_dequantize_then_op( lambda lhs, rhs: convolution(lhs, rhs, window_strides, padding, lhs_dilation, rhs_dilation, window_reversal, input_batch_dimension, input_feature_dimension, input_spatial_dimensions, kernel_input_feature_dimension, kernel_output_feature_dimension, kernel_spatial_dimensions, output_batch_dimension, output_feature_dimension, output_spatial_dimensions, feature_group_count, batch_group_count, precision_config), lhs, rhs)。

輸入

標籤	名稱	類型	限制
(I1)	`lhs`	張量或每個張量的量化張量	(C1)、(C10-C11)、(C14)、(C25)、(C27-C28)、(C31-C32)、(C34)
(I2)	`rhs`	張量或量化張量	(C1)、(C14-C16)、(C25)、(C27-C29)、(C31-C34)
(I3)	`window_strides`	`si64` 類型的 1 維張量常數	(C2-C3)、(C25)
(I4)	`padding`	`si64` 類型的 2 維張量常數	(C4)、(C25)
(I5)	`lhs_dilation`	`si64` 類型的 1 維張量常數	(C5-C6)、(C25)
(I6)	`rhs_dilation`	`si64` 類型的 1 維張量常數	(C7-C8)、(C25)
(I7)	`window_reversal`	`i1` 類型的 1 維張量常數	(C9)
(I8)	`input_batch_dimension`	`si64` 類型的常數	(C10)、(C13)、(C25)
(I9)	`input_feature_dimension`	`si64` 類型的常數	(C11)、(C13-C14)
(I10)	`input_spatial_dimensions`	`si64` 類型的 1 維張量常數	(C12)、(C13)、(C25)
(I11)	`kernel_input_feature_dimension`	`si64` 類型的常數	(C14)、(C18)
(I12)	`kernel_output_feature_dimension`	`si64` 類型的常數	(C15-C16)、(C18)、(C25)、(C29)
(I13)	`kernel_spatial_dimensions`	`si64` 類型的 1 維張量常數	(C17-C18)、(C25)
(I14)	`output_batch_dimension`	`si64` 類型的常數	(C20)、(C25)
(I15)	`output_feature_dimension`	`si64` 類型的常數	(C20)、(C25)、(C30)
(I16)	`output_spatial_dimensions`	`si64` 類型的 1 維張量常數	(C19-C20)、(C25)
(I17)	`feature_group_count`	`si64` 類型的常數	(C11)、(C14)、(C16)、(C21)、(C23)
(I18)	`batch_group_count`	`si64` 類型的常數	(C10)、(C15)、(C22)、(C23)、(C25)
(I19)	`precision_config`	`DEFAULT`、`HIGH` 和 `HIGHEST` 的可變數量列舉	(C24)

輸出內容

名稱	類型	限制
`result`	張量或量化張量	(C25-C28)、(C30)、(C32-34)

限制

(C1) N = rank(lhs) = rank(rhs)。
(C2) size(window_strides) = N - 2。
(C3) 0 < window_strides。
(C4) shape(padding) = [N - 2, 2]。
(C5) size(lhs_dilation) = N - 2。
(C6) 0 < lhs_dilation。
(C7) size(rhs_dilation) = N - 2。
(C8) 0 < rhs_dilation。
(C9) size(window_reversal) = N - 2。
(C10) dim(lhs, input_batch_dimension) % batch_group_count = 0。
(C11) dim(lhs, input_feature_dimension) % feature_group_count = 0。
(C12) size(input_spatial_dimensions) = N - 2。
(C13) Given input_dimensions = [input_batch_dimension] + input_spatial_dimensions + [input_feature_dimension]:
- is_unique(input_dimensions)。
- 0 <= input_dimensions < N。
(C14) dim(rhs, kernel_input_feature_dimension) = dim(lhs, input_feature_dimension) / feature_group_count。
(C15) dim(rhs, kernel_output_feature_dimension) % batch_group_count = 0。
(C16) dim(rhs, kernel_output_feature_dimension) % feature_group_count = 0。
(C17) size(kernel_spatial_dimensions) = N - 2。
(C18) Given kernel_dimensions = kernel_spatial_dimensions + [kernel_input_feature_dimension] + [kernel_output_feature_dimension]:
- is_unique(kernel_dimensions)。
- 0 <= kernel_dimensions < N。
(C19) size(output_spatial_dimensions) = N - 2。
(C20) Given output_dimensions = [output_batch_dimension] + output_spatial_dimensions + [output_feature_dimension]:
- is_unique(output_dimensions)。
- 0 <= output_dimensions < N。
(C21) 0 < feature_group_count。
(C22) 0 < batch_group_count。
(C23) feature_group_count = 1 or batch_group_count = 1。
(C24) size(precision_config) = 2。
(C25) dim(result, result_dim) is defined as:
- dim(lhs, input_batch_dimension) / batch_group_count, if result_dim = output_batch_dimension.
- dim(rhs, kernel_output_feature_dimension), if result_dim = output_feature_dimension.
- num_windows otherwise, where:
  - output_spatial_dimensions[spatial_dim] = result_dim.
  - lhs_dim = input_spatial_dimensions[spatial_dim].
  - rhs_dim = kernel_spatial_dimensions[spatial_dim].
  - dilated_input_shape[lhs_dim] = dim(lhs, lhs_dim) = 0 ? 0 : (dim(lhs, lhs_dim) - 1) * lhs_dilation[spatial_dim] + 1.
  - padded_input_shape[lhs_dim] = padding[spatial_dim, 0] + dilated_input_shape[lhs_dim] + padding[spatial_dim, 1].
  - dilated_window_shape[lhs_dim] = dim(rhs, rhs_dim) = 0 ? 0 : (dim(rhs, rhs_dim) - 1) * rhs_dilation[spatial_dim] + 1.
  - is_empty_window[lhs_dim] = padded_input_shape[lhs_dim] = 0 || dilated_window_shape[lhs_dim] > padded_input_shape[lhs_dim].
  - num_windows = is_empty_window[lhs_dim] ? 0 : floor((padded_input_shape[lhs_dim] - dilated_window_shape[lhs_dim]) / window_strides[spatial_dim]) + 1.
(C26) rank(result) = N。
If the operation uses non-quantized tensors:
- (C27) element_type(lhs) = element_type(rhs) = element_type(result).
If the operation uses quantized tensors:
- (C28) is_quantized(lhs) = is_quantized(result) and is_quantized(rhs).
- (C29) If is_per_axis_quantized(rhs), then quantization_dimension(rhs) = kernel_output_feature_dimension.
- (C30) If is_per_axis_quantized(result), then quantization_dimension(result) = output_feature_dimension.
- If is_quantized(lhs):
  - (C31) storage_type(lhs) = storage_type(rhs).
  - (C32) expressed_type(lhs) = expressed_type(rhs) = expressed_type(result).
  - (C33) If is_per_tensor_quantized(rhs), then is_per_tensor_quantized(result).
- If !is_quantized(lhs):
  - (C34) element_type(lhs) = expressed_type(rhs) = element_type(result).

範例

// %lhs: [[
//        [
//          [1], [2], [5], [6]
//        ],
//        [
//          [3], [4], [7], [8]
//        ],
//        [
//          [10], [11], [14], [15]
//        ],
//        [
//          [12], [13], [16], [17]
//        ]
//      ]]
//
// %rhs: [
//        [[[1]], [[1]], [[1]]],
//        [[[1]], [[1]], [[1]]],
//        [[[1]], [[1]], [[1]]]
//       ]
%result = "stablehlo.convolution"(%lhs, %rhs) {
  window_strides = array<i64: 4, 4>,
  padding = dense<0> : tensor<2x2xi64>,
  lhs_dilation = array<i64: 2, 2>,
  rhs_dilation = array<i64: 1, 1>,
  window_reversal = array<i1: false, false>,
  // In the StableHLO dialect, dimension numbers are encoded via:
  // `[<input dimensions>]x[<kernel dimensions>]->[output dimensions]`.
  // "b" is batch dimension, "f" is feature dimension,
  // "i" is input feature dimension, "o" is output feature dimension,
  // "0/1/etc" are spatial dimensions.
  dimension_numbers = #stablehlo.conv<[b, 0, 1, f]x[0, 1, i, o]->[b, 0, 1, f]>,
  batch_group_count = 1 : i64,
  feature_group_count = 1 : i64,
  precision_config = [#stablehlo<precision DEFAULT>, #stablehlo<precision DEFAULT>]
} : (tensor<1x4x4x1xi64>, tensor<3x3x1x1xi64>) -> tensor<1x2x2x1xi64>
// %result: [[
//            [[10], [26]],
//            [[46], [62]]
//          ]]
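
The output size rule in (C25) and the values in this example can be checked with a reduced sketch. It is not the full op: the helpers `num_windows` and `conv2d_single_channel` are hypothetical, and the sketch assumes one batch, one feature, zero padding, rhs_dilation = 1, no window reversal, and the same stride and lhs_dilation on both spatial axes.

```python
def num_windows(input_size, lhs_dilation, pad_low, pad_high,
                window_size, rhs_dilation, stride):
    """Number of window positions along one spatial dimension, following (C25)."""
    dilated_input = 0 if input_size == 0 else (input_size - 1) * lhs_dilation + 1
    padded_input = pad_low + dilated_input + pad_high
    dilated_window = 0 if window_size == 0 else (window_size - 1) * rhs_dilation + 1
    if padded_input == 0 or dilated_window > padded_input:
        return 0
    return (padded_input - dilated_window) // stride + 1

def conv2d_single_channel(lhs, rhs, stride, lhs_dilation):
    """2-D convolution of one channel with base dilation applied to lhs."""
    rows, cols = len(lhs), len(lhs[0])
    k_rows, k_cols = len(rhs), len(rhs[0])
    # Base dilation inserts (lhs_dilation - 1) zeros between neighboring elements.
    dilated = [[0] * ((cols - 1) * lhs_dilation + 1)
               for _ in range((rows - 1) * lhs_dilation + 1)]
    for r in range(rows):
        for c in range(cols):
            dilated[r * lhs_dilation][c * lhs_dilation] = lhs[r][c]
    out_rows = num_windows(rows, lhs_dilation, 0, 0, k_rows, 1, stride)
    out_cols = num_windows(cols, lhs_dilation, 0, 0, k_cols, 1, stride)
    return [
        [
            sum(dilated[i * stride + u][j * stride + v] * rhs[u][v]
                for u in range(k_rows) for v in range(k_cols))
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]

lhs = [[1, 2, 5, 6], [3, 4, 7, 8], [10, 11, 14, 15], [12, 13, 16, 17]]
rhs = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = conv2d_single_channel(lhs, rhs, stride=4, lhs_dilation=2)
print(result)  # [[10, 26], [46, 62]]
```

Each spatial axis has a dilated size of 7, so `floor((7 - 3) / 4) + 1 = 2` windows fit, which gives the 2x2 spatial result.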

更多範例

cosine

語意

對 operand 張量執行元素級別的餘弦運算，並產生 result 張量。視元素類型而定，執行下列操作：

浮點數：IEEE-754 的 cos。
複數：複數餘弦。
量化型別：dequantize_op_quantize(cosine, operand, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	浮點或複數型別的張量，或是每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	浮點或複數型別的張量，或是每個張量的量化張量	(C1)

限制

(C1) baseline_type(operand) = baseline_type(result)。

範例

// %operand: [
//            [0.0, 1.57079632],       // [0, pi/2]
//            [3.14159265, 4.71238898] // [pi, 3pi/2]
//           ]
%result = "stablehlo.cosine"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[1.0, 0.0], [-1.0, 0.0]]

更多範例

count_leading_zeros

語意

逐一計算 operand 張量中開頭零位元的數量，並產生 result 張量。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	整數型別的張量	(C1)

輸出內容

名稱	類型	限制
`result`	整數型別的張量	(C1)

限制

(C1) type(operand) = type(result)。

範例

// %operand: [[0, 1], [128, -1]]
%result = "stablehlo.count_leading_zeros"(%operand) : (tensor<2x2xi64>) -> tensor<2x2xi64>
// %result: [[64, 63], [56, 0]]
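
For a fixed bit width, the count can be sketched with Python's `int.bit_length`. The helper below is a hypothetical illustration; it assumes a two's complement interpretation, so negative values are first masked to their unsigned bit pattern.

```python
def count_leading_zeros(value, num_bits=64):
    """Counts the zero bits above the highest set bit in a num_bits-wide pattern."""
    unsigned = value & ((1 << num_bits) - 1)  # two's complement bit pattern
    return num_bits - unsigned.bit_length()

result = [[count_leading_zeros(x) for x in row] for row in [[0, 1], [128, -1]]]
print(result)  # [[64, 63], [56, 0]]
```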

更多範例

custom_call

語意

封裝實作定義的作業 call_target_name，該作業會採用 inputs 和 called_computations，並產生 results。has_side_effect、backend_config 和 api_version 可用於提供實作定義的其他中繼資料。

目前，這項作業包含相當雜亂的中繼資料集合，反映了 XLA 編譯器中對應作業的自然演變。我們計畫在日後統一這項中繼資料 (#741)。

輸入

標籤	名稱	類型
(I1)	`inputs`	可變數量的數值
(I2)	`call_target_name`	`string` 類型的常數
(I3)	`has_side_effect`	`i1` 類型的常數
(I4)	`backend_config`	`string` 類型常數或屬性字典
(I5)	`api_version`	`si32` 類型的常數
(I6)	`called_computations`	`string` 類型的常數可變數量
(I7)	`output_operand_aliases`	在輸出和運算元中指定別名部分

輸出內容

名稱	類型
`results`	可變數量的數值

(XLA GPU 支援) 特殊的 custom_call 目標

有三種與 buffer 類型相關的特殊 call_target_name：CreateBuffer 會建立未初始化的 buffer，Pin 會建立初始化的 buffer，而 Unpin 則會取消分配 buffer 並傳回 buffer 的內容。

%uninitialized_buffer = "stablehlo.custom_call"() {
  call_target_name = "CreateBuffer",
  api_version = 4 : i32
} : () -> memref<4xf64>

%initialized_buffer = "stablehlo.custom_call"(%init_value) {
  call_target_name = "Pin",
  api_version = 4 : i32
} : (tensor<4xf64>) -> memref<4xf64>

%dealloc_buffer = "stablehlo.custom_call"(%initialized_buffer) {
  call_target_name = "Unpin",
  api_version = 4 : i32
} : (memref<4xf64>) -> tensor<4xf64>

別名

部分 custom_call 作業可能需要輸出內容中的一部分，以及運算元中的一部分共用相同記憶體。這可以透過 output_operand_aliases 表示。別名配對表示法包含輸出元組索引清單 (代表輸出部分)，以及 operand_index 和運算元元組索引清單 (代表運算元部分)。如果對應的型別不是 tuple 型別，輸出或運算元元組索引清單就會是空白，而且對於任意巢狀元組型別，清單長度可以是任意長度。這與 XLA 別名表示法類似。

別名配對中的輸出和輸入部分必須屬於相同類型。對於並非呼叫 CreateBuffer、Pin 和 Unpin 的 custom_call 作業，buffer 運算元最多只能出現在一組別名中，且 buffer 輸出內容必須出現在一組別名中。

範例

%results = "stablehlo.custom_call"(%input0) {
  call_target_name = "foo",
  has_side_effect = false,
  backend_config = {bar = 42 : i32},
  api_version = 4 : i32,
  called_computations = [@foo]
} : (tensor<f64>) -> tensor<f64>

%updated_buffer = "stablehlo.custom_call"(%buffer) {
  call_target_name = "Update",
  api_version = 4 : i32,
  output_operand_aliases = [
    #stablehlo.output_operand_alias<output_tuple_indices = [],
      operand_index = 0,
      operand_tuple_indices = []>]
} : (memref<4xf64>) -> memref<4xf64>

divide

語意

執行被除數 lhs 和除數 rhs 張量的元素層級除法，並產生 result 張量。視元素類型而定，執行下列操作：

整數：整數除法，產生代數商，並捨棄任何小數部分。
浮點數：IEEE-754 的 division。
複數：複數除法。
如為量化型別：
- dequantize_op_quantize(divide, lhs, rhs, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`lhs`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)
(I2)	`rhs`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)

限制

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result)。

範例

// %lhs: [17.1, -17.1, 17.1, -17.1]
// %rhs: [3.0, 3.0, -3.0, -3.0]
%result = "stablehlo.divide"(%lhs, %rhs) : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
// %result: [5.66666651, -5.66666651, -5.66666651, 5.66666651]
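
For integers, "discarding the fractional part" means truncation toward zero, which differs from Python's floor-based `//` operator for negative quotients. The sketch below illustrates that reading. The helper `divide_int` is hypothetical, and it does not model division by zero or overflow.

```python
def divide_int(lhs, rhs):
    """Integer division that truncates toward zero."""
    quotient = abs(lhs) // abs(rhs)
    return quotient if (lhs < 0) == (rhs < 0) else -quotient

result = [divide_int(a, b) for a, b in [(17, 3), (-17, 3), (17, -3), (-17, -3)]]
print(result)    # [5, -5, -5, 5]
print(-17 // 3)  # -6, because Python's // rounds toward negative infinity
```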

More Examples

dot_general

Semantics

Computes dot products between slices of lhs and slices of rhs and produces a result tensor.

More formally, result[result_index] = dot_product, where:

lhs_result_dimensions = [d for d in axes(lhs) and d not in lhs_batching_dimensions and d not in lhs_contracting_dimensions].
rhs_result_dimensions = [d for d in axes(rhs) and d not in rhs_batching_dimensions and d not in rhs_contracting_dimensions].
result_batching_index + result_lhs_index + result_rhs_index = result_index where size(result_batching_index) = size(lhs_batching_dimensions), size(result_lhs_index) = size(lhs_result_dimensions) and size(result_rhs_index) = size(rhs_result_dimensions).
transposed_lhs = transpose(lhs, lhs_batching_dimensions + lhs_result_dimensions + lhs_contracting_dimensions).
transposed_lhs_slice = slice(transposed_lhs, result_batching_index + result_lhs_index + [:, ..., :]).
reshaped_lhs_slice = reshape(transposed_lhs_slice, dims(lhs, lhs_contracting_dimensions)).
transposed_rhs = transpose(rhs, rhs_batching_dimensions + rhs_result_dimensions + rhs_contracting_dimensions).
transposed_rhs_slice = slice(transposed_rhs, result_batching_index + result_rhs_index + [:, ..., :]).
reshaped_rhs_slice = reshape(transposed_rhs_slice, dims(rhs, rhs_contracting_dimensions)).
dot_product = reduce(inputs=[multiply(reshaped_lhs_slice, reshaped_rhs_slice)], init_values=[constant(0, element_type(result))], dimensions=range(size(lhs_contracting_dimensions)), body=lambda x, y: add(x, y)).

For quantized types, performs dequantize_op_quantize(lambda lhs, rhs: dot_general(lhs, rhs, lhs_batching_dimensions, rhs_batching_dimensions, lhs_contracting_dimensions, rhs_contracting_dimensions, precision_config), lhs, rhs, type(result)).

For hybrid quantized types, performs hybrid_dequantize_then_op(lambda lhs, rhs: dot_general(lhs, rhs, lhs_batching_dimensions, rhs_batching_dimensions, lhs_contracting_dimensions, rhs_contracting_dimensions, precision_config), lhs, rhs).

precision_config controls the tradeoff between speed and accuracy for computations on accelerator backends. It can be one of the following (at the moment, the semantics of these enum values is underspecified, but we are planning to address this in #755):

DEFAULT: Fastest calculation, but least accurate approximation to the original number.
HIGH: Slower calculation, but more accurate approximation to the original number.
HIGHEST: Slowest calculation, but most accurate approximation to the original number.

A DotAlgorithm defines the main properties of the algorithm used to implement the dot operation, which also defines the precision. If the algorithm attribute fields are set, then precision_config must be DEFAULT. DotAlgorithms do not have a default value, as the default parameters are implementation-defined. As such, all dot algorithm fields may be set to None to specify an empty dot algorithm, which will instead use the precision_config value.

DotAlgorithm fields include:

lhs_precision_type and rhs_precision_type, the precisions that the LHS and RHS of the operation are rounded to. Precision types are independent of the storage types of the inputs and the output.
accumulation_type, the precision used for accumulation.
lhs_component_count, rhs_component_count, and num_primitive_operations apply when the algorithm decomposes the LHS and/or RHS into multiple components and performs multiple "primitive" dot operations on those values, usually to emulate a higher precision (e.g. Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations: bf16_6x, tf32_3x, etc.). For algorithms with no decomposition, these values should be set to 1.
allow_imprecise_accumulation, which specifies whether accumulation in lower precision is permitted for some steps (e.g. CUBLASLT_MATMUL_DESC_FAST_ACCUM).

Example DotAlgorithm attributes:

// Inputs are casted to tf32, and then accumulated in f32:
{lhs_precision_type = tf32,
 rhs_precision_type = tf32,
 accumulation_type = f32,
 lhs_component_count = 1,
 rhs_component_count = 1,
 num_primitive_operations = 1,
 allow_imprecise_accumulation = false}


// bf16_6x: each input is decomposed to 3 bf16 components, then 6 dot operations are done on those components, and the result is accumulated in f32.
{lhs_precision_type = bf16,
 rhs_precision_type = bf16,
 accumulation_type = f32,
 lhs_component_count = 3,
 rhs_component_count = 3,
 num_primitive_operations = 6,
 allow_imprecise_accumulation = false}


// Inputs are (casted to) f8e5m2, and we accumulate in f32, but for some steps we may accumulate in lower precision.
{lhs_precision_type = f8e5m2,
 rhs_precision_type = f8e5m2,
 accumulation_type = f32,
 lhs_component_count = 1,
 rhs_component_count = 1,
 num_primitive_operations = 1,
 allow_imprecise_accumulation = true}
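
The interaction between precision_config and the algorithm fields (constraints (C11) and (C21)-(C24) of this op) can be sketched as a small validator. This is an illustrative Python sketch only: the dictionary representation of the attribute and the names `ALGORITHM_FIELDS` and `check_dot_algorithm` are assumptions, not a StableHLO API. `is_empty_algorithm` mirrors the predicate named in the constraints, under the assumption that "empty" means every field is None.

```python
ALGORITHM_FIELDS = (
    "lhs_precision_type", "rhs_precision_type", "accumulation_type",
    "lhs_component_count", "rhs_component_count",
    "num_primitive_operations", "allow_imprecise_accumulation",
)


def is_empty_algorithm(algorithm: dict) -> bool:
    """Assumption: an empty algorithm has every field unset (None)."""
    return all(algorithm.get(field) is None for field in ALGORITHM_FIELDS)


def check_dot_algorithm(precision_config: list, algorithm: dict) -> list:
    """Return the labels of the violated constraints (empty list if none)."""
    violations = []
    if len(precision_config) != 2:
        violations.append("C11")
    if is_empty_algorithm(algorithm):
        return violations  # precision_config alone decides the precision
    if any(p != "DEFAULT" for p in precision_config):
        violations.append("C21")
    for label, field in (("C22", "lhs_component_count"),
                         ("C23", "rhs_component_count"),
                         ("C24", "num_primitive_operations")):
        value = algorithm.get(field)
        if value is None or value <= 0:
            violations.append(label)
    return violations


bf16_6x = {
    "lhs_precision_type": "bf16", "rhs_precision_type": "bf16",
    "accumulation_type": "f32", "lhs_component_count": 3,
    "rhs_component_count": 3, "num_primitive_operations": 6,
    "allow_imprecise_accumulation": False,
}
print(check_dot_algorithm(["DEFAULT", "DEFAULT"], bf16_6x))  # []
print(check_dot_algorithm(["HIGH", "DEFAULT"], bf16_6x))     # ['C21']
```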

It is up to the implementations to decide which combinations are supported. In general, it is not guaranteed that each algorithm is supported on each accelerator type by the consumer of the StableHLO. If a given algorithm is not supported, an error should be raised as opposed to falling back to an alternative. StableHLO verification will provide best-effort verification, preventing algorithms that are not known to be supported on any hardware.

See xla_data.proto > Algorithm for some supported algorithm values. Ticket #2483 captures the plan to create a centralized doc on supported algorithms by backend.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C5-C6), (C9-C10), (C12-C14), (C17-C18), (C20)
(I2)	`rhs`	tensor or quantized tensor	(C7-C10), (C12-C20)
(I3)	`lhs_batching_dimensions`	1-dimensional tensor constant of type `si64`	(C1), (C3), (C5), (C9), (C12)
(I4)	`rhs_batching_dimensions`	1-dimensional tensor constant of type `si64`	(C1), (C4), (C7), (C9)
(I5)	`lhs_contracting_dimensions`	1-dimensional tensor constant of type `si64`	(C2), (C3), (C6), (C10)
(I6)	`rhs_contracting_dimensions`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C8), (C10), (C16)
(I7)	`precision_config`	variadic number of enums of `DEFAULT`, `HIGH`, and `HIGHEST`	(C11), (C21)
(I8)	`lhs_precision_type`	FloatType or TensorFloat32	(C21)
(I9)	`rhs_precision_type`	FloatType or TensorFloat32	(C21)
(I10)	`accumulation_type`	FloatType or TensorFloat32	(C21)
(I11)	`lhs_component_count`	constant of type `si32`	(C21), (C22)
(I12)	`rhs_component_count`	constant of type `si32`	(C21), (C23)
(I13)	`num_primitive_operations`	constant of type `si32`	(C21), (C24)
(I14)	`allow_imprecise_accumulation`	constant of type `bool`	(C21)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C12), (C14), (C18-C20)

Constraints

(C1) size(lhs_batching_dimensions) = size(rhs_batching_dimensions).
(C2) size(lhs_contracting_dimensions) = size(rhs_contracting_dimensions).
(C3) is_unique(lhs_batching_dimensions + lhs_contracting_dimensions).
(C4) is_unique(rhs_batching_dimensions + rhs_contracting_dimensions).
(C5) 0 <= lhs_batching_dimensions < rank(lhs).
(C6) 0 <= lhs_contracting_dimensions < rank(lhs).
(C7) 0 <= rhs_batching_dimensions < rank(rhs).
(C8) 0 <= rhs_contracting_dimensions < rank(rhs).
(C9) dim(lhs, lhs_batching_dimensions...) = dim(rhs, rhs_batching_dimensions...).
(C10) dim(lhs, lhs_contracting_dimensions...) = dim(rhs, rhs_contracting_dimensions...).
(C11) size(precision_config) = 2.
(C12) shape(result) = dim(lhs, lhs_batching_dimensions) + dim(lhs, lhs_result_dimensions) + dim(rhs, rhs_result_dimensions).
If the operation uses non-quantized tensors:
- (C13) element_type(lhs) = element_type(rhs).
If the operation uses quantized tensors:
- (C14) is_quantized(lhs) = is_quantized(result) and is_quantized(rhs).
- (C15) zero_points(rhs) = 0.
- (C16) If is_per_axis_quantized(rhs), then quantization_dimension(rhs) not in rhs_contracting_dimensions.
- If is_quantized(lhs):
  - (C17) storage_type(lhs) = storage_type(rhs).
  - (C18) expressed_type(lhs) = expressed_type(rhs) = expressed_type(result).
  - (C19) If is_per_tensor_quantized(rhs), then is_per_tensor_quantized(result).
- If !is_quantized(lhs):
  - (C20) element_type(lhs) = expressed_type(rhs) = element_type(result).
If !is_empty_algorithm(lhs_precision_type, rhs_precision_type, accumulation_type, lhs_component_count, rhs_component_count, num_primitive_operations, allow_imprecise_accumulation):
- (C21) precision_config... = DEFAULT.
- (C22) 0 < lhs_component_count.
- (C23) 0 < rhs_component_count.
- (C24) 0 < num_primitive_operations.

Examples

// %lhs: [
//        [[1, 2],
//         [3, 4]],
//        [[5, 6],
//         [7, 8]]
//       ]
// %rhs: [
//        [[1, 0],
//         [0, 1]],
//        [[1, 0],
//         [0, 1]]
//       ]
%result = "stablehlo.dot_general"(%lhs, %rhs) {
  dot_dimension_numbers = #stablehlo.dot<
    lhs_batching_dimensions = [0],
    rhs_batching_dimensions = [0],
    lhs_contracting_dimensions = [2],
    rhs_contracting_dimensions = [1]
  >,
  precision_config = [#stablehlo<precision DEFAULT>, #stablehlo<precision DEFAULT>],
  algorithm = #stablehlo.dot_algorithm<
    lhs_precision_type = tf32,
    rhs_precision_type = tf32,
    accumulation_type = f32,
    lhs_component_count = 1,
    rhs_component_count = 1,
    num_primitive_operations = 1,
    allow_imprecise_accumulation = false
  >
} : (tensor<2x2x2xi64>, tensor<2x2x2xi64>) -> tensor<2x2x2xi64>
// %result: [
//           [[1, 2],
//            [3, 4]],
//           [[5, 6],
//            [7, 8]]
//          ]
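
The result shape rule (C12) and the batched dot product in this example can be reproduced with a small reference sketch in plain Python. The function names `dot_general_shape` and `batched_matmul` are hypothetical. `batched_matmul` handles only the dimension numbers used in this example (batching dimension 0, lhs contracting dimension 2, rhs contracting dimension 1), not the general op.

```python
def dot_general_shape(lhs_shape, rhs_shape, lhs_batch, rhs_batch,
                      lhs_contract, rhs_contract):
    """Result shape per (C12): batching dims, then lhs result dims, then rhs result dims."""
    # (C9) and (C10): batching and contracting dimension sizes must agree.
    assert [lhs_shape[d] for d in lhs_batch] == [rhs_shape[d] for d in rhs_batch]
    assert [lhs_shape[d] for d in lhs_contract] == [rhs_shape[d] for d in rhs_contract]
    lhs_result = [d for d in range(len(lhs_shape))
                  if d not in lhs_batch and d not in lhs_contract]
    rhs_result = [d for d in range(len(rhs_shape))
                  if d not in rhs_batch and d not in rhs_contract]
    return ([lhs_shape[d] for d in lhs_batch]
            + [lhs_shape[d] for d in lhs_result]
            + [rhs_shape[d] for d in rhs_result])


def batched_matmul(lhs, rhs):
    """result[b][i][j] = sum over k of lhs[b][i][k] * rhs[b][k][j]."""
    return [
        [[sum(l_row[k] * r_mat[k][j] for k in range(len(r_mat)))
          for j in range(len(r_mat[0]))]
         for l_row in l_mat]
        for l_mat, r_mat in zip(lhs, rhs)
    ]


lhs = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
rhs = [[[1, 0], [0, 1]], [[1, 0], [0, 1]]]
print(dot_general_shape([2, 2, 2], [2, 2, 2], [0], [0], [2], [1]))  # [2, 2, 2]
print(batched_matmul(lhs, rhs))  # [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
```

Because each rhs batch is an identity matrix, the result equals lhs, matching the example above.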

More Examples

dynamic_broadcast_in_dim

Semantics

This operation is functionally identical to the broadcast_in_dim op, but the result shape is specified dynamically via output_dimensions.

The operation also accepts the optional attributes known_expanding_dimensions and known_nonexpanding_dimensions to express static knowledge about the expanding behavior of dimensions. If not specified, all dimensions are assumed to be possibly expanding.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C2), (C5-C6), (C9)
(I2)	`output_dimensions`	1-dimensional tensor of integer type	(C7)
(I3)	`broadcast_dimensions`	1-dimensional constant tensor of integer type	(C2-C6)
(I4)	`known_expanding_dimensions`	1-dimensional constant tensor of integer type	(C8-C9)
(I5)	`known_nonexpanding_dimensions`	1-dimensional constant tensor of integer type	(C8-C9)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1), (C3), (C5-C7)

Constraints

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand) except that quantization_dimension(operand), scales(operand), and zero_points(operand) may differ from quantization_dimension(result), scales(result), and zero_points(result) respectively, otherwise.
(C2) size(broadcast_dimensions) = rank(operand).
(C3) 0 <= broadcast_dimensions < rank(result).
(C4) is_unique(broadcast_dimensions).
(C5) For all d in axes(operand):
- dim(operand, d) = 1 or
- dim(operand, d) = dim(result, broadcast_dimensions[d]).
(C6) If is_per_axis_quantized(result):
- quantization_dimension(result) = broadcast_dimensions[quantization_dimension(operand)].
- If dim(operand, quantization_dimension(operand)) = 1, then scales(result)[i] = scales(operand)[0] and zero_points(result)[i] = zero_points(operand)[0] for i in range(dim(result, quantization_dimension(result))).
(C7) size(output_dimensions) = rank(result).
(C8) is_unique(known_expanding_dimensions + known_nonexpanding_dimensions).
(C9) 0 <= known_expanding_dimensions < rank(operand).
(C10) 0 <= known_nonexpanding_dimensions < rank(operand).

Examples

// %operand: [
//            [1, 2, 3]
//           ]
%operand = stablehlo.constant dense<[[1, 2, 3]]> : tensor<1x3xi64>
%output_dimensions = stablehlo.constant dense<[2, 3, 2]> : tensor<3xi64>
%result = "stablehlo.dynamic_broadcast_in_dim"(%operand, %output_dimensions) {
  broadcast_dimensions = array<i64: 2, 1>,
  known_expanding_dimensions = array<i64: 0>,
  known_nonexpanding_dimensions = array<i64: 1>
} : (tensor<1x3xi64>, tensor<3xi64>) -> tensor<2x3x2xi64>
// %result: [
//            [
//             [1, 1],
//             [2, 2],
//             [3, 3]
//            ],
//            [
//             [1, 1],
//             [2, 2],
//             [3, 3]
//            ]
//          ]
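
The index mapping behind this example can be sketched in plain Python. It follows the rule defined for broadcast_in_dim: operand_index[d] is 0 when dim(operand, d) = 1 and result_index[broadcast_dimensions[d]] otherwise. The function name and the dictionary-based result representation are illustrative assumptions, and the known_expanding_dimensions hints are ignored because they do not change the result.

```python
from itertools import product


def broadcast_in_dim(operand, operand_shape, output_dimensions, broadcast_dimensions):
    """Return a dict that maps every result index tuple to its value."""
    def operand_at(index):
        value = operand
        for i in index:
            value = value[i]
        return value

    result = {}
    for result_index in product(*(range(n) for n in output_dimensions)):
        operand_index = [
            # Size-1 operand dimensions are expanded; others are copied through.
            0 if operand_shape[d] == 1 else result_index[broadcast_dimensions[d]]
            for d in range(len(operand_shape))
        ]
        result[result_index] = operand_at(operand_index)
    return result


result = broadcast_in_dim([[1, 2, 3]], [1, 3], [2, 3, 2], [2, 1])
print(result[(0, 0, 0)], result[(0, 1, 1)], result[(1, 2, 1)])  # 1 2 3
```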

More Examples

dynamic_conv

Semantics

This operation is functionally identical to the convolution op, but the padding is specified dynamically via padding.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor or per-tensor quantized tensor	(C1), (C10-C11), (C14), (C25), (C26-C27), (C30-C31), (C33)
(I2)	`rhs`	tensor or quantized tensor	(C1), (C14-C16), (C26-C28), (C30-C33)
(I3)	`padding`	2-dimensional tensor of integer type	(C4)
(I4)	`window_strides`	1-dimensional tensor constant of type `si64`	(C2-C3)
(I5)	`lhs_dilation`	1-dimensional tensor constant of type `si64`	(C5-C6)
(I6)	`rhs_dilation`	1-dimensional tensor constant of type `si64`	(C7-C8)
(I7)	`window_reversal`	1-dimensional tensor constant of type `i1`	(C9)
(I8)	`input_batch_dimension`	constant of type `si64`	(C10), (C13)
(I9)	`input_feature_dimension`	constant of type `si64`	(C11), (C13-C14)
(I10)	`input_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C12), (C13)
(I11)	`kernel_input_feature_dimension`	constant of type `si64`	(C14), (C18)
(I12)	`kernel_output_feature_dimension`	constant of type `si64`	(C15-C16), (C18), (C28)
(I13)	`kernel_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C17-C18)
(I14)	`output_batch_dimension`	constant of type `si64`	(C20)
(I15)	`output_feature_dimension`	constant of type `si64`	(C20), (C29)
(I16)	`output_spatial_dimensions`	1-dimensional tensor constant of type `si64`	(C19-C20)
(I17)	`feature_group_count`	constant of type `si64`	(C11), (C14), (C16), (C21), (C23)
(I18)	`batch_group_count`	constant of type `si64`	(C10), (C15), (C22), (C23)
(I19)	`precision_config`	variadic number of enums of `DEFAULT`, `HIGH`, and `HIGHEST`	(C24)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C25-C27), (C29), (C31-C33)

Constraints

(C1) N = rank(lhs) = rank(rhs).
(C2) size(window_strides) = N - 2.
(C3) 0 < window_strides.
(C4) shape(padding) = [N - 2, 2].
(C5) size(lhs_dilation) = N - 2.
(C6) 0 < lhs_dilation.
(C7) size(rhs_dilation) = N - 2.
(C8) 0 < rhs_dilation.
(C9) size(window_reversal) = N - 2.
(C10) dim(lhs, input_batch_dimension) % batch_group_count = 0.
(C11) dim(lhs, input_feature_dimension) % feature_group_count = 0.
(C12) size(input_spatial_dimensions) = N - 2.
(C13) Given input_dimensions = [input_batch_dimension] + input_spatial_dimensions + [input_feature_dimension]:
- is_unique(input_dimensions).
- 0 <= input_dimensions < N.
(C14) dim(rhs, kernel_input_feature_dimension) = dim(lhs, input_feature_dimension) / feature_group_count.
(C15) dim(rhs, kernel_output_feature_dimension) % batch_group_count = 0.
(C16) dim(rhs, kernel_output_feature_dimension) % feature_group_count = 0.
(C17) size(kernel_spatial_dimensions) = N - 2.
(C18) Given kernel_dimensions = kernel_spatial_dimensions + [kernel_input_feature_dimension] + [kernel_output_feature_dimension]:
- is_unique(kernel_dimensions).
- 0 <= kernel_dimensions < N.
(C19) size(output_spatial_dimensions) = N - 2.
(C20) Given output_dimensions = [output_batch_dimension] + output_spatial_dimensions + [output_feature_dimension]:
- is_unique(output_dimensions).
- 0 <= output_dimensions < N.
(C21) 0 < feature_group_count.
(C22) 0 < batch_group_count.
(C23) feature_group_count = 1 or batch_group_count = 1.
(C24) size(precision_config) = 2.
(C25) dim(result, result_dim) is defined as:
- dim(lhs, input_batch_dimension) / batch_group_count if result_dim = output_batch_dimension.
- dim(rhs, kernel_output_feature_dimension) if result_dim = output_feature_dimension.
- num_windows otherwise, where:
  - output_spatial_dimensions[spatial_dim] = result_dim.
  - lhs_dim = input_spatial_dimensions[spatial_dim].
  - rhs_dim = kernel_spatial_dimensions[spatial_dim].
  - dilated_input_shape[lhs_dim] = dim(lhs, lhs_dim) = 0 ? 0 : (dim(lhs, lhs_dim) - 1) * lhs_dilation[spatial_dim] + 1.
  - padded_input_shape[lhs_dim] = padding[spatial_dim, 0] + dilated_input_shape[lhs_dim] + padding[spatial_dim, 1].
  - dilated_window_shape[lhs_dim] = dim(rhs, rhs_dim) = 0 ? 0 : (dim(rhs, rhs_dim) - 1) * rhs_dilation[spatial_dim] + 1.
  - is_empty_window[lhs_dim] = padded_input_shape[lhs_dim] = 0 || dilated_window_shape[lhs_dim] > padded_input_shape[lhs_dim].
  - num_windows = is_empty_window[lhs_dim] ? 0 : floor((padded_input_shape[lhs_dim] - dilated_window_shape[lhs_dim]) / window_strides[spatial_dim]) + 1.
(C26) rank(result) = N.
If the operation uses non-quantized tensors:
- (C27) element_type(lhs) = element_type(rhs) = element_type(result).
If the operation uses quantized tensors:
- (C28) is_quantized(lhs) = is_quantized(result) and is_quantized(rhs).
- (C29) If is_per_axis_quantized(rhs), then quantization_dimension(rhs) = kernel_output_feature_dimension.
- (C30) If is_per_axis_quantized(result), then quantization_dimension(result) = output_feature_dimension.
- If is_quantized(lhs):
  - (C31) storage_type(lhs) = storage_type(rhs).
  - (C32) expressed_type(lhs) = expressed_type(rhs) = expressed_type(result).
  - (C33) If is_per_tensor_quantized(rhs), then is_per_tensor_quantized(result).
- If !is_quantized(lhs):
  - (C34) element_type(lhs) = expressed_type(rhs) = element_type(result).

Examples

// %lhs: [[
//        [[1], [2], [5], [6]],
//        [[3], [4], [7], [8]],
//        [[10], [11], [14], [15]],
//        [[12], [13], [16], [17]]
//      ]]
//
// %rhs: [
//         [[[1]], [[1]], [[1]]],
//         [[[1]], [[1]], [[1]]],
//         [[[1]], [[1]], [[1]]]
//        ]
// %padding: [[1, 1],
//            [1, 1]]
%result = "stablehlo.dynamic_conv"(%lhs, %rhs, %padding) {
  window_strides = array<i64: 4, 4>,
  lhs_dilation = array<i64: 2, 2>,
  rhs_dilation = array<i64: 1, 1>,
  window_reversal = array<i1: false, false>,
  dimension_numbers = #stablehlo.conv<raw
    input_batch_dimension = 0,
    input_feature_dimension = 3,
    input_spatial_dimensions = [1, 2],
    kernel_input_feature_dimension = 2,
    kernel_output_feature_dimension = 3,
    kernel_spatial_dimensions = [0, 1],
    output_batch_dimension = 0,
    output_feature_dimension = 3,
    output_spatial_dimensions = [1, 2]
  >,
  feature_group_count = 1 : i64,
  batch_group_count = 1 : i64,
  precision_config = [#stablehlo<precision DEFAULT>, #stablehlo<precision DEFAULT>]
} : (tensor<1x4x4x1xi64>, tensor<3x3x1x1xi64>, tensor<2x2xi64>) -> tensor<1x2x2x1xi64>
// %result: [[
//            [[1], [5]],
//            [[10], [14]]
//          ]]
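
The spatial output size in (C25) is plain integer arithmetic, which the following Python sketch makes concrete for a single spatial dimension. The function name `num_windows` and its flat parameter list are illustrative assumptions. The usage line plugs in the values from the example above (input size 4, kernel size 3, padding [1, 1], lhs_dilation 2, rhs_dilation 1, stride 4).

```python
def num_windows(lhs_dim, rhs_dim, pad_low, pad_high,
                lhs_dilation, rhs_dilation, window_stride):
    """Number of window positions along one spatial dimension, per (C25)."""
    dilated_input = 0 if lhs_dim == 0 else (lhs_dim - 1) * lhs_dilation + 1
    padded_input = pad_low + dilated_input + pad_high
    dilated_window = 0 if rhs_dim == 0 else (rhs_dim - 1) * rhs_dilation + 1
    is_empty_window = padded_input == 0 or dilated_window > padded_input
    if is_empty_window:
        return 0
    return (padded_input - dilated_window) // window_stride + 1


# Dilated input: (4 - 1) * 2 + 1 = 7; padded: 1 + 7 + 1 = 9; window: 3.
print(num_windows(4, 3, 1, 1, 2, 1, 4))  # 2
```

The value 2 matches both spatial dimensions of the 1x2x2x1 result in the example.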

More Examples

dynamic_gather

Semantics

This operation is functionally identical to the gather op, with slice_sizes specified dynamically as a value.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C7), (C10-C12), (C14)
(I2)	`start_indices`	tensor of integer type	(C2), (C3), (C13)
(I3)	`slice_sizes`	1-dimensional tensor of integer type	(C8), (C11-C13)
(I4)	`offset_dims`	1-dimensional tensor constant of type `si64`	(C1), (C4-C5), (C13)
(I5)	`collapsed_slice_dims`	1-dimensional tensor constant of type `si64`	(C1), (C6-C8), (C13)
(I6)	`start_index_map`	1-dimensional tensor constant of type `si64`	(C3), (C9), (C10)
(I7)	`index_vector_dim`	constant of type `si64`	(C2), (C3), (C13)
(I8)	`indices_are_sorted`	constant of type `i1`

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C5), (C13-C14)

Constraints

(C1) rank(operand) = size(offset_dims) + size(collapsed_slice_dims).
(C2) 0 <= index_vector_dim <= rank(start_indices).
(C3) size(start_index_map) = index_vector_dim < rank(start_indices) ? dim(start_indices, index_vector_dim) : 1.
(C4) is_unique(offset_dims) and is_sorted(offset_dims).
(C5) 0 <= offset_dims < rank(result).
(C6) is_unique(collapsed_slice_dims) and is_sorted(collapsed_slice_dims).
(C7) 0 <= collapsed_slice_dims < rank(operand).
(C8) slice_sizes[collapsed_slice_dims...] <= 1.
(C9) is_unique(start_index_map).
(C10) 0 <= start_index_map < rank(operand).
(C11) size(slice_sizes) = rank(operand).
(C12) 0 <= slice_sizes <= shape(operand).
(C13) shape(result) = combine(batch_dim_sizes, offset_dim_sizes) where:
- batch_dim_sizes = shape(start_indices) except that the dimension size of start_indices corresponding to index_vector_dim is not included.
- offset_dim_sizes = shape(slice_sizes) except that the dimension sizes in slice_sizes corresponding to collapsed_slice_dims are not included.
- combine puts batch_dim_sizes at axes corresponding to batch_dims and offset_dim_sizes at axes corresponding to offset_dims.
(C14) element_type(operand) = element_type(result).

Examples

// %operand: [
//            [[1, 2], [3, 4], [5, 6], [7, 8]],
//            [[9, 10],[11, 12], [13, 14], [15, 16]],
//            [[17, 18], [19, 20], [21, 22], [23, 24]]
//           ]
// %start_indices: [
//                  [[0, 0], [1, 0], [2, 1]],
//                  [[0, 1], [1, 1], [0, 2]]
//                 ]
// %slice_sizes: [1, 2, 2]
%result = "stablehlo.dynamic_gather"(%operand, %start_indices, %slice_sizes) {
  dimension_numbers = #stablehlo.gather<
    offset_dims = [2, 3],
    collapsed_slice_dims = [0],
    start_index_map = [1, 0],
    index_vector_dim = 2>,
  indices_are_sorted = false
} : (tensor<3x4x2xi64>, tensor<2x3x2xi64>, tensor<3xi64>) -> tensor<2x3x2x2xi64>
// %result: [
//            [
//              [[1, 2], [3, 4]],
//              [[3, 4], [5, 6]],
//              [[13, 14], [15, 16]]
//            ],
//            [
//              [[9, 10], [11, 12]],
//              [[11, 12], [13, 14]],
//              [[17, 18], [19, 20]]
//            ]
//          ]
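
The shape rule (C13) can be checked with a short Python sketch. The function name `gather_result_shape` is hypothetical, and the sketch assumes that offset_dim_sizes is taken from the values held in slice_sizes, with the entries for collapsed_slice_dims dropped.

```python
def gather_result_shape(start_indices_shape, slice_sizes, offset_dims,
                        collapsed_slice_dims, index_vector_dim):
    """Combine batch and offset dimension sizes into the result shape, per (C13)."""
    batch_dim_sizes = [size for d, size in enumerate(start_indices_shape)
                       if d != index_vector_dim]
    offset_dim_sizes = [size for d, size in enumerate(slice_sizes)
                        if d not in collapsed_slice_dims]
    rank = len(batch_dim_sizes) + len(offset_dim_sizes)
    batch_iter, offset_iter = iter(batch_dim_sizes), iter(offset_dim_sizes)
    # Axes listed in offset_dims take offset sizes; all other axes take batch sizes.
    return [next(offset_iter) if d in offset_dims else next(batch_iter)
            for d in range(rank)]


# Values from the example above.
print(gather_result_shape([2, 3, 2], [1, 2, 2], [2, 3], [0], 2))  # [2, 3, 2, 2]
```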

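More Examples

dynamic_iota

Semantics

This operation is functionally identical to the iota op, but the result shape is specified dynamically via output_shape.

Inputs

Label	Name	Type	Constraints
(I1)	`output_shape`	1-dimensional tensor of integer type	(C1), (C2)
(I2)	`iota_dimension`	`si64`	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C2)

Constraints

(C1) 0 <= iota_dimension < size(output_shape).
(C2) rank(result) = size(output_shape).

Examples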

%output_shape = stablehlo.constant dense<[4, 5]> : tensor<2xi64>
%result = "stablehlo.dynamic_iota"(%output_shape) {
  iota_dimension = 0 : i64
} : (tensor<2xi64>) -> tensor<4x5xi64>
// %result: [
//           [0, 0, 0, 0, 0],
//           [1, 1, 1, 1, 1],
//           [2, 2, 2, 2, 2],
//           [3, 3, 3, 3, 3]
//          ]
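
As an illustration, the following Python sketch builds the same result with nested lists: every element equals its own index along iota_dimension. The function name `iota` and the nested-list representation are assumptions made for this sketch.

```python
def iota(output_shape, iota_dimension):
    """Nested lists whose elements equal their index along iota_dimension."""
    def build(prefix):
        depth = len(prefix)
        if depth == len(output_shape):
            return prefix[iota_dimension]
        return [build(prefix + (i,)) for i in range(output_shape[depth])]
    return build(())


for row in iota([4, 5], 0):
    print(row)
```

With iota_dimension = 1 the values would instead increase along each row, for example `iota([2, 3], 1)` gives `[[0, 1, 2], [0, 1, 2]]`.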

More Examples

dynamic_pad

Semantics

This operation is functionally identical to the pad op, but with edge_padding_low, edge_padding_high, and interior_padding specified dynamically as values.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C2), (C4)
(I2)	`padding_value`	0-dimensional tensor or per-tensor quantized tensor	(C1)
(I3)	`edge_padding_low`	1-dimensional tensor of integer type	(C1), (C4)
(I4)	`edge_padding_high`	1-dimensional tensor of integer type	(C1), (C4)
(I5)	`interior_padding`	1-dimensional tensor of integer type	(C2-C4)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C3-C6)

Constraints

(C1) element_type(operand) = element_type(padding_value) = element_type(result).
(C2) size(edge_padding_low) = size(edge_padding_high) = size(interior_padding) = rank(operand).
(C3) 0 <= interior_padding.
(C4) shape(result) = shape(operand) + edge_padding_low + max(shape(operand) - 1, 0) * interior_padding + edge_padding_high.

Examples

// %operand: [
//            [1, 2, 3],
//            [4, 5, 6]
//           ]
// %padding_value: 0
// %edge_padding_low: [0, 1]
// %edge_padding_high: [2, 1]
// %interior_padding: [1, 2]
%result = "stablehlo.dynamic_pad"(%operand, %padding_value,
  %edge_padding_low, %edge_padding_high, %interior_padding
) : (tensor<2x3xi64>, tensor<i64>, tensor<2xi64>, tensor<2xi64>, tensor<2xi64>) -> tensor<5x9xi64>
// %result: [
//           [0, 1, 0, 0, 2, 0, 0, 3, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0],
//           [0, 4, 0, 0, 5, 0, 0, 6, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0]
//          ]
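
The shape formula (C4) and the element placement in this example can be reproduced with a short Python sketch. The names `pad_shape` and `pad_2d` are hypothetical, and `pad_2d` is deliberately limited to 2-dimensional operands with non-negative edge padding.

```python
def pad_shape(operand_shape, edge_padding_low, edge_padding_high, interior_padding):
    """Result shape per (C4), computed independently for every dimension."""
    return [
        n + low + max(n - 1, 0) * interior + high
        for n, low, high, interior in zip(
            operand_shape, edge_padding_low, edge_padding_high, interior_padding)
    ]


def pad_2d(operand, padding_value, low, high, interior):
    """2-D padding sketch; assumes non-negative edge padding."""
    rows, cols = pad_shape([len(operand), len(operand[0])], low, high, interior)
    result = [[padding_value] * cols for _ in range(rows)]
    for i, row in enumerate(operand):
        for j, value in enumerate(row):
            # Consecutive operand elements are separated by `interior` padding values.
            result[low[0] + i * (interior[0] + 1)][low[1] + j * (interior[1] + 1)] = value
    return result


print(pad_shape([2, 3], [0, 1], [2, 1], [1, 2]))  # [5, 9]
for row in pad_2d([[1, 2, 3], [4, 5, 6]], 0, [0, 1], [2, 1], [1, 2]):
    print(row)
```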

More Examples

dynamic_reshape

Semantics

This operation is functionally identical to the reshape op, but the result shape is specified dynamically via output_shape.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C3)
(I2)	`output_shape`	1-dimensional tensor of integer type	(C4)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1-C4)

Constraints

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand) except that quantization_dimension(operand) and quantization_dimension(result) may differ, otherwise.
(C2) size(operand) = size(result).
(C3) If is_per_axis_quantized(operand):
- reduce(dims(operand, [0, 1, ..., quantization_dimension(operand) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y) = reduce(dims(result, [0, 1, ..., quantization_dimension(result) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y).
- dim(operand, quantization_dimension(operand)) = dim(result, quantization_dimension(result)).
- reduce(dims(operand, [quantization_dimension(operand) + 1, ..., rank(operand) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y) = reduce(dims(result, [quantization_dimension(result) + 1, ..., rank(result) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y).
(C4) size(output_shape) = rank(result).

Examples

// %operand: [[1, 2, 3], [4, 5, 6]]
// %output_shape: [3, 2]
%result = "stablehlo.dynamic_reshape"(%operand, %output_shape) : (tensor<2x3xi64>, tensor<2xi64>) -> tensor<3x2xi64>
// %result: [[1, 2], [3, 4], [5, 6]]
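
A reshape keeps the elements in row-major (lexicographic index) order and only regroups them, as defined for the reshape op. The following Python sketch illustrates this with nested lists; the helper names `flatten` and `reshape` are assumptions for illustration.

```python
def flatten(tensor):
    """Row-major list of all elements of a nested-list tensor."""
    if not isinstance(tensor, list):
        return [tensor]
    return [x for sub in tensor for x in flatten(sub)]


def reshape(operand, output_shape):
    flat = flatten(operand)
    size = 1
    for n in output_shape:
        size *= n
    assert len(flat) == size  # (C2): size(operand) = size(result)

    def build(values, shape):
        if not shape:
            return values[0]
        if shape[0] == 0:
            return []
        step = len(values) // shape[0]
        return [build(values[i * step:(i + 1) * step], shape[1:])
                for i in range(shape[0])]

    return build(flat, list(output_shape))


print(reshape([[1, 2, 3], [4, 5, 6]], [3, 2]))  # [[1, 2], [3, 4], [5, 6]]
```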

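More Examples

dynamic_slice

Semantics

Extracts a slice from the operand using dynamically computed starting indices and produces a result tensor. start_indices contains the starting indices of the slice for each dimension, subject to potential adjustment, and slice_sizes contains the sizes of the slice for each dimension. More formally, result[result_index] = operand[operand_index] where:

adjusted_start_indices = clamp(0, start_indices, shape(operand) - slice_sizes).
operand_index = adjusted_start_indices + result_index.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C2), (C4)
(I2)	`start_indices`	variadic number of 0-dimensional tensors of integer type	(C2), (C3)
(I3)	`slice_sizes`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C5)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1), (C5)

Constraints

(C1) element_type(operand) = element_type(result).
(C2) size(start_indices) = size(slice_sizes) = rank(operand).
(C3) same(type(start_indices...)).
(C4) 0 <= slice_sizes <= shape(operand).
(C5) shape(result) = slice_sizes.

Examples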

// %operand: [
//            [0, 0, 1, 1],
//            [0, 0, 1, 1],
//            [0, 0, 0, 0],
//            [0, 0, 0, 0]
//           ]
// %start_indices0: -1
// %start_indices1: 3
%result = "stablehlo.dynamic_slice"(%operand, %start_indices0, %start_indices1) {
  slice_sizes = array<i64: 2, 2>
} : (tensor<4x4xi32>, tensor<i64>, tensor<i64>) -> tensor<2x2xi32>
// %result: [
//           [1, 1],
//           [1, 1]
//          ]
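
The clamping of start indices is what makes the out-of-range indices -1 and 3 in this example valid. A minimal Python sketch for the 2-dimensional case follows; the function names are hypothetical.

```python
def clamp(lower, value, upper):
    return min(max(value, lower), upper)


def dynamic_slice_2d(operand, start_indices, slice_sizes):
    """2-D sketch of dynamic_slice with clamped start indices."""
    shape = [len(operand), len(operand[0])]
    start = [clamp(0, s, n - size)
             for s, n, size in zip(start_indices, shape, slice_sizes)]
    return [row[start[1]:start[1] + slice_sizes[1]]
            for row in operand[start[0]:start[0] + slice_sizes[0]]]


operand = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Start indices [-1, 3] are adjusted to [0, 2] before slicing.
print(dynamic_slice_2d(operand, [-1, 3], [2, 2]))  # [[1, 1], [1, 1]]
```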

More Examples

dynamic_update_slice

Semantics

Produces a result tensor which is equal to the operand tensor except that the slice starting at start_indices is updated with the values in update. More formally, result[result_index] is defined as:

update[update_index] if 0 <= update_index < shape(update) where:
- adjusted_start_indices = clamp(0, start_indices, shape(operand) - shape(update)).
- update_index = result_index - adjusted_start_indices.
operand[result_index] otherwise.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1-C4), (C6)
(I2)	`update`	tensor or per-tensor quantized tensor	(C2), (C3), (C6)
(I3)	`start_indices`	variadic number of 0-dimensional tensors of integer type	(C4), (C5)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1)

Constraints

(C1) type(operand) = type(result).
(C2) element_type(update) = element_type(operand).
(C3) rank(update) = rank(operand).
(C4) size(start_indices) = rank(operand).
(C5) same(type(start_indices...)).
(C6) 0 <= shape(update) <= shape(operand).

Examples

// %operand: [
//            [1, 1, 0, 0],
//            [1, 1, 0, 0],
//            [1, 1, 1, 1],
//            [1, 1, 1, 1]
//           ]
// %update: [
//           [1, 1],
//           [1, 1]
//          ]
// %start_indices0: -1
// %start_indices1: 3
%result = "stablehlo.dynamic_update_slice"(%operand, %update, %start_indices0, %start_indices1)
  : (tensor<4x4xi32>, tensor<2x2xi32>, tensor<i64>, tensor<i64>) -> tensor<4x4xi32>
// %result: [
//           [1, 1, 1, 1],
//           [1, 1, 1, 1],
//           [1, 1, 1, 1],
//           [1, 1, 1, 1]
//          ]
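
A minimal Python sketch of the 2-dimensional case shows how the clamped start indices place the update. The function name `dynamic_update_slice_2d` is a hypothetical helper, not part of StableHLO.

```python
def dynamic_update_slice_2d(operand, update, start_indices):
    """2-D sketch: copy operand, then overwrite the clamped slice with update."""
    shape = [len(operand), len(operand[0])]
    update_shape = [len(update), len(update[0])]
    start = [min(max(s, 0), n - u)
             for s, n, u in zip(start_indices, shape, update_shape)]
    result = [row[:] for row in operand]
    for i in range(update_shape[0]):
        for j in range(update_shape[1]):
            result[start[0] + i][start[1] + j] = update[i][j]
    return result


operand = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
update = [[1, 1], [1, 1]]
# Start indices [-1, 3] are adjusted to [0, 2], which covers the zero block.
for row in dynamic_update_slice_2d(operand, update, [-1, 3]):
    print(row)
```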

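More Examples

exponential

Semantics

Performs element-wise exponential operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: exp from IEEE-754.
For complex numbers: complex exponential.
For quantized types: dequantize_op_quantize(exponential, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples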

// %operand: [[0.0, 1.0], [2.0, 3.0]]
%result = "stablehlo.exponential"(%operand) : (tensor<2x2xf64>) -> tensor<2x2xf64>
// %result: [[1.0, 2.7182818284590451], [7.3890560989306504, 20.085536923187668]]
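
For complex element types, the complex exponential follows Euler's formula, exp(a + bi) = exp(a) * (cos(b) + i sin(b)). The Python sketch below writes this out with the standard library; `complex_exp` is a hypothetical helper used only for illustration.

```python
import cmath
import math


def complex_exp(z: complex) -> complex:
    """exp(a + bi) = exp(a) * (cos(b) + i * sin(b))."""
    magnitude = math.exp(z.real)
    return complex(magnitude * math.cos(z.imag), magnitude * math.sin(z.imag))


z = 1.0 + 2.0j
# The hand-written formula agrees with the library implementation.
print(abs(complex_exp(z) - cmath.exp(z)) < 1e-12)  # True
```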

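More Examples

exponential_minus_one

Semantics

Performs element-wise exponential minus one operation on operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: expm1 from IEEE-754.
For complex numbers: complex exponential minus one.
For quantized types: dequantize_op_quantize(exponential_minus_one, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples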

// %operand: [0.0, 1.0]
%result = "stablehlo.exponential_minus_one"(%operand) : (tensor<2xf64>) -> tensor<2xf64>
// %result: [0.0, 1.71828187]
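
A dedicated expm1 operation matters for operands close to zero, where computing exp(x) first and then subtracting 1 cancels most of the significant digits. The following Python sketch illustrates the difference in double precision with the standard library; it is an illustration of the numerical motivation, not of any particular StableHLO backend.

```python
import math

x = 1e-10
naive = math.exp(x) - 1.0   # suffers from cancellation near zero
accurate = math.expm1(x)    # computed without the intermediate rounding

# For tiny x, exp(x) - 1 is approximately x, so compare both against x.
naive_relative_error = abs(naive - x) / x
accurate_relative_error = abs(accurate - x) / x
print(naive_relative_error > 1e-8)     # True
print(accurate_relative_error < 1e-9)  # True
```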

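More Examples

fft

Semantics

Performs the forward and inverse Fourier transforms for real and complex inputs/outputs.

fft_type is one of the following:

FFT: Forward complex-to-complex FFT.
IFFT: Inverse complex-to-complex FFT.
RFFT: Forward real-to-complex FFT.
IRFFT: Inverse real-to-complex FFT (i.e. takes complex, returns real).

More formally, given the function fft which takes 1-dimensional tensors of complex types as input, produces 1-dimensional tensors of the same types as output, and computes the discrete Fourier transform:

For fft_type = FFT, result is defined as the final result of a series of L computations where L = size(fft_length). For example, for L = 3:

result1[i0, ..., :] = fft(operand[i0, ..., :]).
result2[i0, ..., :, iR-1] = fft(result1[i0, ..., :, iR-1]).
result[i0, ..., :, iR-2, iR-1] = fft(result2[i0, ..., :, iR-2, iR-1]).

Furthermore, given the function ifft which has the same type signature and computes the inverse of fft:

For fft_type = IFFT, result is defined as the inverse of the computations for fft_type = FFT. For example, for L = 3:

result1[i0, ..., :, iR-2, iR-1] = ifft(operand[i0, ..., :, iR-2, iR-1]).
result2[i0, ..., :, iR-1] = ifft(result1[i0, ..., :, iR-1]).
result[i0, ..., :] = ifft(result2[i0, ..., :]).

Furthermore, given the function rfft which takes 1-dimensional tensors of floating-point types, produces 1-dimensional tensors of complex types of the same floating-point semantics, and works as follows:

rfft(real_operand) = truncated_result where
complex_operand... = (real_operand..., 0.0).
complex_result = fft(complex_operand).
truncated_result = complex_result[:(size(complex_result) / 2 + 1)].

(When the discrete Fourier transform is computed for real operands, the first N/2 + 1 elements of the result unambiguously define the rest of the result, so the result of rfft is truncated to avoid computing redundant elements.)

For fft_type = RFFT, result is defined as the final result of a series of L computations where L = size(fft_length). For example, for L = 3:

result1[i0, ..., :] = rfft(operand[i0, ..., :]).
result2[i0, ..., :, iR-1] = fft(result1[i0, ..., :, iR-1]).
result[i0, ..., :, iR-2, iR-1] = fft(result2[i0, ..., :, iR-2, iR-1]).

Finally, given the function irfft which has the same type signature and computes the inverse of rfft:

For fft_type = IRFFT, result is defined as the inverse of the computations for fft_type = RFFT. For example, for L = 3:

result1[i0, ..., :, iR-2, iR-1] = ifft(operand[i0, ..., :, iR-2, iR-1]).
result2[i0, ..., :, iR-1] = ifft(result1[i0, ..., :, iR-1]).
result[i0, ..., :] = irfft(result2[i0, ..., :]).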

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type	(C1), (C2), (C4), (C5)
(I2)	`fft_type`	enum of `FFT`, `IFFT`, `RFFT`, and `IRFFT`	(C2), (C5)
(I3)	`fft_length`	1-dimensional tensor constant of type `si64`	(C1), (C3), (C4)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type	(C2), (C4), (C5)

Constraints

(C1) size(fft_length) <= rank(operand).
(C2) The relationship between operand and result element types varies:
- If fft_type = FFT, element_type(operand) and element_type(result) have the same complex type.
- If fft_type = IFFT, element_type(operand) and element_type(result) have the same complex type.
- If fft_type = RFFT, element_type(operand) is a floating-point type and element_type(result) is a complex type of the same floating-point semantics.
- If fft_type = IRFFT, element_type(operand) is a complex type and element_type(result) is a floating-point type of the same floating-point semantics.
(C3) 1 <= size(fft_length) <= 3.
(C4) If among operand and result there is a tensor real of a floating-point type, then shape(real)[-size(fft_length):] = fft_length.
(C5) shape(result) = shape(operand) except for:
- If fft_type = RFFT, dim(result, -1) = dim(operand, -1) = 0 ? 0 : dim(operand, -1) / 2 + 1.
- If fft_type = IRFFT, dim(operand, -1) = dim(result, -1) = 0 ? 0 : dim(result, -1) / 2 + 1.

Examples

// %operand: [(1.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
%result = "stablehlo.fft"(%operand) {
  fft_type = #stablehlo<fft_type FFT>,
  fft_length = array<i64: 4>
} : (tensor<4xcomplex<f32>>) -> tensor<4xcomplex<f32>>
// %result: [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
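
The 1-dimensional helper functions used in the semantics can be sketched with a naive O(n^2) discrete Fourier transform in Python. The sketch assumes the conventional unnormalized forward transform with a negative exponent and a 1/n scaling on the inverse; the text above does not spell out these conventions, so treat them as assumptions. It is meant to show the relationships between fft, ifft, and rfft, not an efficient implementation.

```python
import cmath


def fft(values):
    """Naive forward DFT of a 1-dimensional sequence of complex numbers."""
    n = len(values)
    return [sum(values[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def ifft(values):
    """Naive inverse DFT (assumed 1/n normalization)."""
    n = len(values)
    return [sum(values[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]


def rfft(real_values):
    """Forward DFT of real input, truncated to the first n / 2 + 1 elements."""
    complex_result = fft([complex(v, 0.0) for v in real_values])
    return complex_result[:len(complex_result) // 2 + 1]


operand = [1 + 0j, 0j, 0j, 0j]
print([round(abs(v), 6) for v in fft(operand)])  # [1.0, 1.0, 1.0, 1.0]
print(len(rfft([1.0, 2.0, 3.0, 4.0])))           # 3
```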

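floor

Semantics

Performs element-wise floor of operand tensor and produces a result tensor. Implements the roundToIntegralTowardNegative operation from the IEEE-754 specification. For quantized types, performs dequantize_op_quantize(floor, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples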

// %operand: [-0.8166, -0.2530, 0.2530, 0.8166, 2.0]
%result = "stablehlo.floor"(%operand) : (tensor<5xf32>) -> tensor<5xf32>
// %result: [-1.0, -1.0, 0.0, 0.0, 2.0]
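
As a small illustration in Python, the helper below (the name `floor_ieee` is hypothetical) wraps `math.floor` so that it returns a float and passes NaN, infinities, and signed zeros through unchanged, which is how IEEE-754 roundToIntegralTowardNegative treats them. Python's own `math.floor` raises an exception for NaN and infinities.

```python
import math


def floor_ieee(x: float) -> float:
    """Round toward negative infinity, keeping the result a float."""
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        return x  # NaN, infinities, and signed zeros are returned as-is
    return float(math.floor(x))


operand = [-0.8166, -0.2530, 0.2530, 0.8166, 2.0]
print([floor_ieee(x) for x in operand])  # [-1.0, -1.0, 0.0, 0.0, 2.0]
```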

More Examples

gather

Semantics

Gathers slices from operand tensor from offsets specified in start_indices and produces a result tensor.

The following diagram shows how elements in result map on elements in operand using a concrete example. The diagram picks a few example result indices and explains in detail which operand indices they correspond to.

More formally, result[result_index] = operand[operand_index] where:

batch_dims = [d for d in axes(result) and d not in offset_dims].
batch_index = result_index[batch_dims...].
start_index is defined as:
- start_indices[bi0, ..., :, ..., biN] where bi are individual elements in batch_index and : is inserted at the index_vector_dim index, if index_vector_dim < rank(start_indices).
- [start_indices[batch_index]] otherwise.
For d_operand in axes(operand),
- full_start_index[d_operand] = clamp(start_index[d_start], 0, dim(operand, d_operand) - slice_sizes[d_operand]) if d_operand = start_index_map[d_start].
- full_start_index[d_operand] = 0 otherwise.
For d_operand in axes(operand),
- full_batching_index[d_operand] = batch_index[d_start - (d_start < index_vector_dim ? 0 : 1)] if d_operand = operand_batching_dims[i_batching] and d_start = start_indices_batching_dims[i_batching].
- full_batching_index[d_operand] = 0 otherwise.
offset_index = result_index[offset_dims...].
full_offset_index = [oi0, ..., 0, ..., oiN] where oi are individual elements in offset_index, and 0 is inserted at indices from collapsed_slice_dims and operand_batching_dims.
operand_index = full_start_index + full_batching_index + full_offset_index.

If indices_are_sorted is true, then the implementation can assume that start_indices are sorted with respect to start_index_map; otherwise the behavior is undefined. More formally, for all i1 < i2 from indices(result), full_start_index(i1) <= full_start_index(i2).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1), (C8), (C11), (C17), (C19-C21), (C23)
(I2)	`start_indices`	tensor of integer type	(C2-C3), (C14), (C17), (C22)
(I3)	`offset_dims`	1-dimensional tensor constant of type `si64`	(C1), (C4-C5), (C22)
(I4)	`collapsed_slice_dims`	1-dimensional tensor constant of type `si64`	(C1), (C6-C9), (C22)
(I5)	`operand_batching_dims`	1-dimensional tensor constant of type `si64`	(C1), (C6), (C10-C12), (C16-C18), (C22)
(I6)	`start_indices_batching_dims`	1-dimensional tensor constant of type `si64`	(C13-C17)
(I7)	`start_index_map`	1-dimensional tensor constant of type `si64`	(C3), (C18-C19)
(I8)	`index_vector_dim`	constant of type `si64`	(C2-C3), (C15), (C22)
(I9)	`slice_sizes`	1-dimensional tensor constant of type `si64`	(C9), (C12), (C20-C22)
(I10)	`indices_are_sorted`	constant of type `i1`

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C5), (C22-C23)

Constraints

(C1) rank(operand) = size(offset_dims) + size(collapsed_slice_dims) + size(operand_batching_dims).
(C2) 0 <= index_vector_dim <= rank(start_indices).
(C3) size(start_index_map) = index_vector_dim < rank(start_indices) ? dim(start_indices, index_vector_dim) : 1.
(C4) is_unique(offset_dims) and is_sorted(offset_dims).
(C5) 0 <= offset_dims < rank(result).
(C6) is_unique(concatenate(collapsed_slice_dims, operand_batching_dims)).
(C7) is_sorted(collapsed_slice_dims).
(C8) 0 <= collapsed_slice_dims < rank(operand).
(C9) slice_sizes[collapsed_slice_dims...] <= 1.
(C10) is_sorted(operand_batching_dims).
(C11) 0 <= operand_batching_dims < rank(operand).
(C12) slice_sizes[operand_batching_dims...] <= 1.
(C13) is_unique(start_indices_batching_dims).
(C14) 0 <= start_indices_batching_dims < rank(start_indices).
(C15) index_vector_dim not in start_indices_batching_dims.
(C16) size(operand_batching_dims) == size(start_indices_batching_dims).
(C17) dim(operand, operand_batching_dims...) = dim(start_indices, start_indices_batching_dims...).
(C18) is_unique(concatenate(start_index_map, operand_batching_dims)).
(C19) 0 <= start_index_map < rank(operand).
(C20) size(slice_sizes) = rank(operand).
(C21) 0 <= slice_sizes <= shape(operand).
(C22) shape(result) = combine(batch_dim_sizes, offset_dim_sizes) where:
- batch_dim_sizes = shape(start_indices) except that the dimension size of start_indices corresponding to index_vector_dim is not included.
- offset_dim_sizes = slice_sizes except that the dimension sizes in slice_sizes corresponding to collapsed_slice_dims and operand_batching_dims are not included.
- combine puts batch_dim_sizes at axes corresponding to batch_dims and offset_dim_sizes at axes corresponding to offset_dims.
(C23) element_type(operand) = element_type(result).

Examples

// %operand: [
//            [
//             [[1, 2], [3, 4], [5, 6], [7, 8]],
//             [[9, 10],[11, 12], [13, 14], [15, 16]],
//             [[17, 18], [19, 20], [21, 22], [23, 24]]
//            ],
//            [
//             [[25, 26], [27, 28], [29, 30], [31, 32]],
//             [[33, 34], [35, 36], [37, 38], [39, 40]],
//             [[41, 42], [43, 44], [45, 46], [47, 48]]
//            ]
//           ]
// %start_indices: [
//                  [
//                   [[0, 0], [1, 0], [2, 1]],
//                   [[0, 1], [1, 1], [0, 9]]
//                  ],
//                  [
//                   [[0, 0], [2, 1], [2, 2]],
//                   [[1, 2], [0, 1], [1, 0]]
//                  ]
//                 ]
%result = "stablehlo.gather"(%operand, %start_indices) {
  dimension_numbers = #stablehlo.gather<
    offset_dims = [3, 4],
    collapsed_slice_dims = [1],
    operand_batching_dims = [0],
    start_indices_batching_dims = [1],
    start_index_map = [2, 1],
    index_vector_dim = 3>,
  slice_sizes = array<i64: 1, 1, 2, 2>,
  indices_are_sorted = false
} : (tensor<2x3x4x2xi32>, tensor<2x2x3x2xi64>) -> tensor<2x2x3x2x2xi32>
// %result: [
//           [
//            [
//             [[1, 2], [3, 4]],
//             [[3, 4], [5, 6]],
//             [[13, 14], [15, 16]]
//            ],
//            [
//             [[33, 34], [35, 36]],
//             [[35, 36], [37, 38]],
//             [[41, 42], [43, 44]]
//            ]
//           ],
//           [
//            [
//             [[1, 2], [3, 4]],
//             [[13, 14], [15, 16]],
//             [[21, 22], [23, 24]]
//            ],
//            [
//             [[43, 44], [45, 46]],
//             [[33, 34], [35, 36]],
//             [[27, 28], [29, 30]]
//            ]
//           ]
//          ]

更多範例

get_dimension_size

語意

產生指定 dimension 的 operand 大小。更正式地說，result = dim(operand, dimension)。語意只與型別的形狀元件有關。元素類型可以是任何類型。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	張量或量化張量	(C1)
(I2)	`dimension`	`si64` 類型的常數	(C1)

輸出內容

名稱	類型
`result`	`si32` 類型的 0 維張量

限制

(C1) 0 <= dimension < rank(operand)。

範例

// %operand: [[1, 2, 3], [4, 5, 6]]
%result = "stablehlo.get_dimension_size"(%operand) {
  dimension = 1 : i64
} : (tensor<2x3xi64>) -> tensor<i32>
// %result: 3

更多範例

get_tuple_element

注意： 根據 StableHLO v1.0 Cleanup #2283，這個作業似乎未獲架構和編譯器使用，因此正在評估是否要淘汰。因此，相容性保證有限 (6 個月)。

語意

擷取 operand 元組 index 位置的元素，並產生 result。更正式地說，result = operand[index]。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	元組	(C1)、(C2)
(I2)	`index`	`si32` 類型的常數	(C1)、(C2)

輸出內容

名稱	類型	限制
`result`	任何值	(C2)

限制

(C1) 0 <= index < size(operand)。
(C2) type(result) = tuple_element_types(operand)[index]。

範例

// %operand: ([1.0, 2.0], (3))
%result = "stablehlo.get_tuple_element"(%operand) <{index = 0 : i32}> : (tuple<tensor<2xf64>, tuple<tensor<i64>>>) -> tensor<2xf64>
// %result: [1.0, 2.0]

更多範例

if

語意

根據 pred 的值，從 true_branch 或 false_branch 執行一個函式，並產生輸出內容。更正式地說，result = pred ? true_branch() : false_branch()。

輸入

標籤	名稱	類型	限制
(I1)	`pred`	`i1` 類型的 0 維張量
(I2)	`true_branch`	函式	(C1-C3)
(I3)	`false_branch`	函式	(C1)、(C2)

輸出內容

名稱	類型	限制
`results`	可變數量的張量、量化張量或權杖	(C3)

限制

(C1) input_types(true_branch) = input_types(false_branch) = []。
(C2) output_types(true_branch) = output_types(false_branch)。
(C3) type(results...) = output_types(true_branch)。

範例

// %result_true_branch: 10
// %result_false_branch: 11
// %pred: true
%result = "stablehlo.if"(%pred) ({
  "stablehlo.return"(%result_true_branch) : (tensor<i32>) -> ()
}, {
  "stablehlo.return"(%result_false_branch) : (tensor<i32>) -> ()
}) : (tensor<i1>) -> tensor<i32>
// %result: 10

更多範例

imag

語意

從 operand 逐一擷取虛部，並產生 result 張量。更正式地說，對於每個元素 x： imag(x) = is_complex(x) ? imaginary_part(x) : constant(0, element_type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	浮點或複數型別的張量	(C1)、(C2)

輸出內容

名稱	類型	限制
`result`	浮點類型張量	(C1)、(C2)

限制

(C1) shape(result) = shape(operand)。
(C2) element_type(result) 的定義為：
- complex_element_type(element_type(operand)) (如果 is_complex(operand))。
- 其他情況則為 element_type(operand)。

範例

// %operand: [(1.0, 2.0), (3.0, 4.0)]
%result = "stablehlo.imag"(%operand) : (tensor<2xcomplex<f32>>) -> tensor<2xf32>
// %result: [2.0, 4.0]

更多範例

infeed

語意

Reads data from the infeed and produces results.

infeed_config 的語意是由實作定義。

results 包含酬載值 (在前) 和權杖 (在後)。我們計畫在日後將酬載和權杖分成兩個獨立輸出，以提升清楚度 (#670)。

輸入

標籤	名稱	類型
(I1)	`token`	`token`
(I2)	`infeed_config`	`string` 類型的常數

輸出內容

名稱	類型	限制
`results`	可變數量的張量、量化張量或權杖	(C1-C3)

限制

(C1) 0 < size(results)。
(C2) is_empty(results[:-1]) or is_tensor(type(results[:-1])).
(C3) is_token(type(results[-1]))。

範例

// %token: !stablehlo.token
// infeed_queue[0]: [[1, 2], [3, 4]]
// infeed_queue[1]: [[5, 6], [7, 8]]
%results0:2 = "stablehlo.infeed"(%token) {
  infeed_config = ""
} : (!stablehlo.token) -> (tensor<2x2xi64>, !stablehlo.token)
// results0#0: [[1, 2], [3, 4]]
%results1:2 = "stablehlo.infeed"(%token) {
  infeed_config = ""
} : (!stablehlo.token) -> (tensor<2x2xi64>, !stablehlo.token)
// results1#0: [[5, 6], [7, 8]]

更多範例

iota

語意

沿著 iota_dimension 維度，以遞增順序從零開始填入 output 張量的值。更正式地說，

output[output_index] = constant(is_quantized(output) ? quantize(output_index[iota_dimension], element_type(output)) : output_index[iota_dimension], element_type(output))。

輸入

標籤	名稱	類型	限制
(I1)	`iota_dimension`	`si64`	(C1)

輸出內容

名稱	類型	限制
`output`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)

限制

(C1) 0 <= iota_dimension < rank(output)。

範例

%output = "stablehlo.iota"() {
  iota_dimension = 0 : i64
} : () -> tensor<4x5xi32>
// %output: [
//           [0, 0, 0, 0, 0],
//           [1, 1, 1, 1, 1],
//           [2, 2, 2, 2, 2],
//           [3, 3, 3, 3, 3]
//          ]

%output = "stablehlo.iota"() {
  iota_dimension = 1 : i64
} : () -> tensor<4x5xi32>
// %output: [
//           [0, 1, 2, 3, 4],
//           [0, 1, 2, 3, 4],
//           [0, 1, 2, 3, 4],
//           [0, 1, 2, 3, 4]
//          ]
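
The two examples above can be reproduced with a short Python sketch of the formula `output[output_index] = output_index[iota_dimension]` for non-quantized integer outputs. The helper name `iota` and the nested-list representation are assumptions made for illustration only.

```python
def iota(shape, iota_dimension):
    """Hypothetical helper: builds a nested list where each element equals
    its own index along iota_dimension."""
    def build(prefix):
        # prefix holds the indices chosen so far; once it is complete,
        # the element value is the index along iota_dimension.
        if len(prefix) == len(shape):
            return prefix[iota_dimension]
        return [build(prefix + (i,)) for i in range(shape[len(prefix)])]
    return build(())


rows = iota([4, 5], 0)   # values grow along dimension 0
cols = iota([4, 5], 1)   # values grow along dimension 1
print(rows)
print(cols)
```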

更多範例

is_finite

語意

逐一檢查 x 中的值是否為有限值 (即非 +Inf、-Inf 或 NaN)，並產生 y 張量。實作 IEEE-754 規格中的 isFinite 運算。如果是量化型別，結果一律為 true。

輸入

標籤	名稱	類型	限制
(I1)	`x`	浮點類型張量或每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`y`	布林值類型的張量	(C1)

限制

(C1) shape(x) = shape(y)。

範例

// Logical values: -Inf, +Inf, NaN, ...
// %x: [0xFFF0000000000000, 0x7FF0000000000000, 0x7FF8000000000000, -10.0, -0.0, 0.0, 10.0]
%y = "stablehlo.is_finite"(%x) : (tensor<7xf64>) -> tensor<7xi1>
// %y: [false, false, false, true, true, true, true]
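
As a cross-check of the example above, the same bit patterns can be decoded and tested in Python. This is a sketch for `f64` values only; `f64_from_bits` is a helper name introduced here, and `math.isfinite` is used as a stand-in for the IEEE-754 `isFinite` operation.

```python
import math
import struct


def f64_from_bits(bits):
    """Reinterprets a 64-bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


x = [
    f64_from_bits(0xFFF0000000000000),  # -Inf
    f64_from_bits(0x7FF0000000000000),  # +Inf
    f64_from_bits(0x7FF8000000000000),  # NaN
    -10.0, -0.0, 0.0, 10.0,
]
y = [math.isfinite(v) for v in x]
print(y)
```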

更多範例

log

語意

對 operand 張量執行元素級別的對數運算，並產生 result 張量。視元素類型而定，執行下列操作：

浮點數：IEEE-754 的 log。
複數：複數對數。
量化型別：dequantize_op_quantize(log, operand, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	浮點或複數型別的張量，或是每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	浮點或複數型別的張量，或是每個張量的量化張量	(C1)

限制

(C1) baseline_type(operand) = baseline_type(result)。

範例

// %operand: [[1.0, 2.0], [3.0, 4.0]]
%result = "stablehlo.log"(%operand) : (tensor<2x2xf64>) -> tensor<2x2xf64>
// %result: [[0.0, 0.69314718055994529], [1.0986122886681098, 1.3862943611198906]]

更多範例

log_plus_one

語意

對 operand 張量執行元素級別的對數加一運算，並產生 result 張量。視元素類型而定，執行下列操作：

浮點數：IEEE-754 的 logp1。
如為複數： complex(log(hypot(real(x) + 1, imag(x))), atan2(imag(x), real(x) + 1))
量化型別： dequantize_op_quantize(log_plus_one, operand, type(result))。
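
A small Python sketch may help show why a dedicated operation exists and how the complex formula above behaves. It assumes `math.log1p` as a stand-in for IEEE-754 `logp1`; the helper name `log_plus_one_complex` is hypothetical.

```python
import cmath
import math


def log_plus_one_complex(x):
    """Evaluates the complex formula from the list above for one element."""
    return complex(math.log(math.hypot(x.real + 1, x.imag)),
                   math.atan2(x.imag, x.real + 1))


# For tiny inputs, 1 + x rounds to exactly 1.0, so the naive form loses x.
tiny = 1e-20
naive = math.log(1 + tiny)
accurate = math.log1p(tiny)

# The complex formula agrees with log(1 + z) for a moderate input.
z = 0.5 + 2j
difference = abs(log_plus_one_complex(z) - cmath.log(1 + z))
print(naive, accurate, difference)
```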

輸入

標籤	名稱	類型	限制
(I1)	`operand`	浮點或複數型別的張量，或是每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	浮點或複數型別的張量，或是每個張量的量化張量	(C1)

限制

(C1) baseline_type(operand) = baseline_type(result)。

範例

// %operand: [0.0, -0.999, 7.0, 6.38905621, 15.0]
%result = "stablehlo.log_plus_one"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// %result: [0.0, -6.90776825, 2.07944155, 2.0, 2.77258873]

更多範例

logistic

語意

Performs the element-wise logistic operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

浮點數：IEEE-754 的 division(1, addition(1, exp(-x)))。
For complex numbers: complex logistic.
量化型別： dequantize_op_quantize(logistic, operand, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	浮點或複數型別的張量，或是每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	浮點或複數型別的張量，或是每個張量的量化張量	(C1)

限制

(C1) baseline_type(operand) = baseline_type(result)。

範例

// %operand: [[0.0, 1.0], [2.0, 3.0]]
%result = "stablehlo.logistic"(%operand) : (tensor<2x2xf64>) -> tensor<2x2xf64>
// %result: [[0.5, 0.73105858], [0.88079708, 0.95257413]]

更多範例

map

注意： 根據 StableHLO v1.0 Cleanup #2283，這個作業似乎未獲架構和編譯器使用，因此正在評估是否要淘汰。因此，相容性保證有限 (6 個月)。

語意

沿著 dimensions 將對應函式 computation 套用至 inputs，並產生 result 張量。

更正式地說，result[result_index] = computation(inputs...[result_index])。

輸入

標籤	名稱	類型	限制
(I1)	`inputs`	可變數量的張量或每個張量的量化張量	(C1-C4)
(I2)	`dimensions`	`si64` 類型的 1 維張量常數	(C3)
(I3)	`computation`	函式	(C4)

輸出內容

名稱	類型	限制
`result`	張量或每個張量的量化張量	(C1)、(C4)

限制

(C1) shape(inputs...) = shape(result)。
(C2) 0 < size(inputs) = N。
(C3) dimensions = range(rank(inputs[0]))。
(C4) computation 的型別為 (tensor<E0>, ..., tensor<EN-1>) -> tensor<E'>，其中 Ei = element_type(inputs[i]) 和 E' = element_type(result)。

範例

// %input0: [[0, 1], [2, 3]]
// %input1: [[4, 5], [6, 7]]
%result = "stablehlo.map"(%input0, %input1) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = stablehlo.multiply %arg0, %arg1 : tensor<i64>
    stablehlo.return %0 : tensor<i64>
}) {
  dimensions = array<i64: 0, 1>
} : (tensor<2x2xi64>, tensor<2x2xi64>) -> tensor<2x2xi64>
// %result: [[0, 5], [12, 21]]

更多範例

maximum

語意

對張量 lhs 和 rhs 執行元素級別的 max 運算，並產生 result 張量。視元素類型而定，執行下列操作：

布林值：邏輯 OR。
For integers: integer maximum.
浮點數：IEEE-754 的 maximum。
如果是複數，則為 (real, imaginary) 配對的字典順序最大值。對複數強制排序會涉及令人意外的語意，因此我們計畫在未來移除對這項作業的複數支援 (#560)。
如為量化型別：
- dequantize_op_quantize(maximum, lhs, rhs, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`lhs`	張量或每個張量的量化張量	(C1)
(I2)	`rhs`	張量或每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	張量或每個張量的量化張量	(C1)

限制

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result)。

範例

// %lhs: [[1, 2], [7, 8]]
// %rhs: [[5, 6], [3, 4]]
%result = "stablehlo.maximum"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 6], [7, 8]]

更多範例

minimum

語意

對張量 lhs 和 rhs 執行元素級別的 min 運算，並產生 result 張量。視元素類型而定，執行下列操作：

布林值：邏輯 AND。
整數：整數最小值。
浮點數：IEEE-754 的 minimum。
複數：(real, imaginary) 配對的字典順序最小值。對複數強制排序會涉及令人意外的語意，因此我們計畫在未來移除對這項作業的複數支援 (#560)。
如為量化型別：
- dequantize_op_quantize(minimum, lhs, rhs, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`lhs`	張量或每個張量的量化張量	(C1)
(I2)	`rhs`	張量或每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	張量或每個張量的量化張量	(C1)

限制

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result)。

範例

// %lhs: [[1, 2], [7, 8]]
// %rhs: [[5, 6], [3, 4]]
%result = "stablehlo.minimum"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[1, 2], [3, 4]]

更多範例

multiply

語意

執行兩個張量 lhs 和 rhs 的元素乘積，並產生 result 張量。視元素類型而定，執行下列操作：

布林值：邏輯 AND。
整數：整數乘法。
浮點數：IEEE-754 的 multiplication。
複數：複數乘法。
如為量化型別：
- dequantize_op_quantize(multiply, lhs, rhs, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`lhs`	張量或每個張量的量化張量	(C1)
(I2)	`rhs`	張量或每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	張量或每個張量的量化張量	(C1)

限制

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

範例

// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.multiply"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 12], [21, 32]]

更多範例

negate

語意

對 operand 張量執行元素層級的否定運算，並產生 result 張量。視元素類型而定，執行下列操作：

如為正負號整數：整數的負數。
無正負號整數：位元轉換為帶正負號整數、整數否定、位元轉換回無正負號整數。
浮點數：IEEE-754 的 negate。
如果是複數，則為複數的負數。
量化型別： dequantize_op_quantize(negate, operand, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)

限制

(C1) baseline_type(operand) = baseline_type(result)。

範例

// Negation operation with integer Tensors
// %operand: [0, -2]
%result = "stablehlo.negate"(%operand) : (tensor<2xi32>) -> tensor<2xi32>
// %result: [0, 2]

// Negation operation with complex tensors
// %operand: [(2.5, 0.0)]
%result = "stablehlo.negate"(%operand) : (tensor<1xcomplex<f32>>) -> tensor<1xcomplex<f32>>
// %result: [(-2.5, -0.0)]

更多範例

not

語意

對張量 operand 執行元素層級的 NOT 運算，並產生 result 張量。視元素類型而定，執行下列操作：

布林值：邏輯 NOT。
整數：位元 NOT。

引數

名稱	類型	限制
`operand`	布林值或整數類型的張量	(C1)

輸出內容

名稱	類型	限制
`result`	布林值或整數類型的張量	(C1)

限制

(C1) type(operand) = type(result)。

範例

// Bitwise operation with integer tensors
// %operand: [[1, 2], [3, 4]]
%result = "stablehlo.not"(%operand) : (tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[-2, -3], [-4, -5]]

// Logical operation with boolean tensors
// %operand: [true, false]
%result = "stablehlo.not"(%operand) : (tensor<2xi1>) -> tensor<2xi1>
// %result: [false, true]

更多範例

optimization_barrier

語意

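Ensures that the operations that produce the operand are executed before any operations that depend on the result, and prevents compiler transformations from moving operations across the barrier. Other than that, the operation is an identity, i.e. result = operand.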

引數

名稱	類型	限制
`operand`	可變數量的張量、每個張量的量化張量或權杖	(C1)

輸出內容

名稱	類型	限制
`result`	可變數量的張量、每個張量的量化張量或權杖	(C1)

限制

(C1) type(operand...) = type(result...)。

範例

// %operand0: 0.0
// %operand1: 1.0
%result0, %result1 = "stablehlo.optimization_barrier"(%operand0, %operand1) : (tensor<f32>, tensor<f32>) -> (tensor<f32>, tensor<f32>)
// %result0: 0.0
// %result1: 1.0

更多範例

or

語意

對兩個張量 lhs 和 rhs 執行元素層級的 OR 運算，並產生 result 張量。視元素類型而定，執行下列操作：

布林值：邏輯 OR。
整數：位元 OR。

輸入

標籤	名稱	類型	限制
(I1)	`lhs`	整數或布林值型別的張量	(C1)
(I2)	`rhs`	整數或布林值型別的張量	(C1)

輸出內容

名稱	類型	限制
`result`	整數或布林值型別的張量	(C1)

限制

(C1) type(lhs) = type(rhs) = type(result)。

範例

// Bitwise operation with integer tensors
// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.or"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 6], [7, 12]]

// Logical operation with boolean tensors
// %lhs: [[false, false], [true, true]]
// %rhs: [[false, true], [false, true]]
%result = "stablehlo.or"(%lhs, %rhs) : (tensor<2x2xi1>, tensor<2x2xi1>) -> tensor<2x2xi1>
// %result: [[false, true], [true, true]]

更多範例

outfeed

語意

Writes inputs to the outfeed and produces a result token.

outfeed_config 的語意是由實作定義。

輸入

標籤	名稱	類型
(I1)	`inputs`	可變數量的張量或量化張量
(I2)	`token`	`token`
(I3)	`outfeed_config`	`string` 類型的常數

輸出內容

名稱	類型
`result`	`token`

範例

%result = "stablehlo.outfeed"(%input0, %token) {
  outfeed_config = ""
} : (tensor<2x2x2xi64>, !stablehlo.token) -> !stablehlo.token

更多範例

pad

語意

擴展 operand，方法是在張量周圍以及張量的元素之間，填入指定的 padding_value。

edge_padding_low 和 edge_padding_high 分別指定在每個維度的低端 (索引 0 旁) 和高端 (最高索引旁) 新增的邊框間距量。填補量可以是負數，負數填補的絕對值表示要從指定維度移除的元素數量。

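interior_padding specifies the amount of padding added between any two elements in each dimension, and may not be negative. Interior padding is applied before edge padding, so negative edge padding removes elements from the interior-padded operand.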

更正式地說，result[result_index] 的定義如下：

operand[operand_index] if result_index = edge_padding_low + operand_index * (interior_padding + 1).
其他情況則為 padding_value。
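
The definition above can be sketched in Python as follows. This is an illustrative model on nested lists, not a reference implementation; the function `pad` and its explicit `operand_shape` argument are assumptions of this sketch, and it assumes a rank of at least 1 and attribute values that produce a non-negative result shape.

```python
from itertools import product


def pad(operand, operand_shape, padding_value, low, high, interior):
    """Hypothetical model of the pad semantics on nested Python lists."""
    # Result shape per the shape constraint of this op.
    result_shape = [s + lo + max(s - 1, 0) * it + hi
                    for s, lo, hi, it in zip(operand_shape, low, high, interior)]

    def build(shape):
        if not shape:
            return padding_value
        return [build(shape[1:]) for _ in range(shape[0])]

    result = build(result_shape)
    for operand_index in product(*(range(s) for s in operand_shape)):
        # result_index = edge_padding_low + operand_index * (interior_padding + 1)
        result_index = [lo + i * (it + 1)
                        for lo, i, it in zip(low, operand_index, interior)]
        # Negative edge padding pushes some elements outside the result.
        if not all(0 <= r < s for r, s in zip(result_index, result_shape)):
            continue
        value = operand
        for i in operand_index:
            value = value[i]
        target = result
        for r in result_index[:-1]:
            target = target[r]
        target[result_index[-1]] = value
    return result


padded = pad([[1, 2, 3], [4, 5, 6]], [2, 3], 0, [0, 1], [2, 1], [1, 2])
cropped = pad([1, 2, 3], [3], 0, [-1], [0], [0])  # negative low padding
print(padded)
print(cropped)
```

The second call shows negative edge padding: one element is dropped from the low end instead of padding being added.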

輸入

標籤	名稱	類型	限制
(I1)	`operand`	張量或每個張量的量化張量	(C1)、(C2)、(C4)
(I2)	`padding_value`	0 維度張量或每個張量的量化張量	(C1)
(I3)	`edge_padding_low`	`si64` 類型的 1 維張量常數	(C1)、(C4)
(I4)	`edge_padding_high`	`si64` 類型的 1 維張量常數	(C1)、(C4)
(I5)	`interior_padding`	`si64` 類型的 1 維張量常數	(C2-C4)

輸出內容

名稱	類型	限制
`result`	張量或每個張量的量化張量	(C3-C6)

限制

(C1) element_type(operand) = element_type(padding_value) = element_type(result)。
(C2) size(edge_padding_low) = size(edge_padding_high) = size(interior_padding) = rank(operand)。
(C3) 0 <= interior_padding。
(C4) shape(result) = shape(operand) + edge_padding_low + max(shape(operand) - 1, 0) * interior_padding + edge_padding_high。

範例

// %operand: [
//            [1, 2, 3],
//            [4, 5, 6]
//           ]
// %padding_value: 0
%result = "stablehlo.pad"(%operand, %padding_value) {
  edge_padding_low = array<i64: 0, 1>,
  edge_padding_high = array<i64: 2, 1>,
  interior_padding = array<i64: 1, 2>
} : (tensor<2x3xi32>, tensor<i32>) -> tensor<5x9xi32>
// %result: [
//           [0, 1, 0, 0, 2, 0, 0, 3, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0],
//           [0, 4, 0, 0, 5, 0, 0, 6, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0],
//           [0, 0, 0, 0, 0, 0, 0, 0, 0]
//          ]

更多範例

partition_id

語意

產生目前程序的 partition_id。

輸出內容

名稱	類型
`result`	`ui32` 類型的 0 維張量

範例

%result = "stablehlo.partition_id"() : () -> tensor<ui32>

更多範例

popcnt

語意

對 operand 張量中設定的位元數執行元素層級的計數，並產生 result 張量。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	整數型別的張量	(C1)

輸出內容

名稱	類型	限制
`result`	整數型別的張量	(C1)

限制

(C1) type(operand) = type(result)。

範例

// %operand: [0, 1, 2, 127]
%result = "stablehlo.popcnt"(%operand) : (tensor<4xi64>) -> tensor<4xi64>
// %result: [0, 1, 1, 7]

更多範例

power

語意

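Performs element-wise exponentiation of the lhs tensor by the rhs tensor and produces a result tensor. Depending on the element type, does the following: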

整數：整數指數運算。
浮點數：IEEE-754 的 pow。
複數：複數指數。
量化型別：dequantize_op_quantize(power, lhs, rhs, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`lhs`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)
(I2)	`rhs`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)

限制

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

範例

// %lhs: [-2.0, -0.0, -36.0, 5.0, 3.0, 10000.0]
// %rhs: [2.0, 2.0, 1.1, 2.0, -1.0, 10.0]
%result = "stablehlo.power"(%lhs, %rhs) : (tensor<6xf64>, tensor<6xf64>) -> tensor<6xf64>
// %result: [4.0, 0.0, -nan, 25.0, 0.333333343, inf]

更多範例

real

語意

從 operand 中逐一擷取實部，並產生 result 張量。更正式地說，對於每個元素 x： real(x) = is_complex(x) ? real_part(x) : x。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	浮點或複數型別的張量	(C1)、(C2)

輸出內容

名稱	類型	限制
`result`	浮點類型張量	(C1)、(C2)

限制

(C1) shape(result) = shape(operand)。
(C2) element_type(result) 的定義為：
- complex_element_type(element_type(operand)) (如果 is_complex(operand))。
- 其他情況則為 element_type(operand)。

範例

// %operand: [(1.0, 2.0), (3.0, 4.0)]
%result = "stablehlo.real"(%operand) : (tensor<2xcomplex<f32>>) -> tensor<2xf32>
// %result: [1.0, 3.0]

更多範例

recv

語意

從具有 channel_id 的管道接收資料，並產生 results。

如果 is_host_transfer 為 true，則作業會從主機傳輸資料。否則，系統會根據 source_target_pairs 的值，從其他裝置轉移資料。這個旗標會重複 channel_type 中提供的資訊，因此我們日後只會保留其中一個 (#666)。如果 is_host_transfer = false 且 source_target_pairs 為 None 或空白，則視為未定義的行為。

results 包含酬載值 (在前) 和權杖 (在後)。我們計畫在日後將酬載和權杖分成兩個獨立輸出，以提升清楚度 (#670)。

輸入

標籤	名稱	類型	限制
(I1)	`token`	`token`
(I2)	`channel_id`	`si64` 類型的常數
(I3)	`channel_type`	`DEVICE_TO_DEVICE` 和 `DEVICE_TO_HOST` 的列舉	(C5)
(I4)	`is_host_transfer`	`i1` 類型的常數	(C5-C6)
(I5)	`source_target_pairs`	`si64` 類型的 2 維張量常數	(C1-C4)、(C6)

輸出內容

名稱	類型	限制
`results`	可變數量的張量、量化張量或權杖	(C2-C4)

限制

(C1) dim(source_target_pairs, 1) = 2。
(C2) is_unique(source_target_pairs[:, 0])。
(C3) is_unique(source_target_pairs[:, 1])。
(C4) 0 <= source_target_pairs < N，其中 N 定義為：
- 如果使用 cross_replica，則為 num_replicas。
- 如果使用 cross_partition，則為 num_partitions。
(C5) channel_type 的定義為：
- DEVICE_TO_HOST if is_host_transfer = true.
- 其他情況則為 DEVICE_TO_DEVICE。

範例

%results0, %results1 = "stablehlo.recv"(%token) {
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 1>,
  is_host_transfer = false,
  source_target_pairs = dense<[[0, 1], [1, 2]]> : tensor<2x2xi64>
} : (!stablehlo.token) -> (tensor<2x2xi64>, !stablehlo.token)

更多範例

reduce

語意

沿著 dimensions 對 inputs 和 init_values 套用縮減函式 body，並產生 results 張量。

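The order of reductions is implementation-defined, which means that body and init_values must form a monoid to guarantee that the operation produces the same results for all inputs on all implementations. However, this condition does not hold for many popular reductions. For example, floating-point addition for body and zero for init_values do not actually form a monoid, because floating-point addition is not associative.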

更正式地說，results...[j0, ..., jR-1] = reduce(input_slices_converted)，其中：

input_slices = inputs...[j0, ..., :, ..., jR-1]，其中 : 會插入 dimensions。
input_slices_converted = to_destination_type(input_slices..., type(func_inputs(body)[:len(func_inputs(body))//2])...)。
init_values_converted = to_destination_type(init_values..., type(func_inputs(body)[len(func_inputs(body))//2:])...)。
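reduce(input_slices_converted) = exec(schedule) for some binary tree schedule where:
- exec(node) = body(exec(node.left), exec(node.right)).
- exec(leaf) = leaf.value.
schedule is an implementation-defined full binary tree whose in-order traversal consists of:
- input_slices_converted...[index] values, for all index in index_space(input_slices_converted), in ascending lexicographic order of index.
- Interspersed with an implementation-defined number of init_values_converted at implementation-defined positions.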
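
To make the role of the schedule concrete, here is a minimal Python sketch in which a schedule is modeled as nested 2-tuples (an assumption of this example, not spec notation). It evaluates the same in-order leaves under two different tree shapes and shows that floating-point addition can give different results, while integer addition does not.

```python
def exec_schedule(node, body):
    """Evaluates a binary tree: tuples are inner nodes, anything else is a leaf."""
    if isinstance(node, tuple):
        left, right = node
        return body(exec_schedule(left, body), exec_schedule(right, body))
    return node


def add(a, b):
    return a + b


# Both trees contain the inputs 0.1, 0.2, 0.3 in order plus one init value 0.0,
# placed at different implementation-defined positions.
left_leaning = (((0.0, 0.1), 0.2), 0.3)
right_leaning = (0.1, (0.2, (0.3, 0.0)))
float_left = exec_schedule(left_leaning, add)
float_right = exec_schedule(right_leaning, add)

# With integers, addition and 0 form a monoid, so the tree shape is irrelevant.
int_left = exec_schedule((((0, 1), 2), 3), add)
int_right = exec_schedule((1, (2, (3, 0))), add)
print(float_left, float_right, int_left, int_right)
```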

輸入

標籤	名稱	類型	限制
(I1)	`inputs`	可變數量的張量或每個張量的量化張量	(C1-C4)、(C6)、(C7)
(I2)	`init_values`	可變數量的 0 維度張量或每個張量的量化張量	(C2)、(C3)
(I3)	`dimensions`	`si64` 類型的 1 維張量常數	(C4)、(C5)、(C7)
(I4)	`body`	函式	(C6)

輸出內容

名稱	類型	限制
`results`	可變數量的張量或每個張量的量化張量	(C3)、(C7)、(C8)

限制

(C1) same(shape(inputs...))。
(C2) element_type(inputs...) = element_type(init_values...)。
(C3) 0 < size(inputs) = size(init_values) = size(results) = N。
(C4) 0 <= dimensions < rank(inputs[0])。
(C5) is_unique(dimensions)。
(C6) body has type (tensor<E0>, ..., tensor<EN-1>, tensor<E0>, ..., tensor<EN-1>) -> (tensor<E0>, ..., tensor<EN-1>) where is_promotable(element_type(inputs[i]), Ei).
(C7) shape(results...) = shape(inputs...)，但對應 dimensions 的 inputs... 維度大小不包含在內。
(C8) element_type(results[i]) = Ei 適用於 [0,N) 中的所有 i。

範例

// %input = [[0, 1, 2, 3, 4, 5]]
// %init_value = 0
%result = "stablehlo.reduce"(%input, %init_value) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  dimensions = array<i64: 1>
} : (tensor<1x6xi64>, tensor<i64>) -> tensor<1xi64>
// %result = [15]

更多範例

reduce_precision

語意

對 operand 執行元素層級的轉換，轉換成使用 exponent_bits 和 mantissa_bits 的其他浮點類型，然後再轉換回原始浮點類型，並產生 output 張量。

更正式地來說：

原始值的尾數位元會更新，以使用 roundToIntegralTiesToEven 語意，將原始值四捨五入至 mantissa_bits 可表示的最接近值。
然後，如果 mantissa_bits 小於原始值的尾數位元數，尾數位元會截斷為 mantissa_bits。
接著，如果中間結果的指數位元不符合 exponent_bits 提供的範圍，中間結果就會溢位至無限大 (使用原始符號)，或下溢至零 (使用原始符號)。
如果是量化型別，則會執行 dequantize_op_quantize( lambda operand: reduce_precision(operand, exponent_bits, mantissa_bits), operand, type(result))。
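
The rounding and overflow steps above can be sketched for `f64` inputs with integer bit manipulation. This is a hedged illustration, not the reference implementation: the function name `reduce_precision_f64` is hypothetical, NaN inputs are returned unchanged as a simplification, and values whose exponent falls at or below the smallest exponent of the reduced format are flushed to zero, which is an assumption of this sketch.

```python
import math
import struct


def reduce_precision_f64(x, exponent_bits, mantissa_bits):
    """Hypothetical sketch of reduce_precision for one f64 element."""
    if math.isnan(x):
        return x  # simplification: NaN is passed through unchanged
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits & (1 << 63)
    if mantissa_bits < 52:
        # Round to nearest, ties to even, then clear the dropped bits.
        shift = 52 - mantissa_bits
        last_kept_bit = (bits >> shift) & 1
        bias = (1 << (shift - 1)) - 1 + last_kept_bit
        bits = (bits + bias) & ~((1 << shift) - 1)
    if exponent_bits < 11:
        reduced_bias = (1 << (exponent_bits - 1)) - 1
        exponent = (bits >> 52) & 0x7FF
        if exponent > 1023 + reduced_bias:
            bits = sign | (0x7FF << 52)   # overflow to infinity, keep sign
        elif exponent <= 1023 - reduced_bias:
            bits = sign                   # underflow to zero, keep sign
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


denormal = struct.unpack("<d", struct.pack("<Q", 1))[0]
outputs = [reduce_precision_f64(v, 5, 10)
           for v in [math.inf, denormal, 0.0, 65519.0, 65520.0]]
print(outputs)
```

With `exponent_bits = 5` and `mantissa_bits = 10`, 65520.0 sits exactly halfway between 65504.0 and 65536.0, rounds to the even neighbor 65536.0, and then overflows the 5-bit exponent range.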

輸入

標籤	名稱	類型	限制
(I1)	`operand`	浮點類型張量或每個張量的量化張量	(C1)
(I2)	`exponent_bits`	`si32` 類型的常數	(C2)
(I3)	`mantissa_bits`	`si32` 類型的常數	(C3)

輸出內容

名稱	類型	限制
`output`	浮點類型張量或每個張量的量化張量	(C1)

限制

(C1) baseline_type(operand) = baseline_type(output)。
(C2) 1 <= exponent_bits。
(C3) 0 <= mantissa_bits。

範例

// Logical values: +Inf, NaN, +Denormal, 0.0, 65519.0, 65520.0
// %operand: [0x7FF0000000000000, 0x7FFFFFFFFFFFFFFF, 0x0000000000000001, 0.0, 65519.0, 65520.0]
%output = "stablehlo.reduce_precision"(%operand) {
  exponent_bits = 5 : i32,
  mantissa_bits = 10 : i32
} : (tensor<6xf64>) -> tensor<6xf64>
// Logical values: +Inf, NaN, 0.0, 0.0, 65504.0, +Inf
// %output: [0x7FF0000000000000, 0x7FFFFFFFFFFFFFFF, 0.0, 0.0, 65504.0, 0x7FF0000000000000]

更多範例

reduce_scatter

語意

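Within each process group in the StableHLO process grid, performs a reduction, using computation, over the values of the operand tensor from each process, splits the reduction result along scatter_dimension into parts, and scatters the split parts between the processes to produce the result.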

這項作業會將 StableHLO 程序格線分割成 process_groups，定義如下：

cross_replica(replica_groups) if channel_id <= 0 and use_global_device_ids = false。
cross_replica_and_partition(replica_groups) if channel_id > 0 and use_global_device_ids = false。
flattened_ids(replica_groups) if channel_id > 0 and use_global_device_ids = true。

接著，在每個 process_group 中執行下列操作：

reduced_value = all_reduce(operand, replica_groups, channel_id, use_global_device_ids, computation)。
parts@sender = split(reduced_value@sender, dim(process_groups, 1), scatter_dimension)。
result@receiver = parts@sender[receiver_index]，其中 sender 位於 process_group，且 receiver_index = process_group.index(receiver)。
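
The three steps above can be simulated for a single process group with rank-2 operands. The sketch below is an assumption-laden model: processes are plain dictionary keys, tensors are nested lists, and `reduce_scatter` is a hypothetical helper rather than an actual API.

```python
def reduce_scatter(operands, scatter_dimension, group, computation):
    """Hypothetical single-group model for rank-2 tensors held per process."""
    first = operands[group[0]]
    rows, cols = len(first), len(first[0])
    # Step 1: all_reduce, combining the operands of every process in the group.
    reduced = [list(row) for row in first]
    for process in group[1:]:
        for r in range(rows):
            for c in range(cols):
                reduced[r][c] = computation(reduced[r][c], operands[process][r][c])
    # Steps 2 and 3: split along scatter_dimension, one part per receiver.
    results = {}
    for receiver_index, process in enumerate(group):
        if scatter_dimension == 0:
            step = rows // len(group)
            results[process] = reduced[receiver_index * step:(receiver_index + 1) * step]
        else:
            step = cols // len(group)
            results[process] = [row[receiver_index * step:(receiver_index + 1) * step]
                                for row in reduced]
    return results


operands = {
    0: [[1, 2, 3, 4], [5, 6, 7, 8]],
    1: [[9, 10, 11, 12], [13, 14, 15, 16]],
}
results = reduce_scatter(operands, 1, [0, 1], lambda a, b: a + b)
print(results)
```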

輸入

標籤	名稱	類型	限制
(I1)	`operand`	張量或每個張量的量化張量	(C1)、(C2)、(C7)、(C8)
(I2)	`scatter_dimension`	`si64` 類型的常數	(C1)、(C2)、(C8)
(I3)	`replica_groups`	`si64` 類型的 2 維張量常數	(C3-C5)
(I4)	`channel_id`	`si64` 類型的常數	(C6)
(I5)	`use_global_device_ids`	`i1` 類型的常數	(C6)
(I6)	`computation`	函式	(C7)

輸出內容

名稱	類型	限制
`result`	張量或每個張量的量化張量	(C8-C9)

限制

(C1) dim(operand, scatter_dimension) % dim(process_groups, 1) = 0。
(C2) 0 <= scatter_dimension < rank(operand)。
(C3) is_unique(replica_groups)。
(C4) size(replica_groups) 的定義為：
- 如果使用 cross_replica，則為 num_replicas。
- 如果使用 cross_replica_and_partition，則為 num_replicas。
- 如果使用 flattened_ids，則為 num_processes。
(C5) 0 <= replica_groups < size(replica_groups)。
(C6) If use_global_device_ids = true, then channel_id > 0.
(C7) computation 具有 (tensor<E>, tensor<E>) -> (tensor<E>) 類型，其中 is_promotable(element_type(operand), E)。
(C8) shape(result) = shape(operand) 除外：
- dim(result, scatter_dimension) = dim(operand, scatter_dimension) / dim(process_groups, 1)。
(C9) element_type(result) = E。

範例

// num_replicas: 2
// num_partitions: 1
// %operand@(0, 0): [[1, 2, 3, 4],
//                   [5, 6, 7, 8]]
// %operand@(1, 0): [[9, 10, 11, 12],
//                   [13, 14, 15, 16]]
%result = "stablehlo.reduce_scatter"(%operand) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
  %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
  "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  scatter_dimension = 1 : i64,
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>,
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 0>
} : (tensor<2x4xi64>) -> tensor<2x2xi64>
//
// %result@(0, 0): [[10, 12],
//                  [18, 20]]
// %result@(1, 0): [[14, 16],
//                  [22, 24]]

更多範例

reduce_window

語意

將縮減函式 body 套用至 inputs 和 init_values 的視窗，並產生 results。

下圖以具體範例說明如何從 inputs... 計算 results... 中的元素。

[Figure: reduce_window]

更正式地說， results...[result_index] = reduce(windows, init_values, axes(inputs...), body) (請參閱 reduce)，其中：

padded_inputs = pad(inputs..., init_values..., padding[:, 0], padding[:, 1], base_dilations - 1)。
window_start = result_index * window_strides。
window_end = window_start + (window_dimensions - 1) * window_dilations + 1。
windows = slice(padded_inputs..., window_start, window_end, window_dilations)。

輸入

標籤	名稱	類型	限制
(I1)	`inputs`	可變數量的張量或每個張量的量化張量	(C1-C4)、(C6)、(C8)、(C10)、(C12)、(C13)、(C15)
(I2)	`init_values`	可變數量的 0 維度張量或每個張量的量化張量	(C1)、(C13)
(I3)	`window_dimensions`	`si64` 類型的 1 維張量常數	(C4)、(C5)、(C15)
(I4)	`window_strides`	`si64` 類型的 1 維張量常數	(C6)、(C7)、(C15)
(I5)	`base_dilations`	`si64` 類型的 1 維張量常數	(C8)、(C9)、(C15)
(I6)	`window_dilations`	`si64` 類型的 1 維張量常數	(C10)、(C11)、(C15)
(I7)	`padding`	`si64` 類型的 2 維張量常數	(C12)、(C15)
(I8)	`body`	函式	(C13)

輸出內容

名稱	類型	限制
`results`	可變數量的張量或每個張量的量化張量	(C1)、(C14-C16)

限制

(C1) 0 < size(inputs) = size(init_values) = size(results) = N。
(C2) same(shape(inputs...))。
(C3) element_type(inputs...) = element_type(init_values...)。
(C4) size(window_dimensions) = rank(inputs[0])。
(C5) 0 < window_dimensions。
(C6) size(window_strides) = rank(inputs[0])。
(C7) 0 < window_strides。
(C8) size(base_dilations) = rank(inputs[0])。
(C9) 0 < base_dilations。
(C10) size(window_dilations) = rank(inputs[0])。
(C11) 0 < window_dilations。
(C12) shape(padding) = [rank(inputs[0]), 2]。
(C13) body 的類型為 (tensor<E0>, ..., tensor<EN-1>, tensor<E0>, ..., tensor<EN-1>) -> (tensor<E0>, ..., tensor<EN-1>)，其中 is_promotable(element_type(inputs[i]), Ei)。
(C14) same(shape(results...))。
(C15) shape(results[0]) = num_windows 其中：
- dilated_input_shape = shape(inputs[0]) = 0 ? 0 : (shape(inputs[0]) - 1) * base_dilations + 1。
- padded_input_shape = padding[:, 0] + dilated_input_shape + padding[:, 1]。
- dilated_window_shape = (window_dimensions - 1) * window_dilations + 1。
- is_empty_window = padded_input_shape = 0 || dilated_window_shape > padded_input_shape。
- num_windows = is_empty_window ? 0 : floor((padded_input_shape - dilated_window_shape) / window_strides) + 1。
(C16) element_type(results[i]) = Ei 適用於 [0,N) 中的所有 i。
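
The shape computation in (C15) can be traced per dimension with a few lines of Python. The helper name `reduce_window_result_shape` is an assumption made for this sketch; the intermediate names follow the constraint.

```python
def reduce_window_result_shape(input_shape, window_dimensions, window_strides,
                               base_dilations, window_dilations, padding):
    """Hypothetical helper: computes num_windows per dimension as in (C15)."""
    shape = []
    for size, window, stride, base_dil, window_dil, (pad_low, pad_high) in zip(
            input_shape, window_dimensions, window_strides,
            base_dilations, window_dilations, padding):
        dilated_input = 0 if size == 0 else (size - 1) * base_dil + 1
        padded_input = pad_low + dilated_input + pad_high
        dilated_window = (window - 1) * window_dil + 1
        is_empty_window = padded_input == 0 or dilated_window > padded_input
        shape.append(0 if is_empty_window
                     else (padded_input - dilated_window) // stride + 1)
    return shape


# Attribute values from the reduce_window example in this section.
window_shape = reduce_window_result_shape(
    [3, 2], [2, 1], [4, 1], [2, 1], [3, 1], [[2, 1], [0, 0]])
print(window_shape)
```

For dimension 0 of the example in this section, the dilated input has size 5, the padded input has size 8, and the dilated window has size 4, giving `(8 - 4) // 4 + 1 = 2` windows.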

範例

// %input = [[1, 2], [3, 4], [5, 6]]
// %init_value = 0
%result = "stablehlo.reduce_window"(%input, %init_value) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  window_dimensions = array<i64: 2, 1>,
  window_strides = array<i64: 4, 1>,
  base_dilations = array<i64: 2, 1>,
  window_dilations = array<i64: 3, 1>,
  padding = dense<[[2, 1], [0, 0]]> : tensor<2x2xi64>
} : (tensor<3x2xi64>, tensor<i64>) -> tensor<2x2xi64>
// %result = [[0, 0], [3, 4]]

更多範例

remainder

語意

對被除數 lhs 和除數 rhs 張量執行元素層級的餘數運算，並產生 result 張量。

更正式地來說，結果的正負號取自被除數，且結果的絕對值一律小於除數的絕對值。餘數計算方式為 lhs - d * rhs，其中 d 的計算方式如下：

整數：stablehlo.divide(lhs, rhs)。
For floats: division(lhs, rhs) from IEEE-754 with rounding attribute roundTowardZero.
複數：待定 (#997)。
如為量化型別：
- dequantize_op_quantize(remainder, lhs, rhs, type(result))。

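For floating-point element types, this operation contrasts with the remainder operation from the IEEE-754 specification, where d is the integral value nearest to the exact value of lhs/rhs, with ties to even.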
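
The difference between the two conventions can be seen in Python, where `math.fmod` truncates the quotient toward zero and `math.remainder` follows the IEEE-754 remainder operation. Using these two functions as stand-ins is an assumption of this sketch, as is the helper name `remainder_int`.

```python
import math


def remainder_int(lhs, rhs):
    """Integer remainder with a quotient truncated toward zero."""
    d = abs(lhs) // abs(rhs)
    if (lhs < 0) != (rhs < 0):
        d = -d
    return lhs - d * rhs


lhs = [17, -17, 17, -17]
rhs = [3, 3, -3, -3]
int_results = [remainder_int(a, b) for a, b in zip(lhs, rhs)]

# Quotient truncated toward zero: the sign follows the dividend.
truncated = math.fmod(17.0, 3.0)
# IEEE-754 remainder: the quotient 17/3 is rounded to the nearest integer, 6.
ieee = math.remainder(17.0, 3.0)
print(int_results, truncated, ieee)
```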

輸入

標籤	名稱	類型	限制
(I1)	`lhs`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)
(I2)	`rhs`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	整數、浮點數或複數型別的張量，或是每個張量的量化張量	(C1)

限制

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

範例

// %lhs: [17, -17, 17, -17]
// %rhs: [3, 3, -3, -3]
%result = "stablehlo.remainder"(%lhs, %rhs) : (tensor<4xi64>, tensor<4xi64>) -> tensor<4xi64>
// %result: [2, -2, 2, -2]

更多範例

replica_id

語意

產生目前程序的 replica_id。

輸出內容

名稱	類型
`result`	`ui32` 類型的 0 維張量

範例

%result = "stablehlo.replica_id"() : () -> tensor<ui32>

更多範例

reshape

語意

將 operand 張量重塑為 result 張量。從概念上來說，這相當於保留相同的標準表示法，但可能會變更形狀，例如從 tensor<2x3xf32> 變更為 tensor<3x2xf32> 或 tensor<6xf32>。

更正式地來說，result[result_index] = operand[operand_index] 其中 result_index 和 operand_index 在 index_space(result) 和 index_space(operand) 的字典順序中具有相同位置。
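
Because both index spaces are walked in lexicographic order, a reshape can be modeled as flattening followed by regrouping. The following Python sketch uses nested lists, and the helper names are assumptions made for illustration.

```python
def flatten(tensor):
    """Lists the elements of a nested list in lexicographic index order."""
    if not isinstance(tensor, list):
        return [tensor]
    return [value for sub in tensor for value in flatten(sub)]


def reshape(operand, result_shape):
    """Hypothetical model: refills result_shape in lexicographic order."""
    values = iter(flatten(operand))

    def build(shape):
        if not shape:
            return next(values)
        return [build(shape[1:]) for _ in range(shape[0])]

    return build(list(result_shape))


reshaped = reshape([[1, 2, 3], [4, 5, 6]], [3, 2])
flat = reshape([[1, 2, 3], [4, 5, 6]], [6])
print(reshaped, flat)
```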

輸入

標籤	名稱	類型	限制
(I1)	`operand`	張量或量化張量	(C1-C3)

輸出內容

名稱	類型	限制
`result`	張量或量化張量	(C1-C3)

限制

(C1) element_type(result) 由以下項目提供：
- element_type(operand) (如果 !is_per_axis_quantized(operand))。
- element_type(operand)，但 quantization_dimension(operand) 和 quantization_dimension(result) 可能不同。
(C2) size(operand) = size(result)。
(C3) If is_per_axis_quantized(operand)：
- reduce(dims(operand, [0, 1, ..., quantization_dimension(operand) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y) = reduce(dims(result, [0, 1, ..., quantization_dimension(result) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y)。
- dim(operand, quantization_dimension(operand)) = dim(result, quantization_dimension(result))。
- reduce(dims(operand, [quantization_dimension(operand) + 1, ..., rank(operand) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y) = reduce(dims(result, [quantization_dimension(result) + 1, ..., rank(result) - 1]), init_values=1, dimensions=[0], body=lambda x, y: x * y)。

範例

// %operand: [[1, 2, 3], [4, 5, 6]]
%result = "stablehlo.reshape"(%operand) : (tensor<2x3xi32>) -> tensor<3x2xi32>
// %result: [[1, 2], [3, 4], [5, 6]]

更多範例

reverse

語意

沿指定 dimensions 翻轉 operand 中的元素順序，並產生 result 張量。更正式地說，result[result_index] = operand[operand_index]，其中：

operand_index[d] = dim(result, d) - result_index[d] - 1 if d in dimensions.
其他情況則為 operand_index[d] = result_index[d]。
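
The index mapping above amounts to reversing the nested list at every depth listed in `dimensions`. A minimal Python sketch, with the helper name `reverse` and the nested-list model as assumptions:

```python
def reverse(operand, dimensions, axis=0):
    """Hypothetical model: flips nested lists along the given dimensions."""
    if not isinstance(operand, list):
        return operand
    # Reverse this level only if its axis number is listed in dimensions.
    items = operand[::-1] if axis in dimensions else operand
    return [reverse(item, dimensions, axis + 1) for item in items]


flipped_inner = reverse([[1, 2], [3, 4], [5, 6]], [1])
flipped_both = reverse([[1, 2], [3, 4], [5, 6]], [0, 1])
print(flipped_inner, flipped_both)
```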

輸入

標籤	名稱	類型	限制
(I1)	`operand`	張量或每個張量的量化張量	(C1)、(C3)
(I2)	`dimensions`	`si64` 類型的 1 維張量常數	(C2)、(C3)

輸出內容

名稱	類型	限制
`result`	張量或每個張量的量化張量	(C1)、(C3)

限制

(C1) type(operand) = type(result)。
(C2) is_unique(dimensions)。
(C3) 0 <= dimensions < rank(result)。

範例

// %operand = [[1, 2], [3, 4], [5, 6]]
%result = "stablehlo.reverse"(%operand) {
  dimensions = array<i64: 1>
} : (tensor<3x2xi32>) -> tensor<3x2xi32>
// %result: [[2, 1], [4, 3], [6, 5]]

更多範例

rng

注意： 根據 StableHLO v1.0 Cleanup #2283，這個作業似乎未獲架構和編譯器使用，因此正在評估是否要淘汰。因此，相容性保證有限 (6 個月)。

語意

使用 rng_distribution 演算法產生隨機數，並產生指定形狀 shape 的 result 張量。

如果是 rng_distribution = UNIFORM，則隨機數字會按照間隔 [a, b) 的均勻分布產生。如果 a >= b，行為未定義。

如果 rng_distribution = NORMAL，系統會按照常態分布產生隨機數字，平均值 = a，標準差 = b。如果是 b < 0，行為則未定義。

隨機數字的確切產生方式取決於實作。舉例來說，這些函式可能具有決定性，也可能不具決定性，而且可能使用隱藏狀態，也可能不使用。

In conversations with many stakeholders, this op has come up as effectively deprecated, so we plan to explore removing it in the future (#597).

輸入

標籤	名稱	類型	限制
(I1)	`a`	整數、布林值或浮點值類型的 0 維張量	(C1)、(C2)
(I2)	`b`	整數、布林值或浮點值類型的 0 維張量	(C1)、(C2)
(I3)	`shape`	`si64` 類型的 1 維張量常數	(C3)
(I4)	`rng_distribution`	`UNIFORM` 和 `NORMAL` 的列舉	(C2)

輸出內容

名稱	類型	限制
`result`	整數、布林值或浮點類型張量	(C1-C3)

限制

(C1) element_type(a) = element_type(b) = element_type(result)。
(C2) If rng_distribution = NORMAL, then is_float(a).
(C3) shape(result) = shape。

範例

// %a = 0
// %b = 2
// %shape = [3, 3]
%result = "stablehlo.rng"(%a, %b, %shape) {
  rng_distribution = #stablehlo<rng_distribution UNIFORM>
} : (tensor<i32>, tensor<i32>, tensor<2xi64>) -> tensor<3x3xi32>
// %result: [
//           [1, 0, 1],
//           [1, 1, 1],
//           [0, 0, 0]
//          ]

rng_bit_generator

語意

傳回以均勻隨機位元填滿的 output，以及使用虛擬亂數產生器演算法 rng_algorithm (以初始狀態 initial_state 為準) 更新的輸出狀態 output_state。輸出內容保證是 initial_state 的決定性函式，但不保證在實作之間是決定性的。

rng_algorithm 是下列其中一項：

DEFAULT：實作定義的演算法。
THREE_FRY：Threefry 演算法的實作定義變體。*
PHILOX：Philox 演算法的實作定義變體。*

* See: Salmon et al. SC 2011. Parallel random numbers: as easy as 1, 2, 3.

輸入

標籤	名稱	類型	限制
(I1)	`rng_algorithm`	`DEFAULT`、`THREE_FRY` 和 `PHILOX` 的列舉	(C2)
(I2)	`initial_state`	`ui64` 類型的 1 維張量	(C1)、(C2)

輸出內容

名稱	類型	限制
`output_state`	`ui64` 類型的 1 維張量	(C1)
`output`	整數或浮點類型張量

限制

(C1) type(initial_state) = type(output_state)。
(C2) size(initial_state) 的定義為：
- 如果 rng_algorithm = DEFAULT，則為實作定義。
- 2 (如果 rng_algorithm = THREE_FRY)。
- 2 或 3 (如果 rng_algorithm = PHILOX)。

範例

// %initial_state: [1, 2]
%output_state, %output = "stablehlo.rng_bit_generator"(%initial_state) {
  rng_algorithm = #stablehlo<rng_algorithm THREE_FRY>
} : (tensor<2xui64>) -> (tensor<2xui64>, tensor<2x2xui64>)
// %output_state: [1, 6]
// %output: [
//           [9236835810183407956, 16087790271692313299],
//           [18212823393184779219, 2658481902456610144]
//          ]

round_nearest_afz

語意

對 operand 張量執行元素層級的四捨五入，取最接近的整數，並產生 result 張量。實作 IEEE-754 規格中的 roundToIntegralTiesToAway 運算。如果是量化型別，則會執行 dequantize_op_quantize(round_nearest_afz, operand, type(result))。

輸入

標籤	名稱	類型	限制
(I1)	`operand`	浮點類型張量或每個張量的量化張量	(C1)

輸出內容

名稱	類型	限制
`result`	浮點類型張量或每個張量的量化張量	(C1)

限制

(C1) baseline_type(operand) = baseline_type(result)。

範例

// %operand = [-2.5, 0.4, 0.5, 0.6, 2.5]
%result = "stablehlo.round_nearest_afz"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// %result: [-3.0, 0.0, 1.0, 1.0, 3.0]

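More Examples

round_nearest_even

Semantics

Performs element-wise rounding towards the nearest integer, breaking ties towards the even integer, on the operand tensor and produces a result tensor. Implements the roundToIntegralTiesToEven operation from the IEEE-754 specification. For quantized types, performs dequantize_op_quantize(round_nearest_even, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples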

// %operand = [-2.5, 0.4, 0.5, 0.6, 2.5]
%result = "stablehlo.round_nearest_even"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// %result: [-2.0, 0.0, 0.0, 1.0, 2.0]
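
The two tie-breaking rules are easy to confuse, so here is a minimal Python sketch that contrasts them on the operand used in the examples. The function names mirror the op names but are illustrative helpers, not part of any StableHLO API, and the sketch assumes finite inputs only (it does not handle NaN or infinities).

```python
import math

def round_nearest_afz(x: float) -> float:
    """Round to the nearest integer, breaking ties away from zero."""
    magnitude = math.floor(abs(x))
    # The fractional part of a float is computed exactly, so this test is safe.
    if abs(x) - magnitude >= 0.5:
        magnitude += 1
    return math.copysign(float(magnitude), x)

def round_nearest_even(x: float) -> float:
    """Round to the nearest integer, breaking ties towards the even integer."""
    # Python's built-in round() already breaks ties to even for floats.
    return math.copysign(float(round(x)), x)

operand = [-2.5, 0.4, 0.5, 0.6, 2.5]
afz = [round_nearest_afz(x) for x in operand]
even = [round_nearest_even(x) for x in operand]
print(afz)
print(even)
```

The results differ only for the tie inputs -2.5, 0.5 and 2.5; every other input rounds identically under both rules.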

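More Examples

rsqrt

Semantics

Performs element-wise reciprocal square root operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: rSqrt from IEEE-754.
For complex numbers: complex reciprocal square root.
For quantized types: dequantize_op_quantize(rsqrt, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples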

// %operand: [[1.0, 4.0], [9.0, 25.0]]
%result = "stablehlo.rsqrt"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[1.0, 0.5], [0.33333343, 0.2]]

More Examples

scatter

Semantics

Produces results tensors which are equal to inputs tensors except that several slices specified by scatter_indices are updated with the values updates using update_computation.

The following diagram shows how elements in updates... map on elements in results... using a concrete example. The diagram picks a few example updates... indices and explains in detail which results... indices they correspond to.

More formally, for all update_index in index_space(updates[0]):

update_scatter_dims = [d for d in axes(updates[0]) and d not in update_window_dims].
update_scatter_index = update_index[update_scatter_dims...].
start_index is defined as:
- scatter_indices[si0, ..., :, ..., siN] where si are individual elements in update_scatter_index and : is inserted at the index_vector_dim index, if index_vector_dim < rank(scatter_indices).
- [scatter_indices[update_scatter_index]] otherwise.
For d_input in axes(inputs[0]),
- full_start_index[d_input] = start_index[d_start] if d_input = scatter_dims_to_operand_dims[d_start].
- full_start_index[d_input] = 0 otherwise.
For d_input in axes(inputs[0]),
- full_batching_index[d_input] = update_scatter_index[d_start - (d_start < index_vector_dim ? 0 : 1)] if d_input = input_batching_dims[i_batching] and d_start = scatter_indices_batching_dims[i_batching].
- full_batching_index[d_input] = 0 otherwise.
update_window_index = update_index[update_window_dims...].
full_window_index = [wi0, ..., 0, ..., wiN] where wi are individual elements in update_window_index, and 0 is inserted at indices from inserted_window_dims and input_batching_dims.
result_index = full_start_index + full_batching_index + full_window_index.

Given that, results = exec(schedule, inputs), where:

schedule is an implementation-defined permutation of index_space(updates[0]).
exec([update_index, ...], results) = exec([...], updated_results) where:
- If result_index is in bounds for shape(results...):
  - updates_converted = to_destination_type( updates...[update_index], type(func_inputs(update_computation) [len(func_inputs(update_computation))//2:])... )
  - updated_values = update_computation(results...[result_index], updates_converted)
  - updated_results is a copy of results with results...[result_index] set to updated_values....
- Otherwise:
  - updated_results = results.
exec([], results) = results.

If indices_are_sorted is true then the implementation can assume that scatter_indices are sorted with respect to scatter_dims_to_operand_dims, otherwise the behavior is undefined. More formally, for all i1 < i2 from indices(result), full_start_index(i1) <= full_start_index(i2).

If unique_indices is true then the implementation can assume that all result_index indices being scattered to are unique. If unique_indices is true but the indices being scattered to are not unique then the behavior is undefined.

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1), (C2), (C4-C6), (C11), (C13), (C18), (C21), (C23-C24)
(I2)	`scatter_indices`	tensor of integer type	(C4), (C15), (C19), (C22)
(I3)	`updates`	variadic number of tensors or per-tensor quantized tensors	(C3-C6), (C8)
(I4)	`update_window_dims`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C7-C8)
(I5)	`inserted_window_dims`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C9-C11)
(I6)	`input_batching_dims`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C9), (C12-C13), (C17-C18), (C20)
(I7)	`scatter_indices_batching_dims`	1-dimensional tensor constant of type `si64`	(C14-C18)
(I8)	`scatter_dims_to_operand_dims`	1-dimensional tensor constant of type `si64`	(C19-C21)
(I9)	`index_vector_dim`	constant of type `si64`	(C4), (C16), (C19), (C22)
(I10)	`indices_are_sorted`	constant of type `i1`
(I11)	`unique_indices`	constant of type `i1`
(I12)	`update_computation`	function	(C23)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C24-C25)

Constraints

(C1) same(shape(inputs...)).
(C2) rank(inputs[0]) = size(update_window_dims) + size(inserted_window_dims) + size(input_batching_dims).
(C3) same(shape(updates...)).
(C4) shape(updates[0]) = combine(update_scatter_dim_sizes, update_window_dim_sizes) where:
- update_scatter_dim_sizes = shape(scatter_indices) except that the dimension size of scatter_indices corresponding to index_vector_dim is not included.
- update_window_dim_sizes <= shape(inputs[0]) except that the dimension sizes in inputs[0] corresponding to inserted_window_dims and input_batching_dims are not included.
- combine puts update_scatter_dim_sizes at axes corresponding to update_scatter_dims and update_window_dim_sizes at axes corresponding to update_window_dims.
(C5) 0 < size(inputs) = size(updates) = N.
(C6) element_type(updates...) = element_type(inputs...).
(C7) is_unique(update_window_dims) and is_sorted(update_window_dims).
(C8) 0 <= update_window_dims < rank(updates[0]).
(C9) is_unique(concatenate(inserted_window_dims, input_batching_dims)).
(C10) is_sorted(inserted_window_dims).
(C11) 0 <= inserted_window_dims < rank(inputs[0]).
(C12) is_sorted(input_batching_dims).
(C13) 0 <= input_batching_dims < rank(inputs[0]).
(C14) is_unique(scatter_indices_batching_dims).
(C15) 0 <= scatter_indices_batching_dims < rank(scatter_indices).
(C16) index_vector_dim not in scatter_indices_batching_dims.
(C17) size(input_batching_dims) == size(scatter_indices_batching_dims).
(C18) dim(inputs[0], input_batching_dims...) = dim(scatter_indices, scatter_indices_batching_dims...).
(C19) size(scatter_dims_to_operand_dims) = index_vector_dim < rank(scatter_indices) ? dim(scatter_indices, index_vector_dim) : 1.
(C20) is_unique(concatenate(scatter_dims_to_operand_dims, input_batching_dims)).
(C21) 0 <= scatter_dims_to_operand_dims < rank(inputs[0]).
(C22) 0 <= index_vector_dim <= rank(scatter_indices).
(C23) update_computation has type (tensor<E0>, ..., tensor<EN-1>, tensor<E0>, ..., tensor<EN-1>) -> (tensor<E0>, ..., tensor<EN-1>), where is_promotable(element_type(inputs[i]), Ei).
(C24) shape(inputs...) = shape(results...).
(C25) element_type(results[i]) = Ei for all i in [0,N).

Examples

// %input: [
//          [
//           [[1, 2], [3, 4], [5, 6], [7, 8]],
//           [[9, 10],[11, 12], [13, 14], [15, 16]],
//           [[17, 18], [19, 20], [21, 22], [23, 24]]
//          ],
//          [
//           [[25, 26], [27, 28], [29, 30], [31, 32]],
//           [[33, 34], [35, 36], [37, 38], [39, 40]],
//           [[41, 42], [43, 44], [45, 46], [47, 48]]
//          ]
//         ]
// %scatter_indices: [
//                    [
//                     [[0, 0], [1, 0], [2, 1]],
//                     [[0, 1], [1, 1], [0, 9]]
//                    ],
//                    [
//                     [[0, 0], [2, 1], [2, 2]],
//                     [[1, 2], [0, 1], [1, 0]]
//                    ]
//                   ]
// %update: [
//           [
//            [[[1, 1], [1, 1]], [[1, 1], [1, 1]], [[1, 1], [1, 1]]],
//            [[[1, 1], [1, 1]], [[1, 1], [1, 1]], [[1, 1], [1, 1]]]
//           ],
//           [
//            [[[1, 1], [1, 1]], [[1, 1], [1, 1]], [[1, 1], [1, 1]]],
//            [[[1, 1], [1, 1]], [[1, 1], [1, 1]], [[1, 1], [1, 1]]]
//           ]
//          ]
%result = "stablehlo.scatter"(%input, %scatter_indices, %update) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  scatter_dimension_numbers = #stablehlo.scatter<
    update_window_dims = [3, 4],
    inserted_window_dims = [1],
    input_batching_dims = [0],
    scatter_indices_batching_dims = [1],
    scatter_dims_to_operand_dims = [2, 1],
    index_vector_dim = 3>,
  indices_are_sorted = false,
  unique_indices = false
} : (tensor<2x3x4x2xi64>, tensor<2x2x3x2xi64>, tensor<2x2x3x2x2xi64>) -> tensor<2x3x4x2xi64>
// %result: [
//           [
//            [[3, 4], [6, 7], [6, 7], [7, 8]],
//            [[9, 10],[11, 12], [15, 16], [17, 18]],
//            [[17, 18], [19, 20], [22, 23], [24, 25]]
//           ],
//           [
//            [[25, 26], [28, 29], [30, 31], [31, 32]],
//            [[35, 36], [38, 39], [38, 39], [39, 40]],
//            [[41, 42], [44, 45], [46, 47], [47, 48]]
//           ]
//          ]
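
The index arithmetic of scatter is dense, so the following Python sketch walks through it for the example above. It is an illustration under simplifying assumptions, not a reference implementation: tensors are nested Python lists, only a single input/updates pair is handled, no type promotion is performed, and updates are applied in row-major order (one valid choice of the implementation-defined schedule). The helper names `shape`, `get`, `put` and `scatter` are hypothetical.

```python
import copy
from itertools import product

def shape(t):
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims

def get(t, index):
    for i in index:
        t = t[i]
    return t

def put(t, index, value):
    for i in index[:-1]:
        t = t[i]
    t[index[-1]] = value

def scatter(inp, scatter_indices, updates, update_computation, *,
            update_window_dims, inserted_window_dims, input_batching_dims,
            scatter_indices_batching_dims, scatter_dims_to_operand_dims,
            index_vector_dim):
    result = copy.deepcopy(inp)
    in_shape, idx_shape, upd_shape = shape(inp), shape(scatter_indices), shape(updates)
    rank = len(in_shape)
    scatter_dims = [d for d in range(len(upd_shape)) if d not in update_window_dims]
    skipped = set(inserted_window_dims) | set(input_batching_dims)
    for update_index in product(*(range(n) for n in upd_shape)):
        usi = [update_index[d] for d in scatter_dims]  # update_scatter_index
        if index_vector_dim < len(idx_shape):
            start_index = [
                get(scatter_indices, usi[:index_vector_dim] + [k] + usi[index_vector_dim:])
                for k in range(idx_shape[index_vector_dim])]
        else:
            start_index = [get(scatter_indices, usi)]
        full_start = [0] * rank
        for d_start, d_input in enumerate(scatter_dims_to_operand_dims):
            full_start[d_input] = start_index[d_start]
        full_batch = [0] * rank
        for d_input, d_start in zip(input_batching_dims, scatter_indices_batching_dims):
            full_batch[d_input] = usi[d_start - (0 if d_start < index_vector_dim else 1)]
        window = iter([update_index[d] for d in update_window_dims])
        full_window = [0 if d in skipped else next(window) for d in range(rank)]
        result_index = [s + b + w for s, b, w in zip(full_start, full_batch, full_window)]
        # Out-of-bounds updates are skipped, as in the formal definition.
        if all(0 <= i < n for i, n in zip(result_index, in_shape)):
            put(result, result_index,
                update_computation(get(result, result_index), get(updates, update_index)))
    return result

# Data from the example above: input holds 1..48 in row-major order (2x3x4x2).
inp = [[[[24 * b + 8 * r + 2 * c + e + 1 for e in range(2)] for c in range(4)]
        for r in range(3)] for b in range(2)]
indices = [[[[0, 0], [1, 0], [2, 1]], [[0, 1], [1, 1], [0, 9]]],
           [[[0, 0], [2, 1], [2, 2]], [[1, 2], [0, 1], [1, 0]]]]
updates = [[[[[1, 1], [1, 1]] for _ in range(3)] for _ in range(2)] for _ in range(2)]
result = scatter(inp, indices, updates, lambda a, b: a + b,
                 update_window_dims=[3, 4], inserted_window_dims=[1],
                 input_batching_dims=[0], scatter_indices_batching_dims=[1],
                 scatter_dims_to_operand_dims=[2, 1], index_vector_dim=3)
```

Because the update computation here is addition, the order in which updates are applied does not affect the result; with a non-commutative computation, different schedules could give different outputs.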

More Examples

select

Semantics

Produces a result tensor where each element is selected from the on_true or on_false tensor based on the value of the corresponding element of pred. More formally, result[result_index] = pred_element ? on_true[result_index] : on_false[result_index], where pred_element = rank(pred) = 0 ? pred[] : pred[result_index]. For quantized types, performs dequantize_select_quantize(pred, on_true, on_false, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`pred`	tensor of type `i1`	(C1)
(I2)	`on_true`	tensor or per-tensor quantized tensor	(C1-C2)
(I3)	`on_false`	tensor or per-tensor quantized tensor	(C2)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C2)

Constraints

(C1) rank(pred) = 0 or shape(pred) = shape(on_true).
(C2) baseline_type(on_true) = baseline_type(on_false) = baseline_type(result).

Examples

// %pred: [[false, true], [true, false]]
// %on_true: [[1, 2], [3, 4]]
// %on_false: [[5, 6], [7, 8]]
%result = "stablehlo.select"(%pred, %on_true, %on_false) : (tensor<2x2xi1>, tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[5, 2], [3, 8]]

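More Examples

select_and_scatter

Semantics

Scatters the values from the source tensor using scatter based on the outcome of reduce_window of the input tensor using select and produces a result tensor.

The following diagram shows how elements in result are computed from operand and source using a concrete example.

More formally:

selected_values = reduce_window_without_init(...) with the following inputs:
- inputs = [operand].
- window_dimensions, window_strides, and padding which are used as is.
- base_dilations = windows_dilations = 1.
- body is defined as: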
```
def body(arg0: tensor<E>, arg1: tensor<E>) -> tensor<E>:
  return select(arg0, arg1) ? arg0 : arg1;
```
where E = element_type(operand), and reduce_window_without_init works exactly like reduce_window, except that the schedule of the underlying reduce (see reduce) doesn't include init values. It is currently unspecified what happens if the corresponding window doesn't have values (#731).
result[result_index] = reduce([source_values], [init_value], [0], scatter) where:
- source_values = [source[source_index] for source_index in source_indices].
- selected_index(source_index) = operand_index if selected_values[source_index] has the operand element from operand_index.
- source_indices = [source_index for source_index in indices(source) if selected_index(source_index) = result_index].

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1-C4), (C6), (C8-C11)
(I2)	`source`	tensor or per-tensor quantized tensor	(C1), (C2)
(I3)	`init_value`	0-dimensional tensor or per-tensor quantized tensor	(C3)
(I4)	`window_dimensions`	1-dimensional tensor constant of type `si64`	(C2), (C4), (C5)
(I5)	`window_strides`	1-dimensional tensor constant of type `si64`	(C2), (C6), (C7)
(I6)	`padding`	2-dimensional tensor constant of type `si64`	(C2), (C8)
(I7)	`select`	function	(C9)
(I8)	`scatter`	function	(C10)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C11-C12)

Constraints

(C1) element_type(operand) = element_type(source).
(C2) shape(source) = num_windows where:
- padded_operand_shape = padding[:, 0] + shape(operand) + padding[:, 1].
- is_empty_window = padded_operand_shape = 0 || window_dimensions > padded_operand_shape.
- num_windows = is_empty_window ? 0 : floor((padded_operand_shape - window_dimensions) / window_strides) + 1.
(C3) element_type(init_value) = element_type(operand).
(C4) size(window_dimensions) = rank(operand).
(C5) 0 < window_dimensions.
(C6) size(window_strides) = rank(operand).
(C7) 0 < window_strides.
(C8) shape(padding) = [rank(operand), 2].
(C9) select has type (tensor<E>, tensor<E>) -> tensor<i1> where E = element_type(operand).
(C10) scatter has type (tensor<E>, tensor<E>) -> tensor<E> where is_promotable(element_type(operand), E).
(C11) shape(operand) = shape(result).
(C12) element_type(result) = E.

Examples

// %operand: [[1, 5], [2, 5], [3, 6], [4, 4]]
// %source: [[5, 6], [7, 8]]
// %init_value: 0
%result = "stablehlo.select_and_scatter"(%operand, %source, %init_value) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.compare"(%arg0, %arg1) {
      comparison_direction = #stablehlo<comparison_direction GE>
    } : (tensor<i64>, tensor<i64>) -> tensor<i1>
    "stablehlo.return"(%0) : (tensor<i1>) -> ()
}, {
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %0 = "stablehlo.add"(%arg0, %arg1) : (tensor<i64>, tensor<i64>) -> tensor<i64>
    "stablehlo.return"(%0) : (tensor<i64>) -> ()
}) {
  window_dimensions = array<i64: 3, 1>,
  window_strides = array<i64: 2, 1>,
  padding = dense<[[0, 1], [0, 0]]> : tensor<2x2xi64>
} : (tensor<4x2xi64>, tensor<2x2xi64>, tensor<i64>) -> tensor<4x2xi64>
// %result: [[0, 0], [0, 0], [5, 14], [7, 0]]
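
To make the window selection concrete, here is a small Python sketch of select_and_scatter restricted to 2-dimensional nested lists. It is a sketch under stated assumptions: padded positions are simply skipped when selecting (the specification leaves empty windows unspecified, see #731), source values are accumulated in row-major order starting from init_value, and the function name and signature are illustrative only.

```python
def select_and_scatter_2d(operand, source, init_value, window_dimensions,
                          window_strides, padding, select, scatter):
    rows, cols = len(operand), len(operand[0])
    result = [[init_value] * cols for _ in range(rows)]
    for si in range(len(source)):
        for sj in range(len(source[0])):
            # Top-left corner of this window in unpadded operand coordinates.
            r0 = si * window_strides[0] - padding[0][0]
            c0 = sj * window_strides[1] - padding[1][0]
            selected = None
            for r in range(r0, r0 + window_dimensions[0]):
                for c in range(c0, c0 + window_dimensions[1]):
                    if not (0 <= r < rows and 0 <= c < cols):
                        continue  # assumption: padding never wins the selection
                    # body(arg0, arg1) = select(arg0, arg1) ? arg0 : arg1
                    if selected is None or not select(
                            operand[selected[0]][selected[1]], operand[r][c]):
                        selected = (r, c)
            if selected is not None:
                r, c = selected
                result[r][c] = scatter(result[r][c], source[si][sj])
    return result

operand = [[1, 5], [2, 5], [3, 6], [4, 4]]
source = [[5, 6], [7, 8]]
result = select_and_scatter_2d(
    operand, source, 0,
    window_dimensions=[3, 1], window_strides=[2, 1], padding=[[0, 1], [0, 0]],
    select=lambda a, b: a >= b, scatter=lambda a, b: a + b)
print(result)
```

In the second column both windows select the element 6 in row 2, so the two source values 6 and 8 are combined by scatter into 14.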

More Examples

send

Semantics

Sends inputs to a channel channel_id. Inputs are then sent to other devices in the order specified by source_target_pairs. The operation produces a result token.

If is_host_transfer is true, then the operation transfers data to the host. Otherwise, it transfers data to another device based on the values of source_target_pairs. This flag duplicates the information provided in channel_type, so in the future we are planning to only keep one of them (#666). If is_host_transfer = false and source_target_pairs is None or empty, it is considered undefined behavior.

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or quantized tensors
(I2)	`token`	`token`
(I3)	`channel_id`	constant of type `si64`
(I4)	`channel_type`	enum of `DEVICE_TO_DEVICE` and `DEVICE_TO_HOST`	(C5)
(I5)	`is_host_transfer`	constant of type `i1`	(C5-C6)
(I6)	`source_target_pairs`	2-dimensional tensor constant of type `si64`	(C1-C4), (C6)

Outputs

Name	Type
`result`	`token`

Constraints

(C1) dim(source_target_pairs, 1) = 2.
(C2) is_unique(source_target_pairs[:, 0]).
(C3) is_unique(source_target_pairs[:, 1]).
(C4) 0 <= source_target_pairs < N, where N is defined as:
- num_replicas if cross_replica is used.
- num_partitions if cross_partition is used.
(C5) channel_type is defined as:
- DEVICE_TO_HOST if is_host_transfer = true.
- DEVICE_TO_DEVICE otherwise.

Examples

%result = "stablehlo.send"(%operand, %token) {
  channel_handle = #stablehlo.channel_handle<handle = 0, type = 1>,
  is_host_transfer = false,
  source_target_pairs = dense<[[0, 1], [1, 2]]> : tensor<2x2xi64>
} : (tensor<2x2xi64>, !stablehlo.token) -> !stablehlo.token

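More Examples

shift_left

Semantics

Performs element-wise left-shift operation on the lhs tensor by rhs number of bits and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer type	(C1)
(I2)	`rhs`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples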

// %lhs: [-1, 0, 1]
// %rhs: [1, 2, 3]
%result = "stablehlo.shift_left"(%lhs, %rhs): (tensor<3xi64>, tensor<3xi64>) -> tensor<3xi64>
// %result: [-2, 0, 8]

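More Examples

shift_right_arithmetic

Semantics

Performs element-wise arithmetic right-shift operation on the lhs tensor by rhs number of bits and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer type	(C1)
(I2)	`rhs`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples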

// %lhs: [-1, 0, 8]
// %rhs: [1, 2, 3]
%result = "stablehlo.shift_right_arithmetic"(%lhs, %rhs): (tensor<3xi64>, tensor<3xi64>) -> tensor<3xi64>
// %result: [-1, 0, 1]

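More Examples

shift_right_logical

Semantics

Performs element-wise logical right-shift operation on the lhs tensor by rhs number of bits and produces a result tensor.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer type	(C1)
(I2)	`rhs`	tensor of integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples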

// %lhs: [-1, 0, 8]
// %rhs: [1, 2, 3]
%result = "stablehlo.shift_right_logical"(%lhs, %rhs): (tensor<3xi64>, tensor<3xi64>) -> tensor<3xi64>
// %result: [9223372036854775807, 0, 1]
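
The difference between the three shift ops shows up only for negative operands, which the following Python sketch illustrates for 64-bit signed elements. It is a sketch, not a definition: Python integers are unbounded, so the helper `to_signed` (a hypothetical name) wraps values back into two's-complement si64, and shift amounts are assumed to satisfy 0 <= rhs < 64.

```python
BITS = 64
MASK = (1 << BITS) - 1

def to_signed(x: int) -> int:
    """Reinterpret the low 64 bits of x as a two's-complement si64 value."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x

def shift_left(lhs, rhs):
    return [to_signed(l << r) for l, r in zip(lhs, rhs)]

def shift_right_arithmetic(lhs, rhs):
    # Python's >> on negative ints replicates the sign bit.
    return [l >> r for l, r in zip(lhs, rhs)]

def shift_right_logical(lhs, rhs):
    # Shifting the unsigned bit pattern fills the vacated bits with zeros.
    return [to_signed((l & MASK) >> r) for l, r in zip(lhs, rhs)]

left = shift_left([-1, 0, 1], [1, 2, 3])
arithmetic = shift_right_arithmetic([-1, 0, 8], [1, 2, 3])
logical = shift_right_logical([-1, 0, 8], [1, 2, 3])
print(left, arithmetic, logical)
```

For -1 (all bits set), the arithmetic shift keeps the sign bit and stays at -1, while the logical shift clears it and yields the largest positive si64 value.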

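More Examples

sign

Semantics

Returns the sign of the operand element-wise and produces a result tensor. More formally, for each element x, the semantics can be expressed using Python syntax as follows: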

def sign(x):
  if is_integer(x):
    if compare(x, 0, LT, SIGNED): return -1
    if compare(x, 0, EQ, SIGNED): return 0
    return 1
  elif is_float(x):
    if is_nan(x): return NaN
    if compare(x, -0.0, EQ, FLOAT): return -0.0
    if compare(x, +0.0, EQ, FLOAT): return +0.0
    if compare(x, 0.0, LT, FLOAT): return -1.0
    return 1.0
  elif is_complex(x):
    if is_nan(real(x)) or is_nan(imag(x)): return (NaN, NaN)
    if compare(x, (0.0, 0.0), EQ, FLOAT): return (0.0, 0.0)
    return divide(x, convert(abs(x), type(x)))

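For quantized types, performs dequantize_op_quantize(sign, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of signed integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of signed integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples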

// Logical values: +NaN, -1.0, -0.0, +0.0, 1.0
// operand: [0x7FFFFFFFFFFFFFFF, -1.0, -0.0, 0.0, 1.0]
%result = "stablehlo.sign"(%operand) : (tensor<5xf64>) -> tensor<5xf64>
// Logical values: +NaN, -1.0, -0.0, +0.0, 1.0
// %result: [0x7FFFFFFFFFFFFFFF, -1.0, -0.0, 0.0, 1.0]

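More Examples

sine

Semantics

Performs element-wise sine operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: sin from IEEE-754.
For complex numbers: complex sine.
For quantized types: dequantize_op_quantize(sine, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples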

// %operand: [
//            [0.0, 1.57079632],       // [0, pi/2]
//            [3.14159265, 4.71238898] // [pi, 3pi/2]
//           ]
%result = "stablehlo.sine"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[0.0, 1.0], [0.0, -1.0]]

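More Examples

slice

Semantics

Extracts a slice from the operand using statically-computed starting indices and produces a result tensor. start_indices contain the starting indices of the slice for each dimension, limit_indices contain the ending indices (exclusive) for the slice for each dimension, and strides contain the strides for each dimension.

More formally, result[result_index] = operand[operand_index] where operand_index = start_indices + result_index * strides.

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or per-tensor quantized tensor	(C1-C3), (C5)
(I2)	`start_indices`	1-dimensional tensor constant of type `si64`	(C2), (C3), (C5)
(I3)	`limit_indices`	1-dimensional tensor constant of type `si64`	(C2), (C3), (C5)
(I4)	`strides`	1-dimensional tensor constant of type `si64`	(C2), (C4)

Outputs

Name	Type	Constraints
`result`	tensor or per-tensor quantized tensor	(C1), (C5)

Constraints

(C1) element_type(operand) = element_type(result).
(C2) size(start_indices) = size(limit_indices) = size(strides) = rank(operand).
(C3) 0 <= start_indices <= limit_indices <= shape(operand).
(C4) 0 < strides.
(C5) shape(result) = ceil((limit_indices - start_indices) / strides).

Examples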

// %operand: [
//            [0, 0, 0, 0],
//            [0, 0, 1, 1],
//            [0, 0, 1, 1]
//           ]
%result = "stablehlo.slice"(%operand) {
  start_indices = array<i64: 1, 2>,
  limit_indices = array<i64: 3, 4>,
  strides = array<i64: 1, 1>
} : (tensor<3x4xi64>) -> tensor<2x2xi64>
// %result: [
//            [1, 1],
//            [1, 1]
//           ]
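
The mapping operand_index = start_indices + result_index * strides is the same stepping that Python's range performs, which suggests the following minimal sketch over nested lists. The function name `slice_op` is illustrative, and the inputs are assumed to already satisfy constraints (C2) to (C4).

```python
def slice_op(operand, start_indices, limit_indices, strides):
    """Recursively slice a nested-list tensor one dimension at a time."""
    if not start_indices:
        return operand  # all dimensions consumed: this is a single element
    # range(start, limit, stride) visits start + k * stride for
    # k in [0, ceil((limit - start) / stride)), matching (C5).
    return [slice_op(operand[i], start_indices[1:], limit_indices[1:], strides[1:])
            for i in range(start_indices[0], limit_indices[0], strides[0])]

operand = [[0, 0, 0, 0],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
result = slice_op(operand, [1, 2], [3, 4], [1, 1])
strided = slice_op([0, 1, 2, 3, 4, 5, 6], [1], [6], [2])
print(result, strided)
```

The strided call has ceil((6 - 1) / 2) = 3 elements, taken from operand positions 1, 3 and 5.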

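More Examples

sort

Semantics

Sorts 1-dimensional slices of inputs along the dimension dimension together, according to a comparator, and produces results.

Unlike similar inputs in other operations, dimension allows negative values, with the semantics described below. In the future, this may be disallowed for consistency reasons (#1377).

If is_stable is true, then the sorting is stable, that is, the relative order of elements considered to be equal by the comparator is preserved. For the case where there is a single input, two elements e1 and e2 are considered to be equal by the comparator if and only if comparator(e1, e2) = comparator(e2, e1) = false. See the formalization below for how this generalizes to multiple inputs.

More formally, for all result_index in index_space(results[0]):

adjusted_dimension = dimension >= 0 ? dimension : rank(inputs[0]) + dimension.
result_slice = [ri0, ..., :, ..., riR-1] where riN are individual elements in result_index, and : is inserted at adjusted_dimension.
inputs_together = (inputs[0]..., ..., inputs[N-1]...).
results_together[result_slice] = sort(inputs_together[result_slice], comparator_together).
where sort sorts a 1-dimensional slice in non-descending order, expecting that comparator_together returns true if the left-hand side argument is less than the right-hand side argument.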

def comparator_together(lhs_together, rhs_together):
  args = []
  for (lhs_el, rhs_el) in zip(lhs_together, rhs_together):
    args.append(lhs_el)
    args.append(rhs_el)
  return comparator(*args)

(results[0]..., ..., results[N-1]...) = results_together.

Inputs

Label	Name	Type	Constraints
(I1)	`inputs`	variadic number of tensors or per-tensor quantized tensors	(C1-C5)
(I2)	`dimension`	constant of type `si64`	(C4)
(I3)	`is_stable`	constant of type `i1`
(I4)	`comparator`	function	(C5)

Outputs

Name	Type	Constraints
`results`	variadic number of tensors or per-tensor quantized tensors	(C2), (C3)

Constraints

(C1) 0 < size(inputs).
(C2) type(inputs...) = type(results...).
(C3) same(shape(inputs...) + shape(results...)).
(C4) -R <= dimension < R, where R = rank(inputs[0]).
(C5) comparator has type (tensor<E1>, tensor<E1>, ..., tensor<EN-1>, tensor<EN-1>) -> tensor<i1>, where Ei = element_type(inputs[i]).

Examples

// %input0 = [[1, 2, 3], [3, 2, 1]]
// %input1 = [[3, 2, 1], [1, 2, 3]]
%result0, %result1 = "stablehlo.sort"(%input0, %input1) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>, %arg2: tensor<i64>, %arg3: tensor<i64>):
    %predicate = "stablehlo.compare"(%arg0, %arg1) {
      comparison_direction = #stablehlo<comparison_direction GT>
    } : (tensor<i64>, tensor<i64>) -> tensor<i1>
    "stablehlo.return"(%predicate) : (tensor<i1>) -> ()
}) {
  dimension = 0 : i64,
  is_stable = true
} : (tensor<2x3xi64>, tensor<2x3xi64>) -> (tensor<2x3xi64>, tensor<2x3xi64>)
// %result0 = [[3, 2, 3], [1, 2, 1]]
// %result1 = [[1, 2, 1], [3, 2, 3]]
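
Sorting several inputs "together" means that elements at the same position travel as one tuple. The Python sketch below shows this for 2-dimensional nested lists; it is a simplified illustration, with the assumed names `sort_2d` and `compare_together`, that always performs a stable sort (Python's list.sort is stable), which is permitted whether or not is_stable is set.

```python
from functools import cmp_to_key

def sort_2d(inputs, dimension, comparator):
    rank = 2
    adjusted = dimension if dimension >= 0 else rank + dimension
    rows, cols = len(inputs[0]), len(inputs[0][0])
    results = [[row[:] for row in tensor] for tensor in inputs]

    def compare_together(lhs, rhs):
        # Interleave as (lhs0, rhs0, lhs1, rhs1, ...), like comparator_together.
        less = comparator(*[x for pair in zip(lhs, rhs) for x in pair])
        greater = comparator(*[x for pair in zip(rhs, lhs) for x in pair])
        return -1 if less else (1 if greater else 0)

    num_slices, length = (cols, rows) if adjusted == 0 else (rows, cols)
    for s in range(num_slices):
        positions = [(k, s) if adjusted == 0 else (s, k) for k in range(length)]
        # One tuple per position, holding the element of every input tensor.
        together = [tuple(t[i][j] for t in inputs) for i, j in positions]
        together.sort(key=cmp_to_key(compare_together))
        for (i, j), values in zip(positions, together):
            for n, value in enumerate(values):
                results[n][i][j] = value
    return results

input0 = [[1, 2, 3], [3, 2, 1]]
input1 = [[3, 2, 1], [1, 2, 3]]
# The comparator looks only at the first input, using "greater than".
result0, result1 = sort_2d([input0, input1], 0, lambda a0, a1, b0, b1: a0 > a1)
print(result0, result1)
```

Because the comparator is "greater than" on the first input, each column of input0 ends up in descending order, and input1 is permuted the same way.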

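More Examples

sqrt

Semantics

Performs element-wise square root operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: squareRoot from IEEE-754.
For complex numbers: complex square root.
For quantized types: dequantize_op_quantize(sqrt, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples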

// %operand: [[0.0, 1.0], [4.0, 9.0]]
%result = "stablehlo.sqrt"(%operand) : (tensor<2x2xf32>) -> tensor<2x2xf32>
// %result: [[0.0, 1.0], [2.0, 3.0]]

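More Examples

subtract

Semantics

Performs element-wise subtraction of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For integers: integer subtraction.
For floats: subtraction from IEEE-754.
For complex numbers: complex subtraction.
For quantized types:
- dequantize_op_quantize(subtract, lhs, rhs, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)
(I2)	`rhs`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of integer, floating-point, or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(lhs) = baseline_type(rhs) = baseline_type(result).

Examples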

// %lhs: [[6, 8], [10, 12]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.subtract"(%lhs, %rhs) : (tensor<2x2xf32>, tensor<2x2xf32>) -> (tensor<2x2xf32>)
// %result: [[1, 2], [3, 4]]

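More Examples

tan

Semantics

Performs element-wise tangent operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: tan from IEEE-754.
For complex numbers: complex tangent.
For quantized types: dequantize_op_quantize(tan, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples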

// %operand: [
//            [0.0, 1.57079632],       // [0, pi/2]
//            [3.14159265, 4.71238898] // [pi, 3pi/2]
//           ]
%result = "stablehlo.tan"(%operand) : (tensor<2x2xf64>) -> tensor<2x2xf64>
// %result: [
//           [0.0, 1.63312e+16],
//           [0.0, 5.44375e+15]
//          ]

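More Examples

tanh

Semantics

Performs element-wise hyperbolic tangent operation on the operand tensor and produces a result tensor. Depending on the element type, does the following:

For floats: tanh from IEEE-754.
For complex numbers: complex hyperbolic tangent.
For quantized types:
- dequantize_op_quantize(tanh, operand, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_type(operand) = baseline_type(result).

Examples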

// %operand: [-1.0, 0.0, 1.0]
%result = "stablehlo.tanh"(%operand) : (tensor<3xf32>) -> tensor<3xf32>
// %result: [-0.76159416, 0.0, 0.76159416]

More Examples

transpose

Semantics

Permutes the dimensions of the operand tensor using permutation and produces a result tensor. More formally, result[result_index] = operand[operand_index] where result_index[d] = operand_index[permutation[d]].

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor or quantized tensor	(C1-C4)
(I2)	`permutation`	1-dimensional tensor constant of type `si64`	(C2-C4)

Outputs

Name	Type	Constraints
`result`	tensor or quantized tensor	(C1), (C3-C4)

Constraints

(C1) element_type(result) is given by:
- element_type(operand), if !is_per_axis_quantized(operand).
- element_type(operand) except that quantization_dimension(operand) and quantization_dimension(result) may differ, otherwise.
(C2) permutation is a permutation of range(rank(operand)).
(C3) shape(result) = dim(operand, permutation...).
(C4) If is_per_axis_quantized(result), then quantization_dimension(operand) = permutation(quantization_dimension(result)).

Examples

// %operand: [
//            [[1,2], [3,4], [5,6]],
//            [[7,8], [9,10], [11,12]]
//           ]
%result = "stablehlo.transpose"(%operand) {
  permutation = array<i64: 2, 1, 0>
} : (tensor<2x3x2xi32>) -> tensor<2x3x2xi32>
// %result: [
//           [[1,7], [3,9], [5,11]],
//           [[2,8], [4,10], [6,12]]
//          ]
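
The index relation result_index[d] = operand_index[permutation[d]] can be read directly as code. The following Python sketch applies it to nested lists; the name `transpose` mirrors the op, but the function is only an illustration and ignores quantization.

```python
def transpose(operand, permutation):
    # Recover the operand shape from the nested lists.
    operand_shape, t = [], operand
    while isinstance(t, list):
        operand_shape.append(len(t))
        t = t[0]
    result_shape = [operand_shape[p] for p in permutation]  # constraint (C3)

    def build(result_index):
        d = len(result_index)
        if d == len(result_shape):
            # result_index[axis] = operand_index[permutation[axis]]
            operand_index = [0] * d
            for axis, p in enumerate(permutation):
                operand_index[p] = result_index[axis]
            value = operand
            for i in operand_index:
                value = value[i]
            return value
        return [build(result_index + [i]) for i in range(result_shape[d])]

    return build([])

operand = [[[1, 2], [3, 4], [5, 6]],
           [[7, 8], [9, 10], [11, 12]]]
result = transpose(operand, [2, 1, 0])
print(result)
```

With permutation [2, 1, 0] the first and last axes are swapped, so result[i][j][k] equals operand[k][j][i].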

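More Examples

triangular_solve

Semantics

Solves batches of systems of linear equations with lower or upper triangular coefficient matrices.

More formally, given a and b, result[i0, ..., iR-3, :, :] is the solution to op(a[i0, ..., iR-3, :, :]) * x = b[i0, ..., iR-3, :, :] when left_side is true, or to x * op(a[i0, ..., iR-3, :, :]) = b[i0, ..., iR-3, :, :] when left_side is false, solving for the variable x, where op(a) is determined by transpose_a, which can be one of the following:

NO_TRANSPOSE: Perform the operation using a as-is.
TRANSPOSE: Perform the operation on the transpose of a.
ADJOINT: Perform the operation on the conjugate transpose of a.

Input data is read only from the lower triangle of a if lower is true, or from the upper triangle of a otherwise. Output data is returned in the same triangle; the values in the other triangle are implementation-defined.

If unit_diagonal is true, then the implementation can assume that the diagonal elements of a are equal to 1, otherwise the behavior is undefined.

For quantized types, performs dequantize_op_quantize(lambda x, y: triangular_solve(x, y, left_side, lower, unit_diagonal, transpose_a), a, b, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`a`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1-C3)
(I2)	`b`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1-C4)
(I3)	`left_side`	constant of type `i1`	(C3)
(I4)	`lower`	constant of type `i1`
(I5)	`unit_diagonal`	constant of type `i1`
(I6)	`transpose_a`	enum of `NO_TRANSPOSE`, `TRANSPOSE`, and `ADJOINT`

Outputs

Name	Type	Constraints
`result`	tensor of floating-point or complex type or per-tensor quantized tensor	(C1)

Constraints

(C1) baseline_element_type(a) = baseline_element_type(b).
(C2) 2 <= rank(a) = rank(b) = R.
(C3) The relationship between shape(a) and shape(b) is defined as follows:
- shape(a)[:-3] = shape(b)[:-3].
- dim(a, -2) = dim(a, -1) = dim(b, left_side ? -2 : -1).
(C4) baseline_type(b) = baseline_type(result).

Examples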

// %a = [
//       [1.0, 0.0, 0.0],
//       [2.0, 4.0, 0.0],
//       [3.0, 5.0, 6.0]
//      ]
// %b = [
//       [2.0, 0.0, 0.0],
//       [4.0, 8.0, 0.0],
//       [6.0, 10.0, 12.0]
//      ]
%result = "stablehlo.triangular_solve"(%a, %b) {
  left_side = true,
  lower = true,
  unit_diagonal = false,
  transpose_a = #stablehlo<transpose NO_TRANSPOSE>
} : (tensor<3x3xf32>, tensor<3x3xf32>) -> tensor<3x3xf32>
// %result: [
//           [2.0, 0.0, 0.0],
//           [0.0, 2.0, 0.0],
//           [0.0, 0.0, 2.0]
//          ]
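
For the configuration used in the example (left_side = true, lower = true, transpose_a = NO_TRANSPOSE, no batch dimensions), the solve reduces to forward substitution. The Python sketch below covers only that case; the function name `solve_lower_left` is hypothetical, and the other flag combinations are not handled.

```python
def solve_lower_left(a, b, unit_diagonal=False):
    """Solve a * x = b for x, reading only the lower triangle of a."""
    n, m = len(a), len(b[0])
    x = [[0.0] * m for _ in range(n)]
    for col in range(m):
        for row in range(n):
            # Subtract the contributions of the unknowns solved so far.
            acc = b[row][col] - sum(a[row][k] * x[k][col] for k in range(row))
            # With unit_diagonal the diagonal is assumed to be 1 and never read.
            x[row][col] = acc if unit_diagonal else acc / a[row][row]
    return x

a = [[1.0, 0.0, 0.0],
     [2.0, 4.0, 0.0],
     [3.0, 5.0, 6.0]]
b = [[2.0, 0.0, 0.0],
     [4.0, 8.0, 0.0],
     [6.0, 10.0, 12.0]]
result = solve_lower_left(a, b)
print(result)
```

Here b equals 2 * a, so the solution is twice the identity matrix, which matches the example output.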

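tuple

Note: Per StableHLO v1.0 Cleanup #2283, this op is being explored for deprecation because it appears to be unused by both frameworks and compilers. As such, it has limited compatibility guarantees (6 months).

Semantics

Produces a result tuple from values val.

Inputs

Label	Name	Type	Constraints
(I1)	`val`	variadic number of values	(C1)

Outputs

Name	Type	Constraints
`result`	tuple	(C1)

Constraints

(C1) result has type tuple<E0, ..., EN-1> where Ei = type(val[i]).

Examples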

// %val0: memref[1.0, 2.0]
// %val1: (3)
%result = "stablehlo.tuple"(%val0, %val1) : (memref<2xf32>, tuple<tensor<i32>>) -> tuple<memref<2xf32>, tuple<tensor<i32>>>
// %result: (memref[1.0, 2.0], (3))

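More Examples

uniform_dequantize

Semantics

Performs element-wise conversion of the quantized tensor operand to a floating-point tensor result according to the quantization parameters defined by the operand type.

More formally, result = dequantize(operand).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	quantized tensor	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	tensor of floating-point type	(C1), (C2)

Constraints

(C1) shape(operand) = shape(result).
(C2) element_type(result) = expressed_type(operand).

Examples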

// %operand: [10, 10]
%result = "stablehlo.uniform_dequantize"(%operand) : (tensor<2x!quant.uniform<i8:f32:0, {0.1:-30,0.5:-20}>>) -> tensor<2xf32>
// %result: [4.0, 15.0]

uniform_quantize

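Semantics

Performs element-wise conversion of the floating-point tensor or quantized tensor operand to a quantized tensor result according to the quantization parameters defined by the result type.

More formally,

If is_float(operand):
- result = quantize(operand, type(result)).
If is_quantized(operand):
- float_result = dequantize(operand).
- result = quantize(float_result, type(result)).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	tensor of floating-point or quantized type	(C1), (C2)

Outputs

Name	Type	Constraints
`result`	quantized tensor	(C1), (C2)

Constraints

(C1) shape(operand) = shape(result).
(C2) expressed_type(result) = is_float(operand) ? element_type(operand) : expressed_type(operand).

Examples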

// %operand: [4.0, 15.0]
%result = "stablehlo.uniform_quantize"(%operand) : (tensor<2xf32>) -> tensor<2x!quant.uniform<i8:f32:0, {0.1:-30,0.5:-20}>>
// %result: [10, 10]

// %operand: [10, 10]
%result = "stablehlo.uniform_quantize"(%operand) : (tensor<2x!quant.uniform<i8:f32:0, {0.1:-30,0.5:-20}>>) -> tensor<2x!quant.uniform<i8:f32:0, {0.1:-20,0.2:-30}>>
// %result: [20, 45]
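
The numbers in the quantization examples can be checked with a few lines of arithmetic. The sketch below assumes the usual affine scheme, real = (stored - zero_point) * scale, with one (scale, zero_point) pair per element because the example types quantize along dimension 0; the `quantize` and `dequantize` helpers here are simplified stand-ins for the functions of the same name that the specification defines elsewhere, and the i8 storage bounds are hard-coded.

```python
def dequantize(values, scales, zero_points):
    return [(q - zp) * s for q, s, zp in zip(values, scales, zero_points)]

def quantize(values, scales, zero_points, storage_min=-128, storage_max=127):
    result = []
    for x, s, zp in zip(values, scales, zero_points):
        scaled = x / s + zp
        clamped = max(storage_min, min(storage_max, scaled))
        result.append(round(clamped))  # round() breaks ties to even
    return result

# tensor<2x!quant.uniform<i8:f32:0, {0.1:-30,0.5:-20}>>
scales, zero_points = [0.1, 0.5], [-30, -20]
floats = dequantize([10, 10], scales, zero_points)
stored = quantize([4.0, 15.0], scales, zero_points)
# Requantization to {0.1:-20,0.2:-30}: dequantize first, then quantize again.
requantized = quantize(floats, [0.1, 0.2], [-20, -30])
print(floats, stored, requantized)
```

The requantization step mirrors the is_quantized(operand) branch above: the stored values are first converted back to floats and only then quantized with the result type's parameters.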

while

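Semantics

Produces the output from executing the body function 0 or more times while the cond function outputs true. More formally, the semantics can be expressed using Python syntax as follows: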

internal_state = operand
while cond(*internal_state):
  internal_state = body(*internal_state)
results = internal_state

The behavior of an infinite loop is TBD (#383).

Inputs

Label	Name	Type	Constraints
(I1)	`operand`	variadic number of values	(C1-C3)
(I2)	`cond`	function	(C1)
(I3)	`body`	function	(C2)

Outputs

Name	Type	Constraints
`results`	variadic number of values	(C3)

Constraints

(C1) cond has type (T0, ..., TN-1) -> tensor<i1>, where Ti = type(operand[i]).
(C2) body has type (T0, ..., TN-1) -> (T0, ..., TN-1), where Ti = type(operand[i]).
(C3) type(results...) = type(operand...).

Examples

// %init_i: 1
// %init_sum: 0
// %one: 1
// %ten: 10
%results0, %results1 = "stablehlo.while"(%init_i, %init_sum) ({
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %cond = "stablehlo.compare"(%arg0, %ten) {
      comparison_direction = #stablehlo<comparison_direction LT>
    } : (tensor<i64>, tensor<i64>) -> tensor<i1>
    stablehlo.return %cond : tensor<i1>
  }, {
  ^bb0(%arg0: tensor<i64>, %arg1: tensor<i64>):
    %new_sum = stablehlo.add %arg1, %one : tensor<i64>
    %new_i = stablehlo.add %arg0, %one : tensor<i64>
    stablehlo.return %new_i, %new_sum : tensor<i64>, tensor<i64>
}) : (tensor<i64>, tensor<i64>) -> (tensor<i64>, tensor<i64>)
// %results0: 10
// %results1: 9
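
The Python pseudo-code in the semantics can be turned into a runnable helper to trace how the loop state evolves. In this sketch the names `stablehlo_while`, `cond` and `body` are illustrative, and the loop is started from (0, 0) so that the body runs exactly ten times.

```python
def stablehlo_while(operand, cond, body):
    internal_state = tuple(operand)
    while cond(*internal_state):
        internal_state = tuple(body(*internal_state))
    return internal_state

def cond(i, total):
    return i < 10            # compare with direction LT against %ten

def body(i, total):
    return i + 1, total + 1  # both values are incremented by %one

results = stablehlo_while((0, 0), cond, body)
print(results)
```

The number of iterations is the limit minus the initial counter value, so the second result depends on where the counter starts.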

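More Examples

xor

Semantics

Performs element-wise XOR of two tensors lhs and rhs and produces a result tensor. Depending on the element type, does the following:

For booleans: logical XOR.
For integers: bitwise XOR.

Inputs

Label	Name	Type	Constraints
(I1)	`lhs`	tensor of boolean or integer type	(C1)
(I2)	`rhs`	tensor of boolean or integer type	(C1)

Outputs

Name	Type	Constraints
`result`	tensor of boolean or integer type	(C1)

Constraints

(C1) type(lhs) = type(rhs) = type(result).

Examples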

// Bitwise operation with integer tensors
// %lhs: [[1, 2], [3, 4]]
// %rhs: [[5, 6], [7, 8]]
%result = "stablehlo.xor"(%lhs, %rhs) : (tensor<2x2xi32>, tensor<2x2xi32>) -> tensor<2x2xi32>
// %result: [[4, 4], [4, 12]]

// Logical operation with boolean tensors
// %lhs: [[false, false], [true, true]]
// %rhs: [[false, true], [false, true]]
%result = "stablehlo.xor"(%lhs, %rhs) : (tensor<2x2xi1>, tensor<2x2xi1>) -> tensor<2x2xi1>
// %result: [[false, true], [true, false]]

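More Examples

Dialect interop

At the moment, StableHLO programs in the wild sometimes contain operations that are not defined by StableHLO.

Module, Function, Call and Return

StableHLO uses upstream MLIR operations for ModuleOp, FuncOp, CallOp, and ReturnOp. This was done for better interop with existing MLIR machinery, as many useful passes are written targeting FuncOp and ModuleOp, and many compilation pipelines expect these ops to be present. Full compatibility guarantees are applied to these ops. If anything ever changes about these ops in an incompatible way (i.e. removal), StableHLO equivalents will be added to preserve compatibility.

CHLO

The CHLO opset contains higher-level operations that decompose to StableHLO. Currently there are no compatibility guarantees for CHLO. For compatibility guarantees, the chlo-legalize-to-stablehlo pass must be used prior to serialization.

Shape operations

It is a common use case in the community to use certain operations from core MLIR dialects in dynamic StableHLO programs to perform shape computations. Most commonly, these include shape dialect ops like shape_of or num_elements, tensor dialect ops like dim or from_elements, and the builtin index type.

The Dynamism RFC > O2 denotes these as out of scope; however, some support for index types is included for interop purposes. There are no compatibility guarantees for these ops or types. The shape-legalize-to-stablehlo pass can be used to convert these operations to fully supported StableHLO ops.

Deprecated operations

There are several StableHLO operations that were inherited from MHLO which are deprecated and on the way out of StableHLO. The full details on these removals can be found in StableHLO v1.0 Cleanup #2283. The tracker issue for these deprecations is #2340.

These operations fall into a few categories:

"Not in HLO" category of StableHLO operations: these were initially part of the StableHLO opset but were later deemed not to fit it well: broadcast, create_token, cross-replica-sum, dot, einsum, torch_index_select, unary_einsum (#3).
Unused ops: these operations may have been useful at some point, but the ops were either underdeveloped, or the pipelines using them have been refactored to not require them anymore. This includes map, tuple (#598), get_tuple_element, rng, complex comparisons (#560), and convolution window_reversal (#1181).

Some of these ops can be removed easily because they can be expressed using existing ops (broadcast, create_token, cross-replica-sum, dot, unary_einsum), and they will be removed after the existing compatibility window (6 months) passes. Others are still being explored for removal (einsum, get_tuple_element, map, rng, torch_index_select, tuple, complex comparisons, window_reversal). Pending community feedback, these ops will either be removed or added to the spec with full support. Until the future of these ops is known, they are only guaranteed 6 months of compatibility.

Execution

Sequential execution

A StableHLO program is executed by providing input values to the main function and computing output values. Output values of a function are computed by executing the graph of ops rooted in the corresponding return op.

The execution order is implementation-defined as long as it is aligned with dataflow, i.e. as long as ops are executed before their uses. In StableHLO, all side-effecting ops consume one token and produce one token (multiple tokens can be multiplexed into one token via after_all), so the execution order of side effects is also aligned with dataflow. For example, in the program below there are two possible execution orders: %0 → %1 → %2 → return and %1 → %0 → %2 → return.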

func.func @main() -> tensor<f64> {
  %0 = stablehlo.constant dense<1.0> : tensor<f64>
  %1 = stablehlo.constant dense<2.0> : tensor<f64>
  %2 = stablehlo.add %0, %1 : tensor<f64>
  return %2 : tensor<f64>
}
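
The set of execution orders that are aligned with dataflow is exactly the set of topological orderings of the op graph. The following Python sketch enumerates them by brute force for the program above; the `deps` dictionary is a hand-written encoding of that program's operand relationships, and the approach is only practical for tiny graphs.

```python
from itertools import permutations

# Each op maps to the ops whose results it uses.
deps = {"%0": [], "%1": [], "%2": ["%0", "%1"], "return": ["%2"]}

def valid_orders(deps):
    orders = []
    for candidate in permutations(deps):
        executed = set()
        ok = True
        for op in candidate:
            # An op may only run once all of its operands have been produced.
            if any(d not in executed for d in deps[op]):
                ok = False
                break
            executed.add(op)
        if ok:
            orders.append(candidate)
    return orders

orders = valid_orders(deps)
for order in orders:
    print(" -> ".join(order))
```

Only the two constants are independent of each other, which is why exactly two orders survive the check.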

更正式地來說，StableHLO 程序是以下項目的組合： 1) StableHLO 程式、2) 作業狀態 (尚未執行、已執行)，以及 3) 程序正在處理的中間值。這個程序會先將輸入值傳送至 main 函式，然後透過作業圖表更新作業狀態和中繼值，最後輸出值。正式化作業的後續進展待定 (#484)。

平行執行

StableHLO 程式可以並行執行，並整理成 num_replicas x num_partitions 的 2D 處理網格，兩者皆為 ui32 型別。

在 StableHLO 程序格線中，num_replicas * num_partitions 個 StableHLO 程序會同時執行。每個程序都有專屬的 process_id = (replica_id, partition_id)，其中 replica_ids = range(num_replicas) 和 partition_ids = range(num_partitions) 中的 replica_id 和 partition_id 都是 ui32 型別。

每個程式的程序格線大小都是靜態已知的 (未來我們計畫將其明確納入 StableHLO 程式 #650)，而程序格線中的位置也是每個程序靜態已知的。每個程序都能透過 replica_id 和 partition_id 運算，存取程序格線中的位置。

在程序格中，所有程式可以相同 (「單一程式，多個資料」樣式)，也可以不同 (「多個程式，多個資料」樣式)，或介於兩者之間。我們計畫在日後支援其他定義平行 StableHLO 程式的慣用語，包括 GSPMD (#619)。

在程序格中，程序大多彼此獨立，具有個別的作業狀態、輸入/中間/輸出值，且大部分的作業都會在程序之間個別執行，但少數集體作業除外 (詳情請參閱下文)。

由於大多數作業的執行都只會使用來自相同程序的數值，因此通常可以明確地依名稱參照這些值。不過，在描述集合運算的語意時，這還不夠，因此產生了 name@process_id 符號，用於參照特定程序中的 name 值。(從這個角度來看，不合格的 name 可視為 name@(replica_id(), partition_id()) 的簡寫)。

程序間的執行順序由實作定義，但點對點通訊和集體作業引入的同步除外，詳情請見下文。

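Point-to-point communication

StableHLO processes can communicate with each other through StableHLO channels. A channel is represented by a positive id of type si64. Through various ops, it is possible to send values to channels and receive them from channels.

Further formalization, e.g. where these channel ids come from, how process programs become aware of them and what kind of synchronization is introduced by them, is TBD (#484).

Streaming communication

Every StableHLO process has access to two streaming interfaces:

Infeed that can be read from.
Outfeed that can be written to.

Unlike channels, which are used to communicate between processes and therefore have processes at both of their ends, infeeds and outfeeds have their other end implementation-defined.

Further formalization, e.g. how streaming communication influences execution order and what kind of synchronization is introduced by it, is TBD (#484).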

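Collective ops

There are six collective ops in StableHLO: all_gather, all_reduce, all_to_all, collective_broadcast, collective_permute, and reduce_scatter. All these ops split the processes in the StableHLO process grid into StableHLO process groups and execute a joint computation within each process group, independently from other process groups.

Within each process group, collective ops may introduce a synchronization barrier. Further formalization, e.g. elaborating on when exactly this synchronization happens, how exactly the processes arrive at this barrier, and what happens if they don't, is TBD (#484).

If the process group involves cross-partition communication, i.e. there are processes in the process group whose partition ids are different, then execution of the collective op needs a channel, and the collective op must provide a positive channel_id of type si64. Cross-replica communication doesn't need channels.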
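
The channel requirement can be sketched as a small predicate over process groups. Both `needs_channel` and `check_channel_id` are hypothetical helper names used only for illustration, and a process id is modeled as a plain `(replica_id, partition_id)` tuple.

```python
from typing import List, Tuple

ProcessId = Tuple[int, int]  # (replica_id, partition_id), an assumed encoding

def needs_channel(process_group: List[ProcessId]) -> bool:
    """True if the group contains processes with different partition ids."""
    partition_ids = {partition_id for _, partition_id in process_group}
    return len(partition_ids) > 1

def check_channel_id(process_groups: List[List[ProcessId]], channel_id: int) -> bool:
    """Hypothetical check: cross-partition groups require a positive channel_id."""
    if any(needs_channel(group) for group in process_groups):
        return channel_id > 0
    # Cross-replica-only communication does not need a channel.
    return True

print(needs_channel([(0, 0), (1, 0)]))  # same partition, different replicas
print(needs_channel([(0, 0), (0, 1)]))  # different partitions
```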

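The computations performed by the collective ops are specific to individual ops and are described in the individual op sections above. However, the strategies by which the process grid is split into process groups are shared between these ops and are described in this section. More formally, StableHLO supports the following four strategies.

cross_replica

Only cross-replica communication happens within each process group. This strategy takes replica_groups (a list of lists of replica ids) and computes a Cartesian product of replica_groups by partition_ids. replica_groups must have unique elements and cover all replica_ids. More formally, using Python syntax: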

def cross_replica(replica_groups: List[List[ReplicaId]]) -> List[List[ProcessId]]:
  for replica_group in replica_groups:
    for partition_id in partition_ids:
      process_group = []
      for replica_id in replica_group:
        process_group.append((replica_id, partition_id))
      yield process_group

For example, for replica_groups = [[0, 1], [2, 3]] and num_partitions = 2, cross_replica will produce [[(0, 0), (1, 0)], [(0, 1), (1, 1)], [(2, 0), (3, 0)], [(2, 1), (3, 1)]].

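cross_partition

Only cross-partition communication happens within each process group. This strategy takes partition_groups (a list of lists of partition ids) and computes a Cartesian product of partition_groups by replica_ids. partition_groups must have unique elements and cover all partition_ids. More formally, using Python syntax: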

def cross_partition(partition_groups: List[List[PartitionId]]) -> List[List[ProcessId]]:
  for partition_group in partition_groups:
    for replica_id in replica_ids:
      process_group = []
      for partition_id in partition_group:
        process_group.append((replica_id, partition_id))
      yield process_group

For example, for partition_groups = [[0, 1]] and num_replicas = 4, cross_partition will produce [[(0, 0), (0, 1)], [(1, 0), (1, 1)], [(2, 0), (2, 1)], [(3, 0), (3, 1)]].

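cross_replica_and_partition

Both cross-replica and cross-partition communication may happen within each process group. This strategy takes replica_groups (a list of lists of replica ids) and computes the Cartesian product of each replica_group by partition_ids. replica_groups must have unique elements and cover all replica_ids. More formally, using Python syntax: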

def cross_replica_and_partition(replica_groups: List[List[ReplicaId]]) -> List[List[ProcessId]]:
  for replica_group in replica_groups:
    process_group = []
    for partition_id in partition_ids:
      for replica_id in replica_group:
        process_group.append((replica_id, partition_id))
    yield process_group

For example, for replica_groups = [[0, 1], [2, 3]] and num_partitions = 2, cross_replica_and_partition will produce [[(0, 0), (1, 0), (0, 1), (1, 1)], [(2, 0), (3, 0), (2, 1), (3, 1)]].

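flattened_ids

This strategy takes flattened_id_groups (a list of lists of "flattened" process ids in the form of replica_id * num_partitions + partition_id) and turns them into process ids. flattened_id_groups must have unique elements and cover all process_ids. More formally, using Python syntax: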

def flattened_ids(flattened_id_groups: List[List[ui32]]) -> List[List[ProcessId]]:
  for flattened_id_group in flattened_id_groups:
    process_group = []
    for flattened_id in flattened_id_group:
      replica_id = flattened_id // num_partitions
      partition_id = flattened_id % num_partitions
      process_group.append((replica_id, partition_id))
    yield process_group

For example, for flattened_id_groups = [[0, 1, 2, 3], [4, 5, 6, 7]], num_replicas = 4 and num_partitions = 2, flattened_ids will produce [[(0, 0), (0, 1), (1, 0), (1, 1)], [(2, 0), (2, 1), (3, 0), (3, 1)]].
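
The four strategies can be run side by side as ordinary Python generators. The sketch below fixes the grid at num_replicas = 4 and num_partitions = 2 (the sizes used in the examples in this section) and treats replica_ids and partition_ids as module-level ranges; these globals and the tuple encoding of process ids are assumptions made for the sake of a self-contained example.

```python
from typing import Iterator, List, Tuple

ProcessId = Tuple[int, int]  # (replica_id, partition_id)

# Assumed grid size for this sketch, matching the examples in this section.
num_replicas = 4
num_partitions = 2
replica_ids = range(num_replicas)
partition_ids = range(num_partitions)

def cross_replica(replica_groups: List[List[int]]) -> Iterator[List[ProcessId]]:
    # One group per (replica_group, partition_id) pair.
    for replica_group in replica_groups:
        for partition_id in partition_ids:
            yield [(replica_id, partition_id) for replica_id in replica_group]

def cross_partition(partition_groups: List[List[int]]) -> Iterator[List[ProcessId]]:
    # One group per (partition_group, replica_id) pair.
    for partition_group in partition_groups:
        for replica_id in replica_ids:
            yield [(replica_id, partition_id) for partition_id in partition_group]

def cross_replica_and_partition(replica_groups: List[List[int]]) -> Iterator[List[ProcessId]]:
    # One group per replica_group, spanning every partition.
    for replica_group in replica_groups:
        yield [(replica_id, partition_id)
               for partition_id in partition_ids
               for replica_id in replica_group]

def flattened_ids(flattened_id_groups: List[List[int]]) -> Iterator[List[ProcessId]]:
    # flattened_id = replica_id * num_partitions + partition_id
    for flattened_id_group in flattened_id_groups:
        yield [divmod(flattened_id, num_partitions) for flattened_id in flattened_id_group]

print(list(cross_replica([[0, 1], [2, 3]])))
print(list(cross_partition([[0, 1]])))
print(list(cross_replica_and_partition([[0, 1], [2, 3]])))
print(list(flattened_ids([[0, 1, 2, 3], [4, 5, 6, 7]])))
```

The sketch does not validate that the groups have unique elements or cover all ids; those remain constraints on the inputs.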

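Accuracy

At the moment, StableHLO does not provide guarantees about numerical accuracy, but this may change in the future (#1156).

Execution semantics of quantized operation

The interpretation of quantized StableHLO operations may vary depending on hardware requirements and capabilities. For instance, some hardware may opt to interpret quantized operations using a "dequantize, perform floating-point operation, and finally quantize" strategy. Others may perform the entire computation with integer arithmetic. Consequently, the interpretation of quantized StableHLO operations is determined exclusively by the specific implementation. The interpretation of hybrid quantization (#1575) should be based on its semantics as prescribed in the specification (via #1792).

Errors

StableHLO programs are validated through an extensive set of constraints for individual ops, which rules out many classes of errors prior to run time. However, error conditions are still possible, e.g. through integer overflows, out-of-bounds accesses, etc. Unless explicitly called out, all these errors result in implementation-defined behavior, but this may change in the future (#1157).

Floating-point exceptions

As an exception to this rule, floating-point exceptions in StableHLO programs have well-defined behavior. Operations which result in exceptions defined by the IEEE-754 standard (invalid operation, division-by-zero, overflow, underflow, or inexact exceptions) produce default results (as defined in the standard) and continue execution without raising the corresponding status flag, similar to raiseNoFlag exception handling from the standard. Exceptions for nonstandard operations (e.g. complex arithmetic and certain transcendental functions) are implementation-defined.

Shape mismatches

StableHLO supports dynamically-shaped tensors. However, shapes have to agree at runtime, otherwise the behavior is undefined. StableHLO does not explicitly provide an op that can assert that a tensor has a given shape at runtime. Generating correct code is the responsibility of the producer.

As a specific example, the program below is valid. However, at runtime, the exact shapes of %arg0 and %arg1 have to be the same, otherwise the behavior of the program is undefined: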

func.func @foo(%arg0: tensor<?xi32>, %arg1: tensor<?xi32>) -> tensor<?xi32> {
    %0 = stablehlo.add %arg0, %arg1 : tensor<?xi32>
    return %0 : tensor<?xi32>
}

Notation

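For describing syntax, this document uses a modified ISO flavor of EBNF syntax (ISO/IEC 14977:1996, Wikipedia), with two modifications: 1) rules are defined using ::= rather than =, 2) concatenation is expressed using juxtaposition rather than ,.

For describing semantics (i.e. within the "Types", "Constants" and "Ops" sections), we use formulas which are based on Python syntax extended with support for concisely expressing array operations, as described below. This works well for small snippets of code, but in the rare cases when larger snippets of code are needed, we use vanilla Python syntax, which is always introduced explicitly.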

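Using formulas

Let's explore how formulas work based on an example from the dot_general specification. One of the constraints for this operation looks as follows: dim(lhs, lhs_batching_dimensions...) = dim(rhs, rhs_batching_dimensions...).

The names used in this formula come from two sources: 1) global functions, i.e. dim, 2) member definitions of the corresponding program element, i.e. the lhs, lhs_batching_dimensions, rhs and rhs_batching_dimensions inputs defined in the "Inputs" section of dot_general.

As mentioned above, the syntax of this formula is Python-based with some conciseness-oriented extensions. To make sense of the formula, let's transform it into vanilla Python syntax.

A) In these formulas, we use = to represent equality, so the first step towards obtaining Python syntax is replacing = with ==, as follows: dim(lhs, lhs_batching_dimensions...) == dim(rhs, rhs_batching_dimensions...).

B) Also, these formulas support ellipses (...) which turn scalar expressions into tensor expressions. In a nutshell, f(xs...) roughly means "for each scalar x in the tensor xs, compute a scalar f(x) and then return all these scalar results together as a tensor result". In vanilla Python syntax, our example formula turns into: [dim(lhs, dim1) for dim1 in lhs_batching_dimensions] == [dim(rhs, dim2) for dim2 in rhs_batching_dimensions].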
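
To make the expansion concrete, here is a minimal runnable sketch. It assumes, purely for illustration, that a tensor is represented only by its shape tuple and that dim simply indexes into that tuple; the shapes and batching dimensions are made-up sample values.

```python
# Hypothetical stand-in: a "tensor" is represented only by its shape.
def dim(shape, axis):
    return shape[axis]

lhs = (8, 4, 16)    # assumed shape of lhs
rhs = (8, 16, 32)   # assumed shape of rhs
lhs_batching_dimensions = [0]
rhs_batching_dimensions = [0]

# dim(lhs, lhs_batching_dimensions...) = dim(rhs, rhs_batching_dimensions...)
# expanded into vanilla Python:
constraint = ([dim(lhs, dim1) for dim1 in lhs_batching_dimensions]
              == [dim(rhs, dim2) for dim2 in rhs_batching_dimensions])
print(constraint)
```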

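Thanks to ellipses, it is often possible to avoid working at the level of individual scalars. However, in some tricky cases, lower-level semi-informal syntax may be used, as in the start_indices[bi0, ..., :, ..., biN] formula from the gather specification. In the service of conciseness, we don't provide an exact formalism for translating such syntax to vanilla Python, in the hope that it is still intuitively understandable on a case-by-case basis. Please let us know if some specific formulas look opaque, and we'll try to improve them.

Also, you will notice that formulas use ellipses to expand all sorts of lists, including tensors, lists of tensors (which e.g. can arise from a variadic number of tensors), etc. This is another area where we don't provide an exact formalism (e.g. lists are not even part of the StableHLO type system) and instead rely on intuitive understandability.

C) The final noteworthy notational vehicle that we employ is implicit broadcasting. While the StableHLO opset doesn't support implicit broadcasting, the formulas do, also in the service of conciseness. In a nutshell, if a scalar is used in a context where a tensor is expected, the scalar is broadcast to the expected shape.

To continue the dot_general example, here's another constraint: 0 <= lhs_batching_dimensions < rank(lhs). As defined in the dot_general specification, lhs_batching_dimensions is a tensor, whereas both 0 and rank(lhs) are scalars. After we apply implicit broadcasting, the formula becomes [0, ..., 0] <= lhs_batching_dimensions < [rank(lhs), ..., rank(lhs)].

When applied to a particular dot_general operation, this formula evaluates to a tensor of booleans. When formulas are used as constraints, the constraint holds if the formula evaluates to either true or to a tensor which only has true elements.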
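
The implicit broadcasting step and the "all elements true" rule can be sketched as follows. The helpers `rank`, `broadcast` and `constraint_holds` are hypothetical, and tensors are again modeled as plain Python lists and shape tuples.

```python
def rank(shape):
    return len(shape)

def broadcast(scalar, like):
    """Broadcast a scalar to the length of a 1-dimensional list."""
    return [scalar] * len(like)

def constraint_holds(lhs_shape, lhs_batching_dimensions):
    """Evaluate 0 <= lhs_batching_dimensions < rank(lhs) element by element."""
    lower = broadcast(0, lhs_batching_dimensions)
    upper = broadcast(rank(lhs_shape), lhs_batching_dimensions)
    elementwise = [lo <= d < hi
                   for lo, d, hi in zip(lower, lhs_batching_dimensions, upper)]
    # The constraint holds only if every element of the boolean tensor is true.
    return elementwise, all(elementwise)

print(constraint_holds((8, 4, 16), [0, 1]))
print(constraint_holds((8, 4, 16), [0, 3]))
```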

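Names

In formulas, lexical scope includes: 1) global functions, 2) member definitions, 3) local definitions. The list of global functions is provided below. The list of member definitions depends on the program element that the notation is applied to:

For operations, member definitions include names introduced in the "Inputs" and "Outputs" sections.
For everything else, member definitions include structural parts of the program element, named after the corresponding EBNF non-terminals. Most of the time, the names of these structural parts are obtained by converting the names of the non-terminals to snake case (e.g. IntegerLiteral => integer_literal), but sometimes names get abbreviated in the process (e.g. QuantizationStorageType => storage_type), in which case the names are introduced explicitly, similarly to the "Inputs" / "Outputs" sections in operation specifications.
Additionally, member definitions always include self to refer to the corresponding program element.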

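Values

When formulas are evaluated, they work with the following types of values: 1) Value (actual values, e.g. dense<[[1, 2], [3, 4]]> : tensor<2x2xi32>; they always know their types), 2) Placeholder (future values, e.g. lhs, rhs or result; their actual values are not known yet, only their types are known), 3) Type (types as defined in the "Types" section), 4) Function (global functions as defined in the "Functions" section).

Depending on the context, names may refer to different values. More specifically, the "Semantics" section for ops (and equivalents for other program elements) defines runtime logic, so all inputs are available as Value. In contrast, the "Constraints" section for ops (and equivalents) defines "compile-time" logic, i.e. something that is typically executed before runtime, so only constant inputs are available as Value and other inputs are available only as Placeholder.

Names	In "Semantics"	In "Constraints"
Global functions	`Function`	`Function`
Constant inputs	`Value`	`Value`
Non-constant inputs	`Value`	`Placeholder`
Outputs	`Value`	`Placeholder`
Local definitions	Depends on the definition	Depends on the definition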

Let's consider an example transpose operation:

%result = "stablehlo.transpose"(%operand) {
  permutation = dense<[2, 1, 0]> : tensor<3xi64>
} : (tensor<2x3x2xi32>) -> tensor<2x3x2xi32>

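For this operation, permutation is a constant, so it is available as a Value in both semantics and constraints. In contrast, operand and result are available as a Value in semantics but only as a Placeholder in constraints.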
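
The distinction can be illustrated with a small sketch in which a constraint-style check sees only the types of operand and result but the actual value of permutation. The `Placeholder` dataclass and the `check_transpose` function are hypothetical, and the checks shown (permutation validity, permuted shape, matching element type) are the kind of constraints one would expect for transpose rather than a quotation of its specification.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Placeholder:
    """A future value: only its type (shape and element type) is known."""
    shape: Tuple[int, ...]
    element_type: str

def check_transpose(operand: Placeholder, permutation: List[int],
                    result: Placeholder) -> bool:
    # permutation is a constant input, so its actual value can be inspected.
    if sorted(permutation) != list(range(len(operand.shape))):
        return False
    expected_shape = tuple(operand.shape[axis] for axis in permutation)
    return (result.shape == expected_shape
            and result.element_type == operand.element_type)

operand = Placeholder((2, 3, 2), "i32")
result = Placeholder((2, 3, 2), "i32")
print(check_transpose(operand, [2, 1, 0], result))
```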

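Functions

Construction of types

There are no functions that can be used to construct types. Instead, we directly use type syntax because it's typically more concise. E.g. (tensor<E>, tensor<E>) -> (tensor<E>) rather than function_type([tensor_type([], E), tensor_type([], E)], [tensor_type([], E)]).

Functions on types

element_type is defined on tensor types and quantized tensor types and returns, respectively, the TensorElementType or QuantizedTensorElementType part of the corresponding TensorType or QuantizedTensorType.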

def element_type(x: Value | Placeholder | Type):
  if type(x) == TensorType:
    return tensor_element_type(x)
  if type(x) == QuantizedTensorType:
    return quantized_tensor_element_type(x)
  if type(x) is not Type:
    return element_type(type(x))

is_per_axis_quantized(x: Value | Placeholder | Type) -> Value is a shortcut for is_quantized(x) and quantization_dimension(x) is not None.
is_per_tensor_quantized(x: Value | Placeholder | Type) -> Value is a shortcut for is_quantized(x) and quantization_dimension(x) is None.
is_promotable(x: Type, y: Type) -> bool checks if type x can be promoted to type y. When x and y are QuantizedTensorElementTypes, the promotion is applied only to the storage_type. This specific version of promotion is currently used in the context of reduction computation (refer to the RFC for more details).

def is_promotable(x: Type, y: Type) -> bool:
  # Promotion is only possible within the same category of types.
  is_same_type = (
    (is_bool(x) and is_bool(y)) or
    (is_integer(x) and is_integer(y)) or (is_float(x) and is_float(y)) or
    (is_complex(x) and is_complex(y)) or
    (is_quantized(x) and is_quantized(y) and expressed_type(x) == expressed_type(y)))

  if not is_same_type:
    return False

  # Within a category, x is promotable to y if y is at least as wide.
  if is_integer(x) or is_float(x):
    return bitwidth(x) <= bitwidth(y)

  if is_complex(x):
    return bitwidth(element_type(x)) <= bitwidth(element_type(y))

  if is_quantized(x):
    return bitwidth(storage_type(x)) <= bitwidth(storage_type(y))

  return False

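is_quantized(x: Value | Placeholder | Type) -> Value is a shortcut for is_quantized_tensor_element_type(x).
is_type_name(x: Value | Placeholder | Type) -> Value. Available for all types. For example, is_float(x) returns true if x is a FloatType. If x is a value or placeholder, this function is a shortcut for is_type_name(type(x)).
max_value(x: Type) -> Value returns the maximum value of a TensorElementType. If x is not a TensorElementType, returns None.
min_value(x: Type) -> Value returns the minimum possible value of a TensorElementType. If x is not a TensorElementType, returns None.
member_name(x: Value | Placeholder | Type) -> Any. Available for all member definitions member_name of all types. For example, tensor_element_type(x) returns the TensorElementType part of a corresponding TensorType. If x is a value or placeholder, this function is a shortcut for member_name(type(x)). If x is not a type that has an appropriate member, or a value or a placeholder of such a type, returns None.
is_empty_algorithm(*args: Type) checks if all dot algorithm fields are set to None. This is needed because dot algorithms have implementation-defined default behaviors, so specifying a default value would be incorrect.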

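Construction of values

operation_name(*xs: Value | Type) -> Value. Available for all operations. For example, add(lhs, rhs) takes two tensor values lhs and rhs and returns the output of evaluating the add operation with these inputs. For some operations, e.g. broadcast_in_dim, the types of their outputs are "load-bearing", i.e. needed to evaluate the operation. In this case, the function takes these types as arguments.

Functions on values

All of Python's operators and functions are available. E.g. both subscription and slicing notations from Python are available to index into tensors, quantized tensors and tuples.
to_destination_type(x: Value, destination_type: Type) -> Value is defined on tensors and returns the converted value of x based on type(x) and destination_type as follows: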

def to_destination_type(x: Value, destination_type: Type) -> Value:
  if type(x) == destination_type:
    return x

  if is_quantized(destination_type):
    if is_quantized(type(x)):
      return quantize(x, destination_type)
    assert is_float(type(x))
    return quantize(x, destination_type)

  if is_quantized(type(x)):
    assert destination_type == expressed_type(type(x))
    return dequantize(x)

  return convert(x, destination_type)

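There is early discussion on merging the convert, uniform_quantize and uniform_dequantize operations (#1576). After the merge we will not need the above function and can use the operation name for convert instead.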

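is_nan(x: Value) -> Value is defined on tensors and returns true if all elements of x are NaN or false otherwise. If x is not a tensor, returns None.
is_sorted(x: Value) -> Value is defined on tensors and returns true if the elements of x are sorted in ascending order with respect to the ascending lexicographical order of their indices or false otherwise. If x is not a tensor, returns None.
is_unique(x: Value) -> Value is defined on tensors and returns true if x doesn't have duplicate elements or false otherwise. If x is not a tensor, returns None.
member_name(x: Value) -> Any is defined for all member definitions member_name of all values. For example, real_part(x) returns the RealPart part of a corresponding ComplexConstant. If x is not a value that has an appropriate member, returns None.
same(x: Value) -> Value is defined on tensors and returns true if the elements of x are all equal to each other or false otherwise. If the tensor doesn't have elements, that counts as "all equal to each other", i.e. the function returns true. If x is not a tensor, returns None.
split(x: Value, num_results: Value, axis: Value) -> Value is defined on tensors and returns num_results slices of x along the axis axis. If x is not a tensor or dim(x, axis) % num_results != 0, returns None.
is_defined_in_parent_scope(x: Value) -> Value is defined on strings and returns true if x is the name of a function defined in the same scope as the parent function of the relevant op.
is_namespaced_op_name(x: Value) -> Value is defined on strings and returns true if x is a valid op name, that is, it respects the following regular expression: [a-zA-Z][a-zA-Z0-9_]*([.][a-zA-Z0-9_$]+)+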
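
A few of these helpers are easy to try out on plain Python data. The sketch below models 1-dimensional tensors as lists (so split takes no axis argument here) and implements is_namespaced_op_name with the regular expression quoted above; these simplifications are assumptions of the example, not part of the specification.

```python
import re

# The regular expression quoted above; fullmatch requires the whole string to match.
_NAMESPACED_OP_NAME = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*([.][a-zA-Z0-9_$]+)+")

def is_namespaced_op_name(x: str) -> bool:
    return _NAMESPACED_OP_NAME.fullmatch(x) is not None

def is_unique(xs: list) -> bool:
    return len(set(xs)) == len(xs)

def same(xs: list) -> bool:
    # An empty list counts as "all equal to each other".
    return all(x == xs[0] for x in xs)

def split(xs: list, num_results: int):
    # 1-dimensional simplification: returns None if the size is not divisible.
    if len(xs) % num_results != 0:
        return None
    step = len(xs) // num_results
    return [xs[i * step:(i + 1) * step] for i in range(num_results)]

print(is_namespaced_op_name("stablehlo.add"))
print(split([1, 2, 3, 4, 5, 6], 3))
```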

Shape computations

axes(x: Value | Placeholder | Type) -> Value is a shortcut for range(rank(x)).
dim(x: Value | Placeholder | Type, axis: Value) -> Value is a shortcut for shape(x)[axis].
dims(x: Value | Placeholder | Type, axes: List) -> List is a shortcut for list(map(lambda axis: dim(x, axis), axes)).
index_space(x: Value | Placeholder | Type) -> Value is defined on tensors and returns size(x) indices for the corresponding TensorType sorted in ascending lexicographical order, i.e. [0, ..., 0], [0, ..., 1], ..., shape(x) - 1. If x is not a tensor type, a quantized tensor type, or a value or a placeholder of one of these types, returns None.
rank(x: Value | Placeholder | Type) -> Value is a shortcut for size(shape(x)).
shape(x: Value | Placeholder | Type) -> Value is defined in the "Functions on types" section via member_name.
size(x: Value | Placeholder | Type) -> Value is a shortcut for reduce(lambda x, y: x * y, shape(x)).
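
These shortcuts can be exercised directly if a tensor type is modeled by nothing more than its shape tuple, which is the simplifying assumption of the sketch below. Note one deliberate deviation: reduce is given an initial value of 1 so that size also works for rank-0 shapes in real Python.

```python
from functools import reduce
from itertools import product

# Assumption for this sketch: a tensor (type) is represented by its shape tuple.
def shape(x):
    return tuple(x)

def rank(x):
    return len(shape(x))

def axes(x):
    return range(rank(x))

def dim(x, axis):
    return shape(x)[axis]

def dims(x, axes_):
    return [dim(x, axis) for axis in axes_]

def size(x):
    # Initial value 1 keeps the product well-defined for an empty shape.
    return reduce(lambda a, b: a * b, shape(x), 1)

def index_space(x):
    # itertools.product yields indices in ascending lexicographical order.
    return [list(index) for index in product(*(range(d) for d in shape(x)))]

x = (2, 3)
print(rank(x), size(x), dims(x, [1, 0]))
print(index_space(x))
```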

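Quantization computations

baseline_element_type(x: Value | Placeholder | Type) -> Type is a shortcut for element_type(baseline_type(x)).
baseline_type is defined on tensor types and quantized tensor types and transforms them to a "baseline", i.e. a type with the same shape but with the quantization parameters of the element type reset to default values. This is used as a handy trick to compare both tensor and quantized tensor types uniformly, which is needed quite often. For quantized types, this enables comparing types while ignoring the quantization parameters, that is, shape, storage_type, expressed_type, storage_min, storage_max, and quantization_dimension (for per-axis quantized types) must all match, but scales and zero points may differ.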

def baseline_type(x: Value | Placeholder | Type) -> Type:
  if type(x) == TensorType:
    return x
  if type(x) == QuantizedTensorType:
    element_type = quantized_tensor_element_type(x)
    baseline_element_type = QuantizedTensorElementType(
      storage_type = storage_type(element_type),
      storage_min = storage_min(element_type),
      storage_max = storage_max(element_type),
      expressed_type = expressed_type(element_type),
      quantization_dimension = quantization_dimension(element_type),
      scales = [constant(1.0, expressed_type(element_type))] * dim(x, quantization_dimension(element_type)),
      zero_points = [constant(0, storage_type(element_type))] * dim(x, quantization_dimension(element_type)))
    return QuantizedTensorType(shape(x), baseline_element_type)
  if type(x) is not Type:
    return baseline_type(type(x))

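dequantize is defined on quantized tensor types and turns them into floating-point tensor types. This happens via converting quantized elements, which represent integer values of the storage type, into corresponding floating-point values of the expressed type using the zero point and scale associated with the quantized element type.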

def compute_zero_points(quantized_type, result_type):
  if is_per_tensor_quantized(quantized_type):
    return broadcast_in_dim(constant(zero_point(quantized_type), storage_type(quantized_type)), [], result_type)
  if is_per_axis_quantized(quantized_type):
    for i in index_space(result_type):
      d = quantization_dimension(quantized_type)
      zero_points[i] = zero_points(quantized_type)[i[d]]
    return zero_points

def compute_scales(quantized_type, result_type):
  if is_per_tensor_quantized(quantized_type):
    return broadcast_in_dim(constant(scale(quantized_type), expressed_type(quantized_type)), [],
            result_type)
  if is_per_axis_quantized(quantized_type):
    for i in index_space(result_type):
      d = quantization_dimension(quantized_type)
      scales[i] = scales(quantized_type)[i[d]]
    return scales

def dequantize(x: Value) -> Value:
  assert is_quantized(x)
  x_storage = bitcast_convert(x, storage_type(x))
  x_storage_sub = x_storage - compute_zero_points(type(x), type(x_storage))
  x_expressed_sub = convert(x_storage_sub, expressed_type(x))
  return x_expressed_sub * compute_scales(type(x), type(x_expressed_sub))

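quantize is defined on floating-point tensor types and turns them into quantized tensor types. This happens via converting floating-point values of the expressed type into corresponding integer values of the storage type using the zero point and scale associated with the quantized element type.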

def quantize(x: Value, result_type: Type) -> Value:
  assert is_float(x) and is_quantized(result_type)
  zero_points = compute_zero_points(result_type, TensorType(shape(x), storage_type(result_type)))
  converted_zero_points = convert(zero_points, expressed_type(result_type))
  converted_min = convert(storage_min(result_type), expressed_type(result_type))
  converted_max = convert(storage_max(result_type), expressed_type(result_type))

  x_scaled = x / compute_scales(result_type, type(x))
  x_scaled_add_zp = x_scaled + converted_zero_points
  x_clamped = clamp(converted_min, x_scaled_add_zp, converted_max)
  x_rounded = round_nearest_even(x_clamped)
  return convert(x_rounded, result_type)
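
For intuition, the per-tensor case can be reduced to scalar arithmetic. The sketch below is a simplification under stated assumptions: values are plain Python numbers instead of tensors, the quantization parameters are passed explicitly, and Python's built-in round (which rounds halfway cases to the nearest even integer) stands in for round_nearest_even. The parameter values are made up for illustration.

```python
def quantize(x: float, scale: float, zero_point: int,
             storage_min: int, storage_max: int) -> int:
    """Scalar sketch of per-tensor quantization: scale, shift, clamp, round."""
    x_scaled_add_zp = x / scale + zero_point
    x_clamped = min(max(x_scaled_add_zp, storage_min), storage_max)
    return round(x_clamped)  # halfway cases round to the nearest even integer

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Scalar sketch of per-tensor dequantization: shift, then scale."""
    return (q - zero_point) * scale

# Assumed parameters for an 8-bit signed storage type.
scale, zero_point, storage_min, storage_max = 0.5, 10, -128, 127

q = quantize(1.25, scale, zero_point, storage_min, storage_max)
print(q, dequantize(q, scale, zero_point))
```

Because 1.25 / 0.5 + 10 lands exactly halfway between 12 and 13, it is rounded to the even value 12, so the round trip returns 1.0 rather than 1.25; this is the kind of quantization error the scale controls.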

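dequantize_op_quantize is used to specify element-wise computations on quantized tensors. It dequantizes, i.e. turns quantized elements into their expressed types, then performs an operation, and then quantizes, i.e. turns the results back into their storage types. At the moment, this function only works for per-tensor quantization. Per-axis quantization is work in progress (#1574).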

def dequantize_op_quantize(op, *inputs_and_output_type):
  inputs = inputs_and_output_type[:-1]
  output_type = inputs_and_output_type[-1]

  float_inputs = map(dequantize, inputs)
  float_result = op(*float_inputs)
  return quantize(float_result, output_type)

def dequantize_batch_norm_grad_or_training_quantize(op, *inputs_and_output_types):
  inputs = inputs_and_output_types[:-3]
  float_inputs = map(dequantize, inputs)
  float_results = op(*float_inputs)
  return map(quantize, float_results, inputs_and_output_types[-3:])

def dequantize_compare(lhs, rhs, comparison_direction):
  float_lhs = dequantize(lhs)
  float_rhs = dequantize(rhs)
  return compare(float_lhs, float_rhs, comparison_direction, FLOAT)

def dequantize_select_quantize(pred, on_true, on_false, output_type):
  float_on_true = dequantize(on_true)
  float_on_false = dequantize(on_false)
  float_result = select(pred, float_on_true, float_on_false)
  return quantize(float_result, output_type)

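hybrid_dequantize_then_op is used to specify weight-only quantization for a hybrid op which accepts lhs in floating-point and rhs in quantized types. It dequantizes quantized inputs into their expressed types and performs the computation in float. The element type of the float lhs tensor and the expressed type of the quantized rhs tensor should be identical.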

def hybrid_dequantize_then_op(op, lhs, rhs):
  assert(is_float(lhs) and is_quantized(rhs) and element_type(lhs) == expressed_type(rhs))
  return op(lhs, dequantize(rhs))

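Grid computations

cross_partition(partition_groups: Value) -> Value. See the "cross_partition" section above.
cross_replica(replica_groups: Value) -> Value. See the "cross_replica" section above.
cross_replica_and_partition(replica_groups: Value) -> Value. See the "cross_replica_and_partition" section above.
flattened_ids(flattened_id_groups: Value) -> Value. See the "flattened_ids" section above.

Dynamism

StableHLO values can have dynamic dimension sizes, e.g. tensor<?xi64>. However, StableHLO values cannot have a dynamic number of dimensions (unranked dynamism, e.g. tensor<*xi64>). Operands and results are allowed to use dynamic dimension sizes, even if there are constraints on the sizes. Constraints are verified statically if possible, otherwise they are deferred to runtime, and mismatches result in undefined behavior. See below for examples.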

Shape mismatches for unary elementwise operations

Consider the following toy program:

func.func @foo(%arg0: tensor<?xf64>) {
  %0 = stablehlo.abs %arg0 : (tensor<?xf64>) -> tensor<2xf64>
  return
}

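Such a program is unusual, because it is not common to know the shape of the result but not the shape of the input. Nonetheless, this is a valid StableHLO program. It is not possible to statically validate the abs operation in this program, because the exact shape of the operand is unknown. However, the shapes are certainly compatible, and this can be checked statically: ? could turn out to be 2 at runtime, and there would be no issue. However, ? could also turn out to be some other integer, in which case the behavior is undefined.

Note that if a dimension size is dynamic in the result, there cannot be undefined behavior. Indeed, there is no "expected" size, so there cannot be a mismatch.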
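
The difference between static compatibility and a runtime match can be sketched as two small predicates. Here a dynamic dimension size (?) is modeled as None, and `is_compatible` and `runtime_matches` are hypothetical helper names, not StableHLO functions.

```python
from typing import Optional, Sequence

Dim = Optional[int]  # None stands for a dynamic dimension size ("?")

def is_compatible(a: Sequence[Dim], b: Sequence[Dim]) -> bool:
    """Static check: ranks match and each pair of sizes is equal or dynamic."""
    return len(a) == len(b) and all(
        x is None or y is None or x == y for x, y in zip(a, b))

def runtime_matches(actual: Sequence[int], expected: Sequence[Dim]) -> bool:
    """Runtime view: the concrete shape must agree with every static size."""
    return len(actual) == len(expected) and all(
        e is None or e == a for a, e in zip(actual, expected))

# tensor<?xf64> -> tensor<2xf64>: compatible statically ...
print(is_compatible([None], [2]))
# ... but a runtime operand of size 5 would be a mismatch (undefined behavior).
print(runtime_matches([5], [2]))
```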

Shape mismatches for binary elementwise operations

Consider the following toy program:

func.func @foo(%arg0: tensor<?xf64>, %arg1: tensor<?xf64>) {
  %0 = stablehlo.add %arg0, %arg1 : (tensor<?xf64>, tensor<?xf64>) -> tensor<?xf64>
  return
}

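When it comes to binary elementwise operations, the shapes of the inputs and the result must agree at runtime. At compile time, static dimensions must be equal, otherwise they merely need to be compatible. If any dimension is dynamic in the inputs, then there could be undefined behavior at runtime, because the dynamic size may not match the corresponding size in the other operand (be it static or dynamic). If all the inputs are static, then whether the result is dynamic or not does not matter: statically known dimensions are checked statically, and dynamic dimensions do not impose any constraints.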

Shape mismatches for ops that take their output shape as an operand

Consider the following toy program:

func.func @foo(%arg0: tensor<2xi32>) {
  %0 = stablehlo.dynamic_iota %arg0, dim = 0 : (tensor<2xi32>) -> tensor<3x4xi64>
  return
}

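The values in the shape operand at runtime must match the shape of the result, otherwise the behavior is undefined. That is, at runtime %arg0 must have the value dense<[3, 4]> : tensor<2xi32>. If the shape operand is constant, this can be verified statically. If the result shape is fully dynamic, then there cannot be a mismatch.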
