This document explains how to install and use the HLO Isolation API and CLI. The HLO isolation tooling helps developers and researchers isolate, verify, and debug numeric mismatches and stability issues in compiled HLO modules.
Installation
You can use HLO isolation components through the standard OpenXLA/TensorFlow build mechanism (Bazel) or via binary distribution.
Source and Bazel Setup
When building the compiler and tools stack from source, include or depend on the following libraries:
- API Target: //third_party/tensorflow/compiler/xla/tools/hlo_isolation:hlo_isolation_api
- CLI Target: //third_party/tensorflow/compiler/xla/tools/hlo_isolation:hlo_isolation_test
To build the standalone CLI tool:
bazel build -c opt //third_party/tensorflow/compiler/xla/tools/hlo_isolation:hlo_isolation_test
Command-Line Interface (CLI)
The hlo_isolation_test CLI allows you to isolate and run numeric mismatch and
stability checks against compiled HLO modules directly from a terminal. It
compares the execution results across different environments (e.g., TPU vs.
Defused TPU / CPU / Interpreter).
Flag Reference
The CLI supports the following flags:
- --hlo_file: Path to the input .hlo or .pbtxt file to load. Can be text or proto format (required).
- --test_platform: Target platform to run the primary test on (e.g., cpu, gpu, tpu). Defaults to cpu.
- --reference_platform: Reference platform for baseline comparison (e.g., interpreter). If empty, reference comparison is disabled.
- --filter_by_name: Regular expression to match the module name. Only matching modules will be run. Defaults to .*.
- --skip_by_name: Regular expression to match the module name. Matching modules will be skipped.
- --filter_by_opcode: Regular expression to match instruction opcodes. Only modules containing at least one matching opcode will be run. Defaults to .*.
- --skip_by_opcode: Regular expression to match instruction opcodes. Modules containing any matching opcode will be skipped.
- --abs_error_bound: Absolute error bound used for comparison. Defaults to 0.01.
- --rel_error_bound: Relative error bound used for comparison. Defaults to 0.1.
- --run_hlo_passes: Boolean flag to determine whether to run standard HLO passes on the submodules. Defaults to false.
- --shard_index: The specific shard index to run (zero-based). Defaults to -1 (disabled).
- --num_shards: The total number of shards. Defaults to 1.
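To illustrate how a filter and skip pattern pair might interact, here is a minimal sketch using std::regex. ShouldRunModule is a hypothetical helper, not part of the HLO isolation API. It assumes that patterns may match anywhere in the name (partial match) and that an empty skip pattern disables skipping; the tool's actual matching rules may differ.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the HLO isolation API): decides whether a
// module name passes a --filter_by_name / --skip_by_name pair.
// Assumptions: patterns match anywhere in the name, and an empty skip pattern
// means "skip nothing".
bool ShouldRunModule(const std::string& module_name,
                     const std::string& filter_pattern,
                     const std::string& skip_pattern) {
  // The module must match the filter to be considered at all.
  if (!std::regex_search(module_name, std::regex(filter_pattern))) {
    return false;
  }
  // A non-empty skip pattern that matches overrides the filter.
  if (!skip_pattern.empty() &&
      std::regex_search(module_name, std::regex(skip_pattern))) {
    return false;
  }
  return true;
}
```

Under these assumptions, a module named fusion.12 runs with the default filter .* and no skip pattern, and is excluded once the skip pattern fusion\.1 is supplied.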
Basic Invocation
./hlo_isolation_test \
  --hlo_file=/path/to/failing_fusion.hlo \
  --test_platform=gpu \
  --reference_platform=interpreter
Result and Artifact Dumps
When a submodule encounters a numeric mismatch or other failure during isolation testing, the tool automatically serializes debug artifacts to disk for deeper inspection.
Dump Contents
On numeric mismatch, the tool writes the following debug artifacts:
- The failed HLO submodule text (failed-module-<module_name>.txt).
- The expected output literal (failed-<module_name>-expected.txt).
- The actual output literal (failed-<module_name>-actual.txt).
- The mismatching elements summary (failed-<module_name>-mismatches.txt).
Dump Target Location
- Test Environment: If run via bazel test, or in any environment that defines the TEST_UNDECLARED_OUTPUTS_DIR environment variable, the artifacts are placed directly in that directory with the exact file names listed above (e.g., failed-<module_name>-expected.txt).
- Standard/Manual Run: When executed manually from the command line, the artifacts are written to the operating system's temporary directory (e.g., /tmp) under the same file names (e.g., /tmp/failed-<module_name>-expected.txt).
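The directory selection described above can be sketched as follows. ResolveDumpDir, ArtifactPath, and ModuleTextPath are hypothetical names used only for illustration; this mirrors the documented behavior and is not the tool's actual implementation.

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

// Hypothetical helper: prefer TEST_UNDECLARED_OUTPUTS_DIR when it is set and
// non-empty, otherwise fall back to the OS temporary directory.
std::filesystem::path ResolveDumpDir() {
  const char* test_outputs = std::getenv("TEST_UNDECLARED_OUTPUTS_DIR");
  if (test_outputs != nullptr && test_outputs[0] != '\0') {
    return std::filesystem::path(test_outputs);
  }
  return std::filesystem::temp_directory_path();
}

// Builds failed-<module_name>-<kind>.txt, where kind is one of
// "expected", "actual", or "mismatches".
std::filesystem::path ArtifactPath(const std::string& module_name,
                                   const std::string& kind) {
  return ResolveDumpDir() / ("failed-" + module_name + "-" + kind + ".txt");
}

// Builds failed-module-<module_name>.txt for the submodule text dump.
std::filesystem::path ModuleTextPath(const std::string& module_name) {
  return ResolveDumpDir() / ("failed-module-" + module_name + ".txt");
}
```

A script that collects artifacts after a run could use the same lookup order to find the files without hard-coding /tmp.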
C++ Integration and API
For developers building custom compiler passes, testing rigs, or automated pipelines, the C++ API provides a direct way to integrate isolation tests.
Using the API Directly
The core API provides functional interfaces to run modules and fetch structured reports:
#include "third_party/tensorflow/compiler/xla/tools/hlo_isolation/hlo_isolation_api.h"
xla::hlo_isolation::PipelineIsolationOptions options;
options.module_options.abs_error_bound = 0.01;
options.module_options.rel_error_bound = 0.1;
// Filter specific opcodes programmatically
options.filter_by_opcode = "exponential";
absl::StatusOr<std::vector<xla::HloIsolationTestResult>> results =
    xla::hlo_isolation::RunIsolationPipeline(
        input_hlo_module,
        &my_test_runner,
        &my_reference_runner,
        options);
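To make the roles of abs_error_bound and rel_error_bound concrete, the sketch below shows one common way to combine the two bounds: an element is reported only when it exceeds both. This combination rule is an assumption, since the exact rule is not stated in this document, and IsMismatch is a hypothetical function rather than part of the API.

```cpp
#include <cmath>
#include <limits>

// Hypothetical per-element check. Assumption: an element counts as a mismatch
// only if it exceeds BOTH the absolute and the relative error bound.
// NaN and infinite inputs are not handled in this sketch.
bool IsMismatch(double expected, double actual, double abs_error_bound,
                double rel_error_bound) {
  const double abs_error = std::fabs(actual - expected);
  double rel_error = 0.0;
  if (expected != 0.0) {
    rel_error = abs_error / std::fabs(expected);
  } else if (abs_error != 0.0) {
    // Any deviation from an exact zero is treated as infinitely large in
    // relative terms, so only the absolute bound can excuse it.
    rel_error = std::numeric_limits<double>::infinity();
  }
  return abs_error > abs_error_bound && rel_error > rel_error_bound;
}
```

With the default bounds (0.01 and 0.1), comparing 100.0 against 100.5 passes under this rule because the relative error is 0.005, while comparing 1.0 against 1.5 is flagged because both bounds are exceeded.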
Using the Test Mixin
When writing GoogleTest C++ test suites, you can inherit from
HloIsolationTestMixin for built-in assertion handling. The base class must
provide both a test runner and a reference runner (e.g., via
HloPjRtInterpreterReferenceMixin):
#include "third_party/tensorflow/compiler/xla/tests/hlo_pjrt_interpreter_reference_mixin.h"
#include "third_party/tensorflow/compiler/xla/tools/hlo_isolation/hlo_isolation_test_base.h"
class MyCustomPassIsolationTest
    : public xla::hlo_isolation::HloIsolationTestMixin<
          xla::HloPjRtInterpreterReferenceMixin<xla::HloPjRtTestBase>> {};

TEST_F(MyCustomPassIsolationTest, ChecksMyFusionSanity) {
  RunAndVerifyIsolationTest(my_failing_module);
}
Sharded Execution (K8s/Slurm)
For large modules or heavy test matrices, you can partition execution across
multi-device clusters (such as Google Kubernetes Engine or Slurm) using the
sharding flags. Each shard_index deterministically runs an isolated subset of
the decomposed submodules. This allows for reproducible distributed verification
and targeted re-execution of failing partitions.
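A deterministic assignment of submodules to shards could look like the sketch below. It assumes a simple round-robin scheme (submodule i goes to shard i modulo num_shards); the tool's actual partitioning is not specified here, and SubmodulesForShard is a hypothetical function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical round-robin partitioning. Returns the indices of the
// submodules that a given shard would run.
// A negative shard_index (the documented default of -1) disables sharding,
// so every submodule is selected.
std::vector<std::size_t> SubmodulesForShard(std::size_t num_submodules,
                                            int shard_index, int num_shards) {
  std::vector<std::size_t> selected;
  const bool sharding_disabled = shard_index < 0 || num_shards <= 1;
  for (std::size_t i = 0; i < num_submodules; ++i) {
    if (sharding_disabled ||
        i % static_cast<std::size_t>(num_shards) ==
            static_cast<std::size_t>(shard_index)) {
      selected.push_back(i);
    }
  }
  return selected;
}
```

Because the assignment depends only on the submodule index and the two flags, re-running a single failing shard with the same --num_shards and --shard_index values would select the same submodules under this scheme.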
Example: Kubernetes Job
Each test shard is executed as a separate Kubernetes pod using completionMode: Indexed.
Kubernetes sets the JOB_COMPLETION_INDEX environment variable in each pod of an
Indexed Job, and that value is passed directly to the CLI's --shard_index flag.
apiVersion: batch/v1
kind: Job
metadata:
  name: hlo-isolation-job
spec:
  completions: 50
  parallelism: 50
  completionMode: Indexed
  template:
    spec:
      # Job pod templates must set restartPolicy to Never or OnFailure.
      restartPolicy: Never
      containers:
      - name: test-runner
        image: gcr.io/my-project/hlo-isolation-tools:latest
        command: ["/bin/sh", "-c"]
        args:
        - |
          ./hlo_isolation_test \
            --num_shards=50 \
            --shard_index=$JOB_COMPLETION_INDEX \
            --hlo_file=/data/path/to/hlo.hlo
        volumeMounts:
        - name: hlo-data-volume
          mountPath: /data
      volumes:
      - name: hlo-data-volume
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: my-xla-debug-bucket
Key Capabilities
- Portability: Decouples internal test wrappers from the standalone API, making it easy to debug HLO mismatches locally.
- Granularity: Opcode and name filtering narrows a run to the relevant submodules, which shortens the debugging loop when working with very large HLO dumps.
- Extensibility: Custom runner execution callbacks and data injectors (make_fake_arguments_fn) permit full customization for advanced verification workflows.