XLA, which stands for Accelerated Linear Algebra, is a domain-specific compiler developed by Google to optimize and accelerate the execution of the core linear algebra operations commonly used in machine learning and other numerical computing tasks. XLA is most often associated with TensorFlow, where it is used to improve the performance of TensorFlow models, and, more recently, combined with an autograd engine, it forms the backbone of JAX. The primary goal of XLA is to take high-level mathematical operations defined in the frontend framework and compile them into optimized low-level code that can be executed on various hardware accelerators, such as GPUs and TPUs, the latter of which XLA was specifically designed to support.
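In JAX, for instance, any function wrapped in jax.jit is staged out of Python and handed to XLA for compilation; the first call builds an optimized executable for the current backend, and later calls with the same input shapes reuse it. A minimal sketch (the function and shapes below are purely illustrative, not taken from the text):

```python
import jax
import jax.numpy as jnp

# jax.jit stages the function out of Python and hands the traced
# computation to XLA for compilation.
@jax.jit
def layer(w, b, x):
    # A small dense layer: XLA is free to fuse the bias add and tanh
    # into a small number of optimized kernels.
    return jnp.tanh(x @ w + b)

w = jnp.ones((4, 8), jnp.float32)
b = jnp.zeros((8,), jnp.float32)
x = jnp.ones((2, 4), jnp.float32)

y = layer(w, b, x)  # first call triggers XLA compilation
print(y.shape)      # (2, 8); subsequent calls reuse the executable
```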
Its pipeline consists of decomposing the input program into a fixed set of high-level operations (HLO), then performing a series of target-independent optimization passes, such as weight-update sharding and kernel fusion, on this HLO IR, before offloading to device-specific libraries, such as cuDNN for NVIDIA GPUs or Eigen for CPUs, to handle hardware-dependent optimizations. Though originally built with its own IR syntax, over the years XLA has gradually integrated MLIR into its compilation pipeline, gaining the ability to both import and export MLIR modules through its corresponding HLO dialects (MHLO, LHLO), as well as adding optimization passes written entirely in MLIR.
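Both stages can be observed from JAX: lowering a jitted function yields the MLIR module (in the StableHLO dialect) that is handed to XLA, while compiling it yields the HLO after XLA's optimization passes have run. A rough sketch, reusing the illustrative layer function from above:

```python
lowered = jax.jit(layer).lower(w, b, x)

# The program as exported to XLA: an MLIR module in the StableHLO dialect.
print(lowered.as_text())

# The same program after XLA's optimization passes for the current backend;
# the bias add and tanh typically end up grouped inside a fusion op.
print(lowered.compile().as_text())
```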
In 2022, Google fully open sourced XLA, moving it to a new repository independent of TensorFlow, and launched the OpenXLA project to continue its support and broaden the ecosystem with the development of new tools such as StableHLO, a backwards-compatible MLIR dialect for representing HLO ops, and IREE, an end-to-end MLIR-based compiler and runtime.