
Compression

February 23, 2024

10 min read

Having covered quantization and pruning, it’s time to move on to some of the more popular algorithms and libraries for tensorization. As usual, we assume that you have gone over the introductory post of our model compression series.

Let’s quickly recap what tensorization is all about first. Tensorization is a model compression technique that decomposes the weight tensors of deep neural networks into smaller, lower-rank tensors.

In machine learning, it is used to reveal underlying patterns and structures within the data whilst also reducing its size. Tensorization has many practical use cases in ML, such as detecting latent structure in data (for example, when representing temporal or multi-relational data) and latent variable modelling.

In this blog post, we’ll go through the main tensorization libraries and outline their unique features and algorithms. Let’s dive in!

At the time of writing, PyTorch doesn’t have a tensorization API for these high-dimensional tensor decomposition algorithms. External libraries can fill the gap, one of the most prominent being TensorLy.

TensorLy is a Python library that aims at making tensor learning simple and accessible. It makes it easy to perform tensor decomposition, tensor learning and tensor algebra. Its backend system allows it to seamlessly perform computation with NumPy, PyTorch, JAX, MXNet, TensorFlow or CuPy, and run methods at scale on CPU or GPU.

TensorLy supports various tensor algebra operations that serve as building blocks for its core tensor decomposition algorithms and provide different ways to manipulate tensors. Some common tensor operations include:

- **Kronecker Product**, sometimes called the direct product or, for matrices, the tensor product, is a mathematical operation that combines two matrices (or tensors) by multiplying every element of the first by the whole of the second, resulting in a larger, block-structured tensor.
- **Khatri-Rao Product**, also known as the column-wise Kronecker product, is a type of matrix product that operates on two matrices with the same number of columns. Each column of the Khatri-Rao product is obtained by taking the Kronecker product of the corresponding columns in the input matrices.
- **N-Mode Product** is used in tensor decompositions such as CANDECOMP/PARAFAC (CP) decomposition and Tucker decomposition. It generalizes matrix multiplication to higher-order tensors: it multiplies a tensor by a matrix along a specified mode, contracting that mode and replacing it with a new one.
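To make these three operations concrete, here is a minimal sketch in plain NumPy. It does not use TensorLy's own routines; `khatri_rao` and `mode_dot` below are small local helpers written for illustration, and the array shapes are arbitrary choices.

```python
import numpy as np

# Kronecker product: every entry of A scales a full copy of B.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
kron = np.kron(A, B)  # shape (2*2, 2*2) = (4, 4)


def khatri_rao(U, V):
    """Column-wise Kronecker product of U (I, R) and V (J, R) -> (I*J, R)."""
    if U.shape[1] != V.shape[1]:
        raise ValueError("inputs must have the same number of columns")
    # Outer product of matching columns, then flatten the two row axes.
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])


U = np.arange(6).reshape(3, 2)
V = np.arange(4).reshape(2, 2)
kr = khatri_rao(U, V)  # shape (6, 2)


def mode_dot(X, M, mode):
    """n-mode product: multiply tensor X by matrix M along axis `mode`."""
    # Contract M's columns with X's `mode` axis; the new axis comes out first,
    # so move it back to where the contracted axis used to be.
    out = np.tensordot(M, X, axes=(1, mode))
    return np.moveaxis(out, 0, mode)


X = np.random.default_rng(0).normal(size=(3, 4, 5))
M = np.random.default_rng(1).normal(size=(2, 4))
Y = mode_dot(X, M, mode=1)  # shape (3, 2, 5): mode 1 went from size 4 to 2
```

Note how the n-mode product only changes the size of the mode it acts on, which is what lets decompositions shrink one dimension of a weight tensor at a time.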

TensorLy also has an *einsum* backend which allows for faster execution of these tensor algebra operations.

TensorLy builds on top of these fundamental algebraic blocks to provide several off-the-shelf tensor decomposition algorithms, including the CP, Tensor-Train (TT), and Tucker decompositions, which we have discussed in our introductory post.
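As a rough illustration of how a decomposition is assembled from those building blocks, the sketch below implements a truncated higher-order SVD, one simple way to compute a Tucker decomposition, in plain NumPy. This is not TensorLy's implementation; the helper names, tensor sizes and ranks are assumptions chosen for the example.

```python
import numpy as np


def unfold(X, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def mode_dot(X, M, mode):
    """n-mode product of tensor X with matrix M along axis `mode`."""
    return np.moveaxis(np.tensordot(M, X, axes=(1, mode)), 0, mode)


def truncated_hosvd(X, ranks):
    """Return (core, factors) of a Tucker decomposition via truncated HOSVD."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode.
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = X
    for mode, U in enumerate(factors):
        core = mode_dot(core, U.T, mode)  # project onto the factor subspace
    return core, factors


def tucker_to_tensor(core, factors):
    """Rebuild the full tensor from a core and one factor matrix per mode."""
    X = core
    for mode, U in enumerate(factors):
        X = mode_dot(X, U, mode)
    return X


# Build a tensor with exact multilinear rank (2, 3, 2), then decompose it.
rng = np.random.default_rng(0)
true_core = rng.normal(size=(2, 3, 2))
true_factors = [rng.normal(size=(n, r)) for n, r in zip((8, 9, 10), (2, 3, 2))]
X = tucker_to_tensor(true_core, true_factors)

core, factors = truncated_hosvd(X, ranks=(2, 3, 2))
X_hat = tucker_to_tensor(core, factors)

rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
n_params = core.size + sum(U.size for U in factors)  # 75 values instead of 720
```

Because the example tensor is exactly low rank, the reconstruction error is at the level of floating-point noise. Real weight tensors are only approximately low rank, so the chosen ranks trade accuracy for compression.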

Extending the core features of TensorLy, TensorLy-Torch provides out-of-the-box tensor layers that can be used to implement and train tensorized networks from scratch, or to fine-tune existing models by replacing their layers with tensorized counterparts from TensorLy-Torch.

The factorized / tensorized layers provided by TensorLy-Torch support tensors of any order (2D, 3D, and more) and include:

- **Factorized Convolutions**, which decompose the convolution filter into two or more smaller filters that are applied sequentially.
- **Tensorized Linear Layers**, where the 2D weight matrix of a linear layer is first tensorized (reshaped into a higher-dimensional tensor) and then factored using a high-dimensional decomposition / tensorization method.
- **Factorized Embedding Layers**, which can act as a drop-in replacement for PyTorch’s embeddings while using an efficient tensor parametrization that doesn’t need to reconstruct the table.
- **Tensor Regression and Contraction Layers**, which leverage the multi-linear structure of convolution activation maps while significantly reducing the number of parameters. This alleviates the need for a series of large fully connected layers at the end of a CNN to obtain the output.

Notably, tensorized layers can also directly replace their standard PyTorch counterparts in pre-trained networks, in which case the tensorized layer is initialized from the weights of the PyTorch layer. This approach, however, requires fine-tuning to retain model performance.
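To see why initializing from pre-trained weights still calls for fine-tuning, here is a minimal NumPy sketch. It is not the TensorLy-Torch API: a plain rank-`rank` matrix factorization obtained by truncated SVD stands in for the higher-order decompositions, and the random matrix `W` stands in for a pre-trained weight. All names and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained linear layer: y = W @ x + b
out_features, in_features, rank = 64, 128, 16
W = rng.normal(size=(out_features, in_features))
b = rng.normal(size=out_features)

# Initialize the two factors from W with a truncated SVD: W ≈ A @ B
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * S[:rank]  # (out_features, rank)
B = Vt[:rank, :]            # (rank, in_features)


def factorized_forward(x):
    """Apply the layer without ever rebuilding the full weight matrix."""
    return A @ (B @ x) + b


x = rng.normal(size=in_features)
y_dense = W @ x + b
y_factorized = factorized_forward(x)

dense_params = W.size          # 64 * 128 = 8192
fact_params = A.size + B.size  # 64 * 16 + 16 * 128 = 3072

# Approximation error introduced by dropping the smallest singular values.
rel_weight_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```

The factorized layer stores fewer than half the weights, but `rel_weight_error` is not zero, so its outputs drift from the original layer's. Fine-tuning is what recovers the lost accuracy after such a replacement.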

While AIMET (the AI Model Efficiency Toolkit) mostly supports quantization and pruning algorithms, it also comes packaged with a few tensorization utilities, including:

- **Spatial SVD**: Spatial singular value decomposition is a tensor decomposition technique which decomposes one large layer into two smaller layers. Given a conv layer with kernel (𝑚,𝑛,ℎ,𝑤), where 𝑚 is the number of input channels, 𝑛 the number of output channels, and ℎ, 𝑤 the height and width of the kernel itself, Spatial SVD will decompose the kernel into two kernels: one of size (𝑚,𝑘,ℎ,1) and one of size (𝑘,𝑛,1,𝑤), where 𝑘 is the rank. The smaller the value of 𝑘, the larger the degree of compression achieved.
- **Weight SVD**: Weight SVD differs from Spatial SVD in the shapes of the decomposed tensors. Specifically, Weight SVD will decompose the kernel into one of size (𝑚,𝑘,1,1) and another of size (𝑘,𝑛,ℎ,𝑤).
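The effect of the rank 𝑘 on compression follows directly from the kernel shapes given above. The short sketch below only does that arithmetic; it does not call AIMET, ignores biases, and uses hypothetical function names and layer sizes.

```python
def conv_params(m, n, h, w):
    """Weights in the original (m, n, h, w) kernel."""
    return m * n * h * w


def spatial_svd_params(m, n, h, w, k):
    """Weights after Spatial SVD: (m, k, h, 1) followed by (k, n, 1, w)."""
    return m * k * h + k * n * w


def weight_svd_params(m, n, h, w, k):
    """Weights after Weight SVD: (m, k, 1, 1) followed by (k, n, h, w)."""
    return m * k + k * n * h * w


# Example layer: 64 input channels, 128 output channels, 3x3 kernel, rank 32.
m, n, h, w, k = 64, 128, 3, 3, 32

original = conv_params(m, n, h, w)           # 73728
spatial = spatial_svd_params(m, n, h, w, k)  # 18432
weight = weight_svd_params(m, n, h, w, k)    # 38912

print(f"Spatial SVD keeps {spatial / original:.0%} of the weights")
print(f"Weight SVD keeps {weight / original:.0%} of the weights")
```

For this particular layer and rank, Spatial SVD keeps 25% of the weights and Weight SVD about 53%. Which scheme compresses more depends on the layer's shape and the rank chosen, so it is worth running the numbers per layer.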

While TensorLy and a few other libraries make up most of the tensorization tools in Python, it is worth mentioning that other languages / frameworks provide robust tools to apply tensorization techniques. Some of those tools include:

- **Tensorlab**: Tensorlab is a MATLAB package for tensor computations and decompositions. It offers functionality for handling tensors of any dimension, including building, visualizing, and applying several tensor decompositions, such as Canonical Polyadic Decomposition (CPD) and Multilinear Singular Value Decomposition (MLSVD). Tensorlab also supports Structured Data Fusion (SDF), enabling users to analyze multiple datasets simultaneously by enforcing structural constraints through factor transformation matrices.
- **TVM**: Apache TVM's "tensorize" function helps optimize tensor calculations by utilizing specific hardware features and micro-kernel functions. This feature lets users generate highly optimized schedules for tensor operations, resulting in better performance on different types of hardware such as CPUs, GPUs, and specialized processors. Moreover, TVM's automatic tensorization pass identifies suitable micro-kernels to substitute for the initial loop structures, ultimately boosting the efficiency of computations with variable shapes.

This concludes our tour of model tensorization tools and techniques. Tensorization remains far less well covered by tooling than other compression techniques, and most algorithms still have to be implemented manually. Just like with other model compression techniques, choosing the best algorithm requires careful analysis of the target model architecture and its behavior.

We hope this post helped you get a high-level understanding of tensorization! Stay tuned for the next post of our model compression series!
