Having covered quantization and pruning, it’s time to move on to some of the more popular algorithms and libraries for tensorization. As usual, we assume that you have gone through the introductory post of our model compression series.
Let’s quickly recap what tensorization is all about first. Tensorization is a model compression technique that decomposes the weight tensors of deep neural networks into smaller, lower-rank tensors.
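To make the idea concrete, here is a minimal NumPy sketch of the underlying principle (the 512 × 512 matrix and the rank of 32 are arbitrary illustration choices): a dense weight matrix gets replaced by two thin factors holding far fewer parameters.

```python
import numpy as np

# A hypothetical 512 x 512 dense weight matrix: 262,144 parameters.
W = np.random.randn(512, 512)

# Truncated SVD keeps only the top-32 singular components.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
rank = 32
A = U[:, :rank] * S[:rank]   # 512 x 32
B = Vt[:rank, :]             # 32 x 512

# The two factors hold 512*32*2 = 32,768 parameters: an 8x reduction.
W_approx = A @ B
print(W.size, A.size + B.size)  # 262144 32768
```

Note that a random matrix like this one compresses poorly in terms of approximation error; the point of the sketch is purely the parameter count. Trained weight matrices often have rapidly decaying spectra, which is what makes low-rank approximations viable in practice.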
In machine learning, it is used to reveal underlying patterns and structure within the data while also reducing its size. Tensorization has many practical use cases in ML, such as detecting latent structure in data (e.g., representing temporal or multi-relational data) and latent variable modelling.
In this blog post, we’ll walk through tensorization libraries and outline their unique features and algorithms. Let’s dive in!
As of the time of writing this post, PyTorch doesn’t have a tensorization API for these high-dimensional tensor decomposition algorithms, but external libraries can be used to fill the gap, one of the most prominent being TensorLy.
TensorLy is a Python library that aims to make tensor learning simple and accessible. It makes it easy to perform tensor decomposition, tensor learning, and tensor algebra. Its backend system allows it to seamlessly run computations with NumPy, PyTorch, JAX, MXNet, TensorFlow, or CuPy, and to execute methods at scale on CPU or GPU.
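For instance, switching backends is a one-liner, and the rest of your code stays unchanged:

```python
import tensorly as tl

# Computations run on NumPy arrays by default.
tl.set_backend('numpy')
x = tl.tensor([[1.0, 2.0], [3.0, 4.0]])

# Switching the backend makes the same API operate on torch tensors,
# so downstream code can run on GPU if one is available.
tl.set_backend('pytorch')
y = tl.tensor([[1.0, 2.0], [3.0, 4.0]])
print(type(y))  # <class 'torch.Tensor'>
```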
TensorLy supports various tensor algebra operations that serve as building blocks for its core tensor decomposition algorithms and cover the main ways to manipulate tensors. Some common tensor operations include:

- matricization (unfolding), which reshapes a tensor into a matrix along a given mode, and folding, its inverse;
- the n-mode product, which contracts a tensor with a matrix or vector along a chosen mode;
- the Kronecker and Khatri-Rao products, which build larger structured matrices from smaller ones.
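To make these concrete, here is a short sketch of unfolding, folding, and the n-mode product using TensorLy’s API (shapes are arbitrary illustration choices):

```python
import numpy as np
import tensorly as tl
from tensorly import tenalg

tl.set_backend('numpy')
X = tl.tensor(np.arange(24.0).reshape(2, 3, 4))

# Mode-0 unfolding flattens the 2x3x4 tensor into a 2x12 matrix ...
X0 = tl.unfold(X, mode=0)

# ... and folding inverts the operation.
assert np.allclose(tl.fold(X0, mode=0, shape=(2, 3, 4)), X)

# n-mode product: contract mode 1 with a 5x3 matrix -> shape (2, 5, 4).
M = np.random.randn(5, 3)
Y = tenalg.mode_dot(X, M, mode=1)
print(Y.shape)  # (2, 5, 4)
```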
TensorLy also has an einsum backend for its tensor algebra routines, which allows for faster execution of these operations.
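A minimal sketch of enabling it, assuming the tenalg backend switch available in recent TensorLy releases (the exact mechanism may differ across versions):

```python
import tensorly as tl
from tensorly import tenalg

# Dispatch tensor algebra routines (mode_dot, kronecker, khatri_rao, ...)
# to einsum-based implementations instead of the default 'core' ones,
# letting the backend fuse and optimize contractions.
tenalg.set_backend('einsum')
```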
TensorLy builds on top of these fundamental algebraic blocks to provide several off-the-shelf tensor decomposition algorithms, including the CP, Tucker, and Tensor-Train (TT) decompositions we discussed in our introductory post.
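Here’s a quick tour of all three on a random tensor; the shapes and ranks below are arbitrary illustration choices:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac, tucker, tensor_train

tl.set_backend('numpy')
X = tl.tensor(np.random.randn(8, 9, 10))

# CP: a sum of rank-1 terms, here with 5 components.
cp_weights, cp_factors = parafac(X, rank=5)

# Tucker: a small core tensor plus one factor matrix per mode.
core, tucker_factors = tucker(X, rank=[4, 4, 4])

# Tensor-Train: a chain of 3-way cores with the given TT-ranks
# (boundary ranks must be 1).
tt_factors = tensor_train(X, rank=[1, 3, 3, 1])

# Each factorization can be expanded back to a full tensor.
print(tl.cp_to_tensor((cp_weights, cp_factors)).shape)  # (8, 9, 10)
```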
Extending the core features of TensorLy, TensorLy-Torch provides out-of-the-box tensor layers that can be used to build and train tensorized networks from scratch, or to fine-tune existing models by replacing their layers with tensorized counterparts from TensorLy-Torch.
Factorized / tensorized layers provided by TensorLy-Torch include, for any order (2D, 3D, and higher):

- factorized convolutions (FactorizedConv);
- factorized linear layers (FactorizedLinear);
- factorized embeddings (FactorizedEmbedding);
- tensor regression layers (TRL) and tensor contraction layers (TCL).
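As an example, here is a factorized linear layer created from scratch; the tensorized shapes and rank are illustrative choices, and the parameter names follow the TensorLy-Torch documentation at the time of writing:

```python
import torch
import tltorch

# A linear layer mapping 512 -> 256 features, stored in Tucker form.
# The in/out dimensions are factorized as 8*8*8 and 8*8*4, and the
# float rank keeps roughly 10% of the original parameters.
layer = tltorch.FactorizedLinear(
    in_tensorized_features=(8, 8, 8),
    out_tensorized_features=(8, 8, 4),
    factorization='tucker',
    rank=0.1,
)

x = torch.randn(32, 512)
print(layer(x).shape)  # torch.Size([32, 256])
```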
Notably, tensorized layers can also directly replace their standard PyTorch counterparts in pre-trained networks, in which case the tensorized layer is initialised with the weights of the PyTorch layer. This approach, however, requires fine-tuning to retain model performance.
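A sketch of this workflow, assuming TensorLy-Torch’s from_linear classmethod (argument names may vary across releases):

```python
import torch
import tltorch

pretrained = torch.nn.Linear(512, 256)

# Build a factorized layer initialised from the pre-trained weights;
# the dense weight is decomposed at construction time.
factorized = tltorch.FactorizedLinear.from_linear(
    pretrained,
    in_tensorized_features=(8, 8, 8),
    out_tensorized_features=(8, 8, 4),
    rank=0.1,
)

# Drop-in replacement: same input/output shapes as the original layer,
# so it can be swapped into the network before fine-tuning.
x = torch.randn(32, 512)
print(factorized(x).shape)  # torch.Size([32, 256])
```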
While AIMET mostly supports quantization and pruning algorithms, it also comes packaged with a few tensorization utilities, including:

- Weight SVD, which uses singular value decomposition to split a layer’s weight matrix into two smaller, lower-rank matrices;
- Spatial SVD, which decomposes a single k × k convolution into a pair of smaller (k × 1 and 1 × k) convolutions.
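Below is a hedged sketch of running Spatial SVD through AIMET’s compression API; the module paths and signatures follow AIMET’s documentation but may differ between releases, and evaluate is a user-supplied placeholder callback:

```python
from decimal import Decimal

import torch
from torchvision.models import resnet18
from aimet_common.defs import (CompressionScheme, CostMetric,
                               GreedySelectionParameters)
from aimet_torch.defs import SpatialSvdParameters
from aimet_torch.compress import ModelCompressor


def evaluate(model: torch.nn.Module, iterations: int, use_cuda: bool) -> float:
    # Placeholder: return the model's accuracy on a validation set.
    return 0.0


model = resnet18().eval()

# Let AIMET pick per-layer ranks greedily to hit 80% of the original MACs.
greedy = GreedySelectionParameters(target_comp_ratio=Decimal(0.8),
                                   num_comp_ratio_candidates=10)
params = SpatialSvdParameters(mode=SpatialSvdParameters.Mode.auto,
                              params=SpatialSvdParameters.AutoModeParams(greedy))

compressed_model, stats = ModelCompressor.compress_model(
    model,
    eval_callback=evaluate,
    eval_iterations=10,
    input_shape=(1, 3, 224, 224),
    compress_scheme=CompressionScheme.spatial_svd,
    cost_metric=CostMetric.mac,
    parameters=params,
)
```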
While TensorLy and a few other libraries make up most of the tensorization tooling in Python, it is worth mentioning that other languages and frameworks also provide robust tools for applying tensorization techniques, for example the Tensor Toolbox and Tensorlab for MATLAB, the TT-Toolbox (available for MATLAB and Python), and rTensor for R.
This concludes our tour of model tensorization tools and techniques. Tensorization remains largely under-covered by mainstream tooling, with most algorithms still implemented manually. Just like with other model compression techniques, choosing the best algorithm requires careful analysis of the target model architecture and its behavior.
We hope this post helped you get a high level understanding of tensorization! Stay tuned for the next post of our model compression series!
Use the Unify API to send your prompts to the best LLM endpoints and get your LLM applications flying