Tensorization is a model compression technique that breaks down the weight tensors of deep neural networks into smaller, lower-rank tensors to reveal underlying patterns and reduce...
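To make the idea concrete, here is a minimal sketch (not from the post itself) of the simplest case: factoring a 2-D weight matrix with a truncated SVD so a layer stores two small factors instead of one large matrix. The shapes and rank below are illustrative assumptions; higher-order decompositions (e.g. Tucker or tensor-train) generalize the same idea.

```python
import numpy as np

# Hypothetical dense-layer weight: 512 x 256 = 131,072 parameters.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))

# Truncated SVD: keep only the top-r singular components (r is a
# tuning knob trading accuracy for compression; 32 is arbitrary here).
r = 32
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]   # 512 x r factor
B = Vt[:r, :]          # r x 256 factor

# The layer now stores A and B instead of W: W ~= A @ B.
original = W.size            # 131,072 parameters
compressed = A.size + B.size  # 24,576 parameters
print(f"compression ratio: {original / compressed:.1f}x")
```

The reconstruction `A @ B` is only an approximation of `W`, so in practice the rank is chosen (and the network fine-tuned) to recover the lost accuracy.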
If the last century has taught us anything about intelligence, it's that general intelligence is always an emergent property of an optimization algorithm. It is not hand-crafted or hand-engineered; it just pops out from a simple set of rules mixed with a lot of data and compute...
In the previous blog post of our model compression series, we went over the available quantization libraries and their features. In a similar fashion, we will now go over the packages and...
We’re very excited to announce The Unify LLM Hub: a collection of LLM endpoints, with live runtime benchmarks all plotted across time 📈 Knowing which LLM to use is very complex, and even after deciding which model to use, it’s equally complex to choose the right provider.
The LLM landscape is incredibly fast-moving, with new models coming out every week. In the last few weeks alone, Mamba showed that structured state space models are more...
Following up on our model compression blog post series, we will now delve into quantization, one of the more powerful compression techniques that we can leverage to reduce the size and memory footprint of our models. Going forward, we will assume that you have read the first blog post of the series, where we introduced the concept of quantization. Building on top of this introduction...
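As a taste of what quantization does, here is a minimal, self-contained sketch (an illustration, not the post's actual method): symmetric per-tensor int8 quantization, where each float32 weight is mapped to an 8-bit integer plus one shared scale, cutting storage by 4x at the cost of a small rounding error. The function and variable names are our own for this example.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 takes 1 byte per weight vs 4 for float32; the rounding error
# per weight is bounded by half the quantization step (scale / 2).
max_err = np.abs(w - w_hat).max()
print(f"max abs error: {max_err:.4f} (scale = {scale:.4f})")
```

Real libraries refine this basic recipe with per-channel scales, zero points for asymmetric ranges, and calibration or quantization-aware training.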