Candle

Tags

apache-2.0
compilers
framework
inference-optimizer
mit

Link

https://github.com/huggingface/candle

Description

Candle is a minimalist machine learning framework written in Rust and maintained by Hugging Face, with a focus on efficient serverless inference. Although it is a relatively young project, Candle already runs efficiently on both GPUs and CPUs. It builds on compute primitive libraries such as cuDNN for NVIDIA GPUs and Intel MKL for CPUs, and it makes it easy to add custom kernels for state-of-the-art algorithms such as FlashAttention. Candle removes Python from production workloads while keeping an easy-to-use, PyTorch-like syntax, which reduces runtime overhead and allows models to be deployed as lightweight binaries. It also supports WebAssembly (WASM), so models can run entirely in the browser.

Features

  • Efficient Serverless Inference
  • Lightweight Binaries
  • Easy Integration of Hugging Face Model Weights
  • WASM Support