Prunes keys and values based on the CUR decomposition using approximate leverage scores. Integrated into NVIDIA's KVPress library for efficient LLM inference.

CURPress in NVIDIA KVPress Library

This toolkit provides a flexible framework for fine-tuning large language models with a range of selective Parameter-Efficient Fine-Tuning (PEFT) methods.

Selective PEFT Toolkit

This toolkit provides a collaborative framework for distilling knowledge from large language models into smaller, more efficient models.

Collaborative Model Distillation (MPDistil)

This package contains implementations of manifold-preserving transformer architectures that maintain the geometric properties of data during transformations.

Manifold-Preserving Transformers (TransJect)

A comprehensive suite of tools for compressing large language models using various techniques.

Efficient LLM Compression Suite

A toolkit for distilling large language models into smaller, efficient models.

LLM Distillation Toolkit

A fork of LoRA that incorporates robustness techniques to improve model performance under adversarial conditions.