Publications
A list of works from our research group on making large language models (LLMs) efficient.
arXiv Preprint
The Art of Scaling Test-Time Compute for Large Language Models
Test-Time Scaling · Large Language Models
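Background for the tags above: one widely used test-time scaling strategy is best-of-N sampling, where extra inference compute buys multiple candidate answers and a verifier picks one. The sketch below illustrates only that generic idea, not the recipe analyzed in the paper; `generate_candidates` and `verifier_score` are hypothetical stand-ins for an LLM decoder and a reward model.

```python
import random

def generate_candidates(prompt: str, n: int) -> list[str]:
    # Stand-in for n stochastic decodes from an LLM (hypothetical helper).
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def verifier_score(answer: str) -> float:
    # Stand-in for a reward model or verifier (hypothetical helper).
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Spend extra inference compute: sample n answers, keep the best-scoring one.
    return max(generate_candidates(prompt, n), key=verifier_score)

print(best_of_n("What is 17 * 24?"))
```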
NeurIPS 2025
Value-Guided KV Compression for LLMs via Approximated CUR Decomposition
KV Cache Compression · Long-Context Inference
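Background for the tags above: a CUR decomposition approximates a matrix A as C·U·R, where C and R are actual columns and rows of A, so a KV cache can be stored through a small subset of its entries. The numpy sketch below is a textbook CUR on a toy low-rank "key cache" with simple norm-based sampling; the paper's value-guided selection is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy low-rank key cache: (tokens, head_dim); low rank makes the demo exact.
K = rng.normal(size=(1024, 8)) @ rng.normal(size=(8, 64))

def cur_approx(A: np.ndarray, k: int):
    # Sample k columns and k rows with probability proportional to their
    # squared norms (a crude stand-in for leverage scores), then solve for
    # the small core matrix U so that A ~= C @ U @ R.
    col_p = (A ** 2).sum(axis=0); col_p /= col_p.sum()
    row_p = (A ** 2).sum(axis=1); row_p /= row_p.sum()
    cols = rng.choice(A.shape[1], size=k, replace=False, p=col_p)
    rows = rng.choice(A.shape[0], size=k, replace=False, p=row_p)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

C, U, R = cur_approx(K, k=16)
err = np.linalg.norm(K - C @ U @ R) / np.linalg.norm(K)
print(f"relative Frobenius error: {err:.3f}")  # near zero for a rank-8 cache
```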
ACL 2025
On the Generalization vs Fidelity Paradox in Knowledge Distillation
Knowledge Distillation · Efficient Architectures
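Background for the tags above: the classic soft-label distillation loss (Hinton et al., 2015) matches a student's temperature-softened output distribution to the teacher's. A minimal numpy sketch of that standard loss follows as context only; it does not reproduce the paper's generalization-vs-fidelity analysis.

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T: float = 2.0) -> float:
    # KL(teacher || student) on temperature-softened distributions, scaled by
    # T^2 so gradient magnitudes stay comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T ** 2)

teacher = np.array([[4.0, 1.0, -2.0]])  # toy logits for a 3-class problem
student = np.array([[2.5, 0.5, -1.0]])
print(f"KD loss: {distillation_loss(student, teacher):.4f}")
```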
TMLR 2025
Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation
Parameter-Efficient Fine-tuning · Bayesian Methods
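Background for the tags above: the paper builds on LoRA, which freezes a pretrained weight W and learns a low-rank update (alpha/r)·BA. Below is a minimal numpy sketch of vanilla LoRA with toy dimensions; the Bayesian reparameterization itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 512, 8, 16

W = rng.normal(size=(d, d))              # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d))  # trainable down-projection
B = np.zeros((d, r))                     # zero-init up-projection, so the
                                         # adapted model starts exactly at W

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Only A and B (2*d*r parameters) are trained, instead of all d*d.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d,))
print(np.allclose(lora_forward(x), W @ x))  # True until B receives updates
```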
TMLR 2025
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
Scaling Laws · Neural Networks
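Background for the tags above: in its simplest form a scaling law is a power law L(N) = a·N^(-alpha), which becomes a straight line in log-log space and can be fit by least squares. The sketch below demonstrates that on synthetic numbers; practical fits (e.g., Chinchilla-style) add an irreducible-loss term omitted here.

```python
import numpy as np

# Synthetic losses at increasing model sizes (illustrative numbers only).
N = np.array([1e7, 1e8, 1e9, 1e10])  # parameter counts
L = 5.0 * N ** -0.07                 # generated from a known power law

# A pure power law L(N) = a * N^(-alpha) is linear in log space:
#   log L = log a - alpha * log N
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha, a = -slope, np.exp(intercept)
print(f"fitted: alpha = {alpha:.3f}, a = {a:.2f}")
print(f"extrapolated loss at 1e11 params: {a * 1e11 ** -alpha:.3f}")
```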
ICLR 2025
You Only Prune Once: Designing Calibration-Free Model Compression with Policy Learning
Model Compression · Pruning
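Background for the tags above: the simplest calibration-free baseline is global magnitude pruning, which zeroes the smallest-magnitude weights without any calibration data. The numpy sketch below shows that baseline only; the paper's policy-learning method is different and not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
layers = {"fc1": rng.normal(size=(256, 256)),
          "fc2": rng.normal(size=(128, 256))}

def global_magnitude_prune(weights: dict, sparsity: float = 0.5) -> dict:
    # Zero out the smallest-magnitude weights across all layers at once.
    all_w = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    threshold = np.quantile(all_w, sparsity)
    return {name: np.where(np.abs(w) < threshold, 0.0, w)
            for name, w in weights.items()}

pruned = global_magnitude_prune(layers, sparsity=0.5)
kept = sum(int((w != 0).sum()) for w in pruned.values())
total = sum(w.size for w in layers.values())
print(f"density after pruning: {kept / total:.2f}")
```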
ICML 2025
Enough of Scaling LLMs! Let’s Focus on Downscaling
Scaling Laws · Downscaling
TACL 2025
Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models
Parameter-Efficient Fine-tuning · Sparse Training
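Background for the tags above: step-by-step unmasking methods grow the set of trainable parameters gradually instead of fixing it up front. The sketch below is a hypothetical minimal version using gradient magnitude as the importance proxy; the paper's actual unmasking criterion and schedule are not shown, and `unmask_step` is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
mask = np.zeros_like(W, dtype=bool)  # start with all parameters frozen

def unmask_step(grad: np.ndarray, mask: np.ndarray, k: int = 128) -> np.ndarray:
    # Unfreeze the k still-frozen parameters with the largest gradient
    # magnitude; gradient magnitude is just one simple importance proxy.
    scores = np.where(mask, -np.inf, np.abs(grad))
    flat = np.argsort(scores, axis=None)[-k:]
    mask[np.unravel_index(flat, mask.shape)] = True
    return mask

for step in range(4):                # the trainable subset grows each step
    grad = rng.normal(size=W.shape)  # stand-in for a real backward pass
    mask = unmask_step(grad, mask)
    print(f"step {step}: trainable fraction = {mask.mean():.3f}")
```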
ICLR 2024
A Good Learner can Teach Better: Teacher-Student Collaborative Knowledge Distillation
Knowledge Distillation · Efficient Architectures
EMNLP 2023
Manifold-Preserving Transformers are Effective for Short-Long Range Encoding
Transformers · Deep Learning