🆕 Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding, ICML 2025. [PDF][X Post]
🆕 SLIM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression, ICML 2025. [PDF]
🆕 SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity, ICML 2025. [TBD]
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws, ICLR 2025. [PDF]
SLOPE: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs, ICLR 2025. [PDF]
Effective Interplay between Sparsity and Quantization: From Theory to Practice, ICLR 2025. [PDF] ** 🏆 Spotlight Presentation **
Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers, CPAL 2025. [PDF][Source] ** Oral Presentation **
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization, NeurIPS 2024. [PDF]
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models, ICML 2024. [PDF]
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models, ICASSP 2024. [PDF] ** Oral Presentation **
Jaxpruner: A Concise Library for Sparsity Research, CPAL 2024. [PDF][Source] ** Oral Presentation **
STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition, ICML 2023. [PDF]
ReLeQ: A Reinforcement Learning Approach for Automatic Deep Quantization of Neural Networks, IEEE Micro 2020. [PDF]
🆕 ECO: An LLM-Driven Efficient Code Optimizer for Warehouse Scale Computers, arXiv 2025. [PDF]
Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion, ISCA 2025. [PDF]
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture, IEEE CAL 2025. [PDF]
CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming, NeurIPS 2024. [PDF]
TAO: Re-Thinking DL-based Microarchitecture Simulation, ACM SIGMETRICS / IFIP PERFORMANCE 2024. [PDF]
Learning Performance-Improving Code Edits, ICLR 2024. [PDF][Source] ** 🏆 Spotlight Presentation ** || ** 🏆 MICRO Top Picks 2025 **
GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation, IISWC 2022. [PDF][Source]
An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks, IISWC 2022. [PDF]
Data-Driven Offline Optimization for Architecting Hardware Accelerators, ICLR 2022. [PDF][Source]
Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation, ICLR 2020. [PDF]
RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving, ISCA 2025. [PDF]
LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading, ISCA 2025. [TBD]
DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics, ISCA 2024. [PDF] ** 🏆 Distinguished Artifact Award **
Tandem Processor: Grappling with Emerging Operators in Neural Networks, ASPLOS 2024. [PDF][Source] ** 🏆 MICRO Top Picks Honorable Mention **
In-Storage Domain-Specific Acceleration for Serverless Computing, ASPLOS 2024. [PDF][Source]
MESA: Microarchitecture Extensions for Spatial Architecture Generation, ISCA 2023. [PDF] ** 🎖️ Inducted into the ISCA Hall of Fame **
Architecture Gym for Benchmarking Machine-Learning Aided Design, ISCA 2023. [PDF][Source]
Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation, MICRO 2022. [PDF]
Accelerating Attention through Gradient-Based Learned Runtime Pruning, ISCA 2022. [PDF]
AxMemo: Hardware-Compiler Co-design for Approximate Code Memoization, ISCA 2019. [PDF]
GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks, ISCA 2018. [PDF]
SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks, ISCA 2018. [PDF]
Towards Statistical Guarantees in Controlling Quality Tradeoffs for Approximate Acceleration, ISCA 2016. [PDF]
Neural Acceleration for GPU Throughput Processors, MICRO 2015. [PDF]
General-purpose Code Acceleration with Limited-precision Analog Computation, ISCA 2014. [PDF][Retrospective] ** 🏆 MICRO Top Picks Honorable Mention **