Table of Links
-
Analysis
-
Experiments Results
-
Practical Inference Speedup Evaluation
A. Appendix / supplemental material
C Broader Impact
The paper introduces a dReLU-based sparsification method and verifies its effectiveness on both dense and MoE LLMs. This approach significantly reduces computational demands, addresses environmental concerns through lower energy consumption, and helps democratize access to advanced AI technologies. We believe that our work can better support smaller organizations, educational institutions, and researchers, who previously faced barriers due to resource limitations, in accessing LLMs more easily.
Authors:
(1) Yixin Song, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University;
(2) Haotong Xie, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University;
(3) Zhengyan Zhang, Department of Computer Science and Technology, Tsinghua University;
(4) Bo Wen, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University;
(5) Li Ma, Shanghai Artificial Intelligence Laboratory;
(6) Zeyu Mi, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University Mi [email protected]);
(7) Haibo Chen, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University.
This paper is
