
Related Work


Table of Links

  1. Introduction
  2. Related Work
  3. Methodology
  4. Experiments
  5. Conclusion and References

2. Related Work

The early-bird ticket hypothesis was first introduced by Frankle et al. [5] in the context of CNNs. They discovered that subnetworks capable of matching the performance of fully-trained networks can be identified early in the training process. This finding has led to the development of various techniques for identifying and exploiting early-bird tickets in CNNs [1, 13].

In the domain of Transformers, explorations of the early-bird ticket hypothesis have been limited. One notable work is EarlyBERT by Kovaleva et al. [2], which investigated the applicability of the early-bird ticket hypothesis to BERT. They found that early-bird tickets exist in BERT and can be used to optimize the fine-tuning process. However, their work focused solely on BERT and did not provide a comparative analysis across different Transformer architectures.

Other works have explored various techniques for optimizing the training and inference of Transformer models. For example, Michel et al. [8] proposed a method to prune attention heads in Transformers, reducing computational requirements while maintaining performance. Sanh et al. [9] introduced DistilBERT, a distilled version of BERT that achieves comparable performance with fewer parameters and faster inference.

Despite these efforts, the potential speedup and resource savings achievable through the early-bird ticket hypothesis in Transformers have not been fully explored. Many existing works rely on the slow train-prune-retrain methodology [6], which is time-consuming and resource-intensive. In this research, we address these limitations by investigating the early-bird ticket hypothesis across different Transformer architectures, including vision transformers and language models. We explore efficient methods to identify early-bird tickets and evaluate their performance against fully-trained models. Our goal is to provide insight into the applicability of the early-bird ticket hypothesis in Transformers and to contribute to the development of more efficient training strategies for these powerful models.
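
For concreteness, the sketch below illustrates one common way early-bird tickets are detected: compute a magnitude-based pruning mask from a layer's weights after each epoch, and declare a ticket once the mask stops changing between consecutive epochs. This is an illustrative assumption rather than a detail stated in this section; the function names, the 50% sparsity level, and the stopping tolerance are placeholders.

```python
import torch


def magnitude_mask(weights: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude (1 - sparsity) fraction of weights; 1 = kept, 0 = pruned."""
    k = max(1, int(weights.numel() * (1.0 - sparsity)))
    threshold = torch.topk(weights.abs().flatten(), k).values.min()
    return (weights.abs() >= threshold).float()


def mask_distance(mask_a: torch.Tensor, mask_b: torch.Tensor) -> float:
    """Fraction of positions where two pruning masks disagree (normalized Hamming distance)."""
    return (mask_a != mask_b).float().mean().item()


def find_early_bird(weight_snapshots, sparsity=0.5, tol=0.1):
    """Return the first epoch whose mask differs from the previous epoch's mask by less than tol."""
    prev_mask = None
    for epoch, weights in enumerate(weight_snapshots):
        mask = magnitude_mask(weights, sparsity)
        if prev_mask is not None and mask_distance(mask, prev_mask) < tol:
            return epoch, mask
        prev_mask = mask
    return None, prev_mask


# Toy usage: synthetic "snapshots" stand in for a layer's weights saved after each epoch,
# converging toward a fixed tensor so the pruning mask stabilizes over time.
base = torch.randn(1024)
snapshots = [base + (0.5 ** t) * torch.randn(1024) for t in range(8)]
epoch, mask = find_early_bird(snapshots)
print("early-bird epoch:", epoch)
```

In practice, once such an epoch is found, training of the full model can stop and the masked subnetwork can be trained or fine-tuned instead, which is the source of the potential compute savings discussed above.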


Author:

(1) Shravan Cheekati, Georgia Institute of Technology ([email protected]).


This paper is available on arxiv under CC BY-SA 4.0 DEED license.



