Transformers have emerged as the leading architecture in deep learning across a wide range of applications, particularly in natural language processing (NLP) and computer vision (CV). Despite their success, designing effective Transformer models remains a complex and resource-intensive task, owing to their intricate architecture and the substantial computational demands of training and optimization. Neural Architecture Search (NAS) offers a promising way to address these challenges by automating the search for optimal Transformer architectures. In this report, I examine the key concepts behind NAS and Transformers, present notable results from the NAS-for-Transformers literature, and discuss existing limitations as well as potential future directions.
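As a concrete illustration of what "searching for optimal Transformer architectures" means, below is a minimal sketch of the simplest NAS baseline: random search over a toy Transformer search space. The search space, the `proxy_score` heuristic, and all names are illustrative assumptions for this sketch, not methods or results from the report; a real NAS run would train or estimate each candidate instead of using a toy proxy.

```python
import random

# Hypothetical search space for a small Transformer; the ranges are
# illustrative, not taken from the report.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "num_heads": [2, 4, 8],
    "hidden_dim": [128, 256, 512],
    "ffn_dim": [256, 512, 1024],
}

def sample_architecture(space):
    """Draw one candidate architecture uniformly at random."""
    return {name: random.choice(choices) for name, choices in space.items()}

def proxy_score(arch):
    """Toy stand-in for validation accuracy: rewards model capacity but
    penalizes parameter count. In practice this would be replaced by
    training (or a trained-accuracy predictor) for each candidate."""
    capacity = arch["num_layers"] * arch["hidden_dim"]
    params = arch["num_layers"] * arch["hidden_dim"] * arch["ffn_dim"]
    return capacity / (1.0 + params / 1e6)

def random_search(space, n_trials=20):
    """Simplest NAS baseline: sample candidates, score them, keep the best."""
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(space)
        score = proxy_score(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

if __name__ == "__main__":
    arch, score = random_search(SEARCH_SPACE)
    print(f"best architecture: {arch} (proxy score {score:.3f})")
```

More sophisticated NAS methods covered in the report (e.g., evolutionary search or weight-sharing supernets) replace the sampling loop and the scoring function, but the overall structure of "propose, evaluate, select" stays the same.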
For a more detailed analysis, please refer to report_nas_for_transformers.pdf.