An NLP-based method for mining gene-function relationships from published articles.
Understanding gene function is crucial for advancing our knowledge of biological systems and for developing new treatments for diseases. It remains a challenging task, however, because biological systems are complex, the understanding of gene function keeps evolving, and research articles use specialized language. To address this, the authors propose PATHAK, a method that uses a pre-trained Transformer language model to identify relationships between genes and their potential Gene Ontology (GO) term definitions. They applied the method to a large dataset of papers on Arabidopsis thaliana, and they hope to continue exploring its potential for advancing our understanding of gene function and facilitating the discovery of new treatments for diseases.
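
The summary above does not spell out how PATHAK scores a gene against a term definition, so the following is only a minimal sketch of the general idea: take a sentence that mentions a gene and rank candidate term definitions by how well they match it. The scorer here is a toy bag-of-words cosine similarity, used purely as a stand-in for a pre-trained Transformer; it is not the authors' method. All names (`rank_terms`, `bow_cosine`, `TERM_COLD`, `GENE1`) and the example texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A scorer maps (sentence, definition) to a similarity value.
# In a real system this slot could be filled by a Transformer model;
# that is an assumption, not something shown here.
Scorer = Callable[[str, str], float]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bow_cosine(a: str, b: str) -> float:
    """Toy stand-in scorer: cosine similarity of word-count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def rank_terms(
    sentence: str,
    definitions: Dict[str, str],
    scorer: Scorer = bow_cosine,
) -> List[Tuple[str, float]]:
    """Score every candidate definition against the sentence, best first."""
    scored = [(term, scorer(sentence, text)) for term, text in definitions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical term labels and definition texts, invented for illustration.
TOY_DEFINITIONS = {
    "TERM_COLD": "a process that changes the state of a cell as a result of a cold stimulus",
    "TERM_LEAF": "the process whose outcome is the progression of the leaf over time",
}

# Hypothetical sentence mentioning a made-up gene name.
TOY_SENTENCE = "GENE1 expression increases when seedlings are exposed to a cold stimulus"

ranking = rank_terms(TOY_SENTENCE, TOY_DEFINITIONS)
for term, score in ranking:
    print(f"{term}\t{score:.3f}")
```

Because the scorer is passed in as a parameter, the word-overlap function could be swapped for a model-based similarity without changing the ranking logic. Word overlap alone would miss paraphrases and domain-specific phrasing, which is the kind of gap a pre-trained language model is meant to close.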