Using transformers to transform chemistry
In these tutorials, we will build from scratch some of the key pieces behind recent applications of transformers, in particular LLMs, in chemistry and materials science.
pip install -r requirements.txt
For parts of this tutorial, you will need to have access to a GPU, TPU, or Apple Silicon. This is because we will train small language models, and it will not be feasible to train them on a CPU.
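Before starting a training run, it can help to check which accelerator is actually visible from Python. The following is a minimal sketch, assuming the notebooks use PyTorch (the source does not state the framework); `pick_device` is a hypothetical helper, not part of this repository. It does not detect TPUs, which typically require an additional package such as `torch_xla`.

```python
def pick_device():
    """Return the name of the best available device: "cuda", "mps", or "cpu"."""
    try:
        import torch  # assumption: PyTorch is the training framework
    except ImportError:
        # Without PyTorch installed we cannot probe for accelerators.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU (e.g. on Colab or Kaggle)
    # torch.backends.mps only exists in recent PyTorch versions, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```

If this prints `cpu`, the training cells will likely be too slow to be practical, and a hosted GPU is the better choice.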
If you do not have access to a GPU, you can consider the following options:
- Google Colab: You can use Google Colab, which provides free access to a GPU. The downside is that, in the free tier, you cannot let it run in the background.
- Kaggle: You can use Kaggle, which provides free access to a GPU.
- Lightning Studio: You can use Lightning Studio, which provides some free GPU hours. However, you need to wait for your account to be validated.
In addition, the major cloud providers offer free credits to students:
- Google Vertex AI - A suite of AI and machine learning APIs provided by Google Cloud. $150 credits upon signup
- Azure AI Studio - A collection of AI services and APIs offered by Microsoft Azure. Students start with $100 free Azure credits
Some parts assume that you have access to an LLM using an API key. We recommend using OpenAI, but you can also choose other providers such as
Make sure to add the API key(s) to an .env file. You can see an example in the .env.template file.
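As an illustration of how a key stored in the .env file can reach your code, here is a minimal sketch that uses only the standard library. `load_env_file` is a hypothetical helper, and `OPENAI_API_KEY` is an assumed variable name; check .env.template for the names this repository actually expects. Packages such as `python-dotenv` offer the same functionality via `load_dotenv()`.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    loaded = {}
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        # Variables already set in the shell take precedence over the file.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


load_env_file()
# Assumed variable name; the real one is defined in .env.template.
api_key = os.environ.get("OPENAI_API_KEY")
if api_key is None:
    print("No OPENAI_API_KEY found; add it to your .env file.")
```

Keeping the key in .env rather than in a notebook cell avoids committing it to version control by accident.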
Notebook | Description | Colab |
---|---|---|
llm-from-scratch.ipynb | Building an LLM that generates molecules from scratch | |
llm-agent.ipynb | Building a tool-augmented LLM agent | |
These tutorials are based on blog posts originally published on Kevin Jablonka's blog. There you can also find solutions for the unfilled cells.
- arxiv-synth: Shows how to retrieve papers from ArXiv and summarize them using an LLM
This work was supported by the Carl Zeiss Foundation.