In small-context settings (currently 5 papers of 4 pages each), can we make LLMs provide answers with citations at inference time?
- To our knowledge, nothing like this has been done before. [Please do send material if there is.]
To Run (using Ollama)
- Follow this Gist to set up Ollama locally.
- Run all cells in `ollama.ipynb`.
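
Below is a minimal sketch of how the notebook can talk to the local Ollama server from Python, assuming the standard HTTP API on port 11434; the model name and prompt are placeholders, not necessarily what `ollama.ipynb` uses.

```python
# Minimal sketch: query a locally running Ollama server over its HTTP API.
# Assumes `ollama serve` is running and a model (e.g. "llama3") has been pulled;
# the model name and prompt are placeholders, not the ones used in ollama.ipynb.
import requests

def ask_ollama(prompt: str, model: str = "llama3") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask_ollama("Say hello in one sentence."))
```
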
- Get the LaTeX zips.
- Unzip them and put each paper into an individual folder.
- Get all files from each folder.
- Remove all non-.tex files.
- Remove all the LaTeX commands.
- Merge all files into one and then remove all subdirectories.
- (Optional) Here we cut 1000 characters from the front and back of each paper.
- Make a dictionary of paper name (taken from the folder) [key] and paper content [value]. (See the preprocessing sketch after this list.)
- With the prompt, run inference for each paper.
- Collect those responses in a structured way.
- Create the prompt based on the output of inference 1.
- Ask it for the answer to your question, summarized so it can account for multiple papers answering the same thing.
- Make a prompt based on the collected material above and run the final inference. (See the inference sketch after this list.)
- Unzip all the `.tar` files into `\dataset`. It should look like:
├── Paper 1 Folder containing all files.
├── Paper 2 Folder containing all files.
├── ... Paper N ...
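
To make the preprocessing bullets above concrete, here is a rough sketch that walks a `dataset/` folder, keeps only `.tex` files, strips LaTeX markup with simple regexes, merges each paper's sources, optionally trims 1000 characters from each end, and builds the {paper name: content} dictionary. The folder name, regexes, and helper names are assumptions, not the exact code in `ollama.ipynb`.

```python
# Sketch of the preprocessing steps above (assumptions, not the exact notebook code):
# walk dataset/, keep .tex files, strip LaTeX markup, merge, trim, build the dict.
import re
from pathlib import Path

DATASET_DIR = Path("dataset")   # one sub-folder per paper, as in the tree above
TRIM = 1000                     # optional cut from the front and back of each paper

def strip_latex(text: str) -> str:
    """Crude removal of LaTeX markup: comments, environments, commands, braces."""
    text = re.sub(r"(?<!\\)%.*", "", text)                        # comments
    text = re.sub(r"\\begin\{[^}]*\}|\\end\{[^}]*\}", "", text)   # environments
    text = re.sub(r"\\[a-zA-Z]+\*?(\[[^\]]*\])?", "", text)       # \commands[opts]
    return text.replace("{", "").replace("}", "")

def load_papers(dataset_dir: Path = DATASET_DIR, trim: int = TRIM) -> dict:
    """Return {paper folder name: cleaned, merged .tex content}."""
    papers = {}
    for paper_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        tex_files = sorted(paper_dir.rglob("*.tex"))              # drop non-.tex files
        merged = "\n".join(strip_latex(f.read_text(errors="ignore")) for f in tex_files)
        if trim and len(merged) > 2 * trim:
            merged = merged[trim:-trim]                           # cut 1000 chars each end
        papers[paper_dir.name] = merged
    return papers

papers = load_papers()
print({name: len(content) for name, content in papers.items()})
```
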
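And a sketch of the two-stage inference loop described above, reusing `ask_ollama()` and `load_papers()` from the earlier sketches; the question and prompt wording are illustrative assumptions, since the real prompts live in `ollama.ipynb`.

```python
# Sketch of the two-stage prompting above, reusing ask_ollama() and load_papers()
# from the earlier sketches. Question text and prompt wording are illustrative only.
QUESTION = "What loss functions do these papers propose?"   # placeholder question

# Stage 1: run inference for each paper and collect the per-paper responses.
per_paper_answers = {}
for name, content in load_papers().items():
    prompt = (
        f"You are reading the paper '{name}'.\n\n{content}\n\n"
        f"Question: {QUESTION}\n"
        "Answer only from this paper and cite it by name."
    )
    per_paper_answers[name] = ask_ollama(prompt)

# Stage 2: build a prompt from the collected answers and ask for one combined,
# citation-bearing answer that accounts for several papers answering the same thing.
collected = "\n\n".join(f"[{name}]\n{ans}" for name, ans in per_paper_answers.items())
final_prompt = (
    f"Question: {QUESTION}\n\n"
    f"Per-paper answers:\n{collected}\n\n"
    "Summarize a single answer, citing each paper you draw from."
)
print(ask_ollama(final_prompt))
```
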
That's it!
- [ ] (Limitation) Test with a larger number of papers.
- [ ] Figure out a way to generate a good baseline.
- [ ] If we scale and fail, convert this into a few-shot problem?
- [ ] (Limitation) Only works for .tex downloads from arXiv. How do we solve that?
HUGE THANKS TO CEREBRAS AI for giving us free credits on their inference accelerator platform.