In small-context settings (currently 5 papers of 4 pages each), can we make LLMs provide answers with citations at inference time?
- To our knowledge, nothing like this has been done before. [Please do send material if there is.]
To Run (using Ollama)
- Follow this Gist to set up Ollama locally.
- Run all cells in `ollama.ipynb`.
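
Below is a minimal sketch of how the notebook can talk to the local Ollama server from Python, assuming the standard HTTP API on port 11434; the model name and prompt are placeholders, not necessarily what `ollama.ipynb` uses.

```python
# Minimal sketch: query a locally running Ollama server over its HTTP API.
# Assumes `ollama serve` is running and a model (e.g. "llama3") has been pulled;
# the model name and prompt are placeholders, not the ones used in ollama.ipynb.
import requests

def ask_ollama(prompt: str, model: str = "llama3") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask_ollama("Say hello in one sentence."))
```
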
- Get the LaTeX zips.
- Unzip them and put each paper into an individual folder.
- Get all files from each folder.
- Remove all non-.tex files.
- Remove all the LaTeX commands.
- Merge all files into one and then remove all subdirectories.
- (Optional) Here we cut 1000 characters from the front and back of each paper.
- Make a dictionary of paper name (taken from the folder) [key] and paper content [value]. (See the preprocessing sketch after this list.)
- With the prompt, run inference for each paper.
- Collect those responses in a structured way.
- Create the prompt based on the output of inference 1.
- Ask it for the answer to your question, summarized so it can account for multiple papers answering the same thing.
- Make a prompt based on the collected material above and run the final inference. (See the inference sketch after this list.)
- Unzip all the `.tar` files into `\dataset`. It should look like:
├── Paper 1 Folder containing all files.
├── Paper 2 Folder containing all files.
├── ... Paper N ...
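
To make the preprocessing bullets above concrete, here is a rough sketch that walks a `dataset/` folder, keeps only `.tex` files, strips LaTeX markup with simple regexes, merges each paper's sources, optionally trims 1000 characters from each end, and builds the {paper name: content} dictionary. The folder name, regexes, and helper names are assumptions, not the exact code in `ollama.ipynb`.

```python
# Sketch of the preprocessing steps above (assumptions, not the exact notebook code):
# walk dataset/, keep .tex files, strip LaTeX markup, merge, trim, build the dict.
import re
from pathlib import Path

DATASET_DIR = Path("dataset")   # one sub-folder per paper, as in the tree above
TRIM = 1000                     # optional cut from the front and back of each paper

def strip_latex(text: str) -> str:
    """Crude removal of LaTeX markup: comments, environments, commands, braces."""
    text = re.sub(r"(?<!\\)%.*", "", text)                        # comments
    text = re.sub(r"\\begin\{[^}]*\}|\\end\{[^}]*\}", "", text)   # environments
    text = re.sub(r"\\[a-zA-Z]+\*?(\[[^\]]*\])?", "", text)       # \commands[opts]
    return text.replace("{", "").replace("}", "")

def load_papers(dataset_dir: Path = DATASET_DIR, trim: int = TRIM) -> dict:
    """Return {paper folder name: cleaned, merged .tex content}."""
    papers = {}
    for paper_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        tex_files = sorted(paper_dir.rglob("*.tex"))              # drop non-.tex files
        merged = "\n".join(strip_latex(f.read_text(errors="ignore")) for f in tex_files)
        if trim and len(merged) > 2 * trim:
            merged = merged[trim:-trim]                           # cut 1000 chars each end
        papers[paper_dir.name] = merged
    return papers

papers = load_papers()
print({name: len(content) for name, content in papers.items()})
```
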
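And a sketch of the two-stage inference loop described above, reusing `ask_ollama()` and `load_papers()` from the earlier sketches; the question and prompt wording are illustrative assumptions, since the real prompts live in `ollama.ipynb`.

```python
# Sketch of the two-stage prompting above, reusing ask_ollama() and load_papers()
# from the earlier sketches. Question text and prompt wording are illustrative only.
QUESTION = "What loss functions do these papers propose?"   # placeholder question

# Stage 1: run inference for each paper and collect the per-paper responses.
per_paper_answers = {}
for name, content in load_papers().items():
    prompt = (
        f"You are reading the paper '{name}'.\n\n{content}\n\n"
        f"Question: {QUESTION}\n"
        "Answer only from this paper and cite it by name."
    )
    per_paper_answers[name] = ask_ollama(prompt)

# Stage 2: build a prompt from the collected answers and ask for one combined,
# citation-bearing answer that accounts for several papers answering the same thing.
collected = "\n\n".join(f"[{name}]\n{ans}" for name, ans in per_paper_answers.items())
final_prompt = (
    f"Question: {QUESTION}\n\n"
    f"Per-paper answers:\n{collected}\n\n"
    "Summarize a single answer, citing each paper you draw from."
)
print(ask_ollama(final_prompt))
```
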
That's it!
- [ ] (Limitation) Test with a larger number of papers.
- [ ] Figure out a way to generate a good baseline.
- [ ] If we scale and fail, convert this into a few-shot problem?
- [ ] (Limitation) Only works for .tex downloads from arXiv. How do we solve that?
HUGE THANKS TO CEREBRAS AI for giving us free credits on their inference accelerator platform.