PhD student at @ucl-dark. Interested in understanding LLM fine-tuning, AI safety and (super)alignment.
- London
- https://robertkirk.github.io/
- @_robertkirk
Pinned
- facebookresearch/rlfh-gen-div
  Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity".
- tinystories-wrappers
  Code for the TinyStories experiments from "Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks".
- facebookresearch/minihack
  MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research.
- stanford_alpaca (Python, forked from tatsu-lab/stanford_alpaca)
  Code and documentation to train Stanford's Alpaca models, and generate the data.