Fix(readme): local path loading and custom strategy type #264

Merged 2 commits · Jul 6, 2023 · Changes from 1 commit
README.md: 18 changes (13 additions & 5 deletions)
@@ -237,7 +237,7 @@ Have dataset(s) in one of the following format (JSONL recommended):
 #### How to add custom prompts
 
 1. Add your method to a file in [prompt_strategies](src/axolotl/prompt_strategies). Please see other files as example.
-2. Use your custom file name as the dataset type.
+2. Use your custom file name as the dataset type `<prompt_strategies_file>.load_<load_fn>`.
 
 Optionally, download some datasets, see [data/README.md](data/README.md)
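To make step 2 concrete: if the strategy file were, say, `src/axolotl/prompt_strategies/my_strategy.py` and it defined a loader function `load_chat`, the dataset `type` would be `my_strategy.load_chat`. Both names are hypothetical, used only to illustrate the `<prompt_strategies_file>.load_<load_fn>` convention:

```yaml
datasets:
  - path: vicgalle/alpaca-gpt4
    type: my_strategy.load_chat # <prompt_strategies_file>.load_<load_fn>, hypothetical names
```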

@@ -255,10 +255,18 @@ See sample configs in [configs](configs) folder or [examples](examples) for quick start.
 
 - dataset
   ```yaml
+  sequence_len: 2048 # max token length for prompt
+
+  # huggingface repo
   datasets:
-    - path: vicgalle/alpaca-gpt4 # local or huggingface repo
+    - path: vicgalle/alpaca-gpt4
       type: alpaca # format from earlier
+
+  # local
+  datasets:
+    - path: json
+      data_files: data.jsonl # or json
+      type: alpaca # format from earlier
-  sequence_len: 2048 # max token length / prompt
   ```
 
 - loading
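For the local variant, `data.jsonl` is expected to hold one JSON object per line in whatever schema the chosen `type` parses; for `type: alpaca` that is conventionally `instruction`/`input`/`output` fields. A hypothetical record:

```json
{"instruction": "Summarize the text.", "input": "Axolotl can load a local JSONL file.", "output": "Local JSONL files are supported."}
```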
@@ -328,10 +336,10 @@ tf32: true # require >=ampere
 
 # a list of one or more datasets to finetune the model with
 datasets:
-  # this can be either a hf dataset, or relative path
+  # hf dataset repo | "json" for local dataset, make sure to fill data_files
   - path: vicgalle/alpaca-gpt4
     # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
-    type: alpaca # format OR format:prompt_style (chat/instruct)
+    type: alpaca # format | format:<prompt_style> (chat/instruct) | <prompt_strategies>.load_<load_fn>
     data_files: # path to source data files
     shards: # number of shards to split data into
 
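Putting these fields together, a filled-in local-dataset entry might look like the sketch below; the values are illustrative only, and `shards`, per the comment above, is simply the number of shards to split the data into:

```yaml
datasets:
  - path: json               # "json" selects a local dataset; fill data_files
    data_files: data.jsonl   # path to the local source file
    type: alpaca             # or format:<prompt_style>, or <prompt_strategies_file>.load_<load_fn>
    shards: 4                # optional: split the data into 4 shards
```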