
Merge pull request axolotl-ai-cloud#264 from OpenAccess-AI-Collective/NanoCode012-patch-1

Fix(readme): local path loading and custom strategy type
NanoCode012 authored Jul 6, 2023
2 parents 3dbf6fe + 2528e93 commit a4acb28
Showing 1 changed file with 13 additions and 5 deletions: README.md
@@ -237,7 +237,7 @@ Have dataset(s) in one of the following format (JSONL recommended):
#### How to add custom prompts

1. Add your method to a file in [prompt_strategies](src/axolotl/prompt_strategies). Please see other files as example.
-2. Use your custom file name as the dataset type.
+2. Use your custom file name as the dataset type `<prompt_strategies_file>.load_<load_fn>`.

Optionally, download some datasets, see [data/README.md](data/README.md)
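The two steps above imply a prompt-strategies file that exposes a loader the config can reference. A minimal, self-contained sketch of the idea — the class name `CustomPrompter`, the prompt layout, and the unused `tokenizer`/`cfg` parameters are illustrative assumptions, not axolotl's actual interface:

```python
# Hypothetical custom prompt strategy module. A file like this placed in
# src/axolotl/prompt_strategies would be referenced from the config as
# <file_name>.load. Names here are for illustration only.

class CustomPrompter:
    """Builds an instruction-style prompt string from one dataset row."""

    def build_prompt(self, row: dict) -> str:
        prompt = f"### Instruction:\n{row['instruction']}\n"
        if row.get("input"):  # optional field, skipped when empty
            prompt += f"### Input:\n{row['input']}\n"
        prompt += f"### Response:\n{row['output']}"
        return prompt


def load(tokenizer=None, cfg=None):
    # Module-level entry point the dataset type would resolve to;
    # tokenizer and cfg are accepted but unused in this sketch.
    return CustomPrompter()


prompter = load()
print(prompter.build_prompt(
    {"instruction": "Say hi", "input": "", "output": "hi"}
))
```

With this file saved as, say, `my_strategy.py`, the dataset `type` would name the module and loader rather than a built-in format.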

@@ -255,10 +255,18 @@ See sample configs in [configs](configs) folder or [examples](examples) for quick start

- dataset
```yaml
-sequence_len: 2048 # max token length for prompt
+# huggingface repo
 datasets:
-  - path: vicgalle/alpaca-gpt4 # local or huggingface repo
+  - path: vicgalle/alpaca-gpt4
     type: alpaca # format from earlier
+# local
+datasets:
+  - path: json
+    data_files: data.jsonl # or json
+    type: alpaca # format from earlier
+sequence_len: 2048 # max token length / prompt
```
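For the local case above, `data.jsonl` is just one JSON object per line in the alpaca field layout. A quick sketch of producing and round-tripping such a file — the row contents are invented for illustration:

```python
# Write a small alpaca-format JSONL file matching the data_files entry
# above, then read it back to confirm one JSON object per line.
import json

rows = [
    {"instruction": "Translate to French", "input": "hello", "output": "bonjour"},
    {"instruction": "Name a prime number", "input": "", "output": "7"},
]

with open("data.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

with open("data.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(len(loaded))  # 2
```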

- loading
@@ -328,10 +336,10 @@ tf32: true # require >=ampere
# a list of one or more datasets to finetune the model with
datasets:
-  # this can be either a hf dataset, or relative path
+  # hf dataset repo | "json" for local dataset, make sure to fill data_files
  - path: vicgalle/alpaca-gpt4
    # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
-    type: alpaca # format OR format:prompt_style (chat/instruct)
+    type: alpaca # format | format:<prompt_style> (chat/instruct) | <prompt_strategies>.load_<load_fn>
+    data_files: # path to source data files
+    shards: # number of shards to split data into
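The `shards` option above splits the data into a number of near-equal pieces. As a rough sketch of that behavior — assuming a strided split in the style of Hugging Face `Dataset.shard`, which may differ from axolotl's exact implementation:

```python
# Partition rows into num_shards near-equal shards by taking every
# num_shards-th row starting at the shard index (strided, not contiguous).
def shard(rows, num_shards, index):
    return rows[index::num_shards]


data = list(range(10))
shards = [shard(data, 4, i) for i in range(4)]
print(shards)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Every row lands in exactly one shard, so training on a single shard uses roughly 1/`num_shards` of the data.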
