add alpaca gpt4 dataset (LAION-AI#2610)
The `input` column contains many different variants of `no input`, so we don't use the `input` column in those cases. In some cases the text in `input` is already contained in the instruction; in these cases we also don't use the `input` column.

I am not quite sure how best to concatenate the `instruction` and `input` columns. In most cases it seems fine to just replace the last occurrence of `.`, `!` or `?` with a colon, e.g.:

Instruction: `Identify the odd one out.`
Input: `Twitter, Instagram, Telegram`

or

Instruction: `How dense is a given material?`
Input: `Steel`

But we also have some questions like:

Instruction: `Given the following synopsis, what is the moral lesson of this story?`
Input: `Once upon a time, there was a poor young boy who wanted some candy. He begged his father for money to buy it, but his father said no and ordered him to go to bed. As he was going to bed, the boy saw a five-dollar bill on the counter, which he took and bought the candy.`

where the colon replacement might not be the best fit. Either way, I think this one token will not make a significant difference to the model, and therefore I just concatenate instruction and input with a space.
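A minimal sketch of the rule described above, not the code from this commit. The function name `build_prompt` and the contents of `NO_INPUT_MARKERS` are hypothetical, chosen for illustration; the actual set of `no input` variants in the dataset is larger and is not listed here.

```python
# Hypothetical, illustrative set of "no input" variants (assumption:
# the real dataset contains more spellings than these).
NO_INPUT_MARKERS = {"", "no input", "none", "n/a"}


def build_prompt(instruction: str, input_text: str) -> str:
    """Combine the `instruction` and `input` columns into one prompt."""
    instruction = instruction.strip()
    input_text = input_text.strip()

    # Case 1: the input is some variant of "no input" -> ignore it.
    if input_text.lower().rstrip(".") in NO_INPUT_MARKERS:
        return instruction

    # Case 2: the input text already appears in the instruction -> ignore it.
    if input_text in instruction:
        return instruction

    # Otherwise: concatenate instruction and input with a single space.
    return f"{instruction} {input_text}"


print(build_prompt("How dense is a given material?", "Steel"))
```

Joining with a space rather than rewriting the final punctuation to a colon keeps the logic the same for both short inputs such as `Steel` and long passages such as the synopsis example.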